Last updated on May 1, 2024. This conference program is tentative and subject to change.
Technical Program for Tuesday May 14, 2024
TuAA1-CC Award Session, CC-Main Hall
Automation

Chair: Wang, Michael Yu | Mywang@gbu.edu.cn
Co-Chair: Nishi, Tatsushi | Okayama University

10:30-12:00, Paper TuAA1-CC.1
TinyMPC: Model-Predictive Control on Resource-Constrained Microcontrollers
Nguyen, Khai | Carnegie Mellon University
Schoedel, Samuel | Carnegie Mellon University
Alavilli, Anoushka | Carnegie Mellon University
Plancher, Brian | Barnard College, Columbia University
Manchester, Zachary | Carnegie Mellon University
Keywords: Optimization and Optimal Control, Embedded Systems for Robotic and Automation
Abstract: Model-predictive control (MPC) is a powerful tool for controlling highly dynamic robotic systems subject to complex constraints. However, MPC is computationally demanding, and is often impractical to implement on small, resource-constrained robotic platforms. We present TinyMPC, a high-speed MPC solver with a low memory footprint targeting the microcontrollers common on small robots. Our approach is based on the alternating direction method of multipliers (ADMM) and leverages the structure of the MPC problem for efficiency. We demonstrate TinyMPC’s effectiveness by benchmarking against the state-of-the-art solver OSQP, achieving nearly an order of magnitude speed increase, as well as through hardware experiments on a 27 gram quadrotor, demonstrating high-speed trajectory tracking and dynamic obstacle avoidance.
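The ADMM splitting that TinyMPC-style solvers build on can be illustrated on a generic box-constrained QP. This is a minimal sketch under our own naming, not the paper's solver, which additionally pre-computes LQR-style factorizations offline to exploit the MPC problem structure:

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, iters=100):
    """Minimize 0.5*x'Px + q'x subject to lo <= x <= hi via ADMM.

    Each iteration is an unconstrained quadratic solve, a cheap
    projection onto the box, and a dual update -- the structure that
    makes the method fit on a microcontroller.
    """
    n = len(q)
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    K = np.linalg.inv(P + rho * np.eye(n))  # factor once, reuse every iteration
    for _ in range(iters):
        x = K @ (rho * (z - y) - q)   # primal step: quadratic solve
        z = np.clip(x + y, lo, hi)    # slack step: projection onto the box
        y = y + x - z                 # scaled dual ascent
    return z
```

Because the factorization `K` is fixed across iterations, the per-iteration cost is a matrix-vector product plus a clip, which is what keeps the memory and compute footprint small.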

10:30-12:00, Paper TuAA1-CC.2
A Movable Microfluidic Chip with Gap Effect for Manipulation of Oocytes
Liang, Shuzhang | The University of Tokyo
Amaya, Satoshi | The University of Tokyo
Sugiura, Hirotaka | The University of Tokyo
Mo, Hao | The University of Tokyo
Dai, Yuguo | The University of Tokyo
Arai, Fumihito | The University of Tokyo
Keywords: Biological Cell Manipulation, Mobile Manipulation, Automation at Micro-Nano Scales
Abstract: This study proposes a novel movable microfluidic chip in which a microfluidic chip is integrated into a robotic manipulator for manipulating oocytes. The microfluidic device has the ability to release a single cell with the gap effect. The robotic manipulator can control the position of the microfluidic chip. The microfluidic chip with a pipette tip is directly fabricated using 3D printing. Xenopus oocytes were used in the experiments. When oocytes move from the bottom side of the channel to the tip side, they generate gaps between each other. The gap distance can reach about 16 times the diameter of an oocyte. In addition, a capacitive sensor was used to detect oocytes in the manipulation processes. The results showed that oocytes were successfully released one by one with no deformation in shape using the movable microfluidic chip. The method has significant advantages in biomedical engineering and micro-nano-manipulation.

10:30-12:00, Paper TuAA1-CC.3
Under Pressure: Learning-Based Analog Gauge Reading in the Wild
Reitsma, Maurits | ETH Zurich
Keller, Julian | ETH Zurich
Blomqvist, Kenneth | ETH Zurich
Siegwart, Roland | ETH Zurich
Keywords: Robotics in Hazardous Fields, Industrial Robots, Computer Vision for Automation
Abstract: We propose an interpretable framework for reading analog gauges that is deployable on real world robotic systems. Our framework splits the reading task into distinct steps, such that we can detect potential failures at each step. Our system needs no prior knowledge of the type of gauge or the range of the scale and is able to extract the units used. We show that our gauge reading algorithm is able to extract readings with a relative reading error of less than 2%.
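The last step of any such staged pipeline, mapping a detected needle angle onto the recovered scale, reduces to linear interpolation. A hedged sketch with our own function and parameter names, not the paper's code:

```python
def gauge_value(needle_deg, start_deg, end_deg, v_min, v_max):
    """Interpolate a detected needle angle onto the gauge scale.

    start_deg/end_deg are the angles of the first and last scale tick,
    v_min/v_max the values read off those ticks. Splitting the task
    this way lets failures (bad tick detection, bad unit extraction)
    be caught before this final, trivial step.
    """
    frac = (needle_deg - start_deg) / (end_deg - start_deg)
    return v_min + frac * (v_max - v_min)
```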

10:30-12:00, Paper TuAA1-CC.4
Efficient Composite Learning Robot Control under Partial Interval Excitation
Shi, Tian | Sun Yat-Sen University
Li, Weibing | Sun Yat-Sen University
Yu, Haoyong | National University of Singapore
Pan, Yongping | Sun Yat-Sen University
Keywords: Robust/Adaptive Control
Abstract: Parameter convergence in adaptive control is crucial for improving the stability and robustness of robotic systems. Nevertheless, a stringent condition named persistent excitation (PE) needs to be satisfied to ensure parameter convergence in conventional adaptive robot control. Composite learning robot control (CLRC) is an innovative methodology that guarantees parameter convergence under a condition of interval excitation (IE) that is strictly weaker than PE. This paper puts forward a time-division multi-channel (TDMC) CLRC strategy such that parameter convergence is achieved even without the IE condition. In the TDMC mechanism, a filtered regressor is integrated with multiple time intervals to generate a generalized prediction error for parameter update, such that excitation information of regressor channels at different instants is exploited more effectively and efficiently to achieve fast and accurate parameter estimation. Global exponential stability with parameter convergence of the closed-loop system is achieved under a partial IE condition that is much weaker than IE. Experiments on a collaborative robot with 7 degrees of freedom have demonstrated the superiority of the proposed approach in both parameter estimation and trajectory tracking compared to state-of-the-art approaches.

10:30-12:00, Paper TuAA1-CC.5
MORALS: Analysis of High-Dimensional Robot Controllers Via Topological Tools in a Latent Space
Vieira, Ewerton | Rutgers University
Sivaramakrishnan, Aravind | Rutgers University
Tangirala, Sumanth | Rutgers University, New Brunswick
Granados, Edgar | Rutgers
Mischaikow, Konstantin | Rutgers University
Bekris, Kostas E. | Rutgers, the State University of New Jersey
Keywords: Dynamics, Hybrid Logical/Dynamical Planning and Verification, Motion Control
Abstract: Estimating the region of attraction (RoA) for a robot controller is essential for safe application and controller composition. Many existing methods require a closed-form expression, which limits their applicability to data-driven controllers. Methods that operate only over trajectory rollouts tend to be data-hungry. In prior work, we have demonstrated that topological tools based on Morse Graphs (directed acyclic graphs that combinatorially represent the underlying nonlinear dynamics) offer data-efficient RoA estimation without needing an analytical model. They struggle, however, with high-dimensional systems as they operate over a state-space discretization. This paper presents Morse Graph-aided discovery of Regions of Attraction in a learned Latent Space (MORALS). The approach combines auto-encoding neural networks with Morse Graphs. MORALS shows promising predictive capabilities in estimating attractors and their RoAs for data-driven controllers operating over high-dimensional systems, including a 67-dim humanoid robot and a 96-dim 3-fingered manipulator. It first projects the dynamics of the controlled system into a learned latent space. Then, it constructs a reduced form of Morse Graphs representing the bistability of the underlying dynamics, i.e., detecting when the controller results in a desired versus an undesired behavior. The evaluation on high-dimensional robotic datasets indicates data efficiency in RoA estimation.

TuAA2-CC Award Session, CC-301
Cognitive Robotics

Chair: Ogata, Tetsuya | Waseda University
Co-Chair: Beetz, Michael | University of Bremen

10:30-12:00, Paper TuAA2-CC.1
Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-To-End
Zhang, Chong | ETH Zurich
Jin, Jin | Tongji University
Frey, Jonas | ETH Zurich
Rudin, Nikita | ETH Zurich, NVIDIA
Mattamala, Matias | University of Oxford
Cadena Lerma, Cesar | ETH Zurich
Hutter, Marco | ETH Zurich
Keywords: Legged Robots, Sensorimotor Learning, Task and Motion Planning
Abstract: Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate by over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.

10:30-12:00, Paper TuAA2-CC.2
Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
Yokoyama, Naoki | Georgia Institute of Technology
Ha, Sehoon | Georgia Institute of Technology
Batra, Dhruv | Georgia Tech / Facebook AI Research
Wang, Jiuguang | Boston Dynamics AI Institute
Bucher, Bernadette | University of Michigan
Keywords: Vision-Based Navigation, AI-Enabled Robotics, Semantic Scene Understanding
Abstract: Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real world deployment can be viewed at naoki.io/vlfm.
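The frontier-selection step can be sketched as scoring each frontier against a text prompt and taking the argmax. This is a simplified stand-in: in the paper the scores come from a pre-trained vision-language model over RGB observations, whereas here the embeddings are plain vectors supplied by the caller:

```python
import numpy as np

def best_frontier(frontiers, frontier_embeds, text_embed):
    """Pick the frontier whose embedding best matches the target prompt.

    frontiers: list of (row, col) frontier cells from the occupancy map;
    frontier_embeds: one vector per frontier (VLM image features in the
    paper, arbitrary vectors here); text_embed: target-object prompt
    embedding. Scoring uses cosine similarity.
    """
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = [cos(e, text_embed) for e in frontier_embeds]
    return frontiers[int(np.argmax(scores))]
```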

10:30-12:00, Paper TuAA2-CC.3
Learning Continuous Control with Geometric Regularity from Robot Intrinsic Symmetry
Yan, Shengchao | University of Freiburg
Zhang, Baohe | University of Freiburg
Zhang, Yuan | University of Freiburg
Boedecker, Joschka | University of Freiburg
Burgard, Wolfram | University of Technology Nuremberg
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Deep Learning Methods
Abstract: Geometric regularity, which leverages data symmetry, has been successfully incorporated into deep learning architectures such as CNNs, RNNs, GNNs, and Transformers. While this concept has been widely applied in robotics to address the curse of dimensionality when learning from high-dimensional data, the inherent reflectional and rotational symmetry of robot structures has not been adequately explored. Drawing inspiration from cooperative multi-agent reinforcement learning, we introduce novel network structures for single-agent control learning that explicitly capture these symmetries. Moreover, we investigate the relationship between the geometric prior and the concept of Parameter Sharing in multi-agent reinforcement learning. Last but not least, we implement the proposed framework in online and offline learning methods to demonstrate its ease of use. Through experiments conducted on various challenging continuous control tasks on simulators and real robots, we highlight the significant potential of the proposed geometric regularity in enhancing robot learning capabilities.

10:30-12:00, Paper TuAA2-CC.4
Learning Vision-Based Bipedal Locomotion for Challenging Terrain
Duan, Helei | Oregon State University
Pandit, Bikram | Oregon State University
Gadde, Mohitvishnu S. | Oregon State University
van Marum, Bart Jaap | Oregon State University
Dao, Jeremy | Oregon State University
Kim, Chanho | Oregon State University
Fern, Alan | Oregon State University
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Reinforcement Learning
Abstract: Reinforcement learning (RL) for bipedal locomotion has recently demonstrated robust gaits over moderate terrains using only proprioceptive sensing. However, such blind controllers will fail in environments where robots must anticipate and adapt to local terrain, which requires visual perception. In this paper, we propose a fully-learned system that allows bipedal robots to react to local terrain while maintaining commanded travel speed and direction. Our approach first trains a controller in simulation using a heightmap expressed in the robot's local frame. Next, data is collected in simulation to train a heightmap predictor, whose input is the history of depth images and robot states. We demonstrate that with appropriate domain randomization, this approach allows for successful sim-to-real transfer with no explicit pose estimation and no fine-tuning using real-world data. To the best of our knowledge, this is the first example of sim-to-real learning for vision-based bipedal locomotion over challenging terrains.

10:30-12:00, Paper TuAA2-CC.5
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
Sridhar, Ajay | University of California, Berkeley
Shah, Dhruv | University of California, Berkeley
Glossop, Catherine | University of California, Berkeley
Levine, Sergey | UC Berkeley
Keywords: Deep Learning Methods, Vision-Based Navigation
Abstract: Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches.
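The goal-masking idea itself is small enough to sketch: when no goal is given, the goal slot of the policy input is zeroed, so one network serves both modes. The names here are ours, and the real model conditions a Transformer policy with a diffusion decoder rather than a simple concatenation:

```python
import numpy as np

def goal_masked_obs(obs_embed, goal_embed, goal_available):
    """Build a policy input with an optionally masked goal slot.

    During exploration (no goal located yet), the goal embedding is
    replaced by zeros; during goal-directed navigation it is passed
    through. The policy therefore never needs to switch models.
    """
    g = goal_embed if goal_available else np.zeros_like(goal_embed)
    return np.concatenate([obs_embed, g])
```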

TuAT1-CC Oral Session, CC-303
Planning under Uncertainty I

Chair: Scherer, Sebastian | Carnegie Mellon University
Co-Chair: Indelman, Vadim | Technion - Israel Institute of Technology

10:30-12:00, Paper TuAT1-CC.1
MUI-TARE: Cooperative Multi-Agent Exploration with Unknown Initial Position
Yan, Jingtian | Carnegie Mellon University
XingQiao, Lin | Carnegie Mellon University
Ren, Zhongqiang | Shanghai Jiao Tong University
Zhao, Shiqi | University of California San Diego
Yu, Jieqiong | Carnegie Mellon University
Cao, Chao | Carnegie Mellon University
Yin, Peng | City University of Hong Kong
Zhang, Ji | Carnegie Mellon University
Scherer, Sebastian | Carnegie Mellon University
Keywords: Planning under Uncertainty, Integrated Planning and Learning, Multi-Robot SLAM
Abstract: Multi-agent exploration of a bounded 3D environment with unknown initial positions of agents is a challenging problem. It requires quickly exploring the environments and robustly merging the sub-maps built by the agents. Most of the current exploration strategies directly merge two sub-maps built by different agents when a single frame of overlap is detected, which can lead to incorrect merging due to the false-positive detection of the overlap and is thus not robust. Meanwhile, some state-of-the-art place recognition methods use sequence matching for more robust data association. However, naively applying these sequence-based matching methods to multi-agent exploration may require one agent to repeat a large amount of another agent's history trajectory so that a sequence of matched observation can be established, which reduces the overall exploration time efficiency. To intelligently balance the robustness of sub-map merging and exploration efficiency, we develop a new approach for lidar-based multi-agent exploration, which can direct one agent to repeat another agent's trajectory adaptively based on the quality indicator of the sub-map merging process. Additionally, our approach extends the recent single-agent hierarchical exploration strategy to multiple agents cooperatively by planning for agents with merged sub-maps to further improve exploration efficiency. Our experiments show that our approach is up to 50% more efficient than the baselines on average while merging

10:30-12:00, Paper TuAT1-CC.2
Safe Planning in Dynamic Environments Using Conformal Prediction
Lindemann, Lars | University of Southern California
Cleaveland, Matthew | University of Pennsylvania
Shim, Gihyun | University of Pennsylvania
Pappas, George J. | University of Pennsylvania
Keywords: Planning under Uncertainty, Constrained Motion Planning, Collision Avoidance
Abstract: We propose a framework for planning in unknown dynamic environments with probabilistic safety guarantees using conformal prediction. Particularly, we design a model predictive controller (MPC) that uses i) trajectory predictions of the dynamic environment, and ii) prediction regions quantifying the uncertainty of the predictions. To obtain prediction regions, we use conformal prediction, a statistical tool for uncertainty quantification, that requires availability of offline trajectory data, a reasonable assumption in many applications such as autonomous driving. The prediction regions are valid, i.e., they hold with a user-defined probability, so that the MPC is provably safe. We illustrate the results in the self-driving car simulator CARLA at a pedestrian-filled intersection. The strength of our approach is compatibility with state-of-the-art trajectory predictors, e.g., RNNs and LSTMs, while making no assumptions on the underlying trajectory-generating distribution. To the best of our knowledge, these are the first results that provide valid safety guarantees in such a setting.
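The conformal step itself is compact: given held-out prediction errors, the region radius is a finite-sample-corrected empirical quantile. A sketch in the split-conformal setting the abstract describes; function and variable names are ours:

```python
import numpy as np

def conformal_radius(errors, alpha=0.1):
    """Prediction-region radius from calibration nonconformity scores.

    errors: nonconformity scores on held-out trajectory data (e.g.
    distance between predicted and realized pedestrian positions).
    Returns a radius that covers a fresh error with probability at
    least 1 - alpha, with no assumption on the error distribution.
    """
    n = len(errors)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample correction
    if k > n:
        return float('inf')  # too few calibration points for this alpha
    return float(np.sort(np.asarray(errors))[k - 1])
```

The returned radius inflates each predicted obstacle position into a ball that the MPC must avoid, which is what makes the controller provably safe at the chosen coverage level.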

10:30-12:00, Paper TuAT1-CC.3
Wasserstein Distributionally Robust Chance Constrained Trajectory Optimization for Mobile Robots within Uncertain Safe Corridor
Xu, Shaohang | Huazhong University of Science and Technology
Ruan, Haolin | City University of Hong Kong
Zhang, Wentao | Huazhong University of Science and Technology
Wang, Yi'an | Huazhong University of Science and Technology
Zhu, Lijun | Huazhong University of Science and Technology
Ho, Chin Pang | City University of Hong Kong
Keywords: Planning under Uncertainty, Optimization and Optimal Control, Robot Safety
Abstract: Safe corridor-based Trajectory Optimization (TO) presents an appealing approach for collision-free path planning of autonomous robots, offering global optimality through its convex formulation. The safe corridor is constructed based on the perceived map, however, the non-ideal perception induces uncertainty, which is rarely considered in trajectory generation. In this paper, we propose Distributionally Robust Safe Corridor Constraints (DRSCCs) to consider the uncertainty of the safe corridor. Then, we integrate DRSCCs into the trajectory optimization framework using Bernstein basis polynomials. Theoretically, we rigorously prove that the trajectory optimization problem incorporating DRSCCs is equivalent to a computationally efficient, convex quadratic program. Compared to the nominal TO, our method enhances navigation safety by significantly reducing the infeasible motions in presence of uncertainty. Moreover, the proposed approach is validated through two robotic applications, a micro Unmanned Aerial Vehicle (UAV) and a quadruped robot Unitree A1.
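Bernstein bases are a natural fit here because a Bernstein polynomial stays inside the convex hull of its coefficients, so corridor constraints need only bind the coefficients. A minimal evaluator, not the paper's optimization code:

```python
from math import comb

def bernstein(coeffs, t):
    """Evaluate a Bernstein-basis polynomial at t in [0, 1].

    coeffs are the control coefficients; by the convex-hull property,
    the curve value always lies between min(coeffs) and max(coeffs),
    which is what lets corridor (and DRSCC-style) constraints be
    imposed on the coefficients alone.
    """
    n = len(coeffs) - 1
    return sum(c * comb(n, i) * t**i * (1 - t)**(n - i)
               for i, c in enumerate(coeffs))
```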

10:30-12:00, Paper TuAT1-CC.4
Shield Model Predictive Path Integral: A Computationally Efficient Robust MPC Method Using Control Barrier Functions
Yin, Ji | Georgia Institute of Technology
Dawson, Charles | MIT
Fan, Chuchu | Massachusetts Institute of Technology
Tsiotras, Panagiotis | Georgia Tech
Keywords: Planning under Uncertainty, Collision Avoidance, Constrained Motion Planning
Abstract: Model Predictive Path Integral (MPPI) control is a type of sampling-based model predictive control that simulates thousands of trajectories and uses these trajectories to synthesize optimal controls on-the-fly. In practice, however, MPPI encounters problems limiting its application. For instance, it has been observed that MPPI tends to make poor decisions if unmodeled dynamics or environmental disturbances exist, preventing its use in safety-critical applications. Moreover, the multi-threaded simulations used by MPPI require significant onboard computational resources, making the algorithm inaccessible to robots without modern GPUs. To alleviate these issues, we propose a novel (Shield-MPPI) algorithm that provides robustness against unpredicted disturbances and achieves real-time planning using a much smaller number of parallel simulations on regular CPUs. The novel Shield-MPPI algorithm is tested on an aggressive autonomous racing platform both in simulation and in hardware. The results show that the proposed controller greatly reduces the number of constraint violations compared to state-of-the-art robust MPPI variants and stochastic MPC methods.
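The core MPPI update that both the baseline and Shield-MPPI share is a softmin-weighted average of sampled control perturbations. A minimal sketch with our own names; the shielded variant additionally penalizes rollouts that violate a control barrier function condition:

```python
import numpy as np

def mppi_update(u_nominal, noise, costs, lam=1.0):
    """One MPPI control update from sampled rollout costs.

    u_nominal: (T, m) nominal control sequence; noise: (K, T, m)
    sampled perturbations; costs: (K,) rollout costs. Lower-cost
    rollouts get exponentially larger weight (the path-integral
    softmin), and the weighted perturbation shifts the nominal plan.
    """
    beta = costs.min()                    # subtract min for numerical stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()
    return u_nominal + np.einsum('k,ktm->tm', w, noise)
```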

10:30-12:00, Paper TuAT1-CC.5
Distributionally Robust CVaR-Based Safety Filtering for Motion Planning in Uncertain Environments
Safaoui, Sleiman | The University of Texas at Dallas
Summers, Tyler | University of Texas at Dallas
Keywords: Planning under Uncertainty, Robot Safety, Collision Avoidance
Abstract: Safety is a core challenge of autonomous robot motion planning, especially in the presence of dynamic and uncertain obstacles. Many recent results use learning and deep learning-based motion planners and prediction modules to predict multiple possible obstacle trajectories and generate obstacle-aware ego robot plans. However, planners that ignore the inherent uncertainties in such predictions incur collision risks and lack formal safety guarantees. In this paper, we present a computationally efficient safety filtering solution to reduce the collision risk of ego robot motion plans using multiple samples of obstacle trajectory predictions. The proposed approach reformulates the collision avoidance problem by computing safe halfspaces based on obstacle sample trajectories using distributionally robust optimization (DRO) techniques. The safe halfspaces are used in a model predictive control (MPC)-like safety filter to apply corrections to the reference ego trajectory thereby promoting safer planning. The efficacy and computational efficiency of our approach are demonstrated through numerical simulations.
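The safe-halfspace idea can be sketched from samples alone. This is a much-simplified stand-in for the paper's distributionally robust construction: the normal points from the obstacle sample mean toward the ego, and the offset clears the worst sample plus a margin, whereas the paper computes the offset via DRO over the sample distribution:

```python
import numpy as np

def safe_halfspace(ego, obs_samples, margin=0.1):
    """Halfspace {x : a.x >= b} containing the ego, excluding samples.

    ego: ego position; obs_samples: sampled obstacle positions from a
    trajectory predictor. Every sample lands strictly on the unsafe
    side (a.x < b); the MPC-style filter would then keep the ego on
    the safe side.
    """
    obs_samples = np.asarray(obs_samples, dtype=float)
    a = ego - obs_samples.mean(axis=0)
    a = a / np.linalg.norm(a)                      # unit normal toward ego
    b = float(np.max(obs_samples @ a)) + margin    # clear the worst sample
    return a, b
```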

10:30-12:00, Paper TuAT1-CC.6
Monte Carlo Planning in Hybrid Belief POMDPs
Barenboim, Moran | Technion - Israel Institute of Technology
Shienman, Moshe | Israel Institute of Technology
Indelman, Vadim | Technion - Israel Institute of Technology
Keywords: Planning under Uncertainty, Autonomous Agents
Abstract: Real-world problems often require reasoning about hybrid beliefs, over both discrete and continuous random variables. Yet, such a setting has hardly been investigated in the context of planning. Moreover, existing online Partially Observable Markov Decision Processes (POMDPs) solvers do not support hybrid beliefs directly. In particular, these solvers do not address the added computational burden due to an increasing number of hypotheses with the planning horizon, which can grow exponentially. As part of this work, we present a novel algorithm, Hybrid Belief Monte Carlo Planning (HB-MCP) that utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP while maintaining a hybrid belief. We illustrate how the upper confidence bound (UCB) exploration bonus can be leveraged to guide the growth of hypotheses trees alongside the belief trees. We then evaluate our approach in highly aliased simulated environments where unresolved data association leads to multi-modal belief hypotheses.
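The UCB bonus the abstract leverages is the standard UCB1 score; untried actions score infinity so they are expanded first, and the same trade-off guides which hypothesis branches get grown. A generic sketch with our own names, not HB-MCP's implementation:

```python
import math

def ucb_score(q, n_parent, n_action, c=1.414):
    """UCB1 score used to select actions during MCTS tree growth.

    q: mean return estimate of the action; n_parent and n_action are
    visit counts of the node and the action edge. The second term is
    the exploration bonus that shrinks as an action is tried more.
    """
    if n_action == 0:
        return float('inf')  # always expand untried actions first
    return q + c * math.sqrt(math.log(n_parent) / n_action)
```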

10:30-12:00, Paper TuAT1-CC.7
Data Association Aware POMDP Planning with Hypothesis Pruning Performance Guarantees
Barenboim, Moran | Technion - Israel Institute of Technology
Lev Yehudi, Idan | Technion - Israel Institute of Technology
Indelman, Vadim | Technion - Israel Institute of Technology
Keywords: Planning under Uncertainty, Autonomous Agents
Abstract: Autonomous agents that operate in the real world must often deal with partial observability, which is commonly modeled as partially observable Markov decision processes (POMDPs). However, traditional POMDP models rely on the assumption of complete knowledge of the observation source, known as fully observable data association. To address this limitation, we propose a planning algorithm that maintains multiple data association hypotheses, represented as a belief mixture, where each component corresponds to a different data association hypothesis. However, this method can lead to an exponential growth in the number of hypotheses, resulting in significant computational overhead. To overcome this challenge, we introduce a pruning-based approach for planning with ambiguous data associations. Our key contribution is to derive bounds between the value function based on the complete set of hypotheses and the value function based on a pruned subset of the hypotheses, enabling us to establish a trade-off between computational efficiency and performance. We demonstrate how these bounds can both be used to certify any pruning heuristic in retrospect and propose a novel approach to determine which hypotheses to prune in order to ensure a predefined limit on the loss. We evaluate our approach in simulated environments and demonstrate its efficacy in handling multi-modal belief hypotheses with ambiguous data associations.

10:30-12:00, Paper TuAT1-CC.8
Safe POMDP Online Planning Via Shielding
Sheng, Shili | University of Virginia
Parker, David | University of Oxford
Feng, Lu | University of Virginia
Keywords: Planning under Uncertainty, Formal Methods in Robotics and Automation
Abstract: Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. But the resulting policies cannot provide safety guarantees which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability to reach a set of goal states is one and the probability to reach a set of unsafe states is zero). We compute shields that restrict unsafe actions which would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
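At selection time, a shield simply restricts the planner's argmax to actions it certifies as safe. A minimal sketch of that integration point, assuming a precomputed set of allowed actions; the paper's contribution is in how that set is computed from the almost-sure reach-avoid specification:

```python
def shielded_argmax(q_values, allowed):
    """Pick the highest-value action among those the shield permits.

    q_values: dict mapping action -> estimated return (e.g. from
    POMCP's root node); allowed: set of actions the shield certifies
    as satisfying the reach-avoid specification.
    """
    safe = {a: q for a, q in q_values.items() if a in allowed}
    if not safe:
        raise RuntimeError("shield blocked every action")
    return max(safe, key=safe.get)
```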

10:30-12:00, Paper TuAT1-CC.9
Generating Sparse Probabilistic Graphs for Efficient Planning in Uncertain Environments
Veys, Yasmin | Massachusetts Institute of Technology
Stadler, Martina | Massachusetts Institute of Technology
Roy, Nicholas | Massachusetts Institute of Technology
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: Environments with regions of uncertain traversability can be modeled as roadmaps with probabilistic edges for efficient planning under uncertainty. We would like to generate roadmaps that enable planners to efficiently find paths with expected low costs through uncertain environments. The roadmap must be sparse so that the planning problem is tractable, but still contain edges that are likely to contribute to low-cost plans under various realizations of the environmental uncertainty. Determining the optimal set of edges to add to the roadmap without considering an exponential number of traversability scenarios is challenging. We propose the use of a heuristic that bounds the ratio between the expected path cost in our graph and the expected path cost in an optimal graph to determine whether a given edge should be added to the roadmap. We test our approach in several environments, demonstrating that our uncertainty-aware roadmaps effectively trade off between plan quality and planning efficiency for uncertainty-aware agents navigating in the graph.

TuAT2-CC Oral Session, CC-311
Mechanism Design I

Chair: Tadakuma, Riichiro | Yamagata University
Co-Chair: Jeong, Seokhwan | Mechanical Eng., Sogang University

10:30-12:00, Paper TuAT2-CC.1
Magnetic Gear-Based Actuator: A Framework of Design, Optimization, and Disturbance Observer-Based Torque Control
Song, Hangyeol | Sogang University
Lee, Edgar | Sogang University
Seo, Hyung-Tae | Kyonggi University
Jeong, Seokhwan | Mechanical Eng., Sogang University
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Force Control
Abstract: This letter presents a design framework and novel control strategy for a compact coaxial magnetic-gear-based actuation module suitable for small-to-mid-sized mechanical and robotic applications. The proposed actuation module adopts a non-contact magnetic coupling mechanism to transmit rotational power with a predetermined gear ratio, in contrast to traditional mechanical gear-based transmissions. This approach offers several advantages such as enhanced backdrivability, hardware safety, and transparency when compared to conventional contact-based transmissions. Furthermore, the magnetic coupling effect provides a spring-like characteristic that can be utilized to implement a series elastic actuation enabling sensorless torque control. The design of the magnetic gear was optimized using a differential evolution method, and a dynamic model was formulated to specify its dynamic characteristics. Finally, a composite disturbance observer-based torque control algorithm was developed, which capitalizes on the features of the magnetic spring. The proposed control algorithm was validated through several experiments.
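The disturbance-observer idea can be sketched as a first-order filter driving the estimate toward the torque residual. A generic DOB step with our own names, not the paper's composite observer, which additionally exploits the magnetic-spring torque measurement:

```python
def dob_step(d_hat, tau_cmd, tau_meas, gain, dt):
    """One Euler step of a first-order disturbance observer.

    d_hat: current disturbance estimate; tau_cmd: torque the model
    expects to act on the joint; tau_meas: torque actually observed.
    The estimate is low-pass filtered toward the residual, so a
    constant disturbance is recovered exactly in steady state.
    """
    residual = tau_meas - tau_cmd   # unmodeled torque acting on the joint
    return d_hat + gain * dt * (residual - d_hat)
```

The recovered estimate is then subtracted from the control torque, which is what lets the module track a desired torque without a dedicated torque sensor.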
|
|
10:30-12:00, Paper TuAT2-CC.2 | Add to My Program |
Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch |
|
Amish, Timothy | University of Washington |
Auletta, Jeffrey | US DEVCOM Army Research Laboratory |
Kessens, Chad C. | United States Army Research Laboratory |
Smith, Joshua R. | University of Washington |
Lipton, Jeffrey | Northeastern University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Underactuated Robots
Abstract: In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses, which restrict generated holding torque and limit many applications. Here we show how combining the Johnsen-Rahbek (JR) effect with the exponential tension scaling of the capstan effect can produce clutches with the highest specific shear stress in the literature. Our system generated 31.3 N/cm2 shear stress and a total holding torque of 7.1 N·m while consuming only 2.5 mW/cm2 at 500 V. We present a theoretical model of an electrostatic adhesive capstan clutch and demonstrate how large-angle (θ > 2π) designs increase efficiency over planar or small-angle (θ < π) clutch designs. We also report the first unfilled polymeric material, polybenzimidazole (PBI), to exhibit the JR effect.
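The exponential tension scaling mentioned here is the classical capstan equation, T_hold = T_in · e^(μθ), and a quick sketch shows why wrap angles beyond 2π pay off. The friction coefficient below is a made-up illustrative value, not the paper's measured JR-film property.

```python
import math

def capstan_amplification(mu: float, theta: float) -> float:
    """Tension ratio T_hold / T_in from the capstan equation e^(mu * theta)."""
    return math.exp(mu * theta)

# Hypothetical friction coefficient, chosen only to show the exponential growth.
mu = 0.3
for theta in (0.5 * math.pi, math.pi, 2 * math.pi, 3 * math.pi):
    print(f"wrap {theta / math.pi:.1f}*pi -> amplification "
          f"{capstan_amplification(mu, theta):.1f}x")
```

The exponential means each extra half-turn multiplies the holding capacity, which is why the large-angle (θ > 2π) designs outperform planar ones.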
|
|
10:30-12:00, Paper TuAT2-CC.3 | Add to My Program |
Research on Bionic Foldable Wing for Flapping Wing Micro Air Vehicle |
|
Xiao, Shengjie | Beijing University of Aeronautics and Astronautics |
Hu, Kai | Beihang University |
Sun, Yuhong | Beihang University |
Wang, Yun | Beihang University |
Qin, Bo | Beihang University |
Deng, Huichao | Beihang University |
Wu, Xuan | China Nanhu Academy of Electronics and Information Technology |
Ding, Xilun | Beijing University of Aeronautics and Astronautics |
Keywords: Mechanism Design, Biomimetics, Biologically-Inspired Robots
Abstract: This paper presents a bionic foldable wing that imitates the hind wing of ladybirds. Based on the folding mechanism of the ladybird hind wing and the theory of origami, the motion model of the bionic foldable wing is established, yielding the motion law of the crease angles and the variation relationship between the panels. The bionic foldable wings use shape memory alloy to drive the wings to fold, and embedded torsion springs release energy to unfold the wings. In experiments on a vehicle equipped with the foldable wings, the lift and attitude torque of the bionic foldable wings were measured with an F/T sensor. The experimental results indicate that the aerodynamic performance is close to that of our optimized non-foldable wings. Moreover, the vehicle with foldable wings was able to overcome gravity and achieve flight, providing a novel concept for research on flapping wings.
|
|
10:30-12:00, Paper TuAT2-CC.4 | Add to My Program |
Magnetic Field-Driven Bristle-Bots |
|
Supik, Lukáš | Czech Technical University in Prague |
Stránská, Kateřina | Czech Institute of Informatics, Robotics and Cybernetics Czech T |
Kulich, Miroslav | Czech Technical University in Prague |
Preucil, Libor | Czech Technical University in Prague, CIIRC |
Somr, Michael | Czech Technical University in Prague, Faculty of Civil Engineeri |
Kosnar, Karel | Czech Technical University in Prague |
Keywords: Mechanism Design, Calibration and Identification, Kinematics
Abstract: Widespread applications for mobile robots are creating a large demand for new driving mechanisms that can handle diverse environments. Bristle-bot-like robot designs, mainly studied over the past decade, are based on vibration mechanisms built on flexible legs that enable motion on the ground. However, creating scalable and steerable bristle-bots remains a challenge. Here, we focus on developing a new kind of magnetically driven bristle-bot with wireless control and power supply that can be steered and downscaled. In experiments, we verified our concept with 3D-printed bristle-bot units equipped with body-embedded permanent magnets actuated via torque imposed by an external magnetic field. An AC-powered Helmholtz coil generated the bristle-bot's driving field, providing 2D input control over field amplitude and frequency. A variable number of legs on each side of the bristle-bot's body ensures that each side has a different frequency response. This asymmetry enables steering through a rich set of control commands: rotations with simultaneous forward and backward locomotion. We also observed and controlled a new side-locomotion phenomenon not yet described in previous studies. The results are supported by data from numerous experiments and thorough statistical analysis, indicating promising directions for future development.
|
|
10:30-12:00, Paper TuAT2-CC.5 | Add to My Program |
A Scalable Monolithic 3D Printable Variable Stiffness Mechanism |
|
Baisamy, Paul | The University of Edinburgh |
Stokes, Adam Andrew | University of Edinburgh |
Giorgio-Serchi, Francesco | University of Edinburgh |
Keywords: Mechanism Design, Compliance and Impedance Control, Additive Manufacturing
Abstract: Variable Stiffness Mechanisms (VSMs) are becoming ubiquitous in mechatronics given the benefits they provide in terms of safety and performance. Despite these assets, VSMs remain fairly complex mechanical devices lacking compactness, ease of manufacturing, and accessibility. In addition, the scarcity of commercially available VSMs means that such systems are mostly designed in-house. We propose a new type of VSM that improves on the pre-existing Jack Spring concept by making it more compact and robust. The new concept, which we refer to as the Compact Modifier of Active Coils (C-MAC) mechanism, is specifically designed to be manufactured as a monolithic 3D print. This approach makes it possible to modify a minimal set of design features, namely the spring diameter and the coil diameter, to achieve the desired range of stiffness variation. We test the proposed design in six configurations; these show hysteretic energy losses no larger than 35% over the stiffness variation and confirm that stiffness scales according to theory. Stiffnesses ranging from 0.15 N/mm to 1.02 N/mm were measured for an overall device length of 140 mm, including a maximal stroke length of 22 mm. The results confirm the excellent scalability and manufacturability of the proposed design, providing a versatile mechanism for fast prototyping.
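The Jack Spring principle that C-MAC builds on changes stiffness by changing the number of active coils, per the standard helical-spring rate formula k = G d^4 / (8 D^3 n). The material and geometry values below are hypothetical, chosen only to show the scaling; they are not the paper's design parameters.

```python
def spring_rate(G, d, D, n_active):
    """Helical spring rate k = G*d^4 / (8*D^3*n_active).
    With G in N/mm^2 and d, D in mm, k comes out in N/mm."""
    return G * d ** 4 / (8 * D ** 3 * n_active)

# Made-up printed-polymer values, for illustration only.
G = 1000.0   # shear modulus [N/mm^2]
d = 3.0      # wire diameter [mm]
D = 20.0     # mean coil diameter [mm]
for n in (10, 5, 2):
    print(f"{n} active coils -> k = {spring_rate(G, d, D, n):.3f} N/mm")
```

Halving the number of active coils doubles the stiffness, which is why locking coils in or out of the active length gives a wide, predictable stiffness range.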
|
|
10:30-12:00, Paper TuAT2-CC.6 | Add to My Program |
Modular Growing Mechanism with Multi-Axis Deformation |
|
Du, Dongdong | Zhejiang University |
Del Dottore, Emanuela | Istituto Italiano Di Tecnologia |
Mondini, Alessio | Istituto Italiano Di Tecnologia |
Sinibaldi, Edoardo | Istituto Italiano Di Tecnologia |
Mazzolai, Barbara | Istituto Italiano Di Tecnologia |
Keywords: Mechanism Design, Compliant Joints and Mechanisms, Biologically-Inspired Robots
Abstract: Plant cells expand and elongate, and their cumulative actuation defines organ morphing. Inspired by this modular transformability, this study proposes a modular concept for growing robots that grow by adding Transformable Modules (TMs) at their tip. We provide a two-module implementation to evaluate the concept's viability. We designed and characterized the Shape-Retention Bellows (SRBs) that constitute the TM and are used to maintain the shape once the extension force is relaxed. We demonstrate module radial expansion and axial elongation in straight and bent configurations (up to ∼4°). This is the first growing-robot concept to exploit the robot's modularity and transferability for future deployment in distributed growing systems capable of acting in various scenarios.
|
|
10:30-12:00, Paper TuAT2-CC.7 | Add to My Program |
Design and Experimental Characterisation of a Novel Quasi-Direct Drive Actuator for Highly Dynamic Robotic Applications |
|
Perez Diaz, Carlos Adrian | Arquimea Research Center |
Muñoz Planelles, Ignacio | Arquimea Research Center |
Martin Hernandez, Luis Daniel | Arquimea Research Center |
Candelo Zuluaga, Carlos Andres | Arquimea Research Center |
Torres-Rodríguez, Iván Jesús | Institut De Robòtica I Informàtica Industrial, CSIC-UPC |
Marsa, Jordi | ARC |
Sanz-Merodio, Daniel | Arquimea Research Center |
López Estévez, Miguel | Arquimea Research Center |
Keywords: Mechanism Design, Dynamics, Actuation and Joint Mechanisms
Abstract: This paper presents the design and experimental results of a proprioceptive, high-bandwidth quasi-direct drive (QDD) actuator for highly dynamic robotic applications. A comprehensive review of the mechanical design of the PULSE115-60 actuator is presented, with particular focus on the design parameters affecting the dynamic performance of the actuator, and a full specification is provided. Fundamental parameters describing the dynamic behaviour of an actuator are discussed, and an experimental method to determine the speed and torque bandwidth of the actuator is presented. A rigorous method to determine backdrive torque is also explained. Finally, experimental results quantifying the dynamic performance of the PULSE115-60 actuator are discussed. The PULSE115-60 actuator has a highly dynamic response, surpassing the torque bandwidth at low torque amplitudes showcased in the state-of-the-art literature. The differences between current and torque bandwidth, two concepts often conflated in the literature, are elucidated. Experimental procedures detailed in previous work are discussed, and a novel standardised procedure is proposed for robust characterisation and fair comparison of different actuation systems. Finally, performance results for PULSE115-60 are presented, demonstrating a torque bandwidth of 66.3 Hz at an amplitude of 6 N·m, ±0.11° of backlash, and 0.37 N·m of backdrive torque.
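Torque bandwidth is conventionally read off a frequency sweep as the point where the measured torque amplitude falls 3 dB below its low-frequency value. A sketch of that read-out on synthetic first-order data; the corner frequency is borrowed from the reported 66.3 Hz purely for illustration, not reconstructed from the paper's measurements.

```python
import math

def bandwidth_3db(freqs, mags):
    """First frequency where the magnitude drops 3 dB below its low-frequency
    value, found by linear interpolation between adjacent sweep points."""
    thresh = mags[0] / math.sqrt(2.0)   # -3 dB in linear scale
    for (f0, m0), (f1, m1) in zip(zip(freqs, mags), zip(freqs[1:], mags[1:])):
        if m0 >= thresh > m1:
            return f0 + (thresh - m0) * (f1 - f0) / (m1 - m0)
    return None

# Synthetic first-order magnitude response with a 66.3 Hz corner.
fc = 66.3
freqs = [float(f) for f in range(1, 201)]
mags = [1.0 / math.sqrt(1.0 + (f / fc) ** 2) for f in freqs]
print(round(bandwidth_3db(freqs, mags), 1))  # 66.3
```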
|
|
10:30-12:00, Paper TuAT2-CC.8 | Add to My Program |
Design and Evaluation of a Reconfigurable 7-DOF Upper Limb Rehabilitation Exoskeleton with Gravity Compensation |
|
Zheng, Linliang | Nanjing University of Aeronautics and Astronautics |
Wu, Qingcong | Nanjing University of Aeronautics and Astronautics |
Zhu, Yanghui | Nanjing University of Aeronautics and Astronautics |
Zhang, Qiang | Nanjing University of Aeronautics and Astronautics |
Keywords: Mechanism Design, Engineering for Robotic Systems, Kinematics
Abstract: With the aging of society, the number of stroke patients is increasing year by year. Rehabilitation exoskeletons can help patients carry out rehabilitation training and improve their activities of daily living (ADL). First, a reconfigurable exoskeleton for upper limb rehabilitation is designed in this paper. The exoskeleton combines gravity compensation with a left-right arm switching function through its reconfigurability. Second, the motion space and singular configurations of the exoskeleton are analyzed. By changing the working mode of the gravity compensation device, a motor control experiment is carried out, and the influence of the gravity compensation device on motor driving torque and energy consumption is analyzed. Finally, the experimental results show that, in the best case, the gravity compensation device can reduce the energy consumption of the driving element by 41.15% and the maximum motor current by 33.56%.
|
|
10:30-12:00, Paper TuAT2-CC.9 | Add to My Program |
Flexible Omnidirectional Driving Gear Mechanism with Adaptation Over Arbitrary Curvatures |
|
Selvamuthu, Moses Gladson | Yamagata University |
Abe, Kazuki | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Tadakuma, Riichiro | Yamagata University |
Keywords: Mechanism Design, Flexible Robotics, Haptics and Haptic Interfaces
Abstract: A support structure for flexible displays such as OLEDs or flexible LEDs was developed using the flexible omnidirectional driving gear mechanism, a gear mechanism having two degrees of freedom on one surface. This flexible display mechanism is expected to be placed inside a car dashboard as a human interface and for workspace optimization. In this study, we propose a novel flexible omnidirectional driving gear for supporting flexible displays, discussing its design, motion range, repeatability, positional accuracy, and adaptability to any guide surface through magnetic coupling. The experiments showed satisfactory results for positional accuracy and repeatability, with adaptability over a wide range of curvatures.
|
|
TuAT3-CC Oral Session, CC-313 |
Add to My Program |
Formal Methods in Robotics and Automation I |
|
|
Chair: Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Co-Chair: Vasile, Cristian Ioan | Lehigh University |
|
10:30-12:00, Paper TuAT3-CC.1 | Add to My Program |
Tactile Robot Programming: Transferring Task Constraints into Constraint-Based Unified Force-Impedance Control |
|
Karacan, Kübra | Technical University of Munich |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Sadeghian, Hamid | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Formal Methods in Robotics and Automation, Force Control, Disassembly
Abstract: Flexible manufacturing lines are required to meet the demand for customized and small-batch-size products. Even though state-of-the-art tactile robots may provide the versatility for increased adaptability and flexibility, their potential is yet to be fully exploited. To support robot deployment in manufacturing, we propose a task-based tactile robot programming paradigm that uses an object-centric tactile skill definition to directly link the identified object constraints of a task to the definition of constraint-based unified force-impedance control. In this study, we first explain the basic concept of abstracting the task constraints experienced by the object and transferring them to the robot's operational space frame. Second, using the object-centric tactile skill definition, we synthesize unified force-impedance control with formalized holonomic constraints to enable flexible task execution. Third, we propose quantified analysis metrics for the process, analyzing typical flexible-manipulation disassembly skills, e.g., levering and unscrew-driving, with respect to their object requirements. Supported by a realistic experimental evaluation using a Franka Emika robot, our tactile robot programming approach for the direct translation between task-level constraints and robot control parameter design is shown to be a viable solution for increased robot deployment in flexible manufacturing lines.
|
|
10:30-12:00, Paper TuAT3-CC.2 | Add to My Program |
Online Modifications for Event-Based Signal Temporal Logic Specifications |
|
Gundana, David | Cornell University |
Kress-Gazit, Hadas | Cornell University |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification
Abstract: In this paper we present a grammar and control synthesis framework for online modification of Event-based Signal Temporal Logic (STL) specifications, during execution. These modifications allow a user to change the robots' task in response to potential future violations, changes to the environment, or user-defined task design changes. In cases where a modification is not possible, we provide feedback to the user and suggest alternative modifications. We demonstrate our task modification process using a Hello Robot Stretch satisfying an Event-based STL specification.
|
|
10:30-12:00, Paper TuAT3-CC.3 | Add to My Program |
Sampling-Based Reactive Synthesis for Nondeterministic Hybrid Systems |
|
Ho, Qi Heng | University of Colorado Boulder |
Sunberg, Zachary | University of Colorado |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification, Motion and Path Planning
Abstract: This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We model the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. The approach is based on growing a (search) game-tree in the hybrid space by combining a sampling-based planning method with a novel bandit-based technique to select and improve on partial strategies. We provide conditions under which the algorithm is probabilistically complete, i.e., if a winning strategy exists, the algorithm will almost surely find it. The case studies and benchmark results show that the algorithm is general and consistently outperforms the state of the art.
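The bandit-based selection the abstract mentions can be illustrated with plain UCB1, the generic exploration rule such tree searches typically build on. The three "partial strategies" and their hidden success probabilities below are toy stand-ins, not the paper's algorithm.

```python
import math
import random

def ucb1_select(stats, c=1.4):
    """Pick the arm maximizing mean + c*sqrt(ln(total)/n); unvisited arms first."""
    total = sum(n for n, _ in stats.values())
    best, best_score = None, -float("inf")
    for arm, (n, reward_sum) in stats.items():
        if n == 0:
            return arm                       # try every arm at least once
        score = reward_sum / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = arm, score
    return best

# Toy: three candidate partial strategies with hidden success probabilities.
random.seed(0)
p_true = {"s1": 0.2, "s2": 0.8, "s3": 0.5}
stats = {arm: (0, 0.0) for arm in p_true}
for _ in range(2000):
    arm = ucb1_select(stats)
    reward = 1.0 if random.random() < p_true[arm] else 0.0
    n, s = stats[arm]
    stats[arm] = (n + 1, s + reward)
print(max(stats, key=lambda a: stats[a][0]))  # "s2": the best strategy dominates play
```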
|
|
10:30-12:00, Paper TuAT3-CC.4 | Add to My Program |
Safety Verification of Closed-Loop Control System with Anytime Perception |
|
Gupta, Lipsy | Kansas State University |
Choton, Jahid Chowdhury | Kansas State University |
Prabhakar, Pavithra | Kansas State University |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification, Robot Safety
Abstract: In this paper, we consider the problem of safety analysis of a closed-loop control system with anytime perception sensor. We formalize the framework and present a general procedure for safety analysis using reachable set computation. We instantiate the procedure for two concrete classes, namely, the classical discrete-time linear system with linear state feedback controller and an extension with variable update rates. We present an exact computational method based on polyhedral manipulations for the first class and an over-approximate method for the second class. Our experimental results demonstrate the feasibility of the approach.
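The exact polyhedral computation is more involved, but the core loop, propagating a set through x_{k+1} = A x_k + B u_k with bounded input and checking it against a safety bound, can be over-approximated with intervals in a few lines. The stable closed-loop matrix, disturbance bound, and safety threshold below are invented for illustration; this is a coarse stand-in for the paper's polyhedral method.

```python
def interval_reach_step(A, B, x_lo, x_hi, u_lo, u_hi):
    """One-step interval over-approximation of the set {A x + B u}."""
    n = len(A)
    lo, hi = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            a = A[i][j]
            lo[i] += min(a * x_lo[j], a * x_hi[j])
            hi[i] += max(a * x_lo[j], a * x_hi[j])
        for j in range(len(B[0])):
            b = B[i][j]
            lo[i] += min(b * u_lo[j], b * u_hi[j])
            hi[i] += max(b * u_lo[j], b * u_hi[j])
    return lo, hi

# Hypothetical stable closed-loop matrix and a small input disturbance.
A = [[0.9, 0.1], [-0.1, 0.9]]
B = [[0.0], [1.0]]
x_lo, x_hi = [-0.1, -0.1], [0.1, 0.1]
for k in range(20):
    x_lo, x_hi = interval_reach_step(A, B, x_lo, x_hi, [-0.01], [0.01])
unsafe = any(h > 1.0 for h in x_hi)  # check against a safety bound |x_i| <= 1
print(x_hi, unsafe)
```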
|
|
10:30-12:00, Paper TuAT3-CC.5 | Add to My Program |
Model Predictive Robustness of Signal Temporal Logic Predicates |
|
Lin, Yuanfei | Technical University of Munich |
Li, Haoxuan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Formal Methods in Robotics and Automation, Integrated Planning and Learning, Model Learning for Control
Abstract: The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach for the use case of autonomous driving with predicates used in formalized traffic rules on a recorded dataset, which highlights the advantage of our approach compared to traditional approaches in terms of expressiveness. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than human drivers in the dataset.
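For reference, the standard model-free robustness the paper improves upon evaluates predicates pointwise and propagates min/max through the temporal operators. A sketch for "always" and "eventually" over a discrete-time signal, with a made-up speed-limit rule standing in for a formalized traffic rule:

```python
def rob_always(signal, t, a, b, f):
    """rho(G_[a,b] (f(x) >= 0), t): worst predicate margin over the window."""
    return min(f(signal[tau]) for tau in range(t + a, t + b + 1))

def rob_eventually(signal, t, a, b, f):
    """rho(F_[a,b] (f(x) >= 0), t): best predicate margin over the window."""
    return max(f(signal[tau]) for tau in range(t + a, t + b + 1))

# Toy speed signal and the rule "always within [0,3], speed <= 2.0".
# The predicate margin is 2.0 - v: positive means satisfied, negative violated.
speed = [1.0, 1.5, 1.8, 1.2, 2.5]
print(round(rob_always(speed, 0, 0, 3, lambda v: 2.0 - v), 2))  # 0.2 (satisfied)
print(round(rob_always(speed, 0, 0, 4, lambda v: 2.0 - v), 2))  # -0.5 (violated at t=4)
```

The paper's point is that this definition ignores the system dynamics; its model predictive robustness replaces these raw margins with prediction-informed ones.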
|
|
10:30-12:00, Paper TuAT3-CC.6 | Add to My Program |
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning |
|
Jaquier, Noémie | Karlsruhe Institute of Technology |
Rozo, Leonel | Bosch Center for Artificial Intelligence |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Formal Methods in Robotics and Automation, Learning from Demonstration, Probability and Statistical Methods
Abstract: In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the "single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.
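The fallacy is easy to exhibit numerically on the sphere S^2: map two points into the tangent space of a third (the "single tangent space") and the Euclidean distance there no longer matches the true geodesic distance. A minimal sketch using the standard log map; the base point and sample points are arbitrary choices for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def geodesic(p, q):
    """Great-circle distance between unit vectors on S^2."""
    return math.acos(max(-1.0, min(1.0, dot(p, q))))

def log_map(p, q):
    """Riemannian log of q at base point p: a tangent vector in R^3."""
    theta = geodesic(p, q)
    if theta < 1e-12:
        return (0.0, 0.0, 0.0)
    u = [q[i] - math.cos(theta) * p[i] for i in range(3)]
    norm = math.sqrt(dot(u, u))
    return tuple(theta * ui / norm for ui in u)

base = (0.0, 0.0, 1.0)   # north pole as the single tangent base point
a = (1.0, 0.0, 0.0)      # two points on the equator, 90 degrees apart
b = (0.0, 1.0, 0.0)
va, vb = log_map(base, a), log_map(base, b)
flat = math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
print(round(geodesic(a, b), 4), round(flat, 4))  # 1.5708 2.2214: flattening distorts
```

Distances (and hence any learned model built on them) are only faithful near the base point, which is the crux of the single tangent space fallacy.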
|
|
10:30-12:00, Paper TuAT3-CC.7 | Add to My Program |
Optimal Control Synthesis with Relaxed Global Temporal Logic Specifications for Homogeneous Multi-Robot Teams |
|
Kamale, Disha | Lehigh University |
Vasile, Cristian Ioan | Lehigh University |
Keywords: Formal Methods in Robotics and Automation, Multi-Robot Systems
Abstract: In this work, we address the problem of control synthesis for a homogeneous team of robots given a global temporal logic specification and formal user preferences for relaxation in case of infeasibility. The relaxation preferences are represented as a Weighted Finite-state Edit System and are used to compute a relaxed specification automaton that captures all allowable relaxations of the mission specification and their costs. For synthesis, we introduce a Mixed Integer Linear Programming (MILP) formulation that combines the motion of the team of robots with the relaxed specification automaton. Our approach combines automata-based and MILP-based methods and leverages the strengths of both approaches, while avoiding their shortcomings. Specifically, the relaxed specification automaton explicitly accounts for the progress towards satisfaction, and the MILP-based optimization approach avoids the state-space explosion associated with explicit product-automata construction, thereby efficiently solving the problem. The case studies highlight the efficiency of the proposed approach.
|
|
10:30-12:00, Paper TuAT3-CC.8 | Add to My Program |
An Iterative Approach for Heterogeneous Multi-Agent Route Planning with Temporal Logic Goals and Travel Duration Uncertainty |
|
Liang, Kaier | Lehigh University |
Cardona, Gustavo A. | Lehigh University |
Vasile, Cristian Ioan | Lehigh University |
Keywords: Formal Methods in Robotics and Automation, Multi-Robot Systems
Abstract: This paper introduces an iterative approach to multi-agent route planning under chance constraints. A heterogeneous team of agents with various capabilities is tasked with a Capability Temporal Logic (CaTL) mission, a fragment of Signal Temporal Logic. The agents' motion is modeled as a finite weighted graph, where the weights represent travel durations. Given the probability distribution over the durations of each edge's traversal, we want to find paths for all agents such that (a) the specification robustness is maximized, (b) travel time is minimized, and (c) the success probability is maximized. We tackle the problem using an iterative approach. In each stage, it selects edges' traversal duration and success probabilities and then solves a multi-agent route planning problem. We use an efficient Mixed-Integer Linear Programming (MILP) encoding for the latter. Our method provides a framework for agents to make informed decisions in choosing the most suitable edge attributes (travel durations and success probabilities) that consider agents' capabilities to perform tasks in the environment. The proposed iterative method leverages graph structure to generate a more efficient search space. The effectiveness of our method is demonstrated through simulated case studies where obtaining the optimal solution would otherwise be computationally expensive. Our approach efficiently explores the solution space, generating better solutions and improving the performance of multi-agent route planning with uncertain travel durations.
|
|
10:30-12:00, Paper TuAT3-CC.9 | Add to My Program |
Safe Networked Robotics with Probabilistic Verification |
|
Narasimhan, Sai Shankar | The University of Texas at Austin |
Bhat, Sharachchandra | University of Texas at Austin |
Chinchali, Sandeep | The University of Texas at Austin |
Keywords: Formal Methods in Robotics and Automation, Networked Robots, Telerobotics and Teleoperation
Abstract: Autonomous robots must utilize rich sensory data to make safe control decisions. To process this data, compute-constrained robots often require assistance from remote computation, or the cloud, that runs compute-intensive deep neural network perception or control models. However, this assistance comes at the cost of a time delay due to network latency, resulting in past observations being used in the cloud to compute the control commands for the present robot state. Such communication delays could potentially lead to the violation of essential safety properties, such as collision avoidance. This paper develops methods to ensure the safety of robots operated over communication networks with stochastic latency. To do so, we use tools from formal verification to construct a shield, i.e., a run-time monitor, that provides a list of safe actions for any delayed sensory observation, given the expected and maximum network latency. Our shield is minimally intrusive and enables networked robots to satisfy key safety constraints, expressed as temporal logic specifications, with desired probability. We demonstrate our approach on a real F1/10th autonomous vehicle that navigates in indoor environments and transmits rich LiDAR sensory data over congested WiFi links.
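The shield's core bookkeeping, maintaining the set of states consistent with a delayed observation and permitting only actions safe from all of them, can be sketched on a toy 1-D corridor. The dynamics, obstacle position, and delay below are hypothetical; the paper's shield is synthesized by probabilistic verification over temporal logic specifications, not this brute-force enumeration.

```python
def possible_states(obs_state, actions_since, step):
    """States the robot could occupy now, given the last (delayed) observation
    and the actions already sent whose effects are not yet observed."""
    states = {obs_state}
    for a in actions_since:
        states = {step(s, a) for s in states}
    return states

def shield(states, candidate_actions, step, is_safe):
    """Allow an action only if it is safe from every possible current state."""
    return [a for a in candidate_actions
            if all(is_safe(step(s, a)) for s in states)]

# Toy 1-D corridor: positions 0..9, obstacle at 5, actions move -1/0/+1.
step = lambda s, a: max(0, min(9, s + a))
is_safe = lambda s: s != 5
states = possible_states(obs_state=3, actions_since=[1], step=step)  # now at 4
print(shield(states, [-1, 0, 1], step, is_safe))  # [-1, 0]: +1 would hit the obstacle
```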
|
|
TuAT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems I |
|
|
Chair: Sukhatme, Gaurav | University of Southern California |
Co-Chair: Kanezaki, Asako | Tokyo Institute of Technology |
|
10:30-12:00, Paper TuAT4-CC.1 | Add to My Program |
Phasic Diversity Optimization for Population-Based Reinforcement Learning |
|
Jiang, Jingcheng | Dalian University of Technology |
Piao, Haiyin | Northwestern Polytechnical University |
Fu, Yu | Dalian University of Technology |
Hao, Yihang | Yangzhou Collaborative Innovation Research Institute CO., LTD |
Jiang, Chuanlu | Dalian University of Technology |
Wei, Ziqi | Chinese Academy of Sciences |
Yang, Xin | Dalian University of Technology |
Keywords: Reinforcement Learning, Aerial Systems: Perception and Autonomy, Aerial Systems: Applications
Abstract: In prior work on diversity in reinforcement learning, diversity is often achieved through an augmented loss function that trades off reward against diversity. Typically, a multi-armed bandit (MAB) algorithm selects the trade-off coefficient from a predefined space. However, the dynamic distribution of the reward signal and the quality of the MAB's diversity selection limit the performance of these methods. We introduce the Phasic Diversity Optimization (PDO) algorithm, a population-based training framework that separates reward and diversity training into distinct phases instead of optimizing a multi-objective function. In the auxiliary phase, agents with poor performance that are diversified via determinants do not replace the better agents in the archive. Decoupling reward and diversity allows us to apply aggressive diversity optimization in the auxiliary phase without degrading performance. Furthermore, we build agents for an aerial melee scenario.
|
|
10:30-12:00, Paper TuAT4-CC.2 | Add to My Program |
VO-Safe Reinforcement Learning for Drone Navigation |
|
Lin, Feiqiang | Cardiff University |
Wei, Changyun | Hohai University |
Grech, Raphael | Spirent Communications |
Ji, Ze | Cardiff University |
Keywords: Reinforcement Learning, Aerial Systems: Perception and Autonomy, Vision-Based Navigation
Abstract: This work focuses on reinforcement learning (RL)-based navigation for drones whose localisation is based on visual odometry (VO). Such drones should avoid flying into areas with poor visual features, as this can lead to deteriorated localisation or complete loss of tracking. To achieve this, we propose a hierarchical control scheme, which uses an RL-trained policy as the high-level controller to generate waypoints for the next control step and a low-level controller to guide the drone to each subsequent waypoint. For high-level policy training, unlike other RL-based navigation approaches, we build awareness of VO performance into our policy by introducing a pose-estimation-related punishment. To help robots distinguish between perception-friendly areas and unfavoured zones, we provide semantic scenes as input for decision-making instead of raw images. This approach also helps minimise the sim-to-real gap.
|
|
10:30-12:00, Paper TuAT4-CC.3 | Add to My Program |
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models |
|
Zhao, Mandi | Stanford University |
Jain, Shreeya | Columbia University |
Song, Shuran | Columbia University |
Keywords: Multi-Robot Systems, Deep Learning Methods, Human-Robot Collaboration
Abstract: We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset that evaluates LLMs’ agent representation and reasoning capability. We experimentally demonstrate the effectiveness of our approach — it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility — in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. Project website: https://project-roco.github.io
|
|
10:30-12:00, Paper TuAT4-CC.4 | Add to My Program |
Collision Avoidance and Navigation for a Quadrotor Swarm Using End-To-End Deep Reinforcement Learning |
|
Huang, Zhehui | University of Southern California |
Yang, Zhaojing | University of Southern California |
Krupani, Rahul | University of Southern California |
Şenbaşlar, Baskın | NVIDIA |
Batra, Sumeet | USC |
Sukhatme, Gaurav | University of Southern California |
Keywords: Multi-Robot Systems, Reinforcement Learning, Collision Avoidance
Abstract: End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits -- easy deployment, task generalization and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, the addition of obstacles increases the number of possible interactions exponentially, thereby increasing the difficulty of training RL policies. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents a curriculum and a replay buffer of clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to neighbor robots and obstacle interactions - the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Ours is the first work to demonstrate the possibility of learning neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL that transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Website: https://sites.google.com/view/obst-avoid-swarm-rl.
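In its simplest single-head form, the attention mechanism over neighbors reduces to scaled dot-product attention. A dependency-free sketch with made-up two-dimensional neighbor embeddings; the actual policy network's feature sizes and learned projections are not given in the abstract.

```python
import math

def attention(query, keys, values):
    """Single-head scaled dot-product attention in plain Python."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# Toy: the ego-drone's query attends over two neighbor embeddings.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]       # neighbor 1 is aligned with the query
values = [[10.0, 0.0], [0.0, 10.0]]
weights, out = attention(query, keys, values)
print([round(w, 3) for w in weights])  # neighbor 1 receives the larger weight
```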
|
|
10:30-12:00, Paper TuAT4-CC.5 | Add to My Program |
C3F: Constant Collaboration and Communication Framework for Graph-Representation Dynamic Multi-Robotic Systems |
|
Jia, Hongda | National University of Defense Technology |
Gao, Zijian | National University of Defense Technology |
Yang, Cheng | National University of Defense Technology |
Ding, Bo | National University of Defense Technology |
Zhai, Yuanzhao | National University of Defense Technology |
Wang, Huaimin | National University of Defense Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Distributed Robot Systems
Abstract: Deep reinforcement learning (DRL) methods have been widely applied in distributed multi-robotic systems and have successfully realized autonomous learning in many fields. In these fields, robots need to communicate and collaborate with other robots in real time and reach agreed cognition for task assignment, which places high requirements on efficiency and stability. However, robots may get damaged or even crash in complex environments and have to be dynamically substituted. Most existing DRL approaches are not robust enough to make new robots adapt quickly to the current team policies, causing performance degradation. In this work, inspired by the genetic mechanism behind social animals' instincts, we propose a robust multi-robotic collaboration and communication framework, C3F. It introduces a graph-based representation to discover more features of the relevance among robots and leverages a meta-learning mechanism to derive a general meta policy. When some robots crash and are replaced by new ones, this meta policy is reused to guide the new robots in quickly following the existing collaboration and communication rules and adapting to their roles in the team. Experiments on both the Webots simulator and the StarCraft II platform indicate that our method outperforms several state-of-the-art methods, showing strong robustness and remarkable adaptability to dynamic substitution in multi-robotic systems.
|
|
10:30-12:00, Paper TuAT4-CC.6 | Add to My Program |
Multi-Level Action Tree Rollout (MLAT-R): Efficient and Accurate Online Multiagent Policy Improvement |
|
Henshall, Andrea | MIT |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Optimization and Optimal Control
Abstract: Rollout algorithms are renowned for their ability to correct the suboptimality of offline-trained base policies. In the multiagent setting, performing online rollout can require a number of optimizations that is exponential in the number of agents. One-agent-at-a-time algorithms offer computationally efficient approaches with guaranteed policy improvement; however, this improvement is with respect to a state value estimate derived from a potentially poor base policy. Monte Carlo tree search (MCTS) provably converges to the true state values; however, the exponentially large search space often limits its online use. Here, we present the Multi-Level Action Tree Rollout (MLAT-R) algorithm. MLAT-R provides 1) provable improvement over a base policy, 2) policy improvement with respect to the true state value, 3) applicability to any number of agents, and 4) an action space that grows linearly with the number of agents rather than exponentially. In this paper, we outline the algorithm, sketch a proof of its improvement over a base policy, and evaluate its performance on a challenging problem for which the base policy cannot reach a terminal state. Despite the challenging experimental setup, our algorithm reached a terminal state in 86% of all experiments, compared to 31% for state-of-the-art one-agent-at-a-time algorithms. In experiments involving MCTS, MLAT-R reached a terminal state in 99% of experiments compared to 92% for MCTS. MLAT-R achieved these results while considering an exponentially smaller action space than MCTS.
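The contrast between exhaustive joint search and one-agent-at-a-time improvement can be illustrated on a toy joint value function. The rewards and penalty below are hypothetical; this sketches the classical one-agent-at-a-time rollout the abstract builds on, not MLAT-R itself, which adds a multi-level action tree on top of this idea.

```python
import itertools

def joint_value(actions, Q):
    """Toy joint value: per-agent reward minus a collision-style penalty
    when two agents pick the same action (hypothetical numbers)."""
    return (sum(Q[i][a] for i, a in enumerate(actions))
            - 2.0 * (len(actions) - len(set(actions))))

def one_agent_at_a_time(Q, base):
    """Agent i re-optimizes its own action while earlier agents keep their
    new choices and later agents follow the base policy: |A|*n evaluations
    instead of the |A|**n needed for exhaustive joint search."""
    n_actions = len(Q[0])
    chosen = list(base)
    for i in range(len(Q)):
        chosen[i] = max(range(n_actions),
                        key=lambda a: joint_value(chosen[:i] + [a] + chosen[i + 1:], Q))
    return chosen

Q = [[1.0, 0.9, 0.0],   # agent 0's per-action reward
     [1.0, 0.8, 0.0],   # agent 1
     [0.9, 1.0, 0.1]]   # agent 2
base = [max(range(3), key=lambda a: q[a]) for q in Q]  # independent greedy base policy
seq = one_agent_at_a_time(Q, base)
```

Each sequential step can only keep or raise the current joint value, which is the source of the improvement guarantee over the base policy.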
|
|
10:30-12:00, Paper TuAT4-CC.7 | Add to My Program |
Stimulate the Potential of Robots Via Competition |
|
Huang, Kangyao | Tsinghua University |
Guo, Di | Beijing University of Posts and Telecommunications |
Zhang, Xinyu | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Liu, Huaping | Tsinghua University |
Keywords: Multi-Robot Systems, Reinforcement Learning, Transfer Learning
Abstract: It is common to feel pressure in a competitive environment, arising from the desire to succeed relative to other individuals or opponents. Although we may become anxious under this pressure, it can also drive us to push our potential to the limit in order to keep up with others. Inspired by this, we propose a competitive learning framework that helps an individual robot acquire knowledge from the competition, fully stimulating its dynamic potential in the race. Specifically, the competition information among competitors is introduced as an additional auxiliary signal for learning advantageous actions. We further build a Multiagent-Race environment, and extensive experiments demonstrate that robots trained in competitive environments outperform those trained with state-of-the-art algorithms in a single-robot environment.
|
|
10:30-12:00, Paper TuAT4-CC.8 | Add to My Program |
Multi-Agent Visual Coordination Using Optical Wireless Communication |
|
Nakagawa, Haruyuki | Tokyo Institute of Technology |
Kanezaki, Asako | Tokyo Institute of Technology |
Keywords: Multi-Robot Systems, Vision-Based Navigation, Reinforcement Learning
Abstract: Communication is a key element in applying multi-agent reinforcement learning to a wide range of real-world scenarios. We focus on optical wireless communication (OWC), a practical solution for situations where radio communication is unavailable, such as underwater or in environments with heavy radio noise. OWC uses light to communicate only with other agents in visual range, unlike the radio-style communication mostly assumed in existing research on multi-agent reinforcement learning. Due to this limited communication, overall performance with OWC generally degrades relative to full communication. In this paper, we propose a reinforcement learning method that learns visual coordination behavior using OWC. The proposed visually cooperative behavior enables agents equipped with limited field-of-view (FOV) cameras to efficiently comprehend and imagine their surrounding environment through cooperative communication. Simulation results demonstrate that, with the proposed visual coordination method, agents using OWC with a typical FOV perform comparably to those with radio-style full communication. Additionally, the method improves performance across various multi-agent reinforcement learning algorithms. We also implement OWC devices on real mobile robots and demonstrate the proposed multi-agent operation.
|
|
TuAT5-CC Oral Session, CC-411 |
Add to My Program |
Sensors and Audition |
|
|
Chair: Tahara, Kenji | Kyushu University |
Co-Chair: Chaumette, Francois | Inria Center at University of Rennes |
|
10:30-12:00, Paper TuAT5-CC.1 | Add to My Program |
Multi-Modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer |
|
Xin, Shuo | Zhejiang University |
Zhang, Zhen | Zhejiang University |
Wang, Mengmeng | Zhejiang University |
Hou, Xiaojun | Zhejiang University |
Guo, Yaowei | Zhejiang University |
Kang, Xiao | China North Vehicle Research Institute |
Liu, Liang | Zhejiang University |
Liu, Yong | Zhejiang University |
Keywords: Sensor Fusion, Human Detection and Tracking, Legged Robots
Abstract: Tracking a specific person in a 3D scene is gaining momentum due to its numerous applications in robotics. Currently, most 3D trackers focus on driving scenarios with negligible jitter and uncomplicated surroundings, which results in severe degeneration in complex environments, especially on jolting robot platforms (only a 20-60% success rate). To improve accuracy, a Point-Video-based Transformer Tracking model (PVTrack) is presented for robots. It is the first multi-modal 3D human tracking work that incorporates point clouds together with RGB videos to achieve information complementarity. Moreover, PVTrack proposes the Siamese Point-Video Transformer for feature aggregation to overcome dynamic environments, capturing more target-aware information adaptively through a hierarchical attention mechanism. Considering the violent shaking on robots and rugged terrains, a lateral Human-aware Proposal Network is designed together with an Anti-shake Proposal Compensation module. It alleviates the disturbance caused by complex scenes as well as the particularity of the robot platform. Experiments show that our method achieves state-of-the-art performance on both the KITTI/Waymo datasets and a quadruped robot in various indoor and outdoor scenes.
|
|
10:30-12:00, Paper TuAT5-CC.2 | Add to My Program |
Efficient Gesture Recognition on Spiking Convolutional Networks through Sensor Fusion of Event-Based and Depth Data |
|
Steffen, Lea | FZI Research Center for Information Technology, 76131 Karlsruhe, |
Trapp, Thomas | FZI Research Center for Information Technology |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Sensor Fusion, Neurorobotics, Multi-Modal Perception for HRI
Abstract: As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are often impractical or inconvenient. Gesture recognition is an alternative, but it is often not reactive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network that processes event and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluated on an embedded system. Three open-source data sets are used for evaluation. Since these do not provide the required bi-modality, a new data set with synchronized event and depth data was recorded. The results show that temporal encoding of depth information and modality fusion, even of differently encoded data, are viable and benefit network performance and generalization.
|
|
10:30-12:00, Paper TuAT5-CC.3 | Add to My Program |
Smoothly Connected Preemptive Impact Reduction and Contact Impedance Control |
|
Arita, Hikaru | Kyushu University |
Nakamura, Hayato | Kyushu University |
Fujiki, Takuto | Kyushu University |
Tahara, Kenji | Kyushu University |
Keywords: Sensor-based Control, Optical Proximity Sensor, Reactive and Sensor-Based Planning, Physical Human-Robot Interaction
Abstract: This study proposes novel control methods that lower impact force by preemptive movement and smoothly transition to conventional contact-based impedance control. These techniques are suggested for application in force control-based robots and position/velocity control-based robots. Strong impact forces have a negative influence on multiple robotic tasks. Recently, preemptive impact reduction techniques that expand conventional contact impedance control using proximity sensors have been examined. However, a seamless transition from impact reduction to contact impedance control has yet to be demonstrated. In contrast, our proposed methods solve this problem. The preemptive impact reduction feature can be added to an already-implemented impedance controller because the parameter design is divided into impact reduction and contact impedance control. There is no abrupt alteration in the contact force during the transition. Furthermore, although the preemptive impact reduction uses a crude optical proximity sensor, the influence of reflectance is minimized. Analyses and real-world experiments confirm these features, which are useful for many robots performing contact tasks.
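The smooth handover can be illustrated in one dimension: a preemptive decelerating term ramps in over the proximity-sensing range so that the commanded force is continuous at the instant contact impedance takes over. The gains, activation distance, and linear ramp below are illustrative choices, not the paper's controller.

```python
def commanded_force(x, v, x_surf, K=200.0, D=30.0, d_act=0.05):
    """1-D sketch of a smooth preemptive-to-contact transition.
    x, v: end-effector position and velocity; x_surf: surface position;
    K, D: contact stiffness and damping; d_act: proximity activation range."""
    d = x_surf - x                      # gap reported by the proximity sensor
    if d > d_act:                       # far away: no interaction force
        return 0.0
    if d > 0.0:                         # preemptive phase: ramp in damping
        ramp = (d_act - d) / d_act      # 0 -> 1 as the gap closes
        return -ramp * D * max(v, 0.0)  # decelerate the approach only
    # contact phase: full impedance on penetration depth
    return -K * (-d) - D * v
```

At the contact boundary (gap zero, no penetration) both branches reduce to the same damping force, so there is no step in the commanded force during the transition.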
|
|
10:30-12:00, Paper TuAT5-CC.4 | Add to My Program |
Point Cloud-Based Control Barrier Function Regression for Safe and Efficient Vision-Based Control |
|
de Sa, Massimiliano | University of California, Berkeley |
Kotaru, Venkata Naga Prasanth | University of California Berkeley |
Sreenath, Koushil | University of California, Berkeley |
Keywords: Sensor-based Control, Robot Safety, Aerial Systems: Perception and Autonomy
Abstract: Control barrier functions have become an increasingly popular framework for safe real-time control. In this work, we present a computationally low-cost framework for synthesizing barrier functions over point cloud data for safe vision-based control. We take advantage of surface geometry to locally define and synthesize a quadratic CBF over a point cloud. This CBF is used in a CBF-QP for control and verified in simulation on quadrotors and in hardware on quadrotors and the TurtleBot3. This technique enables safe navigation through unstructured and dynamically changing environments and is shown to be significantly more efficient than current methods.
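With a single affine constraint, the CBF-QP admits a closed-form solution, which is part of what keeps such filters cheap enough for real-time use. Below is a minimal single-integrator sketch of the generic CBF-QP idea; the paper's barrier is synthesized from point-cloud geometry and is not reproduced here.

```python
import numpy as np

def cbf_safety_filter(u_nom, grad_h, h, alpha=1.0):
    """Single-constraint CBF-QP for a single integrator (u = velocity):
        min ||u - u_nom||^2   s.t.   grad_h . u >= -alpha * h.
    The solution is a Euclidean projection onto the safe half-space."""
    a = np.asarray(grad_h, dtype=float)
    b = -alpha * h
    slack = a @ u_nom - b
    if slack >= 0.0:                    # nominal input already satisfies the CBF
        return np.asarray(u_nom, dtype=float)
    # otherwise project onto the constraint boundary a . u = b
    return u_nom - (slack / (a @ a)) * a
```

The filter leaves safe nominal commands untouched and minimally modifies unsafe ones, which is the defining behavior of CBF-QP safety filters.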
|
|
10:30-12:00, Paper TuAT5-CC.5 | Add to My Program |
Stability Analysis of Plane-To-Plane Positioning by Proximity-Based Control |
|
Thomas, John | Inria Rennes |
Chaumette, Francois | Inria Center at University of Rennes |
Keywords: Sensor-based Control, Robust/Adaptive Control
Abstract: In this paper, we discuss the stability analysis of Plane-to-Plane positioning task when the task is designed in proximity sensor space. We utilize a multi-sensor arrangement of proximity sensors that forms a proximity array to obtain the necessary information in sensor space. For the task considered, we provide closed-form equations for the closed-loop system by obtaining the analytical form of pseudo-inverse for the interaction matrix involved. This further enables us to suggest a new control law producing a decoupled exponential decrease of the sensor errors in perfect conditions, while being more robust to estimation errors in the surface normal. By applying Gershgorin’s theorem to the closed-form matrices, we are able to provide conditions for stability with respect to errors in extrinsic parameters and surface normal. Simulation results are provided to discuss the robustness of the task with respect to these modeling parameters.
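Gershgorin's theorem yields such conditions because every eigenvalue of a matrix lies in a disc centered at a diagonal entry with radius equal to the row's off-diagonal absolute sum. A minimal sufficient-condition check for a real matrix (generic, not the paper's specific closed-loop matrices):

```python
import numpy as np

def gershgorin_stable(A):
    """Sufficient (not necessary) stability test: if every Gershgorin disc
    lies strictly in the open left half-plane, so do all eigenvalues."""
    A = np.asarray(A, dtype=float)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return bool(np.all(centers + radii < 0.0))
```

Because the discs depend only on matrix entries, the condition can be evaluated symbolically on closed-form matrices, which is what makes it useful for robustness bounds on modeling parameters.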
|
|
10:30-12:00, Paper TuAT5-CC.6 | Add to My Program |
An Image Acquisition Scheme for Visual Odometry Based on Image Bracketing and Online Attribute Control |
|
Zhang, Shuyang | The Hong Kong University of Science and Technology |
He, Jinhao | The Hong Kong University of Science and Technology (Guangzhou) |
Xue, Bohuan | HKUST |
Jin, Wu | UESTC |
Yin, Pengyu | Nanyang Technological University |
Jiao, Jianhao | University College London |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Sensor-based Control, Vision-Based Navigation, SLAM
Abstract: Visual odometry systems are challenged by complex illumination environments. Image quality and its consistency over time directly determine feature detection and tracking performance, which in turn affect the robustness and accuracy of the entire system. In this paper, an image acquisition scheme with image bracketing patterns is proposed. Images with different exposure levels are continuously captured to sufficiently explore the scene under varying illumination. An attribute control method is designed to adjust image exposures within the brackets online. Gaussian process regression fits the relationship between an image quality metric and exposure via an image synthesis technique, and the optimal exposures for the next bracket are obtained directly, without trial exposures, to ensure a quick response. Experiments show our acquisition system's effectiveness and its performance improvement for VO tasks in complex illumination scenes.
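The quality-vs-exposure fit can be sketched with a plain RBF-kernel GP posterior mean, maximized over candidate exposures. The quality values here are synthetic and the kernel hyperparameters are arbitrary; the paper's specific metric and image-synthesis step are not reproduced.

```python
import numpy as np

def gp_pick_exposure(t_tried, q_tried, t_candidates, ell=0.5, noise=1e-4):
    """Fit a 1-D GP (RBF kernel) of quality over log-exposure and return
    the candidate exposure with the highest posterior-mean quality."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(t_tried, t_tried) + noise * np.eye(len(t_tried))
    alpha = np.linalg.solve(K, q_tried)          # GP weights
    mu = k(t_candidates, t_tried) @ alpha        # posterior mean at candidates
    return t_candidates[int(np.argmax(mu))]
```

Because the maximization is over the fitted mean rather than over actual captures, the next bracket's exposures come out in a single step, which matches the abstract's point about avoiding trial exposures.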
|
|
10:30-12:00, Paper TuAT5-CC.7 | Add to My Program |
MagicTip: A Novel High-Resolution 3D Multi-Layer Grid-Based Tactile Sensor |
|
Fan, Wen | University of Bristol |
Li, Haoran | University of Bristol |
Zhang, Dandan | Imperial College London |
Keywords: Additive Manufacturing
Abstract: Accurate robotic control over interactions with the environment is fundamentally grounded in understanding tactile contacts. In this paper, we introduce MagicTac, a novel high-resolution grid-based tactile sensor. This sensor employs a 3D multi-layer grid-based design, inspired by the Magic Cube structure. This structure can help increase the spatial resolution of MagicTac to perceive external interaction contacts. Moreover, the sensor is produced using the multi-material additive manufacturing technique, which simplifies the manufacturing process while ensuring repeatability of production. Compared to traditional vision-based tactile sensors, it offers the advantages of i) high spatial resolution, ii) significant affordability, and iii) fabrication-friendly construction that requires minimal assembly skills. We evaluated the proposed MagicTac in the tactile reconstruction task using the deformation field and optical flow. Results indicated that MagicTac could capture fine textures and is sensitive to dynamic contact information. Through the grid-based multi-material additive manufacturing technique, the affordability and productivity of MagicTac can be enhanced with a minimum manufacturing cost of £4.76 and a minimum manufacturing time of 24.6 minutes.
|
|
10:30-12:00, Paper TuAT5-CC.8 | Add to My Program |
Microphone Pair Training for Robust Sound Source Localization with Diverse Array Configurations |
|
An, Inkyu | ETRI |
An, Guoyuan | KAIST |
Kim, Taeyoung | KAIST |
Yoon, Sung-eui | KAIST |
Keywords: Robot Audition, Localization
Abstract: We present a novel sound source localization method that leverages microphone pair training, designed to deliver robust performance in various real-world environments. Existing deep learning (DL)-based approaches face scalability issues when dealing with various types of microphone arrays. To address these issues, our approach has been structured into two training steps: the first step focuses on microphone pair training, while the second step is designed for array geometry-aware training. The first training step enables our model to learn from multiple datasets covering various real-world situations, allowing it to robustly estimate the time difference of arrival (TDoA). Our robust-TDoA model incorporates a Mel scale learnable filter bank (MLFB) and a hierarchical frequency-to-time attention network (HiFTA-net). This allows it to effectively learn from various situations in multiple datasets, including those involving simultaneous sources and various sound events. The second training step enables our approach to estimate the direction of arrival (DoA) of sound based on TDoA information computed by our robust-TDoA model, which begins with parameters acquired during the first training step. During this process, our approach can be trained to accommodate geometry information of the target microphone array, which can span diverse array types. As a result, our method demonstrates robust performance across two DoA estimation tasks using three different types of arrays.
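The TDoA quantity targeted by the first training stage is classically estimated with GCC-PHAT on a microphone pair; the compact DSP baseline below shows the quantity being learned (the paper's network replaces this estimator and is not shown).

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """GCC-PHAT time difference of arrival between two microphone signals.
    Returns the delay (seconds) of `sig` relative to `ref`."""
    n = 2 * max(len(sig), len(ref))               # zero-pad to avoid wrap-around
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.maximum(np.abs(S), 1e-12)             # PHAT weighting: keep phase only
    cc = np.fft.irfft(S, n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center the zero lag
    lag = np.argmax(np.abs(cc)) - n // 2
    return lag / fs
```

Pairwise TDoAs from several microphone pairs are what a geometry-aware stage can then combine into a direction-of-arrival estimate for a given array layout.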
|
|
10:30-12:00, Paper TuAT5-CC.9 | Add to My Program |
Mobile Bot Rotation Using Sound Source Localization and Distant Speech Recognition |
|
Sontakke, Swapnil | Indian Institute of Information Technology Dharwad |
Hegde, Pradyoth | Indian Institute of Information Technology Dharwad |
Bannulmath, Prashant | Indian Institute of Information Technology Dharwad |
K T, Deepak | Indian Institute of Information Technology Dharwad |
Keywords: Robot Audition, Software-Hardware Integration for Robot Systems, Physical Human-Robot Interaction
Abstract: In the last few years, mobile robots such as floor cleaners, assistive robots, and home telepresence devices have become an essential part of our day-to-day activities. In human-computer interaction, speech is the preferred way of communication, especially in indoor environments. This paper proposes a speech module to rotate a mobile robot. It has two components: a distant automatic speech recognizer and a sound source localizer. To build the distant speech recognizer, far-field speech data is collected at 1, 3, and 5-meter distances. The model performs well even at a 5-meter distance, with a Word Error Rate of 40.38% and a Character Error Rate of 28.85%. The direction of arrival of the speech signal is computed from a 4-mic circular microphone array. The speech module is integrated with the Robot Operating System and physically demonstrated on a TurtleBot3 Waffle Pi. The speech recognizer and sound source localizer are observed to work well in a reverberant indoor environment on a small single-board computer.
|
|
TuAT6-CC Oral Session, CC-414 |
Add to My Program |
2D/3D Visual Perception |
|
|
Chair: Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Co-Chair: Pathak, Sarthak | Chuo University |
|
10:30-12:00, Paper TuAT6-CC.1 | Add to My Program |
Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes |
|
Das, Alloy | Indian Statistical Institute, Kolkata |
Biswas, Sanket | Computer Vision Center, Universitat Autònoma De Barcelona |
Pal, Umapada | Indian Statistical Institute, Kolkata |
Lladós, Josep | Computer Vision Center, Universitat Autònoma De Barcelona |
Keywords: Recognition, Computer Vision for Automation, Transfer Learning
Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present to the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we design an efficient super-resolution-based end-to-end transformer baseline called DA-TextSpotter, which achieves comparable or superior performance over existing text spotting architectures on both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code, and pre-trained models have been released on our GitHub.
|
|
10:30-12:00, Paper TuAT6-CC.2 | Add to My Program |
Masked Local-Global Representation Learning for 3D Point Cloud Domain Adaptation |
|
Xing, Bowei | Peking University |
Ying, Xianghua | Peking University |
Wang, Ruibin | Peking University |
Keywords: Recognition, Deep Learning for Visual Perception, Visual Learning
Abstract: The point cloud is a popular and widely used geometric representation that has attracted significant attention in 3D vision. However, the geometric variability of point cloud representations across different datasets can cause domain discrepancies, which hinder knowledge transfer and model generalization, resulting in degraded performance in the target domain. In this paper, we present a novel approach to improve point cloud domain adaptation by employing masked representation learning in a self-supervised manner. Specifically, our method combines masked feature prediction and masked sample consistency to encode both local structure and global semantic information for learning invariant point cloud representations across domains. Moreover, to learn domain-specific representations and transfer knowledge from source to target, we propose prototype-calibrated self-training. By exploiting class-wise prototypes in the shared feature space, soft pseudo labels can be adaptively denoised, which benefits decision boundary learning in the target domain. We conduct experiments on PointDA-10 and PointSegDA for 3D point cloud shape classification and semantic segmentation, respectively. The results demonstrate the effectiveness of our method and show that it achieves new state-of-the-art performance on point cloud domain adaptation.
|
|
10:30-12:00, Paper TuAT6-CC.3 | Add to My Program |
Continuous Adaptation in Person Re-Identification for Robotic Assistance |
|
Rollo, Federico | Leonardo S.p.A |
Zunino, Andrea | Leonardo |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Recognition, Human Detection and Tracking, AI-Enabled Robotics
Abstract: In scenarios of Human-Robot Interaction (HRI), it is often assumed that the robot should cooperate with the closest individual or that only one person is present. However, in real-life situations, such as shop floor operations, this assumption may not hold. It thus becomes necessary for a robot to recognize a specific target in a crowded environment. To address this problem, we propose a person re-identification module that uses continuous visual adaptation techniques. This module ensures that the robot can seamlessly cooperate with the appropriate individual despite appearance changes or partial or total occlusions. We tested our framework in both a laboratory environment and an HRI scenario where the robot followed a person. During the tests, the targets were asked to change their appearance and to disappear from the camera's field of view, probing the module's ability to handle challenging cases of occlusion and outfit variation. We compared our framework with a state-of-the-art Multi-Object Tracking (MOT) method; the results showed that our module, named CARPE-ID for short, accurately tracked each selected target throughout the experiments in all but two cases, whereas the MOT averaged 4 tracking errors per video.
|
|
10:30-12:00, Paper TuAT6-CC.4 | Add to My Program |
Spectral Geometric Verification: Re-Ranking Point Cloud Retrieval for Metric Localization |
|
Vidanapathirana, Kavisha | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Sridharan, Sridha | Queensland University of Technology |
Fookes, Clinton | Queensland University of Technology |
Keywords: Recognition, Localization, Computer Vision for Automation
Abstract: In large-scale metric localization, an incorrect result during retrieval will lead to an incorrect pose estimate or loop closure. Re-ranking methods propose to take into account all the top retrieval candidates and re-order them to increase the likelihood of the top candidate being correct. However, state-of-the-art re-ranking methods are inefficient when re-ranking many potential candidates due to their need for resource intensive point cloud registration between the query and each candidate. In this work, we propose an efficient spectral method for geometric verification (named SpectralGV) that does not require registration. We demonstrate how the optimal inter-cluster score of the correspondence compatibility graph of two point clouds represents a robust fitness score measuring their spatial consistency. This score takes into account the subtle geometric differences between structurally similar point clouds and therefore can be used to identify the correct candidate among potential matches retrieved by global similarity search. SpectralGV is deterministic, robust to outlier correspondences, and can be computed in parallel for all potential candidates. We conduct extensive experiments on 5 large-scale datasets to demonstrate that SpectralGV outperforms other state-of-the-art re-ranking methods and show that it consistently improves the recall and pose estimation of 3 state-of-the-art metric localization architectures while having a negligible effect on their runtime.
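The registration-free consistency idea can be sketched with the classic spectral-matching score: the leading eigenvalue of the correspondence-compatibility matrix, which is large only when putative correspondences agree on pairwise distances. SpectralGV's inter-cluster score refines this; the code below is only the generic spectral baseline.

```python
import numpy as np

def spectral_fitness(corr_a, corr_b, sigma=0.1):
    """Spatial-consistency score for N putative correspondences between two
    point clouds (corr_a[i] <-> corr_b[i]), with no registration step.
    Returns the leading eigenvalue of the compatibility matrix, normalized."""
    da = np.linalg.norm(corr_a[:, None] - corr_a[None, :], axis=-1)
    db = np.linalg.norm(corr_b[:, None] - corr_b[None, :], axis=-1)
    M = np.exp(-((da - db) ** 2) / (2 * sigma ** 2))  # pairwise compatibility
    np.fill_diagonal(M, 0.0)
    w = np.linalg.eigvalsh(M)
    return w[-1] / len(corr_a)
```

Because the score is a single eigenvalue computation per candidate, it can be evaluated in parallel across all retrieval candidates, unlike pairwise registration.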
|
|
10:30-12:00, Paper TuAT6-CC.5 | Add to My Program |
Incorporating Scene Graphs into Pre-Trained Vision-Language Models for Multimodal Open-Vocabulary Action Recognition |
|
Wei, Chao | Tsinghua University |
Deng, Zhidong | Tsinghua University |
Keywords: Recognition, Semantic Scene Understanding, Multi-Modal Perception for HRI
Abstract: This paper presents Action-SGFA, a novel action feature alignment approach to learn unified joint embeddings across four action modalities incorporating scene graph (SG) comprehension. A new training paradigm for Action-SGFA is also devised to improve pre-trained VL models using datasets with SG annotation. When learning from image-SG pairs, it captures structure-associated action knowledge for visual and textual encoders. SG supervision generates fine-grained captions based on various graph augmentations highlighting different compositional aspects of action scenes. Furthermore, our research reveals that all combinations of paired data are unnecessary to train such unified embeddings, and only image-paired data is sufficient to bind all action modalities together. Our Action-SGFA can leverage existing large VL models, enhancing their zero-shot capabilities of new modalities due to their natural pairings with images. The open-vocabulary zero-shot performance improves with the strength of the pre-trained VL model and the SG comprehension. We establish a new state-of-the-art in several zero-shot action recognition tasks across modalities, significantly surpassing the vanilla skeleton zero-shot method by 27.0% and 19.7% on NTU-60 and NTU-120, respectively. Additionally, in the context of RGB videos, we surpass the state-of-the-art method on Kinetics-400 by 2.1%.
|
|
10:30-12:00, Paper TuAT6-CC.6 | Add to My Program |
LPS-Net: Lightweight Parameter-Shared Network for Point Cloud-Based Place Recognition |
|
Liu, Chengxin | Shandong University |
Chen, Guiyou | ShanDong University |
Song, Ran | Shandong University |
Keywords: Recognition, Vision-Based Navigation, Computer Vision for Transportation
Abstract: With innovation in fields such as autonomous driving and augmented reality, point cloud-based place recognition has gained significant attention. Many methods try to address this problem by extracting and matching global descriptors in a database, but they often must balance the extraction of comprehensive contextual information and large model sizes. To overcome this challenge, we propose a lightweight parameter-shared network (LPS-Net), which includes multiple bidirectional perception units (BPUs) to extract multi-scale long-range contextual information and parameter-shared NetVLADs (PS-VLADs) to aggregate descriptors. A BPU includes a parameter-shared convolution module (SharedConv) that significantly compresses the model and enhances its ability to capture informative features. In PS-VLADs, we replace half the parameters used in the original NetVLAD with trainable scalars, which further reduces the model size, and theoretically prove their equivalence. Experimental results demonstrate that LPS-Net achieves state-of-the-art performance at the task of point cloud-based place recognition while maintaining a small model size. Code and supplementary materials can be found at https://github.com/Yavinr/LPS-Net.
|
|
10:30-12:00, Paper TuAT6-CC.7 | Add to My Program |
Joint Response and Background Learning for UAV Visual Tracking |
|
Wang, Biao | Beihang University |
Li, Wenling | Beihang University |
Zhang, Bin | Beijing University of Posts and Telecommunications |
Liu, Yang | Beihang University |
Keywords: Visual Tracking, Computer Vision for Automation, Aerial Systems: Perception and Autonomy
Abstract: Correlation filter (CF)-based approaches have gained widespread attention in the field of unmanned aerial vehicle (UAV) visual tracking due to their lightweight nature. However, CFs are prone to generating low-quality responses in challenging UAV scenarios, e.g., fast motion and background clutter. In this paper, to model the tracker more robustly, we first conduct an effective regularization analysis from the perspectives of response learning and background learning. Specifically, to address response degradation, we propose a module for learning the temporal consistency and reversibility of the response, supplemented by a novel background-aware module that enhances the ability to learn from negative samples. In addition, we propose a fast coarse-to-fine scale search strategy, which alleviates the challenge of estimating bounding boxes under non-uniform aspect ratios. We develop two tracker versions, RBLT and DeepRBLT, based on the depth of the features. Comprehensive experiments on four UAV benchmarks and one generic benchmark indicate the superiority of our trackers over other state-of-the-art trackers, with enough speed for real-time applications.
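The correlation-filter core that such trackers build on admits a closed-form Fourier-domain solution (MOSSE-style). This single-channel sketch shows why CF trackers are lightweight; the paper's response- and background-learning regularizers are not included.

```python
import numpy as np

def mosse_filter(patches, target, lam=1e-2):
    """Closed-form filter minimizing sum_i |F_i H - G|^2 + lam |H|^2
    elementwise in the Fourier domain (lam guards near-empty spectrum bins)."""
    G = np.fft.fft2(target)
    A = np.zeros_like(G)
    B = np.zeros(G.shape)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += (F * np.conj(F)).real
    return A / (B + lam)

def respond(H, patch):
    """Correlation response map; its peak gives the target translation."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
```

Training and inference are both a handful of FFTs per frame, which is the source of the real-time speeds cited for CF trackers.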
|
|
10:30-12:00, Paper TuAT6-CC.8 |
ZS6D: Zero-Shot 6D Object Pose Estimation Using Vision Transformers |
|
Ausserlechner, Philipp | TU Wien |
Haberger, David Dylan | TU Wien |
Thalhammer, Stefan | TU Wien |
Weibel, Jean-Baptiste | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a demand to recognize diverse objects. The state-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent novel object pose estimation methods address this issue using task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose, for example, is trained on a dataset consisting of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming, we introduce ZS6D for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of objects and for establishing local correspondences. These local correspondences enable deriving geometric correspondences and are used for estimating the object's 6D pose with RANSAC-based PnP. This approach showcases that the image descriptors extracted by pre-trained ViTs are well-suited to achieve a notable improvement over two state-of-the-art novel object 6D pose estimation methods, without the need for task-specific fine-tuning. Experiments are performed on LMO, YCBV, and TLESS. In comparison to MegaPose, we improve the Average Recall on all three datasets, and compared to OSOP we improve on two datasets. The code is available at https://github.com/PhilippAuss/ZS6D.
|
|
10:30-12:00, Paper TuAT6-CC.9 |
Fluxformer: Flow-Guided Duplex Attention Transformer Via Spatio-Temporal Clustering for Action Recognition |
|
Hong, Younggi | Chonnam National University |
Kim, Min Ju | Chonnam National University |
Lee, Isack | Chonnam National University |
Yoo, Seok Bong | Chonnam National University |
Keywords: Recognition, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Vision transformers have demonstrated impressive performance in various robotics and automation applications, such as classification automation and action recognition. However, the drawback of transformers is their quadratic increase in computing resources with larger inputs and dependence on considerable data for training. Most action recognition models using the transformer structure rely on a few frames from the original video to reduce computation, so temporal information is compromised by low frame rates. Spatial information is also compromised by reducing the number of embeddings as the transformer layer iterates. The letter proposes a robust model for action recognition that overcomes the limitations of most action recognition models with the transformer structure using the duplex attention function, flow-guided information, RGB information, and spatial support tokens. The proposed duplex attention mechanism leverages optical flow and RGB to address the lack of temporal information. The method employs spatial interest clustering to convert input data into tokens, improving the preservation of spatial information. Finally, meaningful action event frames are extracted by analyzing the flow and clustering to distinguish scenes. The experimental results reveal that the proposed model outperforms state-of-the-art methods in action recognition accuracy.
|
|
TuAT7-CC Oral Session, CC-416 |
Continual Learning |
|
|
Chair: Ariki, Yuka | Sony Group Corporation |
Co-Chair: Agrawal, Pulkit | MIT |
|
10:30-12:00, Paper TuAT7-CC.1 |
LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation |
|
Cheng, Shuo | Gatech |
Xu, Danfei | Georgia Institute of Technology |
Keywords: Reinforcement Learning, Task and Motion Planning, Continual Learning
Abstract: To assist with everyday human activities, robots must solve complex long-horizon tasks and generalize to new settings. Recent deep reinforcement learning (RL) methods show promise in fully autonomous learning, but they struggle to reach long-term goals in large environments. On the other hand, Task and Motion Planning (TAMP) approaches excel at solving and generalizing across long-horizon tasks, thanks to their powerful state and action abstractions. But they assume predefined skill sets, which limits their real-world applications. In this work, we combine the benefits of these two paradigms and propose an integrated task planning and skill learning framework named LEAGUE (Learning and Abstraction with Guidance). LEAGUE leverages the symbolic interface of a task planner to guide RL-based skill learning and creates an abstract state space to enable skill reuse. More importantly, LEAGUE learns manipulation skills in situ within the task planning system, continuously growing its capability and the set of tasks that it can solve. We evaluate LEAGUE on four challenging simulated task domains and show that LEAGUE outperforms baselines by large margins. We also show that the learned skills can be reused to accelerate learning in new task domains and transfer to a physical robot platform.
|
|
10:30-12:00, Paper TuAT7-CC.2 |
Test-Time Adaptation in the Dynamic World with Compound Domain Knowledge Management |
|
Song, Junha | KAIST |
Park, Kwanyong | KAIST |
Shin, InKyu | KAIST |
Woo, Sanghyun | KAIST |
Zhang, Chaoning | KAIST |
Kweon, In So | KAIST |
Keywords: Continual Learning, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Prior to the deployment of robotic systems, pre-training deep recognition models on all potential visual cases is infeasible in practice. Hence, test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time (i.e., lifelong adaptation). Several works on TTA have shown promising adaptation performance in continuously changing environments. However, our investigation reveals that existing methods are vulnerable to dynamic distributional changes and often lead to overfitting of TTA models. To address this problem, this paper first presents a robust TTA framework with compound domain knowledge management. Our framework helps the TTA model harvest the knowledge of multiple representative domains (i.e., the compound domain) and conduct TTA based on this compound domain knowledge. In addition, to prevent overfitting of the TTA model, we devise a novel regularization that modulates the adaptation rates using the domain similarity between the source and the current target domain. With the synergy of the proposed framework and regularization, we achieve consistent performance improvements in diverse TTA scenarios, especially on dynamic domain shifts. We demonstrate the generality of our proposals via extensive experiments including image classification on ImageNet-C and semantic segmentation on GTA5, C-driving, and corrupted Cityscapes datasets.
|
|
10:30-12:00, Paper TuAT7-CC.3 |
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference |
|
Banerjee, Soumya | IIT Kanpur |
Verma, Vinay Kumar | IIT Kanpur |
Mukherjee, Avideep | IIT Kanpur |
Gupta, Deepak | AMAZON |
Namboodiri, Vinay | University of Bath |
Rai, Piyush | University of Utah |
Keywords: Continual Learning, Incremental Learning, Deep Learning for Visual Perception
Abstract: Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming (observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these goals, we propose a novel virtual-gradients-based approach for continual representation learning which adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
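The exponential-moving-average semantic memory mentioned above amounts to a per-class running mean of feature embeddings. A minimal sketch (the class name, decay value, and first-example initialisation are my assumptions, not the paper's exact design):

```python
import numpy as np

class EMASemanticMemory:
    """Per-class exponential-moving-average of feature embeddings."""

    def __init__(self, num_classes, dim, decay=0.999):
        self.mem = np.zeros((num_classes, dim))
        self.decay = decay
        self.seen = np.zeros(num_classes, dtype=bool)

    def update(self, label, feature):
        if not self.seen[label]:          # first example initialises the slot
            self.mem[label] = feature
            self.seen[label] = True
        else:                             # slow EMA drift thereafter
            self.mem[label] = self.decay * self.mem[label] + (1 - self.decay) * feature

    def retrieve(self, label):
        return self.mem[label]

mem = EMASemanticMemory(num_classes=3, dim=4, decay=0.9)
mem.update(1, np.ones(4))
mem.update(1, np.zeros(4))
```

Because the memory changes slowly, it preserves information about past examples even as the streaming learner adapts to each new one.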
|
|
10:30-12:00, Paper TuAT7-CC.4 |
Experience Consistency Distillation Continual Reinforcement Learning for Robotic Manipulation Tasks |
|
Zhao, Chao | Xi'an Jiaotong University |
Xu, Jie | Xi'an Jiaotong University |
Peng, Ru | Xi'an Jiaotong University |
Chen, Xingyu | Xi'an Jiaotong University |
Mei, Kuizhi | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Continual Learning, Reinforcement Learning, Incremental Learning
Abstract: Continual reinforcement learning aims to help robots acquire skills without catastrophic forgetting, obviating the need to re-learn all tasks from scratch. To enable the lifelong acquisition of skills in robots, replay-based continual reinforcement learning has emerged as a promising research direction. These techniques replay data from previous tasks to mitigate forgetting when learning new skills. However, existing replay-based methods store poorly representative experience, and their utilization of old-task experience is inefficient. To address these issues, we propose an experience consistency distillation method for robot continual reinforcement learning that improves the data efficiency of the experience. Specifically, the experience of old tasks is distilled to obtain Markov Decision Process (MDP) data with a high compression ratio and information content. To ensure consistent data distributions before and after distillation, we further utilize a Fréchet Inception Distance (FID) loss as a regularization constraint. To improve experience utilization efficiency, the policy is then trained using both the distilled data and current task data, with policy distillation performed based on uncertainty metrics. Our method is validated on a continual reinforcement learning simulation platform and in a real scene with a UR5e robot arm. Experimental results indicate that our method achieves higher success rates and lower buffer size requirements compared to other methods.
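The FID regularizer used here compares feature distributions via the Fréchet distance between Gaussians, d² = ||μ₁−μ₂||² + Tr(Σ₁+Σ₂−2(Σ₁Σ₂)^{1/2}). A minimal sketch of the diagonal-covariance special case, where the matrix square root becomes elementwise (the function name and the diagonal restriction are my simplifying assumptions):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    General form: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2});
    with diagonal covariances the matrix square root is elementwise,
    so no scipy.linalg.sqrtm is needed.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

# Identical distributions have zero distance.
d = frechet_distance_diag(np.zeros(3), np.ones(3), np.zeros(3), np.ones(3))
```

Minimizing this quantity between pre- and post-distillation feature statistics is one way to keep the distilled replay buffer distributionally consistent with the original experience.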
|
|
10:30-12:00, Paper TuAT7-CC.5 |
Adapting to the “Open World”: The Utility of Hybrid Hierarchical Reinforcement Learning and Symbolic Planning |
|
Lorang, Pierrick | AIT Austrian Institute of Technology GmbH - Tufts University |
Horvath, Helmut | Technische Universität Wien, TUW |
Kietreiber, Tobias | University of Applied Sciences St. Poelten |
Zips, Patrik | AIT Austrian Institute of Technology GmbH |
Heitzinger, Clemens | TU Wien |
Scheutz, Matthias | Tufts University |
Keywords: Integrated Planning and Learning, Continual Learning, Reinforcement Learning
Abstract: Open-world robotic tasks such as autonomous driving pose significant challenges to robot control due to unknown and unpredictable events that disrupt task performance. Neural network-based reinforcement learning (RL) techniques (like DQN, PPO, SAC, etc.) struggle to adapt in large domains and suffer from catastrophic forgetting. Hybrid planning and RL approaches have shown some promise in handling environmental changes but lack efficiency in accommodation speed. To address this limitation, we propose an enhanced hybrid system with a nested hierarchical action abstraction that can utilize previously acquired skills to effectively tackle unexpected novelties. We show that it can adapt faster and generalize better compared to state-of-the-art RL and hybrid approaches, significantly improving robustness when multiple environmental changes occur at the same time.
|
|
10:30-12:00, Paper TuAT7-CC.6 |
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models |
|
Tziafas, Georgios | University of Groningen |
Kasaei, Hamidreza | University of Groningen |
Keywords: Continual Learning, Learning from Experience, Learning from Demonstration
Abstract: Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior work fixes the skill library and steers the LLM with carefully hand-crafted prompt engineering, limiting the agent to a stationary range of addressable tasks. In this work, we introduce LRLL, an LLM-based lifelong learning agent that continuously grows the robot skill library to tackle manipulation tasks of ever-growing complexity. LRLL achieves this with four novel contributions: 1) a soft memory module that allows dynamic storage and retrieval of past experiences to serve as context, 2) a self-guided exploration policy that proposes new tasks in simulation, 3) a skill abstractor that distills recent experiences into new library skills, and 4) a lifelong learning algorithm that enables human users to bootstrap new skills with minimal online interaction. LRLL continuously transfers knowledge from the memory to the library, building composable, general, and interpretable policies, while bypassing gradient-based optimization, thus relieving the learner from catastrophic forgetting. Empirical evaluation in a simulated tabletop environment shows that LRLL outperforms end-to-end and vanilla LLM approaches in the lifelong setup, while learning skills that are transferable to the real world. Project material will become available at the webpage https://gtziafas.github.io/LRLL_project/
|
|
10:30-12:00, Paper TuAT7-CC.7 |
Lifelong Robot Learning with Human Assisted Language Planners |
|
Parakh, Meenal | Princeton University |
Fong, Alisha | Massachusetts Institute of Technology |
Simeonov, Anthony | Massachusetts Institute of Technology |
Chen, Tao | Massachusetts Institute of Technology |
Gupta, Abhishek | University of Washington |
Agrawal, Pulkit | MIT |
Keywords: Continual Learning, Manipulation Planning, Integrated Planning and Learning
Abstract: Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/view/halp-submission
|
|
10:30-12:00, Paper TuAT7-CC.8 |
Probabilistic Spiking Neural Network for Robotic Tactile Continual Learning |
|
Fang, Senlin | Shenzhen Institute of Advanced Technology |
Liu, Yi Wen | Shenzhen Institute of Advanced Technology, University of Chinese |
Liu, Chengliang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Wang, Jingnan | Shenzhen Institute of Advanced Technology |
Su, Yuanzhe | University of Chinese Academy of Sciences |
Zhang, Yupo | Southern University of Science and Technology |
Kong, Hoiio | City University of Macau |
Yi, Zhengkun | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Wu, Xinyu | CAS |
Keywords: Continual Learning, Probabilistic Inference, Incremental Learning
Abstract: The sense of touch is essential for robots to perform various daily tasks. Artificial Neural Networks (ANNs) have shown significant promise in advancing robotic tactile learning. However, because the tactile data distribution changes as robots encounter new tasks, ANN-based robotic tactile learning suffers from catastrophic forgetting. To solve this problem, we introduce a novel continual learning (CL) framework called the Probabilistic Spiking Neural Network with Variational Continual Learning (PSNN-VCL). In this framework, the PSNN introduces uncertainty during spike emission and can apply fast Variational Inference by optimizing the uncertainty through backpropagation, which significantly reduces the model parameters required for VCL. We establish a robotic tactile CL benchmark using publicly available datasets to evaluate our method. Experimental results demonstrate that, compared to other CL methods, PSNN-VCL not only achieves superior performance in terms of widely used CL metrics but also achieves at least a 50% reduction in model parameters on the robotic tactile CL benchmark.
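The idea of injecting uncertainty into spike emission can be illustrated with a leaky integrate-and-fire step whose firing is drawn from a Bernoulli rather than a hard threshold. A hedged sketch (the sigmoid firing rate, reset-to-zero, and all constants are illustrative assumptions, not the PSNN-VCL formulation):

```python
import numpy as np

def probabilistic_lif_step(v, x, rng, leak=0.9, threshold=1.0, beta=5.0):
    """One step of a leaky integrate-and-fire neuron with stochastic firing.

    Instead of a hard threshold, each spike is a Bernoulli draw whose
    rate is a sigmoid of (membrane potential - threshold); the sharpness
    beta controls how close the unit is to deterministic firing.
    """
    v = leak * v + x                      # leaky integration of input current
    p_spike = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
    spike = rng.random(v.shape) < p_spike
    v = np.where(spike, 0.0, v)           # reset neurons that fired
    return v, spike.astype(float), p_spike

rng = np.random.default_rng(0)
v, s, p = probabilistic_lif_step(np.zeros(4), np.array([5.0, 5.0, -5.0, -5.0]), rng)
```

Because the firing probability is a differentiable function of the membrane potential, gradients can flow through the rate, which is what makes backpropagation-based variational inference tractable for such units.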
|
|
10:30-12:00, Paper TuAT7-CC.9 |
LOTUS: Continual Imitation Learning for Robot Manipulation through Unsupervised Skill Discovery |
|
Wan, Weikang | Peking University |
Zhu, Yifeng | The University of Texas at Austin |
Shah, Rutav | The University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Continual Learning, Deep Learning in Grasping and Manipulation
Abstract: We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks with a small number of corresponding task demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns presented in unstructured demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to exhibit novel behaviors. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in average success rates, showing its superior knowledge transfer ability compared to prior methods. More experimental videos and results can be found on the project website: https://ut-austin-rpl.github.io/Lotus/
|
|
TuAT8-CC Oral Session, CC-418 |
Learning |
|
|
Chair: Aksoy, Eren Erdal | Halmstad University |
Co-Chair: Ramirez-Amaro, Karinne | Chalmers University of Technology |
|
10:30-12:00, Paper TuAT8-CC.1 |
Synthesize Efficient Safety Certificates for Learning-Based Safe Control Using Magnitude Regularization |
|
Zheng, Haotian | Tsinghua University |
Ma, Haitong | Harvard University |
Zheng, Sifa | Tsinghua University |
Li, Shengbo Eben | Tsinghua University |
Wang, Jianqiang | Tsinghua University |
Keywords: Robot Safety, Reinforcement Learning, Autonomous Agents
Abstract: Safety certificates based on energy functions can provide demonstrable safety for complex robotic systems. However, all recent studies on learning-based energy function synthesis only consider the feasibility of the control policy, which might cause over-conservativeness and even failure to achieve the control goal. To solve the problem of over-conservative controllers, we propose a magnitude regularization technique that improves the performance of safe controllers by reducing the conservativeness inside the energy function, while preserving the provable safety guarantees. Specifically, we quantify the conservativeness by the magnitude of the energy function, and we reduce it by adding a magnitude regularization term to the synthesis loss. We propose an algorithm using reinforcement learning (RL) for synthesis to unify the learning process of safe controllers and energy functions. We conducted simulation experiments on Safety Gym and real-robot experiments using small quadrotors. Simulation results show that the proposed algorithm does reduce the conservativeness of the energy function and outperforms baselines in terms of controller performance while maintaining safety. Real-robot experiments show that the proposed algorithm indeed reduces conservativeness on the small quadrotors.
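The magnitude-regularized synthesis loss described above can be sketched as a feasibility term plus a penalty on the energy function's magnitude. A minimal sketch (the mean-based form, the weight, and all names are my assumptions, not the paper's exact loss):

```python
import numpy as np

def certificate_loss(energy_values, violations, reg_weight=0.1):
    """Feasibility loss plus a magnitude-regularization term.

    energy_values: energy function evaluated on sampled states
    violations:    per-sample constraint-violation penalties (>= 0)
    The second term penalises large |energy|, discouraging the
    over-conservative certificates the abstract describes.
    """
    feasibility = np.mean(violations)
    magnitude = np.mean(np.abs(energy_values))
    return feasibility + reg_weight * magnitude

loss = certificate_loss(np.array([2.0, -4.0]), np.array([0.0, 1.0]), reg_weight=0.5)
```

Tuning `reg_weight` trades feasibility of the learned certificate against how tightly it hugs the true safe set.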
|
|
10:30-12:00, Paper TuAT8-CC.2 |
On the Optimality, Stability, and Feasibility of Control Barrier Functions: An Adaptive Learning-Based Approach |
|
Chriat, Alaa Eddine | Mississippi State University |
Sun, Chuangchuang | Mississippi State University |
Keywords: Robot Safety, Reinforcement Learning, Optimization and Optimal Control
Abstract: Safety has been a critical issue for the deployment of learning-based approaches in real-world applications. To address this issue, the control barrier function (CBF) and its variants have attracted extensive attention for safety-critical control. However, due to the myopic one-step nature of CBFs and the lack of principled methods to design the class-K functions, there are still fundamental limitations of current CBFs: optimality, stability, and feasibility. In this paper, we propose a novel and unified approach to address these limitations with the Adaptive Multi-step Control Barrier Function (AM-CBF), where we parameterize the class-K function by a neural network and train it together with the reinforcement learning policy. Moreover, to mitigate the myopic nature, we propose a novel multi-step training and single-step execution paradigm that makes the CBF farsighted while the execution remains solving a single-step convex quadratic program. Our method is evaluated on first- and second-order systems in various scenarios, where our approach outperforms the conventional CBF both qualitatively and quantitatively.
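The single-step convex QP executed at run time has a closed form when only one CBF constraint is active. A minimal sketch under that single-constraint assumption (names are mine; real deployments solve a full QP with input bounds):

```python
import numpy as np

def cbf_qp_filter(u_ref, a, b):
    """Single-constraint CBF safety filter in closed form.

    Solves  min ||u - u_ref||^2  s.t.  a @ u >= b,
    the one-constraint special case of the per-step convex QP.
    For a barrier h(x), a = Lg h(x) and b = -(Lf h(x) + alpha(h(x))),
    where alpha is the (possibly learned) class-K function.
    """
    slack = a @ u_ref - b
    if slack >= 0.0:                      # reference input already safe
        return u_ref
    # Otherwise project u_ref onto the half-space boundary.
    return u_ref + (-slack / (a @ a)) * a

u = cbf_qp_filter(np.array([0.0, 0.0]), np.array([1.0, 0.0]), 1.0)
```

Parameterizing alpha with a neural network, as the paper proposes, changes `b` at each state but leaves this per-step projection structure intact.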
|
|
10:30-12:00, Paper TuAT8-CC.3 |
Learning Failure Prevention Skills for Safe Robot Manipulation |
|
Ak, Abdullah Cihan | Istanbul Technical University |
Aksoy, Eren Erdal | Halmstad University |
Sariel, Sanem | Istanbul Technical University |
Keywords: Robot Safety, Reinforcement Learning, Robust/Adaptive Control
Abstract: Robots are more capable than before of achieving manipulation tasks for everyday activities. But the safety of the manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and hinders learning an optimal policy. Nonetheless, safety-focused modularity in the acquisition of skills has not been adequately addressed in previous works. To that end, we reformulate skills as base and failure prevention skills, where base skills aim at completing tasks and failure prevention skills aim at reducing the risk of failures. Then, we propose a modular and hierarchical method for safe robot manipulation that augments base skills by learning failure prevention skills with reinforcement learning and forms a skill library to address different safety risks. Furthermore, a skill selection policy that considers estimated risks is used for the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that, with the proposed method, skill learning is feasible and our safe manipulation tools can be transferred to the real environment.
|
|
10:30-12:00, Paper TuAT8-CC.4 |
GG-LLM: Geometrically Grounding Large Language Models for Zero-Shot Human Activity Forecasting in Human-Aware Task Planning |
|
Graule, Moritz A. | Harvard University |
Isler, Volkan | University of Minnesota |
Keywords: Perception-Action Coupling, Deep Learning Methods
Abstract: A robot in a human-centric environment needs to account for the human’s intent and future motion in its task and motion planning to ensure safe and effective operation. This requires symbolic reasoning about probable future actions and the ability to tie these actions to specific locations in the physical environment. While one can train behavioral models capable of predicting human motion from past activities, this approach requires large amounts of data to achieve acceptable long-horizon predictions. More importantly, the resulting models are constrained to specific data formats and modalities. Moreover, connecting predictions from such models to the environment at hand to ensure the applicability of these predictions is an unsolved problem. We present a system that utilizes a Large Language Model (LLM) to infer a human’s next actions from a range of modalities without fine-tuning. A novel aspect of our system that is critical to robotics applications is that it links the predicted actions to specific locations in a semantic map of the environment. Our method leverages the fact that LLMs, trained on a vast corpus of text describing typical human behaviors, encode substantial world knowledge, including probable sequences of human actions and activities. We demonstrate how these localized activity predictions can be incorporated in a human-aware task planner for an assistive robot to reduce the occurrences of undesirable human-robot interactions by 29.2% on average.
|
|
10:30-12:00, Paper TuAT8-CC.5 |
Modality Attention for Prediction-Based Robot Motion Generation: Improving Interpretability and Robustness of Using Multi-Modality |
|
Ichiwara, Hideyuki | Hitachi, Ltd. / Waseda University |
Ito, Hiroshi | Hitachi, Ltd |
Yamamoto, Kenjiro | Hitachi, Ltd |
Mori, Hiroki | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Learning from Experience, Learning from Demonstration, Sensorimotor Learning
Abstract: We developed a modality attention motion generation model on the basis of multi-modality prediction. This model provides interpretability about modality usage and demonstrates robustness against disturbances. We used a hierarchical model consisting of low-level recurrent neural networks (RNNs) that process each modality individually and a high-level RNN that integrates the multi-modality. This integration is achieved by efficiently gating the multi-modality input to the high-level RNN. We verified the interpretability and robustness on the task of inserting a furniture part, which consists of an "approach" phase to bring the wooden dowel closer to the hole and an "insertion" phase. While the proposed model achieves the same task success rate as the conventional model, it shows that it refers to vision during "approach" and force during "insertion," providing interpretability regarding modality use. Furthermore, in contrast to the non-modality attention model, whose task success rate drops significantly under disturbance, the proposed model enhances robustness against disturbances to modalities it does not attend to during the task, resulting in a consistently high success rate (approximately 90%).
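The modality gating described above can be sketched as a softmax attention over per-modality features before the high-level RNN. A minimal sketch with two modalities (the fixed gate logits and all names are illustrative assumptions; in the model the gate is produced by the network itself):

```python
import numpy as np

def gated_fusion(vision_feat, force_feat, gate_logits):
    """Attention-gate two modality features before integration.

    gate_logits: (2,) unnormalised attention over [vision, force].
    The softmax weights are what make modality usage inspectable:
    reading them off during a task shows which modality dominates.
    """
    e = np.exp(gate_logits - gate_logits.max())  # stable softmax
    w = e / e.sum()
    fused = w[0] * vision_feat + w[1] * force_feat
    return fused, w

# A gate strongly favouring vision, as during the "approach" phase.
fused, w = gated_fusion(np.ones(4), np.zeros(4), np.array([5.0, -5.0]))
```

Because a near-zero weight scales a modality's contribution toward zero, disturbances on unattended modalities have little effect on the fused feature, which is the robustness mechanism the abstract reports.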
|
|
10:30-12:00, Paper TuAT8-CC.6 |
Adaptive Whole-Body Robotic Tool-Use Learning on Low-Rigidity Plastic-Made Humanoids Using Vision and Tactile Sensors |
|
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Learning from Experience, Modeling, Control, and Learning for Soft Robots, AI-Based Methods
Abstract: Various robots have been developed so far; however, we face challenges in modeling the low-rigidity bodies of some robots. In particular, the deflection of the body changes during tool-use due to object grasping, resulting in significant shifts in the tool-tip position and the body's center of gravity. Moreover, this deflection varies depending on the weight and length of the tool, making these models exceptionally complex. However, there is currently no control or learning method that takes all of these effects into account. In this study, we propose a method for constructing a neural network that describes the mutual relationship among joint angle, visual information, and tactile information from the feet. We aim to train this network using the actual robot data and utilize it for tool-tip control. Additionally, we employ Parametric Bias to capture changes in this mutual relationship caused by variations in the weight and length of tools, enabling us to understand the characteristics of the grasped tool from the current sensor information. We apply this approach to the whole-body tool-use on KXR, a low-rigidity plastic-made humanoid robot, to validate its effectiveness.
|
|
10:30-12:00, Paper TuAT8-CC.7 |
Generating and Transferring Priors for Causal Bayesian Network Parameter Estimation in Robotic Tasks |
|
Diehl, Maximilian | Chalmers University of Technology |
Ramirez-Amaro, Karinne | Chalmers University of Technology |
Keywords: Learning from Experience, Probability and Statistical Methods, Transfer Learning
Abstract: Robots acting in human environments will often face new situations and can benefit from transferring prior experience. Priors could enable robots to handle new tasks zero-shot and help prevent failures, which can be particularly costly in real robot applications. Due to their interpretable nature, causal Bayesian Networks (CBN) are popular for modeling cause-effect relations between semantically meaningful environment features and their effects on action success. While the CBN structure is often intuitively transferable to a new context, its probability distribution might change, requiring data-intensive relearning. In this work, we propose three strategies that utilize semantic similarity and relatedness between the variables of two CBNs to generate and transfer informed CBN distribution priors. We evaluate the parameter prior accuracy in five different transfer scenarios, including sim-2-real, transferring parameters to more complex tasks with a larger number of parameters and even between two different tasks, which is particularly challenging. We show that the priors lead to better distribution estimates, particularly under a limited amount of new experiments, and improve the robot’s ability to predict and prevent action failures by up to 50%.
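For a binary action-success variable, a CBN parameter follows a Beta-Binomial model, so a transferred prior can be folded in as pseudo-counts and refined with only a few new experiments. A minimal sketch (the function and variable names are my assumptions, not the paper's transfer strategies):

```python
def posterior_success_rate(prior_alpha, prior_beta, successes, failures):
    """Beta-Binomial update: a transferred prior acts as pseudo-counts
    (prior_alpha pseudo-successes, prior_beta pseudo-failures) that
    stabilise the estimate when few new trials are available."""
    a = prior_alpha + successes
    b = prior_beta + failures
    return a / (a + b)

# An informed prior (8 pseudo-successes, 2 pseudo-failures) tempered by
# one observed success and one observed failure in the new context.
rate = posterior_success_rate(8, 2, 1, 1)
```

With a good prior the estimate stays close to the true rate after one or two trials, whereas relearning from scratch would swing wildly, which is the data-efficiency benefit the abstract quantifies.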
|
|
10:30-12:00, Paper TuAT8-CC.8 |
Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing |
|
Tjanaka, Bryon | University of Southern California |
Fontaine, Matthew | University of Southern California |
Lee, David H. | University of Southern California |
Kalkar, Aniruddha | University of Southern California |
Nikolaidis, Stefanos | University of Southern California |
Keywords: Evolutionary Robotics, Reinforcement Learning
Abstract: Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. On the other hand, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with or exceeding state-of-the-art deep reinforcement learning-based quality diversity algorithms.
|
|
10:30-12:00, Paper TuAT8-CC.9 | Add to My Program |
Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery |
|
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Learning from Demonstration, Surgical Robotics: Laparoscopy, Learning from Experience
Abstract: In this study, we present an implementation strategy for a robot that performs peg transfer tasks in Fundamentals of Laparoscopic Surgery (FLS) via imitation learning, aimed at the development of an autonomous robot for laparoscopic surgery. Robotic laparoscopic surgery presents two main challenges: (1) the need to manipulate forceps using ports established on the body surface as fulcrums, and (2) difficulty in perceiving depth information when working with a monocular camera that displays its images on a monitor. In particular, regarding issue (2), most prior research has assumed the availability of depth images or models of a target to be operated on. Therefore, in this study, we achieve more accurate imitation learning with only monocular images by extracting motion constraints from one exemplary motion of skilled operators, collecting data based on these constraints, and conducting imitation learning based on the collected data. We implemented an overall system using two Franka Emika Panda Robot Arms and validated its effectiveness.
|
|
TuAT9-CC Oral Session, CC-419 |
Add to My Program |
Datasets for Robot Learning |
|
|
Chair: Zeng, Long | Tsinghua University |
Co-Chair: Caesar, Holger | Delft University of Technology |
|
10:30-12:00, Paper TuAT9-CC.1 | Add to My Program |
Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding |
|
Tang, Yifan | Tsinghua University |
Tai, Cong | Tsinghua University |
Chen, Fang-xing | Tsinghua Shenzhen International Graduate School |
Zhang, Wanting | Tsinghua University |
Zhang, Tao | Pudu Technology Ltd |
Liu, Xueping | Tsinghua University |
Liu, Yong-Jin | Tsinghua University |
Zeng, Long | Tsinghua University |
Keywords: Data Sets for Robot Learning, Service Robotics, Dynamics
Abstract: Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed, including its organization, acquisition, and annotation methods. It comprises both real-world and synthetic data, collected with a real robot platform and a physical simulation platform, respectively. Our current dataset includes 13 large-scale dynamic scenarios, 90K image frames, 20M 2D/3D bounding boxes of static and dynamic objects, camera poses, and IMU data, and it is still continuously expanding. The performance of mainstream indoor scene understanding tasks, e.g., 3D object detection, semantic segmentation, and robot relocalization, is then evaluated on our THUD dataset. These experiments reveal serious challenges for some robot scene understanding tasks in dynamic scenes. By sharing this dataset, we aim to foster the rapid development and iteration of new mobile robot algorithms for the dynamic environments in which robots actually work, i.e., complex, crowded dynamic scenes.
|
|
10:30-12:00, Paper TuAT9-CC.2 | Add to My Program |
InteRACT: Transformer Models for Human Intent Prediction Conditioned on Robot Actions |
|
Kedia, Kushal | Cornell University |
Bhardwaj, Atiksh | Cornell University |
Dan, Prithwish | Cornell University |
Choudhury, Sanjiban | Cornell University |
Keywords: Data Sets for Robot Learning, Intention Recognition, Human-Robot Collaboration
Abstract: In collaborative human-robot manipulation, a robot must predict human intents and adapt its actions accordingly to smoothly execute tasks. However, the human's intent in turn depends on actions the robot takes, creating a chicken-or-egg problem. Prior methods ignore such inter-dependency and instead train marginal intent prediction models independent of robot actions. This is because training conditional models is hard given a lack of paired human-robot interaction datasets. Can we instead leverage large-scale human-human interaction data that is more easily accessible? Our key insight is to exploit a correspondence between human and robot actions that enables transfer learning from human-human to human-robot data. We propose a novel architecture, InteRACT, that pre-trains a conditional intent prediction model on large human-human datasets and fine-tunes on a small human-robot dataset. We evaluate on a set of real-world collaborative human-robot manipulation tasks and show that our conditional model improves over various marginal baselines. We also introduce new techniques to tele-operate a 7-DoF robot arm and collect a diverse range of human-robot collaborative manipulation data which we open-source. We release our code and datasets at https://portal-cornell.github.io/interact/.
|
|
10:30-12:00, Paper TuAT9-CC.3 | Add to My Program |
Towards Learning-Based Planning: The nuPlan Benchmark for Real-World Autonomous Driving |
|
Karnchanachari, Napat | Motional |
Geromichalos, Dimitris | Motional |
Tan, Kok Seang | Motional |
Li, Nanxiang | Motional |
Eriksen, Christopher | Motional AD |
Yaghoubi, Shakiba | Motional |
Mehdipour, Noushin | Motional |
Bernasconi, Gianmarco | Motional |
Fong, Whye Kit | Motional |
Guo, Yiluan | Motional |
Caesar, Holger | Delft University of Technology |
Keywords: Data Sets for Robot Learning, Integrated Planning and Learning, Deep Learning Methods
Abstract: Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world’s first real-world autonomous driving dataset and benchmark. The benchmark is designed to test the ability of ML-based planners to handle diverse driving situations and to make safe and efficient decisions. To that end, we introduce a new large-scale dataset that consists of 1282 hours of diverse driving scenarios from 4 cities (Las Vegas, Boston, Pittsburgh, and Singapore) and includes high-quality auto-labeled object tracks and traffic light data. We exhaustively mine and taxonomize common & rare driving scenarios which are used during evaluation to get fine-grained insights into the performance and characteristics of a planner. Beyond the dataset, we provide a simulation and evaluation framework that enables a planner’s actions to be simulated in closed-loop to account for interactions with other traffic participants. We present a detailed analysis of numerous baselines and investigate gaps between ML-based and traditional methods. Find the nuPlan dataset and code at nuplan.org.
|
|
10:30-12:00, Paper TuAT9-CC.4 | Add to My Program |
TBD Pedestrian Data Collection: Towards Rich, Portable, and Large-Scale Natural Pedestrian Data |
|
Wang, Allan | Carnegie Mellon University |
Sato, Daisuke | Carnegie Mellon University |
Corzo, Yasser | Carnegie Mellon University |
Simkin, Sonya | Carnegie Mellon University |
Biswas, Abhijat | Carnegie Mellon University |
Steinfeld, Aaron | Carnegie Mellon University |
Keywords: Data Sets for Robot Learning, Human-Aware Motion Planning, Data Sets for Robotic Vision
Abstract: Social navigation and pedestrian behavior research has shifted towards machine learning-based methods and converged on the topic of modeling inter-pedestrian interactions and pedestrian-robot interactions. For this, large-scale datasets that contain rich information are needed. We describe a portable data collection system, coupled with a semi-autonomous labeling pipeline. As part of the pipeline, we designed a label correction web application that facilitates human verification of automated pedestrian tracking outcomes. Our system enables large-scale data collection in diverse environments and fast trajectory label production. Compared with existing pedestrian data collection methods, our system contains three components: a combination of top-down and ego-centric views, natural human behavior in the presence of a socially appropriate "robot", and human-verified labels grounded in the metric space. To the best of our knowledge, no prior data collection system has a combination of all three components. We further introduce our ever-expanding dataset from the ongoing data collection effort -- the TBD Pedestrian Dataset and show that our collected data is larger in scale, contains richer information when compared to prior datasets with human-verified labels, and supports new research opportunities.
|
|
10:30-12:00, Paper TuAT9-CC.5 | Add to My Program |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics |
|
Sermanet, Pierre | Google |
Ding, Tianli | Google |
Zhao, Jeffrey | Google |
Xia, Fei | Google Inc |
Dwibedi, Debidatta | Google |
Gopalakrishnan, Keerthana | Google |
Chan, Christine | Google LLC |
Dulac-Arnold, Gabriel | Google |
Maddineni, Sharath | Google |
Joshi, Nikhil J | Google |
Florence, Peter | MIT |
Han, Wei | Google |
Robert, Baruch | Google.com |
Lu, Yao | Google |
Mirchandani, Suvir | Google |
Xu, Peng | Google |
Sanketi, Pannag | Google |
Hausman, Karol | Google Brain |
Shafran, Izhak | Google |
Ichter, Brian | Google Brain |
Cao, Yuan | Google |
Keywords: Data Sets for Robot Learning, Data Sets for Robotic Vision, Learning Categories and Concepts
Abstract: We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple embodiments (robot, human, human with grasping tool). With this data, we show that models trained on all embodiments perform better than ones trained on the robot data only, even when evaluated solely on robot episodes. We explore the economics of collection costs and find that for a fixed budget it is beneficial to take advantage of the cheaper human collection along with robot collection. We release a large and highly diverse (29,520 unique instructions) dataset dubbed RoboVQA containing 829,502 (video, text) pairs for robotics-focused visual question answering. We also demonstrate how evaluating real robot experiments with an intervention mechanism enables performing tasks to completion, making it deployable with human oversight even if imperfect, while also providing a single performance metric. We demonstrate a single video-conditioned model named RoboVQA-VideoCoCa trained on our dataset that is capable of performing a variety of grounded high-level reasoning tasks in broad realistic settings with a cognitive intervention rate 46% lower than the zero-shot state-of-the-art visual language model (VLM) baseline and is able to guide real robots through long-horizon tasks. The performance gap with zero-shot state-of-the-art models indicates that a lot of grounded data remains to be collected for real-world deployment, emphasizing the critical need for scalable data collection approaches. Finally, we show that video VLMs significantly outperform single-image VLMs, with an average error rate reduction of 19% across all VQA tasks. Data and videos are available at https://robovqa.github.io
|
|
10:30-12:00, Paper TuAT9-CC.6 | Add to My Program |
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot |
|
Fang, Hao-Shu | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Tang, Zhenyu | Shanghai Jiao Tong University |
Liu, Jirong | Shanghai Jiaotong University |
Wang, Chenxi | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Zhu, Haoyi | University of Science and Technology of China |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Data Sets for Robot Learning, Big Data in Robotics and Automation, Imitation Learning
Abstract: A key challenge for robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent progress in one-shot imitation learning and robotic foundation models has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improve their manipulative ability. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information. Moreover, we also provide a corresponding human demonstration video and a language description for each robot sequence. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. The dataset is made publicly available on our website: https://rh20t.github.io.
|
|
10:30-12:00, Paper TuAT9-CC.7 | Add to My Program |
SACSoN: Scalable Autonomous Control for Social Navigation |
|
Hirose, Noriaki | UC Berkeley / TOYOTA Motor North America |
Shah, Dhruv | University of California, Berkeley |
Sridhar, Ajay | University of California, Berkeley |
Levine, Sergey | UC Berkeley |
Keywords: Data Sets for Robot Learning, Machine Learning for Robot Control
Abstract: Machine learning provides a powerful tool for building socially compliant robotic systems that go beyond simple predictive models of human behavior. By observing and understanding human interactions from past experiences, learning can enable effective social navigation behaviors directly from data. In this paper, our goal is to develop methods for training policies for socially unobtrusive behavior, such that robots can navigate among humans in ways that do not disturb human behavior, using only onboard RGB observations for visual navigation. We introduce a definition for such behavior based on the counterfactual perturbation of the human: if the robot had not intruded into the space, would the human have acted in the same way? By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space. Instantiating this principle requires training policies to minimize their effect on human behavior, and this in turn requires data that allows us to model the behavior of humans in the presence of robots. Therefore, our approach is based on two key contributions. First, we collect a large dataset where an indoor mobile robot interacts with human bystanders. Second, we utilize this dataset to train policies that minimize counterfactual perturbation. We provide supplementary videos and make publicly available the visual navigation dataset on our project page.
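The counterfactual criterion can be written down compactly. In the paper the no-robot trajectory comes from a learned human model; in this hypothetical sketch both trajectories are simply inputs.

```python
import numpy as np

def counterfactual_perturbation(human_traj, human_traj_no_robot):
    """Mean waypoint deviation between the human's observed trajectory and
    the trajectory a model predicts they would have taken had the robot
    not intruded (the counterfactual). Illustrative sketch only: here the
    counterfactual trajectory is passed in rather than predicted."""
    h = np.asarray(human_traj, dtype=float)
    h_cf = np.asarray(human_traj_no_robot, dtype=float)
    return float(np.mean(np.linalg.norm(h - h_cf, axis=-1)))

# No deviation: the robot did not alter the human's natural path.
same = counterfactual_perturbation([[0, 0], [1, 0]], [[0, 0], [1, 0]])
# The robot pushed the human 1 m sideways at the second waypoint.
pushed = counterfactual_perturbation([[0, 0], [1, 1]], [[0, 0], [1, 0]])
```

A policy trained under this principle would drive such a quantity toward zero alongside its navigation objective.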
|
|
TuAT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Robot Materials and Design I |
|
|
Chair: Howard, Matthew | King's College London |
Co-Chair: Li, Shuguang | Tsinghua University |
|
10:30-12:00, Paper TuAT10-CC.1 | Add to My Program |
Lightweight Untethered Soft Robotic Fish |
|
Wang, Xiangxing | Beijing Jiaotong University, School of Electronics and Informati |
Pei, Xuan | Beijing Jiaotong University |
Wang, Xinyang | Beihang University,School of Mechanical Engineering and A |
Hou, Taogang | Beijing Jiaotong University |
Keywords: Soft Robot Materials and Design, Biologically-Inspired Robots
Abstract: Aquatic organisms, with their soft body structures and high agility, have inspired many biomimetic robots. However, owing to the issues of insulation and waterproofing, as well as the driving modules of soft materials, their control systems are usually large and heavy. Small underwater robots are therefore often tethered, i.e., they cannot integrate energy and control systems onto the body, which greatly limits their working range and activity modes. This paper presents a small untethered bionic manta ray. The robotic fish is driven by dielectric elastomer actuators (DEA), which control the double-wing structure on both sides through the central muscle part to simulate the process of the manta ray’s lateral fins fanning to propel itself forward. The flexible printed circuit board (FPC) constitutes the body of the fish and also serves as an independent energy and control system. The electronic components are evenly distributed on the double-wing structure of the robotic fish to realize the integration of the energy and control system. This circuit system can be powered by a small lithium battery and outputs a periodic voltage to drive the motion of the robotic fish. The masses of our tethered and untethered fish are 1.9 g and 5.1 g, respectively, and their swimming speeds reach 42.5 mm/s and 17.0 mm/s. This design principle can be extended to the research and design of various flexible devices and soft robots.
|
|
10:30-12:00, Paper TuAT10-CC.2 | Add to My Program |
Optimal Design of Flexible-Link Mechanisms with Desired Load-Displacement Profiles |
|
Maloisel, Guirec | Disney Research |
Knoop, Espen | The Walt Disney Company |
Schumacher, Christian | Disney Research |
Thomaszewski, Bernhard | Université De Montréal |
Bächer, Moritz | Disney Research |
Coros, Stelian | ETH Zurich |
Keywords: Soft Robot Materials and Design, Compliant Joints and Mechanisms, Optimization and Optimal Control
Abstract: Robot mechanisms that exploit compliance can perform complex tasks under uncertainty using simple control strategies, but it remains difficult to design mechanisms with a desired embodied intelligence. In this article, we propose an automated design technique that optimizes the desired load-displacement behavior of planar flexible-link mechanisms. To do so, we replace a subset of rigid links in an existing mechanism with flexible ones, and optimize their rest shapes. We demonstrate the efficacy of our approach on a set of examples, including two fabricated prototypes, illustrating applications for grasping and locomotion tasks.
|
|
10:30-12:00, Paper TuAT10-CC.3 | Add to My Program |
A Scalable, Light-Controlled, Individually Addressable, Non-Metal Actuator Array |
|
Paul, Sophie | University of California, Santa Barbara |
Devlin, Matthew | University of California, Santa Barbara |
Hawkes, Elliot Wright | University of California, Santa Barbara |
Keywords: Soft Robot Materials and Design, Flexible Robotics
Abstract: Research in the area of photo-actuation is growing rapidly, yet there are few examples of photo-actuators with practical use cases. One potential application is the control of intelligent electromagnetic surfaces, or two-dimensional arrays that could shape and control an incident electromagnetic field in ideally any manner. A promising concept to realize such a surface leverages signal refraction via antenna edges, but requires non-metal actuation, large antenna rotations, and high antenna angular accuracy for long periods of time. Here, we present a nonmetal, light-controlled, multi-position inchworm actuator array that can rotate an antenna 88 degrees in incremental steps of less than 3.4 degrees with zero-power shape-persistence. The design is modular and rapidly manufacturable via a layered laser-cutting technique, such that the actuator can be tiled into an array to control the rotation of many antennas. We control the array with a single focused IR light that rasters across the actuators to precisely control all antenna positions. We characterize the response time, accuracy, and repeatability of a single actuator, and demonstrate the array achieving diverse antenna configurations. This work advances the precision and scalability of photothermal actuation not only for use in intelligent electromagnetic surfaces but for any application benefitting from light-controlled actuation.
|
|
10:30-12:00, Paper TuAT10-CC.4 | Add to My Program |
A Passively Bendable, Compliant Tactile Palm with RObotic Modular Endoskeleton Optical (ROMEO) Fingers |
|
Liu, Sandra Q. | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Soft Robot Materials and Design, Force and Tactile Sensing, Mechanism Design
Abstract: Many robotic hands currently rely on extremely dexterous robotic fingers and a thumb joint to envelop themselves around an object. Few hands focus on the palm even though human hands greatly benefit from their central fold and soft surface. As such, we develop a novel structurally compliant soft palm, which enables more surface area contact for the objects that are pressed into it. Moreover, this design, along with the development of a new low-cost, flexible illumination system, is able to incorporate a high-resolution tactile sensing system inspired by the GelSight sensors. Concurrently, we design RObotic Modular Endoskeleton Optical (ROMEO) fingers, which are underactuated two-segment soft fingers that are able to house the new illumination system, and we integrate them into these various palm configurations. The resulting robotic hand is slightly bigger than a baseball and represents one of the first soft robotic hands with actuated fingers and a passively compliant palm, all of which have high-resolution tactile sensing. This design also potentially helps researchers discover and explore more soft-rigid tactile robotic hand designs with greater capabilities in the future.
|
|
10:30-12:00, Paper TuAT10-CC.5 | Add to My Program |
Shape-Conformable Suction Cups with Controllable Adaptive Suction on Complex Surfaces |
|
Yue, Tianqi | University of Bristol |
Bloomfield-Gadelha, Hermes | University of Bristol, UK |
Rossiter, Jonathan | University of Bristol |
Keywords: Soft Robot Materials and Design, Grippers and Other End-Effectors, Biomimetics
Abstract: Suction is widely used in industry, but the adaptation of state-of-the-art suction cups on complex surfaces (curved, cornered, uneven, rough, etc.) is still limited. In this paper, we present a novel shape-conformable suction mechanism to achieve highly-adaptive suction on complex surfaces. The shape-conformable adaptive suction is obtained by squeezing a soft multi-layer structure on the substrate, to form a shape-to-roughness sealed suction region. Based on this mechanism, two shape-conformable suction cups (SCSCs) - a displacement-driven shape-conformable suction cup (SDisp) and a force-driven shape-conformable suction cup (SForce) - are designed. They both achieve highly-adaptive suction on challenging surface topographies including highly-curved, cornered, textured, uneven and tilted surfaces. Particularly, SDisp has better adaptation (e.g., on a 90-degree corner and a balloon), while SForce is more lightweight (26 g) and compact (46×35 mm) and exhibits a quicker suction response (0.4 s). We analyse the underlying adaptive suction mechanism with a physical model, and demonstrate its adaptive suction capability by qualitatively comparing it with previous suction cups. We finally distill design principles for improving suction adaptation. We believe the proposed shape-conformable suction mechanism provides a novel solution to realize adaptive suction on complex surfaces in next-generation robotic gripping, anchoring and manipulation.
|
|
10:30-12:00, Paper TuAT10-CC.6 | Add to My Program |
A Phase-Change Emulsion Jamming Gripper for Manipulation of Micro-Scale Textured Surfaces |
|
Keller, Alexander | University of Bristol |
Yue, Tianqi | University of Bristol |
Qi, Qiukai | University of Bristol |
Conn, Andrew | University of Bristol |
Rossiter, Jonathan | University of Bristol |
Keywords: Soft Robot Materials and Design, Grippers and Other End-Effectors, Soft Robot Applications
Abstract: The inherent elasticity of soft materials can be used to create robotic grippers that deform and comply to a variety of irregular shapes. To date, several soft adaptive grasping strategies have been reported; however, most of them focus on adapting to the overall shape of the structure, while the adaptive grasping of small surface asperities is overlooked. In this paper, we propose a novel method to achieve adaptive grasping on surface asperities with a smart shape-memory silicone sponge. Heating above 60 °C makes the sponge soft and deformable, allowing it to penetrate surface asperities under a pressure normal to the surface. Cooling below 60 °C makes the sponge “jam” and retain its deformed shape. The interlocking force between the jammed sponge and the asperities, together with the increased area of contact, allows for adaptive grasping on asperities down to 0.4 mm with an adhesive force of up to 27.7 N over a 40 × 40 mm contact area. We introduce the design, working principle, fabrication, and optimization of a robotic gripper based on this shape-memory silicone sponge. This sponge-jamming gripper shows great potential for developing next-generation robotic grippers for the manipulation of textured and discontinuous surfaces.
|
|
10:30-12:00, Paper TuAT10-CC.7 | Add to My Program |
Design and Fabrication of String-Driven Origami Robots |
|
Yang, Peiwen | Tsinghua University |
Li, Shuguang | Tsinghua University / MIT / Harvard University |
Keywords: Soft Robot Materials and Design, Mechanism Design, Additive Manufacturing
Abstract: Origami designs and structures have been widely used in many fields, such as morphing structures, robotics, and metamaterials. However, the design and fabrication of origami structures rely on human experience and skill, making them both time-consuming and labor-intensive. In this paper, we present a rapid design and fabrication method for string-driven origami structures and robots. We developed origami design software to generate desired crease patterns based on analytical models and Evolution Strategies (ES). Additionally, the software can automatically produce 3D models of origami designs. We then used a dual-material 3D printer to fabricate those wrapping-based origami structures with the required mechanical properties. We utilized Twisted String Actuators (TSAs) to fold the target 3D structures from flat plates. To demonstrate the capability of these techniques, we built and tested an origami crawling robot and an origami robotic arm using 3D-printed origami structures driven by TSAs.
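For background on the TSAs used here, the standard twisted-string kinematic model (general to TSAs, not specific to this paper) gives the contraction produced by a given motor twist:

```python
import math

def tsa_contraction(L, r, theta):
    """Contraction of a twisted string actuator (standard kinematic model,
    not taken from the paper): a string of untwisted length L and radius r,
    twisted by theta radians at the motor end, shortens by
    L - sqrt(L^2 - (theta * r)^2). Valid while theta * r < L."""
    return L - math.sqrt(L * L - (theta * r) ** 2)

# A 100 mm string of radius 0.5 mm twisted by 100 rad contracts ~13.4 mm.
dx = tsa_contraction(0.1, 0.0005, 100.0)
```

The model shows why TSAs suit origami folding: a small, fast motor rotation is converted into a slow, high-force linear pull on the crease pattern.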
|
|
10:30-12:00, Paper TuAT10-CC.8 | Add to My Program |
Design and Implementation of a Ferrofluid-Based Liquid Robot for Small-Scale Manipulation |
|
Kong, Fanxing | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Cai, Hegao | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Soft Robot Materials and Design, Micro/Nano Robots, Soft Robot Applications
Abstract: Magnetic manipulation of miniature soft or liquid robots capable of deformation has gained increasing attention and is demonstrating great potential in small-scale applications, such as drug delivery, minimally invasive surgery, and manipulation of delicate objects. In this study, we introduce a liquid robot composed of ferrofluid that shows promise for small-scale magnetic manipulation applications. The objective of this work is to achieve more flexible manipulation capabilities for the robot. To this end, we utilize a redundant magnetic actuation system composed of five electromagnets and implement 4 degrees of freedom (4-DOF) control of the liquid robot in planar space. Based on the planar 4-DOF control, the liquid robot is able to perform various actions and implement versatile manipulation tasks, such as transporting objects, separating or assembling miniature parts, and operating customized tools. Furthermore, we suggest an automatic transportation method to enhance manipulation precision. A series of experiments is conducted to validate the effectiveness of the proposed method and the robot's capacity to accomplish diversified manipulation tasks. The proposed liquid robot demonstrates high flexibility and provides novel solutions for small-scale untethered manipulation.
|
|
10:30-12:00, Paper TuAT10-CC.9 | Add to My Program |
Towards Optimal Design of Dielectric Elastomer Actuators Using a Graph Neural Network Encoder |
|
Li, Yangfan | Institute of High Performance Computing, A*Star |
Liu, Jun | Institute of High Performance Computing |
Liang, Wenyu | Institute for Infocomm Research, A*STAR |
Liu, ZhuangJian | Institute of High Performance Computing |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Optimization and Optimal Control
Abstract: Dielectric elastomer actuators (DEAs), a type of "artificial muscles", can generate significant deformations and offer speedy responses when exposed to voltage. Owing to their high electromechanical conversion efficiency and great flexibility, they have been extensively used in soft robot applications, such as soft grippers, walking robots, crawling robots, climbing robots, swimming robots, etc. Although previous research has explored the use of DEAs in soft robot locomotion, achieving optimal behavior is challenging due to the complexity of the constituent materials and the highly nonlinear nature of the problem. In this study, a simulation-based design optimization approach is proposed to address this challenge. The proposed approach involves developing a computational modeling framework that evaluates the electromechanical behavior of the DEA. A graph neural network (GNN) is employed as an encoder to extract the latent representation of the geometry in a low dimensional space, which is further used to construct a surrogate model for fast prediction of target responses. To achieve an optimal actuation capability under design constraints, a multi-objective optimization function is formulated to balance the actuation distance and the actuator size, where the Pareto front demonstrates the trade-off between the actuation distance and design constraint. Finally, three optimized designs are fabricated and tested, demonstrating a performance improvement of over 140% compared to an
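The Pareto front mentioned above is a standard nondominated filter over (actuation distance, actuator size) pairs. A minimal sketch with made-up numbers, assuming distance is maximized and size minimized (names are hypothetical, not the authors' code):

```python
def pareto_front(designs):
    """Return the nondominated (distance, size) pairs: maximize actuation
    distance, minimize actuator size. Illustrative sketch only."""
    front = []
    for i, (d_i, s_i) in enumerate(designs):
        # A design is dominated if some other design is at least as good in
        # both objectives and strictly better in one.
        dominated = any(
            d_j >= d_i and s_j <= s_i and (d_j > d_i or s_j < s_i)
            for j, (d_j, s_j) in enumerate(designs) if j != i
        )
        if not dominated:
            front.append((d_i, s_i))
    return front

# Hypothetical (actuation distance in mm, actuator footprint in mm^2) pairs.
designs = [(1.0, 5.0), (2.0, 9.0), (1.5, 15.0), (2.5, 20.0)]
front = pareto_front(designs)  # the dominated (1.5, 15.0) design drops out
```

Each surviving point represents a different distance-versus-size trade-off; the optimizer picks among them according to the design constraints.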
|
|
TuAT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception I |
|
|
Chair: Shibata, Tomohiro | Kyushu Institute of Technology |
Co-Chair: Oishi, Takeshi | The University of Tokyo |
|
10:30-12:00, Paper TuAT11-CC.1 | Add to My Program |
Multi-Confidence Guided Source-Free Domain Adaption Method for Point Cloud Primitive Segmentation |
|
Wang, Shaohu | Institute of Automation, Chinese Academy of Sciences |
Tong, Yuchuang | The Institute of Automation of the Chinese Academy of Sciences |
Shang, Xiuqin | Institute of Automation, Chinese Academy of Sciences |
Zhang, Zhengtao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception
Abstract: Point cloud primitive segmentation aims to segment a surface point cloud into primitives of various geometric types, which plays a vital role in robot operation and industrial automation. However, differences in object structures and shapes across industrial datasets create domain shift issues, compounded by privacy concerns that prevent dataset sharing. To address these challenges, we propose a novel source-free domain adaptation method for point cloud primitive segmentation, which follows the popular pseudo-label based self-training framework. Unlike previous works using single-model uncertainty to refine pseudo-labels, our method leverages multi-confidence, including transformation consistency, task confidence, and geometric saliency, to provide more informative guidance. Specifically, transformation consistency is first utilized to vote on pseudo-labels and task confidences. Furthermore, to filter out high-confidence noise and obtain more reliable pseudo-labels, we investigate the geometric curvature properties of primitives and propose geometric-saliency-guided dynamic prototype matching and label graph aggregation strategies for pseudo-label reassignment under different task confidences. For this novel task, we construct several datasets and verify the effectiveness of the proposed methods through a series of experiments.
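Transformation-consistency voting of pseudo-labels can be sketched as follows (hypothetical interfaces, not the authors' implementation): predictions from several randomly rotated copies of a cloud are voted per point, and points without a sufficiently strong majority are left unlabeled.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation_z():
    """Random rotation about the z-axis (a stand-in for the paper's transforms)."""
    a = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def vote_pseudo_labels(points, predict, n_views=5, agree=0.8):
    """Vote per-point labels across randomly rotated copies of the cloud;
    points whose majority label wins fewer than `agree` of the votes get
    -1, i.e. are ignored during self-training."""
    votes = np.stack([predict(points @ random_rotation_z().T)
                      for _ in range(n_views)])          # (n_views, N)
    labels = np.full(points.shape[0], -1)
    for i in range(points.shape[0]):
        vals, counts = np.unique(votes[:, i], return_counts=True)
        best = counts.argmax()
        if counts[best] / n_views >= agree:
            labels[i] = vals[best]
    return labels

# A z-rotation-invariant toy predictor: label by radial distance from the z-axis.
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
radial = lambda p: (np.linalg.norm(p[:, :2], axis=1) > 1.0).astype(int)
labels = vote_pseudo_labels(pts, radial)
```

A predictor that flips its answer under rotation would fail the agreement threshold, which is exactly the unreliable-pseudo-label case the voting is meant to filter out.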
|
|
10:30-12:00, Paper TuAT11-CC.2 | Add to My Program |
FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization |
|
Ma, Nan | Beijing University of Technology, Beijing, China |
Wang, Mohan | Beijing University of Technology |
Han, Yiheng | Beijing University of Technology |
Liu, Yong-Jin | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Localization
Abstract: Cross-modality point cloud registration is confronted with significant challenges due to inherent differences in modalities between sensors. To deal with this problem, we propose FF-LOGO: a cross-modality point cloud registration framework with Feature Filtering and LOcal-Global Optimization. The cross-modality feature correlation filtering module extracts geometric transformation-invariant features from cross-modality point clouds and achieves point selection by feature matching. We also introduce a cross-modality optimization process, including a local adaptive key region aggregation module and a global modality consistency fusion optimization module. Experimental results demonstrate that our two-stage optimization significantly improves the registration accuracy of the feature association and selection module. Our method achieves a substantial increase in recall rate compared to the current state-of-the-art methods on the 3DCSR dataset, improving from 40.59% to 75.74%. Our code will be available at https://github.com/wangmohan17/FFLOGO.
|
|
10:30-12:00, Paper TuAT11-CC.3 | Add to My Program |
CAPT: Category-Level Articulation Estimation from a Single Point Cloud Using Transformer |
|
Fu, Lian | The University of Tokyo |
Ishikawa, Ryoichi | The University of Tokyo |
Sato, Yoshihiro | Kyoto University of Advanced Science |
Oishi, Takeshi | The University of Tokyo |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Recognition
Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using a Transformer. CAPT uses an end-to-end Transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed method estimates joint parameters and states for various articulated objects with high precision and robustness. The paper also introduces a motion-loss approach, which improves articulation estimation performance by emphasizing the dynamic features of articulated objects. Additionally, the paper presents a double voting strategy to provide the framework with coarse-to-fine parameter estimation. Experimental results on several category datasets demonstrate that our methods outperform existing alternatives for articulation estimation. Our research provides a promising solution for applying Transformer-based architectures in articulated object analysis.
|
|
10:30-12:00, Paper TuAT11-CC.4 | Add to My Program |
Energy-Based Detection of Adverse Weather Effects in LiDAR Data |
|
Piroli, Aldi | Universität Ulm |
Dallabetta, Vinzenz | BMW Group |
Kopp, Johannes | Ulm University |
Walessa, Marc | BMW Group |
Meissner, Daniel | BMW Group |
Dietmayer, Klaus | University of Ulm |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Recognition
Abstract: Autonomous vehicles rely on LiDAR sensors to perceive the environment. Adverse weather conditions like rain, snow, and fog negatively affect these sensors, reducing their reliability by introducing unwanted noise in the measurements. In this work, we tackle this problem by proposing a novel approach for detecting adverse weather effects in LiDAR data. We reformulate this problem as an outlier detection task and use an energy-based framework to detect outliers in point clouds. More specifically, our method learns to associate low energy scores with inlier points and high energy scores with outliers, allowing for robust detection of adverse weather effects. In extensive experiments, we show that our method performs better in adverse weather detection and has higher robustness to unseen weather effects than previous state-of-the-art methods. Furthermore, we show how our method can be used to perform simultaneous outlier detection and semantic segmentation. Finally, to help expand the research field of LiDAR perception in adverse weather, we release the SemanticSpray dataset, which contains labeled vehicle spray data in highway-like scenarios. The dataset will be available upon acceptance (see supplementary material for sample).
|
|
10:30-12:00, Paper TuAT11-CC.5 | Add to My Program |
EdgePoint: Efficient Point Detection and Compact Description Via Distillation |
|
Yao, Haodi | Harbin Institute of Technology |
Hao, Ning | Harbin Institute of Technology |
Xie, Chen | Harbin Institute of Technology |
He, Fenghua | Harbin Institute of Technology |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: Efficient interest point detection and description in images play a crucial role in many tasks such as multi-robot SLAM and collaborative localization. To facilitate fast detection and generate compact descriptions on edge devices, we introduce EdgePoint, a lightweight neural network. We design a new detection loss, UnfoldSoftmax, to improve inference speed. Furthermore, we propose an Ortho-Alignment loss combined with LocalPCA compression to learn compact 32-dimensional descriptors. To enable efficient storage and communication, we also quantize the generated descriptors into integer values. We evaluate EdgePoint on various datasets and show that it surpasses SuperPoint in performance while using only 1% of the parameters and achieving more than 10 times faster inference. By applying descriptor quantization, storage and communication requirements can be reduced by up to 97% without degrading performance.
|
|
10:30-12:00, Paper TuAT11-CC.6 | Add to My Program |
Fast and Robust Point Cloud Registration with Tree-Based Transformer |
|
Chen, Guangyan | Beijing Institute of Technology |
Wang, Meiling | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Yuan, Li | Peking University |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Deep Learning for Visual Perception, Visual Learning
Abstract: Point cloud registration is essential in computer vision and robotics. Recently, transformer-based methods have achieved advanced point cloud registration performance. However, the standard attention mechanism utilized in these methods considers many low-relevance points, and it has difficulty focusing its attention weights on sparse and meaningful points, leading to limited local structure modeling capabilities and quadratic computational complexity. To address these limitations, we present the Tree-based Transformer (TrT), which is able to extract abundant local and global features with linear computational complexity. Specifically, the TrT builds coarse-to-dense feature trees, and a novel Tree-based Attention (TrA) is proposed to guide the progressive convergence of the attended regions toward meaningful points and to structurize point clouds following tree structures. In each layer, the top S key points with the highest attention scores are selected, such that in the next layer, attention is evaluated only within the specified high-relevance regions, corresponding to the child points of these selected S points. Additionally, coarse features containing high-level semantic information are incorporated into the child points to guide the feature extraction process, facilitating local structure modeling and multiscale information integration. Consequently, TrA enables the model to focus on critical local structures and extract rich local information with linear computational complexity. Experiments demonstrate that our method achieves state-of-the-art performance on 3DMatch and KITTI benchmarks. The code for our method is publicly available at https://github.com/CGuangyan-BIT/TrT.
|
|
TuAT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning in Grasping and Manipulation I |
|
|
Chair: Kan, Zhen | University of Science and Technology of China |
Co-Chair: Qureshi, Ahmed H. | Purdue University |
|
10:30-12:00, Paper TuAT12-CC.1 | Add to My Program |
Uncertainty-Driven Exploration Strategies for Online Grasp Learning |
|
Shi, Yitian | Bosch Center for Artificial Intelligence |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Ziesche, Hanna | Bosch BCAI |
Feldman, Zohar | Bosch |
Anh Vien, Ngo | Bosch GmbH |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Existing grasp prediction approaches are mostly based on offline learning, ignoring exploratory grasp learning during online adaptation to new picking scenarios, i.e., objects that are unseen or out-of-domain (OOD), new camera and bin settings, etc. In this paper, we present an uncertainty-based approach for online learning of grasp predictions for robotic bin picking. Specifically, an online learning algorithm with an effective exploration strategy can significantly improve adaptation performance in unseen environment settings. To this end, we first propose to formulate online grasp learning as an RL problem, which allows us to adapt both grasp reward prediction and grasp poses. We propose various uncertainty estimation schemes based on Bayesian uncertainty quantification and distributional ensembles. We carry out evaluations on real-world bin-picking scenes of varying difficulty. The objects in the bin have various challenging physical and perceptual characteristics, such as semi- or total transparency and irregular or curved surfaces. The results of our experiments demonstrate a notable improvement in grasp performance compared to conventional online learning methods that incorporate only naive exploration strategies. Video: https://youtu.be/fPKOrjC2QrU
|
|
10:30-12:00, Paper TuAT12-CC.2 | Add to My Program |
Pseudo-Labeling and Contextual Curriculum Learning for Online Grasp Learning in Robotic Bin Picking |
|
Le, Huy | TU Dortmund |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Anh Vien, Ngo | Bosch GmbH |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: The prevailing grasp prediction methods predominantly rely on offline learning, overlooking the dynamic grasp learning that occurs during real-time adaptation to novel picking scenarios. These scenarios may involve previously unseen objects, variations in camera perspectives, and bin configurations, among other factors. In this paper, we introduce a novel approach, SSL-ConvSAC, that combines semi-supervised learning and reinforcement learning for online grasp learning. By treating pixels with reward feedback as labeled data and others as unlabeled, it efficiently exploits unlabeled data to enhance learning. In addition, we address the imbalance between labeled and unlabeled data by proposing a contextual curriculum-based method. We ablate the proposed approach on real-world evaluation data and demonstrate promise for improving online grasp learning on bin-picking tasks using a physical 7-DoF Franka Emika robot arm with a suction gripper.
|
|
10:30-12:00, Paper TuAT12-CC.3 | Add to My Program |
PoseFusion: Multi-Scale Keypoint Correspondence for Monocular Camera-To-Robot Pose Estimation in Robotic Manipulation |
|
Han, Xujun | University of Science and Technology of China |
Wang, Shaochen | University of Science and Technology of China |
Huang, Xiucai | Chongqing University |
Kan, Zhen | University of Science and Technology of China |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Visual-based robot pose estimation is a fundamental challenge, involving the determination of the camera’s pose with respect to a robot. Conventional methods for camera-to-robot pose calibration rely on fiducial markers to establish keypoint correspondences. However, these approaches exhibit significant variability in accuracy and robustness, particularly in 2D keypoint detection. In this work, we present an end-to-end pose estimation approach that achieves camera-to-robot calibration using monocular images and keypoint information. Our method employs a two-level nested U-shaped architecture, featuring a bottom-level residual U-block that extracts richer contextual information from diverse receptive fields to enhance keypoint refinement. By incorporating the perspective-n-point (PnP) algorithm and leveraging 3D robot joint keypoints, we establish correspondences of 3D coordinate points between the robot’s coordinate system and the camera’s coordinate system, facilitating accurate pose estimation. Experimental evaluations encompass real-world and synthetic datasets, demonstrating competitive results across three distinct robot manipulators.
|
|
10:30-12:00, Paper TuAT12-CC.4 | Add to My Program |
Online Fault Detection in Manipulation Tasks Via Generative Models |
|
Lanighan, Michael | TRACLabs, Inc |
Youngquist, Oscar | University of Massachusetts Amherst |
Keywords: Deep Learning in Grasping and Manipulation, Deep Learning Methods, Failure Detection and Recovery
Abstract: This paper introduces a method, Generative Adversarial Networks for Detecting Erroneous Results (GANDER), leveraging Generative Adversarial Networks to provide online error detection in manipulation tasks for autonomous robot systems. GANDER relies on mapping input images of a trained task to a learned manifold that contains only positive task executions and outcomes. When reconstructed through this manifold, the input images from successful task executions will remain largely unchanged, while the images from a failed task will change significantly. Using this insight, GANDER enables inspection and task outcome verification capabilities using a large number of positive examples but only a small set of negative examples, thus increasing the applicability of autonomous robot systems. We detail the design of GANDER and provide results of a proof-of-concept system, establishing its efficacy in an autonomous inspection, maintenance, and repair task. GANDER produces favorable results compared to baseline approaches and is capable of correctly identifying off-nominal behavior with 91.65% accuracy in our test task. Ablation studies were also performed to quantify the amount of data ultimately needed for this approach to succeed.
|
|
10:30-12:00, Paper TuAT12-CC.5 | Add to My Program |
One-Shot Learning for Task-Oriented Grasping |
|
Holomjova, Valerija | University of Aberdeen |
Starkey, Andrew | University of Aberdeen |
Yun, Bruno | Université Claude Bernard, Lyon 1 |
Meißner, Pascal | Wuerzburg-Schweinfurt Technical University of Applied Sciences |
Keywords: Deep Learning in Grasping and Manipulation, Computer Vision for Automation, Recognition
Abstract: Task-oriented grasping models aim to predict a suitable grasp pose on an object to fulfill a task. These systems have limited generalization capabilities to new tasks, but have shown the ability to generalize to novel objects by recognizing affordances. This object generalization comes at the cost of being unable to recognize the object category being grasped, which could lead to unpredictable or risky behaviors. To overcome these generalization limitations, we contribute a novel system for task-oriented grasping called the One-shot Task-oriented Grasping (OS-TOG) framework. OS-TOG comprises four interchangeable neural networks that interact through dependable reasoning components, resulting in a single system that predicts multiple grasp candidates for a specific object and task from multi-object scenes. Embedded one-shot learning models leverage references within a database for OS-TOG to generalize to novel objects and tasks more efficiently than existing alternatives. Additionally, the paper presents suitable candidates for the framework’s neural components, covering essential adjustments for their integration and evaluative comparisons to state-of-the-art. In physical experiments with novel objects, OS-TOG recognizes 69.4% of detected objects correctly and predicts suitable task-oriented grasps with 82.3% accuracy, having a physical grasp success rate of 82.3%.
|
|
10:30-12:00, Paper TuAT12-CC.6 | Add to My Program |
Multi-Level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection |
|
Zhu, Xinghao | University of California, Berkeley |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Sun, Lingfeng | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Cherian, Anoop | Mitsubishi Electric Research Labs |
Keywords: Deep Learning in Grasping and Manipulation, Assembly, Big Data in Robotics and Automation
Abstract: Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling. Unlike existing research, which is limited to target segmentation, pose regression, or using fixed target blueprints, our work presents a holistic multi-level framework for part assembly planning consisting of part assembly sequence inference, part motion planning, and robot contact optimization. We present the Part Assembly Sequence Transformer (PAST) -- a sequence-to-sequence neural network -- to infer assembly sequences recursively from a target blueprint. We then use a motion planner and optimization to generate part movements and contacts. To train PAST, we introduce D4PAS: a large-scale Dataset for Part Assembly Sequences consisting of physically valid sequences for industrial objects. Experimental results show that our approach generalizes better than prior methods while needing significantly less computational time for inference.
|
|
10:30-12:00, Paper TuAT12-CC.7 | Add to My Program |
Learning to Design 3D Printable Adaptations on Everyday Objects for Robot Manipulation |
|
Guo, Michelle | Stanford University |
Liu, Ziang | Cornell University |
Tian, Stephen | Stanford University |
Xie, Zhaoming | Stanford University |
Wu, Jiajun | Stanford University |
Liu, Karen | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Advancements in robot learning for object manipulation have shown promising results, yet certain everyday objects remain challenging for robots to effectively interact with. This discrepancy arises from the fact that human-designed objects are optimized for human use rather than robot manipulation. To address this gap, we propose a framework to automatically design 3D printable adaptations that can be attached to hard-to-use objects, thus improving "robot ergonomics". Our learning-based framework formulates the adaptation design and control as a dual Markov decision process and is able to improve robot-object interactions for various robot end effectors and objects. We further validate our designs in the real world with a Franka Panda robot. Please see the supplementary video and https://object-adaptation.github.io for additional visualizations.
|
|
10:30-12:00, Paper TuAT12-CC.8 | Add to My Program |
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning |
|
Ku, Chahyon | University of Minnesota |
Winge, Carl | University of Minnesota |
Diaz, Ryan | University of Minnesota - Twin Cities |
Yuan, Wentao | University of Washington |
Desingh, Karthik | University of Minnesota |
Keywords: Deep Learning in Grasping and Manipulation, Dual Arm Manipulation, Transfer Learning
Abstract: This paper focuses on benchmarking the robustness of visual representations toward policy learning in the context of object assembly tasks, particularly the alignment and insertion of objects with geometrical extrusions, commonly known as a peg-in-hole task. The accuracy required to detect and orient the peg and the hole geometry in SE(3) space for successful assembly poses significant challenges. Addressing this, we employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders. Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations. Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features necessary for this task. However, a visual encoder trained from scratch consistently outperforms the frozen pretrained models. Moreover, we discuss rotation representations and associated loss functions that substantially improve policy learning. We present a novel task scenario designed to evaluate the progress in visuomotor policy learning, with a specific focus on improving the robustness of intricate assembly tasks that require both geometrical and spatial reasoning. Videos, data, and code for our simulator and evaluation are available at https://sites.google.com/view/geometric-peg-in-hole.
|
|
10:30-12:00, Paper TuAT12-CC.9 | Add to My Program |
DeRi-Bot: Learning to Collaboratively Manipulate Rigid Objects Via Deformable Objects |
|
Wang, Zixing | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning in Grasping and Manipulation, Cooperating Robots, Learning from Experience
Abstract: Recent research efforts have yielded significant advancements in manipulating objects under homogeneous settings where the robot is required to either manipulate rigid or deformable (soft) objects. However, the manipulation under heterogeneous setups that involve both 1D deformable and rigid objects remains an unexplored area of research. Such setups are common in various scenarios that involve the transportation of heavy objects via ropes, e.g., on factory floors, at disaster sites, and in forestry. To address this challenge, we introduce DeRi-Bot, the first framework that enables the collaborative manipulation of rigid objects with deformable objects. Our framework comprises an Action Prediction Network (APN) and a Configuration Prediction Network (CPN) to model the complex pattern and stochasticity of soft-rigid body systems. We demonstrate the effectiveness of DeRi-Bot in moving rigid objects to a target position with ropes connected to robotic arms. Furthermore, DeRi-Bot is a distributive method that can accommodate an arbitrary number of robots or human partners without reconfiguration or retraining. We evaluate our framework in both simulated and real-world environments and show that it achieves promising results with strong generalization across different types of objects and multi-agent settings, including human-robot collaboration.
|
|
TuAT13-AX Oral Session, AX-201 |
Add to My Program |
Physical Human-Robot Interaction I |
|
|
Chair: Mattila, Jouni | Tampere University of Technology |
Co-Chair: Kurita, Yuichi | Hiroshima University |
|
10:30-12:00, Paper TuAT13-AX.1 | Add to My Program |
Unified Power and Admittance Adaptation for Safe and Effective Physical Interaction with Unmodelled Dynamic Environments |
|
Benzi, Federico | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Human-Robot Collaboration
Abstract: When interacting with unmodelled dynamic systems, a robot controller should be capable of adapting its behavior online in order to be robust to changing environmental conditions. In the paradigm of passivity-based control, virtual energy tanks allow such adaptations to be performed in a robustly stable way by bounding the amount of energy allocated to the controller. Nevertheless, when the workspace is shared with human collaborators, additional limits have to be imposed on the power the system can exert in order to guarantee overall safety. These bounds are difficult to estimate a priori, may vary over time, and can significantly affect task execution. In this letter, we tackle this problem by simultaneously adapting the admittance and the power limits of the controller online, ensuring both safety and task performance. Experimental results with a collaborative manipulator validate the presented framework.
|
|
10:30-12:00, Paper TuAT13-AX.2 | Add to My Program |
Towards Robot to Human Skill Coaching: A ML-Powered IoT and HRI Platform for Martial Arts Training |
|
Bourahmoune, Katia | University of Technology Sydney |
Ishac, Karlos | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Keywords: Physical Human-Robot Interaction, AI-Based Methods, Human-Centered Robotics
Abstract: Advances in human sensing and machine learning are paving the way for new applications of robotics in sports and fitness, making skill coaching smarter, easier, and more accessible. Physical and social human-robot interaction in particular has received special attention as a feedback mechanism for human performance augmentation. A core challenge in deploying robots that interact physically with humans in dynamic environments such as sports relates to modeling human skills and designing appropriate interaction schemes. We present the first ML-based HRI platform for physical robot-to-human skill coaching in real time in martial arts, which can be extended to various sports. Our system comprises the Sawyer robot, our specially developed IoT katana, and a skill-training program for the martial art of Iaido. We built and deployed in real time an ML-based Iaido strike recognition model trained on expert and beginner data, achieving accuracies between 94.8% and 99.97%. We assessed the system's effectiveness in coaching skills through robot interaction in a sparring experiment and a survey involving 12 participants practicing key Iaido techniques with guided training from Sawyer. Our results demonstrated improvement in every participant's Iaido strike skill after training with Sawyer, and participants responded positively to robot-assisted skill coaching.
|
|
10:30-12:00, Paper TuAT13-AX.3 | Add to My Program |
Towards Robo-Coach: Robot Interactive Stiffness/Position Adaptation for Human Strength and Conditioning Training |
|
Li, Chenzui | The Chinese University of Hong Kong |
Wu, Xi | The Chinese University of Hong Kong |
Teng, Tao | Istituto Italiano Di Tecnologia & Università Cattolica Del Sacro |
Calinon, Sylvain | Idiap Research Institute |
Chen, Fei | The Chinese University of Hong Kong |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Learning from Demonstration
Abstract: Traditional strength and conditioning training relies on free weights, such as weighted implements, to elicit external stimuli. However, this approach poses a significant challenge when attempting to modify or adjust the loads within a single training set. This paper introduces an innovative method for achieving adjustable loads during resistance training by leveraging physical Human-Robot Interaction (pHRI). The primary objective is to regulate targeted muscle activation through the use of Robo-Coach (a robotic coach system). We first utilize a Task-Parameterized Gaussian Mixture Model (TP-GMM) to learn the motion from coach demonstrations, which can be generalized for the trainees. The 3D path extracted from the generated trajectory is then projected onto a 2D plane with respect to the direction of the load. Furthermore, we propose a hybrid stiffness/position generator for online task execution. This generator determines the desired positions in the 2D plane according to the contact point displacements in the stimulus direction and, simultaneously, sets the desired stiffness based on the muscle activation feedback. Finally, the Robo-Coach is implemented with a variable impedance controller to achieve load-adjustable resistance training with the trainee. Biceps curl exercises were conducted, and the results showed favorable performance, indicating the effectiveness of this approach.
|
|
10:30-12:00, Paper TuAT13-AX.4 | Add to My Program |
Predictive and Robust Robot Assistance for Sequential Manipulation |
|
Stouraitis, Theodoros | RoboPhren and Honda Research Institute Europe |
Gienger, Michael | Honda Research Institute Europe |
Keywords: Physical Human-Robot Interaction, Human-Aware Motion Planning, Optimization and Optimal Control
Abstract: This paper presents a novel concept to support physically impaired humans in daily object manipulation tasks with a robot. Given a user’s manipulation sequence, we propose a predictive model that uniquely casts the user’s sequential behavior as well as a robot support intervention into a hierarchical multi-objective optimization problem. A major contribution is the prediction formulation, which allows several different future paths to be considered concurrently. The second contribution is the encoding of a general notion of constancy constraints, which allows dependencies to be considered between consecutive or far-apart keyframes (in time or space) of a sequential task. We perform numerical studies, simulations, and robot experiments to analyse and evaluate the proposed method in several tabletop tasks where a robot supports impaired users by predicting their posture and proactively re-arranging objects.
|
|
10:30-12:00, Paper TuAT13-AX.5 | Add to My Program |
Nonlinear Subsystem-Based Adaptive Impedance Control of Physical Human-Robot-Environment Interaction in Contact-Rich Tasks |
|
Hejrati, Mahdi | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Robust/Adaptive Control
Abstract: Haptic upper-limb exoskeletons are robots that assist human operators during task execution while being able to render virtual or remote environments. Therefore, ensuring the stability of such robots in physical human-robot-environment interaction (pHREI) is crucial. Having a wide Z-width, which indicates the region of impedance passively renderable by a haptic display, is also important for rendering a broad range of virtual environments. To address these issues, this study designs a subsystem-based adaptive impedance control to achieve stable pHREI for a 7-degrees-of-freedom haptic exoskeleton. The presented controller decomposes the entire system into subsystems and designs controllers at the subsystem level. The stability of the controller in the presence of contact with the virtual environment and human arm force is proven by employing the concept of virtual stability. Additionally, the Z-width of the 7-DoF haptic exoskeleton is illustrated using experimental data and improved by exploiting a varying virtual mass element. Finally, experimental results are provided to demonstrate the performance of the proposed controller. The control results are also compared to state-of-the-art control methods, highlighting the strength of the designed controller.
|
|
10:30-12:00, Paper TuAT13-AX.6 | Add to My Program |
Model Predictive Control with Graph Dynamics for Garment Opening Insertion During Robot-Assisted Dressing |
|
Kotsovolis, Stelios | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Physical Human-Robot Interaction, Bimanual Manipulation, Human-Centered Robotics
Abstract: Robots have a great potential to help people with movement limitations in activities of daily living, such as dressing. A common problem in almost all dressing tasks is the insertion of a garment’s opening around a part of the human body. The rich contact environment and the deformations of the garment make the task a challenging problem for robots. In this paper, we propose a bi-manual control method for garment opening insertion during robot-assisted dressing. Specifically, we propose a model predictive controller that uses an Attention-based Relational Graph Convolutional Network (ARGCN) for modeling the dynamics of the opening in the presence of the body. We train the model entirely in simulation and validate our method in four real-world dressing scenarios of a medical training manikin. We show that our method generalizes well in the real-world opening insertion tasks achieving an overall success rate of 97.5%, even though the dynamics and the shapes vastly differ from the simulation setup.
|
|
10:30-12:00, Paper TuAT13-AX.7 | Add to My Program |
Force Feedback-Based Gamification: Performance Validation of Squat Exergame Using Pneumatic Gel Muscles and Dynamic Difficulty Adjustment |
|
Ramasamy, Priyanka | Hiroshima University |
Renganathan, Gunarajulu | Hiroshima University |
Kurita, Yuichi | Hiroshima University |
Keywords: Physical Human-Robot Interaction, Human Performance Augmentation, Virtual Reality and Interfaces
Abstract: Exergames are considered an advanced approach for enhancing the physical activity of the elderly community. Their advantages include greater immersion and enjoyment, which support performance validation and sustain the engagement needed to draw users away from sedentary lifestyles. An at-home game design combined with squat workouts is therefore a promising way to enhance lower-limb performance. We designed a virtual reality (VR) based exergame with dynamic difficulty adjustment (DDA), in which the speed of the moving objects and the air pressure of the pneumatic gel muscles (PGMs) vary with respect to the knee-shakiness feature. This letter aims to estimate the muscle loading and unloading effects of exergaming sessions at the onset and during the squat phase of the squat cycle. We acquired surface electromyography (sEMG) of five major lower limb muscles from seven subjects to evaluate the reduction in muscle activity during conventional and exergame-based squat sessions. In addition, we assessed knee indicators to identify variations in motion performance. The subjects performed 120 squats per session, followed by a maximum voluntary contraction (MVC) task. No adverse events, such as fatigue or dizziness, were reported during the study. Our results show a statistically significant (p<0.01) difference in muscle activity for all tested muscles. We also found a statistically significant (p<0.01) reduction in knee shakiness during exergaming sessions 2 and 3.
|
|
10:30-12:00, Paper TuAT13-AX.8 | Add to My Program |
External Dynamics Dependent Human Gait Adaptation Using a Cable-Driven Exoskeleton |
|
Nakka, S S Sanjeevi | Indian Institute of Technology Gandhinagar |
Vashista, Vineet | Indian Institute of Technology Gandhinagar |
Keywords: Physical Human-Robot Interaction, Human-Centered Robotics, Rehabilitation Robotics
Abstract: The emergence of exoskeleton technology has enabled new opportunities for gait rehabilitation, but effective methods to restore healthy gait patterns with exoskeletons are not yet clearly understood. Early research in robot-based gait rehabilitation offered little improvement over current standards of physical care and emphasized the need for a deeper understanding of the complex interaction between humans and the robot, i.e., physical human-robot interaction (pHRI). Studies have reported varied lower-limb responses to similar interventions with different exoskeletons, implying that the exoskeleton's external dynamics affect musculoskeletal adaptation outcomes. Accordingly, the current study showcases external-dynamics-dependent gait adaptation using a Cable-Driven Leg Exoskeleton (CDLE). A swing-phase gait intervention using three different CDLE cable-routing configurations that impose varied dynamics at the human anatomical joints is studied. Twenty-four healthy participants, eight for each CDLE configuration, were tested. Results showed varied gait adaptation among the three groups: subjects used predominantly their hip joint, their knee joint, or a combination of both, implying selective joint-strategy adaptations to different external dynamics conditions under the same intervention. The results of this study can provide insights into the optimal design of leg exoskeleton-based rehabilitation paradigms for effective gait rehabilitation.
|
|
10:30-12:00, Paper TuAT13-AX.9 | Add to My Program |
Comparison of Rating Scale and Pairwise Comparison Methods for Measuring Human Co-Worker Subjective Impression of Robot During Physical Human-Robot Collaboration |
|
Wang, Qiao | University of Technology Sydney |
Wang, Ziqi | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Liu, Dikai | University of Technology, Sydney |
Lin, Chin-Teng | UTS |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: The Rating Scale (RS) method has long been deemed the standard for measuring subjective perceptions. However, in the field of physical human-robot collaboration (pHRC), its aptness should be put under scrutiny due to inherent challenges such as response bias, between-subject variation, and scale granularity. Individual variance can introduce significant bias in rating scale results. A high-granularity scale could overwhelm participants, leading to unclear and biased responses, while a low-granularity scale may gloss over the fine nuances of human feelings. Additionally, there is a notable risk of receiving careless responses, which compromise data reliability. Recognizing these challenges, this paper proposes the application of Pairwise Comparison (PC) in pHRC: an alternative survey technique that emphasizes direct comparisons between items on defined criteria. Using the NASA Task Load Index (NASA-TLX) as a template, RS and PC questionnaires are designed and used in a series of pHRC experiments. Our preliminary findings suggest that PC is more precise and robust than RS. Compared to RS, PC fosters authentic participant interest in the experiment through intuitive question design and a reduced experimental duration. The accuracy and reliability of PC are also found to be consistent regardless of variations in our experimental procedure design.
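As an illustration of how pairwise-comparison outcomes can be turned into scale values (this is a common analysis choice, not necessarily the one used in the paper), the Bradley-Terry model fits a latent strength to each item from win counts; the example matrix below is hypothetical:

```python
def bradley_terry(wins, n_iters=200):
    """wins[i][j] = number of times item i was preferred over item j.
    Returns normalized strength scores via the classic MM iteration."""
    m = len(wins)
    p = [1.0 / m] * m
    for _ in range(n_iters):
        new_p = []
        for i in range(m):
            w_i = sum(wins[i])  # total wins of item i
            # MM update denominator: comparisons involving i, weighted by current strengths
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]  # renormalize so scores sum to 1
    return p

# Hypothetical comparison data for three questionnaire items
wins = [[0, 3, 4],
        [1, 0, 3],
        [0, 1, 0]]
scores = bradley_terry(wins)
```

With this data, item 0 (most often preferred) receives the highest strength score.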
|
|
TuAT14-AX Oral Session, AX-202 |
Add to My Program |
Prosthetics and Exoskeletons I |
|
|
Chair: Shimoda, Shingo | RIKEN |
Co-Chair: Choi, Junho | Korea Institute of Science and Technology |
|
10:30-12:00, Paper TuAT14-AX.1 | Add to My Program |
A Transtibial Prosthesis Using a Parallel Spring Mechanism |
|
Jung, Donggyu | Korea Institute of Science and Technology |
Park, Shinsuk | Korea University |
Choi, Junho | Korea Institute of Science and Technology |
Keywords: Prosthetics and Exoskeletons, Actuation and Joint Mechanisms, Mechanism Design
Abstract: Prosthetic legs have been used to restore lower-limb function lost due to amputation. Early designs, including prosthetic legs with a passive joint or no joint as well as Energy Storing and Releasing (ESR) feet, are deficient in push-off torque, which results in an asymmetric gait pattern, slower walking speed, and a higher cost of transport. Although powered prosthetic legs address these problems, they suffer from lower energy efficiency and greater volume and weight. In this paper, a powered transtibial prosthesis using a Parallel Elastic Actuator (PEA) is proposed to generate the joint torque needed for walking with a lower-powered actuator, enabling a lighter and more compact design. A non-linear spring mechanism is proposed to generate spring torque as needed. The implemented prosthetic leg is evaluated with three intact subjects. The experimental results show that a smaller motor torque is required with the spring mechanism. Consequently, less electrical power is consumed when the spring mechanism is used, which implies that a lower-powered actuator is sufficient to generate the joint torque needed for walking.
|
|
10:30-12:00, Paper TuAT14-AX.2 | Add to My Program |
Deep Learning Based Acoustic Measurement Approach for Robotic Applications on Orthopedics |
|
Lan, Bangyu | University of Twente |
Abayazid, Momen | University of Twente |
Verdonschot, Nico | Orthopaedic Research Lab; RadboudUMC |
Stramigioli, Stefano | University of Twente |
Niu, Kenan | University of Twente |
Keywords: Prosthetics and Exoskeletons, Computer Vision for Medical Robotics, Visual Tracking
Abstract: In Total Knee Replacement Arthroplasty (TKA), surgical robotics can provide image-guided navigation to fit implants with high precision. The tracking approach typically relies on bone pins inserted into the bones and tracked by an optical tracking system. This is normally done in an invasive, radiative manner (implantable markers and CT scans), which introduces unnecessary trauma and prolongs patient preparation time. To tackle this issue, ultrasound-based bone tracking could offer an alternative. In this study, we propose a novel deep-learning structure to improve the accuracy of bone tracking with A-mode ultrasound (US). We first obtained an ultrasound dataset from a cadaver experiment, in which the ground-truth bone locations were calculated using bone pins. The ground-truth bone locations and the US signals were recorded simultaneously, allowing us to label bone peaks in the raw US signals. These data were used to train the proposed CasAtt-UNet to predict bone locations automatically and robustly. As a result, our method achieved sub-millimeter precision across all eight bone areas, with the exception of one channel in the ankle. This method enables robust measurement of lower-extremity bone positions from 1D raw ultrasound signals and shows great potential for applying A-mode ultrasound in orthopedic surgery in a safe, convenient, and efficient manner.
|
|
10:30-12:00, Paper TuAT14-AX.3 | Add to My Program |
EMG-Based Intention Detection Using Deep Learning for Shared Control in Upper-Limb Assistive Exoskeletons |
|
Sedighi, Paniz | University of Alberta |
Li, Xingyu | University of Alberta |
Tavakoli, Mahdi | University of Alberta |
Keywords: Prosthetics and Exoskeletons, Deep Learning Methods, Physical Human-Robot Interaction
Abstract: In the field of human-robot interaction, surface electromyography (sEMG) provides a valuable tool for measuring active muscular effort. While numerous studies have investigated real-time control of upper extremity exoskeletons based on user intention and task-specific movements, the prediction of body joint positions from EMG features has remained largely unexplored. In this paper, we address this gap by proposing a novel approach that leverages Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) models to generate exoskeleton joint trajectories. Our methodology involves collecting data from three channels of EMG and three degrees-of-freedom (DoF) joint angles, and enables us to position-control a pneumatic cable-driven upper-limb exoskeleton, thereby assisting users in various tasks. Through extensive experimentation, our intention-based model demonstrates robust performance across different speeds and is capable of detecting variations in payload and electrode placement. The empirical results underscore the efficacy of our approach, particularly in reducing the user's EMG levels during different tasks by providing exoskeleton assistance as needed.
|
|
10:30-12:00, Paper TuAT14-AX.4 | Add to My Program |
Gait Phase Detection Based on LSTM-CRF for Stair Ambulation |
|
Wei, Haochen | Monash University |
Tong, Kai Yu | The Chinese University of Hong Kong |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Chen, Chao | Monash University |
Keywords: Prosthetics and Exoskeletons, Deep Learning Methods
Abstract: It is essential to accurately identify gait phases when active exoskeleton devices assist the lower limbs. This work focuses on IMU-based phase detection for stair ambulation. To enhance the detection sensitivity of phase transitions, this work utilises an LSTM-CRF hybrid model. Four IMU sensors attached to the thighs and shanks of both legs were used to collect data during trials on ten healthy subjects for stair ascent and descent. The network's performance is evaluated by F1-score, recall (true positive rate), and precision, which average 96.3% with a standard deviation (std) of 1.9%, 96.6% with an std of 1.6%, and 95.9% with an std of 2.7%, respectively.
|
|
10:30-12:00, Paper TuAT14-AX.5 | Add to My Program |
Towards a Unified Approach for Continuously-Variable Impedance Control of Powered Prosthetic Legs Over Walking Speeds and Inclines |
|
Lee, Albert | University of Michigan |
Laubscher, Curt A. | University of Michigan |
Best, T. Kevin | University of Michigan |
Gregg, Robert D. | University of Michigan |
Keywords: Prosthetics and Exoskeletons, Human-Centered Robotics
Abstract: Research in powered prosthesis control has explored the use of impedance-based control algorithms due to their biomimetic capabilities and intuitive structure. Modern impedance controllers feature parameters that smoothly vary over gait phase and task according to a data-driven model. However, these recent efforts only use continuous impedance control during stance and instead utilize discrete transition logic to switch to kinematic control during swing, necessitating two separate models for the different parts of the stride. In contrast, this paper presents a controller that uses smooth impedance parameter trajectories throughout the gait, unifying the stance and swing periods under a single, continuous model. Furthermore, this paper proposes a basis model to represent intertask relationships in the impedance parameters—a strategy that has previously been shown to improve model accuracy over classic linear interpolation methods. In the proposed controller, a weighted sum of Fourier series is used to model the impedance parameters of each joint as continuous functions of gait cycle progression and task. Fourier series coefficients are determined via convex optimization such that the controller best reproduces the joint torques and kinematics in a reference able-bodied dataset. Experiments with a powered knee-ankle prosthesis show that this simpler, unified model produces competitive results when compared to a more complex hybrid impedance-kinematic model over varying walking speeds and inclines.
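A minimal sketch of the modeling idea described above: an impedance parameter (here a hypothetical stiffness profile) is represented as a Fourier series in gait-cycle phase, and the coefficients are fit to reference data by least squares, the simplest instance of the convex optimization the paper describes. The reference profile, harmonic count, and single-task setting are assumptions for illustration:

```python
import numpy as np

def fourier_basis(phase, n_harmonics):
    """Design matrix for phase in [0, 1): columns 1, cos(2πkφ), sin(2πkφ)."""
    cols = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * phase))
        cols.append(np.sin(2 * np.pi * k * phase))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 1.0, 200)
# Hypothetical reference joint stiffness (Nm/rad) over the gait cycle
k_ref = 3.0 + 1.5 * np.cos(2 * np.pi * phase) - 0.5 * np.sin(4 * np.pi * phase)

# Fit Fourier coefficients by least squares (a convex problem)
A = fourier_basis(phase, n_harmonics=3)
coef, *_ = np.linalg.lstsq(A, k_ref, rcond=None)

# Evaluate the continuous stiffness model at 25% of the gait cycle
k_fit = fourier_basis(np.array([0.25]), 3) @ coef
```

In the paper's formulation, a weighted sum of such series additionally varies the coefficients with task (speed and incline); this sketch shows only the single-task phase model.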
|
|
10:30-12:00, Paper TuAT14-AX.6 | Add to My Program |
Design, Simulation and Kinematic Verification of a Multi-Loop Ankle-Foot Prosthetic Mechanism |
|
Song, Majun | Hangzhou Innovation Institute, Beihang University |
Li, Zhongyi | Hangzhou Innovation Institute, Beihang University |
Chen, Weihai | Hangzhou Innovation Institute, Beihang University |
Zheng, Hao | Hangzhou Innovation Institute, Beihang University |
Guo, Sheng | Beijing Jiaotong University |
Rasmussen, John | Aalborg University |
Bai, Shaoping | Aalborg University |
Keywords: Prosthetics and Exoskeletons, Mechanism Design, Human-Centered Robotics
Abstract: Inspired by the bionic characteristics of the ankle and calf skeletal muscles, a novel ankle-foot prosthesis (AFP) with variable stiffness mechanisms (VSMs) is proposed to assist transtibial amputees in restoring ankle plantarflexion-dorsiflexion. The prosthesis is designed as a spring-loaded three-loop linkage that continuously absorbs and releases energy during the stance phase of gait, which facilitates ankle plantarflexion-dorsiflexion and keeps the human body moving forward steadily. A compliant crank-slider mechanism is also developed to power-assist the AFP mechanism and improve the adaptive compliant contact between the prosthesis and the ground. In this paper, mechanics models of the AFP are developed to reveal the variable moment of the ankle joint, which is verified by human-machine simulation. An AFP prototype is built to validate the design experimentally. The results demonstrate that the AFP mechanism has the advantages of low power consumption and a human-like joint moment profile. In particular, the AFP mechanism with 54 W of power provided in the toe-off phase reduces the peak power of the motor by 24%.
|
|
10:30-12:00, Paper TuAT14-AX.7 | Add to My Program |
Short Term After-Effects of Small Force Fields Applied by an Upper-Limb Exoskeleton on Inter-Joint Coordination |
|
Dubois, Océane | Sorbonne University |
Roby-Brami, Agnès | Université Pierre Et Marie Curie, Paris 6 |
Parry, Ross | Université Paris Nanterre |
Jarrassé, Nathanael | Sorbonne Université, ISIR UMR 7222 CNRS |
Keywords: Prosthetics and Exoskeletons, Physical Human-Robot Interaction, Physically Assistive Devices
Abstract: Exoskeleton technologies have numerous potential applications, ranging from improving human motor skills to aiding individuals in their daily activities. While exoskeletons are increasingly viewed, for example, as promising tools in industrial ergonomics, the effect of using them on human motor control, particularly on inter-joint coordination, remains relatively uncharted. This paper investigates the effects of generic low-amplitude force fields applied by an exoskeleton on motor strategies in asymptomatic users. The force fields mimic common perturbations encountered in exoskeletons, such as residual friction, over/under-tuned assistance, or structural elasticity. Fifty-five participants performed reaching tasks while connected to an arm exoskeleton, experiencing one of five tested force fields. Their movements before and after exposure to the exoskeleton force field were compared. The study focuses both on spatial and temporal changes in coordination using specific metrics. The results reveal that even brief exposure to a low-amplitude force field, or to uncompensated residual friction and dynamic forces, applied at the joint level can alter the inter-joint coordination, while task performance remains unaffected. The tested force fields induced varying degrees of changes in joint contributions and synchronization. This study highlights the importance of monitoring coordination changes to fully understand the impact of exoskeletons on human motor control and thus enable safe and widespread adoption of those devices.
|
|
10:30-12:00, Paper TuAT14-AX.8 | Add to My Program |
Real-Time Dexterous Prosthesis Hand Control by Decoding Neural Information Based on EMG Decomposition |
|
Ying, Zhenzhi | The University of Tokyo |
Zhang, Xianyu | The University of Tokyo |
Li, Shihao | The University of Tokyo |
Nakashima, Koki | The University of Tokyo |
Shu, Liming | Dalian University of Technology |
Sugita, Naohiko | The University of Tokyo |
Keywords: Prosthetics and Exoskeletons, Physically Assistive Devices, Neurorobotics
Abstract: The vague interpretation of myoelectric signals at the residual limb end still makes restoring dexterous hand function in amputees impossible. Understanding motor control between human motion intention and synaptic inputs to motor neurons also remains a significant challenge. Neural decoding of surface EMG signals remains difficult, which limits the application of robotic hands in real life. Herein, we propose and substantiate a human-machine interface for motor control that introduces the neural information of motor neurons in conjunction with the combination mechanism of muscle contraction. The interface first introduces a new concept of motor unit (MU) spike trains, which combines decoupling of the electrical activations on motor neuron axons with extraction of motion patterns from the discharge timings of the motor neuron pools. We realized a real-time implementation of the EMG decomposition algorithm on our prosthetic hand control system. The control scheme provides accurate classification of intuitive hand motions, enabling the amputee to perform versatile finger movements of the prosthetic hand. The concept of motor neuron discharge timings was evaluated through experiments on one amputee participant and six able-bodied participants. The results show that the neuroprosthetic hand control scheme based on MU spike trains is capable of generating accurate and intuitive hand movements for amputees in a physical environment.
|
|
10:30-12:00, Paper TuAT14-AX.9 | Add to My Program |
A Novel Back-Support Exoskeleton with a Differential Series Elastic Actuator for Lifting Assistance |
|
Ding, Shuo | National University of Singapore |
Anaya Reyes, Francisco | National University of Singapore |
Bhattacharya, Shounak | National University of Singapore |
Narayan, Ashwin | National University of Singapore |
Han, Shuaishuai | Nanjing University of Science and Technology |
Ofori, Seyram | National University of Singapore |
Yu, Haoyong | National University of Singapore |
Keywords: Prosthetics and Exoskeletons, Product Design, Development and Prototyping, Physical Human-Robot Interaction, Force Control
Abstract: Compared to conventional back-support exoskeletons (BSEs) with two motors, BSEs driven by a single motor have the advantage of light weight. However, current single-motor BSEs have difficulty accommodating asynchronous hip movements, achieving precise force control and efficient force transmission, and granting autonomy to users when walking. In this paper, we propose a novel BSE with a differential series elastic actuator (D-SEA) for lifting assistance. The unique differential working principle can accommodate the angular difference between the hip joints and provide the same assistive torque at both hip joints. The D-SEA achieves precise force control with a custom controller based on accurate spring deflection feedback, and drives the hip joints via an efficient cable-roller mechanism. Taking advantage of the active backdrivability of the D-SEA, we propose an intelligent assistive strategy that automatically provides adequate support for lifting tasks and grants autonomy to users during walking. In the experiments, the BSE reduced the activation level of the back muscles by up to 40% during lifting, without increasing muscle activation during walking.
|
|
TuAT15-AX Oral Session, AX-203 |
Add to My Program |
Multi-Modal Perception for HRI I |
|
|
Chair: Paolillo, Antonio | IDSIA USI-SUPSI |
Co-Chair: Ye, Qi | Zhejiang University |
|
10:30-12:00, Paper TuAT15-AX.1 | Add to My Program |
The Un-Kidnappable Robot: Acoustic Localization of Sneaking People |
|
Yang, Mengyu | Georgia Institute of Technology |
Grady, Patrick | Georgia Institute of Technology |
Brahmbhatt, Samarth Manoj | Intel Corporation |
Vasudevan, Arun Balajee | Carnegie Mellon University |
Kemp, Charles C. | Hello Robot Inc |
Hays, James | Georgia Institute of Technology, Argo AI |
Keywords: Human Detection and Tracking, Robot Audition
Abstract: How easy is it to sneak up on a robot? We examine whether we can detect where people are using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict whether there is a moving person nearby and then their location. We implement our method on a robot in real time, demonstrating the ability for robots to navigate populated indoor spaces in a passive manner. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
|
|
10:30-12:00, Paper TuAT15-AX.2 | Add to My Program |
Predicting the Intention to Interact with a Service Robot: The Role of Gaze Cues |
|
Arreghini, Simone | IDSIA USI-SUPSI |
Abbate, Gabriele | Istituto Dalle Molle Di Studi sull'Intelligenza Artificiale (IDS |
Giusti, Alessandro | IDSIA USI-SUPSI |
Paolillo, Antonio | IDSIA USI-SUPSI |
Keywords: Multi-Modal Perception for HRI, Social HRI
Abstract: For a service robot, it is crucial to perceive as early as possible that an approaching person intends to interact: in this case, it can proactively enact friendly behaviors that lead to an improved user experience. We solve this perception task with a sequence-to-sequence classifier of a potential user intention to interact, which can be trained in a self-supervised way. Our main contribution is a study of the benefit of features representing the person’s gaze in this context. Extensive experiments on a novel dataset show that the inclusion of gaze cues significantly improves the classifier performance (AUROC increases from 84.5 % to 91.2 %); the distance at which an accurate classification can be achieved improves from 2.4 m to 3.2 m. We also quantify the system’s ability to adapt to new environments without external supervision. Qualitative experiments show practical applications with a waiter robot.
|
|
10:30-12:00, Paper TuAT15-AX.3 | Add to My Program |
Non-Verbal Cues on Robot-Group Persuasion |
|
Esperança Gonçalves, Alexandra | Instituto Superior Técnico, University of Lisbon |
Moreno, Plinio | IST-ID |
Forlizzi, Jodi | Carnegie Mellon University |
Garcia Marques, Leonel | Faculty of Psychology, University of Lisbon |
Bernardino, Alexandre | IST - Técnico Lisboa |
Keywords: Gesture, Posture and Facial Expressions, Human-Robot Collaboration, Human-Robot Teaming
Abstract: When integrating robots into human daily life, persuasive power can be essential. However, group dynamics can complicate persuasion. This study focuses on how non-verbal cues, specifically gaze and hand gestures, affect the persuasiveness of a social robot. We designed a protocol to include non-verbal cues in the social robot Vizzy (head and eye gaze, hand gestures) and tested them in a series of experiments using the paradigm of the "Desert Survival Challenge". The goal of the robot is to persuade the participants to change their answers while avoiding negative feelings. It is hypothesized that the non-verbal cues will help avoid psychological reactance without diminishing compliance with the verbal requests issued by the robot. This phenomenon has been verified before for single-person persuasion, but it had yet to be tested on groups. Thus, the goal of this project is to verify the effect of non-verbal cues in group persuasion by a robot and compare it to single-person persuasion. The results showed that the robot's gestures increased group compliance and its gaze behaviour decreased psychological reactance.
|
|
10:30-12:00, Paper TuAT15-AX.4 | Add to My Program |
Naturalistic Robot-To-Human Bimanual Handover in Complex Environments through Multi-Sensor Fusion (I) |
|
Ovur, Salih Ertug | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Multi-Modal Perception for HRI, Human-Robot Collaboration, Human-Centered Robotics
Abstract: Robot-human object handover has been extensively studied in recent years for a wide range of applications. However, it is still far from being as natural as human-human handovers, largely due to robots' limited sensing capabilities. Previous approaches in the literature typically simplify the handover scenario, including one or more of (a) conducting handovers at fixed locations, (b) not adapting to human preferences, or (c) only considering single-arm handovers of small objects due to the sensor occlusions caused by large objects. To advance the state of the art toward human-human levels of handover fluency, this paper investigates a bimanual handover scenario in a naturalistic, complex setup. Specifically, we target robot-to-human box transfer while the human partner is on a ladder, and ensure that the object is adaptively delivered based on human preferences. To address the occlusion problem that arises in a complex environment, we develop an onboard multi-sensor perception system for the bimanual robot, introduce a measurement confidence estimation technique, and propose an occlusion-resilient multi-sensor fusion technique by positioning visual perception sensors at distinct locations on the robot with different fields of view. The multi-sensor fusion approach achieved a handover success rate above 86.7% across all experiments by combining the strengths of both fields of view for human pose tracking under partial occlusion, without sacrificing handover duration.
|
|
10:30-12:00, Paper TuAT15-AX.5 | Add to My Program |
Language and Sketching: An LLM-Driven Interactive Multimodal Multitask Robot Navigation Framework |
|
Zu, Weiqin | ShanghaiTech University |
Song, Wenbin | ShanghaiTech University |
Chen, Ruiqing | ShanghaiTech University |
Guo, Ze | Harbin Institute of Technology |
Sun, Fanglei | ShangTech University |
Tian, Zheng | ShanghaiTech University |
Pan, Wei | The University of Manchester |
Wang, Jun | University College London |
Keywords: Multi-Modal Perception for HRI, Human-Centered Robotics, Autonomous Agents
Abstract: Socially-aware navigation systems have evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and human-guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), communicating commands to robots demands intricate mathematical formulations, and the transition between tasks lacks the intuitive control and user-centric interactivity one would desire. In this work, we propose an LLM-driven interactive multimodal multitask robot navigation framework, termed LIM2N, to address this new challenge in the navigation field. We achieve this by first introducing a multimodal interaction framework in which language and hand-drawn inputs can serve as navigation constraints and control objectives. Next, a reinforcement learning agent is built to handle multiple tasks with the received information. Crucially, LIM2N enables smooth cooperation among the reasoning over multimodal input, multitask planning, and the adaptation and processing of the intelligent sensing modules in this complicated system. Extensive experiments conducted in both simulation and the real world demonstrate that LIM2N offers a superior understanding of user needs alongside an enhanced interactive experience.
|
|
10:30-12:00, Paper TuAT15-AX.6 | Add to My Program |
Dual-Modal Tactile E-Skin: Enabling Bidirectional Human-Robot Interaction Via Integrated Tactile Perception and Feedback |
|
Mu, Shilong | Tsinghua University |
Zhao, Runze | Tsinghua University |
ZenanLin, Zenan | Tsinghua University |
Huang, Yan | Wuhan University |
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Li, Chenchang | Tsinghua University |
Zhang, Xiao-Ping | Ryerson University |
Ding, Wenbo | Tsinghua University |
Keywords: Multi-Modal Perception for HRI, Haptics and Haptic Interfaces, Touch in HRI
Abstract: To foster an immersive and natural human-robot interaction (HRI), the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced HRI. The dual-modal tactile e-skin offers multi-functional tactile sensing and programmable haptic feedback, underpinned by a layered structure comprised of flexible magnetic films, soft silicone elastomer, a Hall sensor and actuator array, and a microcontroller unit. The e-skin captures the magnetic field changes caused by subtle deformations through Hall sensors, employing deep learning for accurate tactile perception. Simultaneously, the actuator array generates mechanical vibrations to facilitate haptic feedback, delivering diverse mechanical stimuli. Notably, the dual-modal e-skin is capable of transmitting tactile information bidirectionally, enabling object recognition and fine-weighing operations. This bidirectional tactile interaction framework will enhance the immersion and efficiency of interactions between humans and robots.
|
|
10:30-12:00, Paper TuAT15-AX.7 | Add to My Program |
Close-Range Human Following Control on a Cane-Type Robot with Multi-Camera Fusion |
|
Liu, Haowen | Southern University of Science and Technology |
Wu, Fengxian | Tongji University |
Zhong, Bin | Southern University of Science and Technology |
Zhao, Yijun | Southern University of Science and Technology |
Zhang, Jiatong | Harbin Institute of Technology |
Niu, Wenxin | Tongji University |
Zhang, Mingming | Southern University of Science and Technology |
Keywords: Human Detection and Tracking, Sensor-based Control, Rehabilitation Robotics
Abstract: Cane-type robots have been utilized to assist the mobility-impaired population. The essential technique for cane-type robots is to follow the user's ambulation at a close range. This study developed a new cane-type wheeled robot and proposed a novel human-following control framework with multi-camera fusion. The proposed human-following control adopts a cascade control strategy consisting of two parts: 1) a human-following position controller that locates a user by detecting his/her legs' positions via multi-camera fusion and 2) a cane-robot velocity controller to steer the cane robot to the target position. The proposed strategy's effectiveness has been validated in outdoor experiments with six healthy subjects. The experimental scenarios included different terrains (i.e., straight, turning, and inclined paths), road conditions (i.e., flat and rough roads), and walking speeds. The obtained results showed that the average tracking errors in the X and Y directions were less than 4.1 cm and 4.4 cm, respectively, and the error in angle was less than 12.9° across all scenarios. Moreover, the cane robot can effectively adapt to a wide range of individual gait patterns and achieve stable human following at daily walking speeds (0.74 m/s - 1.47 m/s).
|
|
10:30-12:00, Paper TuAT15-AX.8 | Add to My Program |
CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction |
|
Han, Guwen | Zhejiang University |
Ye, Qi | Zhejiang University |
Chen, Anjun | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Gesture, Posture and Facial Expressions, Visual Learning, Deep Learning for Visual Perception
Abstract: Interactive hand mesh reconstruction from single-view images poses a significant challenge due to the severe occlusion and depth ambiguity inherent in interactive hand gestures. Recent approaches that employ probabilistic models and token-pruning techniques have shown decent results in multi-view human body reconstruction. Nevertheless, these methods have not fully utilized multi-scale semantic information from multi-view images and are not applicable in scenarios involving severe occlusion during dual-hand interactions. Meanwhile, current single-view methods reconstruct the left and right hands independently, which fails to capture the interaction between them. To address these challenges, we propose CAMInterHand, a cooperative attention-based method for multi-view interactive hand pose and mesh reconstruction. Specifically, CAMInterHand extracts local pyramid features and global vertex features from multi-scale feature maps of multi-view images, enabling the exploration of rich local semantic information and facilitating effective feature alignment. Furthermore, CAMInterHand employs a cooperative attention fusion module to fuse all features from multi-view images, enhancing interactions among vertices of the two hands within global and local contexts. We conduct extensive experiments on the large-scale multi-view dataset InterHand2.6M, and CAMInterHand achieves a substantial performance improvement over existing methods for multi-view and single-view interactive hand reconstruction.
|
|
10:30-12:00, Paper TuAT15-AX.9 | Add to My Program |
Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion |
|
Hyder, Syed Waleed | Retrocausal |
Usama, Muhammad | Retrocausal |
Zafar, Anas | Retrocausal, Inc |
Naufil, Muhammad | Retrocausal, Inc |
Javed Fateh, Fawad | Retrocausal |
Konin, Andrey | Retrocausal Inc |
Zia, M. Zeeshan | Retrocausal |
Tran, Quoc-Huy | Retrocausal, Inc |
Keywords: Multi-Modal Perception for HRI, Gesture, Posture and Facial Expressions, Deep Learning for Visual Perception
Abstract: This paper presents a 2D skeleton-based action segmentation method with applications in fine-grained human activity recognition. In contrast with state-of-the-art methods, which directly take sequences of 3D skeleton coordinates as inputs and apply Graph Convolutional Networks (GCNs) for spatiotemporal feature learning, our main idea is to use sequences of 2D skeleton heatmaps as inputs and employ Temporal Convolutional Networks (TCNs) to extract spatiotemporal features. Despite lacking 3D information, our approach yields comparable or superior performance and better robustness against missing keypoints than previous methods on action segmentation datasets. Moreover, we further improve performance by using both 2D skeleton heatmaps and RGB videos as inputs. To the best of our knowledge, this is the first work to utilize 2D skeleton heatmap inputs and the first work to explore 2D skeleton+RGB fusion for action segmentation.
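The 2D skeleton heatmap representation mentioned in the abstract can be illustrated with a generic sketch (this is the common Gaussian-heatmap encoding of keypoints, not the paper's implementation; the function name and parameters are assumptions):

```python
import numpy as np

def keypoint_heatmaps(keypoints, height, width, sigma=2.0):
    # One channel per keypoint: a Gaussian bump centered at (x, y),
    # peaking at 1.0 -- a common way to encode a 2D skeleton as an
    # image stack that convolutional networks can consume.
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.empty((len(keypoints), height, width))
    for k, (x, y) in enumerate(keypoints):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# A single keypoint at pixel (x=10, y=5) rendered into a 16x32 heatmap.
hm = keypoint_heatmaps([(10, 5)], height=16, width=32)
```

Per-frame stacks of such channels, concatenated over time, form the kind of input a TCN can process directly.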
|
|
TuAT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing I |
|
|
Chair: Pearson, Martin | Bristol Robotics Laboratory |
Co-Chair: Tumerdem, Ugur | Marmara University |
|
10:30-12:00, Paper TuAT16-AX.1 | Add to My Program |
ViTacTip: Design and Verification of a Novel Biomimetic Physical Vision-Tactile Fusion Sensor |
|
Fan, Wen | University of Bristol |
Li, Haoran | University of Bristol |
Si, Weiyong | University of Essex |
Luo, Shan | King's College London |
Lepora, Nathan | University of Bristol |
Zhang, Dandan | Imperial College London |
Keywords: Force and Tactile Sensing
Abstract: Tactile sensing is important for robotics since it provides physical contact information during manipulation. To capture multimodal contact information within a compact framework, we designed a novel sensor called ViTacTip, which seamlessly integrates both tactile and visual perception capabilities into a single, integrated sensor unit. ViTacTip features a transparent skin to capture fine features of objects during contact, a capability we refer to as the see-through-skin mechanism. Meanwhile, the biomimetic tips embedded in ViTacTip can amplify touch motions during tactile perception. For comparative analysis, we also fabricated a ViTac sensor devoid of biomimetic tips, as well as a TacTip sensor with opaque skin. Furthermore, we developed a Generative Adversarial Network (GAN)-based approach for modality switching between different perception modes, effectively alternating the emphasis between vision and tactile perception. We conducted a performance evaluation of the proposed sensor across three distinct tasks: i) grating identification, ii) pose regression, and iii) contact localization and force estimation. In the grating identification task, ViTacTip demonstrated an accuracy of 99.72%, surpassing TacTip, which achieved 94.60%. It also exhibited superior performance in both pose and force estimation tasks, with minimum errors of 0.08 mm and 0.03 N, respectively, in contrast to ViTac's 0.12 mm and 0.15 N. These results indicate that ViTacTip outperforms single-modality sensors.
|
|
10:30-12:00, Paper TuAT16-AX.2 | Add to My Program |
A Large-Area Tactile Sensor for Distributed Force Sensing Using Highly Sensitive Piezoresistive Sponge |
|
Zheng, Wendong | Tsinghua University |
Liu, Kun | Institute of Semiconductors, Chinese Academy of Sciences |
Guo, Di | Beijing University of Posts and Telecommunications |
Yang, Wuqiang | The University of Manchester |
Zhu, Jun | Nanjing University of Information Science and Technology |
Liu, Huaping | Tsinghua University |
Keywords: Force and Tactile Sensing
Abstract: Tactile sensing plays a critical role in enabling robots to interact safely with target objects in dynamic and unstructured environments. While various tactile sensors based on different sensing principles or different sensitive materials have been proposed, the development of flexible large-area tactile sensors for robots is still challenging. In this paper, a novel highly sensitive piezoresistive sponge based on multi-walled carbon nanotubes (MWCNTs) and polyurethane (PU) sponge is fabricated for pressure sensing. The sensing behavior of the piezoresistive sponge was experimentally evaluated, showing high sensitivity and fast response. Based on the piezoresistive sponge, a flexible large-area tactile sensor is designed for distributed force detection with electrical resistance tomography technology. The sensing performance of the sensor is validated by touch location, sensitivity analysis, real-time touch discrimination, and touch modality recognition. The experimental results indicate that the sensor performs well in detecting the position and force of contact in a large area. The sensor's performance shows promise in embodied tactile sensing and human–robot interaction.
|
|
10:30-12:00, Paper TuAT16-AX.3 | Add to My Program |
A Neuromorphic System for the Real-Time Classification of Natural Textures |
|
Brayshaw, George | University of Bristol |
Ward-Cherrier, Benjamin | University of Bristol |
Pearson, Martin | Bristol Robotics Laboratory |
Keywords: Force and Tactile Sensing, Bioinspired Robot Learning, Neurorobotics
Abstract: Tactile exploration of surfaces is a key component of everyday life, allowing us to make complex inferences about our environments even when vision is occluded. The emergence of biomimetic neuromorphic hardware in recent years has furthered our ability to create biologically plausible sensing solutions. While these platforms continue to improve with regard to latency and power consumption, recent literature on tactile texture classification emphasizes accuracy at the expense of real-time processing. For these tactile sensing systems to find use outside of experimental laboratory environments, it is key to design systems capable of capturing and processing data in real time. In this paper, we present a system for the real-time classification of texture using a neuromorphic tactile sensor, a spiking neural network, and a novel decision-making algorithm. Our real-time system achieves classification accuracies of 94% on a dataset of 11 natural textile textures. Furthermore, our system is capable of identifying textures at human-level performance in as little as 84 ms. Additionally, benchmarking our system across CPU, GPU, and Loihi 2 hardware platforms resulted in a 96% reduction in power consumption on the neuromorphic platform. This system outperformed previous work by the authors and the state of the art, both in terms of accuracy and classification speed.
|
|
10:30-12:00, Paper TuAT16-AX.4 | Add to My Program |
An Electromagnetism-Inspired Method for Estimating In-Grasp Torque from Visuotactile Sensors |
|
Fuchioka, Yuni | University of British Columbia |
Hamaya, Masashi | OMRON SINIC X Corporation |
Keywords: Force and Tactile Sensing, Compliant Assembly
Abstract: Tactile sensing has become a popular sensing modality for robot manipulators due to its promise of providing robots with the ability to measure the rich contact information transmitted through the sense of touch. Among the diverse range of information accessible from tactile sensors, torques transmitted from the grasped object to the fingers through extrinsic environmental contact may be particularly important for tasks such as object insertion. However, tactile torque estimation has received relatively little attention compared to other sensing modalities, such as force, texture, or slip identification. In this work, we introduce the notion of the Tactile Dipole Moment, which we use to estimate tilt torques from gel-based visuotactile sensors. This method does not rely on deep learning or on sensor-specific mechanical or optical modeling; instead, it takes inspiration from electromagnetism to analyze the vector field produced from 2D marker displacements. Despite the simplicity of our technique, we demonstrate its ability to provide accurate torque readings over two different tactile sensors and three object geometries, and highlight its practicality for the task of USB stick insertion with a compliant robot arm. These results suggest that simple analytical calculations based on dipole moments can sufficiently extract physical quantities from visuotactile sensors.
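The vector-field idea behind such moment-based torque estimates can be sketched generically (this is an illustrative analogy, not the authors' method; the function name and the choice of the field centroid as the reference point are assumptions): summing 2D cross products of marker lever arms with their displacements yields a torque-like scalar that responds to the rotational component of the field and ignores uniform translation.

```python
import numpy as np

def marker_field_moment(positions, displacements):
    # Sum of 2D cross products r x d about the field centroid: a
    # torque-like scalar that grows with the rotational component of a
    # marker displacement field and vanishes for pure translation.
    positions = np.asarray(positions, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    r = positions - positions.mean(axis=0)  # lever arms about the centroid
    return float(np.sum(r[:, 0] * displacements[:, 1]
                        - r[:, 1] * displacements[:, 0]))

# Markers on a unit circle: a tangential (rotational) field vs. a uniform shift.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
rot = 0.1 * np.stack([-np.sin(theta), np.cos(theta)], axis=1)   # rotation-like
shift = np.tile([0.05, 0.0], (8, 1))                            # translation-like
```

With these inputs, the rotational field produces a nonzero moment while the uniform shift produces (numerically) zero, which is the separation a tilt-torque estimator needs.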
|
|
10:30-12:00, Paper TuAT16-AX.5 | Add to My Program |
Sim-To-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing |
|
Yang, Max | University of Bristol |
Lin, Yijiong | University of Bristol |
Church, Alex | Cambrian |
Lloyd, John | University of Bristol |
Zhang, Dandan | Imperial College London |
Barton, David A. W. | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Dexterous Manipulation, Reinforcement Learning
Abstract: Object pushing presents a key non-prehensile manipulation problem that is illustrative of more complex robotic manipulation tasks. While deep reinforcement learning (RL) methods have demonstrated impressive learning capabilities using visual input, a lack of tactile sensing limits their capability for fine and reliable control during manipulation. Here we propose a deep RL approach to object pushing using tactile sensing without visual input, namely tactile pushing. We present a goal-conditioned formulation that allows both model-free and model-based RL to obtain accurate policies for pushing an object to a goal. To achieve real-world performance, we adopt a sim-to-real approach. Our results demonstrate that it is possible to train on a single object and a limited sample of goals to produce precise and reliable policies that can generalize to a variety of unseen objects and pushing scenarios without domain randomization. We experiment with the trained agents in harsh pushing conditions, and show that with significantly more training samples, a model-free policy can outperform a model-based planner, generating shorter and more reliable pushing trajectories despite large disturbances. The simplicity of our training environment and the effective real-world performance highlight the value of rich tactile information for fine manipulation.
|
|
10:30-12:00, Paper TuAT16-AX.6 | Add to My Program |
Learning Contact for Haptic Feedback: Switching X-Lateral Teleoperators |
|
Yilmaz, Nural | Marmara University |
Tumerdem, Ugur | Marmara University |
Keywords: Force and Tactile Sensing, Haptics and Haptic Interfaces, Machine Learning for Robot Control
Abstract: In this paper, we propose X-lateral teleoperation: a novel hybrid unilateral-bilateral teleoperation framework. Bilateral teleoperation enables kinesthetic coupling between the operator and the remote environment with haptic feedback. However, in free motion, unlike unilateral teleoperators, bilateral teleoperators reflect undesirable operational forces to the operator. The proposed X-lateral teleoperation framework benefits from a learning-based contact detection algorithm which triggers switching from unilateral teleoperation in free motion to bilateral teleoperation in contact. We also present a neural network based two-class classification technique to detect contacts even with environments not seen in training. In experiments with linear motors and Phantom Omni devices, using sensorless force estimation, we show that the proposed method can decrease operational forces significantly over transparency-optimized bilateral architectures.
|
|
10:30-12:00, Paper TuAT16-AX.7 | Add to My Program |
L3 F-TOUCH: A Wireless GelSight with Decoupled Tactile and Three-Axis Force Sensing |
|
Li, Wanlin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Wang, Meng | Beijing Institute for General Artificial Intelligence |
Li, Jiarui | Peking University |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Qian, Xinyuan | University of Science and Technology Beijing |
Althoefer, Kaspar | Queen Mary University of London |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Keywords: Force and Tactile Sensing, Mechanism Design
Abstract: GelSight sensors, which estimate contact geometry and force by reconstructing the deformation of their soft elastomer from images, yield poor force measurements when the elastomer deforms uniformly or reaches deformation saturation. Here we present the L3 F-TOUCH sensor, which considerably enhances the three-axis force sensing capability of typical GelSight sensors. Specifically, the L3 F-TOUCH sensor comprises: (i) an elastomer structure resembling the classic GelSight sensor design for fine-grained contact geometry sensing; and (ii) a mechanically simple suspension structure that enables three-dimensional elastic displacement of the elastomer structure upon contact. Such displacement is tracked by detecting the displacement of an ARTag and is transformed into three-axis contact force via calibration. We further revamp the sensor's optical system by fixing the ARTag on the base and reflecting it, through a mirror, to the same camera viewing the elastomer. As a result, the tactile and force sensing modes can operate independently, while the entire L3 F-TOUCH remains Light-weight and Low-cost and facilitates wireLess deployment. Evaluations and experimental results demonstrate that the proposed L3 F-TOUCH sensor overcomes GelSight's limitation in force sensing and is more practical than equipping commercial three-axis force sensors. Thus, the L3 F-TOUCH could further empower existing Vision-based Tactile Sensors (VBTSs) in replication and deployment.
|
|
10:30-12:00, Paper TuAT16-AX.8 | Add to My Program |
GelLink: A Compact Multi-Phalanx Finger with Vision-Based Tactile Sensing and Proprioception |
|
Ma, Yuxiang | Massachusetts Institute of Technology |
Zhao, Jialiang | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Mechanism Design, Grippers and Other End-Effectors
Abstract: Compared to fully-actuated robotic end-effectors, underactuated ones are generally more adaptive, robust, and cost-effective. However, state estimation for underactuated hands is usually more challenging. Vision-based tactile sensors, like GelSight, can mitigate this issue by providing high-resolution tactile sensing and accurate proprioceptive sensing. As such, we present GelLink, a compact, underactuated, linkage-driven robotic finger with low-cost, high-resolution vision-based tactile sensing and proprioceptive sensing capabilities. To reduce the amount of embedded hardware, i.e., the cameras and motors, we optimize the linkage transmission with a planar linkage mechanism simulator and develop a planar reflection simulator to simplify the tactile sensing hardware. As a result, GelLink requires only one motor to actuate the three phalanges and one camera to capture tactile signals along the entire finger. Overall, GelLink is a compact robotic finger that shows adaptability and robustness when performing grasping tasks. The integration of vision-based tactile sensors can significantly enhance the capabilities of underactuated fingers and potentially broaden their future usage.
|
|
10:30-12:00, Paper TuAT16-AX.9 | Add to My Program |
RainbowSight: A Family of Generalizable, Curved, Camera-Based Tactile Sensors for Shape Reconstruction |
|
Tippur, Megha | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Mechanism Design, Soft Sensors and Actuators
Abstract: Camera-based tactile sensors can provide high resolution positional and local geometry information for robotic manipulation. Curved and rounded fingers are often advantageous, but it can be difficult to derive illumination systems that work well within curved geometries. To address this issue, we introduce RainbowSight, a family of curved, compact, camera-based tactile sensors which use addressable RGB LEDs illuminated in a novel rainbow spectrum pattern. In addition to being able to scale the illumination scheme to different sensor sizes and shapes to fit on a variety of end effector configurations, the sensors can be easily manufactured and require minimal optical tuning to obtain high resolution depth reconstructions of an object deforming the sensor’s soft elastomer surface. Additionally, we show the advantages of our new hardware design and improvements in calibration methods for accurate depth map generation when compared to alternative lighting methods commonly implemented in previous camera-based tactile sensors. With these advancements, we make the integration of tactile sensors more accessible to roboticists by allowing them the flexibility to easily customize, fabricate, and calibrate camera-based tactile sensors to best fit the needs of their robotic systems.
|
|
TuAT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots I |
|
|
Chair: Sugihara, Tomomichi | OMRON Corporation |
Co-Chair: Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
|
10:30-12:00, Paper TuAT17-AX.1 | Add to My Program |
Walking-By-Logic: Signal Temporal Logic-Guided Model Predictive Control for Bipedal Locomotion Resilient to External Perturbations |
|
Gu, Zhaoyuan | Georgia Institute of Technology |
Guo, Rongming | Georgia Institute of Technology |
Yates, William | Georgia Institute of Technology |
Chen, Yipu | Georgia Institute of Technology |
Zhao, Yuntian | Southern University of Science and Technology |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Formal Methods in Robotics and Automation, Collision Avoidance
Abstract: This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantee and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recovery strategies often struggle with complex task logic reasoning and locomotion robustness evaluation, making them susceptible to failures due to inappropriate recovery strategies or insufficient robustness. To address this issue, the STL-guided framework generates optimal and safe recovery trajectories that simultaneously satisfy the task specification and maximize the locomotion robustness. Our framework outperforms a state-of-the-art locomotion controller in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Furthermore, it demonstrates versatility in tasks such as locomotion on stepping stones, where the robot must select from a set of disjointed footholds to maneuver successfully.
|
|
10:30-12:00, Paper TuAT17-AX.2 | Add to My Program |
Seamless Reaction Strategy for Bipedal Locomotion Exploiting Real-Time Nonlinear Model Predictive Control |
|
Choe, JongHun | KAIST |
Kim, Joon-Ha | Korea Advanced Institute of Science and Technology(KAIST) |
Hong, Seungwoo | MIT (Massachusetts Institute of Technology) |
Lee, Jinoh | German Aerospace Center (DLR) |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Humanoid Robot Systems
Abstract: This paper presents a reactive locomotion method for bipedal robots that enhances robustness and external disturbance rejection by seamlessly blending several walking strategies: ankle, hip, and footstep adjustment. A Nonlinear Model Predictive Control (NMPC) problem is formulated to take into account the nonlinear Divergent Component of Motion (DCM) error dynamics, which predict the future states of the robot in response to the walking strategies. This formulation enables the seamless application of these strategies, improving push disturbance rejection performance. The proposed controller is validated in simulation and through an experiment on a bipedal robot platform, Gazelle, which confirms its effectiveness in real time.
|
|
10:30-12:00, Paper TuAT17-AX.3 | Add to My Program |
Synthesizing Robust Walking Gaits Via Discrete-Time Barrier Functions with Application to Multi-Contact Exoskeleton Locomotion |
|
Tucker, Maegan | Georgia Institute of Technology |
Li, Kejun | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Hybrid Logical/Dynamical Planning and Verification
Abstract: Successfully achieving bipedal locomotion remains challenging due to real-world factors such as model uncertainty, random disturbances, and imperfect state estimation. In this work, we propose a novel metric for locomotive robustness -- the estimated size of the hybrid forward invariant set associated with the step-to-step dynamics. Here, the forward invariant set can be loosely interpreted as the region of attraction for the discrete-time dynamics. We illustrate the use of this metric towards synthesizing nominal walking gaits using a simulation-in-the-loop learning approach. Further, we leverage discrete-time barrier functions and a sampling-based approach to approximate sets that are maximally forward invariant. Lastly, we experimentally demonstrate that this approach results in successful locomotion for both flat-foot walking and multi-contact walking on the Atalante lower-body exoskeleton.
|
|
10:30-12:00, Paper TuAT17-AX.4 | Add to My Program |
Efficient, Dynamic Locomotion through Step Placement with Straight Legs and Rolling Contacts |
|
Fasano, Stefan | Florida Institute for Human & Machine Cognition |
Foster, James Paul | University of West Florida |
Bertrand, Sylvain | Institute for Human and Machine Cognition |
DeBuys, Christian | Texas A&M University |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Whole-Body Motion Planning and Control
Abstract: For humans, fast, efficient walking over flat ground represents the vast majority of locomotion that an individual experiences on a daily basis, and for an effective, real-world humanoid robot the same will likely be the case. In this work, we propose a locomotion controller for efficient walking over near-flat ground using a relatively simple, model-based controller that utilizes a novel combination of several interesting design features including an ALIP-based step adjustment strategy, stance leg length control as an alternative to center of mass height control, and rolling contact for heel-to-toe motion of the stance foot. We then present the results of this controller on our robot Nadia, both in simulation and on hardware. These results include validation of this controller’s ability to perform fast, reliable forward walking at 0.75 m/s along with backwards walking, side-stepping, turning in place, and push recovery. We also present an efficiency comparison between the proposed control strategy and our baseline walking controller over three steady-state walking speeds. Lastly, we demonstrate some of the benefits of utilizing rolling contact in the stance foot, specifically the reduction of necessary positive and negative work throughout the stride.
|
|
10:30-12:00, Paper TuAT17-AX.5 | Add to My Program |
Unified Motion Planner for Walking, Running, and Jumping Using the Three-Dimensional Divergent Component of Motion |
|
Mesesan, George | German Aerospace Center (DLR) |
Schuller, Robert | German Aerospace Center (DLR) |
Englsberger, Johannes | DLR (German Aerospace Center) |
Ott, Christian | TU Wien |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Keywords: Humanoid and Bipedal Locomotion, Motion and Path Planning, Humanoid Robots, Legged Robots
Abstract: Running and jumping are locomotion modes that allow legged robots to rapidly traverse great distances and overcome difficult terrain. In this article, we show that the 3-D Divergent Component of Motion (3D-DCM) framework, which was successfully used for generating walking trajectories in previous works, retains its validity and coherence during flight phases, and, therefore, can be used for planning running and jumping motions. We propose a highly efficient motion planner that generates stable center-of-mass (CoM) trajectories for running and jumping with arbitrary contact sequences and time parametrizations. The proposed planner constructs the complete motion plan as a sequence of motion phases that can be of different types: stance, flight, transition phases, etc. We introduce a unified formulation of the CoM and DCM waypoints at the start and end of each motion phase, which makes the framework extensible and enables the efficient waypoint computation in matrix and algorithmic form. The feasibility of the generated reference trajectories is demonstrated by extensive whole-body simulations with the humanoid robot TORO.
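As background for the abstract above, the standard DCM relations from the locomotion literature (a hedged summary of well-known definitions, not equations taken from this paper): with CoM position $x$, natural frequency $\omega$, and virtual repellent point $r_{\mathrm{vrp}}$,

```latex
\xi = x + \frac{\dot{x}}{\omega}, \qquad \omega = \sqrt{g / \Delta z},
\qquad \dot{x} = \omega\,(\xi - x), \qquad \dot{\xi} = \omega\,(\xi - r_{\mathrm{vrp}}).
```

The CoM converges toward the DCM $\xi$ while the DCM itself is pushed away from $r_{\mathrm{vrp}}$, so planning bounded DCM waypoints, as the planner above does phase by phase, is sufficient to obtain a stable CoM trajectory.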
|
|
10:30-12:00, Paper TuAT17-AX.6 | Add to My Program |
Data-Driven Latent Space Representation for Robust Bipedal Locomotion Learning |
|
Castillo, Guillermo A. | The Ohio State University |
Weng, Bowen | Iowa State University |
Zhang, Wei | Southern University of Science and Technology |
Hereid, Ayonga | Ohio State University |
Keywords: Humanoid and Bipedal Locomotion, Representation Learning, Legged Robots
Abstract: This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is then used as states for training a robust RL-based gait policy, eliminating the need for heuristic state selections or the use of template models for gait planning. The results demonstrate that the learned latent variables are disentangled and directly correspond to different gaits or speeds, such as moving forward, backward, or walking in place. Compared to traditional template model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and demonstrates good generalization capabilities to unseen scenarios.
|
|
10:30-12:00, Paper TuAT17-AX.7 | Add to My Program |
LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots |
|
Zhao, Qingrui | Beijing Institute of Technology |
Li, Mingyuan | Beijing Institute of Technology |
Shi, Yongliang | Tsinghua University |
Chen, Xuechao | Beijing Insititute of Technology |
Yu, Zhangguo | Beijing Institute of Technology |
Han, Lianqiang | Beijing Institute of Technology |
Fu, Zhenyuan | School of Mechatronic Engineering, Beijing Institute of Technology |
Zhang, Jintao | Beijing Institute of Technology |
Li, Chao | Beijing Institute of Technology |
Zhang, YuanXi | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Keywords: Humanoid Robot Systems, Legged Robots, Sensor Fusion
Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated, allowing for both position and velocity updates from kinematic measurements. Additionally, the use of kinematic measurements increases the output state frequency to about 1 kHz. This ensures temporal continuity of the estimated state and makes it practical for control purposes of biped robots. We also release a biped robot dataset consisting of LiDAR, inertial measurement unit (IMU), joint encoder, force/torque (F/T) sensor, and motion capture ground-truth data to evaluate the proposed method. The dataset was collected during robot locomotion, and our approach achieved the best quantitative results among LIO-based methods and biped robot state estimation algorithms. The dataset and source code will be available at https://github.com/Mr-Zqr/LIKO.
|
|
10:30-12:00, Paper TuAT17-AX.8 | Add to My Program |
Barry: A High-Payload and Agile Quadruped Robot |
|
Valsecchi, Giorgio | Robotic System Lab, ETH |
Rudin, Nikita | ETH Zurich, NVIDIA |
Nachtigall, Lennart | ETH Zurich |
Mayer, Konrad | ETH Zurich |
Tischhauser, Fabian | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Actuation and Joint Mechanisms, Mechanism Design
Abstract: This paper introduces Barry, a dynamically balancing quadruped robot optimized for high payload capacity and efficiency. It presents a new high-torque, low-inertia leg design, which includes custom-built high-efficiency actuators and transparent, sensorless transmissions. The robot’s reinforcement-learning-based controller is trained to fully leverage the new hardware capabilities to balance and steer the robot. The newly developed controller manages the non-linearities introduced by the new leg design and handles unmodeled payloads of up to 90 kg while operating at high efficiency. The approach’s efficacy is demonstrated by a high payload-to-weight ratio verified in multiple tests, with a maximum ratio of 2 on flat terrain. Experiments also characterize Barry’s power consumption and cost of transport, the latter converging to a value of 0.7 at 1.4 m/s regardless of the payload mass.
|
|
TuAT18-AX Oral Session, AX-206 |
Add to My Program |
Motion Control I |
|
|
Chair: Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Co-Chair: Ryll, Markus | Technical University Munich |
|
10:30-12:00, Paper TuAT18-AX.1 | Add to My Program |
A Robotic Manipulator Using Dual-Motor Joints: Prototype Design and Anti-Backlash Control |
|
Xu, Jiqian | Northeastern University |
Wang, Huaizhen | Inspur Group |
Zhao, Qiankun | Northeastern University |
Gao, Yue | Northeastern University |
Wan, Yingcai | Northeastern University |
Fang, Lijin | Northeastern University |
Keywords: Motion Control, Actuation and Joint Mechanisms, Redundant Robots
Abstract: This letter focuses on the design and control of a novel seven-degree-of-freedom (7-DOF) robotic manipulator (D-Arm) that addresses backlash nonlinearity coupled with unknown disturbances through dual-motor anti-backlash control. Specifically, the first three axes of the D-Arm near the base are implemented as dual-motor joints (DMJs), while the remaining four axes are single-motor joints (SMJs), achieving a more balanced overall performance. For the DMJs, which are over-actuated systems, we first identify an internal disturbance phenomenon named servo-conflict and account for it in the controller design. To mitigate the adverse effects of backlash-coupled disturbance, an admittance-control-based position compensator is proposed. Then, after backlash elimination, a dual-motor linear active disturbance rejection controller is developed for the load-tracking task of the DMJ. In the presence of unknown backlash and disturbance, the proposed strategy improves both transient and steady-state position tracking and reduces energy consumption, without requiring any backlash model information. The effectiveness and simplicity of the developed control strategy are verified through comparative experiments on the D-Arm.
|
|
10:30-12:00, Paper TuAT18-AX.2 | Add to My Program |
CoNi-MPC: Cooperative Non-Inertial Frame Based Model Predictive Control |
|
Zhang, Baozhe | The Chinese University of Hong Kong, Shenzhen |
Chen, Xinwei | Zhejiang University |
Li, Zhehan | Zhejiang University |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: Motion Control, Aerial Systems: Applications
Abstract: This paper presents a novel solution for UAV control in cooperative multi-robot systems, which can be used in various scenarios such as leader-following, landing on a moving base, or specific relative motion with a target. Unlike classical methods that tackle UAV control in the world frame, we directly control the UAV in the target coordinate frame, without making motion assumptions about the target. In detail, we formulate a nonlinear model predictive controller for a UAV, referred to as the agent, within a non-inertial frame (i.e., the target frame). The system requires the relative states (pose and velocity) as well as the angular velocity and accelerations of the target, which can be obtained by relative localization methods and ubiquitous MEMS IMU sensors, respectively. This framework eliminates dependencies that are vital in classical solutions, such as accurate state estimation for both the agent and target, prior knowledge of the target motion model, and continuous trajectory re-planning for some complex tasks. We have performed extensive simulations to investigate the control performance under varying motion characteristics of the target. Furthermore, we conducted real robot experiments, employing either simulated relative pose estimation from motion capture systems indoors or our previous relative pose estimation devices outdoors, to validate the applicability and feasibility of the proposed approach.
|
|
10:30-12:00, Paper TuAT18-AX.3 | Add to My Program |
Uniform Passive Fault-Tolerant Control of a Quadcopter with One, Two, or Three Rotor Failure |
|
Ke, Chenxu | Beihang University |
Cai, Kai-Yuan | Beijing University of Aeronautics and Astronautics |
Quan, Quan | Beihang University |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Failure Detection and Recovery, Passive Fault-Tolerant Control
Abstract: This study proposes a uniform passive fault-tolerant control (FTC) method for a quadcopter subject to one, two adjacent, two opposite, or three rotor failures that does not rely on fault information. Uniform control means the passive FTC covers conditions ranging from a fault-free quadcopter to rotor failures without controller switching. To achieve passive FTC, the rotor faults are modeled as a lumped disturbance acting on the virtual control of the quadcopter system, and the disturbance estimate is used directly for the passive FTC under rotor failure. At the same time, a modified controller structure is designed to achieve passive FTC for two and three rotor failures. To avoid switching the control allocation from fault-free control to FTC, a dynamic control allocation is used. In addition, closed-loop stability is analyzed under up to three rotor failures. To validate the proposed uniform passive FTC method, outdoor experiments are performed for the first time, demonstrating that the hovering quadcopter is able to recover from one rotor failure with the proposed controller and continue to fly even with two adjacent rotors failed.
|
|
10:30-12:00, Paper TuAT18-AX.4 | Add to My Program |
Geometric Slosh-Free Tracking for Robotic Manipulators |
|
Arrizabalaga, Jon | Technical University of Munich (TUM) |
Pries, Lukas | Technical University of Munich (TUM) |
Laha, Riddhiman | Technical University of Munich |
Li, Runkang | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Ryll, Markus | Technical University Munich |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Industrial Robots
Abstract: This work focuses on the agile transportation of liquids with robotic manipulators. In contrast to existing methods that are computationally heavy, system- or container-specific, or dependent on a singularity-prone pendulum model, we present a real-time slosh-free tracking technique. This method requires only the reference trajectory and the robot's kinematic constraints to output kinematically feasible joint-space commands. The crucial element underlying this approach is mimicking the end-effector's motion with a virtual quadrotor, which is inherently slosh-free and differentially flat, allowing us to calculate a slosh-free reference orientation. Through a cascaded proportional-derivative (PD) controller, this slosh-free reference is transformed into task-space acceleration commands, which, after solving a Quadratic Program (QP) based on Resolved Acceleration Control (RAC), are translated into a feasible joint configuration. The validity of the proposed approach is demonstrated by simulated and real-world experiments on a 7-DoF Franka Emika Panda robot.
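The slosh-free orientation follows from the quadrotor analogy: the container's symmetry axis must align with the total specific force (commanded acceleration minus gravity), so the liquid surface sees no lateral force. A minimal sketch of that alignment rule, assuming a world-frame z-up convention; the function name and gravity constant are illustrative, not from the paper:

```python
import math

GRAVITY = (0.0, 0.0, -9.81)  # world-frame gravity, z-up (m/s^2)

def slosh_free_direction(accel):
    """Unit vector the container's symmetry axis should align with so the
    net specific force (accel - gravity) has no lateral component."""
    f = tuple(a - g for a, g in zip(accel, GRAVITY))  # thrust-like vector
    norm = math.sqrt(sum(c * c for c in f))
    return tuple(c / norm for c in f)

# At rest the slosh-free axis points straight up.
print(slosh_free_direction((0.0, 0.0, 0.0)))  # -> (0.0, 0.0, 1.0)
```

Under a purely lateral acceleration the axis tilts into the direction of motion, exactly as a quadrotor's thrust axis would.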
|
|
10:30-12:00, Paper TuAT18-AX.5 | Add to My Program |
Development of a Four-Wheel Steering Scale Vehicle for Research and Education on Autonomous Vehicle Motion Control |
|
Rother, Christopher | Oakland University |
Zhou, Zhaodong | Oakland University |
Chen, Jun | Oakland University |
Keywords: Motion Control, Autonomous Agents, Optimization and Optimal Control
Abstract: Autonomous vehicle motion control development requires testing and evaluation at all stages of the process. The development phase involving the instrumentation and operation of a full-size vehicle can be especially costly. Scale vehicles have been developed in the literature to serve as a cost-effective transition from testing in a simulation environment to a physical system. However, the existing scale vehicle platforms do not support four-wheel steering and cannot isolate the performance of motion control algorithms from other modules such as perception and path planning. This paper closes this gap by proposing a new scale vehicle platform, called JetRacer-4WS, based on the open-source JetRacer autonomous vehicle with additional modifications to support four-wheel steering, model predictive control-based path following, and high-precision ultrasonic-based real-time positioning. The proposed JetRacer-4WS can be used as a low-cost platform for both research and education on motion control, path following, and vehicle dynamics. We describe the design of JetRacer-4WS, and experimentally demonstrate JetRacer-4WS’ ability to perform controller auto-tuning and illustrate the advantage of four-wheel steering. We also show that JetRacer-4WS can be used as a validation platform for testing advanced control algorithms such as event-triggered model predictive control. The associated code is open-sourced and available at: https://github.com/jchenee2015/jetracer-4WS.
|
|
10:30-12:00, Paper TuAT18-AX.6 | Add to My Program |
Online-Learning-Based Distributionally Robust Motion Control with Collision Avoidance for Mobile Robots |
|
Wang, Han | Shanghai Jiao Tong University |
Ning, Chao | Shanghai Jiao Tong University |
Li, Longyan | Shanghai Jiao Tong University |
Zhang, Weidong | Shanghai JiaoTong University |
Keywords: Motion Control, Collision Avoidance, Planning under Uncertainty
Abstract: Collision-free navigation is a critical issue for robotic systems because the environment is often dynamic and uncertain. This paper investigates a data-stream-driven motion control problem for mobile robots to avoid randomly moving obstacles when the probability distribution of the obstacle’s movement is only partially observable through data and can even be time-varying. A data-stream-driven ambiguity set is first constructed from movement data by leveraging a Dirichlet process mixture model and is updated online using real-time data. We then propose an Online-Learning-based Distributionally Robust Nonlinear Model Predictive Control (OL-DR-NMPC) approach that limits the risk of collision by considering the worst-case distribution within the ambiguity set. To facilitate solving the OL-DR-NMPC problem, we reformulate it as a finite-dimensional nonlinear optimization problem. To cope with the bilinear matrix inequality constraints in the nonlinear problem, we develop a parabolic relaxation and a sequential algorithm, by which the problem is further transformed into polynomial-time-solvable surrogates. Simulations using a quadrotor model demonstrate the effectiveness and advantages of the proposed method.
|
|
10:30-12:00, Paper TuAT18-AX.7 | Add to My Program |
Prediction of Pose Errors Implied by External Forces Applied on Robots: Towards a Metric for the Control of Collaborative Robots |
|
Fortineau, Vincent | Inria, Talence, France |
Padois, Vincent | Inria Bordeaux |
Daney, David | Inria Centre at the University of Bordeaux, F-33405 Talence, France |
Keywords: Motion Control, Compliance and Impedance Control, Physical Human-Robot Interaction
Abstract: The presented work tackles the question of quantifying the pose deviations of robots subject to external disturbance forces. While this question may not be central for large robots that perfectly reject disturbances through high controller gains, it is an important factor in collaborative settings where smaller robots may be deviated from their task by unmodeled physical interactions. This is all the more true in human-robot collaboration, where human capacities may fluctuate over time and have to be compensated by a proper adaptation of the robot control. To move forward in this direction, this work first derives a deviation prediction methodology and exemplifies it using three widely employed control approaches. The proposed prediction method is then validated in simulated and real robot experiments, both in single- and multi-robot cases. The obtained results constitute a stepping stone towards a quantitative metric for robots adapting their behaviour to human motor fluctuations.
|
|
10:30-12:00, Paper TuAT18-AX.8 | Add to My Program |
Iterative Learning Control for Deformable Open-Frame Cable-Driven Parallel Robots |
|
Cheng, Wuichung | The Chinese University of Hong Kong |
Chan, Ngo Foon | The Chinese University of Hong Kong |
Lau, Darwin | The Chinese University of Hong Kong |
Keywords: Motion Control, Flexible Robotics, Parallel Robots
Abstract: This paper proposes an iterative learning control (ILC) scheme for deformable open-frame cable-driven parallel robots (D-CDPRs). In contrast to the straightforward inverse kinematics of rigid-frame cable-driven parallel robots (CDPRs), accurate modeling of the deformable frame poses challenges due to errors and uncertainties. To address these issues, we propose the use of ILC, a control strategy that modifies the control input over iterations based on previous results; ILC has been successfully applied to traditional cable robots, particularly for handling model uncertainty. The paper presents a novel ILC scheme specifically designed for D-CDPRs, with a focus on reducing tracking errors over repetitive operations. Additionally, hardware experiments are conducted to validate the effectiveness and reliability of the proposed ILC approach. The results demonstrate the efficacy of ILC in mitigating tracking errors, even in scenarios where the dynamic model of the D-CDPRs is unknown.
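The core ILC idea of refining the input over repeated trials can be illustrated with a generic P-type update, u_{k+1}(t) = u_k(t) + L·e_k(t). This is a toy sketch of the principle only, not the paper's D-CDPR scheme; the plant, gain, and function names are invented for the example:

```python
def ilc_update(u_prev, e_prev, gain=0.4):
    """P-type ILC: next-iteration input = previous input + gain * previous error."""
    return u_prev + gain * e_prev

# Toy plant y = 2*u, unknown to the controller; track y_ref = 1.0.
y_ref, u = 1.0, 0.0
for _ in range(20):
    y = 2.0 * u                      # run one trial
    u = ilc_update(u, y_ref - y)     # learn from the trial's error
print(round(2.0 * u, 3))  # -> 1.0
```

The tracking error contracts by a factor |1 - gain * plant_gain| = 0.2 each iteration, so the output converges to the reference without any explicit plant model, which is exactly what makes ILC attractive when the frame deformation is hard to model.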
|
|
10:30-12:00, Paper TuAT18-AX.9 | Add to My Program |
A Study of Force-Free Control Framework for Industrial Manipulator Tasks Based on High-Pass Filter |
|
He, Guanwei | Shenzhen Campus of Sun Yat-Sen University |
Feng, Guodong | School of Information Science and Technology, Sun Yat-Sen University |
Ding, Beichen | Sun Yat-Sen University |
Keywords: Motion Control, Industrial Robots, Human-Robot Collaboration
Abstract: Force-free control (FFC) allows flexible manipulator motion in response to external forces, making it a vital component of human-robot interaction (HRI). Manual intervention may apply uneven forces to the manipulator, or forces at frequencies close to the natural frequency, and mechanical resonance can occur due to the inertia of the manipulator and the adjustable equivalent stiffness of the controller. This paper proposes an FFC approach for industrial manipulators using a six-axis force/torque (F/T) sensor, implemented through a three-layer control architecture consisting of a motion control layer, an admittance control layer, and a force decoupling layer. To mitigate the effects of mechanical resonance, a high-pass filter (HPF) is integrated with the F/T sensor and its impact is investigated. Experimental validation is conducted using both a simulation model and an industrial manipulator. Test results indicate that the proposed FFC architecture enables the manipulator not only to interact smoothly with external forces, but also to distinguish load forces at different frequencies and potentially address the issue of mechanical resonance between the manipulator and the applied load forces.
|
|
TuAT19-NT Oral Session, NT-G301 |
Add to My Program |
Medical Robots I |
|
|
Chair: Palopoli, Luigi | University of Trento |
Co-Chair: Nasseri, M. Ali | Technische Universitaet Muenchen |
|
10:30-12:00, Paper TuAT19-NT.1 | Add to My Program |
Uncertainty-Aware Contextual Visualization for Human Supervision of OCT-Guided Autonomous Robotic Subretinal Injection |
|
Sommersperger, Michael | Technical University of Munich |
Dehghani, Shervin | TUM |
Matten, Philipp | Medical University of Vienna |
Roodaki, Hessam | Technische Universität München |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Acceptability and Trust, Safety in HRI
Abstract: The injection of therapeutic agents into the subretinal space might allow improved treatment of age-related macular degeneration. Various robotic systems have been developed to achieve the required precision, and in combination with intraoperative Optical Coherence Tomography (iOCT) imaging, methods for autonomous robotic guidance have been proposed. In such systems, the robot’s cognition is often governed by machine learning algorithms, such as convolutional neural networks (CNNs), which provide semantic scene information from iOCT images. Although the robot performs a surgical task autonomously, human supervision is critical to monitor the robot’s execution and, if necessary, stop the robot or take control to avoid trauma to the patient. In this paper, we propose a novel visualization concept for improved human supervision of autonomous robotic subretinal injection that integrates uncertainty information of the data provided to the robot. We design a focus and context visualization that renders an automatically identified instrument-aligned B-scan in the context of the 3D OCT volume. Our visualization is enriched by augmenting the uncertainty information on the instrument-aligned B-scan. To dynamically model task-specific uncertainty, we introduce a weighting scheme to assign an importance factor to each pair of classes, controlling the impact of their confusion on the overall uncertainty. We demonstrate our visualization concept on iOCT volumes acquired at different stages during subretinal injection on ex-vivo porcine eyes. We show that our processing pipeline achieves sufficient update rates for surgical display and discuss the impact of our visualization concept on the acceptance of robotic task autonomy for subretinal injection procedures.
|
|
10:30-12:00, Paper TuAT19-NT.2 | Add to My Program |
A Track-Based Colon Endoscopic Robot with Depth Perception Stereo Cameras for Haustral Fold Detection During Colonic Navigation |
|
He, Shujing | Southern University of Science and Technology |
Zhang, Yujie | Southern University of Science and Technology |
Huang, Baoyi | Southern University of Science and Technology |
Lin, Jie | Guangzhou University of Chinese Medicine |
Shi, Chaoyang | Tianjin University |
Hu, Chengzhi | Southern University of Science and Technology |
Keywords: Medical Robots and Systems, Automation at Micro-Nano Scales
Abstract: Colon endoscopic robots represent a promising screening modality for visualizing colon cancers with high sensitivity. However, current colonoscopy robots often have intricate and bulky mechanical structures, which pose practical challenges when moving through the complex and narrow environment of the colon. Moreover, these robots are typically equipped with a single camera, limiting their ability to accurately estimate the depth of haustral folds in the colon, which is of great importance for active colonic navigation. To address these challenges, we develop a track-based stereoscopic endoscopic robot (TSER) equipped with four tracks positioned at the corners of its body. This design maximizes the contact between the tracks and the colon wall, enhancing maneuverability. The tracks are constructed from de-molded polydimethylsiloxane (PDMS) and incorporate micro-patterns on their outer surfaces. We propose a straightforward strategy for detecting haustral folds using the TSER's stereo camera, which allows precise identification of their position and depth. The TSER achieves an average motion speed of 9.8 mm/s in a bellows tube containing silicone oil and 5.2 mm/s in an ex-vivo porcine intestinal segment. Impressively, the TSER achieves an 88.11% accuracy rate in haustral fold depth estimation, surpassing existing geometric shape fitting methods. These results demonstrate that the TSER holds great potential for effective and efficient movement and inspection within the colon, offering a promising solution for improved colon cancer screening.
|
|
10:30-12:00, Paper TuAT19-NT.3 | Add to My Program |
Caveats on the First-Generation Da Vinci Research Kit: Latent Technical Constraints and Essential Calibrations |
|
Cui, Zejian | Imperial College London |
Cartucho, João | Imperial College London |
Giannarou, Stamatia | Imperial College London |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Medical Robots and Systems, Calibration and Identification, Kinematics
Abstract: Telesurgical robotic systems provide a well established form of assistance in the operating theater, with evidence of growing uptake in recent years. Until now, the da Vinci surgical system has been the most widely adopted robot of this kind. To accelerate research on robotic-assisted surgery, the retired first-generation da Vinci robots have been redeployed for research use as "da Vinci Research Kits" (dVRKs), which have been distributed to research institutions around the world. In the past ten years, a great amount of research on the dVRK has been carried out across a vast range of research topics. During this extensive and distributed process, common technical issues have been identified that are buried deep within the dVRK research and development architecture. This paper gathers and analyzes the most significant of these, with a focus on the technical constraints of the first-generation dVRK, which both existing and prospective users should be aware of before embarking on dVRK-related research. The hope is that this review will aid users in identifying and addressing common limitations of the systems promptly, thus helping to accelerate progress in the field.
|
|
10:30-12:00, Paper TuAT19-NT.4 | Add to My Program |
A Passive Variable Impedance Control Strategy with Viscoelastic Parameters Estimation of Soft Tissues for Safe Ultrasonography |
|
Beber, Luca | University of Trento |
Lamon, Edoardo | University of Trento |
Nardi, Davide | University of Trento |
Fontanelli, Daniele | University of Trento |
Saveriano, Matteo | University of Trento |
Palopoli, Luigi | University of Trento |
Keywords: Medical Robots and Systems, Compliance and Impedance Control, Physical Human-Robot Interaction
Abstract: In the context of telehealth, robotic approaches have proven a valuable alternative to in-person visits in remote areas, with decreased costs for patients and lower infection risks. In particular, in ultrasonography, robots have the potential to reproduce the skills required to acquire high-quality images while reducing the sonographer's physical effort. In this paper, we address the control of the interaction of the probe with the patient's body, a critical aspect of ensuring safe and effective ultrasonography. We introduce a novel approach based on variable impedance control, allowing the real-time optimisation of compliant controller parameters during ultrasound procedures. This optimisation is formulated as a quadratic programming problem and incorporates physical constraints derived from viscoelastic parameter estimates. Safety and passivity constraints, including an energy tank, are also integrated to minimise potential risks during human-robot interaction. The proposed method's efficacy is demonstrated through experiments on a patient dummy torso, highlighting its potential for achieving safe behaviour and accurate force control during ultrasound procedures, even in cases of contact loss.
|
|
10:30-12:00, Paper TuAT19-NT.5 | Add to My Program |
A Robotic System for Transanal Endoscopic Microsurgery: Design, Dexterity Optimization and Prototyping |
|
Li, Jichen | Tianjin University |
Wang, Shuxin | Tianjin University |
Zhang, Zhiqiang | University of Leeds |
Shi, Chaoyang | Tianjin University |
Keywords: Medical Robots and Systems, Compliant Joints and Mechanisms, Surgical Robotics: Steerable Catheters/Needles
Abstract: The paper introduces a novel robotic system for transanal endoscopic microsurgery (TEM) with a master-slave operated configuration. This slave manipulator features a modular distal continuum section, comprising two 7-DoF surgical instruments and a 5-DoF endoscopic arm designed to enhance hand-eye coordination and instrument triangulation in narrow and shallow rectal spaces. Key innovations include the hybrid coaxial continuum unit (HCCU) for improved bending characteristics and structural stiffness, and a design optimization for dexterity under anatomical constraints. Experimental validations demonstrate the system's precision and capability in simulated surgical tasks, highlighting its potential for advanced TEM applications with improved operational dexterity and reduced view obstruction.
|
|
10:30-12:00, Paper TuAT19-NT.6 | Add to My Program |
Implicit Neural Representations for Breathing-Compensated Volume Reconstruction in Robotic Ultrasound |
|
Velikova, Yordanka | TU Munich |
Azampour, Mohammad Farid | Technical University of Munich |
Simson, Walter | Technical University Munich |
Esposito, Marco | ImFusion GmbH |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Deep Learning Methods
Abstract: Ultrasound (US) imaging is widely used in diagnosing and staging abdominal diseases due to its lack of ionizing radiation and widespread availability. However, significant inter-operator variability and inconsistent image acquisition hinder the adoption of broader screening programs. Robotic ultrasound systems have emerged as a promising solution, offering standardized acquisition protocols and the possibility of automated acquisition. Additionally, robotic ultrasound systems enable access to 3D data via robotic tracking and incoherent compounding of ultrasound frames, resulting in improved interpretation and disease diagnosis. However, the interpretability of 3D ultrasound reconstructions of abdominal images can be affected by the patient's breathing motion. This study introduces a method to compensate for breathing motion in 3D ultrasound compounding by leveraging implicit neural representations. Our approach employs a robotic ultrasound system for automated screenings. To demonstrate the method's effectiveness, we evaluate it on the diagnosis and monitoring of abdominal aortic aneurysms as a representative use case. Our experiments demonstrate that the proposed pipeline facilitates robust automated robotic acquisition, mitigates artifacts from breathing motion, and yields smoother 3D reconstructions for enhanced screening and medical diagnosis.
|
|
10:30-12:00, Paper TuAT19-NT.7 | Add to My Program |
Shadow-Based 3D Pose Estimation of Intraocular Instrument Using Only 2D Images |
|
Yang, Junjie | TUM |
Zhao, Zhihao | Technische Universität München |
Maier, Mathias | Klinikum Rechts Der Isar Der TU München |
Huang, Kai | Sun Yat-Sen University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Localization
Abstract: In ophthalmic surgeries, such as vitreoretinal operations, surgeons rely on imaging systems, primarily microscopes, for real-time instrument monitoring and motion planning. However, novice surgeons struggle to estimate instrument positions from 2D microscope frames and must build up extensive trial-and-error experience, especially since additional imaging modalities such as iOCT remain inaccessible in most operating rooms. Targeting intraocular assessment within the current surgical setup, this paper presents an image-based pose estimation method that obtains real-time instrument positions in a standard 12 mm-radius spherical eyeball model by linking floating instruments to on-retina objects based on the intraocular shadowing principle. We validate this pose estimation method in a Unity simulator and verify its depth estimation capability using a specially designed eyeball phantom. Both simulator and phantom experiments demonstrate an average needle-tip estimation error within [1.0, 2.0] mm using only 2D microscope frames.
|
|
10:30-12:00, Paper TuAT19-NT.8 | Add to My Program |
Skeleton Graph-Based Ultrasound-CT Non-Rigid Registration |
|
Jiang, Zhongliang | Technical University of Munich |
Li, Xuesong | Technical University of Munich |
Zhang, Chenyu | Technical University of Munich |
Bi, Yuan | TUM |
Stechele, Walter | Technical University of Munich |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Robotics and Automation in Life Sciences
Abstract: Autonomous ultrasound (US) scanning has attracted increasing attention and is seen as a potential solution to overcome the limitations of conventional US examinations, such as inter-operator variation. However, it remains challenging to autonomously and accurately transfer a scan trajectory planned on a generic atlas to the current setup for different patients, particularly for thorax applications with limited acoustic windows. To address this challenge, we propose a skeleton graph-based non-rigid registration that adapts to patient-specific properties using subcutaneous bone surface features rather than the skin surface. To this end, self-organizing mapping is applied twice in succession, first to unify the input point cloud and then to extract key points. Afterward, a minimum spanning tree is employed to generate a tree graph connecting all extracted key points. To appropriately characterize the rib cartilage outline and match the source and target point clouds, the path extracted from the tree graph is optimized by maximally maintaining continuity throughout each rib. To validate the proposed approach, we manually extract a US cartilage point cloud from one volunteer and seven CT cartilage point clouds from different patients. The results demonstrate that the proposed graph-based registration is more effective and robust in adapting to inter-patient variations than ICP (distance error mean±SD: 5.0±1.9 mm vs. 8.6±6.7 mm on seven CTs).
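The tree-graph step in the abstract above is a standard minimum spanning tree over pairwise key-point distances. A minimal sketch using Prim's algorithm; the point data and function name are invented for illustration and are not from the paper:

```python
import math

def keypoint_mst(points):
    """Prim's algorithm: grow a minimum spanning tree over the key points,
    returning its edges as (i, j) index pairs."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Cheapest edge from the current tree to any point not yet in it.
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        in_tree.add(j)
        edges.append((i, j))
    return edges

# Four key points; a spanning tree on n points always has n - 1 edges.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
print(len(keypoint_mst(pts)))  # -> 3
```

In the paper's pipeline the nodes would be the SOM-extracted key points of the bone surface, and the rib outlines are then read off as continuity-optimized paths through this tree.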
|
|
10:30-12:00, Paper TuAT19-NT.9 | Add to My Program |
Automated Image Acquisition of Parasternal Long-Axis View with Robotic Echocardiography |
|
Shida, Yuuki | Waseda University |
Kumagai, Souto | Waseda University |
Tsumura, Ryosuke | National Institute of Advanced Industrial Science and Technology |
Iwata, Hiroyasu | Waseda University |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: This study proposes a method for autonomously finding the parasternal long-axis view in echocardiography with a robotic ultrasound (US) system. In obtaining this view, it is necessary to avoid the ribs and lungs because they reduce the clarity of the US image. Meanwhile, the anatomical position and size of the heart, lungs, and ribs differ between individuals, which makes it difficult to find the optimal position of the US probe. Our proposed system comprises the following three processes. First, an exhaustive scan of the chest wall region is performed, and the probe position that allows the mitral valve to be centrally positioned is estimated from this scan. Second, the probe is rotated once in the yaw direction while fixed at that position, and the yaw angle at which the image plane is parallel to the left ventricular long axis is estimated from the acquired images. Finally, the pitch angle of the probe is estimated so that the probe avoids the connection between the mitral valve and the papillary muscle and chordae. To validate the proposed method, we performed human trials with five healthy subjects and measured the detection rate of the observation points used to evaluate the image quality of the parasternal long-axis view. The median detection rate of the observation points was 63.3 ± 5.3%, which implies that the proposed method is valid.
|
|
TuAT20-NT Oral Session, NT-G302 |
Add to My Program |
Mobile Manipulation |
|
|
Chair: Chalvatzaki, Georgia | Technische Universität Darmstadt |
Co-Chair: Zhou, Boyu | Sun Yat-Sen University |
|
10:30-12:00, Paper TuAT20-NT.1 | Add to My Program |
Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark |
|
Li, Juncheng | Purdue University |
Cappelleri, David | Purdue University |
Keywords: Mobile Manipulation, Grasping, Deep Learning in Robotics and Automation, Suction Cup Gripper
Abstract: This paper presents Sim-Suction, a robust suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments. We address the lack of large-scale, accurately annotated suction grasp datasets by proposing a benchmark synthetic dataset, Sim-Suction-Dataset. It comprises 500 cluttered environments with 3.2 million annotated suction grasp poses. The dataset generation process combines analytical models and dynamic physical simulations to create fast and accurate suction grasp pose annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset, leveraging the synergy of zero-shot text-to-segmentation. Real-world experiments on picking up all objects demonstrate that Sim-Suction-Pointnet achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1 objects, cluttered level 2 objects, and cluttered mixed objects, respectively. The Sim-Suction policies outperform the tested state-of-the-art benchmarks by approximately 21% in cluttered mixed scenes.
|
|
10:30-12:00, Paper TuAT20-NT.2 | Add to My Program |
Robot Task Planning under Local Observability |
|
Merlin, Max | Brown University |
Parr, Shane | University of Massachusetts Amherst |
Parikh, Neev | Brown University |
Orozco, Sergio | Brown University |
Gupta, Vedant | Brown University |
Rosen, Eric | Brown University |
Konidaris, George | Brown University |
Keywords: Mobile Manipulation, Planning under Uncertainty, Task Planning
Abstract: Real-world robot task planning is intractable in part due to partial observability. A common approach to reducing complexity is introducing additional structure into the decision process, such as mixed-observability, factored states, or temporally-extended actions. We propose the locally observable Markov decision process (LOMDP), a novel formulation that models task-level planning where uncertainty pertains to object-level attributes and where a robot has subroutines for seeking and accurately observing objects. This models sensors that are range-limited and line-of-sight: objects occluded or outside sensor range are unobserved, but the attributes of objects that fall within sensor view can be resolved via repeated observation. Our model results in a three-stage planning process: first, the robot plans using only observed objects; if that fails, it generates a target object that, if observed, could result in a feasible plan; finally, it attempts to locate and observe the target, replanning after each newly observed object. By combining LOMDPs with off-the-shelf Markov planners, we outperform state-of-the-art solvers for both object-oriented POMDP and MDP analogues with the same task specification. We then apply the formulation to successfully solve a task on a mobile robot.
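The three-stage planning loop summarized in the abstract can be sketched as follows; all function names (`plan_with`, `propose_target`, `locate`) are hypothetical stand-ins for the paper's planner and seek-and-observe subroutines:

```python
def lomdp_plan(observed, plan_with, propose_target, locate):
    """Sketch of the three-stage process:
    1) plan with observed objects only;
    2) if that fails, propose a target object that could make a plan feasible;
    3) seek and observe the target, replanning after each new observation.
    Termination is assumed when no further target can be proposed."""
    plan = plan_with(observed)
    if plan is not None:
        return plan
    target = propose_target(observed)
    while target is not None:
        newly_seen = locate(target)        # seek-and-observe subroutine
        observed = observed | newly_seen
        plan = plan_with(observed)
        if plan is not None:
            return plan
        target = propose_target(observed)
    return None

# Toy illustration: the plan needs a cup that starts unobserved
plan_with = lambda obs: ["pick cup"] if "cup" in obs else None
propose_target = lambda obs: "cup" if "cup" not in obs else None
locate = lambda t: {t}
plan = lomdp_plan(set(), plan_with, propose_target, locate)
```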
|
|
10:30-12:00, Paper TuAT20-NT.3 | Add to My Program |
Real-Time Whole-Body Motion Planning for Mobile Manipulators Using Environment-Adaptive Search and Spatial-Temporal Optimization |
|
Wu, Chengkai | Harbin Institute of Technology, Shenzhen |
Wang, Ruilin | Sun Yat-Sen University |
Song, Mianzhi | Sun Yat-Sen University |
Gao, Fei | Zhejiang University |
Mei, Jie | City University of Hong Kong |
Zhou, Boyu | Sun Yat-Sen University |
Keywords: Mobile Manipulation, Manipulation Planning, Motion and Path Planning
Abstract: Mobile manipulators have recently gained significant attention in the robotics community due to their superior potential in industrial and service applications. However, the high degree of freedom associated with mobile manipulators poses challenges in achieving real-time whole-body motion planning. To bridge the gap, this paper presents a motion planning method capable of generating high-quality, safe, agile, and feasible trajectories for mobile manipulators in real time. First, we present a novel environment-adaptive path searching method, which can generate paths in real time in various environments by adaptively adjusting the search dimension based on environment complexity. Additionally, we propose a real-time spatial-temporal trajectory optimization method that takes into account the whole-body safety, agility, and dynamic feasibility of mobile manipulators. Moreover, task constraints are applied to ensure that the trajectory can fulfill specific task requirements. Simulation and real-world experiments demonstrate that our method is capable of generating whole-body trajectories in real time in challenging environments. We will release our code to benefit the community.
|
|
10:30-12:00, Paper TuAT20-NT.4 | Add to My Program |
Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation |
|
Schmalstieg, Fabian | University of Freiburg |
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Mobile Manipulation, AI-Enabled Robotics, Service Robotics
Abstract: Existing object-search approaches enable robots to search through free pathways; however, robots operating in unstructured, human-centered environments frequently also have to manipulate the environment to suit their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that, with accurate perception, the decision making of HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.
|
|
10:30-12:00, Paper TuAT20-NT.5 | Add to My Program |
Keep It Upright: Model Predictive Control for Nonprehensile Object Transportation with Obstacle Avoidance on a Mobile Manipulator |
|
Heins, Adam | University of Toronto |
Schoellig, Angela P. | TU Munich |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control
Abstract: We consider a nonprehensile manipulation task in which a mobile manipulator must balance objects on its end effector without grasping them (known as the waiter's problem) and move to a desired location while avoiding static and dynamic obstacles. In contrast to existing approaches, our focus is on fast online planning in response to new and changing environments. Our main contribution is a whole-body constrained model predictive controller (MPC) for a mobile manipulator that balances objects and avoids collisions. Furthermore, we propose planning using the minimum statically-feasible friction coefficients, which provides robustness to frictional uncertainty and other force disturbances while also substantially reducing the compute time required to update the MPC policy. Simulations and hardware experiments on a velocity-controlled mobile manipulator with up to seven balanced objects, stacked objects, and various obstacles show that our approach can handle a variety of conditions that have not been previously demonstrated, with end effector speeds and accelerations up to 2.0 m/s and 7.9 m/s^2, respectively. Notably, we demonstrate a projectile avoidance task in which the robot avoids a thrown ball while balancing a tall bottle.
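The idea of a minimum statically-feasible friction coefficient can be illustrated with a simplified point-contact Coulomb model; the contact representation below is an assumption for illustration, not the paper's formulation:

```python
def min_static_friction(contacts):
    """Smallest Coulomb friction coefficient mu such that every contact
    satisfies |f_t| <= mu * f_n. Each contact is given as a
    (tangential force magnitude, normal force) pair; this simplified
    format is assumed here for illustration."""
    mu = 0.0
    for f_t, f_n in contacts:
        if f_n <= 0:
            raise ValueError("contact must press, not pull")
        mu = max(mu, abs(f_t) / f_n)   # binding constraint over all contacts
    return mu

# e.g., an object on a tray whose tangential load peaks at 20% of normal
mu_min = min_static_friction([(0.2, 1.0), (0.1, 1.0)])  # → 0.2
```

Planning against this worst-case minimal coefficient, rather than the true (uncertain) one, is what gives the robustness to frictional disturbances described above.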
|
|
10:30-12:00, Paper TuAT20-NT.6 | Add to My Program |
Gaussian Mixture Likelihood-Based Adaptive MPC for Interactive Mobile Manipulators |
|
Rakovitis, Dimitrios | DFKI |
Mronga, Dennis | University of Bremen, German Research Center for Artificial Intelligence |
Keywords: Mobile Manipulation, Optimization and Optimal Control, Model Learning for Control
Abstract: Mobile robots are nowadays frequently used for interaction tasks in the real world, e.g., for opening doors or for pick-and-place tasks. When used in real-world environments, adapting the robot controllers to uncertain contact dynamics is a significant challenge. Adaptive Model Predictive Control (AMPC) is an approach for controlling robot motions while adapting to uncertain or changing dynamics. However, most of the existing AMPC approaches used in mobile manipulation require either expert tuning or extensive training, making it very difficult to introduce novel or diverse tasks. In addition, the adjustment of several independent environment parameters is usually not considered in the AMPC formulation. In this work, we introduce a hierarchical approach that uses Gaussian Mixture Models (GMMs) and Gaussian Mixture Regression (GMR) to predict the dynamic model parameters of the MPC based on proprioceptive measurements and to perform tasks with multiple unknown environmental parameters. The approach is evaluated in simulation and in real experiments on a mobile manipulator and compared to several baseline methods. It is shown that it outperforms standard MPC and an existing AMPC approach on several tasks such as carrying, pushing, and door opening.
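The GMR step, conditioning a fitted GMM on a measurement to predict a model parameter, can be sketched for the scalar case; the mixture values below are illustrative, not a trained model:

```python
import math

def gmr_predict(x, components):
    """Gaussian Mixture Regression for scalar input/output: condition a
    fitted 2-D GMM on input x and return the expected output. This is
    the general GMR mechanism; the component format
    (weight, mean_x, mean_y, var_x, cov_xy) is assumed for illustration."""
    resp, preds = [], []
    for w, mx, my, vx, cxy in components:
        # responsibility of this component for the observed input x
        p = w * math.exp(-0.5 * (x - mx) ** 2 / vx) / math.sqrt(2 * math.pi * vx)
        resp.append(p)
        # conditional mean of y given x for a joint Gaussian component
        preds.append(my + cxy / vx * (x - mx))
    z = sum(resp)
    return sum(r / z * p for r, p in zip(resp, preds))

# Two hypothetical contact regimes: low stiffness near x=0, high near x=1
mix = [(0.5, 0.0, 10.0, 0.05, 0.0), (0.5, 1.0, 100.0, 0.05, 0.0)]
stiffness = gmr_predict(0.0, mix)   # dominated by the low-stiffness component
```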
|
|
10:30-12:00, Paper TuAT20-NT.7 | Add to My Program |
GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning Based on Online Grasping Pose Fusion |
|
Zhang, Jiazhao | Peking University |
Nandiraju, Gireesh | IIIT Hyderabad |
Wang, Jilong | University of California Santa Cruz |
Fang, Xiaomeng | Beijing Academy of Artificial Intelligence |
Xu, Chaoyi | BAAI |
Chen, Weiguang | Beijing University of Posts and Telecommunications |
Dai, Liu | Tongji University |
Wang, He | Peking University |
Keywords: Mobile Manipulation, Deep Learning in Grasping and Manipulation, AI-Based Methods
Abstract: Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are organized online to eliminate redundant and outlier grasping poses, and can be encoded as a grasping pose observation state for reinforcement learning. Moreover, fusing the grasping poses on the fly enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses. This assessment can subsequently serve as an observe-to-grasp reward, motivating the agent to prioritize actions that yield detailed observations while approaching the target object for grasping. Through extensive experiments conducted on the Habitat and Isaac Gym simulators, we find that our method attains a good balance between observation and manipulation, yielding high performance under various grasping metrics. Furthermore, we discover that the incorporation of temporal information from grasping poses aids in mitigating the sim-to-real gap, leading to robust performance in challenging real-world experiments.
|
|
10:30-12:00, Paper TuAT20-NT.8 | Add to My Program |
Dynamic Interaction Control in Legged Mobile Manipulators: A Decoupled Approach |
|
Li, Qikai | Beihang University |
Meng, Qinchen | Beihang University |
Qin, Yuxing | Beihang University |
Chen, Jiawei | Beihang University |
Ding, Xilun | Beijing University of Aeronautics and Astronautics |
Xu, Kun | Beijing University |
Keywords: Mobile Manipulation, Legged Robots, Compliance and Impedance Control
Abstract: Legged mobile manipulators are receiving increasing attention. Mobile platforms can greatly expand the workspace of robotic arms, providing more possibilities for robot application scenarios. Compared with wheeled mobile manipulators, legged mobile manipulators have higher requirements for the cooperative control of the legged robot and the robotic arm. This work decouples the control of the robotic arm and the legged robot. On the legged robot side, we explicitly estimate the wrench exerted by the robotic arm on the base and bring it into the legged robot's dynamics, and then use a nonlinear model predictive controller (NMPC) to control the legged robot. On the robotic arm side, we adopt an impedance controller to realize force control at the end effector; the introduction of impedance control improves the safety and interactivity of legged mobile manipulators. We conducted experiments on a physical robot to compare decoupled control with independent control, and the results show that the stability and robustness of the robot system improve with decoupled control.
|
|
10:30-12:00, Paper TuAT20-NT.9 | Add to My Program |
Active-Perceptive Motion Generation for Mobile Manipulation |
|
Jauhri, Snehal | TU Darmstadt |
Lueth, Sophie C. | Technical University of Darmstadt, Stanford University |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Mobile Manipulation, Perception for Grasping and Manipulation, Reactive and Sensor-Based Planning
Abstract: Mobile Manipulation (MoMa) systems incorporate the benefits of mobility and dexterity, due to the enlarged space in which they can move and interact with their environment. However, even when equipped with onboard sensors, e.g., an embodied camera, extracting task-relevant visual information in unstructured and cluttered environments, such as households, remains challenging. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks, such as grasping in unknown, cluttered scenes. Our proposed approach, ActPerMoMa, generates robot paths in a receding-horizon fashion by sampling paths and computing path-wise utilities. These utilities trade off maximizing the visual Information Gain (IG) for scene reconstruction against the task-oriented objective, e.g., grasp success, by maximizing grasp reachability. We show the efficacy of our method in simulated experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes with obstacles. We empirically analyze the contribution of various utilities and parameters, and compare against representative baselines both with and without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, indicating a promising direction for active-perceptive MoMa.
|
|
TuAT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Locomotion |
|
|
Chair: Nakanishi, Jun | Meijo University |
Co-Chair: Ijspeert, Auke | EPFL |
|
10:30-12:00, Paper TuAT21-NT.1 | Add to My Program |
Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion |
|
Bellegarda, Guillaume | EPFL |
Shafiee, Milad | EPFL |
Ijspeert, Auke | EPFL |
Keywords: Biologically-Inspired Robots, Bioinspired Robot Learning, Legged Robots
Abstract: We present a framework for learning visually-guided quadruped locomotion by integrating exteroceptive sensing and central pattern generators (CPGs), i.e., systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Through both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands, while at the same time overriding these commands to avoid collisions with the environment. We investigate several open robotics and neuroscience questions: 1) What is the role of explicit interoscillator couplings, and can such coupling improve sim-to-real transfer for navigation robustness? 2) What are the effects of using a memory-enabled vs. a memory-free policy network with respect to robustness, energy efficiency, and tracking performance in sim-to-real navigation tasks? 3) How do animals manage to tolerate high sensorimotor delays, yet still produce smooth and robust gaits? To answer these questions, we train our perceptive locomotion policies in simulation and perform sim-to-real transfers to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations are all beneficial for energy efficiency, robustness to noise and sensory delays of 90 ms, and tracking performance for successful sim-to-real transfer for navigation tasks.
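A minimal version of the coupled-oscillator CPG underlying such frameworks is a set of diffusively coupled phase oscillators; the all-to-all coupling layout and gains below are illustrative assumptions, not the paper's network:

```python
import math

def cpg_step(phases, omega, K, dt):
    """One Euler step of Kuramoto-style coupled phase oscillators:
    each oscillator advances at frequency omega and is pulled toward
    its neighbors by the sin coupling term (gain K)."""
    n = len(phases)
    new = []
    for i in range(n):
        coupling = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        new.append(phases[i] + dt * (omega + K * coupling))
    return new

# Four oscillators (one per leg), desynchronized start
phases = [0.0, 1.0, 2.0, 3.0]
for _ in range(2000):
    phases = cpg_step(phases, omega=2 * math.pi, K=1.0, dt=0.005)

# With positive coupling, the phases are pulled toward synchrony
spread = max(1 - math.cos(a - b) for a in phases for b in phases)
```

In the paper's framework the oscillator states and couplings are modulated by a learned policy rather than fixed gains like these.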
|
|
10:30-12:00, Paper TuAT21-NT.2 | Add to My Program |
Form Closure for Fully Actuated and Robust Obstacle-Aided Locomotion in Snake Robots |
|
Løwer, Jostein | Norwegian University of Science and Technology |
Gravdahl, Irja | Norwegian University of Science and Technology |
Varagnolo, Damiano | Norwegian University of Science and Technology |
Stavdahl, Øyvind | Norwegian University of Science and Technology (NTNU) |
Keywords: Biologically-Inspired Robots, Biomimetics, Multi-Contact Whole-Body Motion Planning and Control
Abstract: In this paper we adapt the theory of form closure to define the form closed region, i.e., the subset of a snake robot's configuration space for which the constraints imposed by the obstacles in its environment render the system fully actuated. We show that the identification of form closed configurations is numerically feasible, and introduce the relaxed condition of form boundedness to achieve robustness in the presence of model uncertainties. We moreover show an example application where the concept of form closed region is used to produce predictable constrained motion in a cluttered environment using lateral undulation.
|
|
10:30-12:00, Paper TuAT21-NT.3 | Add to My Program |
Self-Righting Shell for Robotic Hexapod |
|
King, Katelyn | University of Michigan |
Revzen, Shai | University of Michigan |
Keywords: Biologically-Inspired Robots, Legged Robots
Abstract: Decimeter scale robots in human environments are small relative to obstacles they encounter, making them prone to flipping over and needing to self-right. We present a multi-faceted shell that by its geometry alone enables the hexapedal robot MediumANT to passively self-right without the need for additional sensory feedback. We designed the shell by specifying the cross-sectional geometry in the yz and xy planes such that the robot returns to an upright position by rolling around the longitudinal (x) axis, and then tweaked the design to reduce the number of faces. We then attached the shell to the robot by modifying some of its chassis structural plates to extend to and support the shell. We evaluated the effectiveness of the shell in two experimental scenarios: passive righting – balancing the robot on each face of the shell before releasing the robot – and an intentional fall – walking the robot off a ledge at various approach angles. As intended by our design, the robot recovered the upright orientation from all starting faces in the passive righting test and righted itself and continued walking in all falling trials. This work presents an example of using biologically inspired simplicity to solve what would otherwise be a technically challenging problem.
|
|
10:30-12:00, Paper TuAT21-NT.4 | Add to My Program |
Quadruped-Frog: Rapid Online Optimization of Continuous Quadruped Jumping |
|
Bellegarda, Guillaume | EPFL |
Shafiee, Milad | EPFL |
Özberk, Merih Ekin | École Polytechnique Fédérale De Lausanne |
Ijspeert, Auke | EPFL |
Keywords: Biologically-Inspired Robots, Legged Robots
Abstract: Legged robots are becoming increasingly agile in exhibiting dynamic behaviors such as running and jumping. Usually, such behaviors are optimized and engineered offline (i.e., the behavior is designed before it is needed), either through model-based trajectory optimization or through deep learning-based methods involving millions of timesteps of simulation interactions. Notably, such offline-designed locomotion controllers cannot perfectly model the true dynamics of the system, such as the motor dynamics. In contrast, in this paper, we consider a quadruped jumping task that we rapidly optimize online. We design foot force profiles parameterized by only a few parameters which we optimize directly on hardware with Bayesian Optimization. The force profiles are tracked at the joint level, and added to Cartesian PD impedance control and Virtual Model Control to stabilize the jumping motions. After optimization, which takes only a handful of jumps, we show that this control architecture is capable of diverse and omnidirectional jumps including forward, lateral, and twist (turning) jumps, even on uneven terrain, enabling the Unitree Go1 quadruped to jump 0.5 m high, 0.5 m forward, and jump-turn over 2 rad.
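A foot force profile parameterized by only a few values might look like the raised-cosine pulse below; this parameterization is a hypothetical stand-in, since the paper's exact profile is not given here:

```python
import math

def foot_force_profile(t, F_peak, T):
    """Raised-cosine vertical force pulse with two parameters, peak force
    F_peak and duration T. These two scalars would be the kind of
    low-dimensional input handed to Bayesian Optimization; this exact
    shape is an illustrative assumption."""
    if not 0.0 <= t <= T:
        return 0.0                      # no force outside the stance window
    return 0.5 * F_peak * (1.0 - math.cos(2.0 * math.pi * t / T))

# The pulse peaks at mid-stance and vanishes at both ends
f_mid = foot_force_profile(0.15, F_peak=120.0, T=0.3)   # → 120.0
```

Optimizing directly over such a small parameter vector is what makes hardware-in-the-loop tuning feasible in "only a handful of jumps."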
|
|
10:30-12:00, Paper TuAT21-NT.5 | Add to My Program |
AeroDima: Cheetah-Inspired Aerodynamic Tail Design for Rapid Maneuverability |
|
Bright, Daryn | University of Cape Town |
Shield, Stacey Leigh | University of Cape Town |
Patel, Amir | University of Cape Town |
Keywords: Biologically-Inspired Robots, Mechanism Design, Wheeled Robots
Abstract: Scientists have long theorized that the cheetah’s tail contributes to its impressive maneuverability at high speeds by stabilizing its body. This has inspired the design of several agile robots, including Dima, a wheeled platform that used cheetah-inspired inertial tail swings to better execute rapid acceleration and turning motions. Subsequent research suggests that the effectiveness of the cheetah’s tail might be enhanced by aerodynamic effects. In this paper, we introduce AeroDima: a follow-up to the original Dima design that uses aerodynamic drag on the tail as the primary mechanism for generating the stabilizing torque. The resulting sail-like tail is substantially lighter than the original, but still improves the performance of the platform, allowing it to enter turns at a higher speed without toppling. While the yaw rate of the robot was actually higher without the tail, the tail substantially reduced unwanted roll, confirming that this appendage increases maneuverability by increasing stability, rather than by directly contributing to lateral acceleration.
|
|
10:30-12:00, Paper TuAT21-NT.6 | Add to My Program |
Spined Torso Renders Advanced Mobility for Quadrupedal Locomotion |
|
Wang, Jichao | University of Science and Technology of China |
Cheng, Jinyu | University of Science and Technology of China |
Hu, Jiangtao | University of Science and Technology of China |
Gao, Wei | University of Science and Technology of China |
Zhang, Shiwu | University of Science and Technology of China |
Keywords: Biologically-Inspired Robots, Model Learning for Control, Reinforcement Learning
Abstract: Animals possessing spinal columns often exhibit exceptional agility for highly dynamic locomotion. The spine grants the trunk increased degrees of freedom, thereby enabling diverse postures. This paper presents the development of a robot, STRAY, for quadrupedal locomotion, featuring a four-degree-of-freedom spine design. Using trajectory-based reinforcement learning techniques, STRAY is able to trot and bound dynamically using its spine. Simulation results reveal the positive roles of spinal movements, such as twisting, extension, retraction, and rotation, in helping STRAY realize efficient locomotion. Preliminary results from experiments demonstrate that STRAY can achieve a trotting gait of approximately 0.6 m/s and a bounding gait of 0.7 m/s, with desired velocities of 0.8 m/s and 1.0 m/s, respectively. The results also indicate that reinforcement learning is a feasible way to investigate how the spine should be used in dynamic quadrupedal locomotion and to achieve more possibilities in the future.
|
|
10:30-12:00, Paper TuAT21-NT.7 | Add to My Program |
Pegasus: A Novel Bio-Inspired Quadruped Robot with Underactuated Wheeled-Legged Mechanism |
|
Pan, Yuzhen | Fudan University |
Khan, Rezwan Al Islam | Fudan University |
Zhang, Chenyun | Fudan University |
Anzheng, Zhang | Fudan University |
Shang, Huiliang | Fudan University |
Keywords: Biologically-Inspired Robots, Motion Control, Dynamics
Abstract: This paper presents the design and analysis of Pegasus, a quadrupedal wheeled robot grounded in biomimicry principles. Pegasus offers two distinct motion modes, a wheeled motion and a hybrid wheeled-legged motion, enabling adaptability across various tasks and environmental conditions. The robot draws inspiration from the joint structures of quadruped animals and incorporates biomimetic features. At the robot's ankle joint, we imitate the articulation of a radius-ulna joint to enhance the wheeled motion's agility. Additionally, we establish a comprehensive adaptive dynamics model, providing a robust theoretical foundation for subsequent motion planning and high-precision control. A novel telescopic vehicle mode is also proposed for complex wheel-leg hybrid motion, offering optimized solutions for intricate robot locomotion. Furthermore, we employ parallel underactuated MPC controllers for each leg at the control level, contributing to heightened motion precision and stability. Extensive validation through physical platform experiments highlights the effectiveness and feasibility of the proposed controllers, offering substantial support for real-world applications in robotics.
|
|
10:30-12:00, Paper TuAT21-NT.8 | Add to My Program |
LeapRun: A Dynamic Soft Robot with Running and Jumping Capabilities |
|
Lu, Jiangfeng | Tsinghua University |
Liang, Jiaming | Tencent |
Zhu, Dekuan | Tsinghua University |
Wang, Dongkai | Tsinghua University, Tsinghua Shenzhen International Graduate School |
Liu, Ying | Tsinghua University |
Chen, Huimin | Tsinghua University, Tsinghua Shenzhen International Graduate School |
Bai, Yunfei | Tsinghua University, Shenzhen International Graduate School |
Zhang, Haolong | Tsinghua University |
Zhang, Min | Tsinghua University |
Keywords: Biologically-Inspired Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In the natural world, insects exhibit remarkable locomotion capabilities through a combination of running and jumping. However, replicating this versatile locomotion in a soft robot poses technical and design complexities. Here, we propose a dynamic soft robot named LeapRun that possesses agile locomotion and the ability to perform continuous jumping. To achieve this, a prototype soft robot (weight of 300 mg, size of 30 mm × 15 mm × 5 mm), composed of piezoelectric thin film, shape memory alloy, a magnet-locking mechanism, and corresponding support structures, is fabricated. Experimental results demonstrate a maximum moving speed of 15 cm/s and a maximum jumping height of 8.7 cm. Continuous jumping up steps and crossing of complex rugged surfaces are realized. Besides, integrated with a power source, wireless communication module, and control module, untethered operation is also presented, showcasing the potential for multiple applications in search and rescue, exploration, and monitoring.
|
|
10:30-12:00, Paper TuAT21-NT.9 | Add to My Program |
Machine Learning-Driven Burrowing with a Snake-Like Robot |
|
Even, Sean | University of Notre Dame |
Gordon, Holden | Santa Clara University |
Yang, Hoeseok | Santa Clara University |
Ozkan-Aydin, Yasemin | University of Notre Dame |
Keywords: Machine Learning for Robot Control, Biomimetics, Bioinspired Robot Learning
Abstract: Subterranean burrowing is inherently difficult for robots because of the high forces experienced as well as the high amount of uncertainty in this domain. Because of the difficulty in modeling forces in granular media, we propose the use of a novel machine-learning control strategy to obtain optimal techniques for vertical self-burrowing. In this paper, we realize a snake-like bio-inspired robot that is equipped with an IMU and two triple-axis magnetometers. Utilizing magnetic field strength as an analog for depth, a novel deep learning architecture was proposed based on sinusoidal and random data in order to obtain a more efficient strategy for vertical self-burrowing. This strategy was able to outperform many other standard burrowing techniques and was able to automatically reach targeted burrowing depths. We hope these results will serve as a proof of concept for how optimization can be used to unlock the secrets of navigating in the subterranean world more efficiently.
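Using magnetic field strength as an analog for depth can be illustrated with a simple dipole falloff model; the 1/r^3 law and the calibration constant below are assumptions for illustration, not the authors' calibration:

```python
def depth_from_field(B, k):
    """Invert a dipole falloff model B = k / r**3 to estimate the distance r
    between the magnetometer and a surface-mounted magnet. k is a
    calibration constant fit from a known pose; both the model and the
    numbers here are illustrative stand-ins."""
    if B <= 0:
        raise ValueError("field strength must be positive")
    return (k / B) ** (1.0 / 3.0)

# Calibrate k from one known pose: B = 8 units measured at r = 0.5
k = 8.0 * 0.5 ** 3
r = depth_from_field(1.0, k)   # → 1.0
```

In practice the measured field would also include Earth's background field and sensor noise, which is presumably part of why the paper learns the depth relationship rather than fitting a closed-form model.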
|
|
TuAT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics I |
|
|
Chair: Jing, Xingjian | City University of Hong Kong |
Co-Chair: Zeng, Zheng | Shanghai Jiao Tong University |
|
10:30-12:00, Paper TuAT22-NT.1 | Add to My Program |
Towards Centimeter-Scale Underwater Mobile Robots: An Architecture for Capable µAUVs |
|
Spino, Pascal | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Marine Robotics, Actuation and Joint Mechanisms, Micro/Nano Robots
Abstract: Underwater robots are indispensable for aquatic exploration, yet their size and complexity often limit broader application. This research presents a pioneering micro autonomous underwater vehicle (µAUV) design. This robot is distinguished by its utilization of mass-produced drone components, novel jet propulsion mechanisms, and multifunctional spherical shell. Its architecture is modular, appendage-free, and largely seal-free. Preliminary tests highlight its motion capabilities and set new benchmarks for centimeter-scale µAUV advancements.
|
|
10:30-12:00, Paper TuAT22-NT.2 | Add to My Program |
Untethered Bimodal Robotic Fish with Tunable Bistability |
|
Chao, Xu | The Hong Kong Polytechnic University |
Hameed, Imran | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Jing, Xingjian | City University of Hong Kong |
Keywords: Marine Robotics, Mechanism Design, Biomimetics
Abstract: In nature, fish are excellent swimmers due to their flexible and precise control of the tail, which allows them to freely transition between smooth flapping and rapid-response motion so that they can move with dexterity. Here, inspired by the versatile motion abilities of fish, a novel robotic fish has been developed, featuring adaptable bistability. Through tuning the bistability, the robot can acquire two locomotion modes, namely monostable and bistable modes, and the energy barrier that must be overcome to realize the bistable motion can also be adjusted. Theoretical models are derived to facilitate the control of the robot and the understanding of its nonlinear behavior. The impact of the tunable bistability on the swimming and turning performance is investigated through extensive experiments. The study effectively demonstrates the robotic fish’s capability to swiftly and efficiently navigate through mode switches, enabled by its tunable bistability. This feature is essential for underwater robots performing tasks in intricate environments.
|
|
10:30-12:00, Paper TuAT22-NT.3 | Add to My Program |
Tendon-Driven Continuum Robot for Deep-Sea Application |
|
Sourkounis, Cora Maria | Leibniz University Hannover |
Kwasnitschka, Tom | GEOMAR Helmholtz Centre for Ocean Research Kiel |
Raatz, Annika | Leibniz Universität Hannover |
Keywords: Marine Robotics, Mechanism Design, Tendon/Wire Mechanism
Abstract: The extreme conditions of the deep sea require the use of large and expensive diving robots designed to withstand the high pressure at these depths. In order to reduce the costs of sediment sampling in the deep sea and thus facilitate the exploration of rare deep-sea ecosystems, the goal of this research is to design an alternative manipulator for deep-sea suction sampling. Instead of relying on the heavy hydraulic rigid manipulators that deep-sea diving robots are commonly equipped with, we introduce a new concept for a lightweight actuation system that can be used in combination with a traditional diving robot and a suction sampling system. The proposed concept consists of a series of rigid links connected by angled swivel joints. Each segment is actuated by tendons, which allows for continuous bending. The system can be adapted to various sizes of host systems, and the links and joints are printed in place, simplifying the manufacturing process.
|
|
10:30-12:00, Paper TuAT22-NT.4 |
WAVE: An Open-Source underWater Arm-Vehicle Emulator |
|
Rosette, Marcus | Oregon State University |
Kolano, Hannah | Oregon State University |
Holm, Chris | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Marburg, Aaron | University of Washington |
Pickett, Madison | University of Washington |
Davidson, Joseph | Oregon State University |
Keywords: Marine Robotics, Mobile Manipulation, Mechanism Design
Abstract: Underwater vehicle manipulator systems (UVMS) are increasingly popular platforms for performing subsea operations that require precision manipulation. While there is high demand for fully autonomous or even semi-autonomous systems, most UVMS still require human support teams. Developing new hardware and algorithms for autonomous underwater manipulation is challenging: simulations do not capture the full complexity of the underwater environment, and deploying a UVMS at sea for testing/validation is resource intensive and expensive. In this paper, we present a physical testbed for underwater manipulation that bridges the gap between simulation and full field trials. The underWater Arm-Vehicle Emulator (WAVE) is a 10-degree-of-freedom system designed to replicate an inspection-class UVMS. WAVE includes an underwater perception sensor and has two operating modes: rigid and passive. In passive mode, the ROV body can pitch similarly to how a dynamically coupled, underactuated UVMS without pitch control would rotate during manipulation tasks. To validate the overall design and the passive-pitch concept, we evaluated the testbed during underwater experiments in energetic conditions at a wave basin. To support continued research and development in underwater robotics, we make the design open-access and freely available to the community.
|
|
10:30-12:00, Paper TuAT22-NT.5 |
Nezha-F: Design and Analysis of a Foldable and Self-Deployable HAUV |
|
Bai, YuLin | Shanghai Jiao Tong University |
Jin, Yufei | Shanghai Jiao Tong University |
Liu, ChunHu | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiaotong University |
Keywords: Marine Robotics, Simulation and Animation, Mechanism Design
Abstract: This paper introduces a small hybrid aerial underwater vehicle (HAUV), named Nezha-F, that can fly in the air, perform vertical profiling underwater, and vertically take off and land from both the water surface and the ground. A foldable and self-deployable arm mechanism, linked to and driven by a piston variable buoyancy system (PVBS), is proposed to reduce the excessive underwater drag caused by aerial structures. With a compact size and successfully balanced aerial and underwater performance achieved without adding excessive actuators, this design provides a feasible approach to the miniaturization of amphibious floats. The dynamic characteristics of the small PVBS are linearly fitted and modeled, and the originally nonlinear actuator performance is linearized by the post-fitting mapping. Asymmetric dead zones of the actuator are removed by adding compensation to the algorithm. During a 10-day field test, the vehicle showed good aerial performance and underwater control performance. Several full-mission-cycle tests proved the vehicle’s ability in semi-autonomous operation and robust domain crossing, and verified the vehicle’s endurance during each mission stage.
|
|
10:30-12:00, Paper TuAT22-NT.6 |
Snapp: An Agile Robotic Fish with 3-D Maneuverability for Open Water Swim |
|
Ng, Timothy Ju Kin | The University of Hong Kong |
Chen, Nan | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Marine Robotics, Biologically-Inspired Robots, Biomimetics
Abstract: Fish exhibit impressive locomotive performance and agility in complex underwater environments, using their undulating tails and pectoral fins for propulsion and maneuverability. Replicating these abilities in robotic fish is challenging; existing designs focus on either fast swimming or directional control at limited speeds, mainly within confined environments. To address these limitations, we designed Snapp, an integrated robotic fish capable of swimming in open water at high speeds and with full 3-dimensional maneuverability. A novel cyclic-differential method is layered on the mechanism; it integrates propulsion and yaw-steering for fast course corrections. Two independent pectoral fins provide pitch and roll control. We evaluated Snapp in open water environments and demonstrated significant improvements in speed and maneuverability, achieving swimming speeds of 1.5 m/s (1.7 body lengths per second) and performing complex maneuvers, such as figure-8 and S-shaped trajectories. Instantaneous yaw changes of 15° in 0.4 s, a minimum turn radius of 0.85 m, and maximum pitch and roll rates of 3.5 and 1 rad/s, respectively, were recorded. Our results suggest that Snapp’s swimming capabilities have excellent practical prospects for the open seas and contribute significantly to developing agile robotic fishes.
|
|
10:30-12:00, Paper TuAT22-NT.7 |
A Novel Omnidirectional Swimming Robot with Articulated-Compliant Legs |
|
Xu, Yaohui | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Li, Hanlin | Yanshan University and Shenzhen Institute of Advanced Technology |
Yu, Furui | Shaanxi University of Science & Technology, Shenzhen Institute O |
Zuo, Qiyang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Xie, Fengran | Shenzhen Polytechnic |
Xie, Xiang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
He, Kai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Keywords: Marine Robotics, Biologically-Inspired Robots, Dynamics
Abstract: Stability, adaptability and maneuverability are the most important performance indices for underwater biomimetic robots, especially when it comes to operation in narrow spaces. However, these aspects are sometimes contradictory. In this paper, we present an omnidirectional swimming robot inspired by the whirligig beetle, with the design goals of good stability, adaptability, and maneuverability. First, the design of the robot, which features four novel articulated-compliant robotic legs, is given. Second, its hydrodynamic model is formulated using Kirchhoff’s equations as well as the Lagrangian method, and the hydrodynamic force is calculated with a quasi-steady flow model. Third, extensive experiments are carried out to examine its thrust generation and speed. We find that the omnidirectional robot is a significant improvement over a conventional one with a single passive joint in each leg. More specifically, its swimming speed reaches 0.34 m/s at a frequency of 1.4 Hz, a 30.8% increase. Finally, multimodal swimming of the robot is demonstrated by configuring various locomotive patterns of the articulated-compliant legs, such as swimming forward, retreating, lateral swimming to the left or right, zero-radius turning, and non-zero-radius turning. Passing and collision experiments demonstrate the robot’s potential applications in narrow spaces. Overall, this omnidirectional swimming robot strikes a great balance among stability, adaptability and maneuverability.
|
|
10:30-12:00, Paper TuAT22-NT.8 |
Marine Sediment Sampling with an Underwater Legged Robot |
|
Astolfi, Anna | Scuola Superiore Sant'Anna |
Chellapurath, Mrudul | Scuola Superiore Sant'Anna |
Picardi, Giacomo | Instituto De Ciencias Del Mar (ICM)—Consejo Superior De Investig |
Capriotti, Martina | University of Camerino |
Mladinich, Kayla | University of Connecticut |
Laschi, Cecilia | National University of Singapore |
Stefanni, Sergio | Stazione Zoologica Anton Dohrn |
Calisti, Marcello | The University of Lincoln |
Keywords: Marine Robotics, Legged Robots, Mechanism Design
Abstract: We present a novel approach to marine sediment sampling that makes use of a hexapedal robotic platform, namely SILVER2, equipped with a sediment sampling system. This approach addresses the disadvantages of state-of-the-art sediment sampling methods by offering increased station-keeping capability, low disturbance of the substrate, and precise position control. The sediment sampling system has been designed according to user requirements for microplastics (MPs) analysis of sampled sediment, which include the sampling depth, the sampled volume per area, and the possibility of collecting replicates without returning to the boat or to the shore. We also defined a protocol for sediment collection and extensively tested the system both in a tank and in field experiments at different sites along the Tyrrhenian coast of Italy. Sediments collected throughout the tests have been analyzed, extracting information about the quantity and composition of MPs, in order to provide an overview of the complete procedure. This work represents an important step towards the use of legged robots in marine operations, and highlights the importance of multidisciplinary collaborations among roboticists and scientists to develop novel solutions and increase the sampling capabilities of end-users.
|
|
10:30-12:00, Paper TuAT22-NT.9 |
Terrain-Adaptive Locomotion Control for an Underwater Hexapod Robot: Sensing Leg-Terrain Interaction with Proprioceptive Sensors |
|
Chen, Lepeng | Northwestern Polytechnical University |
Cui, Rongxin | Northwestern Polytechnical University |
Yan, Weisheng | Northwestern Polytechnical University |
Xu, Hui | Northwestern Polytechnical University |
Zhang, Shouxu | Northwestern Polytechnical University |
Yu, Haitao | Northwestern Polytechnical University |
Keywords: Marine Robotics, Legged Robots, Motion Control
Abstract: An underwater hexapod robot driven by six C-shaped legs and eight thrusters has the potential to traverse diverse terrains with unknown deformable properties, which can lead to unknown leg-terrain interaction forces. However, it is hard to recognize an underwater terrain's deformable properties using exteroceptive sensors such as cameras and sonars. Here, we propose a method to perceive the interaction forces and feed them into a controller that determines the thrust inputs. The key idea lies in using supervised learning to obtain the properties from reliable proprioceptive sensory data. First, we propose a new quantity called the Zero Moment Point (ZMP) bias that can indirectly represent the leg-terrain interaction force, removing the effects caused by gravity, buoyancy, and thrust. Second, we gather a walking cycle's discrete ZMP biases and parameterize them as polynomials. Then, we use several previous walking cycles' parameterized biases to predict the current walking cycle's biases and generate the needed pitch and roll moments. Finally, we propose a terrain-adaptive locomotion controller for the robot, which uses thrust to compensate for the interaction force.
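The parameterize-then-predict idea in this abstract can be sketched as follows. This is a hedged illustration only: the paper uses supervised learning on proprioceptive data, while here past cycles' polynomial coefficients are simply averaged, and `fit_cycle`/`predict_cycle` and all values are hypothetical.

```python
import numpy as np

def fit_cycle(phases, zmp_bias, degree=3):
    """Parameterize one walking cycle's discrete ZMP biases as a polynomial
    over the gait phase in [0, 1]."""
    return np.polyfit(phases, zmp_bias, degree)

def predict_cycle(prev_coeffs, phases):
    """Predict the current cycle's ZMP-bias curve from previous cycles.
    Here we simply average past coefficients (a stand-in for the paper's
    learned predictor)."""
    coeffs = np.mean(np.asarray(prev_coeffs), axis=0)
    return np.polyval(coeffs, phases)

# Two synthetic past cycles with similar bias profiles (made-up data)
phases = np.linspace(0.0, 1.0, 20)
c1 = fit_cycle(phases, 0.05 * np.sin(2 * np.pi * phases))
c2 = fit_cycle(phases, 0.06 * np.sin(2 * np.pi * phases))
pred = predict_cycle([c1, c2], phases)  # bias prediction for the new cycle
```

The predicted bias curve would then feed the pitch/roll moment generation described in the abstract.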
|
|
TuAT23-NT Oral Session, NT-G401 |
Aerial Systems: Mechanics and Control I |
|
|
Chair: Suzuki, Satoshi | Chiba University |
Co-Chair: Ryll, Markus | Technical University Munich |
|
10:30-12:00, Paper TuAT23-NT.1 |
Rapid Resistography with Passive Overhead-Perching Mechanism in an Unmanned Aerial System for Wood Structure Inspection |
|
Lee, Shawndy Michael | Singapore University of Technology and Design |
Liu, Jingmin | Singapore University of Technology & Design |
Chien, Jer Luen | Singapore University of Technology & Design |
Ng, Wei Hien | Singapore University of Technology & Design |
Lim, Milven | Singapore University of Technology & Design |
Foong, Shaohui | Singapore University of Technology and Design |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: This paper presents an aerial robotic platform for rapid remote elevated overhead-perching drill operations for wood health inspection. The platform features an innovative passive prismatic-gripper mechanism affixed to the aerial robot’s top, facilitating overhead drilling. The primary aim is to enhance the safety and efficiency of elevated wood structure inspection using the resistography method, which involves drilling into wooden structures to identify internal voids. The research centers on two key enabling technologies: a gripper mechanism for secure attachment to target surfaces and a tethered drill configuration for drilling operations. The novel gripper mechanism enables drilling on large planar surfaces and even small beam-width structures. The paper concludes with discussions on design simulations and drill resistance experiments, highlighting the effectiveness of the proposed approach in detecting internal cavities within wooden structures.
|
|
10:30-12:00, Paper TuAT23-NT.2 |
Dual Quaternion Control of UAVs with Cable-Suspended Load |
|
Yuan, Yuxia | Technical University of Munich |
Ryll, Markus | Technical University Munich |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: Modeling the kinematics and dynamics of robotic systems with suspended loads using dual quaternions has not been explored so far. This paper introduces a novel control strategy using dual quaternions for UAVs with cable-suspended loads, focusing on the sling-load lifting and tracking problems. By utilizing the mathematical efficiency and compactness of dual quaternions, a unified representation of the dynamics and kinematics of the UAV and its suspended load is achieved, facilitating load lifting and trajectory tracking. Simulation results demonstrate the proposed strategy's accuracy, efficiency, and robustness. This study contributes a novel control strategy that harnesses the benefits of dual quaternions for cargo UAVs. Our work also holds promise for inspiring future innovations in the control of under-actuated systems using dual quaternions.
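To illustrate the compactness dual quaternions bring to combined rotation-translation bookkeeping, here is a generic sketch (not the paper's controller; all function names are hypothetical):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_from_pose(q, t):
    """Unit dual quaternion (real, dual) for rotation q and translation t."""
    tq = np.array([0.0, t[0], t[1], t[2]])
    return q, 0.5 * qmul(tq, q)

def dq_mul(A, B):
    """Dual-quaternion product: composes two rigid-body transforms in one go."""
    (ar, ad), (br, bd) = A, B
    return qmul(ar, br), qmul(ar, bd) + qmul(ad, br)

def dq_translation(D):
    """Recover the translation t from 2 * dual * conj(real)."""
    r, d = D
    rc = r * np.array([1.0, -1.0, -1.0, -1.0])
    return 2.0 * qmul(d, rc)[1:]

# Translate by (1, 0, 0), then yaw 90 degrees: net translation is (0, 1, 0)
T = dq_from_pose(np.array([1.0, 0.0, 0.0, 0.0]), [1.0, 0.0, 0.0])
R = dq_from_pose(np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)]),
                 [0.0, 0.0, 0.0])
combined = dq_mul(R, T)
```

A single dual-quaternion product replaces separate rotation and translation updates, which is the efficiency the abstract refers to.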
|
|
10:30-12:00, Paper TuAT23-NT.3 |
Design, Modeling and Control of a Top-Loading Fully-Actuated Cargo Transportation Multirotor |
|
Park, Wooyong | Seoul National University of Science and Technology |
Wu, Xiangyu | University of California, Berkeley |
Lee, Dongjae | Seoul National University |
Lee, Seung Jae | Seoul National University of Science and Technology |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Calibration and Identification
Abstract: Existing multirotor-based cargo transportation does not maintain a constant cargo attitude due to underactuation; however, fragile payloads may require a consistent posture. The conventional method is also cumbersome when loading cargo, and the size of the cargo that can be loaded is limited. To overcome these issues, we propose a new fully-actuated multirotor unmanned aerial vehicle platform capable of translational motion while maintaining a constant attitude. Our newly developed platform has a cubic exterior and can freely place cargo at any point on its flat top surface. However, the center-of-mass (CoM) position changes when cargo is loaded, leading to undesired attitudinal motion due to unwanted torque generation. To address this problem, we introduce a new model-free center-of-mass position estimation method named MOCE (Model-free Online Center-of-mass Estimation), which is inspired by the extremum-seeking control (ESC) technique. Experimental results validate the performance of the proposed estimation method, which effectively estimates the CoM position and shows satisfactory constant-attitude flight performance.
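The abstract only names extremum-seeking control (ESC) as the inspiration for MOCE. The following is a generic 1-D ESC loop on a made-up cost (think of attitude-error magnitude as a function of an assumed CoM offset); all gains and values are illustrative, not the paper's:

```python
import math

def cost(theta):
    """Unknown cost to minimize, e.g. attitude-error magnitude as a function
    of an assumed CoM offset; its true minimum (0.3 m) is a made-up value."""
    return (theta - 0.3) ** 2

def extremum_seek(theta0=0.0, steps=200000, dt=1e-3, a=0.05, omega=40.0, gain=2.0):
    """Classic extremum seeking: dither the estimate, demodulate the measured
    cost with the dither, and integrate toward the cost minimum."""
    theta = theta0
    for k in range(steps):
        s = math.sin(omega * k * dt)
        j = cost(theta + a * s)      # only cost measurements are available
        theta -= gain * dt * s * j   # demodulated-gradient descent step
    return theta

theta_hat = extremum_seek()  # converges toward 0.3
```

Averaging over the dither period, the update behaves like gradient descent on the cost even though no gradient is ever measured, which is why ESC suits model-free CoM estimation.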
|
|
10:30-12:00, Paper TuAT23-NT.4 |
Aerial Interaction with Tactile Sensing |
|
Guo, Xiaofeng | Carnegie Mellon University |
He, Guanqi | Carnegie Mellon University |
Mousaei, Mohammadreza | Carnegie Mellon University |
Geng, Junyi | Pennsylvania State University |
Shi, Guanya | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Force and Tactile Sensing
Abstract: While the field of autonomous Uncrewed Aerial Vehicles (UAVs) has grown rapidly, most applications only focus on passive visual tasks. Aerial interaction aims to execute tasks involving physical interactions, which offers a way to assist humans in high-altitude and high-risk operations. Tactile sensors, being both cost-effective and lightweight, are capable of sensing contact information including force distribution, as well as recognizing local textures. In this paper, we pioneer the use of vision-based tactile sensors on fully actuated UAVs in dynamic aerial manipulation tasks. We introduce a pipeline utilizing tactile feedback for force tracking via a hybrid motion-force controller and a method for wall texture detection during aerial interactions. Our experiments demonstrate that our system can effectively replace or complement traditional force/torque (F/T) sensors. Compared with only using the F/T sensor, our approach offers two solutions: substitution with tactile sensing, achieving comparable flight performance, or integration of tactile sensing with F/T sensor feedback, leading to around 16% improvement in position tracking accuracy. Our algorithm achieves 93.4% accuracy in real-time texture recognition, which further escalates to 100% in post-contact analysis. To the best of our knowledge, this is the first work to incorporate a vision-based tactile sensor into aerial interaction tasks.
|
|
10:30-12:00, Paper TuAT23-NT.5 |
A Meter-Scale Ornithopter Capable of Jumping Take-Off |
|
Yan, Wei | Shanghai Jiaotong University, Shanghai, China |
Chen, Genliang | Shanghai Jiao Tong University |
Zhang, Zhuang | Westlake University |
Wang, Hao | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Mechanism Design
Abstract: Flapping-wing air vehicles (FWAVs), or ornithopters, are bio-inspired aerial robots that mimic the flying principles of insects and birds. Autonomous take-off, a capability possessed by almost every kind of bird, is important for an FWAV to enhance its performance and extend its working time. As a common take-off method for birds, jumping take-off adapts well to different terrains and offers high energy efficiency compared with running and rotor-based take-off. Despite recent research, no FWAV to date has been capable of jumping take-off. In this paper, we present a process to realize the jumping take-off of a meter-scale FWAV from flat ground. To lower the mechanical complexity, we eliminate traditional robotic legs. Instead, we realize steady standing through a tripod-like structure consisting of the two wings and a jumping mechanism. The flapping wings are directly driven by two independent servos. Three carbon-fiber springs are employed to build a lightweight jumping module with high elastic energy. We build a dynamic model to analyze the aerodynamic effects during the jumping phase and realize a stable transition to flapping flight. This work lays the foundation for outdoor flight without human assistance.
|
|
10:30-12:00, Paper TuAT23-NT.6 |
Autonomous Aerial Perching and Unperching Using Omnidirectional Tiltrotor and Switching Controller |
|
Lee, Dongjae | Seoul National University |
Hwang, Sunwoo | Seoul National University |
Byun, Jeonghyun | Seoul National University |
Lee, Seung Jae | Seoul National University of Science and Technology |
Kim, H. Jin | Seoul National University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Motion Control
Abstract: Aerial unperching of multirotors has received little attention, as opposed to perching, which has been investigated to extend operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transitions between free flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight (approximately 1 kg), fully actuated tiltrotor that can hover at a 90-degree pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and to avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show the effectiveness of the proposed transition mode in the switching controller through ablation studies, in which large overshoot and even collision with the perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.
|
|
10:30-12:00, Paper TuAT23-NT.7 |
RotorTM: A Flexible Simulator for Aerial Transportation and Manipulation |
|
Li, Guanrui | New York University |
Xinyang, Liu | New York University |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Simulation and Animation, Motion Control
Abstract: Low-cost autonomous Micro Aerial Vehicles (MAVs) have great potential to help humans by simplifying and speeding up complex tasks, such as construction, package delivery, and search and rescue. These systems, which may consist of single or multiple vehicles, can be equipped with passive connection mechanisms such as rigid links or cables for transportation and manipulation tasks. However, these systems are inherently complex. They are often underactuated and evolve in nonlinear manifold configuration spaces. In addition, the complexity escalates for systems with cable-suspended load due to the hybrid dynamics that vary with the cables' tension conditions. This paper presents the first aerial transportation and manipulation simulator incorporating different payloads and passive connection mechanisms with full system dynamics, planning, and control algorithms. Furthermore, it includes a novel general model accounting for the transient hybrid dynamics for aerial systems with cable-suspended load to closely mimic real-world systems. Comparisons between simulations and real-world experiments with different vehicle configurations show the fidelity of the simulator results with respect to real-world settings. The experiments also show the simulator's benefit for the rapid prototyping and transitioning of aerial transportation and manipulation systems to real-world deployment.
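A minimal sketch of the taut/slack hybrid cable behavior mentioned above. RotorTM models the transient hybrid dynamics in full; this only shows the basic mode switch, with a hypothetical spring-damper taut model and made-up constants:

```python
import numpy as np

def cable_tension(p_quad, p_load, l0, k, c, v_rel):
    """Hybrid cable force on the load: a spring-damper pull when the cable is
    taut (stretched past rest length l0), zero force when slack."""
    d = p_quad - p_load
    length = np.linalg.norm(d)
    if length <= l0:                 # slack mode: a cable cannot push
        return np.zeros(3)
    u = d / length                   # unit vector from load toward quadrotor
    return (k * (length - l0) + c * v_rel) * u

# Taut: 1.2 m separation, 1.0 m rest length -> 20 N pull along +z
f_taut = cable_tension(np.array([0.0, 0.0, 1.2]), np.zeros(3), 1.0, 100.0, 0.0, 0.0)
# Slack: separation below rest length -> no force
f_slack = cable_tension(np.array([0.0, 0.0, 0.5]), np.zeros(3), 1.0, 100.0, 0.0, 0.0)
```

The discontinuity between the two branches is exactly what makes the system dynamics hybrid and the simulation nontrivial.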
|
|
10:30-12:00, Paper TuAT23-NT.8 |
Simulation and Experimental Validation of an Autonomous Perching and Takeoff Method for a Multirotor UAV on Vertical Surfaces Using a Suction Cup |
|
Chapdelaine, Bruno | National Research Council Canada |
Celce, Mathis | Polytechnique Montréal |
Vidal, Charles | Aerospace Research Centre, National Research Council Canada |
Birglen, Lionel | Ecole Polytechnique De Montreal |
Monsarrat, Bruno | Aerospace Research Centre, National Research Council Canada |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Software, Middleware and Programming Environments
Abstract: This paper details the simulation and experimental validation of an autonomous perching and take-off method for a multirotor unmanned aerial vehicle (UAV) using a suction cup perching mechanism on vertical surfaces. The suction cup interaction with different surface types is characterized with experimental tests to accurately model the perching manoeuvre. The resulting model is used to develop a realistic hardware-in-the-loop (HIL) simulation of the perching and take-off manoeuvre of the UAV in Gazebo. A control method is developed to automate the perching and take-off manoeuvre. The method is tested in simulation and is experimentally validated with flight tests. Comparisons between simulation and experimental data demonstrate that the simulation is accurate and can be used to continue the development of autonomous perching methods.
|
|
TuAT24-NT Oral Session, NT-G402 |
Visual-Inertial SLAM |
|
|
Chair: Huang, Guoquan | University of Delaware |
Co-Chair: Zhang, Hong | Southern University of Science and Technology |
|
10:30-12:00, Paper TuAT24-NT.1 |
Field-VIO: Stereo Visual-Inertial Odometry Based on Quantitative Windows in Agricultural Open Fields |
|
Sun, Jianjing | Anhui University |
Wu, Shuang | Hefei Institutes of Physical Science, Chinese Academy of Science |
Dong, Jun | Hefei Institutes of Physical Science, Chinese Academy of Science |
He, JunMing | Hefei Institutes of Physical Science, Chinese Academy of Science |
Keywords: Visual-Inertial SLAM, Agricultural Automation, Field Robots
Abstract: In agricultural open fields, accurate autonomous localization of robots requires long-term data correlation to reduce cumulative error. Our article presents a Stereo Visual-Inertial Odometry (VIO) system based on ORB-SLAM3 to address the failure of Loop Closure Detection (LCD) methods in this environment. We first propose the concept of quantitative windows to describe the robot's trajectory along the crop rows. We design a driving-state quantification algorithm and accurately separate the quantitative windows between the crop rows. Our system constructs spatial constraints according to the parallelism between quantitative windows. We apply an anomaly-correction method to maintain the constructed parallel matching relationships and implement holistic pose correction for keyframes within abnormal quantitative windows. Our system demonstrated excellent performance over long distances in experiments on the Rosario dataset, verifying its effectiveness in reducing cumulative positioning error in agricultural open fields.
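The abstract does not give the exact form of the parallelism constraint between quantitative windows; one plausible residual, shown purely as an illustration, penalizes misalignment of the windows' direction vectors up to sign (adjacent crop rows are typically traversed in opposite directions):

```python
import numpy as np

def parallelism_residual(dir_a, dir_b):
    """Residual that is zero when two trajectory-window direction vectors are
    parallel (up to sign) and grows toward 1 as they become orthogonal."""
    a = dir_a / np.linalg.norm(dir_a)
    b = dir_b / np.linalg.norm(dir_b)
    return 1.0 - abs(float(np.dot(a, b)))

r_parallel = parallelism_residual(np.array([1.0, 0.0, 0.0]),
                                  np.array([-2.0, 0.0, 0.0]))
r_orthogonal = parallelism_residual(np.array([1.0, 0.0, 0.0]),
                                    np.array([0.0, 1.0, 0.0]))
```

Such a residual could be minimized alongside the usual VIO terms to pull matched windows back into parallel alignment.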
|
|
10:30-12:00, Paper TuAT24-NT.2 |
Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry |
|
Li, Haolong | Max Planck Institute for Intelligent Systems |
Stueckler, Joerg | Max Planck Institute for Intelligent Systems |
Keywords: Visual-Inertial SLAM, Calibration and Identification, Wheeled Robots
Abstract: Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach that tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled-vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but even improve tracking accuracy.
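For readers unfamiliar with single-track models: a minimal kinematic variant integrated by Euler steps looks like the following. The paper uses a singularity-free, differentiable dynamics formulation with online-calibrated parameters; this sketch and its constants are only illustrative:

```python
import math

def single_track_step(state, v, steer, dt, wheelbase=0.3):
    """One Euler step of a kinematic single-track ("bicycle") model.
    state = (x, y, yaw); v is forward speed, steer is the front-wheel angle."""
    x, y, yaw = state
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return (x, y, yaw)

# Forward-predict 1 s of a constant-steer arc (made-up control inputs)
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = single_track_step(state, v=0.5, steer=0.1, dt=0.01)
```

Forward prediction conditioned on future controls, as in the abstract, amounts to rolling such a model forward from the current VIO state with the planned inputs.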
|
|
10:30-12:00, Paper TuAT24-NT.3 |
VI-HSO: Hybrid Sparse Monocular Visual-Inertial Odometry |
|
Yang, Wenzhe | Dalian University of Technology |
Zhuang, Yan | Dalian University of Technology |
Luo, Dongting | Dalian University of Technology |
Zhang, Xuetao | Dalian University of Technology |
Wang, Wei | College of Control Science and Engineering, Dalian University Of |
Zhang, Hong | Southern University of Science and Technology |
Keywords: Visual-Inertial SLAM, Localization
Abstract: In this letter, we present VI-HSO, a hybrid sparse monocular visual-inertial odometry system based on two innovative techniques called adaptive interframe alignment (AIA) and dynamic inverse distance filter (DIDF). Although the sparse image alignment algorithm is efficient for calculating frame-to-frame motion, it tends to fail in cases of significant intensity changes and motion blur. To overcome these limitations, we propose an adaptive interframe alignment method that adaptively selects between the original Lucas-Kanade (LK) method and the inverse compositional method when constructing photometric errors, and adds inertial information to the process. This approach enables the tracking phase to utilize the full image and inertial information. During intense motion, the inverse distance of a new candidate point often fails to converge, leading to either scale drift or tracking failure. We present a dynamic inverse distance filter that adjusts the convergence range used to update candidate points' inverse distances. This adjustment is based on the convergence ratio of the inverse distances of keyframes, which yields more convergent map points, aiding robust tracking in texture-poor regions and during rapid rotation. We evaluate the performance of VI-HSO on public datasets and in real-world experiments, and our system outperforms state-of-the-art algorithms. The code is published at https://github.com/luodongting/VI-HSO.
|
|
10:30-12:00, Paper TuAT24-NT.4 |
Square-Root Inverse Filter-Based GNSS-Visual-Inertial Navigation |
|
Hu, Jun | Meituan Inc |
Lang, Xiaoming | Meituan |
Zhang, Feng | Meituan |
Mao, Yinian | Meituan-Dianping Group |
Huang, Guoquan | University of Delaware |
Keywords: Visual-Inertial SLAM, Localization, SLAM
Abstract: While the Global Navigation Satellite System (GNSS) is often used to provide global positioning when available, its intermittency and/or inaccuracy calls for fusion with other sensors. In this paper, we develop a novel GNSS-Visual-Inertial Navigation System (GVINS) that fuses visual, inertial, and raw GNSS measurements within the square-root inverse sliding window filtering (SRI-SWF) framework in a tightly coupled fashion, and is thus termed SRI-GVINS. In particular, for the first time, we deeply fuse the GNSS pseudorange, Doppler shift, single-differenced pseudorange, and double-differenced carrier phase measurements, along with the visual-inertial measurements. Inherited from the SRI-SWF, the proposed SRI-GVINS gains significant numerical stability and computational efficiency over the state-of-the-art methods. Additionally, we propose to use a filter to sequentially initialize the reference frame transformation until it converges, rather than collecting measurements for batch optimization. We also perform online calibration of the GNSS-IMU extrinsic parameters to mitigate possible extrinsic parameter degradation. The proposed SRI-GVINS is extensively evaluated on our own collected UAV datasets, and the results demonstrate that the proposed method is able to suppress VIO drift in real time; they also show the effectiveness of online GNSS-IMU extrinsic calibration. Experimental validation on public datasets further reveals that SRI-GVINS outperforms the state-of-the-art methods in terms of both accuracy and efficiency.
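The square-root inverse measurement update at the heart of an SRI-SWF can be sketched with a QR re-triangularization; this is a textbook-style illustration, not the paper's implementation, and the toy numbers are made up:

```python
import numpy as np

def sri_update(R, r, H, z, sigma):
    """One square-root inverse measurement update.
    (R, r) encode the prior in square-root information form (R x = r, with R
    upper triangular); H, z are the measurement Jacobian and residual with
    noise standard deviation sigma. Stacking the whitened measurement under R
    and re-triangularizing by QR yields the updated factor."""
    A = np.vstack([R, H / sigma])
    b = np.concatenate([r, z / sigma])
    Q, R_new = np.linalg.qr(A)      # reduced QR keeps the factor square
    return R_new, Q.T @ b

# Toy example: prior estimate [1, 2] with unit information, then a precise
# measurement x1 + x2 = 4
R_new, r_new = sri_update(np.eye(2), np.array([1.0, 2.0]),
                          np.array([[1.0, 1.0]]), np.array([4.0]), sigma=0.1)
x_map = np.linalg.solve(R_new, r_new)   # posterior, close to [1.5, 2.5]
```

Working with the triangular square-root factor rather than the full information matrix is the source of the numerical stability the abstract mentions.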
|
|
10:30-12:00, Paper TuAT24-NT.5 | Add to My Program |
Omnidirectional Dense SLAM for Back-To-Back Fisheye Cameras |
|
Xie, Weijian | Zhejiang University, SenseTime Research |
Chu, Guanyi | SenseTime |
Qian, Quanhao | SenseTime Research |
Yu, Yihao | Zhejiang Sensetime Technology Development Co., Ltd |
Zhai, Shangjin | Sensetime Research |
Chen, Danpeng | Zhejiang University, Sensetime Research and Tetras.AI |
Wang, Nan | Sensetime |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Visual-Inertial SLAM, Mapping, Omnidirectional Vision
Abstract: We propose a real-time visual-inertial dense SLAM system that utilizes the online data streams from a back-to-back dual-fisheye camera setup, providing 360 degree coverage of the environment. Firstly, we employ a sliding-window-based front-end to estimate real-time poses from the binocular fisheye images and IMU data. Then, we implement a lightweight panoramic depth completion network based on a multi-basis depth representation. The network takes panoramic images (obtained by stitching dual-fisheye images with extrinsic and intrinsic parameters) and sparse depths (generated by the front-end local tracking) as input and predicts multiple depth bases along with corresponding confidences as output. The final dense depth is a linear combination of the multiple depth bases. Thanks to the multi-basis depth representation, we can continuously optimize the 360 degree depth with a traditional optimizer to achieve higher global consistency in depth. We conducted experiments on both simulated and real-world datasets to evaluate our method. The results demonstrate that the proposed method outperforms SoTA methods in terms of depth prediction and 3D reconstruction. In addition, we develop a demo that can run on a mobile device to demonstrate the real-time capabilities of our method.
|
|
10:30-12:00, Paper TuAT24-NT.6 | Add to My Program |
Visual Inertial Odometry Using Focal Plane Binary Features (BIT-VIO) |
|
Lisondra, Matthew | Toronto Metropolitan University |
Kim, Junseo | Toronto Metropolitan University |
Murai, Riku | Imperial College London |
Zareinia, Kourosh | Ryerson University |
Saeedi, Sajad | Toronto Metropolitan University |
Keywords: Visual-Inertial SLAM, Sensor Fusion
Abstract: Focal-Plane Sensor-Processor Arrays (FPSPs) are an emerging technology that can execute vision algorithms directly on the image sensor. Unlike conventional cameras, FPSPs perform computation on the image plane – at individual pixels – enabling high frame rate image processing while consuming low power, making them ideal for mobile robotics. FPSPs, such as the SCAMP-5, use parallel processing and are based on the Single Instruction Multiple Data (SIMD) paradigm. In this paper, we present BIT-VIO, the first Visual Inertial Odometry (VIO) which utilises the SCAMP-5. BIT-VIO is a loosely-coupled iterated Extended Kalman Filter (iEKF) which fuses visual odometry running at 300 FPS with predictions from 400 Hz IMU measurements to provide accurate and smooth trajectories.
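A loosely-coupled filter of this kind interleaves high-rate IMU propagation with lower-rate visual-odometry updates. The sketch below is a much-simplified 1D, non-iterated Kalman filter that only illustrates that predict/update pattern; BIT-VIO's iEKF on the SCAMP-5 is far more involved, and all names and noise values here are hypothetical.

```python
import numpy as np

class LooselyCoupledKF:
    """Minimal 1D fusion loop: high-rate IMU propagation of [position,
    velocity] with lower-rate visual-odometry position updates."""
    def __init__(self, q=1e-3, r=1e-2):
        self.x = np.zeros(2)              # state: [position, velocity]
        self.P = np.eye(2)                # state covariance
        self.q, self.r = q, r             # process / measurement noise

    def propagate(self, accel, dt):
        """IMU step (e.g. at 400 Hz): integrate acceleration."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt ** 2, dt]) * accel
        self.P = F @ self.P @ F.T + self.q * np.eye(2)

    def update_position(self, z):
        """VO step (e.g. at 300 FPS): correct with a position measurement."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.r
        K = (self.P @ H.T) / S
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Propagating a constant 1 m/s^2 acceleration for one second yields position 0.5 m and velocity 1 m/s, after which a VO update pulls the position toward the measurement without overshooting it.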
|
|
10:30-12:00, Paper TuAT24-NT.7 | Add to My Program |
PL-EVIO: Robust Monocular Event-Based Visual Inertial Odometry with Point and Line Features (I) |
|
Guan, Weipeng | The University of Hong Kong |
Chen, Peiyu | The University of Hong Kong |
Xie, Yuhan | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Aerial Systems: Perception and Autonomy
Abstract: Robust state estimation in challenging situations is still an unsolved problem, especially achieving onboard pose feedback control for aggressive motion. In this paper, we propose a robust and real-time event-based visual-inertial odometry (VIO) that incorporates event, image, and inertial measurements. Our approach utilizes line-based event features to provide additional structure and constraint information in human-made scenes, while point-based event and image features complement each other through well-designed feature management. To achieve reliable state estimation, we tightly couple the point-based and line-based visual residuals from the event camera, the point-based visual residual from the standard camera, and the residual from IMU pre-integration using a keyframe-based graph optimization framework. Experiments on public benchmark datasets show that our method can achieve superior performance compared with state-of-the-art image-based or event-based VIO. Furthermore, we demonstrate the effectiveness of our pipeline through onboard closed-loop quadrotor aggressive flight and large-scale outdoor experiments. Videos of the evaluations can be found on our website: https://youtu.be/KnWZ4anBMK4.
|
|
10:30-12:00, Paper TuAT24-NT.8 | Add to My Program |
JacobiGPU: GPU-Accelerated Numerical Differentiation for Loop Closure in Visual SLAM |
|
Kumar, Dhruv | Simon Fraser University |
Gopinath, Shishir | Simon Fraser University |
Dantu, Karthik | University of Buffalo |
Ko, Steve | Simon Fraser University |
Keywords: Visual-Inertial SLAM, SLAM
Abstract: In this paper, we introduce JacobiGPU, a technique that uses a GPU to improve the efficiency of loop closure in visual-inertial SLAM systems, particularly when approximating Jacobians using the Finite Difference Method (FDM). Traditional FDM techniques often face computational overhead due to repeated perturbations in pose graphs. We address this overhead with a novel methodology, leveraging strategic graph partitioning and an optimized approach to Jacobian approximation. By integrating JacobiGPU into ORB-SLAM3’s g2o, we enhance the linearization process. Our evaluation, conducted on 12 sequences of varying lengths from the EuRoC and TUM-VI datasets, demonstrated a speedup of up to 4.23x in the linearization stage and up to 2.08x in the overall optimization process.
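The finite-difference Jacobian approximation that JacobiGPU accelerates amounts to perturbing each state variable and re-evaluating the residual; the per-perturbation evaluations are independent, which is what makes them amenable to GPU batching. A serial NumPy sketch of the central-difference version (names are hypothetical):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f: R^n -> R^m.
    Each column needs two perturbed evaluations of f; all 2n evaluations
    are independent of one another (here done serially)."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x + dx)) - np.atleast_1d(f(x - dx))) / (2 * eps)
    return J
```

For smooth residuals the central difference is second-order accurate, so it closely matches the analytic Jacobian at moderate step sizes.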
|
|
10:30-12:00, Paper TuAT24-NT.9 | Add to My Program |
MAVIS: Multi-Camera Augmented Visual-Inertial SLAM Using SE2(3) Based Exact IMU Pre-Integration |
|
Wang, Yifu | Tencent |
Ng, Yonhon | Tencent |
Sa, Inkyu | Tencent |
Parra, Alvaro | The University of Adelaide |
Rodriguez, Cristian | Australian Institute for Machine Learning |
Lin, Tao Jun | Australian National University |
Li, Hongdong | Australian National University and NICTA |
Keywords: Visual-Inertial SLAM, SLAM, Localization
Abstract: We present a novel optimization-based Visual-Inertial SLAM system designed for multiple partially overlapped camera systems, named MAVIS. Our framework fully exploits the benefits of the wide field-of-view from multi-camera systems and the metric scale measurements provided by an inertial measurement unit (IMU). We introduce an improved IMU pre-integration formulation based on the exponential function of the automorphism of SE_2(3), which can effectively enhance tracking performance under fast rotational motion and extended integration time. Furthermore, we extend the conventional front-end tracking and back-end optimization modules designed for monocular or stereo setups to multi-camera systems, and introduce implementation details that contribute to the performance of our system in challenging scenarios. The practical validity of our approach is supported by our experiments on public datasets. Our MAVIS won first place in all vision-IMU tracks (single and multi-session SLAM) of the Hilti SLAM Challenge 2023, with 1.7 times the score of the second-place entry.
|
|
TuAT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization I |
|
|
Chair: Cho, Younggun | Inha University |
Co-Chair: Oleynikova, Helen | ETH Zurich |
|
10:30-12:00, Paper TuAT25-NT.1 | Add to My Program |
Salience-Guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments |
|
Park, Jooyong | Inha University |
Lee, Jungwoo | Inha University |
Choi, Euncheol | Inha University |
Cho, Younggun | Inha University |
Keywords: Localization, Intelligent Transportation Systems, SLAM
Abstract: In urban environments for delivery robots, particularly in areas such as campuses and towns, many custom features defy standard road semantic categorizations. Addressing this challenge, our paper introduces a method leveraging Salient Object Detection (SOD) to extract these unique features, employing them as pivotal factors for enhanced robot loop closure and localization. Traditional geometric feature-based localization is hampered by fluctuating illumination and appearance changes. Our preference for SOD over semantic segmentation sidesteps the intricacies of classifying a myriad of non-standardized urban features. To achieve consistent ground features, the Motion Compensate IPM (MC-IPM) technique is implemented, capitalizing on motion for distortion compensation and subsequently selecting the most pertinent salient ground features through moment computations. For thorough evaluation, we validated saliency detection and localization performance in real urban scenarios. Project page: https://sites.google.com/view/salient-ground-factor/home.
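Moment computations over a binary saliency mask — area, centroid, and spatial spread — are the kind of statistics such a selection step can use to rank candidate ground features. A generic sketch (the paper's exact moment-based criterion may differ; names are hypothetical):

```python
import numpy as np

def mask_moments(mask):
    """Zeroth/first/second image moments of a binary saliency mask.
    Returns pixel area, centroid (x, y), and spatial spread (trace of the
    second central moments), usable as a crude feature-ranking statistic."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()
    mu20 = np.mean((xs - cx) ** 2)     # variance along x
    mu02 = np.mean((ys - cy) ** 2)     # variance along y
    return {"area": area, "centroid": (cx, cy), "spread": mu20 + mu02}
```

Compact, well-localized masks score a small spread relative to area, which is one plausible way to prefer stable ground features.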
|
|
10:30-12:00, Paper TuAT25-NT.2 | Add to My Program |
Block-Map-Based Localization in Large-Scale Environment |
|
Feng, Yixiao | University of New South Wales |
Jiang, Zhou | Beijing Institute of Technology |
Shi, Yongliang | Tsinghua University |
Feng, Yunlong | ShanghaiTech University |
Chen, Xiangyu | Liverpool John Moores University |
Zhao, Hao | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Keywords: Localization, Mapping
Abstract: Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization incur increasing computational load as the map size grows, which affects downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational load caused by maintaining large-scale maps. Firstly, we introduce a method for generating block maps and the corresponding switching strategies, ensuring that the robot can estimate its state in large-scale environments by loading only local map information. Secondly, global localization according to Branch-and-Bound Search (BBS) in the 3D map is introduced to provide the initial pose. Finally, a graph-based optimization method is adopted with a dynamic sliding window that determines which factors are marginalized depending on whether the robot remains within a BM or switches to another one, which maintains the accuracy and efficiency of pose tracking. Comparison experiments are performed on publicly available large-scale datasets. Results show that the proposed method can track the robot pose even when the map scale exceeds 6 kilometers, while efficient and accurate localization is still guaranteed on NCLT and M2DGR.
|
|
10:30-12:00, Paper TuAT25-NT.3 | Add to My Program |
Subsurface Feature-Based Ground Robot/Vehicle Localization Using a Ground Penetrating Radar |
|
Li, Haifeng | Civil Aviation University of China |
Guo, Jiajun | Civil Aviation University of China |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Localization, Mapping
Abstract: Robot localization using subsurface features captured by Ground-Penetrating Radar (GPR) complements and improves robustness over existing common sensor modalities, as subsurface features are less sensitive to weather, season, and surface scene changes. Here, we propose a novel subsurface feature-based localization method that uses only GPR measurements with a known subsurface map. An efficient feature descriptor, the dominant energy curve (DEC), is designed to identify different locations in cluttered conditions. Specifically, image processing techniques that involve background segmentation, energy point detection, and energy curve refinement are designed to extract DEC features from a 2D radargram. With DEC features obtained, a metric subsurface feature map is constructed. Finally, we perform robot localization by feature matching under a particle swarm optimization framework. We have implemented our method and tested it with the public CMU-GPR dataset. The results show that our algorithm improves accuracy and robustness with real-time performance for robot localization tasks. Specifically, the mean localization error is 0.503 m for all cases.
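The dominant energy curve idea — suppress the radargram background, then trace the strongest return in each along-track column and refine the result — can be sketched as follows. This is a simplified stand-in for the paper's segmentation, detection, and refinement pipeline; names are hypothetical.

```python
import numpy as np

def dominant_energy_curve(radargram, smooth=3):
    """Extract a toy dominant energy curve (DEC) from a 2D radargram
    (rows = depth samples, cols = along-track positions): remove the
    per-row mean trace as background, pick the strongest return per
    column, then median-filter the raw curve."""
    g = np.abs(radargram - radargram.mean(axis=1, keepdims=True))
    curve = g.argmax(axis=0).astype(float)
    k = smooth // 2
    padded = np.pad(curve, k, mode="edge")   # simple median smoothing
    return np.array([np.median(padded[i:i + smooth])
                     for i in range(len(curve))])
```

On a synthetic radargram with a bright dipping reflector over a constant background, the extracted curve follows the reflector's depth per column.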
|
|
10:30-12:00, Paper TuAT25-NT.4 | Add to My Program |
Colmap-PCD: An Open-Source Tool for Fine Image-To-Point Cloud Registration |
|
Bai, Chunge | AgiBot Technology Co. Ltd |
Fu, Ruijie | Carnegie Mellon University |
Gao, Xiang | AgiBot Technology Co. Ltd |
Keywords: Localization, Mapping, Methods and Tools for Robot System Design
Abstract: State-of-the-art techniques for monocular camera reconstruction predominantly rely on the Structure from Motion (SfM) pipeline. However, such methods often yield reconstruction outcomes that lack crucial scale information, and the accumulation of images over time leads to inevitable drift. In contrast, mapping methods based on LiDAR scans are popular in large-scale urban scene reconstruction due to their precise distance measurements, a capability fundamentally absent in visual-based approaches. Researchers have made attempts to utilize concurrent LiDAR and camera measurements in pursuit of precise scaling and color details within mapping outcomes. However, the outcomes are subject to extrinsic calibration and time synchronization precision. In this paper, we propose a novel cost-effective reconstruction pipeline that utilizes a pre-established LiDAR map as a fixed constraint to effectively address the inherent scale challenges present in monocular camera reconstruction. To our knowledge, our method is the first to register images onto the point cloud map without requiring synchronous capture of camera and LiDAR data, granting us the flexibility to manage reconstruction detail levels across various areas of interest. To facilitate further research in this domain, we have released Colmap-PCD, an open-source tool leveraging the Colmap algorithm, that enables precise fine-scale registration of images to the point cloud map.
|
|
10:30-12:00, Paper TuAT25-NT.5 | Add to My Program |
COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry |
|
Pfreundschuh, Patrick | ETH Zurich |
Oleynikova, Helen | ETH Zurich |
Cadena Lerma, Cesar | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Andersson, Olov | KTH Royal Institute of Technology |
Keywords: Localization, Mapping, SLAM
Abstract: We present COIN-LIO, a LiDAR Inertial Odometry pipeline that tightly couples information from LiDAR intensity with geometry-based point cloud registration. The focus of our work is to improve the robustness of LiDAR-inertial odometry in geometrically degenerate scenarios, like tunnels or flat fields. We project LiDAR intensity returns into an image, and present a novel image processing pipeline that produces filtered images with improved brightness consistency within the image as well as across different scenes. We effectively leverage intensity as an additional modality, using our new feature selection scheme that detects uninformative directions in the point cloud registration and explicitly selects patches with complementary image information. Photometric error minimization in the image patches is then fused with inertial measurements and point-to-plane registration in an iterated Extended Kalman Filter. The proposed approach improves accuracy and robustness on a public dataset. We additionally publish a new dataset, that captures five real-world environments in challenging, geometrically degenerate scenes. By using the additional photometric information, our approach shows drastically improved robustness against geometric degeneracy in environments where all compared baseline approaches fail.
|
|
10:30-12:00, Paper TuAT25-NT.6 | Add to My Program |
MegaParticles: Range-Based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Keywords: Localization, Range Sensing, SLAM
Abstract: This paper presents a 6-DoF range-based Monte Carlo localization method with a GPU-accelerated Stein particle filter. To update a massive number of particles, we propose a Gauss-Newton-based Stein variational gradient descent (SVGD) with iterative neighbor particle search. This method uses SVGD to collectively update particle states with gradient and neighborhood information, which provides efficient particle sampling. For an efficient neighbor particle search, it uses locality sensitive hashing and iteratively updates the neighbor list of each particle over time. The neighbor list is then used to propagate the posterior probabilities of particles over the neighbor particle graph. The proposed method is capable of evaluating one million particles in real-time on a single GPU and enables robust pose initialization and re-localization without an initial pose estimate. In experiments, the proposed method showed extreme robustness to complete sensor occlusion (i.e., kidnapping), and enabled pinpoint sensor localization without any prior information.
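The SVGD update at the heart of this family of methods combines a kernel-weighted attraction toward high-probability regions with a repulsive term that keeps particles spread out. Below is a plain-NumPy toy of one such update with an RBF kernel, without the Gauss-Newton formulation, neighbor search, or GPU batching of the paper; names and parameters are hypothetical.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step=0.5, h=1.0):
    """One Stein variational gradient descent update on an (n, d) particle
    set. The kernel term pulls particles toward high density; the
    kernel-gradient term repels them from each other."""
    n = particles.shape[0]
    diff = particles[:, None, :] - particles[None, :, :]   # x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)            # k(x_j, x_i)
    grads = np.stack([grad_log_p(x) for x in particles])   # (n, d)
    attract = K @ grads
    repel = -(2.0 / h) * (K @ particles - K.sum(axis=0)[:, None] * particles)
    return particles + step * (attract + repel) / n
```

Iterating this step drives a particle set toward the target density while retaining diversity, which is what makes it usable for multi-hypothesis pose estimation.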
|
|
10:30-12:00, Paper TuAT25-NT.7 | Add to My Program |
Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Keywords: Localization, Range Sensing, SLAM
Abstract: This paper presents a range inertial localization algorithm for a 3D prior map. The proposed algorithm tightly couples scan-to-scan and scan-to-map point cloud registration factors along with IMU factors on a sliding window factor graph. The tight coupling of the scan-to-scan and scan-to-map registration factors enables a smooth fusion of sensor ego-motion estimation and map-based trajectory correction that results in robust tracking of the sensor pose under severe point cloud degeneration and defective regions in a map. We also propose an initial sensor state estimation algorithm that robustly estimates the gravity direction and IMU state and helps perform global localization in 3- or 4-DoF for system initialization without prior position information. Experimental results show that the proposed method outperforms existing state-of-the-art methods in extremely severe situations where the point cloud data becomes degenerate, there are momentary sensor interruptions, or the sensor moves along the map boundary or into unmapped regions.
|
|
10:30-12:00, Paper TuAT25-NT.8 | Add to My Program |
SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints |
|
Carmichael, Spencer | University of Michigan |
Agrawal, Rahul | University of Michigan |
Vasudevan, Ram | University of Michigan |
Skinner, Katherine | University of Michigan |
Keywords: Localization, SLAM
Abstract: Recognizing places from an opposing viewpoint during a return trip is a common experience for human drivers. However, the analogous robotics capability, visual place recognition (VPR) with limited field of view cameras under 180 degree rotations, has proven to be challenging to achieve. To address this problem, this paper presents Same Place Opposing Trajectory (SPOT), a technique for opposing viewpoint VPR that relies exclusively on structure estimated through stereo visual odometry (VO). The method extends recent advances in lidar descriptors and utilizes a novel double (similar and opposing) distance matrix sequence matching method. We evaluate SPOT on a publicly available dataset with 6.7-7.6 km routes driven in similar and opposing directions under various lighting conditions. The proposed algorithm demonstrates remarkable improvement over the state-of-the-art, achieving up to 91.7% recall at 100% precision in opposing viewpoint cases, while requiring less storage than all baselines tested and running faster than all but one. Moreover, the proposed method assumes no a priori knowledge of whether the viewpoint is similar or opposing, and also demonstrates competitive performance in similar viewpoint cases.
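Matching a query sequence against a reference in both directions can be done by scoring forward and backward diagonals of the pairwise distance matrix; the toy below returns the best start index together with whether the traverse was similar or opposing. This is a bare-bones simplification of SPOT's double distance-matrix method, and all names are hypothetical.

```python
import numpy as np

def sequence_match(query, ref, seq_len=5):
    """Score every reference start index with the best forward (similar
    viewpoint) and backward (opposing viewpoint) diagonal sum through the
    pairwise descriptor-distance matrix; return (score, start, direction)."""
    D = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)  # (Q, R)
    R = D.shape[1]
    best = (np.inf, None, None)
    for r in range(R - seq_len + 1):
        fwd = sum(D[q, r + q] for q in range(seq_len))                # same direction
        bwd = sum(D[q, r + seq_len - 1 - q] for q in range(seq_len))  # opposing
        for score, direction in ((fwd, "similar"), (bwd, "opposing")):
            if score < best[0]:
                best = (score, r, direction)
    return best
```

With 1D toy descriptors, a query that re-traverses a stretch of the reference backwards is recovered on the anti-diagonal, with no a priori knowledge of the direction.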
|
|
TuAT26-NT Oral Session, NT-G404 |
Add to My Program |
Localization and Navigation |
|
|
Chair: Garg, Sourav | University of Adelaide |
Co-Chair: Fang, Guoxin | The University of Manchester |
|
10:30-12:00, Paper TuAT26-NT.1 | Add to My Program |
An Onboard Framework for Staircases Modeling Based on Point Clouds |
|
Qing, Chun | Shenzhen Technology University |
Zeng, Rongxiang | Shenzhen Technology University |
Wu, Xuan | Shenzhen Technology University |
Shi, Yongliang | Tsinghua University |
Ma, Gan | Shenzhen Technology University |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, Vision-Based Navigation
Abstract: The detection of traversable regions on staircases and the physical modeling constitute pivotal aspects of the mobility of legged robots. This paper presents an onboard framework tailored to the detection of traversable regions and the modeling of physical attributes of staircases from point cloud data. To mitigate the influence of illumination variations and overfitting due to dataset diversity, a series of data augmentations are introduced to enhance the training of the fundamental network. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce the ambiguity of prediction on the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on the pose estimation of stairs is introduced to calibrate the output of raw modeling that is influenced by tilted perspectives. Lastly, we collected a dataset pertaining to staircases and introduced new evaluation criteria. Through a series of rigorous experiments conducted on this dataset, we substantiate the superior accuracy and generalization capabilities of our proposed method. Codes, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modelling-project.
|
|
10:30-12:00, Paper TuAT26-NT.2 | Add to My Program |
V-STRONG: Visual Self-Supervised Traversability Learning for Off-Road Navigation |
|
Jung, Sanghun | University of Washington |
Lee, JoonHo | University of Washington |
Meng, Xiangyun | University of Washington |
Boots, Byron | University of Washington |
Lambert, Alexander | University of Washington |
Keywords: Deep Learning for Visual Perception, Learning from Experience, Field Robots
Abstract: Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly-supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability prediction, leveraging a state-of-the-art vision foundation model for improved out-of-distribution performance. Our method employs contrastive representation learning using both human driving data and instance-based segmentation masks during training. We show that this simple, yet effective, technique drastically outperforms recent methods in predicting traversability for both on- and off-trail driving scenarios. We compare our method with recent baselines on both a common benchmark as well as our own datasets, covering a diverse range of outdoor environments and varied terrain types. We also demonstrate the compatibility of resulting costmap predictions with a model-predictive controller. Finally, we evaluate our approach on zero- and few-shot tasks, demonstrating unprecedented performance for generalization to new environments. Videos and additional material can be found here: https://sites.google.com/view/visual-traversability-learning.
|
|
10:30-12:00, Paper TuAT26-NT.3 | Add to My Program |
Follow the Footprints: Self-Supervised Traversability Estimation for Off-Road Vehicle Navigation Based on Geometric and Visual Cues |
|
Jeon, Yurim | Seoul National University |
Son, E-In | Seoul National University |
Seo, Seung-Woo | Seoul National University |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation, Deep Learning Methods
Abstract: In this study, we address the off-road traversability estimation problem, which predicts the areas where a robot can navigate in off-road environments. An off-road environment is an unstructured environment comprising a combination of traversable and non-traversable spaces, which presents a challenge for estimating traversability. This study highlights three primary factors that affect a robot's traversability in an off-road environment: surface slope, semantic information, and the robot platform. We present two strategies for estimating traversability, using a guide filter network (GFN) and a footprint supervision module (FSM). The first strategy involves building a novel GFN using a newly designed guide filter layer. The GFN interprets the surface and semantic information from the input data and integrates them to extract features optimized for traversability estimation. The second strategy involves developing an FSM, a self-supervision module that utilizes the path traversed by the robot in pre-driving, also known as a footprint. This enables the prediction of traversability that reflects the characteristics of the robot platform. Based on these two strategies, the proposed method overcomes the limitations of existing methods, which require laborious human supervision and lack scalability. Extensive experiments in diverse conditions, covering automobiles and unmanned ground vehicles as well as herbfields, woodlands, and farmlands, demonstrate that the proposed method is compatible with various robot platforms and adaptable to a range of terrains.
|
|
10:30-12:00, Paper TuAT26-NT.4 | Add to My Program |
Learning to Predict Navigational Patterns from Partial Observations |
|
Karlsson, Robin | Nagoya University |
Carballo, Alexander | Gifu University |
Lepe-Salazar, Francisco | Ludolab |
Fujii, Keisuke | Nagoya University |
Ohtani, Kento | Nagoya University |
Takeda, Kazuya | Nagoya University |
Keywords: Vision-Based Navigation, Semantic Scene Understanding, Continual Learning
Abstract: Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This letter presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception.
|
|
10:30-12:00, Paper TuAT26-NT.5 | Add to My Program |
TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation |
|
Shen, Yehui | NorthEast University |
Liu, Mingmin | SIASUN Robot & Automation CO., Ltd |
Lu, Huimin | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: Localization, Deep Learning Methods, Transfer Learning
Abstract: Visual place recognition (VPR) plays a pivotal role in the autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose a high-performance teacher and lightweight student distillation framework called TSCM. It exploits our devised cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining superior performance while enabling minimal computational load during deployment. We conduct comprehensive evaluations on large-scale datasets, namely Pittsburgh30k and Pittsburgh250k. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other counterparts. Implementation of our method has been released as open source at https://github.com/shenyehui/TSCM.
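One simple way to distill descriptor structure across models of different sizes — loosely in the spirit of cross-metric distillation, though not the paper's actual loss — is to align the student's pairwise cosine-similarity matrix with the teacher's. This works even when the two descriptor dimensionalities differ, since only the batch-level similarity structure is compared; all names are hypothetical.

```python
import numpy as np

def similarity_distill_loss(teacher_desc, student_desc):
    """Mean squared difference between teacher and student pairwise
    cosine-similarity matrices over a batch of place descriptors.
    Shapes (n, d_t) and (n, d_s) may have different descriptor dims."""
    def cos_sim(X):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return Xn @ Xn.T
    diff = cos_sim(teacher_desc) - cos_sim(student_desc)
    return np.mean(diff ** 2)
```

Because cosine similarity is invariant to per-descriptor scaling, a student that reproduces the teacher's similarity structure achieves zero loss regardless of descriptor magnitude or dimensionality.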
|
|
10:30-12:00, Paper TuAT26-NT.6 | Add to My Program |
3D-BBS: Global Localization for 3D Point Cloud Scan Matching Using Branch-And-Bound Algorithm |
|
Aoki, Koki | Meijo University |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Meguro, Junichi | Meijo University |
Keywords: Localization
Abstract: This paper presents an accurate and fast 3D global localization method, 3D-BBS, that extends the existing branch-and-bound (BnB)-based 2D scan matching (BBS) algorithm. To reduce memory consumption, we utilize a sparse hash table for storing hierarchical 3D voxel maps. To improve the processing cost of BBS in 3D space, we propose an efficient roto-translational space branching. Furthermore, we devise a batched BnB algorithm to fully leverage GPU parallel processing. Through experiments in simulated and real environments, we demonstrated that the 3D-BBS enabled accurate global localization with only a 3D LiDAR scan roughly aligned in the gravity direction and a 3D pre-built map. This method required only 878 msec on average to perform global localization and outperformed state-of-the-art global registration methods in terms of accuracy and processing speed.
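The branch-and-bound principle behind BBS-style scan matching can be shown in one dimension: precomputed sliding-window maxima of the map give admissible upper bounds over whole intervals of candidate shifts, so a best-first search prunes most of the space and still returns the exact optimum. A toy sketch (the paper adds hierarchical voxel hashing, roto-translational branching, and GPU batching; names are hypothetical):

```python
import heapq
import numpy as np

def bnb_match_1d(grid, points, max_shift, depth=3):
    """Find the integer shift landing the most scan points on occupied
    cells, via best-first branch-and-bound over shift intervals."""
    grid = np.asarray(grid, dtype=float)
    n = len(grid)
    # dil[h][x] = max(grid[x : x + 2**h]): bound for any shift in [s, s + 2**h)
    dil = [grid]
    for h in range(1, depth + 1):
        w = 2 ** h
        dil.append(np.array([grid[x:x + w].max() for x in range(n)]))

    def bound(shift, h):
        idx = np.asarray(points) + shift
        ok = (idx >= 0) & (idx < n)
        return dil[h][idx[ok]].sum()

    best_score, best_shift = -1.0, None
    heap = [(-bound(s, depth), s, depth)
            for s in range(-max_shift, max_shift + 1, 2 ** depth)]
    heapq.heapify(heap)
    while heap:
        neg, shift, h = heapq.heappop(heap)
        if -neg <= best_score:
            break                      # best remaining bound cannot improve
        if h == 0:                     # leaf: the bound is the exact score
            best_score, best_shift = -neg, shift
        else:                          # branch into two half-intervals
            for child in (shift, shift + 2 ** (h - 1)):
                heapq.heappush(heap, (-bound(child, h - 1), child, h - 1))
    return best_shift, best_score
```

Because the window maxima never underestimate any leaf in an interval, pruned intervals provably contain no better solution than the incumbent.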
|
|
10:30-12:00, Paper TuAT26-NT.7 | Add to My Program |
DynaInsRemover: A Real-Time Dynamic Instance-Aware Static 3D LiDAR Mapping Framework for Dynamic Environment |
|
Zhao, Huanfeng | Jilin University |
Yao, Meibao | Jilin University |
Xiao, Xueming | Changchun University of Science and Technology |
Zheng, Bo | Shanghai Aerospace Control Technology Institute |
Keywords: Mapping, Range Sensing
Abstract: Dynamic objects contaminate the point cloud distribution in the map, degrading the performance of robotic downstream tasks. To address this problem, we present a novel real-time dynamic instance-aware static mapping framework called DynaInsRemover, which exploits the geometric discrepancies between instances to efficiently remove dynamic objects while preserving more details of the static map. It contains an Instance Occupancy Check module for initial dynamic instance proposals and an Instance Belief Update module for reverting false positives. We quantitatively evaluate our approach's performance on SemanticKITTI and validate it in a real-world environment. Experimental evaluations show that our method achieves very promising results in dynamic environments. The implementation of our method is available as open source at: https://github.com/Zhaohuanfeng/DynaInsRemover.git.
|
|
10:30-12:00, Paper TuAT26-NT.8 | Add to My Program |
Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation |
|
Wang, Hongcheng | Peking University |
Wang, Yuxuan | Peking University |
Fangwei Zhong, Zfw1993 | Peking University |
Mingdong Wu, Aaron | Peking University |
Zhang, Jianwei | University of Hamburg |
Wang, Yizhou | Peking University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Representation Learning, Reinforcement Learning
Abstract: Visual-audio navigation (VAN) is attracting more and more attention from the robotics community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source with egocentric visual and audio observations. However, existing methods are limited in two respects: 1) poor generalization to unheard sound categories; and 2) sample inefficiency in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks that respectively accelerate learning representations with the desired characteristics above. With these two auxiliary tasks, the agent learns a spatially correlated representation of visual and audio inputs that can be applied to environments with novel sounds and maps. Experimental results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.
|
|
10:30-12:00, Paper TuAT26-NT.9 | Add to My Program |
Efficient 3D Instance Mapping and Localization with Neural Fields |
|
Tang, George | MIT |
Jatavallabhula, Krishna Murthy | MIT |
Torralba, Antonio | MIT |
Keywords: Mapping, Localization, Semantic Scene Understanding
Abstract: We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a label field that may be rendered from novel viewpoints to produce view-consistent instance segmentation masks. 3DIML significantly improves upon the training and inference runtimes of existing implicit scene representation based methods. Unlike prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a front-end instance segmentation model and associates corresponding masks across images with 3D labels. These almost view-consistent pseudo-label masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near real-time localization of instance masks given a trained label field and an off-the-shelf image segmentation model by fusing outputs from both. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate its effectiveness under mild assumptions for the image sequences. We achieve a 14-24× speedup over existing implicit scene representation methods with comparable quality, showcasing its potential to facilitate faster and more effective 3D scene understanding.
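The InstanceMap step — associating per-frame 2D masks with persistent labels — can be sketched with a simple IoU-based greedy matcher. This is a toy stand-in, not the 3DIML implementation; `associate`, the 0.5 threshold, and the pixel-set mask representation are illustrative assumptions.

```python
# Toy mask-to-label association (illustrative only): each mask is a set of
# pixel indices; a mask inherits the previous-frame label it overlaps most,
# or receives a fresh label when no overlap passes the threshold.

def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    return len(a & b) / (len(a | b) or 1)

def associate(prev_labels, masks, thresh=0.5, next_id=0):
    """prev_labels: {label_id: pixel set} from the previous frame;
    masks: this frame's instance masks. Returns per-mask label ids."""
    out = []
    for m in masks:
        best = max(prev_labels, key=lambda k: iou(prev_labels[k], m), default=None)
        if best is not None and iou(prev_labels[best], m) >= thresh:
            out.append(best)          # reuse the existing 3D label
        else:
            out.append(next_id)       # unmatched mask -> new label
            next_id += 1
    return out, next_id
```

In the paper the resulting pseudo-labels are only "almost" view-consistent, which is exactly why the second phase (InstanceLift) exists: a neural label field smooths over association mistakes like the ones this greedy matcher would make.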
|
|
TuAT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping I |
|
|
Chair: Angelini, Franco | University of Pisa |
Co-Chair: Leutenegger, Stefan | Technical University of Munich |
|
10:30-12:00, Paper TuAT27-NT.1 | Add to My Program |
Grasp It Like a Pro 2.0: A Data-Driven Approach Exploiting Basic Shapes Decomposition and Human Data for Grasping Unknown Objects |
|
Palleschi, Alessandro | University of Pisa |
Angelini, Franco | University of Pisa |
Gabellieri, Chiara | University of Twente |
Park, Do Won | University of Pisa |
Pallottino, Lucia | Università Di Pisa |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Garabini, Manolo | Università Di Pisa |
Keywords: Grasping, Multifingered Hands, Perception for Grasping and Manipulation, Human-Driven Grasping
Abstract: With the improvements in their computational and physical intelligence, robots are now capable of operating in real-world environments. However, manipulation and grasping capabilities are still areas that require significant improvement. To address this, we introduce a new data-driven grasp planning algorithm called Grasp it Like a Pro 2.0. This algorithm utilizes a small number of human demonstrations to teach a robot how to grasp arbitrary objects. By decomposing objects into basic shapes, our algorithm generates candidate grasps that can generalize to different object geometries. The algorithm selects the grasp to execute based on a selection policy that maximizes a novel grasp quality metric introduced in this article. This metric considers the complex interdependencies between the predicted grasp, the local approximation produced by the basic shape decomposition, and the gripper used. We evaluate our approach against multiple baselines using different grippers and objects. The results demonstrate the effectiveness of our method in generating and selecting high-quality and reliable grasps. With a soft underactuated robotic hand, our algorithm achieves a 94.0% success rate in 150 grasps across 30 different objects. Similarly, with a rigid gripper, it achieves an 85.0% success rate in 80 grasps across 16 different objects.
|
|
10:30-12:00, Paper TuAT27-NT.2 | Add to My Program |
Visual-Tactile Fusion for Transparent Object Grasping in Complex Backgrounds |
|
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Yu, Haixin | Tsinghua Shenzhen International Graduate School |
Ding, Wenbo | Tsinghua University |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Ye, Linqi | Shanghai University |
Xia, Chongkun | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Zhang, Xiao-Ping | Ryerson University |
Keywords: Grasping, Perception for Grasping and Manipulation, Force and Tactile Sensing, Visual-tactile fusion
Abstract: The grasping of transparent objects is challenging but of significance to robots. In this article, a visual-tactile fusion framework for transparent object grasping in complex backgrounds is proposed, which synergizes the advantages of vision and touch and greatly improves the grasping efficiency for transparent objects. First, we propose a multi-scene synthetic grasping dataset named SimTrans12K together with a Gaussian-Mask annotation method. Next, based on the TaTa gripper, we propose a grasping network named the transparent object grasping convolutional neural network (TGCNN) for grasping position detection, which shows good performance in both synthetic and real scenes. Inspired by human grasping, a tactile calibration method and a visual-tactile fusion classification method are designed, which improve the grasping success rate by 36.7% compared to direct grasping and the classification accuracy by 39.1%. Furthermore, the Tactile Height Sensing (THS) module and the Tactile Position Exploration (TPE) module are added to solve the problem of grasping transparent objects in irregular and visually undetectable scenes. Experimental results demonstrate the validity of the framework.
|
|
10:30-12:00, Paper TuAT27-NT.3 | Add to My Program |
Variable Stiffness Soft Robotic Fingers Using Snap-Fit Kinematic Reconfiguration |
|
Bastien, Jérôme | Polytechnique Montréal |
Birglen, Lionel | Ecole Polytechnique De Montreal |
Keywords: Grasping, Soft Robot Applications, Compliant Joint/Mechanism, Variable Stiffness
Abstract: Versatile and secure grasping in robotic systems remains a difficult challenge to address when objects possess a wide range of different properties (size, weight, friction coefficient, etc.). The human hand is often the primary source of inspiration for many technologies addressing this challenge, and a notable feature of our hands is that they can vary their stiffness to match the requirements of the task, e.g. become stiffer or more compliant depending on specific requirements. Many robotic devices have been proposed in the literature mirroring this capability, either using an adjustable internal tension mechanism similar to what happens with human tendons or another physical phenomenon yielding the same effect. This paper proposes a new type of soft robotic finger using a novel method to produce variable stiffness by modifying the kinematic structure of the fingers with snap-fit joints, a very simple alternative to most variable stiffness mechanisms. The resulting modification of the geometry and kinematics of the fingers, including their number of degrees of freedom, makes it possible to greatly alter the intrinsic stiffness of the grasp produced by these fingers. A notable feature of the proposed new design is that one pair of fingers can be used to switch the stiffness of another pair if a dual-arm robot is used.
|
|
10:30-12:00, Paper TuAT27-NT.4 | Add to My Program |
Grasp Transfer Based on Self-Aligning Implicit Representations of Local Surfaces |
|
Tekden, Ahmet | Chalmers University of Technology |
Deisenroth, Marc Peter | University College London |
Bekiroglu, Yasemin | Chalmers University of Technology, University College London |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Objects we interact with and manipulate often share similar parts, e.g. handles, that allow us to transfer our actions flexibly due to their shared functionality. This work addresses the problem of transferring grasp experience or demonstration to a novel object that shares shape similarities with objects a robot has previously experienced. Existing approaches to solving this problem are typically restricted to a specific object category or a parametric shape. Our approach, however, can transfer grasps associated with implicit models of local shapes shared across object categories. Specifically, we employ a single expert grasp demonstration during training to learn a local implicit surface representation model. At inference time, this model can be utilized to transfer grasps to novel objects by identifying the most similar-looking surfaces to the one on which the expert grasp is demonstrated. Our model is trained entirely in simulation and is evaluated on simulated and real-world objects that are not seen during training. Simulation results show that our method acquires better spatial precision and grasp accuracy compared to the baselines. Moreover, our method can successfully perform grasp transfer to unseen object categories, as shown in both simulation and real-world experiments.
|
|
10:30-12:00, Paper TuAT27-NT.5 | Add to My Program |
Amortized Inference for Efficient Grasp Model Adaptation |
|
Noseworthy, Michael | Massachusetts Institute of Technology |
Shaw, Seiji | Massachusetts Institute of Technology |
Kessens, Chad C. | United States Army Research Laboratory |
Roy, Nicholas | Massachusetts Institute of Technology |
Keywords: Probabilistic Inference, Transfer Learning, Deep Learning in Grasping and Manipulation
Abstract: In robotic applications such as bin-picking or block-stacking, learned predictive models have been developed for manipulation of objects with varying but known dynamic properties (e.g., mass distributions and friction coefficients). When a robot encounters a new object, these properties are often difficult to observe and must be inferred through interaction, which can be expensive in both inference time and number of interactions. We propose an encoder/decoder action-feasibility model to efficiently adapt to new objects by estimating their unobserved properties through interaction. The encoder predicts a distribution over the unobserved parameters while the decoder predicts action feasibility, which can be used in an uncertainty-aware planner. An explicit representation of uncertainty in the encoder enables information-gathering heuristics to minimize adaptation interactions. The amortized distributions are efficient to compute and perform comparably to particle-based distributions in a grasping domain. Finally, we deploy our method on a Panda robot to grasp heavy objects.
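The encoder/decoder split described in the abstract — a distribution over unobserved object parameters feeding an action-feasibility score — can be illustrated with a closed-form stand-in. The paper learns both networks; here a 1-D conjugate Gaussian update plays the encoder and a normal-CDF feasibility score plays the decoder. All names, the mass parameter, and the numeric constants are assumptions for illustration.

```python
# Conceptual stand-in (not the paper's learned model): infer a Gaussian
# belief over an object's mass from noisy interaction measurements, then
# score feasibility as the probability the mass is within a payload limit.
import math

def encode(observations, prior_mu=1.0, prior_var=1.0, noise_var=0.25):
    """Conjugate Gaussian update of the mass belief from noisy force
    readings (one amortized forward pass in the paper's encoder)."""
    mu, var = prior_mu, prior_var
    for y in observations:
        k = var / (var + noise_var)          # Kalman-style gain
        mu, var = mu + k * (y - mu), (1 - k) * var
    return mu, var

def decode_feasibility(mu, var, payload_limit=2.0):
    """P(mass < limit) under the belief -> input to an uncertainty-aware planner."""
    z = (payload_limit - mu) / math.sqrt(2 * var)
    return 0.5 * (1 + math.erf(z))

mu, var = encode([1.6, 1.5, 1.7])            # three noisy lift measurements
feasible = decode_feasibility(mu, var) > 0.5
```

The explicit variance is what enables the information-gathering heuristics the abstract mentions: an interaction is worth performing only while `var` is large enough to change the feasibility decision.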
|
|
10:30-12:00, Paper TuAT27-NT.6 | Add to My Program |
Learning Realistic and Reasonable Grasps for Anthropomorphic Hand in Cluttered Scenes |
|
Duan, Haonan | Institute of Automation, Chinese Academy of Sciences |
Li, Yiming | Idiap Research Institute, École Polytechnique Fédérale De Lausan |
Li, Daheng | University of Chinese Academy of Sciences |
Wei, Wei | Institute of Automation, Chinese Academy of Sciences |
Huang, Yayu | Institute of Automation, Chinese Academy of Sciences |
Wang, Peng | Chinese Acdamy of Sciences |
Keywords: Grasping, Multifingered Hands, Deep Learning in Grasping and Manipulation
Abstract: Grasping is one of the most fundamental skills for humans to interact with objects. However, it remains a challenging problem for anthropomorphic hands, due to the lack of object affordance understanding and high-dimensional grasp planning. In this work, we propose an anthropomorphic hand grasping framework to learn realistic and reasonable grasps in cluttered scenes, which tackles the problem in three stages: 1) graspable point segmentation; 2) hand grasp generation; and 3) grasp optimization. Specifically, our method generates high-quality hand grasps efficiently without complete object models by learning graspable points and associated grasp configurations from the observed point cloud in a parallel manner and optimizing predicted grasps based on hand-object contacts. Simulation experiments show that our model effectively generates physically plausible grasps for the anthropomorphic hand with a success rate of over 70%. Real-world experiments demonstrate that the model trained in simulation performs satisfactorily in real-world scenarios on unseen objects.
|
|
10:30-12:00, Paper TuAT27-NT.7 | Add to My Program |
FuncGrasp: Learning Object-Centric Neural Grasp Functions from Single Annotated Example Object |
|
Chen, Hanzhi | Technical University of Munich (TUM) |
Xu, Binbin | University of Toronto |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: We present FuncGrasp, a framework that can infer dense yet reliable grasp configurations for unseen objects using one annotated object and single-view RGB-D observation via categorical priors. Unlike previous works that only transfer a set of grasp poses, FuncGrasp aims to transfer infinite configurations parameterized by an object-centric continuous grasp function across varying instances. To ease the transfer process, we propose Neural Surface Grasping Fields (NSGF), an effective neural representation defined on the surface to densely encode grasp configurations. Further, we exploit function-to-function transfer using sphere primitives to establish semantically meaningful categorical correspondences, which are learned in an unsupervised fashion without any expert knowledge. We showcase the effectiveness through extensive experiments in both simulators and the real world. Remarkably, our framework significantly outperforms several strong baseline methods in terms of density and reliability for generated grasps.
|
|
10:30-12:00, Paper TuAT27-NT.8 | Add to My Program |
Physical and Digital Adversarial Attacks on Grasp Quality Networks |
|
Alharthi, Naif | King's College London |
Brandao, Martim | King's College London |
Keywords: Grasping, Robot Safety
Abstract: Grasp Quality Networks are important components of grasping-capable autonomous robots, as they allow them to evaluate grasp candidates and select the one with the highest chance of success. The widespread use of pick-and-place robots and Grasp Quality Networks raises the question of whether such systems are vulnerable to adversarial attacks, as that could lead to large economic damage. In this paper we propose two kinds of attacks on Grasp Quality Networks: one assuming physical access to the workspace (to place or attach a new object) and another assuming digital access to the camera software (to inject a pixel-intensity change on a single pixel). We then use evolutionary optimization to obtain attacks that simultaneously minimize the noticeability of the attacks and the chance that selected grasps are successful. Our experiments show that both kinds of attack lead to drastic drops in algorithm performance, thus making them important attacks to consider in the cybersecurity of grasping robots. Source code can be found at https://github.com/Naif-W-Alharthi/Physical-and-Digital-Attacks-on-Grasping-Networks
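The digital single-pixel attack can be sketched as a small (1+1) evolution strategy over pixel position and intensity change. This is only an illustration of the search loop, not the paper's attack: the Grasp Quality Network is stubbed with a mean-intensity score, and `evolve_attack` and its mutation parameters are invented for the sketch (the paper additionally optimizes a noticeability objective).

```python
# Hedged sketch of evolutionary single-pixel attack search. The fitness
# stub below just returns mean image intensity; in the paper it would be
# the quality of the grasp the network selects on the perturbed image.
import random

def attacked_quality(image, pixel):
    """Apply a (x, y, intensity-delta) perturbation and score the image.
    Stand-in fitness: lower mean intensity = 'worse' predicted grasp."""
    x, y, delta = pixel
    img = [row[:] for row in image]
    img[y][x] = min(255, max(0, img[y][x] + delta))
    return sum(map(sum, img)) / (255.0 * len(img) * len(img[0]))

def evolve_attack(image, iters=200, seed=0):
    """(1+1) evolution strategy: mutate the best attack, keep improvements."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    best = (rng.randrange(w), rng.randrange(h), rng.randint(-255, 255))
    best_q = attacked_quality(image, best)
    for _ in range(iters):
        cand = ((best[0] + rng.choice([-1, 0, 1])) % w,
                (best[1] + rng.choice([-1, 0, 1])) % h,
                max(-255, min(255, best[2] + rng.randint(-64, 64))))
        q = attacked_quality(image, cand)
        if q < best_q:  # keep the mutant only if it degrades quality further
            best, best_q = cand, q
    return best, best_q

image = [[128] * 8 for _ in range(8)]      # dummy 8x8 grayscale frame
pixel, quality = evolve_attack(image)
```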
|
|
10:30-12:00, Paper TuAT27-NT.9 | Add to My Program |
Scaling Object-Centric Robotic Manipulation with Multimodal Object Identification |
|
Mitash, Chaitanya | Amazon Robotics |
Hussein, Mostafa | University of New Hampshire |
Vanbaar, Jeroen | MERL |
Terhuja, Vikedo | Amazon Robotics |
Katyal, Kapil | Johns Hopkins University |
Keywords: Computer Vision for Automation, Deep Learning in Grasping and Manipulation, Recognition
Abstract: Robotic manipulation is a key enabler for automation in the fulfillment logistics sector. Such robotic systems require perception and manipulation capabilities to handle a wide variety of objects. Existing systems either operate on a closed set of objects or perform object-agnostic manipulation, which lacks the capability for deliberate and reliable manipulation at scale. Object identification (ID) unlocks the ability for large-scale, object-centric manipulation by mapping object segments to one of the previously seen objects from a database. Nevertheless, it is often limited by the availability of reference data or coverage for objects in a database. In this work, we propose to perform object identification with multiple reference databases, including image and text references, each with a different coverage and matching challenge. We propose a training strategy that tackles the challenges of learning domain-invariant image embeddings, image-text matching, and fusing predictions from different sources. We perform experiments on a recent benchmark with over 190K unique objects, extend the dataset with the additional reference sources, and propose an evaluation strategy that simulates coverage for different reference sources. A model trained with the proposed learning pipeline shows robust performance over a range of simulation experiments.
|
|
TuAT28-NT Oral Session, NT-G4 |
Add to My Program |
Grippers and Other End-Effectors I |
|
|
Chair: Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua University |
Co-Chair: Li, Miao | Wuhan University |
|
10:30-12:00, Paper TuAT28-NT.1 | Add to My Program |
Single-Motor Robotic Gripper with Three Functional Modes for Grasping in Confined Spaces |
|
Nishimura, Toshihiro | Kanazawa University |
Watanabe, Tetsuyou | Kanazawa University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: This study proposes a novel robotic gripper driven by a single motor. Its main task is to pick up objects in confined spaces. For this purpose, the developed gripper has three operating modes: grasping, finger-bending, and pull-in modes. Using these three modes, the developed gripper can rotate and translate a grasped object, i.e., it can perform in-hand manipulation. This in-hand manipulation is effective for grasping in extremely confined spaces, such as the inside of a box on a shelf, to avoid interference between the grasped object and obstacles. To achieve the three modes using a single motor, the developed gripper is equipped with two novel self-motion switching mechanisms. These mechanisms switch their motions automatically when the motion being generated is prevented. An analysis of the mechanism and the control methodology used to achieve the desired behavior are presented. Furthermore, the validity of the analysis and methodology is experimentally demonstrated. The gripper's performance is also evaluated through grasping tests.
|
|
10:30-12:00, Paper TuAT28-NT.2 | Add to My Program |
A Force-Controlled Gripper Capable of Measuring Mechanical Properties of an Object |
|
Tsai, Yi-Shian | National Cheng Kung University Mechanical Engineering Department |
Yeh, Pin-Chun | National Cheng Kung University |
Huang, Chun-Hung | National Cheng Kung University |
Hsueh, I-Cheng | National Cheng Kung University |
Lan, Chao-Chieh | National Cheng Kung University |
Keywords: Grippers and Other End-Effectors, Force Control, Compliant Joints and Mechanisms
Abstract: Various sensorized grippers have been developed to handle delicate objects safely. These grippers have sensors mounted on their fingers' surfaces that provide direct force measurements. However, multiple sensors are often required on one finger, leading to significant sensor placement and wire routing complexity. Finger-based sensors are limited to sensing external gripping force, and fingers cannot be easily replaced to meet the requirements of objects with specific geometries. To overcome the complexity and limitations of finger surface sensors, this paper proposes a force-controlled two-fingered gripper that relies on the deformation sensing of elastic elements in the drivetrain to obtain finger force. By using a minimum number of optical encoders placed in the drivetrain, accurate position and force sensing can be achieved at any location of each finger. When gripping an object, its size and stiffness can thus be accurately measured. Simulation and experimental results demonstrate the proposed gripper's merits. We expect this new gripper to provide a more competitive solution for robots that need to manipulate objects and verify their mechanical properties at the same time.
|
|
10:30-12:00, Paper TuAT28-NT.3 | Add to My Program |
Optimal Design of a Highly Self-Adaptive Gripper with Multi-Phalange Compliant Fingers for Grasping Irregularly Shaped Objects |
|
Liu, Chih-Hsing | National Cheng Kung University |
Yang, Sy-Yeu | NCKU |
Shih, Yi-Chieh | NCKU |
Keywords: Grippers and Other End-Effectors, Soft Robot Materials and Design, Compliant Joints and Mechanisms
Abstract: The development of a robotic gripper for handling objects of various sizes, shapes, weights, and degrees of hardness is a challenging problem in the field of robotics. In order to design a highly self-adaptive gripper capable of conforming to a wide range of objects, this article presents an innovative topology-optimized design of a compliant finger consisting of several multi-material phalanges connected by flexure hinges. The prototype was produced by means of a metamaterial approach, which utilizes 3D-printed infill structures (periodic cells) with different infill densities to represent regions with different equivalent mechanical properties. Adaptability tests were conducted to demonstrate the effectiveness of the proposed design in grasping circular, rectangular, trapezoidal, and concave objects. The results were compared with those of the fingers with single infill densities and a commercially available Festo MultiChoiceGripper, which features a Fin Ray structure. The total contact length between the fingers and the grasped object was used as a measure of the grippers’ adaptability. The test results demonstrate that this novel self-adaptive gripper is comparatively highly adaptable for grasping irregularly shaped objects and is able to carry a maximum payload of 6.76 kg.
|
|
10:30-12:00, Paper TuAT28-NT.4 | Add to My Program |
Accelerating Robotic Picking of Rigid Objects with a Compliant Pneumatic Gripper and an Impact-Aware Trajectory Plan |
|
Ostyn, Frederik | Ghent University |
Vanderborght, Bram | VUB |
Crevecoeur, Guillaume | Ghent University |
Keywords: Grippers and Other End-Effectors, Compliant Joints and Mechanisms, Industrial Robots
Abstract: Industrial robots are capable of moving at high speed. Each time they come into contact with their environment, e.g. to pick up an object, they decelerate to a near standstill. A solution involving a compliant pneumatic gripper and adapted trajectory plan is presented to initiate contact at a higher speed while remaining within hardware limits. By adding overload clutches in either the robot arm or gripper, tolerance to errors is provided. The key parameters such as gripper compliance and maximum allowed initial impact velocity are identified. Results show that by properly optimizing these parameters, robot picking of rigid objects can be accelerated. The complete high-speed picking solution is experimentally verified. A time reduction of 16% was obtained when making contact at 0.65 m/s.
|
|
10:30-12:00, Paper TuAT28-NT.5 | Add to My Program |
Vertical Vibratory Transport of Grasped Parts Using Impacts |
|
Yako, Connor | Stanford University |
Nowak, Jerome | Stanford University |
Yuan, Shenli | SRI International |
Salisbury, Kenneth | Stanford University |
Keywords: Grippers and Other End-Effectors, In-Hand Manipulation
Abstract: In this paper, we use impact-induced acceleration in conjunction with periodic stick-slip to successfully and quickly transport parts vertically against gravity. We show analytically that vertical vibratory transport is more difficult than its horizontal counterpart, and provide guidelines for achieving optimal vertical vibratory transport of a part. Namely, such a system must be capable of quickly realizing high accelerations, as well as supply normal forces at least several times that required for static equilibrium. We also show that for a given maximum acceleration, there is an optimal normal force for transport. To test our analytical guidelines, we built a vibrating surface using flexures and a voice coil actuator that can accelerate a magnetic ram into various surfaces to generate impacts. The surface was used to transport a part against gravity. Experimentally obtained motion tracking data confirmed the theoretical model. A series of grasping tests with a vibrating-surface equipped parallel jaw gripper confirmed the design guidelines.
|
|
10:30-12:00, Paper TuAT28-NT.6 | Add to My Program |
Bionic Soft Fingers with Hybrid Variable Stiffness Mechanisms for Multimode Grasping |
|
Wang, Xiangbo | College of Quality and Technology Supervising, Hebei University, |
Zhang, Tianran | Beihang University |
Yu, Hongze | Beijing University of Posts and Telecommunications |
Wen, Zhenwei | Beijing University of Aeronautics and Astronautic |
Fang, Lide | Hebei University |
Liu, Huaping | Tsinghua Univ |
Sun, Fuchun | Tsinghua University |
Lixue, Tang | Capital Medical University |
Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua Un |
Keywords: Grippers and Other End-Effectors, Grasping, Soft Sensors and Actuators
Abstract: This paper presents a novel Bionic Soft Finger (BSF) that aims to overcome the limitations of conventional rigid manipulators in terms of adaptability and safety, as well as the challenges faced by soft hands regarding carrying capacity and stability. The BSF design uses a hybrid variable stiffness mechanism combining memory alloy actuators with particle jamming to achieve the desired bending angle and actuator stiffness. Our innovative approach utilizes a bionic finger design that incorporates a memory alloy skeleton and a water-cooled recirculation system, leading to a substantial reduction in the time required for each operation. Through the integration of particle jamming, we have enhanced the overall stiffness and performance of the manipulator, enabling load capacities of up to 3 N per finger and more than twice the stiffness of the normal condition. Additionally, our design enables multimode grasping and incorporates a liquid metal strain sensor (METT) for real-time monitoring of finger bending angles. Comparative analyses demonstrate that our design exhibits superior stiffness and enables five-mode grasping in comparison to pneumatic actuators. We believe that bionic soft fingers present a promising solution for enhancing adaptability, safety, and performance in human-robot interaction applications.
|
|
10:30-12:00, Paper TuAT28-NT.7 | Add to My Program |
Design and Fabrication of a Novel Miniature Magnetic Gripper |
|
Li, Mengde | The Institute of Technological Sciences, Wuhan University, Hubei |
Zhao, Fuqiang | School of Power and Mechanical Engineering, Wuhan University, Hu |
Li, Xiangli | WuHan University |
Li, Mingchang | Wuhan University |
Liu, Sheng | Wuhan University |
Li, Miao | Wuhan University |
Keywords: Grasping
Abstract: Small-scale robots hold significant promise in the field of minimally invasive surgery (MIS). In this paper, we present a miniature magnetic gripper and develop a data-driven kinematic model. The gripper comprises four fingers, where each finger measures no more than 3 mm, 4 mm, and 5.5 mm in its three dimensions. By integrating permanent magnets and elastic ropes as internal actuation elements into the fingers, the gripper is able to open and close under an external magnetic field, facilitating the manipulation of small objects in confined spaces. Modeling and analysis of the magnetic gripper are undertaken, wherein the relationship between the opening angle and the external magnetic field is established. The average error between the experimentally observed opening angles and the model-predicted values is 2.31°. Subsequent experiments demonstrated the necessity of the magnetic gripper model for precise manipulation, verified its excellent sensitivity to magnetic fields, and demonstrated its potential for future applications in MIS.
|
|
10:30-12:00, Paper TuAT28-NT.8 | Add to My Program |
Design of Highly Repeatable and Multi-Functional Grippers for Precision Handling with Articulated Robots |
|
Gümbel, Philip | Technische Universität Braunschweig, Insitute for Machine Tools |
Dröder, Klaus | Technische Universität Braunschweig |
Keywords: Grippers and Other End-Effectors, Embedded Systems for Robotic and Automation, Industrial Robots
Abstract: This paper presents a novel approach to designing a low-cost gripper that is highly repeatable and functionally integrated. The gripper is optimized to compensate for gripping errors, with particular consideration of the potential challenges of articulated robots. The primary design goal is to achieve maximum repeatability during the gripping and releasing stages of a pick-and-place process for a chip-like silicon die. The design is centered around a custom printed circuit board that integrates functionality for vision-based error compensation, vacuum level monitoring, part contact detection, and detection of abnormal vibrations. We detail our design requirements and specific design choices for the mechanical and electronic design and provide qualitative and quantitative experimental validation of the achieved repeatability and the integrated functions.
|
|
10:30-12:00, Paper TuAT28-NT.9 | Add to My Program |
Generalized Partially Destructive Disassembly Planning for Robotic Disassembly |
|
Hansjosten, Malte | Karlsruhe Institute of Technology (KIT) |
Baumgärtner, Jan | Karlsruhe Institute of Technology |
Fleischer, Jürgen | Karlsruhe Institute of Technology (KIT) |
Keywords: Motion and Path Planning, Disassembly, Industrial Robots
Abstract: While robotic assembly is a well-researched topic, recycling and disassembly of products are also becoming ever more important as we transition to a more sustainable economy. In disassembly, we are typically only interested in a subset of product parts, which opens the possibility of using destructive processes such as tearing, cutting, or milling to speed up the disassembly. Currently, such destructive actions are only included as predefined case-specific actions such as milling away a screw head. By contrast, this paper presents a generalized approach to destructive disassembly planning that can automatically derive destructive disassembly actions from a symbolic representation of the disassembly state. Viable destructive actions are identified and verified only based on the underlying geometric model, circumventing the need for their explicit definition. We showcase the performance of this system both virtually on several test parts and physically by destructively and non-destructively disassembling a model of an electric motor using a robot manipulator with a multitool end effector.
|
|
TuAT29-NT Oral Session, NT-G5 |
Object Detection and Pose Estimation |
|
|
Chair: Wei, Jiaxin | Technical University of Munich |
Co-Chair: Del Bue, Alessio | Istituto Italiano Di Tecnologia |
|
10:30-12:00, Paper TuAT29-NT.1 |
IFFNeRF: Initialisation Free and Fast 6DoF Pose Estimation from a Single Image and a NeRF Model |
|
Bortolon, Matteo | Istituto Italiano Di Tecnologia; Fondazione Bruno Kessler; Unive |
Tsesmelis, Theodore | Istituto Italiano Di Tecnologia |
James, Stuart | Durham University |
Poiesi, Fabio | Fondazione Bruno Kessler |
Del Bue, Alessio | Istituto Italiano Di Tecnologia |
Keywords: Deep Learning for Visual Perception
Abstract: We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a least-squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method can improve the angular and translation error accuracy by 80.1% and 67.3%, respectively, compared to iNeRF, while performing at 34 fps on consumer hardware and without requiring an initial pose guess.
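The Metropolis-Hastings sampling the abstract mentions follows the standard accept/reject recipe. As a rough illustration only (not the authors' implementation; a toy 1D Gaussian stands in for the NeRF density field):

```python
import numpy as np

def metropolis_hastings(log_density, x0, n_samples, step=0.5, seed=0):
    """Generic random-walk Metropolis-Hastings sampler.

    Draws samples whose stationary distribution has the given
    (unnormalized) log density -- the same principle IFFNeRF uses
    to sample surface points from within a NeRF model.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: a standard normal (log density up to a constant)
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2),
                              x0=np.zeros(1), n_samples=5000)
```

In the paper, the target density would be the NeRF's learned volume density over candidate surface points rather than this toy Gaussian.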
|
|
10:30-12:00, Paper TuAT29-NT.2 |
VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR |
|
Ma, Tao | Shanghai AI Laboratory |
Zheng, Zhiwei | UC Berkeley |
Zhou, Hongbin | Shanghai AI Lab |
Cai, Xinyu | Shanghai AI Laboratory |
Yang, Xuemeng | Shanghai Artificial Intelligence Laboratory |
Li, Yikang | IDG Capital |
Shi, Botian | Shanghai AI Laboratory |
Li, Hongsheng | Chinese University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Combining motion prediction with LiDAR-based 3D object detection is an effective method for improving overall accuracy, especially for downstream autonomous driving tasks. The recent development of low-cost LiDARs (e.g., Livox LiDAR) enables us to explore such 4D perception systems with a lower budget and higher performance. In this paper, we propose a 4D object detector, VeloVox, to establish accurate object detection and velocity estimation with a single-frame point cloud from a Livox LiDAR. Based on the non-repetitive scanning pattern and point-level temporal nature, we propose a two-stage module to enhance spatial-temporal point feature interaction along the time dimension. The aggregated feature also enables more accurate proposal refinement. To demonstrate the performance, we evaluate VeloVox against several state-of-the-art detector baselines on our in-house dataset and a synthesized dataset built in the Carla simulator. Code will be released at https://github.com/PJLab-ADG/VeloVox.
|
|
10:30-12:00, Paper TuAT29-NT.3 |
MTRadSSD: A Multi-Task Single-Stage Detector for Object Detection and Free Space Analysis in Radar Point Clouds |
|
Li, Yinbao | Jiaxing Joospeed Electronics Technology Co. Ltd |
Yu, Songshan | Jiaxing Jospeed Electronics Technology Co. Ltd |
Dongfeng, Wang | Jiaxing Joospeed Electronics Technology Co. Ltd |
Jiao, Jingen | Jiaxing Joospeed Electronics Technology Co. Ltd |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Vision-Based Navigation
Abstract: Environmental perception tasks such as object detection and free space detection based on 3+1D radar severely suffer from the disorder and sparsity of the point cloud. To tackle this problem, we propose a novel multi-task radar-based single-stage detector, termed MTRadSSD, in which we adopt instance-aware sampling strategies to discover multi-class road users and propose an occupancy map tool based on kernel density estimation (KDE) to make predictions in bird’s-eye view (BEV). The denoised occupancy map also plays a key role in generating polygon-represented free space in the scene. As a result, our elaborated sampling strategies effectively retain useful semantic information and narrow the difference in detection performance across object categories. Meanwhile, MTRadSSD outperforms state-of-the-art approaches in terms of real-time requirements and detection accuracy. In detail, the proposed method achieves a satisfactory speed of ~16.7 ms per frame in experiments on the public radar point cloud dataset View-of-Delft (VOD). With IoU thresholds of 0.5/0.25/0.25, the average precision (AP) of easy-level objects (cars, pedestrians, and cyclists) reaches a competitive 52.2%, 61.1%, and 86.3%, respectively, while the mean IoU of free space is 87.8%. Notably, the occupancy map also dramatically improves the prediction precision of object orientation, to an average of 64.0%.
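The KDE-based occupancy map described above can be pictured as smearing each sparse radar return with a Gaussian kernel over a BEV grid. A minimal sketch under assumed parameters (grid size, extent, bandwidth, and the free-space threshold are illustrative, not from the paper):

```python
import numpy as np

def kde_occupancy_map(points_xy, grid_size=32, extent=10.0, bandwidth=0.5):
    """Build a BEV occupancy map from sparse 2D radar returns via
    Gaussian kernel density estimation; the grid spans
    [-extent, extent] in both axes and is normalized to [0, 1]."""
    xs = np.linspace(-extent, extent, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    grid = np.stack([gx, gy], axis=-1)              # (G, G, 2) cell centers
    diff = grid[None] - points_xy[:, None, None]    # (N, G, G, 2)
    sq = np.sum(diff**2, axis=-1)
    density = np.exp(-0.5 * sq / bandwidth**2).sum(axis=0)
    return density / density.max()

# Toy scene: a small cluster of radar returns near (3, 3)
pts = np.array([[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
occ = kde_occupancy_map(pts)
free_space = occ < 0.1   # boolean mask of (likely) free cells
```

In the paper, the denoised occupancy map is further converted into a polygon representation of free space; here a simple threshold mask stands in for that step.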
|
|
10:30-12:00, Paper TuAT29-NT.4 |
Toward Accurate Camera-Based 3D Object Detection Via Cascade Depth Estimation and Calibration |
|
Wang, Chaoqun | The Chinese University of Hong Kong, Shenzhen |
Qin, Yiran | CUHKsz |
Kang, Zijian | NIO. Inc |
Ma, Ningning | NIO |
Zhang, Ruimao | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Automation
Abstract: Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization. Different from previous methods, which directly predict depth distributions by using a supervised estimation model, we propose a cascade framework consisting of two depth-aware learning paradigms. First, a depth estimation (DE) scheme leverages relative depth information to realize effective feature lifting from 2D to 3D spaces. Furthermore, a depth calibration (DC) scheme introduces depth reconstruction to further adjust the 3D object localization perturbation along the depth axis. In practice, the DE is explicitly realized by using both the absolute and relative depth optimization loss to promote the precision of depth prediction, while the capability of DC is implicitly embedded into the detection Transformer through a depth denoising mechanism in the training phase. The entire model is trained in an end-to-end manner. We propose a baseline detector and evaluate the effectiveness of our proposal with +2.2%/+2.7% NDS/mAP improvements on the NuScenes benchmark, achieving a competitive performance of 55.9%/45.7% NDS/mAP. Furthermore, we conduct extensive experiments to demonstrate its generality based on various detectors, with about +2% NDS improvements.
|
|
10:30-12:00, Paper TuAT29-NT.5 |
DA-RAW: Domain Adaptive Object Detection for Real-World Adverse Weather Conditions |
|
Jeon, Minsik | Agency for Defense Development |
Seo, Junwon | Agency for Defense Development |
Min, Jihong | Agency for Defense Development |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Despite the success of deep learning-based object detection methods in recent years, it is still challenging to make object detectors reliable in adverse weather conditions such as rain and snow. For robust performance, unsupervised domain adaptation has been utilized to adapt a detection network trained on clear-weather images to adverse-weather images. Previous methods do not explicitly address weather corruption during adaptation; however, the domain gap between clear and adverse weather can be decomposed into two factors with distinct characteristics: a style gap and a weather gap. In this paper, we present an unsupervised domain adaptation framework for object detection that can more effectively adapt to real-world environments with adverse weather conditions by addressing these two gaps separately. Our method resolves the style gap by concentrating on style-related information of high-level features using an attention module. Using self-supervised contrastive learning, our framework then reduces the weather gap and acquires instance features that are robust to weather corruption. Extensive experiments demonstrate that our method outperforms other methods for object detection in adverse weather conditions.
|
|
10:30-12:00, Paper TuAT29-NT.6 |
SyMFM6D: Symmetry-Aware Multi-Directional Fusion for Multi-View 6D Object Pose Estimation |
|
Duffhauss, Fabian | Bosch Center for Artificial Intelligence |
Koch, Sebastian | Ulm University, Robert Bosch GmbH |
Ziesche, Hanna | Bosch BCAI |
Anh Vien, Ngo | Bosch GmbH |
Neumann, Gerhard | Karlsruhe Institute of Technology |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
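The least-squares fitting step that turns predicted keypoints into a 6D pose has a standard closed-form solution via SVD (the Kabsch/Umeyama method). A self-contained sketch of that step, not the paper's exact pipeline:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) minimizing ||R @ p + t - q||
    over corresponding points, via the standard SVD (Kabsch) solution --
    the kind of closed-form fit used to recover a 6D pose from
    predicted 3D keypoints."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Sanity check: recover a known rotation about z and a translation
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
keypoints = np.random.default_rng(0).normal(size=(8, 3))
R_est, t_est = fit_rigid_transform(keypoints, keypoints @ R_true.T + t_true)
```

With noisy multi-view keypoint predictions the same formula gives the least-squares-optimal pose rather than an exact recovery.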
|
|
10:30-12:00, Paper TuAT29-NT.7 |
Mutual Information-Calibrated Conformal Feature Fusion for Uncertainty-Aware Multimodal 3D Object Detection at the Edge |
|
Stutts, Alex Christopher | University of Illinois Chicago |
Erricolo, Danilo | University of Illinois at Chicago |
Ravi, Sathya | University of Illinois Chicago |
Tulabandhula, Theja | University of Illinois Chicago |
Trivedi, Amit Ranjan | University of Illinois at Chicago (UIC), Chicago, USA |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Probability and Statistical Methods
Abstract: In the expanding landscape of AI-enabled robotics, robust quantification of predictive uncertainties is of great importance. Three-dimensional (3D) object detection, a critical robotics operation, has seen significant advancements; however, the majority of current works focus only on accuracy and ignore uncertainty quantification. Addressing this gap, our novel study integrates the principles of conformal inference (CI) with information theoretic measures to perform lightweight, Monte Carlo-free uncertainty estimation within a multimodal framework. Through a multivariate Gaussian product of the latent variables in a Variational Autoencoder (VAE), features from RGB camera and LiDAR sensor data are fused to improve the prediction accuracy. Normalized mutual information (NMI) is leveraged as a modulator for calibrating uncertainty bounds derived from CI based on a weighted loss function. Our simulation results show an inverse correlation between inherent predictive uncertainty and NMI throughout the model's training. The framework demonstrates comparable or better performance in KITTI 3D object detection benchmarks to similar methods that are not uncertainty-aware, making it suitable for real-time edge robotics.
|
|
10:30-12:00, Paper TuAT29-NT.8 |
RGB-Based Category-Level Object Pose Estimation Via Decoupled Metric Scale Recovery |
|
Wei, Jiaxin | Technical University of Munich |
Song, Xibin | Baidu |
Liu, Weizhe | Tencent |
Kneip, Laurent | ShanghaiTech University |
Li, Hongdong | Australian National University and NICTA |
Ji, Pan | Tencent |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Automation
Abstract: While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to their heavy reliance on depth sensors. RGB-only methods provide an alternative yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly facilitating the search for inlier 2D-3D correspondences. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for the 6D object pose. Extensive experiments have been conducted on both synthetic and real datasets, demonstrating the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy. Code: https://github.com/goldoak/DMSR.
|
|
10:30-12:00, Paper TuAT29-NT.9 |
Implicit Coarse-To-Fine 3D Perception for Category-Level Object Pose Estimation from Monocular RGB Image |
|
Li, Jia | Shandong University |
Jin, Li | Shandong University |
Song, Xibin | Baidu |
Chen, Yeheng | Zhejiang Lab |
Li, Nan | Zhejiang Lab |
Qin, Xueying | Shandong University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Deep Learning Methods
Abstract: Category-level object pose estimation demonstrates robust generalization capabilities that benefit robotics applications. However, exclusive reliance on RGB images without leveraging any 3D information introduces ambiguity in the translation and size of objects, leading to suboptimal performance. In this paper, we propose a framework for category-level pose estimation from a single RGB image in an end-to-end manner, i.e., Feature Auxiliary Perception Network (FAP-Net). To address inaccurate pose estimation caused by the inherent ambiguity of RGB images, we design a coarse-to-fine approach that first harnesses geometry supervision to facilitate coarse 3D feature perception and subsequently refines the features based on pose and size constraints. Experimental results on REAL275 and CAMERA25 demonstrate that FAP-Net achieves significant improvements (14.7% on 10°10cm and 11.4% on IoU50 on the real-scene REAL275 dataset) over the state-of-the-art and real-time inference (42 FPS).
|
|
TuAT30-NT Oral Session, NT-G6 |
AI-Based Methods |
|
|
Chair: Hamaya, Masashi | OMRON SINIC X Corporation |
Co-Chair: Dou, Qi | The Chinese University of Hong Kong |
|
10:30-12:00, Paper TuAT30-NT.1 |
Vision-Language Interpreter for Robot Task Planning |
|
Shirai, Keisuke | Kyoto University |
Beltran-Hernandez, Cristian Camilo | Omron Sinic X |
Hamaya, Masashi | OMRON SINIC X Corporation |
Hashimoto, Atsushi | Omron Sinic X |
Tanaka, Shohei | OMRON SINIC X Corporation |
Kawaharazuka, Kento | The University of Tokyo |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Ushiku, Yoshitaka | OMRON SINIC X Corporation |
Mori, Shinsuke | Kyoto University |
Keywords: AI-Based Methods, Data Sets for Robot Learning
Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By generating PDs from language instruction and scene observation, we can drive symbolic planners in a language-guided framework. We propose a Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
|
|
10:30-12:00, Paper TuAT30-NT.2 |
Trust-Region Neural Moving Horizon Estimation for Robots |
|
Wang, Bingheng | National University of Singapore |
Chen, Xuyang | National University of Singapore |
Zhao, Lin | National University of Singapore |
Keywords: AI-Based Methods, Machine Learning for Robot Control, Aerial Systems: Mechanics and Control
Abstract: Accurate disturbance estimation is essential for safe robot operations. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, has shown promise in further pushing the accuracy and efficiency boundary. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by providing the second-order derivatives of MHE, referred to as the MHE Hessian. Remarkably, we show that many of the intermediate results used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. This offers linear computational complexity with respect to the MHE horizon. As a case study, we evaluate the proposed trust region NeuroMHE on real quadrotor flight data for disturbance estimation. Our approach demonstrates highly efficient training in under 5 min using only 100 data points. It outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy, utilizing only 1.4% of its network parameters. Furthermore, our method showcases enhanced robustness to network initialization compared to the gradient descent counterpart.
|
|
10:30-12:00, Paper TuAT30-NT.3 |
Multi-Category Decomposition Editing Network for the Accurate Visual Inspection of Texture Defects (I) |
|
Zhu, He | Hust |
Junyi, Li | Hust |
Yang, Hua | Huazhong University of Science and Technology |
Chen, Jiankui | Huazhong University of Science and Technology |
Yin, Zhouping | Professor, School of Mechanical Science and Engineering, Huazhong |
Keywords: AI-Based Methods, Computer Vision for Manufacturing
Abstract: Spotting blemished areas automatically on a textured surface is particularly challenging, as both nominal and defective surface samples are inconsistent in large-scale industrial manufacturing. The most efficient solutions use a memory bank extracted from nominal samples to detect outliers. We approach our strategy, the multi-category decomposition editing network (MCDEN), from a similar viewpoint. Notably, we do not use defect-free samples; instead, we use virtual results to construct a defect library. MCDEN decomposes abnormalities into basic elements from the library while editing outlier features to reconstruct the texture normality, offering a rational segmentation map through decomposition and reconstruction. Owing to this strategy, MCDEN is more interpretable than most neural network methods; interpretability is particularly important in industry to ensure stability. Experiments on texture surface samples from the MVTAD dataset confirm the efficacy of MCDEN with a pixel-level AUC score of 96.6%. In further experiments on samples collected from semi-manufactured inkjet-printed OLED panels, MCDEN demonstrates competitive results with a 99.2% detection rate and rapid real-time detection capability.
|
|
10:30-12:00, Paper TuAT30-NT.4 |
Kinematic-Aware Prompting for Generalizable Articulated Object Manipulation with LLMs |
|
Xia, Wenke | Renmin University of China |
Wang, Dong | Shanghai Artificial Intelligence Laboratory |
Pang, Xincheng | Renmin University of China |
Wang, Zhigang | Shanghai AI Laboratory |
Zhao, Bin | Northwestern Polytechnical University |
Hu, Di | Renmin University of China |
Li, Xuelong | Northwestern Polytechnical University |
Keywords: AI-Based Methods, Learning from Demonstration, Manipulation Planning
Abstract: Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation; however, due to the prohibitive costs of real-world data collection and precise object simulation, it remains challenging for these works to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulations. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact locations. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thoughts prompting method. Our evaluation spanned 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows a powerful zero-shot capability for 8 unseen articulated object categories with only 17 demonstrations. Moreover, real-world experiments on 7 different object categories prove our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM_articulated_object_manipulation.
|
|
10:30-12:00, Paper TuAT30-NT.5 |
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning |
|
Zhou, Zhehua | University of Alberta |
Song, Jiayang | University of Alberta |
Yao, Kunpeng | Massachusetts Institute of Technology |
Shu, Zhan | University of Alberta |
Ma, Lei | The University of Tokyo & University of Alberta |
Keywords: AI-Based Methods, Manipulation Planning, Task and Motion Planning
Abstract: Motivated by the substantial achievements of Large Language Models (LLMs) in the field of natural language processing, recent research has commenced investigations into the application of LLMs for complex, long-horizon sequential task planning challenges in robotics. LLMs are advantageous in offering the potential to enhance the generalizability as task-agnostic planners and facilitate flexible interaction between human instructors and planning systems. However, task plans generated by LLMs often lack feasibility and correctness. To address this challenge, we introduce ISR-LLM, a novel framework that improves LLM-based planning through an iterative self-refinement process. The framework operates through three sequential steps: preprocessing, planning, and iterative self-refinement. During preprocessing, an LLM translator is employed to convert natural language input into a Planning Domain Definition Language (PDDL) formulation. In the planning phase, an LLM planner formulates an initial plan, which is then assessed and refined in the iterative self-refinement step by a validator. We examine the performance of ISR-LLM across three distinct planning domains. Our experimental results show that ISR-LLM is able to achieve markedly higher success rates in sequential task planning compared to state-of-the-art LLM-based planners. Moreover, it also preserves the broad applicability and generalizability of working with natural language instructions.
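The three-step loop described above (draft a plan, validate it, feed the validator's error message back for refinement) can be sketched as follows, with the LLM and the PDDL validator stubbed out by hypothetical callables:

```python
def iterative_self_refine(instruction, llm, validator, max_iters=3):
    """Skeleton of an ISR-LLM-style loop. `llm` maps a prompt string to
    a plan string; `validator` returns (is_valid, error_message).
    Both are caller-supplied stand-ins, not the paper's actual components."""
    plan = llm(f"Plan for: {instruction}")
    for _ in range(max_iters):
        ok, error = validator(plan)
        if ok:
            return plan
        # Refine: include the validator's feedback in the next prompt
        plan = llm(f"Revise plan.\nError: {error}\nPlan: {plan}")
    return plan  # best effort after max_iters refinements

# Toy demo: a fake "LLM" that fixes the plan once it sees an error message
def fake_llm(prompt):
    return "fixed-plan" if "Error" in prompt else "draft-plan"

def fake_validator(plan):
    return (plan == "fixed-plan", "missing precondition")

result = iterative_self_refine("stack blocks", fake_llm, fake_validator)
```

In the paper, the preprocessing step would additionally translate natural language into a PDDL formulation before planning; that translation is omitted here.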
|
|
10:30-12:00, Paper TuAT30-NT.6 |
Efficient Hybrid Neuromorphic-Bayesian Model for Olfaction Sensing: Detection and Classification |
|
Kausar, Rizwana | Khalifa University |
Zayer, Fakhreddine | Khalifa University |
Viegas, Jaime | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: AI-Based Methods
Abstract: Olfaction sensing in autonomous robotics faces challenges in dynamic operations, energy efficiency, and edge processing. It necessitates a machine learning algorithm capable of managing real-world odor interference, ensuring resource efficiency for mobile robotics, and accurately estimating gas features for critical tasks such as odor mapping, localization, and alarm generation. This paper introduces a hybrid approach that exploits neuromorphic computing in combination with probabilistic inference to address these demanding requirements. Our approach implements a combination of a convolutional spiking neural network for feature extraction and a Bayesian spiking neural network for odor detection and identification. The developed algorithm is rigorously tested on a dataset for sensor drift compensation to evaluate robustness. Additionally, to evaluate efficiency, we compare the energy consumption of our model with that of a non-spiking machine learning algorithm under identical dataset and operating conditions. Our approach demonstrates superior efficiency alongside comparable accuracy.
|
|
10:30-12:00, Paper TuAT30-NT.7 |
Disentangled Neural Relational Inference for Interpretable Motion Prediction |
|
Dax, Victoria Magdalena | Stanford University |
Li, Jiachen | University of California, Riverside |
Sachdeva, Enna | Honda Research Institute |
Agarwal, Nakul | Honda Research Institute |
Kochenderfer, Mykel | Stanford University |
Keywords: AI-Based Methods, Behavior-Based Systems, Probabilistic Inference
Abstract: Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction models’ interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.
|
|
10:30-12:00, Paper TuAT30-NT.8 |
DeFlow: Decoder of Scene Flow Network in Autonomous Driving |
|
Zhang, Qingwen | KTH Royal Institute of Technology |
Yang, Yi | KTH Royal Institute of Technology |
Fang, Heng | KTH Royal Institute of Technology |
Geng, Ruoyu | Hong Kong University of Science and Technology |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: AI-Based Methods, Deep Learning Methods
Abstract: Scene flow estimation determines a scene's 3D motion field by predicting the motion of points in the scene, which is especially useful for aiding downstream tasks in autonomous driving. Many networks that take large-scale point clouds as input use voxelization to create a pseudo-image for real-time operation. However, the voxelization process often results in the loss of point-specific features, which poses a challenge for recovering those features in scene flow tasks. Our paper introduces DeFlow, which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task reveal that DeFlow achieves state-of-the-art results on large-scale point cloud data, demonstrating that our network has better performance and efficiency compared to others. The code is available at https://github.com/KTH-RPL/deflow.
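The GRU refinement mentioned above follows the standard gated-update equations; a minimal NumPy sketch of a single GRU step (weight shapes and feature roles are illustrative, not DeFlow's actual architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """A single Gated Recurrent Unit step: `h` is the running feature
    being refined (e.g. a per-point feature) and `x` the new evidence
    (e.g. a voxel feature). Biases are omitted for brevity."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde          # gated blend of old and new

# With all-zero weights both gates are 0.5 and the candidate is 0,
# so the output is simply half of the previous state.
Z = np.zeros((4, 4))
out = gru_cell(np.ones((1, 4)), np.zeros((1, 4)), Z, Z, Z, Z, Z, Z)
```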
|
|
10:30-12:00, Paper TuAT30-NT.9 |
Subequivariant Reinforcement Learning Framework for Coordinated Motion Control |
|
Wang, Haoyu | Shanghai University of Engineering Science |
Tan, Xiaoyu | National University of Singapore |
Qiu, Xihe | Shanghai University of Engineering Science |
Qu, Chao | Inftech |
Keywords: Behavior-Based Systems, Reinforcement Learning, Control Architectures and Programming
Abstract: Effective coordination is crucial for motion control with reinforcement learning, especially as the complexity of agents and their motions increases. However, many existing methods struggle to account for the intricate dependencies between joints. We introduce CoordiGraph, a novel architecture that leverages subequivariant principles from physics to enhance coordination of motion control with reinforcement learning. This method embeds the principles of equivariance as inherent patterns in the learning process under gravity influence, which aids in modeling the nuanced relationships between joints vital for motion control. Through extensive experimentation with sophisticated agents in diverse environments, we highlight the merits of our approach. Compared to current leading methods, CoordiGraph notably enhances generalization and sample efficiency.
|
|
TuAT31-NT Oral Session, NT-G7 |
Industrial Robotics and Automation |
|
|
Chair: Stoeckl, Florian | DHBW Karlsruhe |
Co-Chair: Harada, Kanako | The University of Tokyo |
|
10:30-12:00, Paper TuAT31-NT.1 |
A Large-Scale Suction-Based Climbing Parallel Robot for Wall Painting Application |
|
Rosyid, Abdur | Khalifa University |
El-Khasawneh, Bashar | Khalifa University |
Keywords: Field Robots, Industrial Robots, Parallel Robots
Abstract: This paper presents a large-scale climbing robot that employs a parallel mechanism with three translational degrees of freedom as its locomotion method. Using a robot frame shaped as a triangular pyramid, the robot provides good stability during locomotion and task execution. Three suction cups, called the perimeter cups, are attached to the vertices of the robot’s pyramid base, whereas three other suction cups, called the middle cups, are attached to the end-effector of the parallel mechanism. Climbing motion is achieved by attaching and releasing the perimeter and middle cups one after another. Synchronization between the parallel mechanism’s motion and the suction cups during locomotion, together with an improved gait trajectory, ensures successful climbing. The control scheme of the robot integrates servo control, suction control, and application control in a modular fashion. The successful climbing of the robot proves the scalability of the proposed climbing robot using active suction cups with an optimized design. Finally, a painting application demonstrates the robot’s capability to perform a wall painting task.
|
|
10:30-12:00, Paper TuAT31-NT.2 | Add to My Program |
A Collision-Aware Cable Grasping Method in Cluttered Environment |
|
Zhang, Lei | University of Hamburg |
Bai, Kaixin | University of Hamburg |
Li, Qiang | Shenzhen Technology University |
Chen, Zhaopeng | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Industrial Robots, Grasping, Transfer Learning
Abstract: We introduce a Cable Grasping-Convolutional Neural Network (CG-CNN) designed to facilitate robust cable grasping in cluttered environments. Utilizing physics simulations, we generate an extensive dataset that mimics the intricacies of cable grasping, factoring in potential collisions between cables and robotic grippers. We employ the Approximate Convex Decomposition technique to dissect the non-convex cable model, with grasp quality autonomously labeled based on simulated grasping attempts. The CG-CNN is refined using this simulated dataset and enhanced through domain randomization techniques. Subsequently, the trained model predicts grasp quality, and the optimal grasp pose is passed to the robot's controller for execution. Grasping efficacy is assessed across both synthetic and real-world settings. Given our model's implicit collision sensitivity, we achieved commendable success rates of 92.3% for known cables and 88.4% for unknown cables, surpassing contemporary state-of-the-art approaches. Supplementary materials can be found at https://leizhang-public.github.io/cg-cnn/.
|
|
10:30-12:00, Paper TuAT31-NT.3 | Add to My Program |
A Simple Computationally Efficient Path ILC for Industrial Robotic Manipulators |
|
Schwegel, Michael | TU Wien |
Kugi, Andreas | TU Wien |
Keywords: Industrial Robots, Motion Control, Incremental Learning
Abstract: In this paper, a numerically efficient flexible control scheme for the absolute accuracy of industrial robots is presented and experimentally validated. A model-based controller that leverages all typically available parameters is combined with an online path iterative learning controller (ILC). The ILC law is employed to compensate for the unknown residual error dynamics caused by elastic and transmission effects. The proposed approach combines several benefits, including the possibility of a continuous execution of trials, a straightforward generalization of the learned data to different execution speeds, and learning from partial trials. The experimental validations on a 6-axis industrial robot with a laser tracker absolute measurement system show a 95% improvement in absolute accuracy after two trials. When the laser tracker is removed, the learned feedforward controller can sustain the accuracy achieved even without trial-by-trial learning.
|
|
10:30-12:00, Paper TuAT31-NT.4 | Add to My Program |
RoboGrind: Intuitive and Interactive Surface Treatment with Industrial Robots |
|
Alt, Benjamin | ArtiMinds Robotics |
Stoeckl, Florian | DHBW Karlsruhe |
Müller, Silvan | Baden-Württemberg Cooperative State University |
Braun, Christopher | University of Stuttgart, Institute of Industrial Manufacturing A |
Raible, Julian | University of Stuttgart |
Alhasan, Saad | DHBW - Karlsruhe |
Rettig, Oliver | DHBW Karlsruhe |
Ringle, Lukas Daniel | ArtiMinds Robotics |
Katic, Darko | Karlsruhe Institute for Technology (KIT) |
Jäkel, Rainer | Karlsruhe Institute of Technology |
Beetz, Michael | University of Bremen |
Strand, Marcus | Baden-Wuerttemberg Cooperative State University Karlsruhe |
Huber, Marco F. | University of Stuttgart |
Keywords: Industrial Robots, Software Tools for Robot Programming, Software Architecture for Robotic and Automation
Abstract: Surface treatment tasks such as grinding, sanding or polishing are a vital step of the value chain in many industries, but are notoriously challenging to automate. We present RoboGrind, an integrated system for the intuitive, interactive automation of surface treatment tasks with industrial robots. It combines a sophisticated 3D perception pipeline for surface scanning and automatic defect identification, an interactive voice-controlled wizard system for AI-assisted bootstrapping and parameterization of robot programs, and an automatic planning and execution pipeline for force-controlled robotic surface treatment. RoboGrind is evaluated under both laboratory and real-world conditions in the context of refabricating fiberglass wind turbine blades.
|
|
10:30-12:00, Paper TuAT31-NT.5 | Add to My Program |
An LLM-Driven Framework for Multiple-Vehicle Dispatching and Navigation in Smart City Landscapes |
|
Chen, Ruiqing | ShanghaiTech University |
Song, Wenbin | ShanghaiTech University |
Zu, Weiqin | ShanghaiTech University |
Dong, Zixin | Shanghaitech University |
Guo, Ze | Harbin Institute of Technology |
Sun, Fanglei | ShanghaiTech University |
Tian, Zheng | ShanghaiTech University |
Wang, Jun | University College London |
Keywords: Automation Technologies for Smart Cities, Multi-Robot Systems, Autonomous Vehicle Navigation
Abstract: In the context of smart cities, autonomous vehicles, such as unmanned delivery vehicles and taxis, are gradually gaining acceptance. However, their application scenarios remain significantly fragmented. Typically, an Autonomous Multi-Functional Vehicle (AMFV) is not engaged in other scenarios when idle in a specific one. Currently, a unified system capable of coordinating and using these resources efficiently is lacking. Moreover, there is no advanced navigation algorithm for facilitating coordinated navigation among Heterogeneous Vehicles (HVs). To address these issues, we propose the LLM-driven Multi-vehicle Dispatching and navigation (LiMeda) framework. It comprises two modules: an LLM-driven scheduling module that facilitates efficient allocation considering task scenarios and vehicle information, addressing the issue of incompatible vehicle resources across various smart city scenarios; and a navigation module, founded on the Heterogeneous Agent Reinforcement Learning (HARL) framework we previously proposed, which can effectively perform cooperative navigation tasks among heterogeneous agents, assisting cooperative task completion by HVs in a smart city. Experimental results show that our method outperforms both traditional scheduling algorithms and reinforcement learning navigation algorithms across metrics. Additionally, it shows remarkable scalability and generalization under varying city scales, vehicle numbers, and task numbers.
|
|
10:30-12:00, Paper TuAT31-NT.6 | Add to My Program |
SCRNet: A Retinex Structure-Based Low-Light Enhancement Model Guided by Spatial Consistency |
|
Zhang, Miao | Tsinghua Shenzhen International Graduate School |
Shen, Yiqing | Johns Hopkins University |
Zhong, Shenghui | Zhongfa Aviation Institute, Beihang University |
Pan, Guofeng | Shenzhen Yijiahe Technologies |
Lu, Shuai | Tsinghua Shenzhen International Graduate School |
Keywords: Building Automation, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Images captured by robots under low-light conditions are often plagued by several challenges, including diminished contrast, increased noise, loss of fine details, and unnatural color reproduction. These factors can significantly hinder the performance of computer vision tasks such as object detection and image segmentation. As a result, improving the quality of low-light images is of paramount importance for practical applications in the computer vision domain. To effectively address these challenges, we present a novel low-light image enhancement model, termed Spatial Consistency Retinex Network (SCRNet), which leverages a Retinex-based structure and is guided by the principle of spatial consistency. Specifically, our proposed model incorporates three levels of consistency inspired by this principle: channel level, semantic level, and texture level. These levels of consistency enable our model to adaptively enhance image features, ensuring more accurate and visually pleasing results. Extensive experimental evaluations on various low-light image datasets demonstrate that our proposed SCRNet outperforms existing state-of-the-art methods, highlighting its potential as an effective solution for enhancing low-light images.
|
|
10:30-12:00, Paper TuAT31-NT.7 | Add to My Program |
Autonomous Field-Of-View Adjustment Using Adaptive Kinematic Constrained Control with Robot-Held Microscopic Camera Feedback |
|
Lin, Hung-Ching | University of Tokyo |
Marques Marinho, Murilo | The University of Manchester |
Harada, Kanako | The University of Tokyo |
Keywords: Robotics and Automation in Life Sciences, Robust/Adaptive Control
Abstract: Robotic systems for manipulation at the millimeter scale often use a camera with high magnification for visual feedback of the target region. However, the limited field-of-view (FoV) of the microscopic camera necessitates camera motion to capture a broader view of the workspace. In this work, we propose an autonomous robotic control method to constrain a robot-held camera within a designated FoV. Furthermore, we model the camera extrinsics as part of the kinematic model and use camera measurements coupled with U-Net-based tool tracking to adapt the complete robotic model during task execution. As a proof-of-concept demonstration, the proposed framework was evaluated in a bi-manual setup, where the microscopic camera was controlled to view a tool moving along a pre-defined trajectory. The proposed method allowed the camera to stay within the real FoV 94.1% of the time, compared to 54.4% without the proposed adaptive control.
|
|
10:30-12:00, Paper TuAT31-NT.8 | Add to My Program |
RoSSO: A High-Performance Python Package for Robotic Surveillance Strategy Optimization Using JAX |
|
John, Yohan | UC Santa Barbara |
Hughes, Connor | UC Santa Barbara |
Diaz-Garcia, Gilberto | University of California, Santa Barbara |
Marden, Jason | University of Colorado at Boulder |
Bullo, Francesco | UCSB |
Keywords: Surveillance Robotic Systems, Optimization and Optimal Control, Multi-Robot Systems
Abstract: To enable the computation of effective randomized patrol routes for single- or multi-robot teams, we present RoSSO, a Python package designed for solving Markov chain optimization problems. We exploit machine-learning techniques such as reverse-mode automatic differentiation and constraint parametrization to achieve superior efficiency compared to general-purpose nonlinear programming solvers. Additionally, we supplement a game-theoretic stochastic surveillance formulation in the literature with a novel greedy algorithm and multi-robot extension. We close with numerical results for a police district in downtown San Francisco that demonstrate RoSSO's capabilities on our new formulations and the prior work.
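The constraint-parametrization idea mentioned in this abstract can be illustrated with a minimal sketch (this is an assumption-laden illustration, not RoSSO's actual API; the `softmax_rows` helper and the NumPy-based setup are hypothetical). Mapping each row of an unconstrained real matrix through a softmax guarantees that any parameter value yields a valid row-stochastic Markov chain transition matrix, so an optimizer can search freely without explicit stochasticity constraints:

```python
import numpy as np

def softmax_rows(theta):
    """Map an unconstrained real matrix to a row-stochastic one.

    Hypothetical helper, not part of RoSSO: subtracting the row max
    before exponentiating keeps the computation numerically stable.
    """
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 4))   # free parameters, no constraints needed
P = softmax_rows(theta)           # valid transition matrix by construction

assert np.allclose(P.sum(axis=1), 1.0)  # rows sum to one
assert (P >= 0).all()                   # all entries nonnegative
```

Because every parameter matrix maps to a feasible transition matrix, gradient steps taken in the unconstrained space (e.g., via reverse-mode automatic differentiation, as the abstract describes) never leave the feasible set.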
|
|
10:30-12:00, Paper TuAT31-NT.9 | Add to My Program |
Semi-Autonomous Surface-Tracking Tasks Using Omnidirectional Mobile Manipulators |
|
Suarez Zapico, Carlos | Edinburgh Centre for Robotics |
Petillot, Yvan R. | Heriot-Watt University |
Erden, Mustafa Suphi | Heriot-Watt University |
Keywords: Redundant Robots, Force and Tactile Sensing, Whole-Body Motion Planning and Control
Abstract: Despite the potential of mobile manipulators and of applications where robots require force-controlled physical interaction with the environment, most robot automation today is still based on fixed manipulators performing free-motion tasks (e.g. welding, pick and place, or painting). In this work, we propose a control solution for omnidirectional mobile manipulators in force-tracking tasks, interacting with unknown surface geometries and with a human teleoperator in the control loop. Keeping a teleoperator in the loop makes the system widely applicable to unstructured environments. A human can take care, with little effort, of mobile base navigation, self-collisions, and collisions with the environment, as well as selecting the area of the asset surface to process. The teleoperator interfaces with the robot platform by commanding motion of the mobile base in order to increase the workspace and maneuverability of the arm. The operator can also command the movement of the end-effector, sliding on the surface geometry to process a specific area. Alternatively, the operator can let the controller execute a parametric trajectory (spiral or raster) for autonomous area coverage while commanding the base to keep the arm in configurations with good dexterity. The autonomous controller, in turn, is responsible for following the unknown contour of the manipulated surface using only observations from a force/torque sensor attached to the arm's wrist, exerting a prescribed force, and handling the motion control of the base and the arm so that both can follow their respective task requests. Overall, we have developed a user-friendly control scheme in which an operator with little training, using a joystick, can guide the robot system to perform a physically interactive task on the surface of an asset.
|
|
TuAT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems I |
|
|
Chair: Suzuki, Tatsuya | Nagoya University |
Co-Chair: Dolan, John M. | Carnegie Mellon University |
|
10:30-12:00, Paper TuAT32-NT.1 | Add to My Program |
Towards Optimal Lane-Changing Coordination of CAVs in Multi-Lane Mixed Traffic Scenarios |
|
Ding, Yan | Xi'an Jiaotong University |
Mao, Yijun | Xi'an Jiaotong University |
Jiao, Chongshan | Xi'an Jiaotong University |
Ren, Pengju | Xi'an Jiaotong University |
Keywords: Intelligent Transportation Systems
Abstract: Lane changing is a fundamental but challenging operation for moving vehicles. Connected and Automated Vehicles (CAVs) enable autonomous vehicles to cooperate with each other to accomplish lane-changing tasks, benefiting from their communication ability. However, dispatching CAVs in mixed traffic remains difficult due to the stochastic behaviors and uncertain intentions of Human-Driven Vehicles (HDVs). To tackle this issue, this paper devises a coordination approach based on Conflict-Based Search (CBS) theory. First, HDVs are accurately modeled as constraints to enable the use of CBS in mixed traffic. Additionally, virtual goals are introduced to search CAVs’ priorities and outlets along with path finding. Furthermore, we optimize the performance of CBS in dense traffic by defining the concept of following vehicles. Experiments show that performance is improved by the new conflict-prioritizing rules and a heuristic value calculation method derived from following vehicles. Finally, we introduce vehicle grouping to extend the proposed method to extremely dense and large instances, at a scale of more than one hundred vehicles, without significant loss in efficiency.
|
|
10:30-12:00, Paper TuAT32-NT.2 | Add to My Program |
Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss |
|
Do, Tuong | AIOZ |
Nguyen, Binh | AIOZ |
Tran, Quang | AIOZ |
Nguyen, Hien | AIOZ |
Tjiputra, Erman | AIOZ |
Chiu, Te-Chuan | National Tsing Hua University |
Nguyen, Anh | University of Liverpool |
Keywords: Intelligent Transportation Systems
Abstract: Federated learning has been widely applied in autonomous driving since it enables training a learning model among vehicles without sharing users' data. However, data from autonomous vehicles usually suffer from the non-independent-and-identically-distributed (non-IID) problem, which may cause negative effects on the convergence of the learning process. In this paper, we propose a new contrastive divergence loss to address the non-IID problem in autonomous driving by reducing the impact of divergence factors from transmitted models during the local learning process of each silo. We also analyze the effects of contrastive divergence in various autonomous driving scenarios, under multiple network infrastructures, and with different centralized/distributed learning schemes. Our intensive experiments on three datasets demonstrate that our proposed contrastive divergence loss significantly improves the performance over current state-of-the-art approaches.
|
|
10:30-12:00, Paper TuAT32-NT.3 | Add to My Program |
ODD-Based Query-Time Scenario Mutation Framework for Autonomous Driving Scenario Databases |
|
Tang, Yun | University of Warwick |
Raj, Dhanush | University of Warwick |
Zhao, Xingyu | University of Warwick |
Zhang, Xizhe | University of Warwick |
Bruto da Costa, Antonio | University of Warwick |
Khastgir, Siddartha | WMG, University of Warwick, UK |
Jennings, Paul | WMG, University of Warwick, UK |
Keywords: Intelligent Transportation Systems
Abstract: Large-scale scenario databases may contain hundreds of thousands of scenarios for the verification and validation (V&V) of autonomous vehicles (AV). Scenarios in the database are often labelled with semantic Operational Design Domain (ODD) tags (e.g., WeatherRainy, RoadTypeHighway and ActorTypeTruck) to be queried via exact tag matching. Such a scenario database design has two major limitations: combinatorial scenario generation inevitably leads to many redundant scenarios, and each ODD query matches only a small number of scenarios in the database (0.2% in our case study), rendering most of the database's wealth wasted. We propose a novel scenario database design and the first ODD-based query-time scenario mutation framework to address these limitations. Our case study results show that the proposed framework has the potential to fully utilize all the database scenarios at query time while eliminating scenario redundancy in the database (in our case study, given the same ODD query, the number of final matched scenarios increased by 36 times, diversity increased by 99 times, and the scenario database utilization rate increased from 0.2% to 36%).
|
|
10:30-12:00, Paper TuAT32-NT.4 | Add to My Program |
Cooperation for Scalable Supervision of Autonomy in Mixed Traffic |
|
Hickert, Cameron | Massachusetts Institute of Technology |
Li, Sirui | MIT |
Wu, Cathy | MIT |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Human Factors and Human-in-the-Loop, Scalable Supervision
Abstract: Advances in autonomy offer the potential for dramatic positive outcomes in a number of domains, yet enabling their safe deployment remains an open problem. This work’s motivating question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The work formalizes this scalable supervision problem by considering remotely located human supervisors and investigating how autonomous agents can cooperate to achieve safety. This article focuses on the safety-critical context of autonomous vehicles (AVs) merging into traffic consisting of a mixture of AVs and human drivers. The analysis establishes high reliability upper bounds on human supervision requirements. It further shows that AV cooperation can improve supervision reliability by orders of magnitude and counterintuitively requires fewer supervisors (per AV) as more AVs are adopted. These analytical results leverage queuing-theoretic analysis, order statistics, and a conservative, reachability-based approach. A key takeaway is the potential value of cooperation in enabling the deployment of autonomy at scale. While this work focuses on AVs, the scalable supervision framework may be of independent interest to a broader array of autonomous control challenges.
|
|
10:30-12:00, Paper TuAT32-NT.5 | Add to My Program |
Hierarchical Learned Risk-Aware Planning Framework for Human Driving Modeling |
|
Ludlow, Nathan | Brigham Young University |
Lyu, Yiwei | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation, Human-Aware Motion Planning
Abstract: This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in simulation environments. Our methodology leverages a hierarchical, forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. This approach is grounded in multimodal trajectory prediction, using a deep neural network with LSTM-based social pooling to predict the trajectories of surrounding vehicles. These trajectories are used to compute forward-looking risk assessments along the ego vehicle's path, guiding its navigation. Our method aims to replicate human driving behaviors by learning parameters that emulate human decision-making during driving. We ensure that our model exhibits robust generalization capabilities by conducting simulations, employing real-world driving data to validate the accuracy of our approach in modeling human behavior. The results reveal that our model effectively captures human behavior, showcasing its versatility in modeling human drivers in diverse highway scenarios.
|
|
10:30-12:00, Paper TuAT32-NT.6 | Add to My Program |
DESTINE: Dynamic Goal Queries with Temporal Transductive Alignment for Trajectory Prediction |
|
Karim, Rezaul | York University |
Mohamad Alizadeh Shabestary, Soheil | Huawei Technologies Canada |
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Intelligent Transportation Systems, Intention Recognition, Long term Interaction
Abstract: Predicting temporally consistent road users' trajectories in a multi-agent setting is a challenging task due to the unknown characteristics of agents and their varying intentions. Besides using semantic map information and modeling interactions, it is important to build an effective mechanism capable of reasoning about behaviors at different levels of granularity. To this end, we propose Dynamic goal quErieS with temporal Transductive alIgNmEnt (DESTINE) method. Unlike prior approaches, our approach 1) dynamically predicts agents' goals irrespective of particular road structures, such as lanes, allowing the method to produce a more accurate estimation of destinations; 2) achieves map-compliant predictions by generating future trajectories in a coarse-to-fine fashion, where the coarser predictions at a lower frame rate serve as intermediate goals; and 3) uses an attention module designed to temporally align predicted trajectories via a masked attention operation. Using the common Argoverse benchmark dataset, we show that our method achieves state-of-the-art performance on various metrics, and further investigate the contributions of proposed modules via comprehensive ablation studies.
|
|
10:30-12:00, Paper TuAT32-NT.7 | Add to My Program |
Parallel Optimization with Hard Safety Constraints for Cooperative Planning of Connected Autonomous Vehicles |
|
Huang, Zhenmin | The Hong Kong University of Science and Technology |
Liu, Haichao | The Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents, Optimization and Optimal Control
Abstract: The development of connected autonomous vehicles (CAVs) facilitates the enhancement of traffic efficiency in complicated scenarios, yet difficulties remain in developing an effective and efficient coordination strategy for CAVs. In this paper, we formulate the cooperative autonomous driving task of CAVs as an optimal control problem with safety conditions enforced as hard constraints, and propose a computationally efficient parallel optimization framework to generate strategies for CAVs that improve travel efficiency while satisfying the hard safety constraints. Specifically, all constraints involved are addressed appropriately with convex approximation, such that the reformulated optimization problem is convex. Then, a parallel optimization algorithm is presented to solve the reformulated problem, with an embedded iterative nearest-neighbor search strategy to determine the optimal passing sequence. Notably, travel efficiency is enhanced and the computation burden considerably alleviated with the proposed approach. We also examine the proposed method in the CARLA simulator and perform thorough comparisons to demonstrate its effectiveness and efficiency.
|
|
10:30-12:00, Paper TuAT32-NT.8 | Add to My Program |
Editing Driver Character: Socially-Controllable Behavior Generation for Interactive Traffic Simulation |
|
Chang, Wei-Jer | University of California, Berkeley |
Tang, Chen | University of California Berkeley |
Li, Chenran | University of California, Berkeley |
Hu, Yeping | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Zhan, Wei | University of California, Berkeley |
Keywords: Intelligent Transportation Systems, Simulation and Animation
Abstract: Traffic simulation plays a crucial role in evaluating and improving autonomous driving planning systems. After being deployed on public roads, autonomous vehicles need to interact with human road participants with different social preferences (e.g., selfish or courteous human drivers). To ensure that autonomous vehicles take safe and efficient maneuvers in different interactive traffic scenarios, we should be able to evaluate autonomous vehicles against reactive agents with different social characteristics in the simulation environment. We propose a socially-controllable behavior generation (SCBG) model for this purpose, which allows the users to specify the level of courtesy of the generated trajectory while ensuring realistic and human-like trajectory generation through learning from real-world driving data. Specifically, we define a novel and differentiable measure to quantify the level of courtesy of driving behavior, leveraging marginal and conditional behavior prediction models trained from real-world driving data. The proposed courtesy measure allows us to auto-label the courtesy levels of trajectories from real-world driving data and conveniently train an SCBG model generating trajectories based on the input courtesy values. We examine SCBG on the Waymo Open Motion Dataset (WOMD) and show that we are able to control the SCBG model to generate realistic behaviors with desired courtesy levels. SCBG is able to identify different motion patterns of courteous behaviors according to the scenarios.
|
|
10:30-12:00, Paper TuAT32-NT.9 | Add to My Program |
Cognitive-Digital-Twin-Based Driving Assistance |
|
Diao, Junyu | Shanghaitech University |
Tang, Renzhi | ShanghaiTech University |
Gu, Yi | Shanghaitech Technology |
Tian, Sen | Southwestern University of Finance and Economics |
Jiang, Zhihao | ShanghaiTech University |
Keywords: Cognitive Modeling, Human-Centered Automation, Intention Recognition
Abstract: Advanced driver assistance systems (ADAS) have been developed to enhance driving safety by issuing timely warnings to drivers. However, current ADAS do not take into account the driver's cognitive state when delivering warnings, which can result in false alarms and impact the driver's trust in the system. To address this issue, we propose a Cognitive Digital-twin-based Assistance System (CDAS) that issues warnings tailored to the driver's perception of the driving environment and driving style. In this paper, we present a model of the driver's decision-making process that explicitly captures their perception of the driving environment, their utility evaluation of predicted future environments, and their driving style in terms of minimum acceptable risk. The cognitive digital twin of the driver is then created and updated by minimizing the discrepancy between the predicted and actual behaviors of the driver. With the cognitive digital twin, the CDAS warns the driver when there is a significant discrepancy between the predicted driving strategy based on partial observation and that based on full observation. This approach can more accurately identify risks that the driver is not aware of and provide warnings only when necessary. We conducted human and simulated experiments in a virtual driving environment, and our results demonstrate that our proposed CDAS has a similar perception of risky behaviors compared to humans. Furthermore, the digital twin learning framework can identi
|
|
TuAL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster I |
|
|
|
10:30-12:00, Paper TuAL-EX.1 | Add to My Program |
Soft Actuators and Metamaterials Based on Liquid-Vapor Phase Change |
|
Zhong, Yiding | Zhejiang University |
Tang, Wei | Zhejiang University |
Xu, Huxiu | Zhejiang University |
Zou, Jun | Zhejiang University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Soft Sensors and Actuators
Abstract: By eliminating bulky pumps and valves, liquid-vapor phase change composite actuators have become a promising alternative to traditional soft pneumatic/hydraulic actuators, yet they remain limited by a lack of complex programmable and distributed deformation designs, continuous energy consumption to maintain deformation, and slow actuation response. Here, we introduce a class of programmable thermochromic phase change soft actuators (PTSAs) and a class of phase change mechanical metamaterials (PMMs). PTSAs integrate low-boiling-point fluids for actuation and thermochromic microcapsules for color change. The initial shape and deformation type of PTSAs can be programmed into a variety of forms with the innovative “two-dimensional” architecture. PMMs combine temperature-responsive low-boiling-point fluids and magnetically responsive carbonyl iron powders. Owing to their periodic lattice structures, PMMs can be customized to diverse structures and deformations by designing the pattern configurations and positional relationships of a series of basic actuating units. Relying on their magnetic responsiveness, PMMs can achieve magnetically assisted shape locking and energy storage. Theoretical models and finite element simulations are developed to guide the design process. Using PTSAs and PMMs, we develop a series of applications that demonstrate their broad prospects in soft robotics, wearable devices, flexible electronics, and other fields.
|
|
10:30-12:00, Paper TuAL-EX.2 | Add to My Program |
Active Mechanical Haptics with High-Fidelity Perceptions for Immersive Virtual Reality |
|
Zhang, Zhuang | Westlake University |
Jiang, Hanqing | Westlake University |
Keywords: Haptics and Haptic Interfaces, Soft Robot Materials and Design, Virtual Reality and Interfaces
Abstract: Human-centered mechanical sensory perceptions enable us to immerse ourselves in the physical environment by actively touching or holding objects so that we may feel their existence and their fundamental properties (e.g., stiffness or hardness). In a virtual environment, replicating these active perceptions can create authentic haptic experiences, serving as an essential supplement to visual and auditory experiences. We present here a first-person, human-triggered haptic device enabled by curved origami that allows humans to actively experience touching objects with various stiffness perceptions, from soft to hard and from positive to negative ranges. This new device represents a significant shift away from the third-person, machine-triggered, passive haptics currently in practice. The device is synchronized with the virtual environment by changing its configuration to adapt to various interactions, emulating body-centered physical perceptions including hardness, softness, and sensations of crushing and weightlessness. The high-fidelity stiffness perceptions achieve an unprecedented experience of “what a user sees or is immersed in, is what the user feels or steps on”. Quantitative evaluations demonstrate that the active haptic device creates a highly immersive virtual environment, outperforming existing vibration-based passive devices. These concepts and resulting technologies create new opportunities and application potential for a more authentic virtual world.
|
|
10:30-12:00, Paper TuAL-EX.3 | Add to My Program |
The Flying Shovel Picker: A Drone-Mounted Shovel-Based Rotational Dual Arm System for Picking up Indeterminate Objects |
|
Senevirathna, Nilupul Nuwan | Shibaura Institute of Technology |
Premachandra, Chinthaka | Shibaura Institute of Technology |
Keywords: Aerial Systems: Mechanics and Control, RGB-D Perception, Dual Arm Manipulation
Abstract: This research delves into the dynamic synergy between drones and specialized gripping tools, envisioning a future where drones excel in skillfully retrieving objects from challenging and remote locations. Within this context, the core challenges associated with the deployment of drones for object retrieval are meticulously examined and addressed. These challenges encompass the accurate identification and approach of target objects, the achievement of precise landings, the delicate yet secure gripping of objects, the development of predictive capabilities, and the maintenance of airborne stability during transit. This study introduces innovative and pragmatic solutions to these multifaceted challenges, propelling us closer to a transformative era in remote object retrieval with drones. The culmination of this research not only identifies these challenges but also charts a course toward a future where drones become indispensable tools in seamlessly accessing, securing, and transporting objects from previously inaccessible realms. Through these groundbreaking advancements, the potential applications of drone technology are poised for unprecedented expansion, promising a new era of precision and efficiency in remote object retrieval.
|
|
10:30-12:00, Paper TuAL-EX.4 | Add to My Program |
Design and Evaluation on a Compliant Lower Limb Exoskeleton for Elderly Assistance |
|
Jin, Yinan | Zhejiang University of Technology |
Li, Zetong | Zhejiang University of Technology |
Cai, Shibo | Zhejiang University of Technology |
Bao, Guanjun | Zhejiang University of Technology, China |
Keywords: Human-Centered Robotics, Compliant Joints and Mechanisms, Performance Evaluation and Benchmarking
Abstract: The world's elderly population is increasing rapidly. Because of deterioration in gait-related parameters such as muscle activity, many elderly people face difficulty in walking, the most basic activity of daily life. Indeed, elderly people's quality of life can be significantly affected by impaired mobility. Developing wearable lower-limb assistive exoskeletons is therefore a feasible solution. Assistive powered exoskeletons can provide additional torque to support various activities of mobility-impaired subjects, such as walking, sitting to standing, or standing to sitting. Rigid exoskeletons, inspired by industrial robots, usually have fewer degrees of freedom (DOFs) and require a large power supply for actuation, making them complicated for daily use. Compliant exoskeletons may have better application prospects in rehabilitation and elderly assistance: they are lighter, mechanically simpler, can be compliantly actuated when aiding the elderly, and are more widely applicable to different individuals. Therefore, the development of compliant lower-limb assistive exoskeletons has great social value and has attracted interest from researchers.
|
|
10:30-12:00, Paper TuAL-EX.5 | Add to My Program |
TDEVO: Towards a Robust All-Day Visual Odometry by a Multimodal Fusion System |
|
Gu, Gong | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Visual Tracking, Sensor Fusion, Vision-Based Navigation
Abstract: We present TDEVO, a novel multi-modal visual odometry system consisting of an LWIR (thermal) camera, a depth camera, and an event camera. The integration of these sensors enhances the system's resilience to variations in environmental illumination and texture information. Based on our proposed alignment algorithm, the system is extensible to incorporate additional modalities of visual sensors.
|
|
10:30-12:00, Paper TuAL-EX.6 | Add to My Program |
Non-Prehensile Object Transport by Nonholonomic Robots Connected by Linear Deformable Elements |
|
Zhi, Hui | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Flexible Robotics, Mobile Manipulation, Task and Motion Planning
Abstract: This paper presents a new method to transport objects with mobile robots automatically via non-prehensile actions. Our approach utilizes a pair of nonholonomic robots connected by a deformable tube to manipulate objects of irregular shapes toward target locations efficiently. To autonomously perform this task, we developed a local integrated planning and control strategy that solves the problem in two steps (viz. enveloping and transport) based on the model predictive control (MPC) framework. The deformable underactuated system is simplified by a linear kinematic model. The enveloping problem is formulated as the minimization of multiple criteria that represent the enclosing error of the object by the variable morphology system. The transport problem is tackled by formulating the non-prehensile dragging action as an inequality constraint specified by the body frame of the deformable system along the path. Reactive obstacle avoidance is ensured by a maximum margin-based term that utilizes the system's geometry and the feedback proximity to the environment. To validate the performance of the proposed methodology, we report a detailed experimental study with vision-guided robotic prototypes conducting multiple autonomous object transport tasks.
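The enveloping-and-transport strategy above is built on receding-horizon (MPC) optimization. As a hedged illustration of the receding-horizon principle only, and not of the paper's actual formulation, the following toy sketch picks, from a few candidate controls, the one minimizing a terminal cost over a short horizon for a 1-D system (all names and the cost are hypothetical):

```python
def mpc_step(x, goal, candidates, horizon, step):
    """Pick the candidate control that minimizes terminal distance to the
    goal when held constant over the horizon (toy 1-D rollout)."""
    def terminal(u):
        xs = x
        for _ in range(horizon):
            xs += u * step          # forward-simulate the 1-D system
        return abs(goal - xs)       # terminal cost
    return min(candidates, key=terminal)

# Choose among a few constant controls for a unit-goal reaching task.
u = mpc_step(x=0.0, goal=1.0, candidates=[-1.0, 0.0, 0.5, 1.0],
             horizon=5, step=0.1)
```

A real MPC solver would re-optimize over full control sequences at every step, subject to the enveloping and non-prehensile dragging constraints described in the abstract.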
|
|
10:30-12:00, Paper TuAL-EX.7 | Add to My Program |
Non-Prehensile Tool-Object Manipulation by Integrating Large Language Model-Based Planning and Manoeuvrability-Driven Controls |
|
Lee, Hoi-Yin | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Dual Arm Manipulation, Task and Motion Planning, Manipulation Planning
Abstract: Our paper presents a Large Language Model-based tool-object manipulation system with non-prehensile control for a dual-arm robotic system. Our platform offers manoeuvrability-driven controls for motion and path planning in tool manipulation for object transportation. Extensive experimentation demonstrates its effectiveness in delivering accurate performance by integrating natural-language instructions from humans with visual data, even in long-horizon tasks or environmentally constrained situations.
|
|
10:30-12:00, Paper TuAL-EX.8 | Add to My Program |
Formation Control of Multiple Nonholonomic Robots Along Parametric Curves |
|
Zhang, Bin | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Nonholonomic Mechanisms and Systems, Multi-Robot Systems
Abstract: Formation control technology can be used in various areas, such as object transportation, rescue, and environmental surveillance. In previous literature, the desired patterns that agents need to form are usually specified by absolute or relative positions, and those patterns are usually morphologically simple, such as polygons and lines. This causes problems in complicated cases where the desired pattern has a complex structure, such as a curve. In this work, we adopt a more general representation, parametric curves, for the desired pattern and develop novel formation control methods to drive agents to form those parametric curves. In this way, our method has high flexibility to deal with formation tasks with complex desired patterns.
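Representing the desired pattern as a parametric curve makes target generation straightforward: sample the curve at per-agent parameter values. A minimal sketch under that assumption (the even-spacing scheme and all names are hypothetical, not the authors' controller):

```python
import math

def curve_targets(curve, n_agents, t0=0.0, t1=1.0):
    """Sample n_agents evenly spaced parameter values on [t0, t1] and
    return the corresponding target points on the parametric curve."""
    ts = [t0 + (t1 - t0) * i / (n_agents - 1) for i in range(n_agents)]
    return [curve(t) for t in ts]

# Example desired pattern: a circular arc.
circle = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
targets = curve_targets(circle, 4, 0.0, 0.75)
```

Each agent would then be driven toward its sampled target by the nonholonomic formation controller.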
|
|
10:30-12:00, Paper TuAL-EX.9 | Add to My Program |
Addressing Long-Horizon Sparse Reward Robotics Tasks: An Approach Leveraging Variational Autoencoders for Implicit Subgoal Planning |
|
Wang, Fangyuan | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Task and Motion Planning, Manipulation Planning, Autonomous Agents
Abstract: Humans perform long-term tasks by breaking them down into simpler subtasks that require planning and reasoning capabilities. Although robots can perform complex tasks, current methods are limited to short-term or human-guided tasks. Empowering robots to reason and plan for long-term tasks could expand their capabilities and advance the field of robotic automation. We propose an algorithm to tackle the challenges of long-horizon tasks in robotics. It divides tasks into simpler subgoals using a Variational Autoencoder (VAE)-based Subgoal Generator, a Hindsight Sampler, and a Value Selector. The Subgoal Generator has an explicit encoder model that generates subgoals and an implicit decoder model that predicts the final goal. The Hindsight Sampler selects valid subgoals from an offline dataset, and the Value Selector filters optimal subgoals from subgoal candidates. We tested VAESI on several long-horizon tasks in simulation and the real world and achieved promising results compared to other baseline methods.
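The Hindsight Sampler and Value Selector can be illustrated with a toy sketch: intermediate states of an offline trajectory serve as subgoal candidates, and the candidate with the highest estimated value is kept. The 1-D states, the even-spacing heuristic, and all names below are hypothetical simplifications of the VAE-based pipeline described above:

```python
def hindsight_subgoals(trajectory, k):
    """Hindsight-style sampling: pick k evenly spaced intermediate
    states from an offline trajectory as subgoal candidates."""
    n = len(trajectory)
    idx = [int((i + 1) * n / (k + 1)) for i in range(k)]
    return [trajectory[j] for j in idx]

def select_subgoal(candidates, value_fn):
    """Value Selector: keep the candidate with the highest value."""
    return max(candidates, key=value_fn)

traj = list(range(10))               # toy 1-D states 0..9
cands = hindsight_subgoals(traj, 3)  # intermediate states as candidates
best = select_subgoal(cands, lambda s: -abs(s - 7))  # value peaks near goal 7
```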
|
|
10:30-12:00, Paper TuAL-EX.10 | Add to My Program |
A Novel CNN-BiLSTM Ensemble Model with Attention Mechanism for Sit-To-Stand Phase Identification Using Wearable Inertial Sensors |
|
Chen, Xin | Zhejiang University of Technology |
Cai, Shibo | Zhejiang University of Technology |
Yu, Longjie | Zhejiang University of Technology |
Li, Xiaoling | Zhejiang University of Technology |
Fan, Bingfei | Zhejiang University of Technology |
Du, Mingyu | Zhejiang University of Technology |
Liu, Tao | Zhejiang University |
Bao, Guanjun | Zhejiang University of Technology, China |
Keywords: Rehabilitation Robotics, Intention Recognition, Physical Human-Robot Interaction
Abstract: Sit-to-stand transition phase identification is vital in the control of a wearable exoskeleton robot for assisting patients to stand. We propose a method for segmenting and identifying the sit-to-stand phases using two inertial sensors. First, we divided the sit-to-stand transition into five phases, namely, the initial sitting phase, the flexion momentum phase, the momentum transfer phase, the extension phase, and the stable standing phase, based on the preprocessed acceleration and angular velocity data. We then employed a threshold method to recognize the initial sitting and stable standing phases. Finally, we designed a novel CNN-BiLSTM-Attention algorithm to identify the three transition phases. Fifteen subjects were recruited to perform sit-to-stand transition experiments. Combining acceleration and angular-velocity features was shown to improve model performance for transition phase identification, and the integration of the CNN, Bi-LSTM, and Attention modules was validated. The experimental results showed that the proposed CNN-BiLSTM-Attention algorithm achieved the highest average classification accuracy of 99.5% for all five phases when compared to both traditional machine learning algorithms and deep learning algorithms on our customized dataset. The proposed sit-to-stand phase recognition algorithm could serve as a foundation for the control of wearable exoskeletons.
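The threshold step for detecting the initial sitting and stable standing phases can be sketched as follows: samples whose angular-velocity magnitude stays below a threshold are treated as static, and the leading and trailing static runs bound the transition. The threshold value and signal below are hypothetical:

```python
def threshold_segments(gyro_mag, thresh):
    """Label samples below thresh as static; the leading static run is
    taken as initial sitting and the trailing static run as stable
    standing. Returns (first_moving_index, last_moving_index)."""
    labels = ['static' if g < thresh else 'moving' for g in gyro_mag]
    first_move = next(i for i, l in enumerate(labels) if l == 'moving')
    last_move = len(labels) - 1 - next(
        i for i, l in enumerate(reversed(labels)) if l == 'moving')
    return first_move, last_move

# Toy angular-velocity magnitude trace for one sit-to-stand trial.
signal = [0.1, 0.2, 1.5, 2.0, 1.8, 0.9, 0.2, 0.1]
start, end = threshold_segments(signal, 0.5)
```

The samples between `start` and `end` would then be passed to the learned classifier for the three transition phases.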
|
|
10:30-12:00, Paper TuAL-EX.11 | Add to My Program |
Safe Hierarchical Reactive Control of a Magnetic Microrobot with Multiple Constraints and Unknown Dynamics |
|
Liu, Yueyue | Jiangnan University |
Wang, Haoyu | Jiangnan University |
Fan, Qigao | Jiangnan University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Nanomanufacturing
Abstract: In this presentation, three contributions are summarized as follows: 1. A hierarchical reactive control framework, i.e., a high-level kinematic trajectory-reshaping controller and a low-level dynamics tracking controller, is designed for magnetic microrobots, ensuring both safety and stability. 2. A control barrier function (CBF)-based kinematic planning method is designed to address various constrained environments for magnetic microrobots, achieving reactive motion within the limited constraint range. 3. To obtain robust and precise tracking performance for the magnetic microrobot dynamics, a control framework employing an adaptive neural network (NN) is crafted to accommodate model uncertainties and system nonlinearities. Besides, the stability and convergence of the low-level controller are analysed.
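The CBF-based planning idea can be shown in closed form for the simplest case: a 1-D single integrator x' = u with barrier h(x) = x_max - x, where the safety condition h' + alpha*h >= 0 reduces to an upper bound on u. This is a hedged simplification of the paper's kinematic planner, with hypothetical names:

```python
def cbf_filter(u_des, x, x_max, alpha=1.0):
    """Closed-form CBF safety filter for a 1-D single integrator x' = u
    with barrier h(x) = x_max - x. Enforcing h' + alpha*h >= 0 gives
    u <= alpha * (x_max - x); modify the desired control minimally."""
    return min(u_des, alpha * (x_max - x))

# Desired control 2.0 would violate the barrier near x_max = 1.0.
u = cbf_filter(u_des=2.0, x=0.9, x_max=1.0, alpha=1.0)  # clipped to alpha*h(x)
```

In higher dimensions with multiple constraints this minimal modification becomes a quadratic program rather than a scalar clip.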
|
|
10:30-12:00, Paper TuAL-EX.12 | Add to My Program |
3D Localization of Object Buried within Granular Materials Using a 3-Axis Tactile Sensor |
|
Chen, Zhengqi | Queen Mary University of London |
Versace, Elisabetta | Queen Mary University of London |
Jamone, Lorenzo | Queen Mary University London |
Keywords: Force and Tactile Sensing, Localization, Range Sensing
Abstract: In contrast to vision, which provides rich spatial information about objects, tactile sensing offers sparse data concentrated around the contact point. However, tactile sensing becomes crucial for reliable interaction with the environment, such as in the search for buried objects. Existing methods for this task rely solely on the drag force at a single point of the end-effector, posing challenges for 3D localization. This paper presents an alternative approach utilizing a distributed 3-axis tactile sensor to predict the real-time 3D localization (direction, depth, and distance) of an object buried within granular material. Our learning-based model leverages the symmetry of tactile array feedback, eliminating the requirement for manual measurements of granular media properties. The proposed model was evaluated on real-world data with objects buried in various positions, revealing that employing multiple distributed tactile units enhances prediction accuracy.
|
|
10:30-12:00, Paper TuAL-EX.13 | Add to My Program |
Dynamics Modeling and Trajectory Tracking Control of a 6-DOF Modular Serial Orthogonal Manipulator |
|
Wen, Shuhuan | Yanshan University |
Min, Jiatai | Yanshan University |
Yu, Zhanqi | Yanshan University |
Li, Yunxiao | Yanshan University |
|
|
10:30-12:00, Paper TuAL-EX.14 | Add to My Program |
Micro-X4: An Origami Robot for Biological Manipulation |
|
Feng, Bo | Zhejiang University |
Keywords: Micro/Nano Robots, Parallel Robots, Biological Cell Manipulation
Abstract: We demonstrate a 4-DoF micro origami parallel robot, Micro-X4, driven by 4 servo motors, which offers a large workspace, high stiffness, and excellent repeatability. Configurable tools, such as a knife, needle, or micro gripper (one or more at a time), can be assembled on its output platform, making cell puncture, cutting, and injection efficient. We believe that Micro-X4 has a promising future in microassembly and microsurgery.
|
|
10:30-12:00, Paper TuAL-EX.15 | Add to My Program |
Actual Shape-Based Obstacle Avoidance Synthesized by Velocity–Acceleration Minimization for Redundant Manipulators: An Optimization Perspective |
|
Ma, Boyu | Harbin Institute of Technology |
Liu, Yang | Harbin Institute of Technology |
Xie, Zongwu | Harbin Institute of Technology, China |
Li, Yuntao | Harbin Institute of Technology |
Shi, Jiaxiao | Harbin Institute of Technology |
Keywords: Collision Avoidance, Manipulation Planning, Redundant Robots
Abstract: From the optimization perspective, this article proposes a novel actual shape-based obstacle avoidance synthesized by velocity–acceleration minimization (ASOA-VAM) scheme that performs operational tasks safely in a complex environment utilizing redundant manipulators. Concretely, an actual shape-based obstacle avoidance (ASOA) strategy with a variable magnitude escape acceleration using the Gilbert–Johnson–Keerthi distance algorithm is presented. Trajectory tracking, the end-effector’s errors feedback, and the joint multilevel physical limits (joint angle, -velocity, and -acceleration limits) avoidance are also incorporated into this optimization scheme. Meanwhile, the velocity–acceleration minimization (VAM) measure is developed. Combining the ASOA strategy with the VAM measure, the ASOA-VAM scheme is formed and further reformulated as a quadratic program (QP). Moreover, a recurrent neural network with theoretically provable convergence is designed to solve the QP online. Finally, simulations, comparisons, and experiments of a 7-degree-of-freedom manipulator with engineering applications illustrate the ASOA-VAM scheme’s effectiveness, accuracy, superiority, and physical realizability.
|
|
10:30-12:00, Paper TuAL-EX.16 | Add to My Program |
Vision-Based Tactile Information Extraction and Localization for Dexterous Grasping |
|
Yan, Teng | Shenzhen University |
Cai, Yaobang | Shenzhen Technology University |
Xia, Tian | Shenzhen University |
Li, Wenxian | Shenzhen Technology University |
Zhang, Yang | Shenzhen Technology University |
Keywords: Perception for Grasping and Manipulation, Grasping
Abstract: Due to the difficulty of acquiring tactile perception information during dexterous robotic hand grasping and the complexity of multi-finger contact, robotic dexterous hand grasping has become a challenging problem. This study accomplishes two main tasks: 1) acquiring object surface tactile information using only vision, and 2) real-time estimation of fingertip contact coordinates during dexterous hand grasping. The implementation methods are: 1) proposing a point cloud texture feature extraction method based on normal vectors and grayscale value variance; 2) utilizing the NVIDIA Isaac Sim platform to build the Dexterous Hand (RH8D) model, ensuring the simulation environment matches the physical properties and operational conditions of the real world, and simulating human-like grasping with 2, 3, 4, and 5 fingers on everyday items of different materials and sizes, to accurately calculate the spatial coordinates of fingertip contact points. Experimental results show that this method can extract clear object surface texture information through vision alone and accurately locate the contact points of the dexterous hand (with precision up to 10^-3 m) in real time, providing a low-cost method for acquiring multimodal sensory information for robotic grasping technology. To promote scientific transparency and support the reproducibility of this study, the related source code and dataset have been made open source. Our project is available at https://github.com/Fenbid0605
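The grayscale-variance part of the texture feature can be sketched directly: for each point, compute the variance of grayscale values over its neighborhood. The neighborhood lists and values below are hypothetical, and the paper's method also uses normal vectors, which this sketch omits:

```python
def local_variance(gray, neighbors):
    """Per-point texture feature: variance of grayscale values over
    each point's neighborhood (lists of indices into gray)."""
    feats = []
    for nbrs in neighbors:
        vals = [gray[j] for j in nbrs]
        mean = sum(vals) / len(vals)
        feats.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return feats

# Toy per-point grayscale values and two hypothetical neighborhoods.
gray = [10.0, 10.0, 10.0, 30.0]
nbrs = [[0, 1, 2], [1, 2, 3]]
f = local_variance(gray, nbrs)  # flat region -> 0, textured region -> high
```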
|
|
10:30-12:00, Paper TuAL-EX.17 | Add to My Program |
Living Cells Actuated Bio-Syncretic Swimmer Based on Wireless Electric and Magnetic Control |
|
Yang, Lianchao | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Chuang | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Qi | Shenyang Institute of Automation, Chinese Academy of Sciences |
Wang, Wenxue | Shenyang Institute of Automation, CAS |
Xi, Ning | The University of Hong Kong |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Biologically-Inspired Robots, Biomimetics, Soft Robot Materials and Design
Abstract: Bio-syncretic robots actuated by living cells have attracted considerable attention due to their great potential to enhance robotic actuation performance, and significant progress has been made in recent years. However, most current actuation control methods used for bio-syncretic robots may restrict the robots' kinematic dexterity. In this work, a bio-syncretic bionic dolphin swimmer actuated by cultured living muscle tissue and steered by a remote magnetic field has been developed. The robot demonstrates wirelessly controllable swimming with desired speed and direction by adjusting the pulse stimulation of the electrodes and the remote magnetic field of the electromagnetic coil. This work may benefit the development not only of bio-syncretic robots, but also of micro and miniature robots consisting of electromechanical systems.
|
|
10:30-12:00, Paper TuAL-EX.18 | Add to My Program |
Design and Optimization of a Disinfection Robot System Framework Based on Clustering Model |
|
Ye, Jiajie | The University of Hong Kong |
Sheng, Yongji | The University of Hong Kong |
Wang, Siyu | The University of Hong Kong |
Ma, Ye | The University of Hong Kong |
Liu, Xinyu | The University of Hong Kong |
Tsoi, Tracy | The University of Hong Kong |
Xi, Ning | The University of Hong Kong |
Keywords: Mobile Manipulation, AI-Enabled Robotics, Service Robotics
Abstract: This work proposes a design and optimization method for a disinfection robot system framework based on a clustering model. The framework consists of a clustering module and a foundation framework. The clustering module performs clustering based on the Event-based Spatial Temporal and Logic Relationship (CESTLR), considering the spatial, temporal, and logical levels, which can effectively detect and associate normal/abnormal events in a dynamic environment. The foundation framework generates an initial task schedule based on the clustering results and interacts with the clustering module through real-time data to optimize the task schedule. This method can reliably cope with unexpected/uncertain events in a dynamic environment and improve service performance. Experimental results show that the proposed method outperforms baseline algorithms in terms of detection capability, service capability, and scheduling capability.
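The spatial and temporal levels of the event clustering can be illustrated with a greedy sketch: an event joins an existing cluster if it is close to that cluster's latest event in both space and time, and otherwise starts a new cluster. The logical level of CESTLR is omitted, and all thresholds and names are hypothetical:

```python
def cluster_events(events, d_max, t_max):
    """Greedy spatio-temporal clustering of (x, y, t) events: join a
    cluster whose last event is within d_max in space and t_max in
    time; otherwise open a new cluster."""
    clusters = []
    for x, y, t in events:
        for c in clusters:
            cx, cy, ct = c[-1]
            dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if abs(t - ct) <= t_max and dist <= d_max:
                c.append((x, y, t))
                break
        else:
            clusters.append([(x, y, t)])
    return clusters

# Three nearby events and one spatial outlier.
evs = [(0, 0, 0.0), (0.5, 0, 0.2), (5, 5, 0.3), (0.6, 0.1, 0.4)]
cl = cluster_events(evs, d_max=1.0, t_max=0.5)
```

In the framework above, cluster membership would then drive task scheduling for the disinfection robot.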
|
|
10:30-12:00, Paper TuAL-EX.19 | Add to My Program |
Multi-Task Learning for Monocular Depth Estimation and Semantic Edge Detection |
|
Wu, Deming | Shanghaitech |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Visual Learning
Abstract: Depth estimation and semantic edge detection are two critical tasks in computer vision that have progressed significantly. How to jointly predict depth and semantic edges has yet to be explored. In this work, we propose a flexible two-branch framework in which the two tasks take advantage of each other. Specifically, an Enhanced Edge Weighting strategy is designed for the semantic edge detection branch, which learns weight information from a by-product of the depth branch, the depth edge, to enhance edge perception in features. Meanwhile, depth estimation benefits from semantic edge detection through the Depth Edge Semantic Classification module. Furthermore, a double reconstruction approach and a semantic edge-guided disparity smoothing loss are presented to mitigate the ambiguities of self-supervised depth estimation. Experiments on the Cityscapes dataset demonstrate that our framework outperforms the state-of-the-art method in depth estimation and significantly improves semantic edge detection.
|
|
10:30-12:00, Paper TuAL-EX.20 | Add to My Program |
Semantic Segmentation Based on Feature Domain Adaptation |
|
Li, Jiao | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: To address the annotation cost, unsupervised domain adaptation is proposed to adapt the network trained on labeled synthetic data to unlabeled real-world data. However, most of these methods focus on domain distributions in the input and output stages. Therefore, a novel network named FeatDANet is presented to align feature-level domain distributions at each encoder layer. Specifically, two attention-based modules, IFAM and DFLM, are designed and implemented by mixing queries and keys between domains for advisable domain adaptation. Furthermore, FeatDANet is constructed as a self-training network with three weight-sharing branches, and an improved pseudo-labels learning strategy is suggested by identifying more confident pseudolabels and maximizing the use of pseudo-labels. Extensive experiments show that FeatDANet achieves state-of-the-art performances on the tasks of GTA→Cityscapes and Synthia→Cityscapes.
|
|
10:30-12:00, Paper TuAL-EX.21 | Add to My Program |
Self-Supervised Visual-Inertial Odometry with Scale Recovery |
|
Zhang, Tianyu | Chinese Academy of Sciences |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception
Abstract: Accurate localization for intelligent robots remains a significant challenge, and self-supervised visual-inertial odometry (VIO) has emerged as a promising solution. However, existing self-supervised VIO works treat inertial information as an ordinary data input, forfeiting its ability to recover absolute scale and ignoring the modality difference between acceleration and angular velocity in inertial data. We propose a novel self-supervised VIO framework that augments the odometry-related information implicit in inertial data. In the specific implementation, a self-attention-based IMU network is designed to denoise the raw IMU data, and poses are obtained from the denoised IMU data through an integrator. A self-attention-based Scale Recovery module is proposed to recover the absolute scale by constructing a pose consistency constraint between it and the visual-inertial fused pose. Additionally, to avoid the interference of acceleration with rotation estimation, we designed a Decoupled PoseNet that employs different inputs and networks to learn rotation and translation. Odometry, scale, and depth evaluations on the KITTI odometry and Malaga datasets all show that our framework achieves state-of-the-art performance.
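The integrator that turns denoised IMU data into poses can be sketched, for one axis with gravity already removed, as plain Euler integration (a hypothetical simplification of the learned pipeline above):

```python
def integrate_imu(p0, v0, accels, dt):
    """Euler-integrate denoised accelerations (gravity already removed)
    into velocity and position along one axis."""
    p, v = p0, v0
    for a in accels:
        v += a * dt   # velocity update
        p += v * dt   # position update
    return p, v

# Constant 1 m/s^2 acceleration for 2 s at 2 Hz sampling.
p, v = integrate_imu(0.0, 0.0, [1.0, 1.0, 1.0, 1.0], 0.5)
```

A full VIO integrator would also propagate orientation from the gyroscope and rotate accelerations into the world frame before integrating.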
|
|
10:30-12:00, Paper TuAL-EX.22 | Add to My Program |
Geometry-Based Efficient Solution for PnP Problem |
|
Sun, Qixuan | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Vision-Based Navigation
Abstract: The Perspective-n-Point (PnP) problem aims to estimate pose from known 3D map points and their projections. Efficient PnP (EPnP), one of the classical PnP solvers, represents the camera pose with control points, which are easier to estimate using a least-squares (LS) formulation. However, the geometry refinement procedure performed by most EPnP-based methods is separated from the solution of the LS formulation, which makes it difficult to balance minimizing the loss function against preserving the essential geometric properties of the control points. To handle this problem, we integrated geometry constraints into the control-point formulation and reformulated the LS problem as a quadratically constrained quadratic program. We derived an innovative analytical solution to the constrained EPnP problem, which is faster than the customarily applied numerical methods. An uncertainty-aware least-squares registration procedure is designed to compute the camera pose from the control points. Experiments on synthetic and real data show that our methods outperform other state-of-the-art approaches.
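The control-point representation at the heart of EPnP expresses each 3-D point as an affine combination of four control points, p = sum_j alpha_j c_j with sum_j alpha_j = 1, which reduces to a 4x4 linear solve. A minimal sketch assuming NumPy is available (the control points chosen are hypothetical):

```python
import numpy as np

def barycentric_coords(ctrl_pts, p):
    """EPnP-style representation: express a 3-D point p as an affine
    combination of 4 control points, p = sum_j alpha_j c_j with
    sum_j alpha_j = 1, via a 4x4 linear solve."""
    C = np.vstack([np.asarray(ctrl_pts, dtype=float).T,  # 3x4 coordinates
                   np.ones(4)])                          # affine constraint row
    b = np.append(np.asarray(p, dtype=float), 1.0)
    return np.linalg.solve(C, b)

# Simplex control points: origin plus the three unit axes.
ctrl = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
alpha = barycentric_coords(ctrl, (0.2, 0.3, 0.1))
```

Once every map point is written this way, estimating the pose reduces to estimating the four control points in the camera frame.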
|
|
10:30-12:00, Paper TuAL-EX.23 | Add to My Program |
Development of a Controllable Damping Constant Force Suspended Backpack |
|
Ju, Haotian | Harbin Institute of Technology |
Zhao, Sikai | Harbin Institute of Technology |
Guo, Songhao | Harbin Institute of Technology |
Li, Hongwu | Harbin Institute of Technology |
Wang, Ziqi | Harbin Institute of Technology |
Liu, Junchen | Harbin Institute of Technology |
Xiong, Quan | National University of Singapore |
Zhao, Jie | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Prosthetics and Exoskeletons, Reinforcement Learning, Mechanism Design
Abstract: Suspended backpacks have been acknowledged for their advantages in load carriage. However, existing suspended backpacks cannot eliminate the accelerative vertical force due to their nonzero suspension stiffness, limiting their adaptability to different load carriage tasks. In this paper, a controllable-damping constant-force suspended backpack was developed. The constant force mechanism was designed to bring the suspension stiffness close to zero and minimize the inertia force of the load. The controllable damping device was developed to resolve the mismatch between the load and the constant force mechanism caused by the system's friction. A controllable damping factor control method based on a Q-learning algorithm was proposed and simulated. Calibration experiments verified the theory of the controllable damping device. Load displacement experiments showed that the backpack could return the load to the center from a slide-deviation position, thus preventing the load from colliding with the limit position. The variable damping factor control method obtained using the Q-learning algorithm was superior to the constant damping factor control method. The maximum accelerative vertical force of the load was reduced by 92% compared to an ordinary backpack.
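The Q-learning component can be illustrated with a single tabular update, Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a)). The states, actions, and reward below are hypothetical placeholders for the damping-factor control problem:

```python
def q_update(Q, s, a, r, s_next, actions, lr=0.1, gamma=0.9):
    """One tabular Q-learning update on a dict-backed Q table:
    Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lr * (r + gamma * best_next - old)
    return Q

# Toy transition: raising the damping factor re-centers a sliding load.
Q = {}
q_update(Q, 'slide', 'raise_damping', 1.0, 'centered',
         ['raise_damping', 'lower_damping'])
```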
|
|
10:30-12:00, Paper TuAL-EX.24 | Add to My Program |
Kinodynamic Motion Planning Via Funnel Control under High-Level Tasks with Deadlines in Obstacle-Cluttered Environments |
|
Verginis, Christos | Uppsala University |
Dimarogonas, Dimos V. | KTH Royal Institute of Technology |
Kavraki, Lydia | Rice University |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Formal Methods in Robotics and Automation
Abstract: We consider the problem of motion planning for high-dimensional robots under uncertain dynamics and timed temporal-logic specifications in obstacle-cluttered environments. Such specifications can describe complex, high-level objectives encoding both time and spatial constraints. Current approaches cannot address uncertainty in the dynamics while simultaneously accommodating time constraints for highly articulated robots, such as robotic manipulators. We propose an algorithm that combines sampling-based motion planning with feedback control and formal-verification techniques such that a robot with uncertain dynamics accomplishes a timed temporal-logic specification. In particular, we use Kinodynamic motion planning via Funnel control (KDF) to achieve timed robot navigation; KDF combines geometric sampling-based motion planning with funnel control to efficiently synthesize motion controllers that achieve safe robot navigation in the workspace in predetermined time intervals without using any information on the dynamics (a.k.a. differential constraints) or potential exogenous disturbances. This allows us to abstract the robot motion as a timed transition system and, using formal- verification methodologies, to synthesize controllers that achieve the timed temporal-logic specification. Experimental results on a 6-DOF robotic arm verify the efficiency of the proposed approach.
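Funnel control keeps the tracking error inside a prescribed, shrinking performance bound. A common choice (a hedged sketch, not necessarily the exact funnel used in KDF) is the exponential funnel rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf:

```python
import math

def funnel_bound(t, rho0=1.0, rho_inf=0.1, decay=2.0):
    """Exponentially shrinking performance funnel: starts at rho0 and
    converges to rho_inf with the given decay rate."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf

def inside_funnel(error, t):
    """True when the tracking error respects the funnel at time t."""
    return abs(error) < funnel_bound(t)
```

A funnel controller scales its feedback gain as the error approaches the bound, which is what permits navigation within predetermined time intervals without a dynamics model.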
|
|
10:30-12:00, Paper TuAL-EX.25 | Add to My Program |
A Variable Stiffness Modular Structure |
|
Li, Xiaozheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Gao, Xing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Cao, Chongjing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Keywords: Mechanism Design, Actuation and Joint Mechanisms
Abstract: Hybridization of rigidity and flexibility has been a research focus in variable stiffness structures. However, existing technologies suffer from issues such as large volume, complex mechanisms, and a limited range of stiffness variation. In this study, we propose an electrostatically adsorbed variable stiffness module, which leverages the high adsorption capacity of electrostatics to achieve light weight, a wide stiffness-variation range, and rigidity-flexibility hybridization. This approach provides a novel solution for achieving rigidity-flexibility hybridization in flexible manipulators and holds promise for application in minimally invasive surgical robotic arms. By endowing the robotic arm with good compliance for safe human-robot interaction while maintaining sufficient rigidity to enhance operational precision and force output, this solution could significantly improve the performance of surgical robotic systems.
|
|
10:30-12:00, Paper TuAL-EX.26 | Add to My Program |
MRSRS: Modular Reconfigurable Space Robotic System for Future Space Exploration |
|
Li, Yuntao | Harbin Institute of Technology |
Zhao, Jingdong | Harbin Institute of Technology |
Guo, Chuangqiang | Harbin Institute of Technology |
Xu, Zichun | Harbin Institute of Technology, School of Mechatronics Engineeri |
Ma, Boyu | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Space Robotics and Automation, Product Design, Development and Prototyping, Field Robots
Abstract: Space robotic systems are the core equipment for on-orbit service missions. This work introduces the design, development, and experimental investigation of an advanced modular reconfigurable space robotic system (MRSRS) for future space exploration. The MRSRS achieves reconfiguration in two ways: changing its DOF through a standard interface (SI) and changing its link lengths through a passive telescopic arm (PTA). The SI and PTA are the core technologies of the MRSRS. First, a lightweight and reliable standard interface (PETLOCK) with load, power, and data transfer capacities is developed. PETLOCK features a genderless 3D face and an innovative locking mechanism, which give it a large misalignment tolerance and a high mechanical load capacity. PETLOCK achieves power and data transmission through POGO pins and can additionally obtain visual information through a camera. Meanwhile, a PTA is developed with a locking mechanism inside. When the two ends of the robotic manipulator are fixed, the locking mechanism unlocks, and the PTA's length change is achieved by joint trajectory planning. The PTA adopts the same locking mechanism as PETLOCK. Finally, reconfiguration experiments on the MRSRS were conducted. Compared to fixed-configuration space robots, the MRSRS offers high adaptability, high robustness, and long-term economy, and will open up a new field for future space exploration.
|
|
10:30-12:00, Paper TuAL-EX.27 | Add to My Program |
Co-Bot for Utility Solar Farms |
|
Santillan, Christopher | Iowa State University |
Bhattacharya, Sourabh | Iowa State University |
Daining, Stephen | Vermeer Corporation |
Keywords: Human Factors and Human-in-the-Loop, Human-Robot Collaboration, Human-Aware Motion Planning
Abstract: Can a co-bot help the solar industry with its labor shortage? The solar industry is expected to be the backbone of renewable energy, with exponential growth projected in the upcoming decades. However, the biggest challenge to this growth is a labor shortage. Installing solar panels can be an exhausting, repetitive, and tedious job that most people would avoid. We propose a collaborative robot that can work with a human to simplify this task: the robot does the heavy lifting, while the human does the fine-tuning of the installation.
|
|
TuBA1-CC Award Session, CC-Main Hall |
Add to My Program |
Human-Robot Interaction |
|
|
Chair: Laschi, Cecilia | National University of Singapore |
Co-Chair: Soh, Harold | National University of Singapore |
|
13:30-15:00, Paper TuBA1-CC.1 | Add to My Program |
POLITE: Preferences Combined with Highlights in Reinforcement Learning |
|
Holk, Simon | KTH Royal Institute of Technology |
Marta, Daniel | KTH Royal Institute of Technology |
Leite, Iolanda | KTH Royal Institute of Technology |
Keywords: Human Factors and Human-in-the-Loop, Reinforcement Learning, Representation Learning
Abstract: Many solutions to the challenge of robot learning have been devised, namely through exploring novel ways for humans to communicate complex goals and tasks in reinforcement learning (RL) setups. One line of work that has seen recent research interest addresses the problem directly by considering human feedback as preferences between pairs of trajectories (sequences of state-action pairs). However, when a single preference is simply attributed to a pair of trajectories that contain many agglomerated steps, key pieces of information are lost in the process. We extend the standard definition of preferences to account for highlights: state-action pairs of relatively high information (high/low reward) within a preferred trajectory. To incorporate this additional information, we design novel regularization methods within a preference learning framework. We present a method that greatly reduces the necessary amount of preferences by permitting the highlighting of favoured trajectories, in order to reduce the entropy of the credit assignment. We show the effectiveness of our work in both simulation and a user study, which analyzes the feedback given and its implications. We also use the total collected feedback to train a robot policy for socially compliant trajectories in a simulated social navigation environment. We release code and video examples at https://sites.google.com/view/rl-polite
|
|
13:30-15:00, Paper TuBA1-CC.2 | Add to My Program |
CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting |
|
Schaldenbrand, Peter | Carnegie Mellon University |
Parmar, Gaurav | Carnegie Mellon University |
Zhu, Jun-Yan | Carnegie Mellon University |
McCann, James | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Keywords: Human-Robot Collaboration, Art and Entertainment Robotics, Deep Learning Methods
Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but the interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models; however, pre-trained models in the context of real-world co-painting do not perform well because they (1) do not understand the constraints and abilities of the robot and (2) cannot perform co-painting without making unrealistic edits to the canvas and overwriting content. We propose a self-supervised fine-tuning procedure that can tackle both issues, allowing the use of pre-trained state-of-the-art text-image alignment models with robots to enable co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA, both from a blank canvas and one with human created work. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a pre-trained text-to-image model, showcasing promising results as an effective method for reducing sim-to-real gaps.
|
|
13:30-15:00, Paper TuBA1-CC.3 | Add to My Program |
MateRobot: Material Recognition in Wearable Robotics for People with Visual Impairments |
|
Zheng, Junwei | Karlsruhe Institute of Technology |
Zhang, Jiaming | Karlsruhe Institute of Technology |
Yang, Kailun | Hunan University |
Peng, Kunyu | Karlsruhe Institute of Technology |
Stiefelhagen, Rainer | Karlsruhe Institute of Technology |
Keywords: Human-Centered Robotics, Object Detection, Segmentation and Categorization
Abstract: People with Visual Impairments (PVI) typically recognize objects through haptic perception. Knowing objects and materials before touching them is desired by the target users but under-explored in the field of human-centered robotics. To fill this gap, in this work, a wearable vision-based robotic system, MateRobot, is established for PVI to recognize materials and object categories beforehand. To address the computational constraints of mobile platforms, we propose a lightweight yet accurate model, MateViT, to perform pixel-wise semantic segmentation, simultaneously recognizing both objects and materials. Our method achieves 40.2% and 51.1% mIoU on the COCOStuff-10K and DMS datasets respectively, surpassing the previous method with gains of +5.7% and +7.0%. Moreover, in a field test with participants, our wearable system reaches a score of 28 on the NASA Task Load Index, indicating low cognitive demand and ease of use. MateRobot demonstrates the feasibility of recognizing material properties through visual cues and offers a promising step towards improving the functionality of wearable robots for PVI. The source code has been made publicly available at https://junweizheng93.github.io/publications/MATERobot/MATERobot.html.
|
|
13:30-15:00, Paper TuBA1-CC.4 | Add to My Program |
Robot-Assisted Navigation for Visually Impaired through Adaptive Impedance and Path Planning |
|
Balatti, Pietro | Istituto Italiano Di Tecnologia |
Ozdamar, Idil | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Sirintuna, Doganay | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Fortini, Luca | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Gandarias, Juan M. | University of Malaga |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human-Centered Robotics, Human-Aware Motion Planning, Physical Human-Robot Interaction
Abstract: This paper presents a framework for navigating visually impaired people through unfamiliar environments by means of a mobile manipulator. The human-robot system consists of three key components: a mobile base, a robotic arm, and the human subject, who is guided by the robotic arm via physical coupling of their hand with the cobot's end-effector. These components, receiving a goal from the user, traverse a collision-free set of waypoints in a coordinated manner, while avoiding static and dynamic obstacles through an obstacle avoidance unit and a novel human guidance planner. To this end, we also present a leg-tracking algorithm that utilizes 2D LiDAR sensors integrated into the mobile base to monitor the human pose. Additionally, we introduce an adaptive pulling planner responsible for guiding the individual back to the intended path if they veer off course. This is achieved by establishing a target arm end-effector position and dynamically adjusting the impedance parameters in real time through an impedance tuning unit. To validate the framework, we present a set of experiments, both in laboratory settings with 12 healthy blindfolded subjects and as a proof-of-concept demonstration in a real-world scenario.
|
|
13:30-15:00, Paper TuBA1-CC.5 | Add to My Program |
Incremental Learning of Full-Pose Via-Point Movement Primitives on Riemannian Manifolds |
|
Daab, Tilman | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Dreher, Christian R. G. | Karlsruhe Institute of Technology (KIT) |
Meixner, Andre | Karlsruhe Institute of Technology (KIT) |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Incremental Learning, Learning from Demonstration
Abstract: Movement primitives (MPs) are compact representations of robot skills that can be learned from demonstrations and combined into complex behaviors. However, merely equipping robots with a fixed set of innate MPs is insufficient to deploy them in dynamic and unpredictable environments. Instead, the full potential of MPs remains to be attained via adaptable, large-scale MP libraries. In this paper, we propose a set of seven fundamental operations to incrementally learn, improve, and re-organize MP libraries. To showcase their applicability, we provide explicit formulations of the spatial operations for libraries composed of Via-Point Movement Primitives (VMPs). By building on Riemannian manifold theory, our approach enables the incremental learning of all parameters of position and orientation VMPs within a library. Moreover, our approach stores a fixed number of parameters, thus complying with the essential principles of incremental learning. We evaluate our approach by incrementally learning a VMP library from sequentially provided motion capture data.
|
|
13:30-15:00, Paper TuBA1-CC.6 | Add to My Program |
Supernumerary Robotic Limbs to Support Post-Fall Recoveries for Astronauts |
|
Ballesteros, Erik | Massachusetts Institute of Technology |
Lee, Sang-Yoep | Seoul National University |
Carpenter, Kalind | Jet Propulsion Laboratory |
Asada, Harry | MIT |
Keywords: Human Performance Augmentation, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: This paper proposes the use of Supernumerary Robotic Limbs (SuperLimbs) to augment astronauts during an Extra-Vehicular Activity (EVA) in a partial-gravity environment. We investigate the effectiveness of SuperLimbs in assisting astronauts to their feet following a fall. Based on preliminary observations from a pilot human study, we categorized post-fall recoveries into a sequence of statically stable poses called "waypoints". The paths between the waypoints can be modeled with a simplified kinetic motion applied about a specific point on the body. Following the characterization of post-fall recoveries, we designed a task-space impedance controller with high damping and low stiffness, in which the SuperLimbs assist an astronaut in post-fall recovery while keeping the human in the loop. To validate this control scheme, a full-scale wearable analog space suit was constructed and tested with a SuperLimbs prototype. The experiments showed that, without assistance, astronauts would impulsively exert themselves to perform a post-fall recovery, resulting in high energy consumption and instability in maintaining an upright posture, concurring with prior NASA studies. When the SuperLimbs provided assistance, the astronaut's energy consumption and tracking deviation during post-fall recovery were reduced considerably.
|
|
TuBA2-CC Award Session, CC-301 |
Add to My Program |
Mechanisms and Design |
|
|
Chair: Chen, I-Ming | Nanyang Technological University |
Co-Chair: Yang, Guilin | Ningbo Institute of Material Technology and Engineering, Chinese Academy of Sciences |
|
13:30-15:00, Paper TuBA2-CC.1 | Add to My Program |
Lissajous Curve-Based Vibrational Orbit Control of a Flexible Vibrational Actuator with a Structural Anisotropy |
|
Miyazaki, Yuto | Graduate School of Engineering, Osaka University |
Higashimori, Mitsuru | Osaka University |
Keywords: Flexible Robotics
Abstract: This paper proposes a novel flexible vibrational actuator with a structural anisotropy, together with a control method to diversify its vibrational behavior. First, the analytical model of the proposed actuator, which comprises a flexible beam with a rectangular cross-section and a rotational-type motor, is introduced. Regarding the structural anisotropy, the rotational axis of the motor is nonparallel to both principal axes of bending stiffness of the beam. Then, the vibrational phenomenon of the actuator is theoretically revealed. It is shown that, using a synthetic wave input composed of two sine waves based on the resonance frequencies of the beam's principal axes, the vibrational orbit of the beam's tip can be controlled in the same manner as a Lissajous curve. Finally, the proposed method is experimentally validated: the Lissajous curve-based vibrational orbit control is performed using a prototype actuator, and an application to an underactuated locomotor is demonstrated.
|
|
13:30-15:00, Paper TuBA2-CC.2 | Add to My Program |
Dynamic Modeling of Wing-Assisted Inclined Running with a Morphing Multi-Modal Robot |
|
Sihite, Eric | California Institute of Technology |
Ramezani, Alireza | Northeastern University |
Gharib, Morteza | California Institute of Technology |
Keywords: Biologically-Inspired Robots, Motion Control, Dynamics
Abstract: Robot designs can take many inspirations from nature, where there are many examples of highly resilient and fault-tolerant locomotion strategies to navigate complex terrains by using multi-functional appendages. For example, Chukar and Hoatzin birds can repurpose their wings for quadrupedal walking and wing-assisted incline running (WAIR) to climb steep surfaces. We took inspiration from nature and designed a morphing robot with multi-functional thruster-wheel appendages that allows the robot to change its mode of locomotion by transforming into a rover, quad-rotor, mobile inverted pendulum (MIP), and other modes. In this work, we derive a dynamic model and formulate a nonlinear model predictive controller to perform WAIR to showcase the unique capabilities of our robot. We implemented the model and controller in a numerical simulation and experiments to show their feasibility and the capabilities of our transforming multi-modal robot.
|
|
13:30-15:00, Paper TuBA2-CC.3 | Add to My Program |
Design and Modeling of a Nested Bi-Cavity-Based Soft Growing Robot for Grasping in Constrained Environments |
|
Yong, Haochen | Huazhong University of Science and Technology |
Xu, Fukang | Huazhong University of Science and Technology |
Li, Chenfei | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Wu, Zhigang | Huazhong University of Science and Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Grasping
Abstract: Soft growing robots with unique navigation (tip extension by eversion) hold great promise in rescue, medical, and industrial applications. Equipping them with grasping capability would enhance their usefulness in constrained environments for various applications. However, in traditional designs, the tip’s eversion naturally conflicts with grasping, and the addition of grippers at the tip would limit navigation inevitably in constrained environments. To realize grasping in such scenes without extra devices, we propose a nested bi-cavity-based growing soft robot (BIBOT). The new design consists of two coaxially nested cavities, where the inner and outer cavities extend synchronously by inversion and eversion of the film rolls. Such a bi-cavity design enables the BIBOT to navigate and grasp without relative movements between the body and environment, and avoids contact between the object and its surroundings as well. Further, a kinematics model is established and verified to precisely control its lengthening and steering by a feed mechanism. Finally, its capability in a constrained environment is demonstrated by navigating and grasping an object in a curved pipe with a variable internal diameter.
|
|
13:30-15:00, Paper TuBA2-CC.4 | Add to My Program |
Optimized Design and Fabrication of Skeletal Muscle Actuators for Bio-Syncretic Robots |
|
Yang, Lianchao | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Chuang | Shenyang Institute of Automation Chinese Academy of Sciences |
Wang, Ruiqian | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Yiwei | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Biologically-Inspired Robots, Soft Robot Materials and Design, Soft Sensors and Actuators
Abstract: In recent years, bio-syncretic robots actuated by living materials have received widespread attention. Among common living materials, engineered skeletal muscle tissue (eSKT) has been the focus of researchers due to its high contraction force and good controllability. However, the current performance of eSKT is far from that of natural skeletal muscle tissue. In this paper, we propose an optimized design method for eSKTs. By combining simulation analysis with experiments, eSKTs with multiple strips have been developed. The results show that, at a specific volume (250 μL), the optimized strip structures enhance the stability of the eSKT and facilitate the penetration of nutrients and oxygen, leading to improved fusion of myoblasts and directional arrangement of myotubes, and thus improved performance. The multi-strip eSKT exhibits a significant contraction force and has been successfully utilized in a bio-syncretic robot to demonstrate its actuation capability. This work may provide insights for the development of bio-syncretic robots and even tissue engineering.
|
|
TuBT1-CC Oral Session, CC-303 |
Add to My Program |
Planning under Uncertainty II |
|
|
Chair: Kavraki, Lydia | Rice University |
Co-Chair: Berenson, Dmitry | University of Michigan |
|
13:30-15:00, Paper TuBT1-CC.1 | Add to My Program |
Stochastic Implicit Neural Signed Distance Functions for Safe Motion Planning under Sensing Uncertainty |
|
Quintero-Peña, Carlos | Rice University |
Thomason, Wil | Rice University |
Kingston, Zachary | Rice University |
Kyrillidis, Anastasios | Rice University |
Kavraki, Lydia | Rice University |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: Motion planning under sensing uncertainty is critical for robots in unstructured environments, to guarantee safety for both the robot and any nearby humans. Most work on planning under uncertainty does not scale to high-dimensional robots such as manipulators, assumes simplified geometry of the robot or environment, or requires per-object knowledge of noise. Instead, we propose a method that directly models sensor-specific aleatoric uncertainty to find safe motions for high-dimensional systems in complex environments, without exact knowledge of environment geometry. We combine a novel implicit neural model of stochastic signed distance functions with a hierarchical optimization-based motion planner to plan low-risk motions without sacrificing path quality. Our method also explicitly bounds the risk of the path, offering trustworthiness. We empirically validate that our method produces safe motions and accurate risk bounds and is safer than baseline approaches.
|
|
13:30-15:00, Paper TuBT1-CC.2 | Add to My Program |
Constrained Hierarchical Monte Carlo Belief-State Planning |
|
Jamgochian, Arec | Stanford University |
Buurmeijer, Hugo | Stanford University |
Wray, Kyle | N/a |
Corso, Anthony | Stanford University |
Kochenderfer, Mykel | Stanford University |
Keywords: Planning under Uncertainty, Constrained Motion Planning, Integrated Planning and Control
Abstract: Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
|
|
13:30-15:00, Paper TuBT1-CC.3 | Add to My Program |
Estimating 3D Uncertainty Field: Quantifying Uncertainty for Neural Radiance Fields |
|
Shen, Jianxiong | IRI, CSIC-UPC |
Ren, Ruijie | IRI-CSIC |
Ruiz, Adrià | INRIA |
Moreno-Noguer, Francesc | CSIC |
Keywords: Planning under Uncertainty
Abstract: Current methods based on Neural Radiance Fields (NeRF) significantly lack the capacity to quantify uncertainty in their predictions, particularly for unseen space, including occluded and out-of-view scene content. This limitation hinders their extensive application in robotics, where the reliability of model predictions has to be considered for robotic exploration and planning in unknown environments. To address this, we propose a novel approach to estimate a 3D Uncertainty Field based on the learned incomplete scene geometry, which explicitly identifies these unseen regions of the scene. By considering the accumulated transmittance along each camera ray, our Uncertainty Field infers 2D pixel-wise uncertainty, exhibiting high values for rays cast directly towards occluded or external scene content. To quantify the uncertainty on the learned surface, we model a stochastic radiance field. Our experiments demonstrate that, compared with recent methods, our approach is the only one that can explicitly reason about high uncertainty both in 3D unseen regions and in the corresponding 2D rendered pixels. Furthermore, we illustrate that our designed uncertainty field is ideally suited for real-world robotics tasks, such as next-best-view selection.
|
|
13:30-15:00, Paper TuBT1-CC.4 | Add to My Program |
Online Adaptation of Sampling-Based Motion Planning with Inaccurate Models |
|
Faroni, Marco | Politecnico Di Milano |
Berenson, Dmitry | University of Michigan |
Keywords: Motion and Path Planning, Planning under Uncertainty, Integrated Planning and Learning
Abstract: Robotic manipulation relies on analytical or learned models to simulate the system dynamics. These models are often inaccurate and based on offline information, so that the robot planner is unable to cope with mismatches between the expected and the actual behavior of the system (e.g., the presence of an unexpected obstacle). These situations require the robot to use information gathered online to correct its planning strategy and adapt to the actual system response. We propose a sampling-based motion planning approach that uses an estimate of the model error and online observations to correct the planning strategy at each new replanning. Our approach adapts the cost function and the sampling distribution of a sampling-based kinodynamic motion planner when the outcome of the executed transitions is different from the expected one (e.g., when the robot unexpectedly collides with an obstacle) so that future trajectories will avoid unreliable motions. To infer the properties of a new transition, we introduce the notion of context-awareness, i.e., we store local environment information for each executed transition and avoid new transitions with context similar to previous unreliable ones. This is helpful for leveraging online information even if the simulated transitions are far (in the state-and-action space) from the executed ones. Simulation and experimental results show that the proposed approach increases the success rate in execution and reduces the number of replannings needed to reach the goal.
|
|
13:30-15:00, Paper TuBT1-CC.5 | Add to My Program |
Autonomous 3D Exploration in Large-Scale Environments with Dynamic Obstacles |
|
Wiman, Emil | Linköping University |
Widén, Ludvig | Linköping University |
Tiger, Mattias | AI and Integrated Computer Systems (AIICS), Linköping University |
Heintz, Fredrik | Linköping University |
Keywords: Planning under Uncertainty, Collision Avoidance, Task and Motion Planning
Abstract: Exploration in dynamic and uncertain real-world environments is an open problem in robotics, and it constitutes a foundational capability of autonomous systems operating in most of the real world. While 3D exploration planning has been extensively studied, environments are typically assumed to be static, or only reactive collision avoidance is carried out. We propose a novel approach that not only avoids dynamic obstacles but also includes them in the plan itself, deliberately exploiting the dynamic environment in the agent's favor. The proposed planner, the Dynamic Autonomous Exploration Planner (DAEP), extends AEP to explicitly plan with respect to dynamic obstacles. Furthermore, addressing prior errors in AEP has also enhanced exploration within static environments. To thoroughly evaluate exploration planners in such settings, we propose a new enhanced benchmark suite with several dynamic environments, including large-scale outdoor ones. DAEP outperforms state-of-the-art planners in dynamic and large-scale environments and is shown to be more effective at both exploration and collision avoidance.
|
|
13:30-15:00, Paper TuBT1-CC.6 | Add to My Program |
MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation |
|
Liang, Jing | University of Maryland |
Gao, Peng | University of Massachusetts Amherst |
Xiao, Xuesu | George Mason University |
Sathyamoorthy, Adarsh Jagan | University of Maryland |
Elnoor, Mohamed | University of Maryland |
Lin, Ming C. | University of Maryland at College Park |
Manocha, Dinesh | University of Maryland |
Keywords: Planning under Uncertainty, Task and Motion Planning, Motion and Path Planning
Abstract: We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging and large outdoor environments, including around buildings, across intersections, along trails, and off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions. Our video is here: https://youtu.be/3eJ2soAzXnU
|
|
13:30-15:00, Paper TuBT1-CC.7 | Add to My Program |
IBBT: Informed Batch Belief Trees for Motion Planning under Uncertainty |
|
Zheng, Dongliang | Georgia Tech |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Planning under Uncertainty, Motion and Path Planning, Constrained Motion Planning
Abstract: In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. First, we solve the deterministic planning problem using Rapidly-exploring Random Graph (RRG) to construct a nominal trajectory graph. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching the graph using the proposed heuristic. IBBT interleaves batch state sampling, nominal trajectory graph construction, heuristic computing, and searching over the graph to find belief space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds improved plans. IBBT is efficient by reusing results between sequential iterations. The belief tree search is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster compared with previous similar methods.
|
|
13:30-15:00, Paper TuBT1-CC.8 | Add to My Program |
Integrating Predictive Motion Uncertainties with Distributionally Robust Risk-Aware Control for Safe Robot Navigation in Crowds |
|
Ryu, Kanghyun | University of California, Berkeley |
Mehr, Negar | University of California Berkeley |
Keywords: Planning under Uncertainty, Robot Safety, Collision Avoidance
Abstract: Ensuring safe navigation in human-populated environments is crucial for autonomous mobile robots. Although recent advances in machine learning offer promising methods to predict human trajectories in crowded areas, it remains unclear how one can safely incorporate these learned models into a control loop due to the uncertain nature of human motion, which can make predictions of these models imprecise. In this work, we address this challenge and introduce a distributionally robust chance-constrained model predictive control (DRCC-MPC) which: (i) adopts a probability of collision as a pre-specified, interpretable risk metric, and (ii) offers robustness against discrepancies between actual human trajectories and their predictions. We consider the risk of collision in the form of a chance constraint, providing an interpretable measure of robot safety. To enable real-time evaluation of chance constraints, we consider conservative approximations of chance constraints in the form of distributionally robust Conditional Value at Risk constraints. The resulting formulation offers computational efficiency as well as robustness with respect to out-of-distribution human motion. With the parallelization of a sampling-based optimization technique, our method operates in real-time, demonstrating successful and safe navigation in a number of case studies with real-world pedestrian data.
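For readers unfamiliar with the Conditional Value at Risk surrogate mentioned in this abstract, a minimal sample-based sketch follows (hypothetical names, not the authors' implementation):

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Tail-average estimator of CVaR_alpha: the mean of the worst
    ceil(alpha * N) losses out of N samples."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * len(losses))))            # tail size
    return losses[:k].mean()

# Toy check: with a safety margin h(x) (negative = collision), requiring
# CVaR_alpha[-h] <= 0 is a conservative surrogate for P(h < 0) <= alpha,
# since CVaR upper-bounds the corresponding quantile of the loss.
rng = np.random.default_rng(0)
margins = rng.normal(loc=2.0, scale=1.0, size=10_000)  # sampled margins
print(empirical_cvar(-margins, alpha=0.1) <= 0.0)      # True for this tail
```

The distributionally robust version in the paper additionally maximizes this quantity over an ambiguity set of distributions around the predicted one.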
|
|
13:30-15:00, Paper TuBT1-CC.9 | Add to My Program |
A GP-Based Robust Motion Planning Framework for Agile Autonomous Robot Navigation and Recovery in Unknown Environments |
|
Mohammad, Nicholas | University of Virginia |
Higgins, Jacob | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Planning under Uncertainty, Motion and Path Planning, Collision Avoidance
Abstract: For autonomous mobile robots, uncertainties in the environment and system model can lead to failure in the motion planning pipeline, resulting in potential collisions. In order to achieve a high level of robust autonomy, these robots should be able to proactively predict and recover from such failures. To this end, we propose a Gaussian Process (GP) based model for proactively detecting the risk of future motion planning failure. When this risk exceeds a certain threshold, a recovery behavior is triggered that leverages the same GP model to find a safe state from which the robot may continue towards the goal. The proposed approach is trained in simulation only and can generalize to real world environments on different robotic platforms. Simulations and physical experiments demonstrate that our framework is capable of both predicting planner failures and recovering the robot to states where planner success is likely, all while producing agile motion.
|
|
TuBT2-CC Oral Session, CC-311 |
Add to My Program |
Mechanism Design II |
|
|
Chair: Choi, Hyouk Ryeol | Sungkyunkwan University |
Co-Chair: Zhang, Hongying | National University of Singapore |
|
13:30-15:00, Paper TuBT2-CC.1 | Add to My Program |
ReC-Gripper: A Reconfigurable Combined Suction and Fingered Gripper for Various Logistics Picking and Stowing Tasks |
|
Um, Seunghwan | SungKyunKwan University |
Jeong, Heeyeon | Sungkyunkwan University |
Kim, ChunSoo | SKKU |
Rhee, Issac | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Mechanism Design, Grippers and Other End-Effectors, Grasping
Abstract: This article presents a reconfigurable gripper that combines a finger module and a suction module. Through reconfiguration, the proposed gripper can adopt a configuration suited to the different working environments of logistics order picking. The finger part of the gripper uses a parallelogram remote center of motion (RCM) mechanism to implement the reconfigurable features. With the RCM mechanism, the gripper provides a zeroed-offset function, which removes the gap between the finger and the suction gripper, as well as a supporting-finger function. The gripper shows higher grasping stability and practicality than existing grippers in order-picking tasks. First, the design of the mechanism and the model constituting the gripper are described. Afterward, a quantitative evaluation comparing the performance of this gripper with existing ones in bin and shelf environments is conducted, in which the gripper shows a 32.912% performance improvement in representative tasks. Finally, the practical aspects of this gripper are described through a quantitative evaluation.
|
|
13:30-15:00, Paper TuBT2-CC.2 | Add to My Program |
Development of the Assembling System for Structure Transformable Humanoid with Attach-Lock-Detachable Magnetic Coupling |
|
Makabe, Tasuku | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Mechanism Design, Humanoid Robot Systems, Assembly
Abstract: We propose a method that gives humanoids the body-structure-changing ability of modular robots by using Attach-Lock-Detachable Magnetic Couplings (ALDMag), which let robot body parts be detached and reattached with an arm-type robot, together with a system that manages the connection state of the modularized body elements. Both robots and humans can use the ALDMag to attach and detach mechanical and electrical connections without actuators. By writing the robot model description of each module in xacro, we construct a system that allows the robot to attach and detach modules during task operation. We demonstrated the effectiveness of the proposed method through assembly experiments of a small robot with a life-size arm and through experiments with environmental contacts performed by the small robot.
|
|
13:30-15:00, Paper TuBT2-CC.3 | Add to My Program |
Design of a Deployable Continuum Robot Using Elastic Kirigami-Origami |
|
Li, Yunong | Harbin Institute of Technology, Shenzhen |
Huang, Hailin | Harbin Institute of Technology, Shenzhen |
Li, Bing | Harbin Institute of Technology (Shenzhen) |
Keywords: Mechanism Design, Kinematics
Abstract: Inspired by Yoshimura origami, this study presents a novel deployable modular continuum robot that maintains its configuration by combining active cables with the passive elastic deformation of kirigami-origami. The synchronous motion of each module is improved by using slider-crank mechanisms. Using screw theory, the comprehensive kinematics of the proposed deployable kirigami-origami robot were analyzed, explaining how the elastic restoring force is generated in each module. A physical prototype was developed, and the performance of this origami-inspired continuum robot was evaluated by comparing the motion properties of the proposed robot with those of a robot without elastic rings and synchronism mechanisms. In addition, position accuracy, trajectory tracking, stiffness, and load capacity experiments were conducted. By integrating a pneumatic soft hand at the end of the proposed robot, an object-grasping experiment was conducted to verify its feasibility.
|
|
13:30-15:00, Paper TuBT2-CC.4 | Add to My Program |
Design and Control of a Transformable Multi-Mode Mobile Robot |
|
Li, Haoran | Guangdong University of Technology |
Bu, Yongzhong | Guangdong University of Technology |
Bu, Yongjian | Guangdong University of Technology |
Mao, Shixin | University of Science and Technology of China |
Guan, Yisheng | Guangdong University of Technology |
Zhu, Haifei | Guangdong University of Technology |
Keywords: Mechanism Design, Kinematics
Abstract: Conventional mobile robots typically offer a single locomotion mode and require additional arms to transport objects. To address the challenges of traversing diverse environments and transporting objects, a novel transformable multi-mode Mecanum-wheeled mobile robot is proposed in this paper. Owing to its unique foreleg design, the robot can operate in a quadrilateral four-wheel mode, a collinear four-wheel mode, or an upright two-wheel mode, and can smoothly switch between any two modes by re-arranging the foreleg wheels. When standing with its forelegs raised, the robot can carry objects and transport them to a predetermined destination. The design and operational modes of the robot are explored in detail. The kinematics and control of the different operational modes were analyzed and experimentally verified. The results indicate that the developed robot can perform versatile locomotion to accomplish object transportation in diverse environments by utilizing its foreleg-wheel mechanisms. Furthermore, because of the asymmetric arrangement of its front and rear Mecanum wheels, which differs from conventional symmetric arrangements, the robot experiences an additional angular velocity.
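For reference, the conventional symmetric Mecanum arrangement that this abstract contrasts against has a standard textbook inverse kinematics; a sketch with hypothetical names follows (sign conventions vary by source, and the paper's asymmetric robot adds an extra angular-velocity term not captured here):

```python
import numpy as np

def mecanum_ik(vx, vy, wz, r, lx, ly):
    """Wheel angular speeds [fl, fr, rl, rr] for a body twist (vx, vy, wz).

    Standard symmetric arrangement; r = wheel radius, lx/ly = half the
    wheelbase/track. One common sign convention is used below.
    """
    k = lx + ly
    J = np.array([[1, -1, -k],
                  [1,  1,  k],
                  [1,  1, -k],
                  [1, -1,  k]]) / r
    return J @ np.array([vx, vy, wz])

# Pure forward motion drives all four wheels equally.
print(mecanum_ik(1.0, 0.0, 0.0, 0.05, 0.2, 0.2))  # [20. 20. 20. 20.]
```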
|
|
13:30-15:00, Paper TuBT2-CC.5 | Add to My Program |
HyperLeg: Biomechanics-Inspired High-DOF Leg and Toe Mechanism for Highly Dynamic Motions |
|
Kim, Do-yun | KOREATECH University |
Yun, Seong-Ho | Koreatech |
Lee, Joong-Kyung | Korea University of Technology and Education (KOREATECH) |
Yoon, JongJun | Koreatech |
Nam, Dongyun | Korea University of Technology and Education (KOREATECH) |
Maeng, Chan-Young | Korea University of Technology and Education (KOREATECH) |
Kim, Yong-Jae | Korea University of Technology and Education |
Keywords: Mechanism Design, Legged Robots, Compliant Joints and Mechanisms
Abstract: The human foot, with its multiple-degree-of-freedom (DOF) toe joints and two-DOF ankle joint, provides multiple benefits, such as increased stride length and walking speed, impact mitigation, and enhanced balancing. However, creating such high-DOF mechanisms for legged robots has been challenging due to increased complexity, heavy weight, and vulnerability to impact. In this paper, a novel leg and toe mechanism inspired by human biomechanics, featuring a one-DOF knee joint, a two-DOF ankle joint, and a one-DOF toe joint, is developed. All actuators are located at the proximal part of the thigh frame to minimize the distal mass. High-payload timing belts and unique linkage mechanisms are used in the transmission to achieve high backdrivability and high joint stiffness. Inspired by human anatomy, the actuation torques are intentionally coupled, delivering the high propulsive force against the ground needed for dynamic movements such as jumping. The implemented leg and toe mechanisms weigh 8.16 kg, and the height from the ground to the hip center is 786 mm. The proposed mechanism has been proven effective through force tests and distance-jump experiments.
|
|
13:30-15:00, Paper TuBT2-CC.6 | Add to My Program |
Design of a Towing System by Multi Autonomous Sailboats |
|
Liang, Cheng | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Lin, Bairun | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Mechanism Design, Motion Control, Actuation and Joint Mechanisms
Abstract: For researchers or administrators who need to collect hydrological data over a body of water, using autonomous sailboats to tow floating detection equipment is an energy-saving and convenient way to deploy detectors. However, because the pulling force provided by a single autonomous sailboat is limited, this scheme is not suitable for floating equipment of large mass. This paper proposes a new approach in which multiple autonomous sailboats tow floating objects. A system of two autonomous sailboats arranged and connected in line is considered an appropriate solution for towing heavy floating objects because it can provide greater pulling force. The main part of the article introduces a new design for a multi-sailboat towing system that can tow floating objects with or against the wind. Repeated experiments were conducted at a test site equipped with a motion-capture system to find the best strategy for controlling the sails and rudder so as to increase the towing system's pulling force and tacking success rate. Three connection modes are proposed, compared, and tested; the best one is applied to the sailboat towing system and improves its performance.
|
|
13:30-15:00, Paper TuBT2-CC.7 | Add to My Program |
Non-Intrusive LiDAR Protection Module Emulating Bio-Inspired Wiping Motion for Outdoor Unmanned Vehicles |
|
Kim, Youngrae | Daegu Gyeongbuk Institute of Science and Technology (DGIST), Dae |
Lim, Seunghyun | DGIST |
Lee, Hanmin | Korea Institute of Machinery & Materials |
Kim, Seokchan | Korea Institute of Machinery & Materials |
Kim, Ji-Chul | Korea Institute of Machinery and Materials |
Yun, Dongwon | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Mechanism Design, Range Sensing, Autonomous Vehicle Navigation
Abstract: In this paper, we develop a protection module for Light Detection and Ranging (LiDAR) sensors used on outdoor unmanned vehicles. A bio-inspired wiping motion was found to be more efficient and to wipe better than conventional cleaning methods for LiDAR sensors. A water-wiping experiment confirmed that the finger wiping motion removed 35% more water than a translational wiping motion. The theoretical analysis predicting an optimal rotational speed at which wiping performance is maximized was also verified to be consistent with the experiment. The LiDAR distortion experiments demonstrated no data distortion, showing an average error of at most 0.40% when detecting obstacles even while the acrylic cover rotates. Finally, a contamination protection experiment was conducted with water, powder, soil, and mud. Although the number of point-cloud points changed and the intensity of the sensor data decreased after contamination, it was validated that the number of point-cloud points and the average intensity could be restored to at least 97% and 67%, respectively, after cleaning.
|
|
13:30-15:00, Paper TuBT2-CC.8 | Add to My Program |
Lightweight Human-Friendly Robotic Arm Based on Transparent Hydrostatic Transmissions |
|
Bolignari, Marco | University of Trento |
Rizzello, Gianluca | Saarland University |
Zaccarian, Luca | LAAS-CNRS and University of Trento |
Fontana, Marco | Scuola Superiore Sant'Anna |
Keywords: Mechanism Design, Tendon/Wire Mechanism, Force Control, Compliance and Impedance Control
Abstract: We present theoretical and experimental results regarding the development and the control of a two-link robotic arm with remotized actuation via rolling diaphragm hydrostatic transmissions. We propose a dynamical model capturing the essential dynamics of the developed transmission/robot ensemble and implement a control strategy consisting of two nested loops, the inner one performing high-bandwidth joint torque regulation and the outer one producing various types of compliance responses for effective human–robot interactions. Extensive sets of experiments, testing both the low-level torque controller and the high-level compliance controller, confirm the effectiveness of the proposed hardware-software remotization architecture.
|
|
13:30-15:00, Paper TuBT2-CC.9 | Add to My Program |
OSCaR: An Origami-Inspired Shape-Changing Robot for Ground Coverage Tasks |
|
Fan, Zirui | National University of Singapore |
Zhang, Hongying | National University of Singapore |
Keywords: Mechanism Design, Wheeled Robots
Abstract: This paper introduces OSCaR, a novel origami-inspired shape-changing robot. The objective is to enhance the adaptability of vehicles engaged in ground coverage tasks, such as floor cleaning. The robot has two distinct configurations: it can fold itself for agile navigation through tight spaces and unfold to cover larger areas efficiently. The folding pattern has a deploy-to-stow ratio of 3 in the width dimension, and a kinematic model is established to simulate the pattern's deployment process. The hinge design employs rolling contact elements to mitigate collisions among the panels, particularly in regions with multiple collinear crease lines. Furthermore, the design has one degree of freedom and features pivots, making it easy to actuate with motors. The system design of the prototype is also presented, including its structure, embedded hardware, and host-computer software. The results show that the robot has great adaptability in complex environments.
|
|
TuBT3-CC Oral Session, CC-313 |
Add to My Program |
Formal Methods in Robotics and Automation II |
|
|
Chair: O'Kane, Jason | Texas A&M University |
Co-Chair: Kan, Zhen | University of Science and Technology of China |
|
13:30-15:00, Paper TuBT3-CC.1 | Add to My Program |
Robust MITL Planning under Uncertain Navigation Times |
|
Linard, Alexis | KTH Royal Institute of Technology |
Gautier, Anna | KTH Royal Institute of Technology |
Duberg, Daniel | KTH - Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Planning under Uncertainty
Abstract: In environments like offices, the duration of a robot's navigation between two locations may vary over time. For instance, reaching the kitchen may take longer during lunchtime, since the corridors are crowded with people heading the same way. In this work, we address the problem of routing in such environments with tasks expressed in Metric Interval Temporal Logic (MITL) - a rich robot task specification language that allows us to capture explicit time requirements. Our objective is to find a strategy that maximizes the temporal robustness of the robot's MITL task. As the first step towards a solution, we define a mixed-integer linear programming (MILP) approach to solving the task planning problem over a Varying Weighted Transition System, where navigation durations are deterministic but vary depending on the time of day. We then apply this planner to optimize MITL temporal robustness in Markov Decision Processes, where the navigation durations between physical locations are uncertain, but the time-dependent distribution over possible delays is known. Finally, we develop a receding-horizon planner for Markov Decision Processes that preserves guarantees on MITL temporal robustness. We show the scalability of our planning algorithms in simulations of robotic tasks.
|
|
13:30-15:00, Paper TuBT3-CC.2 | Add to My Program |
Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning |
|
Zhang, Hao | University of Science and Technology of China |
Wang, Hao | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Reinforcement Learning, Motion and Path Planning
Abstract: Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, limiting their applicability in deep reinforcement learning. To address this issue, by incorporating Transformers into reinforcement learning, we develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: the LTL instruction is first encoded via a Transformer module for efficient understanding of task instructions during training, and the context variable is then encoded via a second Transformer for improved task performance. In particular, the LTL instruction is specified in co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts non-Markovian reward decision processes into Markovian ones, but also improves sampling efficiency through the simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
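LTL progression, which this abstract uses to decompose tasks into sub-goals, rewrites a formula against the current observation (in the style of Bacchus and Kabanza's rules). A minimal sketch for a small fragment, using a hypothetical tuple encoding of formulas, might look like:

```python
def prog(phi, obs):
    """Progress formula `phi` one step, given the set `obs` of true
    propositions. Booleans denote satisfied/violated formulas."""
    if isinstance(phi, bool):
        return phi
    op = phi[0]
    if op == "prop":                      # atomic proposition
        return phi[1] in obs
    if op == "next":                      # X f  ->  f
        return phi[1]
    if op == "and":
        a, b = prog(phi[1], obs), prog(phi[2], obs)
        if a is False or b is False:
            return False
        if a is True:
            return b
        return a if b is True else ("and", a, b)
    if op == "or":
        a, b = prog(phi[1], obs), prog(phi[2], obs)
        if a is True or b is True:
            return True
        if a is False:
            return b
        return a if b is False else ("or", a, b)
    if op == "until":                     # f U g -> prog(g) | (prog(f) & f U g)
        g = prog(phi[2], obs)
        if g is True:
            return True
        f = prog(phi[1], obs)
        if f is False:
            return g
        rest = phi if f is True else ("and", f, phi)
        return rest if g is False else ("or", g, rest)
    if op == "eventually":                # F f -> prog(f) | F f
        f = prog(phi[1], obs)
        return True if f is True else (phi if f is False else ("or", f, phi))
    raise ValueError(f"unknown operator {op!r}")

# F(a & X b): after observing {a}, the task reduces to "b now, or retry".
task = ("eventually", ("and", ("prop", "a"), ("next", ("prop", "b"))))
step1 = prog(task, {"a"})
print(prog(step1, {"b"}))  # True: observing b next completes the task
```

Each progressed formula is a residual sub-goal, which is how non-Markovian LTL rewards can be turned into Markovian ones over (state, formula) pairs.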
|
|
13:30-15:00, Paper TuBT3-CC.3 | Add to My Program |
Stochastic Games for Interactive Manipulation Domains |
|
Muvvala, Karan | University of Colorado Boulder |
Wells, Andrew | Rice University |
Lahijanian, Morteza | University of Colorado Boulder |
Kavraki, Lydia | Rice University |
Vardi, Moshe | Rice University |
Keywords: Formal Methods in Robotics and Automation, Task and Motion Planning
Abstract: As robots become more prevalent, the complexity of robot-robot, robot-human, and robot-environment interactions increases. In these interactions, a robot needs to consider not only the effects of its own actions, but also the effects of other agents' actions and the possible interactions between agents. Previous works have considered reactive synthesis, where the human/environment is modeled as a deterministic, adversarial agent; as well as probabilistic synthesis, where the human/environment is modeled via a Markov chain. While they provide strong theoretical frameworks, there are still many aspects of human-robot interaction that cannot be fully expressed and many assumptions that must be made in each model. In this work, we propose stochastic games as a general model for human-robot interaction, which subsumes the expressivity of all previous representations. In addition, it allows us to make fewer modeling assumptions and leads to more natural and powerful models of interaction. We introduce the semantics of this abstraction and show how existing tools can be utilized to synthesize strategies to achieve complex tasks with guarantees. Further, we discuss the current computational limitations and improve the scalability by two orders of magnitude by a new way of constructing models for PRISM-Games.
|
|
13:30-15:00, Paper TuBT3-CC.4 | Add to My Program |
Active Inference for Reactive Temporal Logic Motion Planning |
|
Chen, Ziyang | University of Science and Technology of China |
Zhou, Zhangli | University of Science and Technology of China |
Li, Lin | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Task and Motion Planning
Abstract: Reactive planning enables robots to deal with dynamic events in uncertain environments. However, existing methods rely heavily on predefined, hard-coded robot behaviors, e.g., a pre-coded temporal logic formula that specifies how the robot should react. Little attention has been paid to the autonomous generation of reactive task specifications at runtime. As a first attempt towards this goal, this work develops a real-time decision-making and motion planning framework. It allows the robot to follow a global task planned offline while making proactive decisions and generating temporal logic specifications for local reactive tasks when dynamic events are encountered. Specifically, inspired by causal knowledge graphs, a proposition graph is developed, based on which the decision module encodes the environment as Boolean logic and the task as linear temporal logic (LTL). Based on the established proposition graph and the perceived environment, the agent can autonomously generate an LTL formula realizing the local temporary task. A joint sampling algorithm is then developed, in which the automaton states of the local and global tasks are jointly considered to generate a feasible plan that satisfies both. Experiments demonstrate the effectiveness of the proposed decision-making and motion planning framework.
|
|
13:30-15:00, Paper TuBT3-CC.5 | Add to My Program |
Fast Task Allocation of Heterogeneous Robots with Temporal Logic and Inter-Task Constraints |
|
Li, Lin | University of Science and Technology of China |
Chen, Ziyang | University of Science and Technology of China |
Wang, Hao | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Task Planning, Multi-Robot Systems
Abstract: This work develops a fast task allocation framework for heterogeneous multi-robot systems subject to both temporal logic and inter-task constraints. The considered inter-task constraints include unrelated tasks, compatible tasks, and exclusive tasks. To specify such inter-task relationships, we extend conventional atomic propositions to batch atomic propositions, which give rise to the LTLT formula. The Task Batch Planning Decision Tree (TB-PDT) is then developed, a variant of the conventional decision tree specialized for temporal logic and inter-task constraints. The TB-PDT is built incrementally to represent task progress and does not require a sophisticated product automaton, which significantly reduces the search space. Based on the TB-PDT, a search algorithm, namely Intensive Inter-task Relationship Tree Search (IIRTS), is developed for fast task allocation in heterogeneous multi-robot systems. It is shown that the time to find a satisfactory task allocation grows almost quadratically with the number of automaton states. Extensive simulations and experiments demonstrate the validity, effectiveness, and transferability of IIRTS.
|
|
13:30-15:00, Paper TuBT3-CC.6 | Add to My Program |
Skill Transfer for Temporal Task Specification |
|
Liu, Jason Xinyu | Brown University |
Shah, Ankit | Brown University |
Rosen, Eric | Brown University |
Jia, Mingxi | Brown University |
Konidaris, George | Brown University |
Tellex, Stefanie | Brown |
Keywords: Formal Methods in Robotics and Automation, Transfer Learning, Integrated Planning and Learning
Abstract: Deploying robots in real-world environments, such as households and manufacturing lines, requires generalization across novel task specifications without violating safety constraints. Linear temporal logic (LTL) is a widely used task specification language with a compositional grammar that naturally induces commonalities among tasks while preserving safety guarantees. However, most prior work on reinforcement learning with LTL specifications treats every new task independently, thus requiring large amounts of training data to generalize. We propose LTL-Transfer, a zero-shot transfer algorithm that composes task-agnostic skills learned during training to safely satisfy a wide variety of novel LTL task specifications. Experiments in Minecraft-inspired domains show that after training on only 50 tasks, LTL-Transfer can solve over 90% of 100 challenging unseen tasks and 100% of 300 commonly used novel tasks without violating any safety constraints. We deployed LTL-Transfer at the task-planning level of a quadruped mobile manipulator to demonstrate its zero-shot transfer ability for fetch-and-deliver and navigation tasks.
|
|
13:30-15:00, Paper TuBT3-CC.7 | Add to My Program |
High Precision Paint Deposition Modeling Considering Variable Posture of Spray Painting Robot |
|
Tanaka, Genichiro | Waseda University |
Takahashi, Yoshinobu | Waseda University |
Iwata, Hiroyasu | Waseda University |
Keywords: Foundations of Automation, Computational Geometry, Factory Automation
Abstract: This study developed a high-precision paint deposition model that considers the position and orientation of a spray-painting gun. Our angle-specific paint deposition model focused on the change in paint deposition caused by changes in the painting angle; however, it had limited versatility. We analyzed this problem and solved it by separately modeling the changes in the film thickness distribution due to impact angle and spray distance, which had previously been modeled together. For higher accuracy, a special function was proposed to convert the three-dimensional direction vector into two-dimensional coordinates on the upper plane of the distribution function. To confirm the validity of our model, a painting test on an L-shaped surface was conducted, and the measured and predicted values were compared. The L-shaped surface is a typical shape in which the film thickness distribution changes with angle; a complex path with varying distances and angles was employed. The predicted values agreed well with the measured values in the L-shaped-surface painting test, validating the developed model.
|
|
13:30-15:00, Paper TuBT3-CC.8 | Add to My Program |
Verifiable Learned Behaviors Via Motion Primitive Composition: Applications to Scooping of Granular Media |
|
Benton, Andrew | Siemens |
Solowjow, Eugen | Siemens Corporation |
Akella, Prithvi | California Institute of Technology |
Keywords: Hybrid Logical/Dynamical Planning and Verification, Learning from Demonstration, Performance Evaluation and Benchmarking
Abstract: A robotic behavior model that can reliably generate behaviors from natural language inputs in real time would substantially expedite the adoption of industrial robots due to enhanced system flexibility. To facilitate these efforts, we construct a framework in which learned behaviors, created by a natural language abstractor, are verifiable by construction. Leveraging recent advancements in motion primitives and probabilistic verification, we construct a natural-language behavior abstractor that generates behaviors by synthesizing a directed graph over the provided motion primitives. If these component motion primitives are constructed according to the criteria we specify, the resulting behaviors are probabilistically verifiable. We demonstrate this verifiable behavior generation capacity in both simulation on an exploration task and on hardware with a robot scooping granular media.
|
|
13:30-15:00, Paper TuBT3-CC.9 | Add to My Program |
Knowledge Acquisition Plans: Generation, Combination, and Execution |
|
Shell, Dylan | Texas A&M University |
O'Kane, Jason | Texas A&M University |
Keywords: Reactive and Sensor-Based Planning, Formal Methods in Robotics and Automation, Planning under Uncertainty
Abstract: This paper contemplates the possibility of asking robots questions and having them use their ability to go out into the environment and probe it, in combination with what they already know of the world, to provide answers. We describe a method whereby a robot system efficiently answers such questions by reasoning about observations as they are made, the interrelationships between multiple pieces of evidence, and what they imply. A central idea in the approach is to maintain a separation of concerns, so that managing 'what is known' is decoupled from 'how it is learned'. This idea is realized in a graph-based representation well suited to algorithmic manipulation and composition, exposing synergies ripe for optimization. We show how to use this representation to leverage both the informational overlap between multiple simultaneous queries and the availability of multiple robots working in concert to answer those queries. We demonstrate these ideas in a simple case study and present data illustrating how plan quality (in terms of execution cost) can be improved through a robot-agnostic optimization operation.
|
|
TuBT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems II |
|
|
Co-Chair: Sabattini, Lorenzo | University of Modena and Reggio Emilia |
|
13:30-15:00, Paper TuBT4-CC.1 | Add to My Program |
Distributed Control of a Limited Angular Field-Of-View Multi-Robot System in Communication-Denied Scenarios: A Probabilistic Approach |
|
Catellani, Mattia | University of Modena and Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Multi-Robot Systems, Distributed Robot Systems
Abstract: Multi-robot systems are gaining popularity over single-agent systems for their advantages. Although they have been studied in agriculture, search and rescue, surveillance, and environmental exploration, real-world implementation is limited due to agent coordination complexities caused by communication and sensor limitations. In this work, we propose a probabilistic approach to allow coordination among robots in communication-denied scenarios, where agents can only rely on visual information from a camera with a limited angular field-of-view. Our solution utilizes a particle filter to analyze uncertainty in the location of neighbors, together with Control Barrier Functions to address the exploration-exploitation dilemma that arises when robots must balance the mission goal with seeking information on undetected neighbors. This technique was tested with virtual robots required to complete a coverage mission, analyzing how the number of deployed robots affects performance and comparing with the ideal case of isotropic sensors and communication. Despite an increase in the time required to fulfill the task, the results are comparable to the ideal scenario in terms of the final configuration achieved by the system.
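The neighbor-localization part of the abstract can be illustrated with a minimal range-only particle filter. This is a toy sketch, not the authors' implementation: the 2D setup, noise levels, and all names are hypothetical, and the Control Barrier Function layer is omitted.

```python
import math
import random

random.seed(0)

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood used to weight particles."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, measured_range, observer, sigma=0.3, motion_noise=0.1):
    """One predict-weight-resample cycle for a neighbour's 2D position."""
    # Predict: the neighbour's motion is unknown, so diffuse the particles.
    moved = [(x + random.gauss(0, motion_noise), y + random.gauss(0, motion_noise))
             for x, y in particles]
    # Weight: compare each particle's predicted range to the measured range.
    weights = []
    for x, y in moved:
        r = math.hypot(x - observer[0], y - observer[1])
        weights.append(gauss_pdf(r, measured_range, sigma))
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample proportionally to weight.
    return random.choices(moved, weights=weights, k=len(moved))

# Neighbour truly at range 5 from an observer at the origin.
particles = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(500)]
for _ in range(30):
    true_range = 5.0 + random.gauss(0, 0.1)
    particles = particle_filter_step(particles, true_range, observer=(0.0, 0.0))
mean_r = sum(math.hypot(x, y) for x, y in particles) / len(particles)
```

With range-only measurements the particles concentrate on a ring of radius 5, which is exactly the kind of residual uncertainty a downstream exploration objective would have to resolve.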
|
|
13:30-15:00, Paper TuBT4-CC.2 | Add to My Program |
Assessing Reputation to Improve Team Performance in Heterogeneous Multi-Robot Coverage |
|
Coffey, Mela | Boston University |
Pierson, Alyssa | Boston University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: When agents in a multi-robot team have limited knowledge about their relative performance, their teammates, or the environment, robots must observe individual performance variations and adapt accordingly. We propose robot reputation to assess the historical performance of agents and make future adaptations in a persistent coverage task. We consider a heterogeneous multi-robot team, where robots are equipped with different capabilities to serve discrete events in an environment. We utilize a heterogeneous coverage control approach to partition the space according to robot capabilities and the estimated probability density, such that the robot is responsible for serving the events in its assigned region. As the team serves events, we assign each robot a reputation, which is then used to adjust the size of a robot's region, thus adjusting the amount of space a robot is responsible for serving. Our simulations show that using reputation to weigh the size of the Voronoi cells outperforms the case where we neglect reputation.
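The region-resizing idea can be sketched with a multiplicatively weighted Voronoi partition on a grid, where a robot's reputation scales down its effective distance. This is an illustrative stand-in; the paper's actual weighting of the Voronoi cells may differ, and all names are hypothetical.

```python
import math

def reputation_partition(robots, reputations, grid=40):
    """Assign each grid point to the robot minimising distance / reputation,
    so higher-reputation robots claim larger regions (a multiplicatively
    weighted Voronoi partition over the unit square)."""
    cells = {i: [] for i in range(len(robots))}
    for gx in range(grid):
        for gy in range(grid):
            p = (gx / grid, gy / grid)
            best = min(range(len(robots)),
                       key=lambda i: math.dist(p, robots[i]) / reputations[i])
            cells[best].append(p)
    return cells

# Robot 0 has twice the reputation of robot 1 and ends up serving more space.
robots = [(0.25, 0.5), (0.75, 0.5)]
cells = reputation_partition(robots, reputations=[2.0, 1.0])
```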
|
|
13:30-15:00, Paper TuBT4-CC.3 | Add to My Program |
A Robot Web for Distributed Many-Device Localisation |
|
Murai, Riku | Imperial College London |
Ortiz, Joseph | Meta |
Saeedi, Sajad | Toronto Metropolitan University |
Kelly, Paul H J | Imperial College London |
Davison, Andrew J | Imperial College London |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Localization, Distributed Optimization
Abstract: We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localise via efficient ad-hoc peer-to-peer communication. Our Robot Web solution is based on Gaussian Belief Propagation on the fundamental non-linear factor graph describing the probabilistic structure of all of the observations robots make internally or of each other, and is flexible for any type of robot, motion or sensor. We define a simple and efficient communication protocol which can be implemented by the publishing and reading of web pages or other asynchronous communication technologies. We show in simulations with up to 1000 robots interacting in arbitrary patterns that our solution converges to global accuracy comparable to a centralised non-linear factor graph solver while operating with high distributed efficiency of computation and communication. Via the use of robust factors in GBP, our method is tolerant to a high percentage of faulty sensor measurements or dropped communication packets. Furthermore, we showcase that the system operates on real robots with limited onboard computational resources.
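The distributed-estimation idea can be shown on a 1D toy problem. The sketch below uses Gauss-Seidel relaxation rather than the paper's Gaussian Belief Propagation; on a linear-Gaussian problem both iterate toward the same solution, and all names and values here are invented for illustration.

```python
def distributed_localise(priors, edges, iters=100):
    """Gauss-Seidel relaxation on the 1D localisation normal equations.
    Each robot repeatedly moves to the precision-weighted mean implied by its
    prior and its neighbours' current estimates -- a simple stand-in for a
    GBP message schedule (both solve the same linear system here).
    priors: {robot: (mean, precision)}; edges: [(i, j, measured_offset, precision)]."""
    x = {r: m for r, (m, _) in priors.items()}
    for _ in range(iters):
        for r in x:
            num = priors[r][1] * priors[r][0]
            den = priors[r][1]
            for i, j, z, lam in edges:
                if i == r:        # z = x_j - x_i  =>  x_i should be x_j - z
                    num += lam * (x[j] - z); den += lam
                elif j == r:      # x_j should be x_i + z
                    num += lam * (x[i] + z); den += lam
            x[r] = num / den
    return x

# Three robots in a line; only robot 0 has an informative prior.
priors = {0: (0.0, 100.0), 1: (0.0, 1e-6), 2: (0.0, 1e-6)}
edges = [(0, 1, 1.0, 10.0), (1, 2, 1.0, 10.0)]
x = distributed_localise(priors, edges)
```

The uninformative robots are pulled to their globally consistent positions (about 1.0 and 2.0) purely through pairwise measurements, which is the effect the paper scales up to 1000 robots.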
|
|
13:30-15:00, Paper TuBT4-CC.4 | Add to My Program |
Learning Decentralized Flocking Controllers with Spatio-Temporal Graph Neural Network |
|
Chen, Siji | Virginia Tech |
Sun, Yanshen | Virginia Tech |
Li, Peihan | Drexel University |
Zhou, Lifeng | Drexel University |
Lu, Chang-Tien | Virginia Tech |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Swarm Robotics
Abstract: Recently, a line of research has delved into the use of graph neural networks (GNNs) for decentralized control in swarm robotics. However, it has been observed that relying solely on the states of immediate neighbors is insufficient to imitate a centralized control policy. To address this limitation, prior studies proposed incorporating L-hop delayed states into the computation. While this approach shows promise, it can lead to a lack of consensus among distant flock members and the formation of small clusters, consequently resulting in the failure of cohesive flocking behaviors. Instead, our approach leverages a spatiotemporal GNN, named STGNN, that encompasses both spatial and temporal expansions. The spatial expansion collects delayed states from distant neighbors, while the temporal expansion incorporates previous states from immediate neighbors. The broader and more comprehensive information gathered from both expansions results in more effective and accurate predictions. We develop an expert algorithm for controlling a swarm of robots and employ imitation learning to train our decentralized STGNN model based on the expert algorithm. We simulate the proposed STGNN approach in various settings, demonstrating its decentralized capacity to emulate the global expert algorithm. Further, we implemented our approach to achieve cohesive flocking, leader following and obstacle avoidance by a group of Crazyflie drones. The performance of STGNN underscores its potential as an effective and reliable approach for achieving cohesive flocking, leader following and obstacle avoidance tasks.
|
|
13:30-15:00, Paper TuBT4-CC.5 | Add to My Program |
Simultaneous Time Synchronization and Mutual Localization for Multi-Robot System |
|
Wen, Xiangyong | Zhejiang University |
Wang, Yingjian | Zhejiang University |
Zheng, Xi | The Hong Kong Polytechnic University |
Wang, Kaiwei | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Multi-Robot Systems, Localization, Swarm Robotics
Abstract: Mutual localization stands as a foundational component within various domains of multi-robot systems. Nevertheless, in relative pose estimation, time synchronization is usually underappreciated and rarely addressed, although it significantly influences the accuracy of estimation. In this paper, we introduce time synchronization into mutual localization, to recover the time offset and relative poses between robots simultaneously. Under a constant velocity assumption in a short time, we fuse time offset estimation with our previous bearing-based mutual localization by a novel error representation. Based on the error model, we formulate a joint optimization problem and utilize semi-definite relaxation (SDR) to furnish a lossless relaxation. By solving the relaxed problem, time synchronization and relative pose estimation can be achieved when time drift between robots is limited. To enhance the application range of time offset estimation, we further propose an iterative method to recover the time offset from coarse to fine. Comparisons between the proposed method and existing ones through extensive simulation tests demonstrate the clear benefits of time synchronization for mutual localization. Moreover, real-world experiments are conducted to show its practicality and robustness.
|
|
13:30-15:00, Paper TuBT4-CC.6 | Add to My Program |
Enabling Large-Scale Heterogeneous Collaboration with Opportunistic Communications |
|
Cladera, Fernando | University of Pennsylvania |
Ravichandran, Zachary | University of Pennsylvania |
Miller, Ian | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Taylor, Camillo Jose | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Multi-Robot Systems, Field Robots, Networked Robots
Abstract: Multi-robot collaboration in large-scale environments with limited-sized teams and without external infrastructure is challenging, since the software framework required to support complex tasks must be robust to unreliable and intermittent communication links. In this work, we present MOCHA (Multi-robot Opportunistic Communication for Heterogeneous Collaboration), a framework for resilient multi-robot collaboration that enables large-scale exploration in the absence of continuous communications. MOCHA is based on a gossip communication protocol that allows robots to interact opportunistically whenever communication links are available, propagating information on a peer-to-peer basis. We demonstrate the performance of MOCHA through real-world experiments with commercial-off-the-shelf (COTS) communication hardware. We further explore the system's scalability in simulation, evaluating the performance of our approach as the number of robots increases and communication ranges vary. Finally, we demonstrate how MOCHA can be tightly integrated with the planning stack of autonomous robots. We show a communication-aware planning algorithm for a high-altitude aerial robot executing a collaborative task while maximizing the amount of information shared with ground robots.
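In its simplest form, a gossip exchange reduces to merging per-topic records by timestamp whenever two robots meet. The sketch below is a toy illustration of that idea only; the record format and all names are invented, not MOCHA's actual protocol.

```python
def gossip_exchange(db_a, db_b):
    """Opportunistic peer-to-peer sync: when two robots meet, each keeps,
    per topic, whichever record carries the newer timestamp."""
    merged = dict(db_a)
    for topic, (ts, msg) in db_b.items():
        if topic not in merged or ts > merged[topic][0]:
            merged[topic] = (ts, msg)
    return merged, dict(merged)   # both robots leave with the same view

# A UAV and a UGV meet after exploring separately.
uav = {"map/sector3": (12.0, "explored"), "target": (5.0, "none")}
ugv = {"map/sector3": (8.0, "unknown"), "target": (9.0, "spotted at gate")}
uav, ugv = gossip_exchange(uav, ugv)
```

Because every encounter only ever replaces older records with newer ones, information propagates transitively through the team without any robot needing continuous connectivity.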
|
|
13:30-15:00, Paper TuBT4-CC.7 | Add to My Program |
AG-CVG: Coverage Planning with a Mobile Recharging UGV and an Energy-Constrained UAV |
|
Karapetyan, Nare | Woods Hole Oceanographic Institution |
Asghar, Ahmad Bilal | University of Maryland |
Bhaskar, Amisha | University of Maryland, College Park |
Shi, Guangyao | University of Southern California |
Manocha, Dinesh | University of Maryland |
Tokekar, Pratap | University of Maryland |
Keywords: Multi-Robot Systems, Field Robots, Task and Motion Planning
Abstract: In this paper, we present an approach for coverage path planning for a team of an energy-constrained Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV). Both the UAV and the UGV have predefined areas that they have to cover. The goal is to perform complete coverage by both robots while minimizing the coverage time. The UGV can also serve as a mobile recharging station. The UAV and UGV need to occasionally rendezvous for recharging. We propose a heuristic method to address this NP-Hard planning problem. Our approach involves initially determining coverage paths without factoring in energy constraints. Subsequently, we cluster segments of these paths and employ graph matching to assign UAV clusters to UGV clusters for efficient recharging management. We perform numerical analysis on real-world coverage applications and show that compared with a greedy approach our method reduces rendezvous overhead on average by 11.33%. We demonstrate proof-of-concept with a team of a VOXL m500 drone and a Clearpath Jackal ground vehicle, providing a complete system from the offline algorithm to the field execution.
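The cluster-assignment step can be illustrated with an exhaustive minimum-cost matching, which coincides with graph matching for small instances. This is a toy stand-in for the paper's method; the coordinates, cost function, and names are hypothetical.

```python
import itertools
import math

def match_clusters(uav_clusters, ugv_clusters):
    """Exhaustive minimum-cost matching of UAV path clusters to UGV recharge
    clusters (fine for small teams; larger instances would use a proper
    graph-matching algorithm)."""
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(ugv_clusters))):
        cost = sum(math.dist(uav_clusters[i], ugv_clusters[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best, best_cost

# Cluster centroids: pairing each UAV cluster with the nearby UGV cluster
# minimises total rendezvous travel.
uav = [(0.0, 0.0), (5.0, 5.0)]
ugv = [(4.0, 5.0), (1.0, 0.0)]
pairs, cost = match_clusters(uav, ugv)
```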
|
|
13:30-15:00, Paper TuBT4-CC.8 | Add to My Program |
A Non-Cubic Space-Filling Modular Robot |
|
Hummer, Tyler | Northwestern University |
Kriegman, Sam | Northwestern University |
Keywords: Product Design, Development and Prototyping, Cellular and Modular Robots
Abstract: Space-filling building blocks of diverse shape permeate nature at all levels of organization, from atoms to honeycombs, and have proven useful in artificial systems, from molecular containers to clay bricks. But, despite the wide variety of space-filling polyhedra known to mathematics, only the cube has been explored in robotics. Thus, here we roboticize a non-cubic space-filling shape: the rhombic dodecahedron. This geometry offers an appealing alternative to cubes as it greatly simplifies rotational motion of one cell about the edge of another, and increases the number of neighbors each cell can communicate with and hold on to. To better understand the challenges and opportunities of these and other space-filling machines, we manufactured 48 rhombic dodecahedral cells and used them to build various superstructures. We report locomotive ability of some of the structures we built, and discuss the dis/advantages of the different designs we tested. We also introduce a strategy for genderless passive docking of cells that generalizes to any polyhedra with radially symmetrical faces. Future work will allow the cells to freely roll/rotate about one another so that they may realize the full potential of their unique shape.
|
|
13:30-15:00, Paper TuBT4-CC.9 | Add to My Program |
Optimal Containment Control of Multiple Quadrotors Via Reinforcement Learning |
|
Cheng, Ming | Beihang University |
Liu, Hao | Beihang University |
Liu, Deyuan | Beihang University |
Gu, Haibo | Beihang University |
Wang, Xiangke | National University of Defense Technology |
Keywords: Multi-Robot Systems, Networked Robots, Reinforcement Learning
Abstract: This paper explores the optimal containment control problem for nonlinear and underactuated quadrotors with multiple team leaders governed by nonlinear dynamics, employing the reinforcement learning. A cascade controller is formulated, comprising a position control component to ensure containment achievement and an attitude control component to govern rotational channel. The proposed optimal control protocols derived from historical data collected from quadrotor systems without requirement for exact knowledge of vehicle dynamics. The simulation illustrates the effectiveness of the proposed controller in managing a quadrotor team with multiple leaders.
|
|
TuBT5-CC Oral Session, CC-411 |
Add to My Program |
Vision Systems |
|
|
Chair: Oishi, Takeshi | The University of Tokyo |
Co-Chair: Ciarfuglia, Thomas Alessandro | Sapienza University of Rome |
|
13:30-15:00, Paper TuBT5-CC.1 | Add to My Program |
Ensemble Latent Space Roadmap for Improved Robustness in Visual Action Planning |
|
Lippi, Martina | University of Roma Tre |
Welle, Michael C. | KTH Royal Institute of Technology |
Gasparri, Andrea | Università Degli Studi Roma Tre |
Kragic, Danica | KTH |
Keywords: Visual Learning, Task Planning, AI-Based Methods
Abstract: Planning in learned latent spaces helps to decrease the dimensionality of raw observations. In this work, we propose to leverage the ensemble paradigm to enhance the robustness of latent planning systems. We rely on our Latent Space Roadmap (LSR) framework, which builds a graph in a learned structured latent space to perform planning. Given multiple LSR framework instances, that differ either on their latent spaces or on the parameters for constructing the graph, we use the action information as well as the embedded nodes of the produced plans to define similarity measures. These are then utilized to select the most promising plans. We validate the performance of our Ensemble LSR (ENS-LSR) on simulated box stacking and grape harvesting tasks as well as on a real-world robotic T-shirt folding experiment.
|
|
13:30-15:00, Paper TuBT5-CC.2 | Add to My Program |
Direct 3D Model-Based Object Tracking with Event Camera by Motion Interpolation |
|
Kang, Yufan | The University of Tokyo |
Caron, Guillaume | CNRS |
Ishikawa, Ryoichi | The University of Tokyo |
Escande, Adrien | INRIA |
Chappellet, Kevin | CNRS |
Sagawa, Ryusuke | National Institute of Advanced Industrial Science and Technology |
Oishi, Takeshi | The University of Tokyo |
Keywords: Visual Tracking
Abstract: Event cameras are recent sensors that measure intensity changes in each pixel asynchronously. They are increasingly used for their lower latency and higher temporal resolution compared to traditional frame-based cameras. We propose a method for 3D model-based object tracking directly from events captured by an event camera. To enable reliable and accurate tracking of objects, we use a new event representation and predict brightness-increment images with motion interpolation. Object tracking results show the new method significantly improves tracking duration and robustness, for both perspective and fisheye cameras. Our implementation succeeds in tracking objects at camera speeds of up to 2 m/s.
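In its simplest form, a brightness-increment image is a signed accumulation of event polarities over a time window. The sketch below shows only this basic representation, not the paper's motion interpolation; the event format, contrast value, and names are hypothetical.

```python
def brightness_increment(events, width, height, t0, t1, contrast=0.2):
    """Accumulate signed event polarities into a brightness-increment image
    over the window [t0, t1). Each event is (timestamp, x, y, polarity)
    with polarity +1 or -1."""
    img = [[0.0] * width for _ in range(height)]
    for t, x, y, polarity in events:
        if t0 <= t < t1:
            img[y][x] += contrast * polarity
    return img

# Two positive events at (2, 1) and one negative at (0, 0) fall in the window;
# the last event arrives after the window closes and is ignored.
events = [(0.001, 2, 1, +1), (0.002, 2, 1, +1), (0.003, 0, 0, -1), (0.02, 2, 1, +1)]
img = brightness_increment(events, width=4, height=3, t0=0.0, t1=0.01)
```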
|
|
13:30-15:00, Paper TuBT5-CC.3 | Add to My Program |
Using Specularities to Boost Non-Rigid Structure-From-Motion |
|
Sengupta, Agniva | Institut Pascal |
Karim, Makki | UCA |
Bartoli, Adrien | UCA |
Keywords: Visual Tracking, Computer Vision for Medical Robotics
Abstract: Non-rigid structure-from-motion reconstructs the time-varying 3D shape of a deforming object from 2D point correspondences in monocular images. Despite promising use-cases such as the grasping of deformable objects and visual navigation in a non-rigid environment, NRSfM has had limited applications in robotics due to a lack of sufficient accuracy. To remedy this, we propose a new method which boosts the accuracy of NRSfM using sparse surface normals. Surface normal information is available from many sources, including structured lighting, homography decomposition of infinitesimal planes and shape priors. However, these sources are not always available. We thus propose a widely available new source of surface normals: the specularities. Our first technical contribution is a method which detects specular highlights and reconstructs surface normals from them. It assumes that the light source is approximately localised, which is widely applicable in robotics applications such as endoscopy. Our second technical contribution is an NRSfM method which exploits a sparse surface normal set. For that, we propose a novel convex formulation and a globally optimal solution method. Experiments on photo-realistic synthetic data and real household and medical data show that the proposed method outperforms existing NRSfM methods.
|
|
13:30-15:00, Paper TuBT5-CC.4 | Add to My Program |
Tracking Snake-Like Robots in the Wild Using Only a Single Camera |
|
Lu, Jingpei | University of California San Diego |
Richter, Florian | University of California, San Diego |
Lin, Shan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Visual Tracking, Localization, Field Robots
Abstract: Robot navigation within complex environments requires precise state estimation and localization to ensure robust and safe operations. For ambulating mobile robots like robot snakes, traditional methods for sensing require multiple embedded sensors or markers, leading to increased complexity, cost, and increased points of failure. Alternatively, deploying an external camera in the environment is very easy to do, and marker-less state estimation of the robot from this camera's images is an ideal solution: both simple and cost-effective. However, the challenge in this process is in tracking the robot in larger environments where the cameras may be moved around without extrinsic calibration, or may even be in motion (e.g., a drone following the robot). The scenario itself presents a complex challenge: single-image reconstruction of robot poses under noisy observations. In this paper, we address the problem of tracking ambulatory mobile robots from a single camera. The method combines differentiable rendering with the Kalman filter. This synergy allows for simultaneous estimation of the robot's joint angle and pose while also providing state uncertainty which could be used later on for robust control. We demonstrate the efficacy of our approach on a snake-like robot in both stationary and non-stationary (moving) cameras, validating its performance in both structured and unstructured scenarios. The results achieved show an average error of 0.05 m in localizing the robot's base position and 6 degrees in joint state estimation. We believe this novel technique opens up possibilities for enhanced robot mobility and navigation in future exploratory and search-and-rescue missions.
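The Kalman-filter half of the method can be sketched for a single joint angle. In the paper, the measurement would come from the differentiable-rendering step; here it is just a noisy number, and the scalar model, noise values, and names are all hypothetical.

```python
def kalman_step(x, p, z, q=0.01, r=0.05):
    """One predict/update cycle of a scalar Kalman filter on a joint angle.
    The measurement z corrects the predicted state, and p tracks the state
    uncertainty that downstream robust control could consume."""
    # Predict (static joint model; a velocity model would extend the state).
    p = p + q
    # Update with measurement z.
    k = p / (p + r)                  # Kalman gain
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

x, p = 0.0, 1.0                      # very uncertain initial joint angle
for z in [0.52, 0.48, 0.50, 0.51]:   # noisy per-frame estimates (radians)
    x, p = kalman_step(x, p, z)
```

After only a few frames the estimate settles near the true angle while the variance p shrinks, which is exactly the uncertainty signal the abstract mentions for robust control.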
|
|
13:30-15:00, Paper TuBT5-CC.5 | Add to My Program |
Multi-Object Tracking by Hierarchical Visual Representations |
|
Cao, Jinkun | Carnegie Mellon University |
Pang, Jiangmiao | Shanghai AI Laboratory |
Kitani, Kris | Carnegie Mellon University |
Keywords: Visual Tracking, Recognition, Deep Learning for Visual Perception
Abstract: We propose a new visual hierarchical representation paradigm for multi-object tracking. It is more effective to discriminate between objects by attending to objects' compositional visual regions and contrasting with the background contextual information instead of relying only on semantic visual cues such as bounding boxes. This compositional-semantic-contextual hierarchy can be flexibly integrated into different appearance-based multi-object tracking methods. We also propose an attention-based visual feature module to fuse the hierarchical visual representations. The proposed method achieves state-of-the-art accuracy and time efficiency among query-based methods on multiple multi-object tracking benchmarks.
|
|
13:30-15:00, Paper TuBT5-CC.6 | Add to My Program |
AgriSORT: A Simple Online Real-Time Tracking-By-Detection Framework for Robotics in Precision Agriculture |
|
Saraceni, Leonardo | Sapienza University of Rome |
Motoi, Ionut Marian | Sapienza University of Rome |
Nardi, Daniele | Sapienza University of Rome |
Ciarfuglia, Thomas Alessandro | Sapienza University of Rome |
Keywords: Visual Tracking, Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT, we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons at: https://github.com/Lio320/AgriSORT
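Motion-only association in the spirit of SORT-style trackers can be sketched as constant-velocity prediction followed by greedy IoU matching. This is a toy stand-in, not AgriSORT's actual pipeline; all names, box formats, and thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def propagate_and_match(tracks, detections, min_iou=0.3):
    """Shift each track's box by its velocity, then greedily match the
    predicted boxes to detections by IoU -- no appearance features at all."""
    matches, used = {}, set()
    for tid, (box, vel) in tracks.items():
        pred = (box[0] + vel[0], box[1] + vel[1], box[2] + vel[0], box[3] + vel[1])
        best = max((d for d in range(len(detections)) if d not in used),
                   key=lambda d: iou(pred, detections[d]), default=None)
        if best is not None and iou(pred, detections[best]) >= min_iou:
            matches[tid] = best
            used.add(best)
    return matches

# Track 7 moves right at 5 px/frame; its prediction lands on detection 1.
tracks = {7: ((0, 0, 10, 10), (5.0, 0.0))}
detections = [(40, 40, 50, 50), (5, 0, 15, 10)]
matches = propagate_and_match(tracks, detections)
```

When most targets look identical, as with grapes on a vine, this motion-only association sidesteps the appearance ambiguity that defeats appearance-based trackers.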
|
|
13:30-15:00, Paper TuBT5-CC.7 | Add to My Program |
Tightly Coupled Visual-Inertial-UWB Indoor Localization System with Multiple Position-Unknown Anchors |
|
Hu, Chao | Harbin Engineering University |
Huang, Ping | Harbin Engineering University |
Wang, Wei | Harbin Engineering University |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Localization
Abstract: In this letter, we perform a tightly-coupled fusion of a monocular camera, a 6-DoF IMU, and multiple position-unknown Ultra-wideband (UWB) anchors to construct an indoor localization system with both accuracy and robustness. Prior to this, there have been several works that have achieved satisfactory results by fusing UWB ranging measurements with a visual-inertial system. However, these approaches still have some limitations: 1) these approaches either require the UWB anchor position to be calibrated in advance or the UWB anchor position estimation method used is not robust enough; 2) these approaches do not allow for dynamic changes to the number of UWB anchors in a tightly coupled estimator. Our approach uses a visual object detection algorithm to provide the initial UWB anchor position and refines it in the factor graph, and uses a chi-square test to identify UWB ranging outliers. Based on these two ideas, we implement a tightly coupled estimator that dynamically adjusts the number of UWB anchors, i.e. adding them to the factor graph when their ranging measurements are available and discarding them when their ranging measurements are outliers. These ideas improve the efficiency and robustness of fusing UWB ranging measurements with the visual-inertial system, as well as easing the setup of UWB anchors. Experimental results show that the proposed method outperforms previous methods in terms of estimating anchor position and improving localization accuracy.
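Chi-square gating of range residuals can be sketched as a single 1-DoF test: a measurement is kept only if its normalised squared residual stays under the critical value (3.841 at 95% for one degree of freedom). The variable names and sample values below are hypothetical.

```python
def chi2_gate(residual, variance, threshold=3.841):
    """1-DoF chi-square test on a UWB range residual: accept the measurement
    when residual^2 / variance is below the 95% critical value, reject it as
    an outlier (e.g. an NLOS reflection) otherwise."""
    return (residual ** 2) / variance <= threshold

predicted_range, sigma = 4.0, 0.1
keep_close   = chi2_gate(4.05 - predicted_range, sigma ** 2)   # 0.5-sigma residual
keep_outlier = chi2_gate(6.00 - predicted_range, sigma ** 2)   # 20-sigma residual
```

In a factor-graph setting, the same test decides whether a ranging factor enters the graph at all, which is how an anchor can be dropped when its measurements turn bad.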
|
|
13:30-15:00, Paper TuBT5-CC.8 | Add to My Program |
Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints |
|
Wang, Weihan | Stevens Institute of Technology |
Chou, Chieh | InnoPeak Technology |
Sevagamoorthy, Ganesh | OPPO US Research Center |
Chen, Kevin | University of Michigan |
Chen, Zheng | Indiana University Bloomington |
Feng, Ziyue | Clemson University |
Xia, Youjie | OPPO US Research Center |
Cai, Feiyang | Stony Brook University |
Xu, Yi | OPPO US Research Center |
Mordohai, Philippos | Stevens Institute of Technology |
Keywords: Visual-Inertial SLAM, Sensor Fusion, SLAM
Abstract: We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We realize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.
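Why gyroscope bias matters for rotation accuracy can be shown with a toy yaw integration: an uncorrected bias grows the heading error linearly with time, which is why the paper estimates the bias before refining rotations. The helper name, sample rate, and bias value are hypothetical.

```python
def integrate_yaw(gyro_z, dt, bias):
    """Integrate yaw-rate samples with the estimated gyroscope bias removed.
    With bias=0 the raw (biased) samples are integrated as-is."""
    yaw = 0.0
    for w in gyro_z:
        yaw += (w - bias) * dt
    return yaw

# A stationary IMU (true yaw rate 0) whose gyro reads a constant 0.11 rad/s bias.
samples = [0.11] * 100
drift     = integrate_yaw(samples, dt=0.01, bias=0.0)    # spurious yaw from bias
corrected = integrate_yaw(samples, dt=0.01, bias=0.11)   # bias removed first
```

One second of integration already produces 0.11 rad (about 6 degrees) of spurious yaw, and the error keeps growing, so even a small bias estimate error compounds into large trajectory error.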
|
|
13:30-15:00, Paper TuBT5-CC.9 | Add to My Program |
Nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping |
|
Millane, Alexander | NVIDIA |
Oleynikova, Helen | ETH Zurich |
Wirbel, Emilie | Valeo |
Steiner, Remo | ETH Zurich |
Ramasamy, Vikram | NVIDIA |
Tingdahl, David | NVIDIA |
Siegwart, Roland | ETH Zurich |
Keywords: Mapping, RGB-D Perception, Vision-Based Navigation
Abstract: Dense, volumetric maps are essential to enable robot navigation and interaction with the environment. To achieve low latency, dense maps are typically computed on-board the robot, often on computationally constrained hardware. Previous works leave a gap between CPU-based systems for robotic mapping which, due to computation constraints, limit map resolution or scale, and GPU-based reconstruction systems which omit features that are critical to robotic path planning, such as computation of the Euclidean Signed Distance Field (ESDF). We introduce a library, nvblox, that aims to fill this gap, by GPU-accelerating robotic volumetric mapping. Nvblox delivers a significant performance improvement over the state of the art, achieving up to a 177× speed-up in surface reconstruction, and up to a 31× improvement in distance field computation, and is available open-source.
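For readers unfamiliar with ESDFs, the brute-force definition can be sketched on a tiny grid: each cell stores its Euclidean distance to the nearest cell of opposite occupancy, negated inside obstacles. Nvblox computes this incrementally on the GPU; the grid, names, and sign convention below are illustrative only.

```python
import math

def esdf(occupied, width, height):
    """Brute-force Euclidean Signed Distance Field on a small 2D grid:
    positive distance to the nearest obstacle cell outside obstacles,
    negative distance to the nearest free cell inside them."""
    field = {}
    for x in range(width):
        for y in range(height):
            inside = (x, y) in occupied
            opposite = [(ox, oy) for ox in range(width) for oy in range(height)
                        if ((ox, oy) in occupied) != inside]
            d = min(math.hypot(x - ox, y - oy) for ox, oy in opposite)
            field[(x, y)] = -d if inside else d
    return field

# A single obstacle cell in the middle of a 5x5 grid.
f = esdf(occupied={(2, 2)}, width=5, height=5)
```

A path planner queries this field to keep a clearance margin: any cell with value below the robot radius is unsafe. The O(n^2) scan here is exactly what incremental GPU methods avoid.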
|
|
TuBT6-CC Oral Session, CC-414 |
Add to My Program |
RGB-D Sensing and Perception I |
|
|
Chair: Navab, Nassir | TU Munich |
Co-Chair: Xiang, Yu | University of Texas at Dallas |
|
13:30-15:00, Paper TuBT6-CC.1 | Add to My Program |
Multi-Resolution Planar Region Extraction for Uneven Terrains |
|
Sun, Yinghan | Southern University of Science and Technology |
Zheng, Linfang | University of Birmingham, Southern University of Science and Technology |
Chen, Hua | Zhejiang University |
Zhang, Wei | Southern University of Science and Technology |
Keywords: RGB-D Perception, Computer Vision for Automation
Abstract: This paper studies the problem of extracting planar regions in uneven terrains from unordered point cloud measurements. Such a problem is critical in various robotic applications such as robotic perceptive locomotion. While existing approaches have shown promising results in effectively extracting planar regions from the environment, they often suffer from issues such as low computational efficiency or loss of resolution. To address these issues, we propose a multi-resolution planar region extraction strategy in this paper that balances the accuracy in boundaries and computational efficiency. Our method begins with a pointwise classification preprocessing module, which categorizes all sampled points according to their local geometric properties to facilitate multi-resolution segmentation. Subsequently, we arrange the categorized points using an octree, followed by an in-depth analysis of nodes to finish multi-resolution plane segmentation. The efficiency and robustness of the proposed approach are verified via synthetic and real-world experiments, demonstrating our method's ability to generalize effectively across various uneven terrains while maintaining real-time performance, achieving frame rates exceeding 35 FPS.
|
|
13:30-15:00, Paper TuBT6-CC.2 | Add to My Program |
RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction |
|
Kasahara, Isaac | Samsung Research America |
Agrawal, Shubham | Samsung Research America |
Engin, Kazim Selim | Samsung Research America |
Chavan-Dafle, Nikhil | Samsung Research America |
Song, Shuran | Columbia University |
Isler, Volkan | University of Minnesota |
Keywords: RGB-D Perception, Deep Learning for Visual Perception
Abstract: General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (Dalle-2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting for normals instead of depth directly, our method allows for robustness to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while providing generalization to novel objects and scenes. Code and data are available at https://samsunglabs.github.io/RIC-project-page/.
|
|
13:30-15:00, Paper TuBT6-CC.3 | Add to My Program |
Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction |
|
Yang, Yuhao | Chongqing University of Technology |
Wu, Jun | Zhejiang University |
Wang, Yue | Zhejiang University |
Zhang, Guangjian | Chongqing University of Technology |
Xiong, Rong | Zhejiang University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception
Abstract: Traditional geometric-registration-based estimation methods exploit the CAD model only implicitly, which makes them dependent on observation quality and vulnerable to occlusion. To address this, this paper proposes a bidirectional correspondence prediction network with a point-wise attention-aware mechanism that not only requires the model points to predict the correspondence, but also explicitly models the geometric similarities between observations and the model prior. Our key insight is that the correlations between each model point and scene point provide essential information for learning point-pair matches. To further tackle the correlation noise brought by feature distribution divergence, we design a simple but effective pseudo-siamese network to improve feature homogeneity. Experimental results on the public datasets of LineMOD, YCB-Video, and Occ-LineMOD show that the proposed method achieves better performance than other state-of-the-art methods under the same evaluation criteria. Its robustness in estimating poses is greatly improved, especially in an environment with severe occlusions.
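The key insight — correlating every model point with every scene point — can be sketched as a soft-assignment attention map: cosine correlations followed by a softmax over scene points (feature dimensions and temperature are illustrative, not the paper's architecture):

```python
import numpy as np

def soft_correspondence(model_feats, scene_feats, tau=0.1):
    """Point-wise attention: cosine correlations between every model point and
    every scene point, softmaxed over scene points into soft matches."""
    m = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    s = scene_feats / np.linalg.norm(scene_feats, axis=1, keepdims=True)
    logits = m @ s.T / tau                     # (M, S) correlation logits
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)    # each row sums to 1

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))
w = soft_correspondence(feats, feats[::-1])    # scene = model points, reversed
print(w.argmax(axis=1))  # each model point finds its reversed counterpart
```

The resulting soft matches are differentiable, which is what lets a correspondence network train end-to-end rather than relying on hard nearest-neighbour assignment.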
|
|
13:30-15:00, Paper TuBT6-CC.4 | Add to My Program |
Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion |
|
Li, Ang | Shanghai Jiao Tong University |
Hu, Anning | Shanghai Jiao Tong University |
Xi, Wei | Midea |
Zou, Danping | Shanghai Jiao Tong University |
Yu, Wenxian | Shanghai Jiao Tong University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for autonomous driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network. Our network includes a deformable propagation module for generating a semi-dense hint map and a confidence map by propagating sparse hints using a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. The experimental results on benchmark tests show its superior performance.
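The triangulation error the conversion module targets is a standard property of stereo geometry: under the pinhole model Z = fB/d, a fixed disparity error produces a depth error that grows quadratically with distance. A sketch with assumed KITTI-like intrinsics (f and B are illustrative values, not from the paper):

```python
import numpy as np

def depth_from_disparity(d, focal_px=721.5, baseline_m=0.54):
    """Pinhole stereo conversion Z = f * B / d (KITTI-like f and B, assumed)."""
    return focal_px * baseline_m / d

def depth_error(Z, disp_err_px=0.5, focal_px=721.5, baseline_m=0.54):
    """First-order triangulation error dZ ~ Z^2 / (f * B) * dd: a fixed
    disparity error grows quadratically with distance."""
    return Z ** 2 / (focal_px * baseline_m) * disp_err_px

Z = np.array([5.0, 20.0, 80.0])
print(depth_error(Z))  # the error at 80 m is 16x the error at 20 m
```

This quadratic blow-up in distant regions is why a learned disparity-depth conversion can pay off where the analytic formula is most fragile.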
|
|
13:30-15:00, Paper TuBT6-CC.5 | Add to My Program |
Leveraging Cycle-Consistent Anchor Points for Self-Supervised RGB-D Registration |
|
Tourani, Siddharth | IIIT Hyderabad |
Gurram, Jayaram | International Institute of Information Technology, Hyderabad |
Thakur, Sarvesh | IIIT Hyderabad |
Khan, Muhammad Haris | Mohamed Bin Zayed University of Artificial Intelligence |
Krishna, Madhava | IIIT Hyderabad |
Narapureddy, Dinesh Reddy | Amazon |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration methods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self-supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.
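Cycle consistency can be illustrated with the simplest instance of the idea: keep only matches that survive the A→B→A round trip, i.e. mutual nearest neighbours in feature space (the features below are synthetic stand-ins for learned keypoint descriptors):

```python
import numpy as np

def cycle_consistent_matches(feat_a, feat_b):
    """Keep only matches that survive the A->B->A cycle: i matches j iff j is
    i's nearest neighbour in B and i is j's nearest neighbour in A."""
    d = np.linalg.norm(feat_a[:, None] - feat_b[None, :], axis=-1)
    a2b = d.argmin(axis=1)   # best B index for each A point
    b2a = d.argmin(axis=0)   # best A index for each B point
    keep = b2a[a2b] == np.arange(len(feat_a))
    return [(int(i), int(a2b[i])) for i in np.flatnonzero(keep)]

rng = np.random.default_rng(2)
fa = rng.normal(size=(6, 8))                          # keypoint features in view A
fb = fa[[3, 1, 0]] + 0.01 * rng.normal(size=(3, 8))   # view B sees 3 of them
print(cycle_consistent_matches(fa, fb))  # [(0, 2), (1, 1), (3, 0)]
```

Matches that fail the round trip (here, the three A points with no counterpart in B) are discarded, which is exactly the self-supervised filtering that makes such keypoints usable as salient anchors.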
|
|
13:30-15:00, Paper TuBT6-CC.6 | Add to My Program |
MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats |
|
Yuan, Shenghai | Nanyang Technological University |
Yang, Yizhuo | Nanyang Technological University |
Nguyen, Thien Hoang | University of Sydney |
Nguyen, Thien-Minh | Nanyang Technological University |
Yang, Jianfei | Nanyang Technological University |
Liu, Fen | Guangdong University of Technology |
Li, Jianping | Nanyang Technological University |
Wang, Han | Nanyang Technological University, Singapore |
Xie, Lihua | Nanyang Technological University |
Keywords: Sensor Fusion, Data Sets for Robotic Vision, Representation Learning
Abstract: Introducing MMAUD: a Multi-Modal Anti-UAV Dataset, developed in response to the evolving challenges posed by small unmanned aerial vehicles (UAVs). These UAVs have the potential to transport harmful payloads or independently cause damage, necessitating comprehensive exploration of countermeasures. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on aerial detection, UAV-type classification, and trajectory estimation—a perspective often overlooked but laden with substantial risks. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique aerial perspective vital for addressing real-world scenarios with higher fidelity compared to datasets reliant on specific vantage points. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement UAV threat assessments. MMAUD's dataset collection process follows a methodical strategy, selecting industrial sites characterized by ambient machinery noise to mirror real-world scenarios. This approach enhances the dataset's applicability, capturing nuanced challenges faced during proximate vehicular operations. MMAUD emerges as an indispensable resource for scholarly investigation and practical research, facilitated by meticulous methodologies. It plays a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Explore the dataset, codes, and designs at https://github.com/ntu-aris/MMAUD.
|
|
13:30-15:00, Paper TuBT6-CC.7 | Add to My Program |
SupeRGB-D: Zero-Shot Instance Segmentation in Cluttered Indoor Environments |
|
Örnek, Evin Pinar | TU Munich |
Krishnan, Aravindhan | Amazon Lab126 |
Gayaka, Shreekant | Amazon |
Kuo, Cheng-Hao | Amazon |
Sen, Arnab | Amazon |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for the Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the objectness of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. We further show competitive results on the real dataset OCID. With its lightweight design (0.4 MB memory requirement), our method is well suited to mobile and robotic applications. Additional DINO features can increase the performance with a higher memory requirement. The dataset split and code are available under www.github.com/evinpinar/supergb-d.
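The agglomerative grouping of geometric patches can be sketched with a plain greedy merger. Here a descriptor distance stands in for the learned merge score, and the patch descriptors are made up for illustration:

```python
import numpy as np

def merge_patches(desc, thresh=0.5):
    """Greedy agglomerative merging: repeatedly fuse the closest pair of patch
    descriptors until the smallest pairwise distance exceeds thresh (a plain
    distance stands in for the learned merge score)."""
    clusters = [[i] for i in range(len(desc))]
    cents = [desc[i].copy() for i in range(len(desc))]
    while len(clusters) > 1:
        d = np.array([[np.linalg.norm(cents[i] - cents[j]) if i < j else np.inf
                       for j in range(len(cents))] for i in range(len(cents))])
        i, j = np.unravel_index(d.argmin(), d.shape)
        if d[i, j] > thresh:
            break
        clusters[i] += clusters[j]
        cents[i] = desc[clusters[i]].mean(axis=0)
        del clusters[j], cents[j]
    return clusters

# Hypothetical patch descriptors: centroid (x, y, z) + unit normal, one per patch
patches = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # three co-planar table-top patches
    [0.1, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.5, 1.0, 0.0, 0.0],   # two patches of a vertical object face
    [1.0, 1.1, 0.5, 1.0, 0.0, 0.0],
])
print(merge_patches(patches))  # [[0, 1, 2], [3, 4]]
```

The learned variant replaces the fixed threshold with a network's merge decision, which is what lets the grouping transfer to unseen object categories.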
|
|
13:30-15:00, Paper TuBT6-CC.8 | Add to My Program |
Mean Shift Mask Transformer for Unseen Object Instance Segmentation |
|
Lu, Yangxiao | The University of Texas at Dallas |
Chen, Yuqiao | University of Texas at Dallas |
Ruozzi, Nicholas | The University of Texas at Dallas |
Xiang, Yu | University of Texas at Dallas |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Recognition
Abstract: Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it can facilitate a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate it into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation.
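The classical vMF mean shift that MSMFormer's attention mechanism simulates can be sketched directly: points on the unit hypersphere repeatedly move to a kappa-weighted mean and are renormalized (a blurring-mean-shift sketch; kappa, iteration count, and the 2-D data are illustrative assumptions):

```python
import numpy as np

def vmf_mean_shift(feats, kappa=10.0, iters=30):
    """Blurring mean shift with a von Mises-Fisher kernel: every point moves to
    the kappa-weighted mean of all points, then is renormalized to the sphere."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    for _ in range(iters):
        w = np.exp(kappa * (x @ x.T))   # vMF kernel weights
        m = w @ x                       # weighted means
        x = m / np.linalg.norm(m, axis=1, keepdims=True)
    return x

# Two noisy clusters on the unit circle collapse onto two antipodal modes
rng = np.random.default_rng(3)
ang = np.concatenate([rng.normal(0.0, 0.05, 20), rng.normal(np.pi, 0.05, 20)])
pts = np.column_stack([np.cos(ang), np.sin(ang)])
modes = vmf_mean_shift(pts)
print(modes[0] @ modes[20])  # close to -1: the recovered modes are antipodal
```

The update is a sequence of matrix products and normalizations, which is why it can be unrolled into differentiable attention layers and trained jointly with the feature extractor.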
|
|
13:30-15:00, Paper TuBT6-CC.9 | Add to My Program |
SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion Using a 3D Recurrent U-Net |
|
Cao, Helin | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Visual Learning
Abstract: We introduce SLCF-Net, a novel approach for the Semantic Scene Completion (SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements. The images are semantically segmented by a pre-trained 2D U-Net and a dense depth prior is estimated from a depth-conditioned pipeline fueled by Depth Anything. To associate the 2D image features with the 3D scene volume, we introduce Gaussian-decay Depth-prior Projection (GDP). This module projects the 2D features into the 3D volume along the line of sight with a Gaussian-decay function, centered around the depth prior. Volumetric semantics is computed by a 3D U-Net. We propagate the hidden 3D U-Net state using the sensor motion and design a novel loss to ensure temporal consistency. We evaluate our approach on the SemanticKITTI dataset and compare it with leading SSC approaches. The SLCF-Net excels in all SSC metrics and shows great temporal consistency.
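The Gaussian-decay Depth-prior Projection can be pictured as a weighting profile over the voxels along one camera ray: the 2D feature is spread into 3D with most of its mass near the depth prior (bin layout and sigma below are assumptions for illustration):

```python
import numpy as np

def gaussian_decay_weights(z_bins, z_prior, sigma=0.5):
    """Spread a 2D feature along its line of sight into depth bins, with a
    Gaussian falloff centered on the depth prior."""
    w = np.exp(-0.5 * ((z_bins - z_prior) / sigma) ** 2)
    return w / w.sum()

z_bins = np.arange(0.25, 10.0, 0.5)   # voxel centers along one camera ray
w = gaussian_decay_weights(z_bins, z_prior=4.3)
print(z_bins[w.argmax()])  # 4.25, the voxel nearest the depth prior
```

Compared to copying the feature into a single voxel, the soft falloff keeps the projection differentiable and tolerant of error in the estimated depth prior.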
|
|
TuBT7-CC Oral Session, CC-416 |
Add to My Program |
Imitation Learning |
|
|
Chair: Johns, Edward | Imperial College London |
Co-Chair: Bıyık, Erdem | University of Southern California |
|
13:30-15:00, Paper TuBT7-CC.1 | Add to My Program |
Overparametrization Helps Offline-To-Online Generalization of Closed-Loop Control from Pixels |
|
Lechner, Mathias | Massachusetts Institute of Technology |
Hasani, Ramin | Massachusetts Institute of Technology (MIT) |
Amini, Alexander | Massachusetts Institute of Technology |
Wang, Tsun-Hsuan | Massachusetts Institute of Technology |
Henzinger, Thomas | IST Austria |
Rus, Daniela | MIT |
Keywords: Imitation Learning, Deep Learning Methods, Representation Learning
Abstract: There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional to vision transformers, from small to gigantic networks, have been extensively tested on offline image classification tasks. In this paper, we study these vision models with respect to the open-loop training to closed-loop generalization abilities, i.e., deployment realizes a causal feedback loop that is not present during training. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that, under proper training guidelines, all vision architectures perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, we observe that scale is the strongest factor in improving closed-loop generalization regardless of the choice of the model architecture. Our results predict the trend that in the future we will see larger and larger models being used in offline-training-online-deployment imitation learning tasks in robotic applications.
|
|
13:30-15:00, Paper TuBT7-CC.2 | Add to My Program |
Hierarchical Human-To-Robot Imitation Learning for Long-Horizon Tasks Via Cross-Domain Skill Alignment |
|
Lin, Zhenyang | University of Chinese Academy of Sciences |
Chen, Yurou | Chinese Academy of Sciences |
Liu, Zhiyong | Institute of Automation Chinese Academy of Sciences |
Keywords: Imitation Learning, Deep Learning Methods, Representation Learning
Abstract: For a general-purpose robot, it is desirable to imitate human demonstration videos that can effectively solve long-horizon tasks and perform novel ones. Recent advances in skill-based imitation learning have shown that extracting skill embedding from raw human videos is a promising paradigm to enable robots to cope with long-horizon tasks. However, generalization to unseen tasks in a different domain with a human prompt video poses a significant challenge due to the big embodiment and environment difference. To this end, we present Hierarchical Human-to-Robot Imitation Learning (H2RIL) that learns the mapping of cross-domain sensorimotor skills and utilizes it to generalize to unseen tasks given a human video in a different environment. To allow for generalizing zero-shot across environments and embodiments, H2RIL leverages task-agnostic play data for low-level policy training and paired human-robot data for both semantic and temporal skill embedding alignment. Extensive experiments in a simulated kitchen environment demonstrate that H2RIL significantly outperforms other prior baselines and is capable of generalizing to composable new tasks and adapting to Out-of-Distribution (OOD) tasks.
|
|
13:30-15:00, Paper TuBT7-CC.3 | Add to My Program |
Policy Optimization by Looking Ahead for Model-Based Offline RL |
|
Liu, Yang | The University of Hong Kong |
Hofert, Marius | The University of Hong Kong |
Keywords: Reinforcement Learning, Deep Learning Methods, Planning under Uncertainty
Abstract: Offline reinforcement learning (RL) aims to optimize the policy, based on pre-collected data, to maximize the cumulative rewards after performing a sequence of actions. Existing approaches learn a value function from historical data, then guide the update of policy parameters by maximizing the value function at a single time step. Driven by the gap between maximizing the cumulative rewards of RL and the greedy strategy of existing methods, we propose an approach of policy optimization by looking ahead (POLA) to mitigate the gap. Concretely, we optimize the policy on both current and future states, where the future states are predicted by a transition model. A trajectory contains numerous actions before the task is done, and performing the best action at each step does not guarantee an optimal trajectory in the end; sub-optimal or negative actions must occasionally be allowed. Existing methods, however, focus on generating the optimal action at each step according to the Q-value maximization principle. This motivates our looking-ahead approach. Besides, hidden confounding factors may affect the decision-making process. To that end, we incorporate the correlations among dimensions of the state into the policy, providing more information about the environment for the policy to make decisions. Empirical results on the Mujoco dataset show the effectiveness of the proposed approach.
|
|
13:30-15:00, Paper TuBT7-CC.4 | Add to My Program |
DINOBot: Robot Manipulation Via Retrieval and Alignment with Vision Foundation Models |
|
Di Palo, Norman | Imperial College London |
Johns, Edward | Imperial College London |
Keywords: Imitation Learning, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: We propose DINOBot, a novel imitation learning framework for robot manipulation, which leverages the image-level and pixel-level capabilities of features extracted from Vision Transformers trained with DINO. When interacting with a novel object, DINOBot first uses these features to retrieve the most visually similar object experienced during human demonstrations, and then uses this object to align its end-effector with the novel object to enable effective interaction. Through a series of real-world experiments on everyday tasks, we show that exploiting both the image-level and pixel-level properties of visual foundation models enables unprecedented learning efficiency and generalisation. Videos and code are available at https://www.robot-learning.uk/dinobot
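The retrieval half of the pipeline amounts to nearest-neighbour search over image-level embeddings. A sketch with random vectors standing in for the DINO features (the 384-dimensional size mirrors a ViT-S embedding, an assumption; the alignment half is not shown):

```python
import numpy as np

def retrieve_demo(query_feat, demo_feats):
    """Pick the most visually similar stored demonstration by cosine
    similarity of image-level embeddings."""
    q = query_feat / np.linalg.norm(query_feat)
    d = demo_feats / np.linalg.norm(demo_feats, axis=1, keepdims=True)
    sims = d @ q
    return int(sims.argmax()), sims

rng = np.random.default_rng(4)
demos = rng.normal(size=(10, 384))              # 10 stored demo embeddings
query = demos[7] + 0.1 * rng.normal(size=384)   # a noisy new view of demo 7
idx, sims = retrieve_demo(query, demos)
print(idx)  # 7
```

In the full framework, the retrieved demonstration then supplies pixel-level correspondences for end-effector alignment before the recorded interaction is replayed.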
|
|
13:30-15:00, Paper TuBT7-CC.5 | Add to My Program |
Rank2Reward: Learning Shaped Reward Functions from Passive Video |
|
Yang, Daniel | Massachusetts Institute of Technology |
Tjia, Davin | University of Washington |
Herman Berg, Jacob | University of Washington |
Damen, Dima | University of Bristol |
Agrawal, Pulkit | MIT |
Gupta, Abhishek | University of Washington |
Keywords: Imitation Learning, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both "what" to do and "how" to do it. A powerful way to encode both the "what" and the "how" is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. We propose a technique Rank2Reward for learning behaviors from videos of tasks being performed without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task by learning how to temporally rank the video frames in a demonstration. By inferring an appropriate ranking, the reward function is able to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme resulting in an algorithm that can learn behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors from raw video on a number of tabletop manipulation tasks in both simulations and on a real-world robotic arm. We also demonstrate how Rank2Reward can be easily extended to be applicable to web-scale video datasets.
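The ranking idea at the core of the method can be written down directly: a Bradley-Terry style loss over ordered frame pairs, which is lower when a reward function scores later frames higher than earlier ones (a toy sketch with scalar rewards, not the paper's network or training loop):

```python
import numpy as np

def ranking_loss(rewards):
    """Bradley-Terry loss over all ordered frame pairs (i < j) of one demo
    video: a learned reward should score later frames higher than earlier ones."""
    i, j = np.triu_indices(len(rewards), k=1)
    margins = rewards[j] - rewards[i]                      # want these positive
    return float(-np.log(1.0 / (1.0 + np.exp(-margins))).mean())

monotone = np.array([0.0, 0.5, 1.0, 1.5])   # reward increases with progress
shuffled = np.array([1.5, 0.0, 1.0, 0.5])   # same values, wrong temporal order
print(ranking_loss(monotone) < ranking_loss(shuffled))  # True
```

Minimizing this loss over demo videos yields a reward that measures incremental progress, which is the shaping signal the downstream RL agent consumes.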
|
|
13:30-15:00, Paper TuBT7-CC.6 | Add to My Program |
A Generalized Acquisition Function for Preference-Based Reward Learning |
|
Ellis, Evan | UC Berkeley |
Ghosal, Gaurav | Carnegie Mellon University |
Russell, Stuart Jonathan | University of California, Berkeley |
Dragan, Anca | University of California Berkeley |
Bıyık, Erdem | University of Southern California |
Keywords: Imitation Learning, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task. Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency. The information gain criterion focuses on precisely identifying all parameters of the reward function. This can potentially be wasteful as many parameters may result in the same reward, and many rewards may result in the same behavior in the downstream tasks. Instead, we show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar. We introduce a tractable framework that can capture such definitions of similarity. Our experiments in a synthetic environment, an assistive robotics environment with domain transfer, and a natural language processing problem with real datasets demonstrate the superior performance of our querying method over the state-of-the-art information gain method.
|
|
13:30-15:00, Paper TuBT7-CC.7 | Add to My Program |
Human-Robot Deformation Manipulation Skill Transfer: Sequential Fabric Unfolding Method for Robots |
|
Fu, Tianyu | Shandong University |
Bai, Yunfeng | Shandong University |
Li, Cheng | Shandong University |
Li, Fengming | Shandong Jianzhu University |
Wang, Chaoqun | Shandong University |
Song, Rui | Shandong University |
Keywords: Imitation Learning, Learning from Demonstration, Manipulation Planning
Abstract: Deformable object manipulation has been considered a challenging task for robots due to its complex dynamics and infinite-dimensional configuration space. Fabric unfolding manipulation takes on critical significance in the textile industry and household services. Accordingly, equipping robots with this skill is a crucial and challenging task. In this study, a general framework is developed for transferring human skills to robots in fabric unfolding manipulation. The developed framework comprises two key components: behavior cloning to learn the human unfolding policy, and learning from demonstration to transfer unfolding actions. A mixture density network is introduced to address the multimodality in human policy. Moreover, task parameter weighting is considered during action generalization to adapt to a wide variety of unfolding scenarios. As revealed by the experimental results of this study, the framework can successfully unfold fabrics of different colors and sizes, with performance comparable to human-level operation. Furthermore, the framework can also be applied to garment unfolding, and experiments suggest that it generalizes well.
|
|
13:30-15:00, Paper TuBT7-CC.8 | Add to My Program |
Model Optimization in Deep Learning Based Robot Control for Autonomous Driving |
|
Paniego, Sergio | Universidad Rey Juan Carlos |
Paliwal, Nikhil | Saarland University, Germany |
Cañas, José M. | Universidad Rey Juan Carlos |
Keywords: Imitation Learning, Deep Learning for Visual Perception, Machine Learning for Robot Control
Abstract: Deep Learning (DL) has been successfully used in robotics for perception tasks and end-to-end robot control. In the context of autonomous driving, this work explores and compares a variety of alternatives for model optimization to solve the visual lane-follow application in urban scenarios with an imitation learning approach. The optimization techniques include quantization, pruning, fine-tuning (retraining), and clustering, covering all the options available in the most common DL frameworks. TensorRT optimization for specific cutting-edge hardware devices has also been explored. For the comparison, offline metrics such as mean squared error and inference time are used. In addition, the optimized models have been evaluated in an online fashion using the autonomous driving state-of-the-art simulator CARLA and an assessment tool called Behavior Metrics, which provides holistic quantitative fine-grain data about robot performance. Typically, the performance of robot applications depends on both the quality of the control decisions and their frequency. The studied optimized models significantly increase inference frequency without losing decision quality. The impact of each optimization alone has also been measured. This speed-up allows us to successfully run DL robot-control applications even on limited computing hardware. All the work presented here is open-source, including models, weights, assessment tool, and dataset, for easy replication and extension.
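Among the optimizations compared, post-training quantization is the easiest to sketch: map float weights to int8 with a per-tensor scale and check the reconstruction error. This is a framework-agnostic simulation of the idea, not any specific toolkit's API:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: map float weights to int8 with a
    per-tensor scale; dequantizing as q * scale recovers an approximation."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(5)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
mse = float(np.mean((w - q.astype(np.float32) * scale) ** 2))
print(mse)  # far below the weight variance of 0.01
```

The 4x memory reduction and faster int8 arithmetic are where the inference-frequency gains the abstract reports typically come from, with the MSE above as the price paid in decision quality.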
|
|
TuBT8-CC Oral Session, CC-418 |
Add to My Program |
Reinforcement Learning I |
|
|
Chair: Plancher, Brian | Barnard College, Columbia University |
Co-Chair: Mangharam, Rahul | University of Pennsylvania |
|
13:30-15:00, Paper TuBT8-CC.1 | Add to My Program |
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy |
|
Cao, Chenyang | Tsinghua University |
Yan, Zichen | Tsinghua University |
Lu, Renhao | Tsinghua University |
Tan, Junbo | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Reinforcement Learning, Robot Safety, Deep Learning Methods
Abstract: Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal attainment without considering safety, while others excessively focus on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals. To evaluate the method's performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle and use expert or random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm. As a result, our method outperforms the existing state-of-the-art methods to a large extent. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.
|
|
13:30-15:00, Paper TuBT8-CC.2 | Add to My Program |
Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization |
|
Yang, Fan | University of Michigan |
Zhou, Wenxuan | Carnegie Mellon University |
Liu, Zuxin | Carnegie Mellon University |
Zhao, Ding | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Reinforcement Learning, Robot Safety
Abstract: Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles. Further insights are available from the videos on our website: https://sites.google.com/view/safemdp
|
|
13:30-15:00, Paper TuBT8-CC.3 | Add to My Program |
Distributional Reinforcement Learning with Sample-Set Bellman Update |
|
Zhang, Weijian | Nanjing University |
Wang, Jianshu | Nanjing University |
Yang, Yu | National Key Laboratory for Novel Software Technology, Nanjing U |
Keywords: Reinforcement Learning, Representation Learning, Probability and Statistical Methods
Abstract: Distributional Reinforcement Learning (DRL) not only endeavors to optimize expected returns but also strives to accurately characterize the full distribution of these returns, a key aspect in enhancing risk-aware decision-making. Previous DRL implementations often inappropriately treat statistical estimations as concrete samples, which undermines the integrity of learning. While several studies have addressed this issue, they frequently give rise to new complications, including computational burdens and diminished stochastic behavior. In our work, we present a novel DRL framework that leverages the Gaussian mixture model to adeptly depict the distribution of returns. This approach ensures precise, authentic sampling critical for robust learning, while also preserving computational tractability. Through extensive evaluation of a diverse array of 59 Atari games, our method not only surpasses the efficacy of prior DRL algorithms but also presents formidable competition to contemporary top-tier RL algorithms, signifying a substantial advancement in the field.
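Representing returns with a Gaussian mixture lets the learner draw genuine samples rather than treating statistical estimates as samples. A sketch of sampling from such a mixture and forming the distributional Bellman target r + γZ(s') (weights, means, reward, and discount are made up for illustration):

```python
import numpy as np

def sample_gmm(weights, means, stds, n, rng):
    """Draw genuine samples from a Gaussian-mixture return distribution."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

rng = np.random.default_rng(6)
# Made-up next-state return distribution Z(s') and a one-step Bellman backup
w, mu, sd = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.2, 0.5])
z_next = sample_gmm(w, mu, sd, 100_000, rng)
target = 0.5 + 0.99 * z_next       # r + gamma * Z(s')
print(target.mean())               # close to 0.5 + 0.99 * (0.3*-1 + 0.7*2)
```

Because the affine map r + γz of a Gaussian mixture is again a Gaussian mixture, the target distribution stays in the same parametric family, which keeps the update tractable.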
|
|
13:30-15:00, Paper TuBT8-CC.4 | Add to My Program |
Learning Adaptive Safety for Multi-Agent Systems |
|
Berducci, Luigi | TU Wien |
Yang, Shuo | University of Pennsylvania |
Mangharam, Rahul | University of Pennsylvania |
Grosu, Radu | TU Wien |
Keywords: Reinforcement Learning, Robot Safety, Multi-Robot Systems
Abstract: Ensuring safety in dynamic multi-agent systems is challenging due to limited information about the other agents. Control Barrier Functions (CBFs) are showing promise for safety assurance but current methods make strong assumptions about other agents and often rely on manual tuning to balance safety, feasibility, and performance. In this work, we delve into the problem of adaptive safe learning for multi-agent systems with CBF. We show how emergent behaviour can be profoundly influenced by the CBF configuration, highlighting the necessity for a responsive and dynamic approach to CBF design. We present ASRL, a novel adaptive safe RL framework, to fully automate the optimization of policy and CBF coefficients, to enhance safety and long-term performance through reinforcement learning. By directly interacting with the other agents, ASRL learns to cope with diverse agent behaviours and maintains the cost violations below a desired limit. We evaluate ASRL in a multi-robot system and competitive multi-agent racing, against learning-based and control-theoretic approaches. We empirically demonstrate the efficacy of ASRL, and assess generalization and scalability to out-of-distribution scenarios.
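The CBF coefficients that ASRL tunes enter through a safety filter of roughly this shape. For a 1-D single integrator, the class-K condition ḣ ≥ -αh reduces to a clamp on the nominal input; α is the kind of coefficient the framework adapts online (a deliberately minimal sketch, not the paper's multi-agent formulation):

```python
def cbf_filter(x, u_nom, x_min=0.0, alpha=1.0):
    """Minimal CBF safety filter for a 1-D single integrator x' = u with
    barrier h(x) = x - x_min: safety requires h' = u >= -alpha * h, so the
    safe input closest to the nominal one is a simple clamp."""
    h = x - x_min
    return max(u_nom, -alpha * h)

# The nominal controller pushes toward the unsafe region; the filter clips it
print(cbf_filter(0.2, u_nom=-5.0))  # -0.2, the steepest descent keeping h >= 0
print(cbf_filter(0.2, u_nom=1.0))   # 1.0: already safe, passed through
```

A larger α permits faster approach to the boundary (more aggressive, better performance), while a smaller α is more conservative — exactly the safety/performance trade-off the RL layer learns to balance per situation.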
|
|
13:30-15:00, Paper TuBT8-CC.5 | Add to My Program |
Contrastive Initial State Buffer for Reinforcement Learning |
|
Messikommer, Nico | University of Zurich |
Song, Yunlong | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
|
|
13:30-15:00, Paper TuBT8-CC.6 | Add to My Program |
Safety Optimized Reinforcement Learning Via Multi-Objective Policy Optimization |
|
Honari, Homayoun | University of Victoria |
Ghafarian Tamizi, Mehran | University of Victoria |
Najjaran, Homayoun | University of Victoria |
Keywords: Reinforcement Learning, Learning from Experience
Abstract: Safe reinforcement learning (Safe RL) refers to a class of techniques that aim to prevent RL algorithms from violating constraints in the process of decision-making and exploration during trial and error. In this paper, a novel model-free Safe RL algorithm, formulated based on the multi-objective policy optimization framework, is introduced, where the policy is optimized towards optimality and safety simultaneously. Optimality is achieved by the environment reward function, which is subsequently shaped using a safety critic. The advantage of the Safety Optimized RL (SORL) algorithm compared to traditional Safe RL algorithms is that it omits the need to constrain the policy search space. This allows SORL to find a natural tradeoff between safety and optimality without compromising performance in terms of either safety or optimality due to strict search-space constraints. Through our theoretical analysis of SORL, we propose a condition for SORL's converged policy to guarantee safety and then use it to introduce an aggressiveness parameter that allows for fine-tuning the mentioned tradeoff. The experimental results obtained in seven different robotic environments indicate a considerable reduction in the number of safety violations along with higher, or competitive, policy returns in comparison to six different state-of-the-art Safe RL methods. The results demonstrate the significant superiority of the proposed SORL algorithm in safety-critical applications.
|
|
13:30-15:00, Paper TuBT8-CC.7 | Add to My Program |
Differentially Encoded Observation Spaces for Perceptive Reinforcement Learning |
|
Grossman, Lev | Berkshire Grey |
Plancher, Brian | Barnard College, Columbia University |
Keywords: Reinforcement Learning, Deep Learning Methods, Software Architecture for Robotic and Automation
Abstract: Perceptive deep reinforcement learning (DRL) has led to many recent breakthroughs for complex AI systems leveraging image-based input data. Applications of these results range from super-human level video game agents to dexterous, physically intelligent robots. However, training these perceptive DRL-enabled systems remains incredibly compute and memory intensive, often requiring huge training datasets and large experience replay buffers. This poses a challenge for the next generation of field robots that will need to be able to learn on the edge in order to adapt to their environments. In this paper, we begin to address this issue through differentially encoded observation spaces. By reinterpreting stored image-based observations as a video, we leverage lossless differential video encoding schemes to compress the replay buffer without impacting training performance. We evaluate our approach with three state-of-the-art DRL algorithms and find that differential image encoding reduces the memory footprint by as much as 14.2x and 16.7x across tasks from the Atari 2600 benchmark and the DeepMind Control Suite (DMC) respectively. These savings also enable large-scale perceptive DRL that previously required paging between flash and RAM to be run entirely in RAM, improving the latency of DMC tasks by as much as 27%.
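The principle behind the compression, reinterpreting a sequence of stored observations as a video and keeping only inter-frame differences, can be illustrated with a lossless delta codec over toy frames. This is a simplified sketch: the paper applies actual lossless video encoding schemes, not the raw deltas shown here.

```python
import numpy as np

def encode_diff(frames):
    """Store the first frame plus successive differences (lossless)."""
    base = frames[0].astype(np.int16)
    deltas = [frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)
              for i in range(1, len(frames))]
    return base, deltas

def decode_diff(base, deltas):
    """Reconstruct the original frames exactly from base + deltas."""
    frames = [base.copy()]
    for d in deltas:
        frames.append(frames[-1] + d)
    return [f.astype(np.uint8) for f in frames]

# Toy 4x4 grayscale "observations" that change slowly between steps,
# as consecutive frames in a replay buffer typically do
rng = np.random.default_rng(0)
obs = [rng.integers(0, 256, (4, 4), dtype=np.uint8)]
for _ in range(3):
    nxt = obs[-1].astype(np.int16) + rng.integers(-2, 3, (4, 4))
    obs.append(np.clip(nxt, 0, 255).astype(np.uint8))

base, deltas = encode_diff(obs)
recovered = decode_diff(base, deltas)
```

Because consecutive observations are nearly identical, the delta arrays are dominated by small values and compress far better than the raw frames, while reconstruction remains bit-exact.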
|
|
13:30-15:00, Paper TuBT8-CC.8 | Add to My Program |
Projected Task-Specific Layers for Multi-Task Reinforcement Learning |
|
Somerville Roberts, Josselin | Stanford University |
Di, Julia | Stanford University |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: Multi-task reinforcement learning could enable robots to scale across a wide variety of manipulation tasks in homes and workplaces. However, generalizing from one task to another and mitigating negative task interference still remain challenges. Addressing this challenge by successfully sharing information across tasks will depend on how well the structure underlying the tasks is captured. In this work, we introduce our new architecture, Projected Task-Specific Layers (PTSL), that leverages a common policy with dense task-specific corrections through task-specific layers to better express shared and variable task information. We then show that our model outperforms the state of the art on the MT10 and MT50 benchmarks of Meta-World consisting of 10 and 50 goal-conditioned tasks for a Sawyer arm.
|
|
13:30-15:00, Paper TuBT8-CC.9 | Add to My Program |
Bi²Lane: Bi-Directional Temporal Refinement with Bi-Level Feature Aggregation for 3D Lane Detection |
|
Li, Chengxin | South China Normal University |
Hu, Yihui | GAC R&D Center |
Zheng, Zewen | Guangdong University of Technology |
Gao, Xiang | Guangdong University of Technology |
Mou, Yongqiang | Guangzhou Automobile Group Co Ltd |
Nie, Peng | Guangzhou Automobile Group Co Ltd |
Li, Jun | South China Normal University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Recognition
Abstract: Monocular 3D lane detection has recently received increasing research attention in autonomous driving due to its application effectiveness and simplicity. However, depending solely on the limited semantic information from a single image makes current monocular detection methods unable to deal with complex scenarios, such as occluded, blurred, and unaligned scenes. In this study, we introduce an end-to-end framework named Bi²Lane which models temporal dependency in a continuous sequence. It recurrently utilizes detected lanes within historical frames as prior information to achieve robust lane detection. Additionally, Bi²Lane employs temporal reverse refinement together with temporal forward refinement to achieve bi-directional temporal refinement (BDTR) while maintaining a robust temporal dependency. For the refined features of different frames, we design a bi-level feature aggregation module (BLFA) to fuse them in both point-level and line-level manners, enabling a comprehensive feature representation to deal with complicated road scenes. Extensive experiments conducted on the OpenLane dataset demonstrate the superiority of Bi²Lane, achieving a notable F1 score of 63.8% using a simple ResNet50 backbone, surpassing the performance of existing state-of-the-art methods.
|
|
TuBT9-CC Oral Session, CC-419 |
Add to My Program |
Vision-Based Navigation |
|
|
Chair: Morimitsu, Henrique | University of Science and Technology Beijing |
Co-Chair: Fischer, Tobias | Queensland University of Technology |
|
13:30-15:00, Paper TuBT9-CC.1 | Add to My Program |
Exploitation-Guided Exploration for Semantic Embodied Navigation |
|
Wasserman, Justin | University of Illinois at Urbana-Champaign |
Chowdhary, Girish | University of Illinois at Urbana Champaign |
Gupta, Abhinav | Carnegie Mellon University |
Jain, Unnat | Indian Institute of Technology Kanpur |
Keywords: Vision-Based Navigation
Abstract: In the recent progress in embodied navigation and sim-to-robot transfer, modular policies have emerged as a de facto framework. However, there is more to compositionality beyond the decomposition of the learning load into modular components. In this work, we investigate a principled way to syntactically combine these components. Particularly, we propose Exploitation-Guided Exploration (XGX) where separate modules for exploration and exploitation come together in a novel and intuitive manner. We configure the exploitation module to take over in the deterministic final steps of navigation i.e. when the goal becomes visible. Crucially, an exploitation module teacher-forces the exploration module and continues driving an overridden policy optimization. XGX, with effective decomposition and novel guidance, improves the state-of-the-art performance on the challenging object navigation task from 70% to 73%. Along with better accuracy, through targeted analysis, we show that XGX is also more efficient at goal-conditioned exploration. Finally, we show sim-to-real transfer to robot hardware and XGX performs over two-fold better than the best baseline from simulation benchmarking.
|
|
13:30-15:00, Paper TuBT9-CC.2 | Add to My Program |
Teach and Repeat Navigation: A Robust Control Approach |
|
Nourizadeh, Payam | QUT Centre for Robotics |
Milford, Michael J | Queensland University of Technology |
Fischer, Tobias | Queensland University of Technology |
Keywords: Vision-Based Navigation
Abstract: Robot navigation requires an autonomy pipeline that is robust to environmental changes and effective in varying conditions. Teach and Repeat (T&R) navigation has shown high performance in autonomous repeated tasks under challenging circumstances, but research within T&R has predominantly focused on motion planning as opposed to robust motion control. In this paper, we propose a novel T&R system based on a robust motion control technique for a skid-steering mobile robot using sliding-mode control that effectively handles uncertainties due to sensor noises, parametric uncertainties, and wheel-terrain interaction. We theoretically demonstrate that the proposed T&R system is globally stable and robust while considering the uncertainties of the closed-loop system. When deployed on a Clearpath Jackal robot, we show the global stability of the proposed system in both indoor and outdoor environments covering different terrains, outperforming previous state-of-the-art methods that had a higher mean average trajectory error and became unstable in these challenging environments. This paper makes an important step towards long-term autonomous T&R navigation with ensured safety guarantees.
|
|
13:30-15:00, Paper TuBT9-CC.3 | Add to My Program |
Real-Time Localization for Closed-Loop Control of Assistive Furniture |
|
Tang, Lixuan | EPFL |
Ning, Chuanfang | EPFL |
Adaimi, George | École Polytechnique Fédérale De Lausanne (EPFL) |
Ijspeert, Auke | EPFL |
Alahi, Alexandre | EPFL |
Bolotnikova, Anastasia | EPFL |
Keywords: Vision-Based Navigation, Localization, Object Detection, Segmentation and Categorization
Abstract: For people with limited mobility, navigating in cluttered indoor environments is challenging. In this work, we propose a mobile assistive furniture suite that is designed to ease the life of people with special needs in indoor movement. To enable intelligent coordination of this system, a key component is the localization of each piece of mobile furniture. The challenge is to assess the state of an arbitrary living environment so that the estimation can be used as a real-time feedback signal for autonomous closed-loop control of mobile furniture. We propose a perception pipeline that addresses these challenges. A machine learning model is designed and trained to jointly achieve multi-object semantic keypoint detection and classification in camera images. Synthetic data generation is employed to augment the training set and boost the model performance. A robust point cloud registration uses the detected semantic keypoints and depth information to estimate poses of the furniture. Tracking is applied to achieve smooth estimation. A high-performance accelerator that optimizes the efficiency of using heterogeneous devices is applied to achieve real-time performance. This visual perception pipeline is used in closed-loop control to steer the mobile furniture from an initial to a desired location, as demonstrated in experiments on real hardware.
|
|
13:30-15:00, Paper TuBT9-CC.4 | Add to My Program |
Uncertainty-Aware Hybrid Paradigm of Nonlinear MPC and Model-Based RL for Offroad Navigation: Exploration of Transformers in the Predictive Model |
|
Lotfi, Faraz | McGill University |
Virji, Khalil | McGill University |
Faraji, Farnoosh | McGill University |
Berry, Lucas | McGill University |
Holliday, Andrew | McGill University |
Meger, David Paul | McGill University |
Dudek, Gregory | McGill University |
Keywords: Vision-Based Navigation, Planning under Uncertainty, Optimization and Optimal Control
Abstract: In this paper, we investigate a hybrid scheme that combines nonlinear model predictive control (MPC) and model-based reinforcement learning (RL) for navigation planning of an autonomous model car across offroad, unstructured terrains without relying on predefined maps. Our innovative approach takes inspiration from BADGR, an LSTM-based network that primarily concentrates on environment modeling, but distinguishes itself by substituting LSTM modules with transformers to greatly elevate the performance of our model. Addressing uncertainty within the system, we train an ensemble of predictive models and estimate the mutual information between model weights and outputs, facilitating dynamic horizon planning through the introduction of variable speeds. Further enhancing our methodology, we incorporate a nonlinear MPC controller that accounts for the intricacies of the vehicle's model and states. The model-based RL facet produces steering angles and quantifies inherent uncertainty. At the same time, the nonlinear MPC suggests optimal throttle settings, striking a balance between goal attainment speed and managing model uncertainty influenced by velocity. In the conducted studies, our approach excels over the existing baseline by consistently achieving higher metric values in predicting future events and seamlessly integrating the vehicle's kinematic model for enhanced decision-making.
|
|
13:30-15:00, Paper TuBT9-CC.5 | Add to My Program |
Robot Navigation in Unseen Environments Using Coarse Maps |
|
Xu, Chengguang | Northeastern University |
Amato, Christopher | Northeastern University |
Wong, Lawson L.S. | Northeastern University |
Keywords: Vision-Based Navigation, Localization
Abstract: Metric occupancy maps are widely used in autonomous robot navigation systems. However, when a robot is deployed in an unseen environment, building an accurate metric map is time-consuming. Can an autonomous robot directly navigate in previously unseen environments using coarse maps? In this work, we propose the Coarse Map Navigator (CMN), a navigation framework that can perform robot navigation in unseen environments using different coarse maps. To do so, CMN addresses two challenges: (1) novel and realistic visual observations; (2) error and misalignment on coarse maps. To tackle novel visual observations in unseen environments, CMN learns a deep perception model that maps the visual input from various pixel spaces to the local occupancy grid space. To tackle the error and misalignment on coarse maps, CMN extends the Bayesian filter and maintains a belief directly on coarse maps using the predicted local occupancy grids as observations. Using the latest belief, CMN extracts a global heuristic vector that guides the planner to find a local navigation action. Empirical results demonstrate that CMN achieves high navigation success rates in unseen environments, significantly outperforming baselines, and is robust to different coarse maps.
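The belief maintenance that CMN performs over the coarse map is, at heart, a Bayesian (histogram) filter. The following is a minimal 1-D sketch with invented probabilities and a toy 5-cell map; the actual system operates on 2-D coarse maps with predicted local occupancy grids as observations:

```python
import numpy as np

def motion_update(belief, p_move=0.8):
    """Shift belief one cell right with probability p_move; stay otherwise."""
    moved = np.roll(belief, 1)
    moved[0] = 0.0                      # no wrap-around at the map edge
    new = p_move * moved + (1 - p_move) * belief
    return new / new.sum()

def observe_update(belief, coarse_map, z, p_hit=0.9):
    """Up-weight cells whose map value matches the observation z."""
    likelihood = np.where(coarse_map == z, p_hit, 1 - p_hit)
    post = belief * likelihood
    return post / post.sum()

coarse_map = np.array([0, 1, 0, 0, 1])   # toy map: 1 = occupied, 0 = free
belief = np.full(5, 0.2)                 # uniform prior over cells
belief = observe_update(belief, coarse_map, z=1)  # we observe "occupied"
belief = motion_update(belief)                    # robot moves one cell
belief = observe_update(belief, coarse_map, z=0)  # we observe "free"
```

After these updates the belief concentrates on cell 2: it is the only free cell directly to the right of an occupied cell, which is exactly the kind of disambiguation a filter over a coarse map provides.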
|
|
13:30-15:00, Paper TuBT9-CC.6 | Add to My Program |
Bicode: A Hybrid Blinking Marker System for Event Cameras |
|
Kitade, Takuya | NTT DOCOMO, Inc. |
Yamada, Wataru | NTT DOCOMO, Inc. |
Ochiai, Keiichi | NTT DOCOMO, Inc. |
Imai, Michita | Keio University |
Keywords: Vision-Based Navigation, Recognition, Localization
Abstract: In the field of robotics, tag systems play an important role in various applications, such as object identification and robot control in real-world environments. While typical visual markers use two-dimensional (2D) patterns and RGB cameras for recognizing object IDs and poses, achieving long-distance recognition necessitates increasing marker size and camera magnification to ensure the required resolution. Furthermore, the growing adoption of event cameras in robotics captures rapid changes in pixel brightness but faces limitations in recognizing stationary 2D markers. Although compact blinker markers using blinking light-emitting diodes (LEDs) achieve long-distance recognition, they are constrained by the number of IDs or recognition speed when used with standard RGB cameras. In addition, recognizing object pose using only a single blinking LED presents challenges. To address these challenges, we introduce ‘Bicode,’ an indoor visual marker designed for event cameras. Bicode seamlessly integrates 2D and blinker markers within a single marker unit. We have developed prototypes of 2.5, 5, and 10 cm square acrylic 2D markers, each equipped with a single LED blinking at 1 kHz, enabling recognition with an event camera. Our experiments revealed the effects of marker size, LED light quantity, recognition distance and angle, external lighting conditions, and camera or marker movement on accuracy. Notably, using the 5 cm marker, we confirmed its capability to recognize IDs at distances exceeding 20 m, and pose recognition at 2.5 m was confirmed.
|
|
13:30-15:00, Paper TuBT9-CC.7 | Add to My Program |
RAPIDFlow: Recurrent Adaptable Pyramids with Iterative Decoding for Efficient Optical Flow Estimation |
|
Morimitsu, Henrique | University of Science and Technology Beijing |
Xiaobin, Zhu | University of Science and Technology Beijing |
Marcondes Cesar Junior, Roberto | University of São Paulo USP |
Ji, Xiangyang | Tsinghua University |
Yin, Xu-Cheng | University of Science and Technology Beijing |
Keywords: Vision-Based Navigation, Visual Tracking, Computer Vision for Automation
Abstract: Extracting motion information from videos with optical flow estimation is vital in multiple practical robot applications. Current optical flow approaches show remarkable accuracy, but top-performing methods have high computational costs and are unsuitable for embedded devices. Although some previous works have focused on developing low-cost optical flow strategies, their estimation quality has a noticeable gap with more robust methods. In this paper, we develop a novel method to efficiently estimate high-quality optical flow in embedded devices. Our proposed RAPIDFlow model combines efficient NeXt1D convolution blocks with a fully recurrent structure based on feature pyramids to decrease computational costs without significantly impacting estimation accuracy. The adaptable recurrent encoder produces multi-scale features with a single shared block, which allows us to adjust the pyramid length at inference time and make it more robust to changes in input size. Also, it enables our model to offer multiple tradeoffs between accuracy and speed to suit different applications. Experiments using a Jetson Orin NX embedded system on the MPI-Sintel and KITTI public benchmarks show that RAPIDFlow outperforms previous approaches by significant margins at faster speeds.
|
|
13:30-15:00, Paper TuBT9-CC.8 | Add to My Program |
LOC-ZSON: Language-Driven Object-Centric Zero-Shot Object Retrieval and Navigation |
|
Guan, Tianrui | University of Maryland |
Yang, Yurou | Amazon |
Cheng, Harry | Amazon |
Lin, Muyuan | Amazon.com LLC |
Kim, Richard | Amazon, Lab126 |
Madhivanan, Rajasimman | Amazon.com |
Sen, Arnab | Amazon |
Manocha, Dinesh | University of Maryland |
Keywords: Vision-Based Navigation, AI-Enabled Robotics, Computer Vision for Automation
Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stability during training and zero-shot inference. We implement our method on Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. We show that our proposed method can achieve an improvement of 1.38-13.38% in terms of text-to-image recall on different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and real world, showing 5% and 16.67% improvement in terms of navigation success rate, respectively.
|
|
TuBT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Robot Materials and Design II |
|
|
Chair: Gu, Guoying | Shanghai Jiao Tong University |
Co-Chair: Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
|
13:30-15:00, Paper TuBT10-CC.1 | Add to My Program |
Modeling and Design of Lattice-Reinforced Pneumatic Soft Robots |
|
Wang, Dong | Shanghai Jiao Tong University |
Jiang, Chengru | Shanghai Jiaotong University |
Gu, Guoying | Shanghai Jiao Tong University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Lattice metamaterials exhibit diverse functions and complex spatial deformations by rational structural design. Here, lattice metamaterials are exploited to design pneumatic soft robots with programmable bending, twisting and elongation deformations. The system comprises an elastomeric tube reinforced by lattice metamaterials. We develop an analytical framework to model the twisting, bending and elongation finite deformation taking into account the geometric orthotropy and nonlinear elasticity. We experimentally validate our modeling approach and investigate the effects of geometric patterns and input loading on the soft actuators’ deformation. Theoretically guided design of lateral-climbing soft robots and exploration soft manipulators is demonstrated. The soft actuator could exhibit a combined twisting-bending-elongation deformation by lattice superimposition. The proposed structural design method paves the way for designing soft robots with complex and dexterous deformations.
|
|
13:30-15:00, Paper TuBT10-CC.2 | Add to My Program |
Design and Analysis of Soft Hybrid-Driven Manipulator with Variable Stiffness and Multiple Motion Patterns |
|
Fu, Xin | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Daohui | Shenyang Institute of Automation, Chinese Academy of Sciences |
Mo, Liyan | Shenyang Institute of Automation, Chinese Academy of Sciences |
Li, Kai | Chinese Academy of Sciences (CAS), University of Chinese Academy |
Zhao, Xingang | Shenyang Institute of Automation, Chinese Academy of Sciences |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Software-Hardware Integration for Robot Systems
Abstract: Soft manipulators offer the advantages of safety and adaptability. However, due to insufficient stiffness and single motion mode limitations, existing soft manipulators usually exhibit low load capacity and small working space. To address this problem, we propose a novel soft hybrid-driven manipulator with continuous stiffness control capability and multiple motion patterns (bending, rotation, and elongation). Furthermore, we develop kinematic and stiffness models based on the constant curvature assumption. The soft robot consists of a soft bellow actuator and inextensible rigid skeletons, which exhibits a high extension ratio and low input pressure. With the antagonistic actuation of tendon-pulling and air-pushing, the robot can achieve independent control over stiffness and position in three-dimensional space. The performance associated with the designed soft hybrid-driven manipulator is experimentally verified. The robot can achieve an elongation of 198% and a maximum bending angle of 240°. The robot can also increase stiffness by increasing internal air pressure to resist deformation caused by external loads. Additionally, tracking experiments with various trajectories in 3D space verify the accuracy of the kinematic model, which indicates that the soft manipulator possesses a large workspace and stable motion capabilities.
|
|
13:30-15:00, Paper TuBT10-CC.3 | Add to My Program |
Directly 3D Printed, Pneumatically Actuated Multi-Material Robotic Hand |
|
Matusik, Hanna | MIT |
Liu, Chao | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Soft Robot Materials and Design, Multifingered Hands, Compliant Joints and Mechanisms
Abstract: Soft robotic manipulators with many degrees of freedom can carry out complex tasks safely around humans. However, manufacturing of soft robotic hands with several degrees of freedom requires a complex multi-step manual process, which significantly increases their cost. We present a design of a multi-material 15 DoF robotic hand with five fingers including an opposable thumb. Our design has 15 pneumatic actuators based on a series of hollow chambers that are driven by an external pressure system. The thumb utilizes rigid joints and the palm features internal rigid structure and soft skin. The design can be directly 3D printed using a multi-material additive manufacturing process without any assembly process and therefore our hand can be manufactured for less than 300 dollars. We test the hand in conjunction with a low-cost vision-based teleoperation system on different tasks.
|
|
13:30-15:00, Paper TuBT10-CC.4 | Add to My Program |
Soft Hand Extension Glove with Thumb Abduction and Extension Assistance |
|
Xie, Disheng | The Chinese University of Hong Kong |
Su, Yujie | The Chinese Unverisity of Hong Kong |
Shi, Xiangqian | The Chinese University of Hong Kong |
Li, Zheng | The Chinese University of Hong Kong |
Tong, Kai Yu | The Chinese University of Hong Kong |
Keywords: Soft Robot Materials and Design, Rehabilitation Robotics, Soft Robot Applications
Abstract: Hand extension is crucial for stroke survivors with spasticity, where their fingers become rigid and their thumb remains curled within the palm. Due to the underactuated nature of the hand, the dominance of flexor muscles over extensors, and the limited surface area available, developing an extension glove with thumb assistance poses a challenge for researchers. This paper introduces a fully wearable soft hand extension glove based on the X-pouch and strap system, addressing the above challenges. The glove enables adequate finger extension, thumb abduction, and extension for high MAS score patients. Modelling and testing revealed extension torques of up to 2.7 Nm at the MCP joint and 0.67 Nm at the PIP and DIP joints. Performance evaluation, including comparison with existing methods, demonstrated the glove's superior extension capabilities using a model hand with realistic stiffness. Furthermore, the glove's effectiveness was confirmed through testing on a stroke patient with MAS = 2, validating its on-body functionality.
|
|
13:30-15:00, Paper TuBT10-CC.5 | Add to My Program |
Design and Characterization of a Soft Flat Tube Twisting Actuator |
|
Liu, Hao | The University of Hong Kong |
Wu, Changchun | The University of Hong Kong |
Lin, Senyuan | The University of Hong Kong |
Chen, Yonghua | The University of Hong Kong |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: Soft actuators have shown advantages of adaptiveness, large deformation, and safe human-robot interaction, making them suitable for various applications. Herein, a novel soft flat tube twisting actuator (SFTTA) is proposed. The SFTTA is composed of a folded flat tube sandwiched between two silicone rubber laminates. When inflated by compressed air, the folded corners of the flat tube tend to unfold, resulting in the twist of the actuator to a helical structure. The SFTTA has great scalability. It can be fabricated through simple processes with low-cost materials. A sample SFTTA with the size of a human finger can twist 540° at an air pressure of 300 kPa. In general, SFTTA-based actuators can twist 9.6 degrees per millimeter of length, which is significantly larger than previously reported soft twisting actuators. Additionally, the composite-like SFTTA allows mechanical property programming through the alteration of folding patterns of the flat tube and the material structure of the elastomer laminates. Finally, an extensible soft gripper based on flat tube actuators and a robotic wrist module are developed, and their rotation is realized by the proposed SFTTA actuator.
|
|
13:30-15:00, Paper TuBT10-CC.6 | Add to My Program |
Self-Retractable Soft Growing Robots for Reliable and Fast Retraction While Preserving Their Inherent Advantages |
|
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Seo, Dongoh | Korea Advanced Institute of Science and Technology |
Park, Shinwoo | KAIST |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Soft growing robots have garnered significant research interest owing to their unique locomotion. However, real-world applications of these robots are limited by challenges in achieving reversible and repeatable operations, particularly when faced with buckling during retraction. Although a variety of retraction mechanisms have been developed, many necessitate the installation of extra rigid hardware at the distal part, compromising the inherent benefits of soft growing robots. Existing soft retraction mechanisms that maintain these advantages tend to be relatively slow and rely on heavy driving fluids. This study introduces a soft retraction mechanism that depends exclusively on the existing pneumatic force, eliminating the need for additional rigid hardware, power sources, or complex control procedures. This mechanism enables rapid and reliable retraction of soft growing robots without sacrificing their inherent advantages or interfering with their inner channels during retraction. The proposed mechanism’s straightforward structure facilitates easy integration with a wide range of tip mounts, steering mechanisms, and other application-specific soft growing robots. This research offers an analysis and experimental examination of the operating principles and behaviors of the proposed mechanism. It also presents the design guidelines and fabrication details for the mechanism, as well as a demonstration of its swift and buckling-free retraction.
|
|
13:30-15:00, Paper TuBT10-CC.7 | Add to My Program |
High-Curvature, High-Force, Vine Robot for Inspection |
|
Mendoza Flores, Mijaíl Jaén | University of California Santa Barbara |
Naclerio, Nicholas | University of California, Santa Barbara |
Hawkes, Elliot Wright | University of California, Santa Barbara |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Robot performance has advanced considerably both in and out of the factory; however, in tightly constrained, unknown environments such as inside a jet engine or the human heart, current robots are less adept. In such cases where a borescope or endoscope can’t reach, disassembly or surgery is costly. One promising class of inspection devices inspired by plant growth is “vine robots,” which can navigate cluttered environments by extending from their tip. Yet, these vine robots are currently limited in their ability to simultaneously steer into tight curvatures and apply substantial forces to the environment. Here, we propose a plant-inspired method of steering by asymmetrically lengthening one side of the vine robot to enable high curvature and large force application. Our key development is the introduction of an extremely anisotropic, composite, wrinkled film with elastic moduli 400x different in orthogonal directions. The film is used as the vine robot body, oriented such that it can stretch over 120% axially, but only 3% circumferentially. With the addition of controlled layer jamming, this film enables a steering method inspired by plants in which the circumference of the robot is inextensible, but the sides can stretch to allow turns. This steering method and body pressure do not work against each other, allowing the robot to exhibit higher forces and tighter curvatures than previous vine robot architectures. This work advances the abilities of vine robots, and robots more generally, to not only access tightly constrained environments, but perform useful work once accessed.
|
|
13:30-15:00, Paper TuBT10-CC.8 | Add to My Program |
Robotic Modules for a Continuum Manipulator with Variable Stiffness Joints |
|
Paterno, Linda | Scuola Superiore Sant'Anna |
Sozer, Canberk | The University of Sheffield |
Sahu, Sujit | Indian Institute of Technology Patna, Bihta |
Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Actuation and Joint Mechanisms
Abstract: This study introduces a novel robotic module that integrates three spring-reinforced soft actuators for positioning the module in 3D space. This is achieved by utilizing a ball joint as the rotation center and leveraging the spring elements not only as reinforcement structures but also as inductive sensors. Additionally, soft pads are strategically placed around the ball joint to adjust the module stiffness irrespective of its position. Both actuation and stiffening mechanisms are independently controlled by pressure. Design, experimental characterization, and closed-loop control of the module are reported. In addition, a multifunctional manipulator built by integrating three modules in series is demonstrated. A specific architecture has been pursued to reduce the overall number of fluidic tubes required when adding a new module. It resulted in a manipulator with continuum soft actuators, but independent variable stiffness joints, which are the key feature for guaranteeing different bending angles of each segment. Results show that a single module can bend up to 30° omnidirectionally, its stiffness can increase up to 95% in a controllable way, and the output voltage change of the springs can be employed for position sensing. This design offers a highly compact, lightweight, and low-cost solution exploitable in a wide range of applications, from medical to rescue missions, where actions behind obstacles in highly confined areas are needed.
|
|
13:30-15:00, Paper TuBT10-CC.9 | Add to My Program |
A Modular, Tendon Driven Variable Stiffness Manipulator with Internal Routing for Improved Stability and Increased Payload Capacity |
|
Walker, Kyle Liam | The National Robotarium |
Partridge, Alix James | The National Robotarium |
Chen, Hsing-Yu | University of Bristol |
Ramachandran, Rahul Ramakrishnan | The National Robotarium, Heriot-Watt University |
Stokes, Adam Andrew | University of Edinburgh |
Tadakuma, Kenjiro | Tohoku University |
Cruz da Silva, Lucas | SENAI CIMATEC |
Giorgio-Serchi, Francesco | University of Edinburgh |
Keywords: Soft Robot Materials and Design, Tendon/Wire Mechanism, Compliance and Impedance Control
Abstract: Stability and reliable operation under a spectrum of environmental conditions remain an open challenge for soft and continuum style manipulators. The inability to carry sufficient load and effectively reject external disturbances are two drawbacks which limit the scale of continuum designs, preventing widespread adoption of this technology. To tackle these problems, this work details the design and experimental testing of a modular, tendon driven bead-style continuum manipulator with tunable stiffness. By embedding the ability to independently control the stiffness of distinct sections of the structure, the manipulator can regulate its posture under greater loads of up to 1 kg at the end-effector, with reference to the flexible state. Likewise, an internal routing scheme vastly improves the stability of the proximal segment when operating the distal segment, reducing deviations by at least 70.11%. Operation is validated when gravity is both tangential and perpendicular to the manipulator backbone, a feature uncommon in previous designs. The findings presented in this work are key to the development of larger scale continuum designs, demonstrating that flexibility and tip stability under loading can co-exist without compromise.
|
|
TuBT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception II |
|
|
Chair: Su, Hao | North Carolina State University |
Co-Chair: Ji, Jingjing | Huazhong University of Science and Technology |
|
13:30-15:00, Paper TuBT11-CC.1 | Add to My Program |
EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction |
|
Fang, Irving | New York University |
Chen, Yuzhong | New York University |
Wang, Yifan | New York University |
Zhang, Jianghan | New York University |
Zhang, Qiushi | New York University |
Xu, Jiali | New York University |
He, Xibo | Xi'an Jiaotong University |
Gao, Weibo | North Carolina State University |
Su, Hao | North Carolina State University |
Li, Yiming | New York University |
Feng, Chen | New York University |
Keywords: Data Sets for Robotic Vision, Intention Recognition, Deep Learning for Visual Perception
Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstream robotics tasks, especially given the increasing prevalence of headset devices. This study expands EgoPAT3D, the sole dataset dedicated to egocentric 3D action target prediction. We augment both its size and diversity, enhancing its potential for generalization. Moreover, we substantially enhance the baseline algorithm by introducing a large pre-trained model and human prior knowledge. Remarkably, our novel algorithm can now achieve superior prediction outcomes using solely RGB images, eliminating the previous need for 3D point clouds and IMU input. Furthermore, we deploy our enhanced baseline algorithm on a real-world robotic platform to illustrate its practical utility in straightforward HRI tasks. The demonstrations showcase the real-world applicability of our advancements and may inspire more HRI use cases involving egocentric vision. All code and data are open-sourced and can be found on the project website.
|
|
13:30-15:00, Paper TuBT11-CC.2 | Add to My Program |
Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation |
|
Ni, Jiayi | Peking University |
Yang, Senqiao | Harbin Institute of Technology, Shenzhen |
Xu, Ran | Beijing University of Posts and Telecommunications |
Liu, Jiaming | Peking University |
Li, Xiaoqi | Peking University |
Jiao, Wenyu | University of Washington |
Chen, Zehui | University of Science and Technology of China |
Liu, Yi | Baidu Inc |
Zhang, Shanghang | Peking University |
Keywords: Deep Learning for Visual Perception, Continual Learning, Transfer Learning
Abstract: Since autonomous driving systems usually face dynamic and ever-changing environments, continual test-time adaptation (CTTA) has been proposed as a strategy for transferring deployed models to continually changing target domains. However, the pursuit of long-term adaptation often introduces catastrophic forgetting and error accumulation problems, which impede the practical implementation of CTTA in the real world. Recent CTTA methods mainly focus on utilizing a majority of parameters to fit target domain knowledge through self-training. Unfortunately, these approaches often amplify the challenge of error accumulation due to noisy pseudo-labels, and pose practical limitations stemming from the heavy computational costs associated with entire model updates. In this paper, we propose a distribution-aware tuning (DAT) method to make semantic segmentation CTTA efficient and practical in real-world applications. DAT adaptively selects and updates two small groups of trainable parameters based on data distribution during the continual adaptation process: domain-specific parameters (DSP) and task-relevant parameters (TRP). Specifically, DSP exhibits sensitivity to outputs with substantial distribution shifts, effectively mitigating the problem of error accumulation. In contrast, TRP are allocated to positions that are responsive to outputs with minor distribution shifts, which are fine-tuned to avoid the catastrophic forgetting problem. In addition, since CTTA is a temporal task, we introduce the Parameter Accumulation Update (PAU) strategy to collect the updated DSP and TRP in target domain sequences. We conducted extensive experiments on two widely-used semantic segmentation CTTA benchmarks, achieving competitive performance and efficiency compared to previous state-of-the-art methods.
|
|
13:30-15:00, Paper TuBT11-CC.3 | Add to My Program |
STNet: Spatio-Temporal Fusion-Based Self-Attention for Slip Detection in Visuo-Tactile Sensors |
|
Lu, Jin | Huazhong University of Science and Technology |
Niu, Bangyan | Huazhong University of Science and Technology |
Ma, Huan | Huazhong University of Science and Technology |
Jiafeng, Zhu | Huazhong University of Science and Technology |
Ji, Jingjing | Huazhong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Slip detection plays a pivotal role in the dexterity of robotics, improving the reliability and precision of manipulations while also contributing to safety, efficiency, and adaptability. Deep learning-based slip detection algorithms commonly find it difficult to concentrate on key features when faced with dense 3D shape data obtained by visuo-tactile sensors. Data from non-contact locations can interfere with slip judgements, and ignoring inter-frame linkage can also lead to slip detection failure. In this paper, a new spatio-temporal sequences fusion-based self-attention, STNet, is proposed to perform slip detection by allocating more attention to the object-sensor contact area when processing complex 3D shape data. A binocular visuo-tactile system (BVTS) is designed and fabricated for dataset construction. The 3D shape dataset contains four motion patterns: stationary, pressing, rolling, and slipping. Self-attention architectures with and without the spatio-temporal sequences fusion mechanism (denoted as STNet and TemNet, respectively) are trained on the same dataset. The experiments show the validity of STNet, which can reach 98.91% slip detection accuracy. Meanwhile, the ablation studies confirm the effectiveness of the spatio-temporal sequences fusion mechanism.
|
|
13:30-15:00, Paper TuBT11-CC.4 | Add to My Program |
Commonsense Spatial Knowledge-Aware 3-D Human Motion and Object Interaction Prediction |
|
Lee, Sang Uk | Motional |
Keywords: Deep Learning for Visual Perception, Human-Robot Collaboration, Deep Learning Methods
Abstract: We propose a novel 3-D human motion and object interaction prediction model that is aware of commonsense knowledge about human-object interaction. We jointly predict human joint motion and human-object interactions. The two prediction results are combined to make commonsense knowledge, such as "if the human right hand is predicted to be in contact with an object after 1 second, the distance between the right hand and an object should also be predicted to be small," explicit to the model. Our model uses the raw point cloud representation of the surrounding objects in the environment as input. Using raw point cloud representation allows us to model commonsense knowledge easily and improve accuracy. In particular, it does not require a separate perception system (e.g., object classification, object pose estimation, and so on), as in previous studies, and thus is robust to perception errors. Our model applies a cross-attention mechanism to fuse the environmental point cloud and past human joint poses. The surrounding environment context and past human joint poses are two heterogeneous inputs, and cross-attention can be a powerful approach to fuse them. Our model is validated on the KIT Whole-Body Human Motion (WBHM) dataset.
|
|
13:30-15:00, Paper TuBT11-CC.5 | Add to My Program |
High-Degrees-Of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning |
|
Schulze, Lennart | Columbia University |
Lipson, Hod | Columbia University |
Keywords: Deep Learning for Visual Perception, Machine Learning for Robot Control, AI-Enabled Robotics
Abstract: A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in the absence of a classical geometric kinematic model. In particular, when the latter is hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on motion planning tasks as an exemplary downstream application.
|
|
13:30-15:00, Paper TuBT11-CC.6 | Add to My Program |
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds |
|
Nguyen, Tien Toan | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Huang, Baoru | Imperial College London |
Van Vo, Tuan | FPT Software |
Truong, Thuy Tuong Vy | FPT Software |
Le, Ngan | University of Arkansas |
Vo, Thieu | Ton Duc Thang University |
Le, Hoai Bac | VNUHCM-University of Science |
Nguyen, Anh | University of Liverpool |
Keywords: Deep Learning for Visual Perception, Recognition
Abstract: Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3dapnet.github.io.
|
|
13:30-15:00, Paper TuBT11-CC.7 | Add to My Program |
Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter |
|
Lim, Seunghyeon | Seoul National University |
Yoo, Youngjae | Seoul National University |
Jun Ki, Lee | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Grasping
Abstract: In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in the Deep Plane Clustering (DPC) module, which are then merged into the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.
|
|
13:30-15:00, Paper TuBT11-CC.8 | Add to My Program |
Utilizing Inpainting for Training Keypoint Detection Algorithms towards Markerless Visual Servoing |
|
Chatterjee, Sreejani | Worcester Polytechnic Institute |
Doan, Duc | Worcester Polytechnic Institute |
Calli, Berk | Worcester Polytechnic Institute |
Keywords: Deep Learning for Visual Perception, Visual Servoing, Computer Vision for Automation
Abstract: This paper presents a novel strategy to train keypoint detection models for robotics applications. Our goal is to develop methods that can robustly detect and track natural features on robotic manipulators. Such features can be used for vision-based control and pose estimation purposes, when placing artificial markers (e.g. ArUco) on the robot’s body is not possible or practical in runtime. Prior methods require accurate camera calibration and robot kinematic models in order to label training images for the keypoint locations. In this paper, we remove these dependencies by utilizing inpainting methods: In the training phase, we attach ArUco markers along the robot’s body and then label the keypoint locations as the center of those markers. We, then, use an inpainting method to reconstruct the parts of the robot occluded by the ArUco markers. As such, the markers are artificially removed from the training images, and labeled data is obtained to train markerless keypoint detection algorithms without the need for camera calibration or robot models. Using this approach, we trained a model for realtime keypoint detection and used the inferred keypoints as control features for an adaptive visual servoing scheme. We obtained successful control results with this fully model-free control strategy, utilizing natural robot features in the runtime and not requiring camera calibration or robot models in any stage of this process.
|
|
TuBT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning in Grasping and Manipulation II |
|
|
Chair: Acar, Cihan | Institute for Infocomm Research (I2R), A*STAR |
Co-Chair: Kuntz, Alan | University of Utah |
|
13:30-15:00, Paper TuBT12-CC.1 | Add to My Program |
Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks |
|
Acar, Cihan | Institute for Infocomm Research (I2R), A*STAR |
Binici, Kuluhan | National University of Singapore |
Tekırdag, Alp | Nanyang Technological University |
Wu, Yan | A*STAR Institute for Infocomm Research |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Experience, Transfer Learning
Abstract: The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, using multiple cameras in real-world scenarios can be challenging. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a "teacher" policy pre-trained with multiple camera viewpoints guides a "student" policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated both in simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
|
|
13:30-15:00, Paper TuBT12-CC.2 | Add to My Program |
Symmetric Models for Visual Force Policy Learning |
|
Kohler, Colin | Northeastern University |
Srikanth, Anuj Shrivatsav | Northeastern University |
Arora, Eshan | Northeastern University |
Platt, Robert | Northeastern University |
Keywords: Deep Learning in Grasping and Manipulation, Force Control, Reinforcement Learning
Abstract: While it is generally acknowledged that force feedback is beneficial to robotic control, applications of policy learning to robotic manipulation typically only leverage visual feedback. Recently, symmetric neural models have been used to significantly improve the sample efficiency and performance of policy learning across a variety of robotic manipulation domains. This paper explores an application of symmetric policy learning to visual-force problems. We present Symmetric Visual Force Learning (SVFL), a novel method for robotic control which leverages visual and force feedback. We demonstrate that SVFL can significantly outperform state of the art baselines for visual force learning and report several interesting empirical findings related to the utility of learning force feedback control policies in both general manipulation tasks and scenarios with low visual acuity.
|
|
13:30-15:00, Paper TuBT12-CC.3 | Add to My Program |
Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models |
|
Huang, Yixuan | University of Utah |
Yuan, Jialin | Oregon State University |
Kim, Chanho | Oregon State University |
Pradhan, Pupul | University of Utah |
Chen, Bryan | Oregon State University |
Fuxin, Li | Oregon State University |
Hermans, Tucker | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks, including reasoning with occluded objects, novel object appearance, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well across varying numbers of objects and distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.
|
|
13:30-15:00, Paper TuBT12-CC.4 | Add to My Program |
Learning to Dexterously Pick or Separate Tangled-Prone Objects for Industrial Bin Picking |
|
Zhang, Xinyi | Osaka University |
Domae, Yukiyasu | The National Institute of Advanced Industrial Science and Technology |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Deep Learning in Grasping and Manipulation, Grasping
Abstract: Industrial bin picking for tangled-prone objects requires the robot to either pick up untangled objects or perform separation manipulation when the bin contains no isolated objects. The robot must be able to flexibly perform appropriate actions based on the current observation. It is challenging due to high occlusion in the clutter, elusive entanglement phenomena, and the need for skilled manipulation planning. In this paper, we propose an autonomous, effective and general approach for picking up tangled-prone objects for industrial bin picking. First, we learn PickNet - a network that maps the visual observation to pixel-wise possibilities of picking isolated objects or separating tangled objects and infers the corresponding grasp. Then, we propose two effective separation strategies: Dropping the entangled objects into a buffer bin to reduce the degree of entanglement; Pulling to separate the entangled objects in the buffer bin planned by PullNet - a network that predicts position and direction for pulling from visual input. To efficiently collect data for training PickNet and PullNet, we embrace the self-supervised learning paradigm using an algorithmic supervisor in a physics simulator. Real-world experiments show that our policy can dexterously pick up tangled-prone objects with success rates of 90%. We further demonstrate the generalization of our policy by picking a set of unseen objects. Supplementary material, code, and videos can be found at https://xinyiz093
|
|
13:30-15:00, Paper TuBT12-CC.5 | Add to My Program |
Learning Fabric Manipulation in the Real World with Human Videos |
|
Lee, Robert | Australian Centre for Robotic Vision |
Abou-Chakra, Jad | Queensland University of Technology |
Zhang, Fangyi | Queensland University of Technology |
Corke, Peter | Queensland University of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Perception for Grasping and Manipulation
Abstract: Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics. Learning approaches stand out as promising for this domain as they allow us to learn behaviours directly from data. Most prior methods, however, rely heavily on simulation, which is still limited by the large sim-to-real gap of deformable objects, or depend on large datasets. A promising alternative is to learn fabric manipulation directly from watching humans perform the task. In this work, we explore how demonstrations for fabric manipulation tasks can be collected directly by humans, providing an extremely natural and fast data collection pipeline. Then, using only a handful of such demonstrations, we show how a pick-and-place policy can be learned and deployed on a real robot, without any robot data collection at all. We demonstrate our approach on a fabric smoothing and folding task, showing that our policy can reliably reach folded states from crumpled initial configurations. Videos, code, dataset and trained models are available on the project website: https://sites.google.com/view/foldingbyhand
|
|
13:30-15:00, Paper TuBT12-CC.6 | Add to My Program |
HAGrasp: Hybrid Action Grasp Control in Cluttered Scenes Using Deep Reinforcement Learning |
|
Song, Kai-Tai | National Yang Ming Chiao Tung University |
Chen, Hsiang-Hsi | National Yang Ming Chiao Tung University |
Keywords: Deep Learning in Grasping and Manipulation, Grasping, Reinforcement Learning
Abstract: Robotic autonomous grasping requires the system to perform multiple functions such as gripper and robot control, making it a task with a hybrid output nature. Existing methods based on closed-loop deep reinforcement learning rely on external models for termination evaluation. To achieve more effective grasping of novel objects, we propose a new autonomous grasp control scheme termed HAGrasp that considers the complete point cloud of the workspace. It integrates grasp pose estimation, end-effector pose evaluation, and motion planning of the robotic arm into a single model, enhancing the success rate while reducing computational load. We present a closed-loop grasp control system based on deep reinforcement learning. This control system can perform grasp tasks while dynamically adjusting to avoid end-effector collisions. The hybrid-action reinforcement learning module is trained with a unified latent action space to further improve generalization, achieving real-time autonomous grasp control. Real robot experiments show that our method has a 74.2% success rate for grasping 7 unseen objects. Comparative experiments show that the proposed HAGrasp outperforms the open-loop baseline Contact-Graspnet in both success rate and inference time. It is demonstrated that with integrated multi-view input and sim-to-real training design, our method improves real-world applications of autonomous grasping.
|
|
13:30-15:00, Paper TuBT12-CC.7 | Add to My Program |
Dual-Critic Deep Reinforcement Learning for Push-Grasping Synergy in Cluttered Environment |
|
Zhong, Jiakang | Swinburne University of Technology |
Wong, Yew Wee | Swinburne University of Technology |
Jin, Jiong | Swinburne University of Technology |
Song, Yong | Shandong University |
Yuan, Xianfeng | Shandong University |
Chen, Xiaoqi | South China University of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Grasping
Abstract: Robotic push-grasping in densely cluttered environments presents significant challenges due to unbalanced synergy and redundancy between both actions, leading to decreased grasp efficiency. In this paper, a novel dual-critic deep reinforcement learning framework is introduced to optimize the push-grasping synergy for robotic manipulation in such environments, aiming to significantly reduce pre-grasping redundancy. This framework incorporates two distinct Deep Q-learning critics: Critic I selects the best course of actions based on the current state derived from visual interpretation, whereas Critic II evaluates the success rate of the current state-action pairing. To further refine the push-grasping synergy, an active double-step learning mechanism is introduced to optimize the training reward function for the pushing action, thereby enhancing its effectiveness through increased intentionality. Simulations show that the proposed framework outperforms contemporary counterparts, notably in grasping success rate and action efficiency. Finally, the framework's generalization and adaptability are demonstrated by conducting real-world experiments using novel objects without the need for retraining.
|
|
13:30-15:00, Paper TuBT12-CC.8 | Add to My Program |
DefGoalNet: Contextual Goal Learning from Demonstrations for Deformable Object Manipulation |
|
Thach, Bao | University of Utah |
Watts, Tanner | University of Utah |
Ho, Shing-Hei | University of Utah |
Hermans, Tucker | University of Utah |
Kuntz, Alan | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Surgical Robotics: Laparoscopy
Abstract: Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method’s effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.
|
|
13:30-15:00, Paper TuBT12-CC.9 |
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation |
|
Xie, Annie | Stanford University |
Lee, Lisa | Google |
Xiao, Ted | Google |
Finn, Chelsea | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation, Imitation Learning
Abstract: What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We design a simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real-robot setup. Videos and code are available at: https://sites.google.com/stanford.edu/gengap-icra
|
|
TuBT13-AX Oral Session, AX-201 |
Physical Human-Robot Interaction II |
|
|
Chair: Zefran, Milos | University of Illinois at Chicago |
|
13:30-15:00, Paper TuBT13-AX.1 |
Transformer-Based Prediction of Human Motions and Contact Forces for Physical Human-Robot Interaction |
|
Fusco, Alessia | Politecnico Di Torino |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Rizzo, Alessandro | Politecnico Di Torino |
Cognetti, Marco | LAAS-CNRS and Université Toulouse III - Paul Sabatier |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Safety in HRI
Abstract: In this paper, we propose a transformer-based architecture for predicting contact forces during a physical human-robot interaction. Our Neural Network is composed of two main parts: a Multi-Layer Perceptron called Transducer and a Transformer. The former estimates, based on the kinematic data from a motion capture suit, the current contact forces. The latter predicts -- taking as input the same kinematic data and the output of the Transducer -- the human motions and the contact forces over a time window in the future. We validated our approach by testing the network on directions of motions that were not provided in the training set. We also compared our approach to a purely Transformer-based network, showing a better prediction accuracy of the contact forces.
|
|
13:30-15:00, Paper TuBT13-AX.2 |
SynH2R: Synthesizing Hand-Object Motions for Learning Human-To-Robot Handovers |
|
Christen, Sammy | ETH Zurich |
Feng, Lan | ETH ZURICH |
Yang, Wei | NVIDIA |
Chao, Yu-Wei | NVIDIA |
Hilliges, Otmar | ETH Zurich |
Song, Jie | ETHZ |
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Data Sets for Robot Learning
Abstract: Vision-based human-to-robot handover is an important and challenging task in human-robot interaction. Recent work has attempted to train robot policies by interacting with dynamic virtual humans in simulated environments, where the policies can later be transferred to the real world. However, a major bottleneck is the reliance on human motion capture data, which is expensive to acquire and difficult to scale to arbitrary objects and human grasping motions. In this paper, we introduce a framework that can generate plausible human grasping motions suitable for training the robot. To achieve this, we propose a hand-object synthesis method that is designed to generate handover-friendly motions similar to humans. This allows us to generate synthetic training and testing data with 100x more objects than previous work. In our experiments, we show that our method trained purely with synthetic data is competitive with state-of-the-art methods that rely on real human motion data both in simulation and on a real system. In addition, we can perform evaluations on a larger scale compared to prior work. With our newly introduced test set, we show that our model can better scale to a large variety of unseen objects and human motions compared to the baselines.
|
|
13:30-15:00, Paper TuBT13-AX.3 |
Proactive Robot Control for Collaborative Manipulation Using Human Intent |
|
Rysbek, Zhanibek | University of Illinois at Chicago |
Li, Siyu | University of Illinois at Chicago |
Mehri Shervedani, Afagh | University of Illinois Chicago |
Zefran, Milos | University of Illinois at Chicago |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Human-Robot Collaboration
Abstract: Collaborative manipulation tasks often require negotiation using explicit or implicit communication. An important example is determining where to move when the goal destination is not uniquely specified, and who should lead the motion. This work is motivated by the ability of humans to communicate the desired destination of motion through back-and-forth force exchanges. Inherent to these exchanges is also the ability to dynamically assign a role to each participant, either taking the initiative or deferring to the partner's lead. In this paper, we propose a hierarchical robot control framework that emulates human behavior in communicating a motion destination to a human collaborator and in responding to their actions. At the top level, the controller consists of a set of finite-state machines corresponding to different levels of commitment of the robot to its desired goal configuration. The control architecture is loosely based on the human strategy observed in human-human experiments, and its key component is a real-time intent recognizer that helps the robot respond to human actions. We describe the details of the control framework, the feature engineering, and the training process of the intent recognizer. The proposed controller was implemented on a UR10e robot (Universal Robots) and evaluated through human studies. The experiments show that the robot correctly recognizes and responds to human input, communicates its intent clearly, and resolves conflicts. We report success rates and draw comparisons with human-human experiments to demonstrate the effectiveness of the approach.
|
|
13:30-15:00, Paper TuBT13-AX.4 |
Human Modeling in Physical Human-Robot Interaction: A Brief Survey |
|
Fang, Cheng | University of Southern Denmark |
Peternel, Luka | Delft University of Technology |
Seth, Ajay | Delft University of Technology |
Sartori, Massimo | University of Twente |
Mombaur, Katja | Karlsruhe Institute of Technology |
Yoshida, Eiichi | Tokyo University of Science |
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Human-Centered Robotics
Abstract: The advancement and development of human modeling have greatly benefited from principles used in robotics; for instance, multibody dynamics laid the foundations for physics engines of human movement simulation, and robotics and control theory were used to contextualize human sensorimotor control. There are many common interests and interconnections between the fields of human modeling and robotics. In recent years, as robots have become safer and smarter, they actively participate in our lives and help us in various scenarios. Roboticists need tools and data from human modeling to build next-generation robots that better assist humans. In this survey, we focus on the connections between physical human-robot interaction and human modeling. On one hand, human neuromusculoskeletal and sensorimotor control models provide novel insights into the human response that robots can utilize to improve human performance. On the other hand, robots are becoming instrumental in quantifying the performance of the (neuro)musculoskeletal system. Thus, the combined use of human modeling and robotic methods in physical human-robot interaction can lead to both improved human understanding and functional assistance.
|
|
13:30-15:00, Paper TuBT13-AX.5 |
Exploring Transformers and Visual Transformers for Force Prediction in Human-Robot Collaborative Transportation Tasks |
|
Dominguez-Vidal, Jose Enrique | Institut De Robòtica I Informàtica Industrial, CSIC-UPC |
Sanfeliu, Alberto | Universitat Politècnica De Cataluyna |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Deep Learning Methods
Abstract: In this paper, we analyze the possibilities offered by state-of-the-art Deep Learning architectures such as Transformers and Visual Transformers in generating a prediction of the human's force in a Human-Robot collaborative object transportation task at a middle distance. We outperform our previous predictor, achieving a success rate of 93.8% on the test set and 90.9% in real experiments with 21 volunteers, in both cases predicting the force that the human will exert during the next 1 s. A modification of the architecture allows us to obtain a second output from the model with a velocity prediction, which improves the capabilities of our predictor when it is used to estimate the trajectory that the human-robot pair will follow. An ablation test is also performed to verify each input's relative contribution to performance.
|
|
13:30-15:00, Paper TuBT13-AX.6 |
Exploring the Effect of Base Compliance on Physical Human-Robot Collaboration |
|
Wang, Ziqi | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: Mobile physical human-robot collaboration (pHRC) using collaborative robots (cobots) and mobile robots has attracted much research attention. Many researchers have focused on improving control performance to comply with human intentions. However, a problem that generally exists in mobile pHRC but often gets neglected is the impact of non-rigid components, e.g., deformable tyres, suspension systems, and uneven terrain, on the human interaction experience and task performance. To fill this research gap, we investigated the above-mentioned problem by altering a cobot's base rigidity level (also referred to as the base compliance level, or BCL) during pHRC experiments. We explored how task performance is affected by base compliance as well as by the human operator's experience and the cobot's control parameters. Measurements include the human operator's physical effort, task velocity, and task error. From the experimental results, we find that base compliance has a significant impact on task accuracy, as it can easily excite the system if an inadequate control strategy is deployed. Furthermore, through ANOVA, we find that the influence of base compliance can be minimized and system excitation avoided through sufficient human operator training and appropriate selection of the cobot's control parameters.
|
|
13:30-15:00, Paper TuBT13-AX.7 |
Experimental and Simulation-Based Estimation of Interface Power During Physical Human-Robot Interaction in Hand Exoskeletons |
|
Yousaf, Saad | The University of Texas at Austin |
Mukherjee, Gaurav | University of Washington |
King, Raymond | Oculus VR |
Deshpande, Ashish | The University of Texas |
Keywords: Physical Human-Robot Interaction, Wearable Robotics, Design and Human Factors
Abstract: Even the best wearable robots face challenges with power losses in the system, especially at the physical attachment interface. While some sources for power loss are inherent to the system, such as human soft tissue or musculoskeletal joint damping, other sources such as soft padding materials and bias strap forces can be modulated to optimize interface power transmission. Few methods currently exist for estimating power loss at physical human-robot interfaces, especially for upper-body exoskeletons. This letter presents a novel method to estimate interface power from experimental data in a wearable hand device, along with a simulation model for predicting interaction behavior by incorporating viscoelastic properties at the attachment interface. The experimental method is implemented with the Maestro hand exoskeleton, and repeatability of the interface power estimation is confirmed with pilot human testing. Simulation results are compared with experimental estimation of interface power, showing agreement of trends and validating the use of a simulation model to predict physical human-robot interaction behavior. These findings highlight the advantages of multi-body simulations as a tool to perform modular, inexpensive, and predictive investigations in physical human-robot interaction, without affecting the real-world mechatronic system or hindering the subject’s safety. The proposed tools can optimize the design of wearable robots for seamless integration with the human body.
|
|
13:30-15:00, Paper TuBT13-AX.8 |
A Personalizable Controller for the Walking Assistive omNi-Directional Exo-Robot (WANDER) |
|
Fortuna, Andrea | Politecnico Di Milano |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Gandarias, Juan M. | University of Malaga |
Balatti, Pietro | Istituto Italiano Di Tecnologia |
Cho, Younggeol | Istituto Italiano Di Tecnologia (IIT) |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Physically Assistive Devices, Optimization and Optimal Control, Physical Human-Robot Interaction
Abstract: Preserving and encouraging mobility in the elderly and adults with chronic conditions is of paramount importance. However, existing walking aids are either inadequate to provide sufficient support for users' stability or too bulky and poorly maneuverable to be used outside hospital environments. In addition, they all lack adaptability to individual requirements. To address these challenges, this paper introduces WANDER, a novel Walking Assistive omNi-Directional Exo-Robot. It consists of an omnidirectional platform and a robust aluminum structure mounted on top of it, which provides partial body-weight support. A comfortable and minimally restrictive coupling interface embedded with a force/torque sensor allows users' intentions to be detected and translated into command velocities by means of a variable admittance controller. An optimization technique based on users' preferences, i.e., Preference-Based Optimization (PBO), guides the choice of the admittance parameters (i.e., virtual mass and damping) to better fit subject-specific needs and characteristics. Experiments with twelve healthy subjects exhibited a significant decrease in energy consumption and jerk when using WANDER with PBO parameters, as well as improved user performance and comfort. The great interpersonal variability in the optimized parameters highlights the importance of personalized control settings when walking with an assistive device, aiming to enhance users' comfort and mobility while ensuring reliable physical support.
|
|
TuBT14-AX Oral Session, AX-202 |
Prosthetics and Exoskeletons II |
|
|
Chair: Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Co-Chair: Masia, Lorenzo | Heidelberg University |
|
13:30-15:00, Paper TuBT14-AX.1 |
Lightweight and Flexible Prosthetic Wrist with Shape Memory Alloy (SMA)-Based Artificial Muscle and Elliptic Rolling Joint |
|
Hyeon, Kyujin | KAIST |
Chung, Chongyoung | Korea Advanced Institute of Science and Technology (KAIST) |
Ma, Jihyeong | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Prosthetics and Exoskeletons, Soft Robot Applications, Biomimetics
Abstract: This paper proposes a novel prosthetic wrist that emulates the anatomical structure of the human wrist, specifically the wrist bones and muscles responsible for wrist movements. To achieve a range of motion (ROM) and load-bearing capacity comparable to the human wrist joint, we designed an elliptic rolling joint as an artificial wrist joint, mimicking the two-row structures of carpal bones. The joint offers two degrees of freedom (DOFs) and can support high loads while also providing adequate ROM. In addition, we designed the artificial muscles using the properties of human muscles, such as moment arm and displacement, and implemented them as shape memory alloy (SMA) spring-based actuators. The resulting prosthetic wrist, incorporating the artificial joint and artificial muscles, is lightweight at only 50g and can perform functional ranges of motion, including 53° for flexion, 50° for extension, 40° for radial deviation, and 42° for ulnar deviation. The use of SMA spring actuators confers restoring force and flexibility to the prosthetic wrist, allowing it to withstand external disturbances. Furthermore, the proposed wrist can be utilized as a robotic wrist, affording two additional DOFs, the ability to lift loads more than 20 times its weight, and variable joint stiffness.
|
|
13:30-15:00, Paper TuBT14-AX.2 |
Ankle Exoskeleton with a Symmetric 3 DoF Structure for Plantarflexion Assistance |
|
Dezman, Miha | Karlsruhe Institute of Technology |
Marquardt, Charlotte Dorothea | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Mechanism Design
Abstract: Ankle exoskeletons can assist the ankle joint and reduce the metabolic cost of walking. However, many existing ankle exoskeletons constrain the natural 3 degrees of freedom (DoF) of the ankle to limit the exoskeleton's weight and mechanical complexity, thereby compromising comfort and kinematic compatibility with the user. This paper presents a novel ankle exoskeleton frame design that allows for 3 DoF ankle motion using a symmetric parallel frame design principle resulting in a strong frame while weighing 1.8 kg. Furthermore, a cable routing method is proposed to actuate the plantarflexion of the ankle. The kinematic compatibility of the proposed exoskeleton frame is evaluated in straight- and curve-walking scenarios with four users. The study demonstrates that the exoskeleton frame adapts to the natural 3 DoF ankle motion and the range of motion (RoM) during walking. The actuation in plantarflexion is evaluated in a stationary torque experiment demonstrating the ability of the frame to transfer large torque loads of up to 57.4 Nm. This work contributes to the design and development of more flexible and adaptable ankle exoskeletons for walking assistance.
|
|
13:30-15:00, Paper TuBT14-AX.3 |
Design of a Front-Enveloping Powered Exoskeleton Considering Optimal Distribution of Actuating Torques and Center of Mass |
|
Park, Jeongsu | KAIST |
Shi, Kyeongsu | Korea Advanced Institute of Science and Technology |
An, Hyojun | Korea Advanced Institute of Science and Technology (KAIST) |
Lee, Gunhee | Korea Advanced Institute of Science and Technology |
Kim, Seunghwan | Korea Advanced Institute of Science and Technology |
Ko, Chanyoung | Korea Advanced Institute of Science and Technology |
Kim, Taeyeon | Korea Advanced Institute of Science and Technology |
Kim, Hyeongjun | Korea Advanced Institute of Science and Technology |
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Mechanism Design
Abstract: Traditionally, powered exoskeletons have predominantly featured a back-enveloping design due to its simplicity in both implementation and user donning. However, this design results in a backward shift of the center of mass (CoM) in the sagittal plane. This paper identifies the limitations of existing design approaches and determines the optimal anterior-posterior (A/P) CoM position considering factors like actuating power, balance in the neutral posture, and user's hand workspace. Our optimization analysis recommends placing the CoM in front of the user. We address historical constraints on front-enveloping designs and propose solutions. Furthermore, we validate the usability of our designed exoskeleton through testing with a complete paraplegic user.
|
|
13:30-15:00, Paper TuBT14-AX.4 |
Real-Time Locomotion Transitions Detection: Maximizing Performances with Minimal Resources |
|
Orhan, Zeynep Özge | EPFL |
Prete, Andrea Dal | Politecnico Di Milano |
Bolotnikova, Anastasia | EPFL |
Gandolla, Marta | Politecnico Di Milano |
Ijspeert, Auke | EPFL |
Bouri, Mohamed | EPFL |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Physical Human-Robot Interaction
Abstract: Assistive devices, such as exoskeletons and prostheses, have revolutionized the field of rehabilitation and mobility assistance. Efficiently detecting transitions between different activities, such as walking, stair ascending and descending, and sitting, is crucial for ensuring adaptive control and enhancing user experience. We present an approach for real-time transition detection aimed at optimizing processing-time performance. By establishing activity-specific threshold values through trained machine learning models, we effectively distinguish motion patterns and identify transition moments between locomotion modes. This threshold-based method improves real-time embedded processing time by up to 11 times compared to machine learning approaches. The efficacy of the developed finite-state machine is validated using data collected from three different measurement systems. Moreover, experiments with healthy participants were conducted on an active pelvis orthosis to validate the robustness and reliability of our approach. The proposed algorithm achieved high accuracy in detecting transitions between activities. These promising results show the robustness and reliability of the method, reinforcing its potential for integration into practical applications.
|
|
13:30-15:00, Paper TuBT14-AX.5 |
ExoRecovery: Push Recovery with a Lower-Limb Exoskeleton Based on Stepping Strategy |
|
Orhan, Zeynep Özge | EPFL |
Shafiee, Milad | EPFL |
Juillard, Vincent | EPFL |
Coelho Oliveira, Joel | EPFL |
Ijspeert, Auke | EPFL |
Bouri, Mohamed | EPFL |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Physically Assistive Devices
Abstract: Balance loss is a significant challenge in lower-limb exoskeleton applications, as it can lead to potential falls, thereby impacting user safety and confidence. We introduce a control framework for omnidirectional recovery-step planning through online optimization of step duration and position in response to external forces. We map the step duration and position to a human-like foot trajectory, which is then translated into joint trajectories using inverse kinematics. These trajectories are executed via an impedance controller, promoting cooperation between the exoskeleton and the user. Moreover, our framework is based on the concept of the divergent component of motion, also known as the Extrapolated Center of Mass, which has been established as a consistent dynamic for describing human movement. This real-time online optimization framework enhances the adaptability of exoskeleton users under unforeseen forces, thereby improving overall user stability and safety. To validate the effectiveness of our approach, simulations and experiments were conducted. Our push-recovery experiments with the exoskeleton in zero-torque mode (without assistance) exhibit an alignment with the exoskeleton's recovery-assistance mode, which shows the consistency of the control framework with human intention. To the best of our knowledge, this is the first cooperative push-recovery framework for a lower-limb exoskeleton that relies on the simultaneous adaptation of intra-stride parameters in both the frontal and sagittal directions. The proposed control scheme has been validated with human-subject experiments.
|
|
13:30-15:00, Paper TuBT14-AX.6 |
Pilot Comparison of Customized and Generalized Hip-Knee-Ankle Exoskeleton Torque Profiles |
|
Bryan, Gwendolyn | IHMC |
Franks, Patrick W. | Skip |
Song, Seungmoon | Northeastern |
Collins, Steven H. | Stanford University |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics
Abstract: Optimized assistance patterns have produced the greatest exoskeleton benefits to energy expenditure of any strategy to date. This strategy may be effective due to the customization of the applied torque profiles to the user as well as the locomotion condition; however, it is currently unclear how sensitive participants are to their unique torque profile. To investigate, we applied previously optimized hip-knee-ankle torque profiles to expert users (N=3; 1.25 m/s; 0 deg incline). The participants walked with the profile optimized to them, the two profiles optimized to the other two participants, and the average of the three torque profiles while we measured their energy expenditure. Relative to walking with the device turned off, on average, participants experienced a 47.5% (range 12%) metabolic reduction when walking with the torque profile optimized to them and a 46% (range 15%) reduction when walking with the other profiles. Interestingly, within-subject performance was more consistent than across subjects (P1: 52% range 5%, P2: 49% range 6%, P3: 39% range 3%) suggesting that, for expert users of some devices, there may be a range of nearly equally effective torque profiles to reduce the metabolic cost of walking. The torque timing was remarkably similar across the four torque profiles while the torque magnitude varied; participants may be much more sensitive to torque timing than torque magnitude, and there may be a set of torque timing parameters that are generally effective.
|
|
13:30-15:00, Paper TuBT14-AX.8 |
Task-Space Control of a Powered Ankle Prosthesis |
|
Kelly, David | University of Notre Dame |
Posh, Ryan | University of Notre Dame |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Prosthetics and Exoskeletons
Abstract: Powered lower-limb prostheses have shown promise in helping individuals with amputation regain functionality that passive prostheses cannot provide. However, the best method for controlling these devices in coordination with their users is still an open research topic. While powered devices can replicate normative joint kinematics and kinetics, active control also holds the potential to shape system-level characteristics such as the center of mass (CoM) that play an important role in balance. Controlling the prosthesis based on these system-level, or task-space, variables would further represent a new way of coordinating the user and their device. This paper explores the initial implementation of task-space control for a powered ankle prosthesis, characterizing the emergent outcomes of this new coordination strategy. One able-bodied subject walked using a bypass adapter while prosthesis torques were commanded based on reference ground reaction force (GRF) and CoM trajectories. The subject could walk comfortably and continuously at their preferred walking speed, achieving normative ankle torques and joint trajectories despite not tracking explicit joint-level references in stance.
|
|
13:30-15:00, Paper TuBT14-AX.9 |
Integrating Computer Vision in Exosuits for Adaptive Support and Reduced Muscle Strain in Industrial Environments |
|
Missiroli, Francesco | Heidelberg University |
Mazzoni, Pietro | Politecnico Di Milano |
Lotti, Nicola | Heidelberg University |
Tricomi, Enrica | Heidelberg University |
Braghin, Francesco | Politecnico Di Milano |
Roveda, Loris | SUPSI-IDSIA |
Masia, Lorenzo | Heidelberg University |
Keywords: Embedded Systems for Robotic and Automation, Modeling, Control, and Learning for Soft Robots, Prosthetics and Exoskeletons
Abstract: Exosuits are wearable technologies that improve physical capabilities and mobility by providing support during various activities. Although primarily intended for medical rehabilitation, there is growing interest in utilizing exosuits in industrial environments to prevent work-related musculoskeletal disorders (WMSDs) by ensuring continuous joint support. However, achieving synchronization between the exosuit and human motion, as well as effectively controlling interactions with the surroundings, presents ongoing challenges. The integration of computer vision techniques, particularly object recognition algorithms, can greatly assist exosuits in understanding the user's environment and adapting their behaviour accordingly. To address this issue, we have developed a control strategy for a soft exosuit that employs computer vision to collaboratively offer tailored assistance to the elbow, alleviating joint stress during interactions with objects of various natures and weights. We conducted a study to assess the effectiveness of the integrated system, which merges object recognition and gravity compensation within a built-in structure of the robotic exosuit. The findings confirmed that the suggested solution notably minimized muscle strain during dynamic activities, exhibiting a consistent correlation with the mass of the object being lifted: biceps activity was reduced by 45% and 54% while lifting the MW and HW, respectively, compared to the 32% of the "Dynamic Arm".
|
|
TuBT15-AX Oral Session, AX-203 |
Multi-Modal Perception for HRI II |
|
|
Chair: Nakadai, Kazuhiro | Tokyo Institute of Technology |
Co-Chair: Kawanishi, Yasutomo | RIKEN |
|
13:30-15:00, Paper TuBT15-AX.1 |
Vision and Tactile-Based Continuous Multimodal Intention and Attention Recognition for Safer Physical Human-Robot Interaction (I) |
|
Wong, Christopher Yee | McGill University |
Vergez, Lucas | Arts Et Métiers Institute of Technology |
Suleiman, Wael | University of Sherbrooke |
Keywords: Touch in HRI, Multi-Modal Perception for HRI, Intention Recognition
Abstract: Employing skin-like sensors on robots enhances both the safety and usability of collaborative robots by adding the capability to detect human contact. Unfortunately, simple binary tactile sensors alone cannot determine the context of the human contact---whether it is a deliberate interaction or an unintended collision that requires safety manoeuvres. Many published methods classify discrete interactions using more advanced tactile sensors or by analysing joint torques. Instead, we propose to augment the intention recognition capabilities of simple binary tactile sensors by adding a robot-mounted camera for human analysis. Different interaction characteristics, including touch location, human pose, and gaze direction, are used to train a supervised machine learning algorithm to classify whether a touch is intentional or not with an F1-score of 86%. We demonstrate that multimodal intention recognition is significantly more accurate than monomodal analyses. Furthermore, our method continuously monitors interactions that fluidly change between intentional or unintentional. If deemed unintentional, the proposed intention and attention recognition algorithm can activate safety features to prevent unsafe interactions. We also employ a feature reduction technique that reduces the number of inputs to five to achieve a more generalized low-dimensional classifier. This simplification both reduces the amount of training data required and improves real-world classification accuracy.
|
|
13:30-15:00, Paper TuBT15-AX.2 | Add to My Program |
Towards Unified Interactive Visual Grounding in the Wild |
|
Xu, Jie | Xi'an Jiaotong University |
Zhang, Hanbo | Bytedance Research |
Si, Qingyi | Chinese Academy of Sciences |
Li, Yifeng | ByteDance |
Lan, Xuguang | Xi'an Jiaotong University |
Kong, Tao | ByteDance |
Keywords: Natural Dialog for HRI
Abstract: Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity in natural languages. It requires robots to disambiguate the user’s input by active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, resulting in performance reduction in realistic interactive scenarios. In this paper, we propose TiO, an end-to-end system for interactive visual grounding in human-robot interaction. Benefiting from a unified formulation of visual dialog and grounding, our method can be trained jointly on extensive public data and shows superior generality to diversified and challenging open-world scenarios. In the experiments, we validate TiO on the GuessWhat?! and InViG benchmarks, setting new state-of-the-art performance by a clear margin. Moreover, we conduct HRI experiments on 150 carefully selected challenging scenes as well as real-robot platforms. Results show that our method demonstrates superior generality to diversified visual and language inputs with a high success rate. Codes and demos are available at https://jxu124.github.io/TiO/.
|
|
13:30-15:00, Paper TuBT15-AX.3 | Add to My Program |
Think, Act and Ask: Open-World Interactive Personalized Robot Navigation |
|
Dai, Yinpei | University of Michigan |
Peng, Run | University of Michigan, Ann Arbor |
Li, Sikai | University of Michigan |
Chai, Joyce | University of Michigan |
Keywords: Natural Dialog for HRI
Abstract: Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents’ performance.
|
|
13:30-15:00, Paper TuBT15-AX.4 | Add to My Program |
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping |
|
Kang, Gi-Cheon | Seoul National University |
Kim, Junghyun | Seoul National University |
Kim, Jaein | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Natural Dialog for HRI, Multi-Modal Perception for HRI, Deep Learning Methods
Abstract: Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial). In our proposed task scenario, an intention-oriented utterance (e.g., "I am thirsty") is initially given to the robot. The robot should then identify the target object by interacting with a human user. Based on the task setup, we propose a new robotic system that can interpret the user's intention and pick up the target object, Pragmatic Object Grasping (PROGrasp). PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping, and most importantly, answer interpretation for pragmatic inference. Experimental results show that PROGrasp is effective in offline (i.e., target object discovery) and online (i.e., IOG with a physical robot arm) settings. Code and data are available at https://github.com/gicheonkang/prograsp.
|
|
13:30-15:00, Paper TuBT15-AX.5 | Add to My Program |
Enhancing Tactile Sensing in Robotics: Dual-Modal Force and Shape Perception with EIT-Based Sensors and MM-CNN |
|
Chen, Haofeng | University of Science and Technology of China |
Yang, Xuanxuan | Chinese Academy of Sciences |
Ma, Gang | University of Science and Technology of China |
Wang, Yucheng | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Wang, Xiaojie | Chinese Academy of Sciences |
Keywords: Multi-Modal Perception for HRI, Wearable Robotics, Soft Sensors and Actuators
Abstract: Electrical Impedance Tomography (EIT)-based tactile sensors offer durability, scalability, and cost-effective manufacturing. However, simultaneously reconstructing force and shape from boundary measurements remains challenging due to EIT’s inherent location dependencies and image artifacts. This study presents a model-driven multimodal convolutional neural network (MM-CNN) for joint EIT-based force and shape sensing. The hybrid approach combines physics-inspired voltage preprocessing with an attention-based network to overcome EIT’s limitations. The preprocessing network applies a linearized one-step inverse solution with Tikhonov regularization to convert raw boundary voltage into a noise-reduced 2D image. The image reconstruction network uses an attention mechanism to focus on salient features, addressing location dependency issues. Quantitative metrics show that MM-CNN outperforms traditional EIT algorithms like NOSER and TV, reducing location dependency and improving shape discrimination. MM-CNN enables unified force and shape modalities, validated through real-contact experiments, enhancing EIT tactile systems for human-robot interaction by incorporating physical knowledge with deep learning.
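The preprocessing step described in this abstract, a linearized one-step inverse with Tikhonov regularization, amounts to a single regularized least-squares solve. A minimal numpy sketch follows; the sensitivity matrix `J`, grid size, and regularization weight are illustrative stand-ins, not values from the paper:

```python
import numpy as np

def tikhonov_one_step(J, v, lam=0.05):
    """One-step linearized EIT inverse with Tikhonov regularization.

    J : (m, n) sensitivity (Jacobian) matrix mapping conductivity
        changes to boundary-voltage changes.
    v : (m,) measured boundary-voltage difference.
    Returns an (n,) conductivity-change image (flattened 2D grid),
    i.e. the minimizer of ||J x - v||^2 + lam^2 ||x||^2.
    """
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam**2 * np.eye(n), J.T @ v)

# Toy example: 16 boundary measurements, 8x8 pixel grid.
rng = np.random.default_rng(0)
J = rng.standard_normal((16, 64))
x_true = np.zeros(64)
x_true[27] = 1.0                       # a single contact "blob"
v = J @ x_true + 0.01 * rng.standard_normal(16)
img = tikhonov_one_step(J, v).reshape(8, 8)
```

The regularizer keeps the underdetermined solve well-posed; in the paper this noise-reduced image is then passed to the attention-based reconstruction network.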
|
|
13:30-15:00, Paper TuBT15-AX.6 | Add to My Program |
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents |
|
Park, Jeongeun | Korea University |
Lim, Seungwon | Yonsei University |
Lee, Joonhyung | Korea University |
Park, Sangbeom | Korea University |
Chang, Minsuk | Google |
Yu, Youngjae | Yonsei University |
Choi, Sungjoon | Korea University |
Keywords: Natural Dialog for HRI, AI-Enabled Robotics
Abstract: In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish between ambiguous and infeasible commands by leveraging LLMs with situationally aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands can decrease malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting of pairs of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and in a pick-and-place tabletop simulation environment. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments.
|
|
13:30-15:00, Paper TuBT15-AX.7 | Add to My Program |
Assisting Group Discussions Using Desktop Robot Haru |
|
Tang, Fei | Ocean University of China |
Zheng, Chuanxiong | Ocean University of China |
Yu, Hongqi | Ocean University of China |
Zhang, Lei | Ocean University of China |
Nichols, Eric | Honda Research Institute Japan |
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Li, Guangliang | Ocean University of China |
Keywords: Robot Companions, Human-Centered Automation, Human-Centered Robotics
Abstract: Socially assistive robots are likely to be integrated into human daily life in the near future and are expected to improve group dynamics when interacting with groups of people in social settings. In this paper, we developed a system with the desktop robot Haru to assist group discussions. The system consists of three modules: a dialogue assistance module, which enables Haru to speak to users and answer questions freely; a dialogue balance module, which uses verbal behaviors to encourage users to participate in the discussion; and an autonomous gazing behavior module, trained via deep reinforcement learning in simulation and deployed on the physical Haru, which displays politeness during group discussion, e.g., gazing at the speaking member, looking at the middle when both members are talking or silent, and looking at the least-spoken member when encouraging them. Results of a user study with 40 subjects show the significant effectiveness of our system in assisting group discussion.
|
|
13:30-15:00, Paper TuBT15-AX.8 | Add to My Program |
Assessment and Benchmarking of XoNLI: A Natural Language Processing Interface for Industrial Exoskeletons |
|
Moreno Franco, Olmo Alonso | Istituto Italiano Di Tecnologia |
Parameswari Neelakandan, Raajshekhar | Istituto Italiano Di Tecnologia |
Di Natali, Christian | Istituto Italiano Di Tecnologia (IIT) |
Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
Ortiz, Jesus | Istituto Italiano Di Tecnologia (IIT) |
Keywords: Natural Dialog for HRI, Product Design, Development and Prototyping, Wearable Robotics
Abstract: Industrial exoskeletons are a potential solution for reducing work-related musculoskeletal disorders during carrying or lifting tasks. Because they integrate sensors, electrical/pneumatic actuators, and control systems, active exoskeletons offer a more versatile control system, as different assistive strategies can be selected based on the performed task. From this perspective, human-machine interaction is required to safely open basic exoskeleton domains to the user and provide an adaptable setup system. This article presents the assessment and benchmarking of the novel XoLab Natural Language Interface, a voice user interface for the interaction with and configuration of industrial active exoskeletons. The evaluation of the novel interface was performed by 17 participants who completed setup and operational activities while wearing the XoTrunk exoskeleton. The benchmark compared the presented device with previous adaptable interfaces for the exoskeleton: the user command interface and the monitor system interface. The results showed that although the novel interface presented a considerable lag in its time response, it was more attractive than the standard one. However, the user command interface obtained favourable results over the standard interface in terms of perspicuity and efficiency.
|
|
13:30-15:00, Paper TuBT15-AX.9 | Add to My Program |
Advancing Virtual Reality Interaction: A Ring-Shaped Controller and Pose Tracking |
|
Zhang, Zhuqing | Zhejiang University |
Li, Dongxuan | Zhejiang University |
Ma, Jiayao | Peking University |
He, Yijia | Institute of Automation, Chinese Academy of Sciences |
Ji, Pan | Tencent |
Xiong, Rong | Zhejiang University |
Li, Hongdong | Australian National University and NICTA |
Wang, Yue | Zhejiang University |
Keywords: Virtual Reality and Interfaces, Sensor Fusion
Abstract: Ensuring robust tracking of controllers' movement is critical for human-robot interaction in virtual reality (VR) scenarios. This paper proposes a robust tracking algorithm based on a novel wearable ring-shaped controller equipped with an inertial measurement unit (IMU) and a light-emitting diode (LED). This novel controller design allows users to free up their hands for more immersive experiences. To track the controller's motion accurately and robustly, we resort to various forms of visual measurements, including 6 DoF and 5 DoF pose measurements from hand gesture detection, as well as 3 DoF position measurement and 2 DoF image measurement derived from the LED. We theoretically analyze the performances of these observation models and propose an optimal observation model combination scheme. Moreover, the necessity and rationale of online estimating system gravity are illustrated. The effectiveness of our tracking method is validated through extensive experiments.
|
|
TuBT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing II |
|
|
Chair: Roberge, Jean-Philippe | École De Technologie Supérieure |
Co-Chair: Sintov, Avishai | Tel-Aviv University |
|
13:30-15:00, Paper TuBT16-AX.1 | Add to My Program |
Tactile Embeddings for Multi-Task Learning |
|
Luo, Yiyue | Massachusetts Institute of Technology |
Wonsick, Murphy | Boston Dynamics AI Institute |
Hodgins, Jessica | Carnegie Mellon University |
Okorn, Brian | Boston Dynamics AI Institute |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Tactile sensing plays a pivotal role in human perception and manipulation tasks, allowing us to intuitively understand task dynamics and adapt our actions in real time. Transferring such tactile intelligence to robotic systems would help intelligent agents understand task constraints and accurately interpret the dynamics of both the objects they are interacting with and their own operations. While significant progress has been made in imbuing robots with this tactile intelligence, challenges persist in effectively utilizing tactile information due to the diversity of tactile sensor form factors, manipulation tasks, and learning objectives involved. To address this challenge, we present a unified tactile embedding space capable of predicting a variety of task-centric qualities over multiple manipulation tasks. We collect tactile data from human demonstrations across various tasks and leverage this data to construct a shared latent space for task stage classification, object dynamics estimation, and tactile dynamics prediction. Through experiments and ablation studies, we demonstrate the effectiveness of our shared tactile latent space for more accurate and adaptable tactile networks, showing an improvement of up to 84% over the single-task training.
|
|
13:30-15:00, Paper TuBT16-AX.2 | Add to My Program |
AllSight: A Low-Cost and High-Resolution round Tactile Sensor with Zero-Shot Learning Capability |
|
Azulay, Osher | Tel Aviv University |
Curtis, Nimrod | Tel-Aviv University |
Sokolovsky, Rotem | Tel-Aviv University |
Levistky, Guy | Tel-Aviv University |
Slomovik, Daniel | Tel-Aviv University |
Lilling, Guy | Tel-Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Tactile sensing is a necessary capability for a robotic hand to perform fine manipulations and interact with the environment. Optical sensors are a promising solution for high-resolution contact estimation. Nevertheless, they are usually not easy to fabricate and require individual calibration in order to acquire sufficient accuracy. In this letter, we propose AllSight, an optical tactile sensor with a round 3D structure designed for robotic in-hand manipulation tasks. AllSight is mostly 3D printed, including a novel and simplified fabrication process. This makes it low-cost, modular, durable, and the size of a human thumb, while offering a large contact surface. We show the ability of AllSight to learn and estimate a full contact state, i.e., contact position, forces, and torsion. With that, an experimental benchmark between various configurations of illumination and contact elastomers is provided. Furthermore, the robust design of AllSight provides it with a unique zero-shot capability such that a practitioner can fabricate the open-source design and have a ready-to-use state estimation model. A set of experiments demonstrates the accurate state estimation performance of AllSight.
|
|
13:30-15:00, Paper TuBT16-AX.3 | Add to My Program |
9DTact: A Compact Vision-Based Tactile Sensor for Accurate 3D Shape Reconstruction and Generalizable 6D Force Estimation |
|
Lin, Changyi | Carnegie Mellon University |
Zhang, Han | Tsinghua University, Shanghai Qi Zhi Institute |
Xu, Jikai | Huazhong University of Science and Technology, Shanghai Qi Zhi Institute |
Wu, Lei | Huazhong University of Science and Technology |
Xu, Huazhe | Tsinghua University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: The advancements in vision-based tactile sensors have boosted the aptitude of robots to perform contact-rich manipulation, particularly when precise positioning and contact state of the manipulated objects are crucial for successful execution. In this work, we present 9DTact, a straightforward yet versatile tactile sensor that offers 3D shape reconstruction and 6D force estimation capabilities. Conceptually, 9DTact is designed to be highly compact, robust, and adaptable to various robotic platforms. Moreover, it is low-cost and easy-to-fabricate, requiring minimal assembly skills. Functionally, 9DTact builds upon the optical principles of DTact and is optimized to achieve 3D shape reconstruction with enhanced accuracy and efficiency. Remarkably, we leverage the optical and deformable properties of the translucent gel so that 9DTact can perform 6D force estimation without the participation of auxiliary markers or patterns on the gel surface. More specifically, we collect a dataset consisting of approximately 100,000 image-force pairs from 175 complex objects and train a neural network to regress the 6D force, which can generalize to unseen objects. To promote the development and applications of vision-based tactile sensors, we open-source both the hardware and software of 9DTact, along with a comprehensive video tutorial, all of which are available at https://linchangyi1.github.io/9DTact.
|
|
13:30-15:00, Paper TuBT16-AX.4 | Add to My Program |
GelFinger: A Novel Visual-Tactile Sensor with Multi-Angle Tactile Image Stitching |
|
Lin, Zhonglin | Fuzhou University |
Zhuang, JiaQuan | Fuzhou University |
Li, Yufeng | Fuzhou University |
Wu, Xianyu | Fuzhou University |
Luo, Shan | King's College London |
Fernandes Gomes, Daniel | King's College London |
Huang, Feng | Fuzhou University |
Yang, Zheng | Fuzhou University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization
Abstract: Visual-tactile sensors that use a camera to capture the deformation of a soft gel layer have become popular in recent years. However, these sensors have a limited receptive field, which can hinder their ability to perceive tactile information effectively. In this paper, we propose a novel visual-tactile sensor named GelFinger that closely resembles the human finger and is well-suited for detecting various complex surfaces. The GelFinger sensor is equipped with an embedded miniature motor that allows for the adaptation of the camera pose and the scanning of a large contact area. During the detection process, the camera rotates to multiple angles to capture the tactile image of the contact area. To stitch together the tactile images obtained at different camera poses, we use an As-Projective-As-Possible image stitching algorithm to form a global view of the contact. We demonstrate the effectiveness of the GelFinger sensor in assessing large surfaces by using it to reconstruct curved crack outlines. Comparative experimental results show that the proposed sensor can effectively detect cracks and has the potential to assist humans in detecting defects on curved surfaces of infrastructure such as pipelines.
|
|
13:30-15:00, Paper TuBT16-AX.5 | Add to My Program |
StereoTac: A Novel Visuotactile Sensor That Combines Tactile Sensing with 3D Vision |
|
Roberge, Etienne | École De Technologie Supérieure |
Fornes, Guillaume | ENSEIRB-MATMECA, Bordeaux INP |
Roberge, Jean-Philippe | École De Technologie Supérieure |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Combining 3D vision with tactile sensing could unlock a greater level of dexterity for robots and improve several manipulation tasks. However, obtaining a close-up 3D view of the location where manipulation contacts occur can be challenging, particularly in confined spaces, cluttered environments, or without installing more sensors on the end effector. In this context, this paper presents StereoTac, a novel vision-based sensor that combines tactile sensing with 3D vision. The proposed sensor relies on stereoscopic vision to capture a 3D representation of the environment before contact and uses photometric stereo to reconstruct the tactile imprint generated by an object during contact. To this end, two cameras were integrated in a single sensor, whose interface is made of a transparent elastomer coated with a thin layer of paint with a level of transparency that can be adjusted by varying the sensor’s internal lighting conditions. We describe the sensor’s fabrication and evaluate its performance for both tactile perception and 3D vision. Our results show that the proposed sensor can reconstruct a 3D view of a scene just before grasping and perceive the tactile imprint after grasping, allowing for monitoring of the contact during manipulation.
|
|
13:30-15:00, Paper TuBT16-AX.6 | Add to My Program |
An Investigation of Multi-Feature Extraction and Super-Resolution with Fast Microphone Arrays |
|
Chang, Eric T. | Columbia University |
Wang, Runsheng | Columbia University |
Ballentine, Peter | Columbia University |
Xu, Jingxi | Columbia University |
Smith, Trey | NASA Ames Research Center |
Coltin, Brian | Carnegie Mellon University |
Kymissis, Ioannis | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Soft Sensors and Actuators
Abstract: In this work, we use MEMS microphones as vibration sensors to simultaneously classify texture and estimate contact position and velocity. Vibration sensors are an important facet of both human and robotic tactile sensing, providing fast detection of contact and onset of slip. Microphones are an attractive option for implementing vibration sensing as they offer a fast response and can be sampled quickly, are affordable, and occupy a very small footprint. Our prototype sensor uses only a sparse array (8-9 mm spacing) of distributed MEMS microphones (<$1 each, 3.76 x 2.95 x 1.10 mm) embedded under an elastomer. We use transformer-based architectures for data analysis, taking advantage of the microphones' high sampling rate to run our models on time-series data as opposed to individual snapshots. This approach allows us to obtain 77.3% average accuracy on 4-class texture classification (84.2% when excluding the slowest drag velocity), 1.8 mm mean error on contact localization, and 5.6 mm/s mean error on contact velocity. We show that the learned texture and localization models are robust to varying velocity and generalize to unseen velocities. We also report that our sensor provides fast contact detection, an important advantage of fast transducers. This investigation illustrates the capabilities one can achieve with a MEMS microphone array alone, leaving valuable sensor real estate available for integration with complementary tactile sensing modalities.
|
|
13:30-15:00, Paper TuBT16-AX.7 | Add to My Program |
Model-Based Compliance Discrimination Via Soft Tactile Optical Sensing and Optical Flow Computation: A Biomimetic Approach |
|
Pagnanelli, Giulia | University of Pisa |
Ciotti, Simone | University of Pisa |
Lepora, Nathan | University of Bristol |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Bianchi, Matteo | University of Pisa |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Soft Sensors and Actuators
Abstract: Soft tactile optical sensors have opened up new possibilities for endowing artificial robotic hands with advanced touch-related properties; however, their use for compliance discrimination has been poorly investigated and mainly relies on data-driven methods. Discrimination of object compliance is crucial for enabling accurate and purposeful object manipulation. Humans retrieve this information primarily using the contact area spread rate (CASR) over their fingertips. CASR can be defined as the integral of tactile flow, which describes the movement of iso-strain surfaces within the fingerpad. This work presents the first attempt to discriminate compliance through soft optical tactile sensing based on a computational model of human tactile perception that relies on CASR and tactile flow concepts. To this aim, we used a soft optical biomimetic sensor that transduces surface deformation via movements of marked pins, similar to the function of intermediate ridges in the human fingertip. We acquired images of markers' movements during the interaction with silicone specimens with different compliance at different indenting forces. Then, we computed the optical flow as a tactile flow approximation and its divergence to estimate the CASR. Our model-based approach can accurately discriminate the compliance levels of the specimens, both when the sensor probed the surface perpendicularly and with different inclinations. Finally, we used the relation between specimen compliance and the
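The tactile-flow approximation described in this abstract, taking the divergence of an optical-flow field as a proxy for the contact area spread rate (CASR), can be sketched in a few lines of numpy. The radial toy flow and scale factors below are illustrative, not the paper's marker data:

```python
import numpy as np

def casr_proxy(u, v, dx=1.0):
    """Estimate a contact-area spread rate (CASR) proxy as the
    integral of the divergence of an optical-flow field.

    u, v : (H, W) horizontal/vertical flow components (pixels/frame).
    Returns the summed divergence div = du/dx + dv/dy, which grows
    as markers spread outward under an increasing contact area.
    """
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dx, axis=0)
    return float(np.sum(du_dx + dv_dy))

# Toy radially expanding flow: markers move away from the center,
# so the divergence (0.1 + 0.1 = 0.2 per pixel) is positive.
H = W = 32
y, x = np.mgrid[0:H, 0:W]
u = 0.1 * (x - W / 2)
v = 0.1 * (y - H / 2)
rate = casr_proxy(u, v)
```

In the paper this quantity, computed from tracked marker-pin motion, feeds the human-inspired compliance-discrimination model in place of a data-driven regressor.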
|
|
13:30-15:00, Paper TuBT16-AX.8 | Add to My Program |
Bi-Touch: Bimanual Tactile Manipulation with Sim-To-Real Deep Reinforcement Learning |
|
Lin, Yijiong | University of Bristol |
Church, Alex | Cambrian |
Yang, Max | University of Bristol |
Li, Haoran | University of Bristol |
Lloyd, John | University of Bristol |
Zhang, Dandan | Imperial College London |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Reinforcement Learning
Abstract: Bimanual manipulation with tactile feedback will be key to human-level robot dexterity. However, this topic is less explored than single-arm settings, partly due to the availability of suitable hardware along with the complexity of designing effective controllers for tasks with relatively large state-action spaces. Here we introduce a dual-arm tactile robotic system (Bi-Touch) based on the Tactile Gym 2.0 setup that integrates two affordable industrial-level robot arms with low-cost high-resolution tactile sensors (TacTips). We present a suite of bimanual manipulation tasks tailored towards tactile feedback: bi-pushing, bi-reorienting and bi-gathering. To learn effective policies for challenging tasks in simulation, we contribute several efforts, such as introducing appropriate reward functions and proposing a novel goal-update mechanism with deep reinforcement learning. We also apply these policies to real-world settings with a zero-shot sim-to-real approach. Our analysis highlights and addresses some challenges met during the sim-to-real application, e.g. the learned policy tended to squeeze an object in the bi-reorienting task due to the sim-to-real gap. Finally, we demonstrate the generalizability and robustness of this system by experimenting with different unseen objects with applied perturbations in the real world. These tasks and our system can also serve as a benchmark for bimanual tactile manipulation. Code will be openly released at https://github.com/ac-93/tact
|
|
13:30-15:00, Paper TuBT16-AX.9 | Add to My Program |
AcTExplore: Active Tactile Exploration on Unknown Objects |
|
Shahidzadeh, Amir Hossein | University of Maryland |
Yoo, Seong Jong | University of Maryland |
Mantripragada, Pavan | University of Maryland, College Park |
Singh, Chahat Deep | University of Maryland, College Park |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Force and Tactile Sensing, Reinforcement Learning, Perception for Grasping and Manipulation
Abstract: Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes.
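Coverage in this setting is typically scored as intersection-over-union (IoU) between the reconstructed and ground-truth occupancy grids. A minimal sketch of such a metric; the voxel grids below are toy data, not YCB objects:

```python
import numpy as np

def voxel_iou(recon, gt):
    """Intersection-over-union between two boolean occupancy grids,
    as commonly used to score the coverage of a 3D reconstruction."""
    recon, gt = np.asarray(recon, bool), np.asarray(gt, bool)
    inter = np.logical_and(recon, gt).sum()
    union = np.logical_or(recon, gt).sum()
    return inter / union if union else 1.0

# Toy check: 8 occupied ground-truth voxels; the reconstruction
# misses one of them and adds one false positive -> IoU = 7/9.
gt = np.zeros((4, 4, 4), bool)
gt[:2, :2, :2] = True
recon = gt.copy()
recon[0, 0, 0] = False
recon[3, 3, 3] = True
iou = voxel_iou(recon, gt)
```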
|
|
TuBT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots II |
|
|
Chair: Zhou, Chengxu | University College London |
Co-Chair: Ijspeert, Auke | EPFL |
|
13:30-15:00, Paper TuBT17-AX.1 | Add to My Program |
Terrestrial Locomotion of PogoX: From Hardware Design to Energy Shaping and Step-To-Step Dynamics Based Control |
|
Wang, Yi | Columbia University |
Kang, Jiarong | University of Wisconsin Madison |
Chen, Zhiheng | University of Wisconsin-Madison |
Xiong, Xiaobin | University of Wisconsin Madison |
Keywords: Legged Robots, Aerial Systems: Mechanics and Control, Underactuated Robots
Abstract: We present a novel controller design on a robotic locomotor that combines an aerial vehicle with a spring-loaded leg. The main motivation is to enable the terrestrial locomotion capability on aerial vehicles so that they can carry heavy loads: heavy enough that flying is no longer possible, e.g., when the thrust-to-weight ratio (TWR) is small. The robot is designed with a pogo-stick leg and a quadrotor, and is thus named PogoX. We show that with a simple and lightweight spring-loaded leg, the robot is capable of hopping with TWR <1. The control of hopping is realized via two components: a vertical height control via control Lyapunov function-based energy shaping, and a step-to-step (S2S) dynamics based horizontal velocity control that is inspired by the hopping of the Spring-Loaded Inverted Pendulum (SLIP). The controller is successfully realized on the physical robot, showing dynamic terrestrial locomotion of PogoX which can hop at variable heights and different horizontal velocities with robustness to ground height variations and external pushes.
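For context on the SLIP-inspired horizontal-velocity control mentioned in this abstract, the classic Raibert foot-placement heuristic places the foot ahead of the hip by half the stance travel plus a velocity-error correction. This sketch shows the textbook heuristic that S2S-dynamics controllers refine, not the paper's controller; the gain `k_v` is illustrative:

```python
def raibert_foot_placement(v, v_des, stance_time, k_v=0.05):
    """Raibert-style foot placement for SLIP-like hopping.

    v           : current horizontal velocity (m/s)
    v_des       : desired horizontal velocity (m/s)
    stance_time : expected stance duration (s)
    Returns the forward foot offset (m) relative to the hip:
    the "neutral point" v * T_s / 2 plus a velocity-error term.
    """
    return v * stance_time / 2.0 + k_v * (v - v_des)

# At the desired velocity the correction vanishes and only the
# neutral-point term remains: 1.0 * 0.2 / 2 = 0.1 m.
x_f = raibert_foot_placement(1.0, 1.0, 0.2)
```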
|
|
13:30-15:00, Paper TuBT17-AX.2 | Add to My Program |
Learning Emergent Gaits with Decentralized Phase Oscillators: On the Role of Observations, Rewards, and Feedback |
|
Zhang, Jenny | Massachusetts Institute of Technology |
Heim, Steve | Massachusetts Institute of Technology |
Jeon, Se Hwan | Massachusetts Institute of Technology |
Kim, Sangbae | Massachusetts Institute of Technology |
Keywords: Legged Robots, Bioinspired Robot Learning, Natural Machine Motion
Abstract: We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis available at https://youtu.be/1NKQ0rSV3jU.
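A decentralized phase oscillator with local ground-reaction-force feedback can be sketched as a one-dimensional ODE integrated per leg. The coupling form `k * grf * cos(phi)` below is an illustrative choice, not necessarily the paper's exact feedback law:

```python
import math

def step_phase(phi, omega, grf, k, dt):
    """One Euler step of a single leg's phase oscillator with local
    ground-reaction-force (GRF) feedback:

        phi_dot = omega + k * grf * cos(phi)

    With grf = 0 the oscillator free-runs at its intrinsic frequency
    omega; contact forces speed up or slow down the phase, which is
    what lets gaits emerge without prescribing them.
    """
    return (phi + (omega + k * grf * math.cos(phi)) * dt) % (2 * math.pi)

# With zero GRF, 100 steps of dt = 1 ms at omega = 2*pi rad/s
# advance the phase by exactly 2*pi * 0.1 rad.
phi = 0.0
for _ in range(100):
    phi = step_phase(phi, omega=2 * math.pi, grf=0.0, k=1.0, dt=0.001)
```

Each of the four legs runs its own copy of this update, coupled only through the mechanics of the shared body, as the abstract describes.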
|
|
13:30-15:00, Paper TuBT17-AX.3 |
Bio-Inspired Gait Transitions for Quadruped Locomotion |
|
Humphreys, Joseph Elliot | University of Leeds |
Li, Jun | Harbin Institute of Technology |
Wan, Yuhui | University of Leeds |
Gao, Haibo | Harbin Institute of Technology |
Zhou, Chengxu | University College London |
Keywords: Legged Robots, Biologically-Inspired Robots, Humanoid and Bipedal Locomotion
Abstract: Developing gaits inspired by animal locomotion for quadruped robots has become a prevalent approach to achieving dynamic locomotion. Like animal gaits, these gaits are most effective at specific velocities, necessitating transitions between them for enhanced locomotion proficiency. Despite the significance of these transitions, methods for achieving them have received comparatively limited attention. For successful gait transitions, stability and suitable velocities are essential to maintain efficiency. In this study, a bio-inspired gait transition method has been devised, capitalising on the Froude number, a parameter characterising the velocity at which different-sized quadrupeds alter their gaits. By formulating a set of governing equations contingent on the Froude number, stable gait transitions can be generated. A series of simulations were conducted to determine the optimal Froude number ranges for various gaits and to validate the generality of this method by applying it to four distinct quadrupeds. To assess the performance of the gait transitions, a series of hardware experiments were executed, demonstrating a variety of gait transitions, comparing the proposed transition method with existing alternatives, and testing its generality.
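The Froude number at the heart of the method is a dimensionless ratio of inertial to gravitational effects, Fr = v²/(gL), which is what lets the same thresholds transfer across different-sized quadrupeds. The gait thresholds below are placeholder values, not the optimal ranges identified in the paper:

```python
def froude(v, leg_length, g=9.81):
    """Dimensionless Froude number: v^2 / (g * leg_length)."""
    return v * v / (g * leg_length)

def select_gait(v, leg_length, thresholds=((0.5, "walk"), (1.0, "trot"))):
    """Pick a gait by comparing Fr against illustrative (assumed) thresholds."""
    fr = froude(v, leg_length)
    for upper, gait in thresholds:
        if fr < upper:
            return gait
    return "gallop"
```

A 0.5 m-legged robot and a 1.0 m-legged robot transition at different absolute speeds but at the same Froude number, which is the scale-invariance the method exploits.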
|
|
13:30-15:00, Paper TuBT17-AX.4 |
Optimizing Dynamic Balance in a Rat Robot Via the Lateral Flexion of a Soft Actuated Spine |
|
Huang, Yuhong | Technische Universität München |
Bing, Zhenshan | Technical University of Munich |
Zhang, Zitao | Sun Yat-Sen University |
Zhuang, Genghang | Technical University of Munich |
Huang, Kai | Sun Yat-Sen University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Legged Robots, Body Balancing, Motion Control
Abstract: For mammals, balancing with the spine is a physiological way of aligning body posture efficiently through muscular forces; this is why many disabled quadruped animals can still stand or walk even with three limbs. This paper investigates the optimization of dynamic balance during trot gait based on the spatial relationship between the center of mass (CoM) and the support area influenced by spinal flexion. During trotting, the robot's balance is significantly influenced by the distance of the CoM to the support area formed by the diagonal footholds. In this context, lateral spinal flexion, which is able to modify the position of the footholds, holds promise for optimizing balance during trotting. This paper explores this phenomenon using a rat robot equipped with a soft actuated spine. Based on the lateral flexion of the spine, we establish a kinematic model to quantify the impact of spinal flexion on robot balance during trot gait. Subsequently, we develop an optimized controller for spinal flexion, designed to enhance balance without altering the leg locomotion. The effectiveness of our proposed controller is evaluated through extensive simulations and physical experiments conducted on a rat robot. Compared to both a non-spine-based trot gait controller and a trot gait controller with lateral spinal flexion, our proposed optimized controller effectively improves the dynamic balance of the robot and retains the desired locomotion during trotting.
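The key geometric quantity here, the distance from the CoM to the diagonal support line through the two stance feet, is a plain point-to-line computation. A minimal sketch (planar ground-plane projection; the function name and frames are my assumptions, not the paper's kinematic model):

```python
import math

def com_to_support_distance(com, foot_a, foot_b):
    """Perpendicular distance from the ground-projected CoM to the line
    through the two diagonal stance footholds (all points as (x, y))."""
    (x0, y0), (x1, y1), (x2, y2) = com, foot_a, foot_b
    num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    den = math.hypot(y2 - y1, x2 - x1)
    return num / den
```

Lateral spinal flexion shifts the footholds, hence the support line, so an optimizer can drive this distance toward zero without touching the leg controller.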
|
|
13:30-15:00, Paper TuBT17-AX.5 |
SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos |
|
Zhang, John | Carnegie Mellon University |
Yang, Shuo | Carnegie Mellon University |
Yang, Gengshan | Meta |
Bishop, Arun | Carnegie Mellon University |
Gurumurthy, Swaminathan | Carnegie Mellon University |
Ramanan, Deva | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Legged Robots, Computer Vision for Automation
Abstract: We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured “in-the-wild” video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as a contact sequence that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion-capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain videos, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation).
|
|
13:30-15:00, Paper TuBT17-AX.6 |
Introducing the Carpal-Claw: A Mechanism to Enhance High-Obstacle Negotiation for Quadruped Robots |
|
Barasuol, Victor | Istituto Italiano Di Tecnologia |
Emre, Sinan | Istituto Italiano Di Tecnologia |
Suzano Medeiros, Vivian | University of São Paulo |
Bratta, Angelo | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Legged Robots, Mechanism Design, Robot Safety
Abstract: The capability of a quadruped robot to negotiate obstacles is tightly connected to its leg workspace and joint torque limits. When facing terrain where the height of obstacles is close to the leg length, locomotion robustness and safety are reduced since more dynamic motions are required to traverse it. In this paper, we introduce a new mechanism called the Carpal-Claw, which enables quadruped robots to negotiate higher obstacles and adds safety to the locomotion by allowing the robot to negotiate obstacles under static and quasi-static locomotion and regular joint torque demands. The design of the mechanism is detailed, as well as the methodology to exploit it in the locomotion control framework. The Carpal-Claw functionality is validated through various experiments on a very high obstacle and stair-like terrains using an Aliengo robot. We demonstrate how Aliengo can safely descend a step height of 40 cm, which is 80% of its leg length. To the best of the authors' knowledge, this is the first time a mechanism like the Carpal-Claw has been proposed for improving quadruped robot locomotion over high obstacles.
|
|
13:30-15:00, Paper TuBT17-AX.7 |
SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies |
|
Spiridonov, Alexander | ETH Zurich |
Buehler, Fabio | ETH Zurich |
Berclaz, Moriz | ETH Zurich |
Schelbert, Valerio Antonio | ETH Zurich |
Geurts, Jorit | ETH Zürich |
Krasnova, Elena | ETH Zürich |
Steinke, Emma | ETH Zürich |
Toma, Jonas | ZHAW School of Engineering |
Wüthrich, Joschua | ETH Zürich |
Polat, Recep | ETH Zürich |
Zimmermann, Wim | ZHAW |
Arm, Philip | ETH Zurich |
Rudin, Nikita | ETH Zurich, NVIDIA |
Kolvenbach, Hendrik | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Space Robotics and Automation, Engineering for Robotic Systems
Abstract: We present SpaceHopper, a three-legged, small-scale robot designed for future mobile exploration of asteroids and moons. The robot weighs 5.2 kg and has a body size of 245 mm while using space-qualifiable components. Furthermore, SpaceHopper's design and controls make it well-adapted for investigating dynamic locomotion modes with extended flight phases. Instead of gyroscopes or flywheels, the system uses its three legs to reorient the body during flight in preparation for landing. We control the leg motion for reorientation using Deep Reinforcement Learning policies. In a simulation of Ceres' gravity (0.029 g), the robot can reliably jump to commanded positions up to 6 m away. Our real-world experiments show that SpaceHopper can successfully reorient to a safe landing orientation to within 9.7 degrees inside a rotational gimbal and jump in a counterweight setup in Earth's gravity. Overall, we consider SpaceHopper an important step towards controlled jumping locomotion in low-gravity environments.
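As a sanity check on the low-gravity regime, a drag-free ballistic estimate shows why modest take-off speeds carry the robot meters on Ceres. This is a back-of-the-envelope sketch under my own simplifications (flat ground, point mass, no attitude dynamics), not the paper's controller:

```python
import math

G_CERES = 0.029 * 9.81  # Ceres surface gravity used in the abstract, m/s^2

def jump_range(v, angle_deg, g):
    """Ballistic range on flat ground: v^2 * sin(2a) / g (no drag)."""
    a = math.radians(angle_deg)
    return v * v * math.sin(2.0 * a) / g
```

The same take-off speed covers roughly 34 times the distance on Ceres as on Earth (the ratio of the gravities), which is why a small hopper can reach commanded positions several meters away.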
|
|
13:30-15:00, Paper TuBT17-AX.8 |
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots |
|
Shafiee, Milad | EPFL |
Bellegarda, Guillaume | EPFL |
Ijspeert, Auke | EPFL |
Keywords: Legged Robots, Biomimetics
Abstract: Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs, (i.e. 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot’s nominal mass.
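The rhythm-generation-to-pattern-formation split can be illustrated with a toy mapping: a CPG phase/amplitude pair becomes a foot target, and the per-robot stride scaling lives only in the PF layer, mirroring the abstract's claim that only that layer varies across robots. The trajectory shape and names below are my illustrative choices, not the learned policy:

```python
import math

def cpg_foot_target(theta, r, step_len, step_h):
    """Map CPG phase theta and amplitude r to a foot target (x, z).

    step_len and step_h are the PF-layer scalings that differ per robot;
    the vertical component is nonzero only during swing (sin(theta) > 0)."""
    x = -r * step_len * math.cos(theta)
    z = step_h * max(0.0, math.sin(theta)) * r
    return x, z
```

With this split, retargeting from a 2 kg to a 200 kg quadruped amounts to changing `step_len` and `step_h` while the rhythm-generating dynamics stay shared.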
|
|
TuBT18-AX Oral Session, AX-206 |
Motion Control II |
|
|
Chair: Kim, Min Jun | KAIST |
Co-Chair: Jin, Long | Lanzhou University |
|
13:30-15:00, Paper TuBT18-AX.1 |
Safety-Critical Coordination of Legged Robots Via Layered Controllers and Forward Reachable Set Based Control Barrier Functions |
|
Kim, Jeeseop | Caltech |
Lee, Jaemin | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Motion Control, Legged Robots, Multi-Robot Systems
Abstract: This paper presents a safety-critical approach to the coordination of robots in dynamic environments. To this end, we leverage control barrier functions (CBFs) with the forward reachable set to guarantee the safe coordination of the robots while preserving a desired trajectory via a layered controller. The top-level planner generates a safety-ensured trajectory for each agent, accounting for the dynamic constraints in the environment. This planner leverages high-order CBFs based on the forward reachable set to ensure safety-critical coordination control, i.e., to guarantee the safe coordination of the robots during locomotion. The middle-level trajectory planner employs single rigid body (SRB) dynamics to generate optimal ground reaction forces (GRFs) to track the safety-ensured trajectories from the top-level planner. The low-level controller generates whole-body motions that adhere to the optimal GRFs while ensuring the friction cone condition at the end of each stance leg. The effectiveness of the approach is demonstrated through simulation and hardware experiments.
|
|
13:30-15:00, Paper TuBT18-AX.2 |
Safety-Critical Control of Quadrupedal Robots with Rolling Arms for Autonomous Inspection of Complex Environments |
|
Lee, Jaemin | California Institute of Technology |
Kim, Jeeseop | Caltech |
Ubellacker, Wyatt | California Institute of Technology |
Molnar, Tamas G. | Wichita State University |
Ames, Aaron | Caltech |
Keywords: Motion Control, Legged Robots, Robotics in Hazardous Fields
Abstract: This paper presents a safety-critical control framework tailored for quadruped robots equipped with a roller arm, particularly when performing locomotive tasks such as autonomous robotic inspection in complex, multi-tiered environments. In this study, we consider the problem of operating a quadrupedal robot in distillation columns, locomoting on column trays and transitioning between these trays with a roller arm. To address this problem, our framework encompasses the following key elements: 1) Trajectory generation for seamless transitions between columns, 2) Foothold re-planning in regions deemed unsafe, 3) Safety-critical control incorporating control barrier functions, 4) Gait transitions based on safety levels, and 5) A low-level controller. Our comprehensive framework, comprising these components, enables autonomous and safe locomotion across multiple layers. We incorporate reduced-order and full-body models to ensure safety, integrating safety-critical control and footstep re-planning approaches. We validate the effectiveness of our proposed framework through practical experiments involving a quadruped robot equipped with a roller arm, successfully navigating and transitioning between different levels within the column tray structure.
|
|
13:30-15:00, Paper TuBT18-AX.3 |
Robust and Remote Center of Cyclic Motion Control for Redundant Robots with Partially Unknown Structure |
|
Jin, Long | Lanzhou University |
Liu, Kun | Lanzhou University |
Liu, Mei | Lanzhou University |
Keywords: Motion Control, Optimization and Optimal Control
Abstract: Remote center of motion (RCM) describes a robot with a rod-like end-effector operating through a hole in the interface separating an internal space from the external space. Considering that RCM control may be influenced by perturbations (noises) and that the end-effector is frequently replaced to complete different tasks, the structural information related to the robot manipulator and its rod-like end-effector may contain errors. This paper proposes an acceleration-level remote center of cyclic motion (ARC^{2}M) control scheme, which takes into account the cyclic motion index and the physical limitations of robot manipulators to achieve repetitive motion planning and RCM control at the acceleration level. Additionally, a parameter calculation method is proposed to compute unknown parameters of the end-effector under the influence of noise. A Kalman filter and a neural dynamics-based method are employed to address noise effects, and related theoretical analyses are given. To validate the proposed ARC^2M scheme, simulations and physical experiments are carried out. The source code is available at https://github.com/LongJin-lab/ARCM.
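The Kalman-filter ingredient for estimating the end-effector's unknown parameters under noise reduces, in the scalar constant-parameter case, to a few lines. A generic textbook sketch, not the paper's ARC^2M formulation:

```python
def kalman_update(x, P, z, R, Q):
    """One predict + update step of a scalar Kalman filter estimating a
    constant parameter x from a noisy measurement z.

    P: estimate variance, R: measurement noise variance, Q: process noise."""
    P = P + Q            # predict: parameter modeled as (nearly) constant
    K = P / (P + R)      # Kalman gain
    x = x + K * (z - x)  # correct with the innovation
    P = (1.0 - K) * P    # updated estimate variance
    return x, P
```

Repeated updates shrink P, so the estimate of the replaced end-effector's geometry stiffens as more noisy measurements arrive.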
|
|
13:30-15:00, Paper TuBT18-AX.4 |
Safe Risk-Averse Bayesian Optimization for Controller Tuning |
|
König, Christopher | Inspire AG |
Ozols, Miks | ETH Zurich |
Makarova, Anastasiia | ETH Zurich |
Balta, Efe | Inspire AG |
Krause, Andreas | ETH Zurich |
Rupenyan, Alisa | Zurich University of Applied Sciences |
Keywords: Motion Control, Probabilistic Inference, Industrial Robots
Abstract: Controller tuning and parameter optimization are crucial in system design to improve both the controller and the underlying system performance. Bayesian optimization has been established as an efficient model-free method for controller tuning and adaptation. Standard methods, however, are not sufficient for high-precision systems that must be robust with respect to unknown input-dependent noise and stable under safety constraints. In this work, we present a novel data-driven approach, RAGoOSe, for safe controller tuning in the presence of heteroscedastic noise, combining safe learning with risk-averse Bayesian optimization. We demonstrate the method on a synthetic benchmark and compare its performance to established BO-based tuning methods. We further evaluate RAGoOSe's performance on a real precision-motion system used in semiconductor industry applications and compare it to the built-in auto-tuning routine.
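The risk-averse idea, rewarding exploration of model uncertainty while penalizing predicted input-dependent noise, can be caricatured with a mean-variance style acquisition score. This is a toy illustration under assumed weights, not RAGoOSe itself:

```python
def risk_averse_score(mu, sigma_f, sigma_n, beta=2.0, gamma=1.0):
    """Risk-averse acquisition value (to maximize): optimistic in epistemic
    model uncertainty sigma_f, penalized by the predicted heteroscedastic
    noise level sigma_n. beta and gamma are illustrative weights."""
    return mu + beta * sigma_f - gamma * sigma_n
```

Two candidate gains with the same predicted performance then rank differently if one sits in a noisier input region, which is exactly the behavior standard UCB-style tuning lacks.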
|
|
13:30-15:00, Paper TuBT18-AX.5 |
Phase Synthesis for Spatial Locomotion Control of Retractable Worm Robots |
|
Wang, Zhongcheng | Northwestern Polytechnical University |
Yuan, Shiwei | Northwestern Polytechnical University |
Dou, Manfen | Northwestern Polytechnical University |
Yang, Jianhua | Northwestern Polytechnical University |
Liang, Bin | Tsinghua University |
Keywords: Motion Control, Redundant Robots, Biologically-Inspired Robots
Abstract: Retractable worm robots possess hyper-flexibility, allowing them to work in confined spaces that are difficult for humans. However, the spatial locomotion control of these robots remains challenging due to the robots' large degrees of freedom. To address this challenge, we propose a phase synthesis (PS) scheme for retractable worm robots. The scheme combines an undulating gait inspired by caterpillars with three-dimensional movement commands. We first introduce the kinematics model and real-world prototype of our retractable worm robot, called RW-Robot, and then we introduce footstep phases to express the timing of segments' spatial movement. According to the length of movement periods, we classify the movement into short-term movements and long-term movements and compress their patterns in the frequency domain. Our PS scheme aligns the patterns according to the footstep phases to generate new gaits of spatial locomotion. We evaluate the scheme in real-world experiments, including steering and climbing a slope. The experimental results indicate that our scheme allows the RW-Robot to perform flexible spatial locomotion from simple user input.
|
|
13:30-15:00, Paper TuBT18-AX.6 |
Enhanced Robust Motion Control Based on Unknown System Dynamics Estimator for Robot Manipulators |
|
Jia, Xinyu | National University of Singapore |
Yang, Jun | National University of Singapore |
Kaixin, Lu | Faculty of Engineering, National University of Singapore |
Pan, Yongping | Sun Yat-Sen University |
Yu, Haoyong | National University of Singapore |
Keywords: Motion Control, Robust/Adaptive Control, Redundant Robots
Abstract: To achieve high-accuracy manipulation in the presence of unknown disturbances, we propose two novel efficient and robust motion control schemes for high-dimensional robot manipulators. Both controllers incorporate an unknown system dynamics estimator (USDE) to estimate disturbances without requiring acceleration signals and the inverse of inertia matrix. Then, based on the USDE framework, an adaptive-gain controller and a super-twisting sliding mode controller are designed to speed up the convergence of tracking errors and strengthen anti-perturbation ability. The former aims to enhance feedback portions through error-driven control gains, while the latter exploits finite-time convergence of discontinuous switching terms. We analyze the boundedness of control signals and the stability of the closed-loop system in theory, and conduct real hardware experiments on a robot manipulator with seven degrees of freedom (DoF). Experimental results verify the effectiveness and improved performance of the proposed controllers, and also show the feasibility of implementation on high-dimensional robots.
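The super-twisting term mentioned in the abstract has a compact standard form; one Euler step might look like the following sketch (gains and discretization are illustrative, not the paper's USDE-based design):

```python
import math

def super_twisting_step(s, v, k1, k2, dt):
    """One step of a super-twisting term on sliding variable s.

    u is continuous in s (the sqrt term), while the switching lives in the
    integrator state v, which is what attenuates chattering.
    Returns (u, v_next)."""
    sgn = (s > 0) - (s < 0)
    u = -k1 * math.sqrt(abs(s)) * sgn + v
    v_next = v - k2 * sgn * dt
    return u, v_next
```

Finite-time convergence of s to zero holds for suitable k1, k2 relative to the disturbance bound; the USDE's disturbance estimate would be added on top of this term in the full controller.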
|
|
13:30-15:00, Paper TuBT18-AX.7 |
Model-Free Control of a Class of High-Precision Scanning Motion Systems with Piezoceramic Actuators |
|
Al-Rawashdeh, Yazan | Memorial University of Newfoundland |
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Heertjes, Marcel | Eindhoven University of Technology |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Motion Control, Semiconductor Manufacturing, Motion and Path Planning
Abstract: To enhance the precision of coarse long-stroke motion axes, complementary short-stroke fine positioning stages are usually introduced. Being mechanically attached, the motion of the combined positioning stages needs to be controlled and synchronized. Therefore, suitable model-based controllers of fine stages are typically designed according to the sophisticated models and identification techniques used. Due to their appealing features, piezoceramic-based fine positioning stages have been successfully utilized in many applications, which recently sparked their use in high-acceleration motion found in wafer scanners, for example, where high-precision motion is required despite the resulting high inertial forces involved. Unfortunately, hard nonlinear behavior is associated with piezoelectric actuators, which adds to the complexity of the modeling, control, and synchronization processes. To overcome this burden, in this study, the design procedure of a model-free control and synchronization technique for piezoceramic-based fine positioning stages is introduced and verified experimentally using a representative precision motion system comprising a planar stage and a uni-axial fine stage under step-and-scan trajectories commonly used in wafer scanners. Despite its simplicity, the proposed design procedure can be seamlessly extended to other robotics and automation applications.
|
|
13:30-15:00, Paper TuBT18-AX.8 |
Constrained Nonlinear Disturbance Observer for Robotic Systems |
|
Han, Ji Wan | Korea Advanced Institute of Science and Technology |
Park, Daehyung | Korea Advanced Institute of Science and Technology, KAIST |
Kim, Min Jun | KAIST |
Keywords: Motion Control
Abstract: The disturbance observer (DOB) is a well-known two-loop control structure that imparts robustness to a controller with a simple implementation. As a nonlinear DOB for robotic systems, we proposed the so-called nonlinear robust internal-loop compensator (NRIC) framework in our previous work. In this paper, we further extend the NRIC in such a way that an optimization scheme can be embedded in the control structure. The proposed method is called constrained NRIC (C-NRIC), because the optimization allows us to impose constraints, by which the controller acquires additional properties. As a particular use case of the C-NRIC framework, we design contact-responsive motion controllers that enable a robot to react to unknown interactions while accurately tracking the desired trajectory in free motion. The effectiveness of such designs is validated through real-world experiments.
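The two-loop DOB structure can be sketched for a scalar plant m·v' = u + d with a momentum-style observer that needs no acceleration measurement. This is a generic textbook sketch, not the NRIC/C-NRIC formulation:

```python
def dob_step(p_hat, p_meas, u, L, dt):
    """One step of a momentum-style disturbance observer for m*v' = u + d.

    p is momentum; the feedback on the momentum prediction error serves
    directly as the disturbance estimate. Returns (d_hat, p_hat_next)."""
    d_hat = L * (p_meas - p_hat)           # inner-loop estimate of d
    p_hat_next = p_hat + (u + d_hat) * dt  # propagate the nominal model
    return d_hat, p_hat_next
```

Against a constant disturbance the estimate converges geometrically with rate set by the gain L, after which the outer-loop controller sees an (approximately) disturbance-free nominal plant.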
|
|
13:30-15:00, Paper TuBT18-AX.9 |
An Integrated Position-Velocity-Force Method for Safety-Enhanced Shared Control in Robot-Assisted Surgical Cutting |
|
Xiao, Xilin | Hefei University of Technology |
Li, Xiaojian | Hefei University of Technology |
Yudong, Shi | Hefei University of Technology |
Fang, Jin | Hefei University of Technology |
Li, Ling | Hefei University of Technology |
He, Pengfei | Hefei University of Technology |
Mo, Hangjie | City University of Hong Kong |
Keywords: Human-Robot Collaboration, Motion Control, Safety in HRI
Abstract: Numerous studies have emphasized the application of autonomous intelligence in human-robot shared control to enhance surgical convenience and efficiency. However, neglecting human dominance may reduce surgical safety. This paper develops a safety-enhanced human-robot shared control method that intelligently allocates control authority, with the surgeon remaining the leader during the surgical procedure. Three controllers are designed initially: a master hand position (MP) controller and a master hand velocity (MV) controller related to the surgeon's manipulation, and a planned trajectory tracking (PT) controller related to the robot. In precision surgical manipulation scenarios, precise tracking of the human's operation is achieved by combining the MP and MV controllers, while a combination of the MV and PT controllers is developed for high-efficiency surgical scenarios, which relaxes the requirement for precise tracking of hand position and enables precise robot assistance guided by the velocity of the human hand. Autonomous scenario and controller switching is accomplished through a motion fusion mechanism, which is achieved by optimizing evaluation functions that rely on future states. Furthermore, a force feedback mechanism is proposed to help the human understand the intent of autonomous control to improve safety. The feasibility and effectiveness of this method have been validated through simulations and experiments.
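The authority-allocation step can be caricatured as score-weighted blending of the surgeon's and the robot's commands. The normalization rule below is a hypothetical stand-in for the paper's evaluation-function optimization:

```python
def blend_commands(u_surgeon, u_robot, score_surgeon, score_robot):
    """Allocate control authority by normalized (non-negative) evaluation
    scores; when both scores vanish the surgeon and robot share equally.
    The scoring rule itself is an assumption for illustration."""
    total = score_surgeon + score_robot
    w = 0.5 if total == 0 else score_surgeon / total
    return w * u_surgeon + (1.0 - w) * u_robot
```

Keeping the surgeon's score floor above the robot's in safety-critical phases is one simple way to encode the "surgeon remains the leader" requirement.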
|
|
TuBT19-NT Oral Session, NT-G301 |
Medical Robots II |
|
|
Chair: Tsumura, Ryosuke | National Institute of Advanced Industrial Science and Technology (AIST) |
Co-Chair: Wang, Junchen | Beihang University |
|
13:30-15:00, Paper TuBT19-NT.1 |
DopUS-Net: Quality-Aware Robotic Ultrasound Imaging Based on Doppler Signal (I) |
|
Jiang, Zhongliang | Technical University of Munich |
Duelmer, Felix | Technical University of Munich |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Medical ultrasound (US) is widely used to evaluate and stage vascular diseases, in particular for the preliminary screening program, due to the advantage of being radiation-free. However, automatic segmentation of small tubular structures (e.g., the ulnar artery) from cross-sectional US images is still challenging. To address this challenge, this paper proposes the DopUS-Net and a vessel re-identification module that leverage the Doppler effect to enhance the final segmentation result. Firstly, the DopUS-Net combines the Doppler images with B-mode images to increase the segmentation accuracy and robustness of small blood vessels. It incorporates two encoders to exploit the maximum potential of the Doppler signal and recurrent neural network modules to preserve sequential information. Input to the first encoder is a two-channel duplex image representing the combination of the grey-scale Doppler and B-mode images to ensure anatomical spatial correctness. The second encoder operates on the pure Doppler images to provide a region proposal. Secondly, benefiting from the Doppler signal, this work first introduces an online artery re-identification module to qualitatively evaluate the real-time segmentation results and automatically optimize the probe pose for enhanced Doppler images. This quality-aware module enables the closed-loop control of robotic screening to further improve the confidence and robustness of image segmentation. The experimental results demonstrate that the prop
|
|
13:30-15:00, Paper TuBT19-NT.2 |
Robotic Craniomaxillofacial Osteotomy System Using Acoustic 3D Registration |
|
Zhu, Jiayu | Beihang University |
Han, Runzhe | Beihang University |
Yuan, Mengning | Peking University School and Hospital of Stomatology |
Jie, Bimeng | Peking University |
Du, Shanshan | Beihang University |
He, Yang | Peking University School and Hospital of Stomatology |
Zhang, Runshi | Beihang University |
Wang, Junchen | Beihang University |
Keywords: Medical Robots and Systems, Deep Learning Methods, Compliance and Impedance Control
Abstract: Osteotomy holds a pivotal position among the fundamental procedures in craniomaxillofacial (CMF) surgery. However, there are inherent challenges and risks associated with ensuring the recuperation of occlusion, safeguarding the facial nerves and blood vessels, as well as preserving facial aesthetics. In this study, a non-invasive image-to-patient registration method for navigation/robotic CMF surgery based on intraoperative freehand ultrasound (US) 3D reconstruction is proposed. Building upon this, a CMF osteotomy robotic system with compliant human-robot interaction and osteotomy trajectory planning was devised. In the freehand US 3D reconstruction and registration experiments, the registration errors for human volunteers and phantoms were consistently less than 1 mm. In robot osteotomy experiments based on the resulting registration, the average osteotomy error was below 1.5 mm. The proposed US 3D reconstruction-based registration method is non-invasive and radiation-free, and shows promising accuracy suitable for CMF robotic or navigation systems.
|
|
13:30-15:00, Paper TuBT19-NT.3 |
Elliptical Torus-Based Six-Axis FBG Force Sensor with In-Situ Calibration for Condition Monitoring of Orthopedic Surgical Robot |
|
Li, Tianliang | Wuhan University of Technology |
Zhao, Chen | Wuhan University of Technology |
Wen, Yuhang | Wuhan University of Technology |
Chen, Fayin | Wuhan University of Technology |
Tan, Yuegang | Wuhan University of Technology |
Zhou, Zude | Wuhan University of Technology |
Keywords: Medical Robots and Systems, Force and Tactile Sensing, Deep Learning Methods
Abstract: Six-axis force/moment (6-A F/M) sensors enable surgical robots to effectively sense intraoperative force feedback and drilling status information, reducing the operating challenges and psychological burden of doctors and improving the quality and safety of surgery. However, it is difficult for current commercial electrical 6-A F/M sensors to adapt to the electromagnetic environment in the operating room, and status changes after installation can also reduce accuracy. At the same time, there is strong vibration coupling in the low-frequency force information, leading to low identification accuracy and slow response speed for the drilling and milling status. To address these problems, an elliptical torus-based 6-A fiber optic F/M sensor and its in-situ calibration method for orthopedic surgical robot force sensing are proposed. Furthermore, combined with a multichannel one-dimensional convolutional gated recurrent unit (M1-DCGRU), fast and accurate identification of seven drilling stages was realized. The final force sensing error is less than 7.1%, and the drilling state identification accuracy is at least 93.9%. The designed sensor has higher accuracy, is compatible with magnetic resonance imaging (MRI), and accurately identifies finer drilling stages without relying on other sensors.
|
|
13:30-15:00, Paper TuBT19-NT.4 |
Vision-And-Force-Based Compliance Control for a Posterior Segment Ophthalmic Surgical Robot |
|
Wang, Ning | Xi'an Jiaotong University |
Zhang, Xiaodong | Xi’an Jiaotong University |
Stoyanov, Danail | University College London |
Zhang, Hongbing | The First Affiliated Hospital of Northwestern University |
Stilli, Agostino | University College London |
Keywords: Medical Robots and Systems, Force Control, Compliance and Impedance Control
Abstract: In ophthalmic surgery, particularly in procedures involving the posterior segment, clinicians face significant challenges in maintaining precise control of hand-held instruments without damaging the fundus tissue. Typical targets of this type of surgery are the internal limiting membrane (ILM) and the epiretinal membrane (ERM), which have an average thickness of only 60 μm and 2 μm, respectively, making it challenging, even for experienced clinicians utilising dedicated ophthalmic surgical robots, to peel these delicate membranes successfully without damaging the healthy tissue. Minimal intra-operative motion errors when driving both hand-held and robotic-assisted surgical tools may result in significant stress on the delicate tissue of the fundus, potentially causing irreversible damage to the eye. To address these issues, this work proposes an intra-operative vision-and-force-based compliance control method for a posterior segment ophthalmic surgical robot. This method aims to achieve compliance control of the surgical instrument in contact with the tissue to minimise the risk of tissue damage. In this work, we demonstrate that we can achieve a maximum motion error for the end effector (EE) of our ophthalmic robot of just 8 μm, resulting in a 64% increase in motion accuracy compared to our previous work where the system was first introduced. The results of the proposed compliance control demonstrate consistent performance in the force range of 40 mN during mem
|
|
13:30-15:00, Paper TuBT19-NT.5 | Add to My Program |
A Hybrid Admittance Control Algorithm for Automatic Robotic Cranium-Milling |
|
Qian, Chen | Institute of Automation, Chinese Academy of Sciences |
Li, Zhen | Institute of Automation, Chinese Academy of Sciences |
Ye, Qiang | Institute of Automation, Chinese Academy of Sciences |
Ge, Pei Cong | Beijing Tiantan Hospital, Capital Medical University |
Zhao, Jizong | Beijing Tiantan Hospital, Capital Medical University |
Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
Keywords: Medical Robots and Systems, Force Control
Abstract: Prior robot-assisted cranium-milling studies only considered controlling the force in the skull's vertical direction and neglected the milling cutter's feed force. Additionally, achieving stable force control in multiple directions is challenging for robots due to the uneven skull surface. Here, a hybrid admittance control algorithm incorporating model-free adaptive nonlinear force control and fuzzy control is proposed to accomplish effective automatic cranium-milling tasks. First, a purely data-driven model-free adaptive control method based on partial-form dynamic linearization is used to control the vertical force. Second, fuzzy control minimizes the total error of both the vertical and feed forces by adaptively adjusting the milling cutter's velocity and position. Forty-two ex vivo animal skull-milling experiments conducted with the automatic robotic cranium-milling system indicate that, when using the proposed control algorithm, the force error percentage can be maintained below 5.0% within 3 s, and the maximal root-mean-square error percentages for the vertical and feed forces are 1.85% and 1.94%, respectively. Moreover, no instances of dura mater damage are observed, and the robotic system exhibits a high level of autonomy, performing the skull-milling task with minimal human involvement throughout the entire experiment. The results suggest the potential for advancing the intelligence level of neurosurgery in the future.
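The admittance half of such a hybrid scheme can be illustrated with a minimal single-axis sketch: a force error drives a virtual mass-damper-spring, and the resulting position offset is handed to an inner position controller. The gains, the 1-DOF form, and the Euler discretization below are illustrative assumptions, not the paper's parameters.

```python
def admittance_step(x, v, f_err, m=1.0, b=20.0, k=100.0, dt=0.001):
    """One semi-implicit Euler step of the admittance law
    m*a + b*v + k*x = f_err.  Returns the updated position offset
    and velocity for an inner position controller to track.
    Gains are illustrative, not taken from the paper."""
    a = (f_err - b * v - k * x) / m
    v = v + a * dt
    x = x + v * dt
    return x, v

# A constant 1 N force error should settle near x = f/k = 0.01 m.
x = v = 0.0
for _ in range(20000):
    x, v = admittance_step(x, v, 1.0)
```

With zero force error the offset stays at rest; with a constant error the offset converges to the spring equilibrium, which is the compliant behavior the milling task relies on.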
|
|
13:30-15:00, Paper TuBT19-NT.6 | Add to My Program |
Preliminary Study of Fingertip and Wrist Motion Based Haptic Controller for Robotically Assisted Micro and Supermicrosurgery |
|
Miyasaka, Muneaki | Riverfield Inc |
van Esch, Pepijn | Riverfield Inc |
Morikawa, Atsushi | Riverfield Inc |
Tadano, Kotaro | Tokyo Institute of Technology |
Keywords: Medical Robots and Systems, Haptics and Haptic Interfaces
Abstract: One issue of robotic microsurgery is that, compared to manual surgery, the operation time tends to be longer due to high motion scaling. To address this issue, we developed a new controller that can provide the accuracy required for microsurgery without a high scaling factor by utilizing fingertip and wrist motions. Also, to improve surgical outcomes, the proposed controller has a force-feedback function that is not available in existing controllers for microsurgical robots. A challenge in designing such a controller is the size requirement. In conventional microsurgery, surgeons perform surgical procedures while looking at the eyepieces of a surgical microscope, and the same applies to robotic microsurgery. The only space available to manipulate controllers is the narrow space between the patient/surgical bed and the surgeon. To satisfy this constraint, the proposed controller is integrated with a handrest and the controller's DOFs are strategically allocated. In this work, as the first step toward addressing the issue of prolonged operation time, we built a prototype controller and evaluated its accuracy and task space with simulations. The results indicated that by using fingertip and wrist motions with a scaling factor of 3x, 0.5 mm diameter circles could be traced with a mean bidirectional precision of 0.0485 mm. Also, 10.0 mm diameter circles were traceable with the same scaling factor.
|
|
13:30-15:00, Paper TuBT19-NT.7 | Add to My Program |
Haptic-Assisted Collaborative Robot Framework for Improved Situational Awareness in Skull Base Surgery |
|
Ishida, Hisashi | Johns Hopkins University |
Sahu, Manish | Johns Hopkins |
Munawar, Adnan | Johns Hopkins University |
Nagururu, Nimesh | Johns Hopkins University School of Medicine |
Galaiya, Deepa | Johns Hopkins |
Kazanzides, Peter | Johns Hopkins University |
Creighton, Francis | Johns Hopkins School of Medicine |
Taylor, Russell H. | The Johns Hopkins University |
Keywords: Medical Robots and Systems, Hardware-Software Integration in Robotics, Human-Robot Collaboration
Abstract: Skull base surgery is a demanding field in which surgeons operate in and around the skull while avoiding critical anatomical structures, including nerves and vasculature. While image-guided surgical navigation is the prevailing standard, limitations still exist, requiring personalized planning and recognizing the irreplaceable role of a skilled surgeon. This paper presents a collaboratively controlled robotic system tailored for assisted drilling in skull base surgery. Our central hypothesis posits that this collaborative system, enriched with haptic assistive modes to enforce virtual fixtures, holds the potential to significantly enhance surgical safety, streamline efficiency, and alleviate the physical demands on the surgeon. The paper describes the system development work required to enable these virtual fixtures through haptic assistive modes. To validate our system's performance and effectiveness, we conducted initial feasibility experiments involving a medical student and two experienced surgeons. The experiments focused on drilling around critical structures following cortical mastoidectomy, utilizing dental-stone phantom and cadaveric models. Our experimental results demonstrate that the proposed haptic feedback mechanism enhances the safety of drilling around critical structures compared to systems lacking haptic assistance. With the aid of our system, surgeons were able to safely skeletonize the critical structures without breaching any of them, even under an obstructed view of the surgical site.
|
|
13:30-15:00, Paper TuBT19-NT.8 | Add to My Program |
Intelligent Disinfection Robot with High-Touch Surface Detection and Dynamic Pedestrian Avoidance |
|
Luan, Yunfei | Shanghai Jiao Tong University |
He, Muhang | Shanghai Jiao Tong University |
Tian, Yudong | Shanghai Jiao Tong University |
Lin, Chengjie | Shanghai Jiao Tong University |
Fang, Yunhan | Shanghai Jiaotong University |
Zhao, Zihao | Shanghai Jiao Tong University |
Yang, Jianxin | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Keywords: Medical Robots and Systems, Human-Aware Motion Planning, Object Detection, Segmentation and Categorization
Abstract: The increasing awareness of public health issues has highlighted the need for effective disinfection of crowded indoor public areas, leading to the development of automated disinfection robots. However, most existing robots spray disinfectant indiscriminately over all areas, and they remain immature at navigating densely populated environments. Hence, in this paper, we design a new disinfection robotic system consisting of a mobile platform, an RGB-D camera, and a robotic arm with a spray disinfection device. To address the above challenges, we propose a vision-based method for accurately detecting high-touch areas in the surroundings, enabling the disinfection robot to achieve superior disinfection efficiency. In addition, we propose a dynamic pedestrian avoidance method, namely Socially Aware APF (SA-APF), which predicts the movement trend of pedestrians and plans the path in real time. Both simulated and real-world experiments are conducted to demonstrate the effectiveness of our disinfection robot system, especially highlighting the ability to detect high-touch areas and to navigate the environment while avoiding dynamic pedestrians.
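The classical artificial potential field (APF) at the core of such a planner can be sketched as an attractive pull toward the goal plus repulsive pushes from nearby pedestrians. The socially aware extension in the paper additionally biases repulsion by each pedestrian's predicted motion; that part, and all the gains below, are omitted or assumed here.

```python
import math

def apf_force(robot, goal, pedestrians, k_att=1.0, k_rep=0.5, d0=2.0):
    """Resultant 2D APF force: linear attraction to the goal plus the
    classical repulsive term for obstacles within influence radius d0.
    Gains are illustrative; SA-APF's velocity-aware weighting is omitted."""
    fx = k_att * (goal[0] - robot[0])
    fy = k_att * (goal[1] - robot[1])
    for px, py in pedestrians:
        dx, dy = robot[0] - px, robot[1] - py
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

With no pedestrians the force points straight at the goal; a pedestrian on the path reduces the forward component, steering the robot away.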
|
|
13:30-15:00, Paper TuBT19-NT.9 | Add to My Program |
Toward a Framework Integrating Augmented Reality and Virtual Fixtures for Safer Robot-Assisted Lymphadenectomy |
|
Chen, Ziyang | Politecnico Di Milano |
Fan, Ke | Politecnico Di Milano |
Cruciani, Laura | Politecnico Di Milano |
Fontana, Matteo | European Institute of Oncology |
Muraglia, Lorenzo | European Institute of Oncology |
Ceci, Francesco | European Institute of Oncology |
Travaini, Laura | European Institute of Oncology |
Ferrigno, Giancarlo | Politecnico Di Milano |
De Momi, Elena | Politecnico Di Milano |
Keywords: Medical Robots and Systems, Human-Robot Collaboration, Computer Vision for Medical Robotics
Abstract: Lymphadenectomy generally accompanies various oncology surgeries to remove infected cancer cells. However, there are two limitations in robot-assisted lymphadenectomy: 1) lymph nodes are not visible during the operation, since they are hidden by the superficial fat layer; 2) intra-operative bleeding may occur during lymph node removal, caused by collisions between surgical instruments and delicate blood vessels (arteries or veins) near the lymph nodes. Therefore, we propose a framework integrating augmented reality and virtual fixtures to address these limitations. Augmented reality intra-operatively visualizes the hidden lymph nodes by projecting the corresponding 3D pre-operative model, and virtual fixtures provide force feedback to surgeons to avoid possible collisions when they operate the surgical instruments to resect the lymph nodes surrounding the blood vessel. Ten human subjects were invited to perform an emulated lymphadenectomy based on the da Vinci robot in a dry lab. Experimental results demonstrated that the proposed framework can continuously localize the hidden lymph nodes and reduce the number of collisions between the instruments and the delicate blood vessel during lymph node resection (21% and 48% reduction rates using two different force models compared to the standard setup, respectively). This shows the potential to enhance the safety of robot-assisted lymphadenectomy.
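A forbidden-region virtual fixture of the kind described can be sketched as a spring force that pushes the instrument tip back out of a safety sphere around the vessel. The spherical geometry, radius, and stiffness below are illustrative assumptions, not the paper's force models.

```python
import math

def fixture_force(tip, vessel_center, r_safe=0.005, k=300.0):
    """Forbidden-region virtual fixture: zero force outside the safety
    sphere; inside, a spring force (N) proportional to penetration
    depth, directed radially outward.  Geometry and stiffness are
    illustrative assumptions."""
    dx = [t - c for t, c in zip(tip, vessel_center)]
    d = math.sqrt(sum(v * v for v in dx))
    if d >= r_safe or d == 0.0:
        return (0.0, 0.0, 0.0)
    depth = r_safe - d
    return tuple(k * depth * v / d for v in dx)
```

A tip 4 mm from the vessel center (1 mm penetration of a 5 mm safety sphere) feels a 0.3 N outward push; a tip outside the sphere feels nothing.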
|
|
TuBT20-NT Oral Session, NT-G302 |
Add to My Program |
Robotics and Automation in Construction |
|
|
Co-Chair: Osa, Takayuki | University of Tokyo |
|
13:30-15:00, Paper TuBT20-NT.1 | Add to My Program |
Hyblock: Hardware Realization and Control of Modular Hydraulic Robots with Dowel Connectors |
|
Hyon, Sang-Ho | Ritsumeikan University |
Ando, Ryo | Ritsumeikan University |
Sono, Eiji | Ritsumeikan University |
Sugimoto, Shunichi | Ritsumeikan University |
Saito, Yasushi | KYB-YS Co. Ltd |
Keywords: Robotics and Automation in Construction, Hydraulic/Pneumatic Actuators, Motion Control
Abstract: This paper presents the hardware design and development of Hyblock, a modular hydraulic robot for heavy-duty applications such as construction. The robot is equipped with a simple docking mechanism called a C-type expansion dowel and a novel hydraulic circuit, MHSB, that matches the modular structure. In this paper, we first report on the design of the robot hardware, including the dowel and hydraulic circuit, then present preliminary experiments on pressure-based torque control and docking control using proximal magnetic sensors. Next, we propose a framework for dynamic reconfiguration and task-space motion control built on the concept of dowel connectors. Simulation results demonstrate that a collective modular robot achieves the desired motion tasks while keeping all normal contact forces of the connectors above their lower bound. The results are also shown in the supplementary video.
|
|
13:30-15:00, Paper TuBT20-NT.2 | Add to My Program |
PLASTR: Planning for Autonomous Sampling-Based Trowelling |
|
Kuhlmann-Jørgensen, Mads Alber | ETH Zurich |
Pankert, Johannes | ETH Zuerich |
Pietrasik, Lukasz Leszek | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Robotics and Automation in Construction, Motion and Path Planning, Optimization and Optimal Control
Abstract: Plaster is commonly used in the construction industry to finish walls and ceilings, but the application is labor-intensive and physically strenuous, which motivates the need for automation. We present PLASTR, a receding horizon optimization-based planning algorithm for robotic plaster trowelling. It samples trowelling sequence rollouts from a new plaster simulator and weights them according to the flatness of the finished wall. The proposed simulator approximates the real-world plaster-trowel interaction adequately while allowing execution orders of magnitude faster than real-time. We evaluate PLASTR in simulation and on a real-world test setup and compare it to two handcrafted heuristic baseline algorithms. PLASTR performs equal to or better than the best heuristic in terms of material coverage for both simulated and real-world experiments while being 50% more efficient in terms of trowelled distance.
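The receding-horizon, sampling-based structure of such a planner can be sketched generically: sample candidate action sequences, roll each through a fast simulator, score the end state, and execute only the first action of the best rollout. The simulator and flatness objective are caller-supplied stand-ins here; sample counts and the action parameterization are assumptions, not PLASTR's.

```python
import random

def plan_stroke(state, simulate, cost, n_samples=16, horizon=3, rng=None):
    """Receding-horizon sampling planner sketch: sample `n_samples`
    candidate sequences of `horizon` scalar actions in [0, 1], roll
    each through `simulate`, and return the first action of the
    lowest-cost rollout.  `simulate(state, action) -> state` and
    `cost(state) -> float` stand in for the paper's plaster simulator
    and flatness objective."""
    rng = rng or random.Random(0)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.uniform(0.0, 1.0) for _ in range(horizon)]
        s = state
        for a in seq:
            s = simulate(s, a)
        c = cost(s)
        if c < best_cost:
            best, best_cost = seq[0], c
    return best
```

At the next control step the planner is called again from the newly observed state, which is what makes the scheme receding-horizon.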
|
|
13:30-15:00, Paper TuBT20-NT.3 | Add to My Program |
Self-Reconfigurable Robots for Collaborative Discrete Lattice Assembly |
|
Smith, Miana | MIT |
Abdel-Rahman, Amira | MIT |
Gershenfeld, Neil | Massachusetts Institute of Technology |
Keywords: Robotics and Automation in Construction, Multi-Robot Systems, Assembly
Abstract: We present a robotic system for the assembly of 3D discrete lattice structures in which the robots are able to self-reproduce, such that the assembly system may scale its own parallelization. Robots and structures are made from a set of compatible building blocks, or voxels, which can be assembled and reassembled into more complex structures. Robotic modules are made by combining actuators with a functional voxel, which routes electrical power and signals. Robotic modules then assemble into reconfigurable robots via a reversible solder joint. The robot assembles higher performance structures using a set of construction voxels, which do not contain electrical features. This paper describes the design, development, and evaluation of this assembly system, including the robotic hardware, lattice material, and planning and controls methods. We demonstrate the system through a set of fundamental assembly tasks: the robot assembling another robot, and the two robots collaborating to assemble a small structure.
|
|
13:30-15:00, Paper TuBT20-NT.4 | Add to My Program |
LiSTA: Geometric Object-Based Change Detection in Cluttered Environments |
|
Rowell, Joseph | University of Oxford, Oxford Robotics Institute |
Zhang, Lintong | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Construction, Object Detection, Segmentation and Categorization, SLAM
Abstract: We present LiSTA (LiDAR Spatio-Temporal Analysis), a system to detect probabilistic object-level change over time using multi-mission SLAM. Many applications require such a system, including construction, robotic navigation, long-term autonomy, and environmental monitoring. We focus on the semi-static scenario where objects are added, subtracted, or changed in position over weeks or months. Our system combines multi-mission LiDAR SLAM, volumetric differencing, object instance description, and correspondence grouping using learned descriptors to keep track of an open set of objects. Object correspondences between missions are determined by clustering the object's learned descriptors. We demonstrate our approach using datasets collected in a simulated environment and a real-world dataset captured using a LiDAR system mounted on a quadruped robot monitoring an industrial facility containing static, semi-static, and dynamic objects. Our method demonstrates superior performance in detecting changes in semi-static environments compared to existing methods.
|
|
13:30-15:00, Paper TuBT20-NT.5 | Add to My Program |
Scalable Underwater Assembly with Reconfigurable Visual Fiducials |
|
Lensgraf, Samuel | Dartmouth College |
Sarkar, Ankita | Dartmouth College |
Pediredla, Adithya | Dartmouth College |
Balkcom, Devin | Dartmouth College |
Quattrini Li, Alberto | Dartmouth College |
Keywords: Robotics and Automation in Construction, Perception for Grasping and Manipulation, Marine Robotics
Abstract: We present a scalable combined localization-infrastructure deployment and task planning algorithm for underwater assembly. Infrastructure is autonomously modified to suit the needs of manipulation tasks, based on an uncertainty model of the infrastructure's positional accuracy. Our uncertainty model can incorporate the noise characteristics of multiple devices. For the task planning problem, we propose a layer-based clustering approach that completes the manipulation tasks one cluster at a time. We employ movable visual fiducial markers as infrastructure and an autonomous underwater vehicle (AUV) for the manipulation tasks. The proposed task planning algorithm is computationally simple, and we implement it on the AUV without any offline computation requirements. Combined hardware experiments and simulations over large datasets show that the proposed technique is scalable to large areas.
|
|
13:30-15:00, Paper TuBT20-NT.6 | Add to My Program |
Automatic Loading of Unknown Material with a Wheel Loader Using Reinforcement Learning |
|
Eriksson, Daniel | Tampere University |
Ghabcheloo, Reza | Tampere University |
Geimer, Marcus | Karlsruhe Institute of Technology |
Keywords: Robotics and Automation in Construction, Reinforcement Learning
Abstract: Loading multiple different materials with wheel loaders is a challenging task because various materials require different loading techniques. It is therefore difficult to find a single controller capable of handling them all. One solution is to use a base controller and fine-tune it for different materials. Reinforcement learning (RL) automates this process without the need to collect additional human-annotated data. We investigated the feasibility of this approach using a full-size 24-tonne wheel loader in the real world and demonstrated that it is possible to fine-tune a neural network controller, originally trained with imitation learning on blasted rock, for use with an unknown gravel material, requiring 20 bucket fillings. Additionally, we showcased the adaptability of a controller pre-trained on woodchips to an unknown gravel material, requiring 40 bucket fillings. We also proposed a novel reward function for the material loading task. Finally, we examined how the sampling time of the reinforcement learning algorithm affects convergence speed and adaptability. Our results demonstrate that it is optimal to match the sampling time of the RL algorithm to the delays of the wheel loader's hydraulic actuators.
|
|
13:30-15:00, Paper TuBT20-NT.7 | Add to My Program |
Learning Adaptive Policies for Autonomous Excavation under Various Soil Conditions by Adversarial Domain Sampling |
|
Osa, Takayuki | University of Tokyo |
Osajima, Naoto | Kyushu Institute of Technology |
Aizawa, Masanori | Komatsu Ltd |
Harada, Tatsuya | The University of Tokyo |
Keywords: Robotics and Automation in Construction, Reinforcement Learning
Abstract: Excavation is a frequent task in construction. In this context, automation is expected to reduce hazard risks and labor-intensive work. To this end, recent studies have investigated using reinforcement learning (RL) to automate construction machines. One of the challenges in applying RL to excavation tasks concerns obtaining skills adaptable to various conditions. When the conditions of soils differ, the optimal plans for efficiently excavating the target area will significantly differ. In existing meta-learning methods, the domain parameters are often uniformly sampled; this implicitly assumes that the difficulty of the task does not change significantly for different domain parameters. In this study, we empirically show that uniformly sampling the domain parameters is insufficient when the task difficulty varies according to the task parameters. Correspondingly, we develop a framework for learning a policy that can be generalized to various domain parameters in excavation tasks. We propose two techniques for improving the performance of an RL method in our problem setting: adversarial domain sampling and domain parameter estimation with a sensitivity-aware importance weight. In the proposed adversarial domain sampling technique, the domain parameters leading to low expected Q-values are actively sampled during the training phase. We empirically show that our approach outperforms existing meta-learning and domain adaptation methods for excavation tasks.
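One plausible realization of adversarial domain sampling is to draw domain parameters with probability increasing as their expected Q-value decreases, so training concentrates on the hardest soils. The softmax-over-negative-Q weighting and the temperature below are assumptions for illustration; the paper's exact scheme may differ.

```python
import math
import random

def sample_domain(candidates, q_value, beta=2.0, rng=None):
    """Adversarial domain sampling sketch: weight each candidate domain
    parameter by exp(-beta * Q), so low-Q (hard) domains are sampled
    more often during training.  `q_value(candidate) -> float` stands
    in for the critic's expected return; beta is an assumed temperature."""
    rng = rng or random.Random(0)
    ws = [math.exp(-beta * q_value(c)) for c in candidates]
    total = sum(ws)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, ws):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]
```

Uniform sampling would pick each soil equally often; here, a domain whose Q-value is much lower than the others dominates the draws, which is the intended "focus on the hard cases" behavior.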
|
|
13:30-15:00, Paper TuBT20-NT.8 | Add to My Program |
Robotic Inspection and Subsurface Defect Mapping Using Impact-Echo and Ground Penetrating Radar |
|
Hoxha, Ejup | The City College of New York |
Feng, Jinglun | The City College of New York |
Sanakov, Diar | New York University |
Xiao, Jizhong | The City College of New York |
Keywords: Robotics and Automation in Construction, Sensor-based Control, Sensor Fusion
Abstract: Concrete infrastructure often develops a variety of internal flaws that cannot be detected through visual inspection alone and must be regularly inspected with other methods to maintain structural integrity. Previous studies have demonstrated that relying solely on a single non-destructive evaluation (NDE) method can be insufficient to provide a comprehensive evaluation of a structure's condition. In addition, manual NDE data collection can be labor-intensive for on-site engineers. This paper presents a robotic inspection system that uses vision-based positioning and tags NDE measurements with pose information to reveal and map subsurface defects. The system consists of three modules: 1) an omni-directional robotic data collection platform, equipped with a RealSense D435i camera for localization, an impact-echo (IE) sensor, and a ground penetrating radar (GPR), to perform automatic NDE data collection; 2) an IE data processing module that utilizes both learning-based and classical methods to interpret the IE data and reveal subsurface objects; 3) a GPR data processing module to reconstruct underground targets and create a 3D map for better visualization. Field testing demonstrates that the robotic system significantly increases the data collection speed, and the correlation of findings from the IE and GPR sensors gives a comprehensive evaluation of concrete structures that will benefit the inspection and maintenance of civil infrastructure.
|
|
TuBT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Flight and Swimming |
|
|
Chair: Ollero, Anibal | AICIA. G41099946 |
Co-Chair: Liu, Chunbao | Jilin University |
|
13:30-15:00, Paper TuBT21-NT.1 | Add to My Program |
Ospreys-Inspired Self-Takeoff Strategy of an Eagle-Scale Flapping-Wing Robot: System Design and Flight Experiments |
|
Wang, Haoyu | Harbin Institute of Technology, Shenzhen |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Hou, Linpo | Harbin Institute of Technology, Shenzhen |
Pan, Erzhen | Harbin Institute of Technology, Shenzhen |
Keywords: Biologically-Inspired Robots, Aerial Systems: Applications, Dynamics
Abstract: In this work, we achieved self-takeoff of an eagle-scale flapping-wing robot for the first time. Inspired by the takeoff process of ospreys, we propose a bio-inspired takeoff strategy, then discuss the dynamic model and the requirements for self-takeoff. Based on the requirements of the flight strategy, we designed a two-part system comprising a flapping-wing aircraft with a wingspan of 1.8 m and a takeoff weight of 870 g, and an auxiliary platform with an initial pitch-angle adjustment function. To explore the differences in the takeoff process under different conditions, we conducted flight experiments under different time-averaged thrust-to-weight ratios (0.745-0.876) and launch angles (45°-90°). The results confirmed the theoretical analysis that the flapping-wing robot can achieve self-takeoff with no potential energy cost and maintain high maneuverability (the video shows a rapid climb immediately after takeoff) even when the time-averaged thrust-to-weight ratio is smaller than 1. This is significantly different from conventional rotary-wing and vertical take-off and landing (VTOL) UAVs. This work solves the challenge of self-takeoff for large-scale flapping-wing robots using a designable method and demonstrates the superior performance potential of flapping-wing robots compared to conventional UAVs.
|
|
13:30-15:00, Paper TuBT21-NT.2 | Add to My Program |
Design and Analysis of Adaptive Flipper with Origami Structure for Frog-Inspired Swimming Robot |
|
Wang, Shuqi | Harbin Institute of Technology |
Fan, Jizhuang | Robot Research Institute, Harbin Institute of Technology |
Pan, Yitao | Harbin Institute of Technology |
Liu, Gangfeng | Harbin Institute of Technology |
Liu, Yubin | Harbin Institute of Technology |
Keywords: Biologically-Inspired Robots, Biomimetics, Mechanism Design
Abstract: Flippers are important components for improving the locomotion efficiency and stability of bionic underwater robots. A novel origami-based adaptive flipper is presented to address the lack of environmental adaptability and the low efficiency caused by the structural design or the inherent characteristics of a flipper's main constituent materials. The design decision process and locomotion principle of the flipper are introduced in detail. It exhibits adaptive deformation under hydrodynamic loading without affecting propulsion efficiency. Kinematic and simulation analyses are performed to characterize the influence of the structural parameters on motion performance. Swimming experiments show that, compared with ordinary flippers, locomotion efficiency is greatly improved with the origami flippers. The origami flipper also shows good adaptability when in contact with the external environment and overcomes the inability of open-close flippers to cross a 90° corner, which demonstrates the rationality of the structural design and the feasibility of its application in underwater robots.
|
|
13:30-15:00, Paper TuBT21-NT.3 | Add to My Program |
Model-Based Approach for Lateral Maneuvers of Bird-Size Ornithopter |
|
Sanchez-Laulhe, Ernesto | University of Malaga |
Satué Crespo, Álvaro César | GRVC Robotics Lab., Universidad De Sevilla |
Rafee Nekoo, Saeed | GRVC Robotics Lab, Universidad De Sevilla |
Ollero, Anibal | AICIA. G41099946 |
Keywords: Biologically-Inspired Robots, Dynamics
Abstract: A model-based approach for lateral maneuvering of flapping-wing UAVs in closed spaces is presented. Bird-size ornithopters do not have asymmetric control variables in the wing due to mechanical complexity, so they rely on the tail for lateral maneuvering. The E-Flap prototype can deflect its vertical tail to make maneuvers out of the longitudinal plane. This work defines simplified equations for the steady turning maneuver based on the body roll angle, and states the relation between the velocity of the prototype and the turning radius. An approach to the attitude is then proposed, defining the relation between the deflection of the vertical tail and the roll angle. We show that, even though this deflection causes a yaw moment, the coupling between yaw and roll dynamics also generates a roll rate. To validate this simplified model, a simple controller is presented for continuous circular trajectory tracking inside an indoor flight zone. The objective is to track circular trajectories with a radius twice the wingspan at a constant height. Results show very good agreement between the theoretical and experimental turning radius. In addition, the direct relation between the vertical tail deflection and the roll rate of the ornithopter is identified. Even though the desired radius is not reached, the FWUAV is capable of maintaining a closed turning maneuver for several laps. The insight provided by the model therefore proves to be an appropriate approach for aggressive lateral maneuvers of bird-size ornithopters.
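The velocity-to-turning-radius relation the abstract refers to is, in the standard steady coordinated-turn approximation, R = v² / (g·tan φ) for roll angle φ. This is the generic fixed-wing relation, not the paper's flapping-specific model, which also captures the tail-deflection coupling.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def turn_radius(v, roll_deg):
    """Steady coordinated-turn radius R = v^2 / (g * tan(phi)).
    Standard fixed-wing approximation; flapping-specific effects
    from the paper's model are not captured."""
    return v ** 2 / (G * math.tan(math.radians(roll_deg)))
```

For example, at 4 m/s with a 30° roll the radius is about 2.8 m, and the radius grows quadratically with speed at a fixed roll angle.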
|
|
13:30-15:00, Paper TuBT21-NT.4 | Add to My Program |
A General Kinematic Model of Fish Locomotion Enables Robot Fish to Master Multiple Swimming Motions |
|
Zhong, Yong | South China University of Technology |
Hong, Zicun | South China University of Technology |
Li, Yuhan | South China University of Technology |
Yu, Junzhi | Chinese Academy of Sciences |
Keywords: Biologically-Inspired Robots, Kinematics, Dynamics, Robotic Fish
Abstract: Fish locomotion in the body and/or caudal fin swimming mode consists of different motions, such as cruising-straight, cruising-turn, and various fast turns. Currently, no single mathematical model describes all of these motions, so for scientists and engineers it is cumbersome and complicated to model and control the different motions with multiple principles. In this paper, we propose a general kinematic model that describes the kinematics of all the aforementioned swimming motions. The model is synthesized from a nonlinear oscillator and a traveling wave equation. By changing four parameters extracted from the model, the kinematic model can demonstrate all the aforementioned swimming motions with different amplitudes and frequencies. To verify the model, we built a multi-joint robotic fish and developed its dynamic model and control method to perform all the maneuvers under the guidance of the general kinematic model. Through this systematic methodology, one can easily study the principles of different swimming motions and design a multi-motion controller for a robotic fish from a single governing kinematic model.
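The traveling-wave half of such a model is classically written as y(s,t) = (c₁s + c₂s²)·sin(ks − ωt), the carangiform body wave with an amplitude envelope growing toward the tail. The sketch below implements only this classical wave; the nonlinear oscillator that the paper couples to it (and the specific envelope coefficients) are omitted or assumed.

```python
import math

def body_wave(s, t, c1=0.05, c2=0.1, k=2 * math.pi, omega=2 * math.pi):
    """Lateral displacement of the classical carangiform travelling wave
    y(s, t) = (c1*s + c2*s^2) * sin(k*s - omega*t), with s in [0, 1]
    measured along the body.  Envelope coefficients, wavenumber, and
    frequency are illustrative, not the paper's parameters."""
    return (c1 * s + c2 * s * s) * math.sin(k * s - omega * t)
```

The quadratic envelope keeps the head nearly still while the tail sweeps with the full amplitude c₁ + c₂ at s = 1, which is the hallmark of body/caudal-fin swimming.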
|
|
13:30-15:00, Paper TuBT21-NT.5 | Add to My Program |
Adaptation of Flipper-Mud Interactions Enables Effective Terrestrial Locomotion on Muddy Substrates |
|
Liu, Shipeng | University of Southern California |
Huang, Boyuan | University of Southern California |
Qian, Feifei | University of Southern California |
Keywords: Biologically-Inspired Robots, Legged Robots, Contact Modeling
Abstract: Moving on natural muddy terrains, where soil composition and water content vary significantly, is complex and challenging. To understand how mud properties and robot-mud interaction strategies affect locomotion performance on mud, we study the terrestrial locomotion of a mudskipper-inspired robot on synthetic mud with precisely controlled ratios of sand, clay, and water. We observed a non-monotonic dependence of robot speed on mud water content: speed was largest on mud with intermediate water content (25%-26%) but decreased significantly at higher or lower water content. Measurements of the mud reaction force revealed two distinct failure mechanisms. At high water content, the reduced mud shear strength led to large slippage of the robot appendages and a significantly reduced step length. At low water content, the increased mud suction force caused appendage entrapment, resulting in a large negative displacement of the robot body during the swing phase. A simple model successfully captured the observed robot performance and informed adaptation strategies that increased robot speed by more than 200%. Our study is a first step toward extending robot mobility beyond simple substrates to a wider range of complex, heterogeneous terrains.
|
|
13:30-15:00, Paper TuBT21-NT.6 | Add to My Program |
RoboTwin: A Platform to Study Hydrodynamic Interactions in Schooling Fish |
|
Li, Liang | Max-Planck Institute of Animal Behavior |
Chao, Li-Ming | Max Planck Institute of Animal Behavior |
Wang, Siyuan | Max Planck Institute of Animal Behavior |
Deussen, Oliver | University of Konstanz |
Couzin, Iain D. | Max Planck Institute of Animal Behavior |
Keywords: Biologically-Inspired Robots, Marine Robotics, Robotics and Automation in Life Sciences
Abstract: By living and moving in groups, fish can gain many benefits, such as heightened predator detection, greater hunting efficiency, more accurate environmental sensing, and energy savings. Although the benefits of hydrodynamic interactions in schooling fish have drawn growing interest in fields such as biology, physics, and engineering, and multiple hypotheses for how such benefits may arise have been proposed, it is still largely unknown which mechanisms fish employ to obtain hydrodynamic benefits such as increased thrust or improved movement efficiency. One main bottleneck has been the difficulty of collecting detailed sensory information, corresponding locomotory responses, and hydrodynamic information from real schooling fish. In this paper, we present the RoboTwin platform, designed to aid such data collection: it allows us to replay the dynamic movements and body-posture kinematics of real fish in fish-like robots, and to measure the power cost, thrust, and detailed flow fields, all of which are extremely challenging to obtain for real animals. We cross-validated our platform against our previously proposed energy-saving mechanism ('vortex phase matching') by flow visualization through particle image velocimetry (PIV). Our results demonstrate the effectiveness of our design and highlight the potential of RoboTwin for future applications exploring further hydrodynamic interactions among schooling fish.
|
|
13:30-15:00, Paper TuBT21-NT.7 | Add to My Program |
Real-Time Estimation for the Swimming Direction of Robotic Fish Based on IMU Sensors |
|
Li, Shikun | Peking University |
Zhai, Yufan | Peking University |
Wang, Chen | Peking University |
Xie, Guangming | Peking University |
Keywords: Biologically-Inspired Robots, Sensor Fusion, Marine Robotics
Abstract: An increasing number of underwater robots inspired by Carangidae, characterized by high efficiency and flexibility, have been developed. However, estimating the swimming direction of these robotic fish is challenging because the head swings constantly during movement, which complicates precise control. In this study, we installed two low-cost inertial measurement unit (IMU) sensors separately on the head and tail parts of a double-joint robotic fish and present a method for accurate and timely estimation of the swimming direction. First, we compensate for the yaw angle drift of the IMU sensors through a fused Kalman filter. Furthermore, we propose the Anti-Shake Estimation (ASE) algorithm to calculate the real-time swimming direction from the filtered yaw angles at a high update rate of 100 Hz. Finally, we applied the method to swimming-direction feedback control for evaluation and comparison. The results show that our ASE method outperforms existing methods in straight-line swimming experiments, and an S-curve swimming experiment further demonstrates its effectiveness in complex missions.
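To illustrate why head oscillation corrupts a raw yaw reading, the following sketch averages the yaw signal on the unit circle to recover the underlying swimming direction. This is a minimal complementary smoother written for illustration, not the authors' ASE algorithm; the sampling rate matches the paper's 100 Hz update rate, but all other constants are assumptions:

```python
import math

def swim_direction_ema(yaw_samples, alpha=0.01):
    """Exponential moving average of a yaw signal (radians).

    Averaging sin/cos on the unit circle keeps the estimate well
    behaved across the +/-pi wrap-around.
    """
    s = c = 0.0
    estimates = []
    for yaw in yaw_samples:
        s = (1.0 - alpha) * s + alpha * math.sin(yaw)
        c = (1.0 - alpha) * c + alpha * math.cos(yaw)
        estimates.append(math.atan2(s, c))
    return estimates

# Synthetic head yaw: true direction 0.5 rad plus a 2 Hz body-wave
# oscillation of +/-0.4 rad, sampled at 100 Hz (the paper's update rate).
true_dir = 0.5
yaws = [true_dir + 0.4 * math.sin(2.0 * math.pi * 2.0 * k / 100.0)
        for k in range(2000)]
direction = swim_direction_ema(yaws)[-1]
```

Because the averaging happens on sin/cos components, the same smoother works unchanged when the heading crosses the +/-pi boundary, which a naive average of raw angles does not.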
|
|
13:30-15:00, Paper TuBT21-NT.8 | Add to My Program |
Tunable Stiffness Caudal Peduncle Leads to Higher Swimming Speed without Extra Energy |
|
Liu, Sijia | Jilin University |
Liu, Chunbao | Jilin University |
Liang, Yunhong | Jilin University |
Ren, Luquan | Jilin University |
Ren, Lei | University of Manchester |
Keywords: Biologically-Inspired Robots, Soft Robot Applications, Soft Robot Materials and Design
Abstract: Tuning body stiffness like fish to improve swimming efficiency and speed has been adopted by many fish-inspired robots. However, it is unknown whether the energy saved through improved efficiency can compensate for the energy consumed by tuning the stiffness itself. To explore this issue, we develop a robotic fish with a tunable stiffness caudal peduncle (TSCP) that simultaneously allows untethered swimming and online stiffness tuning, and conduct a series of tests. We first apply interchangeable caudal peduncles to our robot to explore the effect of a tunable stiffness mechanism and determine the optimal stiffness interval. The results show that tunable stiffness can significantly improve the robot's response at different frequencies. We then build the TSCP by embedding shape memory alloy wire into a silicone matrix. The TSCP can adjust its stiffness in real time via applied current and increase the initial stiffness by up to 57.4%. More importantly, unlike previous robots, we incorporate the cost of tuning stiffness into the total cost of transport for the first time. The cost of maintaining medium and maximum stiffness accounts for 8.72% and 17.87% of the total cost of transport, respectively. As a result, the TSCP increases swimming speed by up to 35.5% and reduces the Strouhal number by up to 21.9% at high frequencies without extra power.
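The energy accounting described above follows the standard dimensionless cost-of-transport definition CoT = P / (m g v), with the stiffness-holding power included in the total. A minimal sketch, with illustrative numbers that are not the paper's measurements:

```python
def cost_of_transport(p_locomotion_w, p_stiffness_w, mass_kg, speed_mps, g=9.81):
    """Dimensionless cost of transport, CoT = P_total / (m * g * v),
    with the stiffness-holding power counted in the total."""
    return (p_locomotion_w + p_stiffness_w) / (mass_kg * g * speed_mps)

# Illustrative numbers (not the paper's): 10 W for swimming,
# 1 W to hold a medium stiffness, a 1.2 kg robot at 0.5 m/s.
cot = cost_of_transport(10.0, 1.0, 1.2, 0.5)
stiffness_share = 1.0 / (10.0 + 1.0)  # fraction of total power spent on stiffness
```

Comparing `stiffness_share` across stiffness levels is exactly the kind of bookkeeping the abstract reports (8.72% and 17.87% of the total cost of transport).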
|
|
13:30-15:00, Paper TuBT21-NT.9 | Add to My Program |
A Novel Fish-Inspired Self-Adaptive Approach to Collective Escape of Swarm Robots Based on Neurodynamic Models |
|
Li, Junfei | University of Guelph |
Yang, Simon X. | University of Guelph |
Keywords: Biologically-Inspired Robots, Swarm Robotics, Cooperating Robots
Abstract: Fish schools exhibit highly efficient group behaviors in collective migration and dynamic escape from predators through simple individual interactions. The purpose of this research is to infuse swarm robots with “fish-like” intelligence that enables safe navigation, efficient cooperation, and successful completion of escape tasks in changing environments. In this paper, a novel fish-inspired self-adaptive approach is proposed for the collective escape of swarm robots. A bio-inspired neural network (BINN) is introduced to generate collision-free escape trajectories through the dynamics of neural activity and the combination of attractive and repulsive forces. In addition, a neurodynamics-based self-adaptive mechanism is proposed to improve the self-adaptive performance of the swarm robots in dynamic environments. Similar to fish escape maneuvers, simulations and real-robot experiments show that the swarm robots can collectively move away from the threat and respond to sudden environmental changes. Several comparison studies demonstrate that the proposed approach significantly improves the effectiveness, efficiency, and flexibility of swarm robots in complex environments.
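The neural-activity dynamics behind BINN-style planners are commonly modeled with a Grossberg-type shunting equation, whose bounded activity is what keeps the generated trajectories well behaved. A minimal sketch under that assumption (the equation form is the standard shunting model used in this line of work; all parameter values are illustrative):

```python
def shunting_step(x, s_exc, s_inh, A=10.0, B=1.0, D=1.0, dt=0.001):
    """One Euler step of dx/dt = -A*x + (B - x)*S_e - (D + x)*S_i.

    The (B - x) and (D + x) gating terms bound the neural activity x
    within [-D, B] no matter how large the inputs S_e, S_i become.
    """
    return x + dt * (-A * x + (B - x) * s_exc - (D + x) * s_inh)

# A strongly excited neuron saturates below B = 1 instead of diverging.
x = 0.0
for _ in range(20000):
    x = shunting_step(x, s_exc=100.0, s_inh=0.0)
```

The bounded activity landscape is what lets attractive (excitatory) and repulsive (inhibitory) inputs be combined without any input ever dominating the others unboundedly.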
|
|
TuBT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics II |
|
|
Chair: Johnson-Roberson, Matthew | Carnegie Mellon University |
Co-Chair: Kelasidi, Eleni | SINTEF Ocean |
|
13:30-15:00, Paper TuBT22-NT.1 | Add to My Program |
RUMP: Robust Underwater Motion Planning in Dynamic Environments of Fast Moving Obstacles |
|
Amundsen, Herman Biørn | NTNU |
Olsen, Torben Falleth | NTNU |
Xanthidis, Marios | SINTEF Ocean AS |
Føre, Martin | NTNU |
Kelasidi, Eleni | SINTEF Ocean |
Keywords: Marine Robotics, Collision Avoidance, Autonomous Vehicle Navigation
Abstract: Robust underwater motion planning of autonomous underwater vehicles (AUVs) in dynamic cluttered environments is a problem that has yet to be addressed in depth. Due to advances in technology and computational capacity, AUVs are expected to operate safely and autonomously in increasingly challenging environments, necessitating methods that can safely navigate robots in real-time. However, most existing solutions remain overly cautious and conservative. This paper proposes RUMP, a novel locally-optimal motion planning framework for robust real-time autonomous underwater navigation in 3D cluttered environments consisting of observed static and dynamic obstacles. The problem is modeled as path optimization and can be solved in real-time with a common non-linear solver. The constructed objective function allows deciding the local goal during optimization to both maximize safety within a planning horizon and minimize the expected distance to the target position. Furthermore, path safety is considered for the entire transition between consecutive states, utilizing a novel approach for continuous spatiotemporal collision checks. The proposed formulation provides safe performance even in environments with obstacles that may move orders of magnitude faster than the AUV itself. Simulation experiments in different challenging scenarios showcase robustness and efficient real-time performance of more than 16 Hz.
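Continuous spatiotemporal collision checking rests on a standard geometric fact: for two points moving at constant velocity, squared separation is quadratic in time, so the minimum over a time interval has a closed form. A minimal sketch of that primitive (an illustrative building block, not the paper's full formulation):

```python
import math

def min_separation(p_r, v_r, p_o, v_o, t0=0.0, t1=1.0):
    """Minimum distance between two constant-velocity 3D points over [t0, t1].

    With relative state d(t) = dp + t*dv, ||d(t)||^2 is quadratic in t,
    so the unconstrained minimizer -dp.dv/||dv||^2 is simply clamped
    to the interval.
    """
    dp = [a - b for a, b in zip(p_r, p_o)]
    dv = [a - b for a, b in zip(v_r, v_o)]
    dv2 = sum(x * x for x in dv)
    if dv2 == 0.0:
        t = t0
    else:
        t = max(t0, min(t1, -sum(a * b for a, b in zip(dp, dv)) / dv2))
    return math.sqrt(sum((a + t * b) ** 2 for a, b in zip(dp, dv)))

# Head-on geometry: AUV at 1 m/s, obstacle closing at 1 m/s;
# the closest approach (1 m) occurs at t = 2.5 s inside the horizon.
d_min = min_separation((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                       (5.0, 1.0, 0.0), (-1.0, 0.0, 0.0), t0=0.0, t1=5.0)
```

Because the check covers the whole interval rather than sampled instants, fast obstacles cannot "tunnel" between discrete collision checks, which is the property the abstract emphasizes.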
|
|
13:30-15:00, Paper TuBT22-NT.2 | Add to My Program |
Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots |
|
Ebner, Luca | ETH Zurich |
Billings, Gideon | University of Sydney, Australian Center for Field Robotics |
Williams, Stefan B. | University of Sydney |
Keywords: Marine Robotics, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In this work, we address the problem of real-time dense depth estimation from monocular images for mobile underwater vehicles. We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions and solve the problem of scale ambiguity. To allow prior inputs of arbitrary sparsity, we apply a dense parameterization method. Our model extends recent state-of-the-art approaches to monocular image based depth estimation, using an efficient encoder-decoder backbone and modern lightweight transformer optimization stage to encode global context. The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea. Evaluation results on this dataset demonstrate significant improvement in depth prediction accuracy by the fusion of the sparse feature priors. In addition, without any retraining, our method achieves similar depth prediction accuracy on a downward-looking dataset we collected with a diver operated camera rig, conducting a survey of a coral reef. The method achieves real-time performance, running at 24 FPS on an NVIDIA Jetson Xavier NX, 160 FPS on an NVIDIA RTX 2080 GPU and 7 FPS on a single Intel i9-9900K CPU core, making it suitable for direct deployment on embedded GPU systems. The implementation of this work is made publicly available at https://github.com/ebnerluca/uw_depth.
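The scale ambiguity that sparse metric priors resolve can be illustrated with a closed-form least-squares scale-and-shift alignment, a common baseline rather than the paper's learned dense-parameterization fusion:

```python
def align_scale_shift(pred_depth, sparse_metric):
    """Closed-form least-squares scale s and shift b so that
    s * pred_depth[i] + b matches the metric depth at each sparse index."""
    xs = [pred_depth[i] for i in sparse_metric]
    ys = [sparse_metric[i] for i in sparse_metric]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = sxy / sxx
    return s, my - s * mx

# A scale-ambiguous relative prediction and two triangulated metric features.
pred = [0.1, 0.2, 0.3, 0.4]
s, b = align_scale_shift(pred, {0: 1.0, 2: 3.0})
metric_depth = [s * d + b for d in pred]
```

A learned fusion can improve on this global fit by correcting local depth errors near each prior, which a single (s, b) pair cannot do.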
|
|
13:30-15:00, Paper TuBT22-NT.3 | Add to My Program |
Model-Based Underwater 6D Pose Estimation from RGB |
|
Sapienza, Davide | Unimore |
Govi, Elena | Unimore |
Aldhaheri, Sara | TII |
Bertogna, Marko | Unimore |
Roura, Eloy | Technology Innovation Institute |
Pairet Artau, Èric | Technology Innovation Institute |
Verucchi, Micaela | University of Modena and Reggio Emilia |
Ardón, Paola | Technology Innovation Institute |
Keywords: Marine Robotics, Data Sets for Robotic Vision, Engineering for Robotic Systems
Abstract: Object pose estimation underwater allows an autonomous system to perform tracking and intervention tasks. Nonetheless, underwater target pose estimation is remarkably challenging due to, among many factors, limited visibility, light scattering, cluttered environments, and constantly varying water conditions. One approach is to employ sonar or laser sensing to acquire 3D data; however, the resulting data are noisy and the sensors are expensive. For this reason, the community has focused on extracting pose estimates from RGB input. In this work, we propose an approach that leverages 2D object detection to reliably compute 6D pose estimates in different underwater scenarios. We test our proposal on 4 objects with symmetrical shapes and poor texture spanning 33,920 synthetic and 10 real scenes. All objects and scenes are made available in an open-source dataset that includes annotations for object detection and pose estimation. When benchmarked against similar end-to-end methodologies for 6D object pose estimation, our pipeline provides estimates that are 8% more accurate. We also demonstrate the real-world usability of our pose estimation pipeline on an underwater robotic manipulator in a reaching task.
|
|
13:30-15:00, Paper TuBT22-NT.4 | Add to My Program |
SONIC: Sonar Image Correspondence Using Pose Supervised Learning for Imaging Sonars |
|
Gode, Samiran | Carnegie Mellon University |
Hinduja, Akshay | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets are made public on https://github.com/rpl-cmu/sonic to facilitate further development in the field.
|
|
13:30-15:00, Paper TuBT22-NT.5 | Add to My Program |
CVAE-SM: A Conditional Variational Autoencoder with Style Modulation for Efficient Uncertainty Quantification |
|
Ullah, Amin | Oregon State University |
Yan, Taiqing | Oregon State University |
Fuxin, Li | Oregon State University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: Deep learning has brought transformative advancements to object segmentation, especially in marine robotics contexts such as waste management and subaquatic infrastructure oversight. However, a central challenge persists: calibrating the prediction confidence of the model to ensure robust and reliable outcomes, especially within the demanding underwater environment. Existing solutions for estimating uncertainty are often computationally intensive and have largely centered around Bayesian neural networks or ensemble methods. In this paper, we present a Conditional Variational Autoencoder-based framework (CVAE-SM), which is capable of generating diverse latent codes for improved uncertainty quantification in image segmentation. Our method, enhanced by a style modulator, merges content features, and latent codes more effectively, leading to refined prediction of uncertainty levels. We further introduce a dataset of perturbed underwater images to benchmark uncertainty quantification in this domain. The proposed model not only surpasses peers in segmentation metrics but also matches ensemble models in uncertainty predictions, all while being 2.5 times faster.
|
|
13:30-15:00, Paper TuBT22-NT.6 | Add to My Program |
Beyond NeRF Underwater: Learning Neural Reflectance Fields for True Color Correction of Marine Imagery |
|
Zhang, Tianyi | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: Underwater imagery often exhibits distorted coloration as a result of light-water interactions, which complicates the study of benthic environments in marine biology and geography. In this research, we propose an algorithm to restore the true color (albedo) in underwater imagery by jointly learning the effects of the medium and neural scene representations. Our approach models water effects as a combination of light attenuation with distance and backscattered light. The proposed neural scene representation is based on a neural reflectance field model, which learns albedos, normals, and volume densities of the underwater environment. We introduce a logistic regression model to separate water from the scene and apply distinct light physics during training. Our method avoids the need to estimate complex backscatter effects in water by employing several approximations, enhancing sampling efficiency and numerical stability during training. The proposed technique integrates underwater light effects into a volume rendering framework with end-to-end differentiability. Experimental results on both synthetic and real-world data demonstrate that our method effectively restores true color from underwater imagery, outperforming existing approaches in terms of color consistency.
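The attenuation-plus-backscatter water model described above is, in its standard form, directly invertible once the medium parameters and range are known. A minimal round-trip sketch of that model on one color channel (the paper instead learns these quantities jointly with the neural scene representation; all parameter values below are illustrative):

```python
import math

def restore_albedo(observed, depth_m, backscatter, beta_d, beta_b):
    """Invert the standard underwater image-formation model
    I = J*exp(-beta_d*z) + B*(1 - exp(-beta_b*z))   (per color channel)
    for the true color (albedo) J, given range z and medium parameters."""
    direct = observed - backscatter * (1.0 - math.exp(-beta_b * depth_m))
    return direct * math.exp(beta_d * depth_m)

# Round trip: synthesize an observation from a known albedo, then invert.
J, z, B, bd, bb = 0.8, 3.0, 0.2, 0.4, 0.5
I = J * math.exp(-bd * z) + B * (1.0 - math.exp(-bb * z))
restored = restore_albedo(I, z, B, bd, bb)
```

The hard part in practice, and the paper's focus, is that z, the attenuation coefficients, and the backscatter are not given and must be estimated jointly with the scene.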
|
|
13:30-15:00, Paper TuBT22-NT.7 | Add to My Program |
CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration |
|
Abdullah, Adnan | University of Florida |
Barua, Titon | University of South Carolina |
Tibbetts, Reagan | University of South Carolina |
Chen, Zijie | Mississippi State University |
Islam, Md Jahidul | University of Florida |
Rekleitis, Ioannis | University of South Carolina |
Keywords: Marine Robotics, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: In this paper, we present CaveSeg - the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g., caveline, arrows), obstacles (e.g., ground plain and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in the USA, Mexico, and Spain, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping.
|
|
13:30-15:00, Paper TuBT22-NT.8 | Add to My Program |
Discovering Biological Hotspots with a Passively Listening AUV |
|
McCammon, Seth | Woods Hole Oceanographic Institution |
Jamieson, Stewart | Massachusetts Institute of Technology |
Mooney, T. Aran | Woods Hole Oceanographic Institution |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Keywords: Marine Robotics, Environment Monitoring and Management, Robot Audition
Abstract: We present a novel system which blends multiple distinct sensing modalities in audio-visual surveys to assist marine biologists in collecting datasets for understanding the ecological relationship of fish and other organisms with their habitats on and around coral reefs. Our system, designed for the CUREE AUV, uses four hydrophones to determine the bearing to biological sound sources through beamforming. These observations are merged in a Bayesian Occupancy Grid to produce a 2D map of the acoustic activity of a coral reef. Simultaneously, the AUV uses unsupervised topic modeling to identify different benthic habitats. Combining these maps allows us to determine the level of acoustic activity within each habitat. We demonstrated the system in field trials on reefs in the U.S. Virgin Islands, where it was able to autonomously discover the favored habitats of snapping shrimp (genus Alpheus).
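The Bayesian occupancy grid accumulates beamformed bearing observations per cell; the standard log-odds update that such grids use can be sketched as follows (the hit probability is an illustrative assumption, not the paper's sensor model):

```python
import math

def logodds_update(logodds, p_hit):
    """Bayesian log-odds update of one grid cell for one observation."""
    return logodds + math.log(p_hit / (1.0 - p_hit))

def to_probability(logodds):
    """Convert accumulated log-odds back to a probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(logodds))

# A cell repeatedly implicated by beamformed bearings (illustrative
# hit probability of 0.7 per observation) converges toward "active".
l = 0.0  # prior probability 0.5
for _ in range(5):
    l = logodds_update(l, 0.7)
prob_active = to_probability(l)
```

Working in log-odds makes repeated fusion a simple sum, which is why the representation is the standard choice for accumulating many noisy bearing observations over a survey.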
|
|
13:30-15:00, Paper TuBT22-NT.9 | Add to My Program |
A Du-Octree Based Cross-Attention Model for LiDAR Geometry Compression |
|
Cui, Mingyue | Sun Yat-Sen University |
Feng, Mingjian | Sun Yat-Sen University |
Long, Junhua | Sun Yat-Sen University |
Hu, Daosong | Sun Yat-Sen University |
Zhao, Shuai | Sun Yat-Sen University |
Huang, Kai | Sun Yat-Sen University |
Keywords: Sensor Networks, Deep Learning for Visual Perception, Intelligent Transportation Systems
Abstract: Point cloud compression is an essential technology for the efficient storage and transmission of 3D data. Previous methods usually use hierarchical tree data structures for encoding the spatial sparseness of point clouds. However, the node context within the tree is not fully discovered since the feature space among nodes varies significantly. To address this problem, we innovatively represent the LiDAR points in a two-octree structure instead of the traditional single-octree coding, and then design a cross-attention model to capture the hierarchical features between the two octrees, each of which incorporates a transformer-based deep entropy model and an arithmetic encoder. Besides, we introduce an untied cross-aware position encoding with principal component analysis and different projection matrices, which enhances the correlations between the two octrees' attention feature embeddings. Experimental results show that our method outperforms previous state-of-the-art works, achieving up to 8.2% Bpp savings on point cloud benchmark datasets with different lasers.
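The raw symbol stream that an octree-based entropy model compresses comes from occupancy-byte serialization of the point set. A minimal single-octree sketch of that step (the paper's contribution, the dual-octree cross-attention entropy model, is not reproduced here):

```python
def encode_octree(points, center, half, depth):
    """Breadth-first occupancy-byte serialization of a 3D point set.

    Each node emits one byte whose bits mark the occupied octants;
    this byte stream is what a learned entropy model would then
    compress with an arithmetic coder.
    """
    stream = []
    queue = [(points, center, half)]
    for _ in range(depth):
        next_level = []
        for pts, c, h in queue:
            byte = 0
            buckets = {}
            for p in pts:
                idx = (p[0] >= c[0]) | ((p[1] >= c[1]) << 1) | ((p[2] >= c[2]) << 2)
                byte |= 1 << idx
                buckets.setdefault(idx, []).append(p)
            stream.append(byte)
            for idx, sub in buckets.items():
                nc = [c[k] + (h / 2 if (idx >> k) & 1 else -h / 2) for k in range(3)]
                next_level.append((sub, nc, h / 2))
        queue = next_level
    return stream

# Two opposite-corner points inside a unit cube, serialized to depth 2.
stream = encode_octree([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)],
                       (0.5, 0.5, 0.5), 0.5, 2)
```

The better the entropy model predicts each occupancy byte from its context (here, across two octrees via cross-attention), the fewer bits the arithmetic coder spends per symbol.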
|
|
TuBT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Mechanics and Control II |
|
|
Chair: Kocer, Basaran Bahadir | Imperial College London |
Co-Chair: Martinoli, Alcherio | EPFL |
|
13:30-15:00, Paper TuBT23-NT.1 | Add to My Program |
Energy Consumption Modelling of Coaxial-Rotor in Vortex Ring State for Controllable High-Speed Descending |
|
Sun, Jiawei | Guangxi University |
Zhou, Xiang | Guangxi University |
Ban, Taoze | Co., Ltd. Mystical Bow Technology |
Zhao, Jiannan | Guangxi University |
Shuang, Feng | Guangxi University |
Keywords: Aerial Systems: Mechanics and Control, Robotics in Hazardous Fields, Dynamics
Abstract: The ability to climb and descend quickly is crucial for Unmanned Aerial Vehicle (UAV) applications in the mountains, where a slow descent speed reduces the UAV's efficiency in reaching a rescue area. However, during fast descent a rotorcraft falls into its own wake and a chaotic flow field develops, a condition known as the vortex ring state. The safe descent velocity of consumer UAVs is therefore usually limited to approximately 3 m/s, which reduces their potential to execute tasks in mountainous and plateau regions. To relax the constraint imposed by the maximum descending speed, the flow field and the energy consumption during descent must be analyzed jointly. Existing research has mainly focused on how to avoid entering the vortex ring state rather than on supplying sufficient power to fly through it. In this paper, aiming at an efficient rotorcraft for rescue in mountainous and plateau regions, we push past the maximum descending speed of a coaxial-rotor UAV. To this end, a power-consumption management pipeline is proposed to extend the power tolerance of the UAV. Specifically, a theoretical model of the coaxial rotors is proposed to analyze the induced velocity and energy consumption during vertical descent. The model is then verified to be consistent with Computational Fluid Dynamics (CFD) and wind tunnel experiment results. Finally, we optimize the tolerance of the power and dynamic system according to the model. With this pipeline, our real-time flights achieved an 8 m/s controlled vertical-descent-speed (CVDS), a leading result among both quadrotor and coaxial UAVs.
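The theoretical starting point for rotor descent analysis is momentum theory, which also marks the regime where it fails: there is no valid momentum-theory solution inside the vortex ring state, which is why this line of work resorts to CFD and wind-tunnel validation. A minimal sketch of the hover/climb branch (rotor area and air density below are illustrative, and the model is for a single rotor rather than the paper's coaxial pair):

```python
import math

def hover_induced_velocity(thrust_n, rho=1.225, rotor_area_m2=0.05):
    """Momentum-theory induced velocity at hover: v_h = sqrt(T / (2*rho*A))."""
    return math.sqrt(thrust_n / (2.0 * rho * rotor_area_m2))

def ideal_power(thrust_n, climb_mps, rho=1.225, rotor_area_m2=0.05):
    """Ideal rotor power P = T * (Vc/2 + sqrt((Vc/2)^2 + v_h^2)).

    Valid for hover and climb (Vc >= 0); momentum theory has no valid
    solution inside the vortex ring state (roughly -2*v_h < Vc < 0),
    which is the regime addressed with CFD and wind-tunnel data.
    """
    vh = hover_induced_velocity(thrust_n, rho, rotor_area_m2)
    half = climb_mps / 2.0
    return thrust_n * (half + math.sqrt(half * half + vh * vh))

p_hover = ideal_power(10.0, 0.0)  # reduces to T * v_h at hover
```

The hover induced velocity v_h also sets the scale of the problem: descent rates on the order of v_h are exactly where the vortex ring state, and hence the power penalty the paper models, appears.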
|
|
13:30-15:00, Paper TuBT23-NT.2 | Add to My Program |
Equilibria, Stability, and Sensitivity for the Aerial Suspended Beam Robotic System Subject to Parameter Uncertainty |
|
Gabellieri, Chiara | University of Twente |
Tognon, Marco | Inria Rennes |
Sanalitro, Dario | University of Catania |
Franchi, Antonio | University of Twente / Sapienza University of Rome |
Keywords: Aerial Systems: Mechanics and Control, Motion Control, Dynamics, Cooperative Aerial Manipulation
Abstract: This work studies how parametric uncertainties affect the cooperative manipulation of a cable-suspended beam-shaped load by means of two aerial robots not explicitly communicating with each other. In particular, the work sheds light on the impact of the uncertain knowledge of the model parameters available to an established communication-less force-based controller. First, we find the closed-loop equilibrium configurations in the presence of the aforementioned uncertainties, and then we study their stability. Hence, we show the fundamental role played in the robustness of the load attitude control by the internal force induced in the manipulated object by non-vertical cables. Furthermore, we formally study the sensitivity of the attitude error to such parametric variations, and we provide a method to act on the load position error in the presence of the uncertainties. Eventually, we validate the results through an extensive set of numerical tests in a realistic simulation environment including underactuated aerial vehicles and sagging-prone cables, and through hardware experiments.
|
|
13:30-15:00, Paper TuBT23-NT.3 | Add to My Program |
Aerial Tensile Perching and Disentangling Mechanism for Long-Term Environmental Monitoring |
|
Lan, Tian | Technical University of Munich |
Romanello, Luca | TUM |
Kovac, Mirko | Imperial College London |
Armanini, Sophie Franziska | Technical University of Munich |
Kocer, Basaran Bahadir | Imperial College London |
Keywords: Aerial Systems: Applications, Mechanism Design, Environment Monitoring and Management
Abstract: Aerial robots show significant potential for forest canopy research and environmental monitoring by providing data collection capabilities at high spatial and temporal resolutions. However, limited flight endurance hinders their application. Inspired by natural perching behaviors, we propose a multi-modal aerial robot system that integrates tensile perching for energy conservation and a suspended actuated pod for data collection. The system consists of a quadrotor drone, a slewing ring mechanism allowing 360° tether rotation, and a streamlined pod with two ducted propellers connected via a tether. Winding and unwinding the tether allows the pod to move within the canopy, and activating the propellers allows the tether to be wrapped around branches for perching or disentangling. We experimentally determined the minimum counterweights required for stable perching under various conditions. Building on this, we devised and evaluated multiple perching and disentangling strategies. Comparisons of perching and disentangling maneuvers demonstrate energy savings that can be further increased by using the pod or tether winding: these approaches reduce energy consumption to only 22% and 1.5%, respectively, of that of a drone disentangling maneuver. We also calculated the minimum idle time required by the proposed system after perching and motor shutdown to save energy on a mission, which is 48.9% of the operating time. Overall, the integrated system expands the operational capabilities and enhances the energy efficiency of aerial robots for long-term monitoring tasks.
|
|
13:30-15:00, Paper TuBT23-NT.4 | Add to My Program |
Millimeter-Level Pick and Peg-In-Hole Task Achieved by Aerial Manipulator |
|
Wang, Meng | Beihang University |
Chen, Zeshuai | Beihang University |
Guo, Kexin | Beihang University |
Yu, Xiang | Beihang University |
Zhang, Youmin | Concordia University |
Guo, Lei | Beihang University |
Wang, Wei | China Aerospace Science and Technology Corporation, Beijing Inst |
Keywords: Aerial Systems: Applications, Dexterous Manipulation, Assembly, Disturbance Dissolution
Abstract: Achieving accurate control of the end-effector is critical for practical applications of aerial manipulators. However, due to the floating-base disturbance from the UAV platform and the kinematic error amplification introduced by the manipulator's multi-link structure, ensuring high-precision performance of an aerial manipulator is extremely challenging. Building on the philosophy of disturbance rejection, we propose a predictive optimization scheme that allows an aerial manipulator to successfully execute millimeter-level flying pick and peg-in-hole tasks. First, the error amplification effect of the floating base is quantitatively analyzed via the aerial manipulator kinematics. Intuitively, if the future motion of the UAV platform is well predicted, the manipulator can directly counteract the floating disturbance by following a modified reference trajectory. Hence, a learning-based prediction approach is leveraged to rapidly forecast the UAV platform motion online. Subsequently, an optimization controller is formulated to follow the reference trajectory while incorporating multiple practical constraints of the aerial manipulator.
|
|
13:30-15:00, Paper TuBT23-NT.5 | Add to My Program |
Lumped Drag Model Identification and Real-Time External Force Detection for Rotary-Wing Micro Aerial Vehicles |
|
Waelti, Lucas | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Aerial Systems: Applications, Environment Monitoring and Management, Calibration and Identification
Abstract: This work focuses on understanding and identifying the drag forces applied to a rotary-wing Micro Aerial Vehicle (MAV). We propose a lumped drag model that concisely describes the aerodynamic forces the MAV is subject to, with a minimal set of parameters. We only rely on commonly available sensor information onboard a MAV, such as accelerometer data, pose estimate, and throttle commands, which makes our method generally applicable. The identification uses an offline gradient-based method on flight data collected over specially designed trajectories. The identified model allows us to predict the aerodynamic forces experienced by the aircraft due to its own motion in real-time and, therefore, will be useful for distinguishing them from external perturbations, such as wind or physical contact with the environment. The results show that we are able to identify the drag coefficients of a rotary-wing MAV through onboard flight data and observe the close correlation between the motion of the MAV, the measured external forces, and the predicted drag forces.
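The identification idea can be illustrated on a single quadratic drag coefficient, where the least-squares fit is closed-form; the paper's lumped model has more parameters and uses an offline gradient-based method, so this is only a sketch with synthetic data:

```python
def fit_quadratic_drag(vels_mps, drag_accels):
    """Closed-form least-squares fit of k in a_drag = -k * v * |v|,
    given paired airspeed and drag-acceleration samples."""
    num = sum(-a * v * abs(v) for v, a in zip(vels_mps, drag_accels))
    den = sum((v * abs(v)) ** 2 for v in vels_mps)
    return num / den

# Synthetic flight samples generated with k = 0.3 (illustrative value,
# not the paper's identified coefficients).
k_true = 0.3
vels = [0.5, 1.0, 1.5, 2.0, -1.0, -2.0]
accels = [-k_true * v * abs(v) for v in vels]
k_hat = fit_quadratic_drag(vels, accels)
```

Once k is identified, subtracting the predicted drag from the measured external force leaves a residual attributable to wind or contact, which is the real-time detection use the abstract describes.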
|
|
13:30-15:00, Paper TuBT23-NT.6 | Add to My Program |
Flight Validation of a Global Singularity-Free Aerodynamic Model for Flight Control of Tail Sitters |
|
Murali, Krishna | ISAE-SUPAERO |
Ponce Moreno, Elena | ISAE-SUPAERO |
Lustosa, Leandro | ISAE-SUPAERO |
Keywords: Aerial Systems: Mechanics and Control
Abstract: This work validates through flight tests a previously developed wide-envelope singularity-free aerodynamic framework, called phi-theory, for modeling dual-engine tail-sitting flying-wing vehicles for optimization-based control. The phi-theory methodology imposes a specific geometry on aerodynamic coefficients that leads to polynomial differential equations of motion amenable to semidefinite programming optimization. Through phi-theory, we illustrate a typical predicted longitudinal and lateral flight envelope of a tail-sitting vehicle, which, while commonplace for fixed-wing aircraft in performance textbooks, is a novel figure that generalizes fixed-wing doghouse plots to tail-sitting vehicles. This flight envelope figure suggests a novel, natural and intuitive remote piloting interface that we validate in flight tests. Furthermore, we further validate phi-theory through the computation of flight features in simulation and their subsequent observation in flight tests.
|
|
13:30-15:00, Paper TuBT23-NT.7 | Add to My Program |
Empirical Study of Ground Proximity Effects for Small-Scale Electroaerodynamic Thrusters |
|
Nations, Grant | University of Utah |
Nelson, Charles Luke | University of Utah |
Drew, Daniel S. | University of Utah |
Keywords: Aerial Systems: Mechanics and Control, Micro/Nano Robots, Actuation and Joint Mechanisms
Abstract: Electroaerodynamic (EAD) propulsion, where thrust is produced by collisions between electrostatically-accelerated ions and neutral air, is a potentially transformative method for indoor flight owing to its silent and solid-state nature. Like rotors, EAD thrusters exhibit changes in performance based on proximity to surfaces. Unlike rotors, they have no fragile and quickly spinning parts that have to avoid those surfaces; taking advantage of the efficiency benefits from proximity effects may be a route towards longer-duration indoor operation of ion-propelled fliers. This work presents the first empirical study of ground proximity effects for EAD propulsors, both individually and as quad-thruster arrays. It focuses on multi-stage ducted centimeter-scale actuators suitable for use on small robots envisioned for deployment in human-proximal and indoor environments. Three specific effects (ground, suckdown, and fountain lift), each occurring with a different magnitude at a different spacing from the ground plane, are investigated and shown to have strong dependencies on geometric parameters including thruster-to-thruster spacing, thruster protrusion from the fuselage, and inclusion of flanges or strakes. Peak thrust enhancement ranging from 300 to 600% is found for certain configurations operated in close proximity (0.2 mm) to the ground plane and as much as a 20% increase is measured even when operated centimeters away.
|
|
13:30-15:00, Paper TuBT23-NT.8 | Add to My Program |
The Weighted Markov-Dubins Problem |
|
Kumar, Deepak Prakash | Texas A&M University |
Darbha, Swaroop | TAMU |
Manyam, Satyanarayana Gupta | Infoscitex Corp |
Casbeer, David | AFRL |
Keywords: Aerial Systems: Mechanics and Control, Motion and Path Planning
Abstract: In this letter, a variation of the classical Markov-Dubins problem is considered, which deals with curvature-constrained least-cost paths in a plane with prescribed initial and final configurations, different bounds for the sinistral and dextral curvatures, and penalties μL and μR for the sinistral and dextral turns, respectively. The addressed problem generalizes the classical Markov-Dubins problem and the asymmetric sinistral/dextral Markov-Dubins problem. The proposed formulation can be used to model an Unmanned Aerial Vehicle (UAV) with a penalty associated with a turn due to the additional thrust required to maintain altitude and airspeed while turning, or a UAV with different curvature bounds and costs for the sinistral and dextral turns due to hardware failures. Using optimal control theory, the main result of this letter shows that the optimal path belongs to a set of at most 21 candidate paths, each comprising at most five segments. Unlike in the classical Markov-Dubins problem, the CCC path, which is a candidate path for the classical Markov-Dubins problem, is not optimal for the weighted Markov-Dubins problem. Moreover, the obtained list of candidate paths for the weighted Markov-Dubins problem reduces to the standard CSC and CCC paths and the corresponding degenerate paths when μL and μR approach zero.
|
|
13:30-15:00, Paper TuBT23-NT.9 | Add to My Program |
The Price of a Safe Flight: Risk Cost Based Path Planning |
|
Pilko, Aliaksei | University of Southampton |
Oakey, Andy | University of Southampton |
Ferraro, Mario | University of Southampton |
Scanlan, James | University of Southampton |
Keywords: Aerial Systems: Mechanics and Control, Motion and Path Planning, Robot Safety
Abstract: A risk-aware UAS path planning methodology is proposed using monetary value as the sole cost metric. A third-party ground risk model is used to generate a non-uniform costmap for a modified A* heuristic search. The Value of a Prevented Fatality provides a basis to convert fatality risk to monetary terms as a Human Value at Risk (HVaR) measure. Additional operating and UAS Capital Value at Risk (CVaR) costs are modelled to provide a holistic monetary cost model for path cost minimisation. A number of future cost variants are investigated, based upon prior work, for a realistic urban-rural mixed logistics case study in Southern England. Results show increasingly risk-averse paths with decreasing future UAS operating costs.
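The planning step described above, an A* search over a costmap whose cells carry monetary traversal costs, can be illustrated with a minimal sketch (the grid, per-cell costs, and 4-connected moves are hypothetical values for illustration, not taken from the paper's risk model):

```python
import heapq

def a_star(costmap, start, goal):
    """A* over a grid where each cell carries a monetary traversal cost
    (e.g. operating cost plus risk cost). The heuristic is Manhattan
    distance scaled by the cheapest cell cost, which keeps it admissible."""
    rows, cols = len(costmap), len(costmap[0])
    cmin = min(min(row) for row in costmap)
    h = lambda p: cmin * (abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + costmap[nxt[0]][nxt[1]]  # pay the cost of the cell entered
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return float("inf"), []

# A high-cost (e.g. populated, high-HVaR) column in the middle of the grid
# pushes the cheapest path into a detour around it.
grid = [
    [1, 1, 9, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
]
cost, path = a_star(grid, (0, 0), (0, 4))  # cost 8, detours through row 2
```

Because the heuristic never overestimates the remaining cost, the first time the goal is popped the path is cost-optimal, and the expensive column is detoured around, in the same way the paper's risk-averse paths avoid high-risk ground.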
|
|
TuBT24-NT Oral Session, NT-G402 |
Add to My Program |
Multi-Robot SLAM |
|
|
Chair: Kim, Ayoung | Seoul National University |
Co-Chair: Beltrame, Giovanni | Ecole Polytechnique De Montreal |
|
13:30-15:00, Paper TuBT24-NT.1 | Add to My Program |
Tight Fusion of Odometry and Kinematic Constraints for Multiple Aerial Vehicles in Physical Interconnection |
|
Fan, Yingjun | Beijing Institute of Technology |
Shi, Chuanbeibei | University of Bristol |
Lai, Ganghua | Beijing Institute of Technology |
Zhang, Ruiheng | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Sun, Fuchun | Tsinghua University |
Dong, Yiqun | Nanyang Technological University |
Keywords: Multi-Robot SLAM, Aerial Systems: Perception and Autonomy, Visual-Inertial SLAM
Abstract: Integrated Aerial Platforms (IAPs), comprising multiple aircraft, are typically fully actuated and hold significant potential for aerial manipulation tasks. Differing from a conventional aerial swarm, the aircraft within an IAP are interconnected, presenting promising opportunities for enhancing localization. Incorporating the physical constraints of these interconnected aircraft to improve the accuracy and reliability of integrated aircraft positioning and navigation systems is a challenging yet highly significant problem. In this paper, we introduce a distributed multi-aircraft visual-inertial-range odometry system that analyzes the position, velocity, and attitude constraints within the IAP. Leveraging constraint relationships in the IAP, we propose corresponding methods that tightly fuse visual-inertial-range odometry and kinematic constraints to optimize odometry accuracy. Our system's performance is validated using a collected dataset, resulting in a notable 28.7% reduction in drift compared to the baseline.
|
|
13:30-15:00, Paper TuBT24-NT.2 | Add to My Program |
Robust Multi-Robot Global Localization with Unknown Initial Pose Based on Neighbor Constraints |
|
Zhang, Yaojie | Shenzhen Institute of Advanced Technology, Chinese Academy |
Luo, Haowen | Shenzhen Institute of Advanced Technology, Chinese Academy |
Wang, Weijun | Guangzhou Institute of Advanced Technology, Chinese Academy of Sc |
Feng, Wei | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: Multi-Robot SLAM, Localization, Mapping
Abstract: Multi-robot global localization (MR-GL) with unknown initial positions in a large-scale environment is a challenging task. The key difficulty is data association across different robots' viewpoints, which also renders traditional appearance-based localization methods unusable. Recently, researchers have utilized objects' semantic invariance to generate semantic graphs to address this issue. However, previous works lack robustness and are sensitive to the overlap rate of maps, resulting in unpredictable performance in real-world environments. In this paper, we propose a data association algorithm based on neighbor constraints to improve the robustness of the system. We demonstrate the effectiveness of our method on three different datasets, indicating a significant improvement in robustness compared to previous works.
|
|
13:30-15:00, Paper TuBT24-NT.3 | Add to My Program |
Swarm-SLAM: Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems |
|
Lajoie, Pierre-Yves | École Polytechnique De Montréal |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Keywords: Multi-Robot SLAM, Multi-Robot Systems, SLAM
Abstract: Collaborative Simultaneous Localization And Mapping (C-SLAM) is a vital component for successful multi-robot operations in environments without an external positioning system, such as indoors, underground or underwater. In this paper, we introduce Swarm-SLAM, an open-source C-SLAM system that is designed to be scalable, flexible, decentralized, and sparse, which are all key properties in swarm robotics. Our system supports lidar, stereo, and RGB-D sensing, and it includes a novel inter-robot loop closure prioritization technique that reduces communication and accelerates convergence. We evaluated our ROS 2 implementation on five different datasets, and in a real-world experiment with three robots communicating through an ad-hoc network. Our code is publicly available: https://github.com/MISTLab/Swarm-SLAM
|
|
13:30-15:00, Paper TuBT24-NT.4 | Add to My Program |
AutoFusion: Autonomous Visual Geolocation and Online Dense Reconstruction for UAV Cluster |
|
Zhang, Yizhu | Northwestern Polytechnical University |
Bu, Shuhui | Northwestern Polytechnical University |
Dong, Yifei | Northwestern Polytechnical University |
Yu, Zhang | Northwestern Polytechnical University |
Li, Kun | Northwestern Polytechnical University |
Chen, Lin | Northwestern Polytechnical University |
Keywords: Multi-Robot SLAM, SLAM, Mapping
Abstract: Real-time dense reconstruction using Unmanned Aerial Vehicles (UAVs) is becoming increasingly popular in large-scale rescue and environmental monitoring tasks. However, due to the energy constraints of a single UAV, efficiency can be greatly improved through the collaboration of multiple UAVs. Nevertheless, when faced with unknown environments or the loss of Global Navigation Satellite System (GNSS) signals, most multi-UAV SLAM systems cannot operate, making it difficult to construct a globally consistent map. In this paper, we propose a real-time dense reconstruction system called AutoFusion for multiple UAVs, which robustly supports scenarios with lost global positioning and weak co-visibility. We propose a Visual Geolocation and Matching Network (VGMN), constructed with a graph convolutional neural network as a feature extractor, which can acquire geographical location information solely from images. We also present a real-time dense reconstruction framework for multi-UAV systems with autonomous visual geolocation. UAV agents send images and relative positions to the ground server, which processes the data using VGMN for multi-agent geolocation optimization, including initialization, pose graph optimization, and map fusion. Extensive experiments demonstrate that our system can efficiently and stably construct large-scale dense maps in real-time with high accuracy and robustness.
|
|
13:30-15:00, Paper TuBT24-NT.5 | Add to My Program |
CoLRIO: LiDAR-Ranging-Inertial Centralized State Estimation for Robotic Swarms |
|
Zhong, Shipeng | Sun Yat-Sen University |
Chen, Hongbo | Sun Yat-Sen University |
Qi, Yuhua | Sun Yat-Sen University |
Feng, Dapeng | Sun Yat-Sen University |
Chen, Zhiqiang | Sun Yat-Sen University |
Jin, Wu | UESTC |
Wen, Weisong | Hong Kong Polytechnic University |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Multi-Robot SLAM, SLAM, Range Sensing
Abstract: Collaborative state estimation using heterogeneous multi-sensors is a fundamental prerequisite for robotic swarms operating in GPS-denied environments, presenting a formidable research challenge. In this work, we propose a centralized system designed to facilitate collaborative LiDAR-ranging-inertial state estimation in expansive environments, enabling robotic swarms to operate without the need for anchor deployment. The system optimally distributes computationally intensive tasks to a potent central server, thereby alleviating the computational burden on individual robots for local odometry calculations. The server back-end establishes a global reference by harnessing the shared data, refining the joint pose graph optimization through place recognition, global optimization, and the removal of redundant data to ensure precise and robust collaborative state estimation. Extensive evaluations of our system using both public and our custom datasets showcase notable improvements in the accuracy of collaborative SLAM estimates. Furthermore, our system demonstrates its competence in large-scale missions, where ten robots collaborate seamlessly in performing SLAM tasks. To benefit the community, we will open-source our code at https://github.com/PengYu-team/Co-LRIO.
|
|
13:30-15:00, Paper TuBT24-NT.6 | Add to My Program |
Relative Localization Estimation for Multiple Robots Via the Rotating Ultra-Wideband Tag |
|
Liu, Jinxin | Nanyang Technological University |
Hu, Guoqiang | Nanyang Technological University |
Keywords: Range Sensing, Multi-Robot Systems, Localization
Abstract: Most distributed algorithms for robot coordination require relative location information, but how to obtain relative locations in a distributed manner remains a primary problem in multi-robot applications. To obtain the relative locations between robots, whether they are in relative motion or stationary, we design a rotating ultra-wideband tag that provides the persistency of excitation condition, together with two estimation algorithms that estimate the relative locations in a distributed manner. Moreover, our approach relies only on on-board sensors and requires only one ultra-wideband tag per robot, eliminating the need for any ground anchors and thus allowing deployment in GNSS-denied environments without range restrictions. The proposed approach is also tested in simulations and experiments to verify the theoretical findings and its effectiveness in practice.
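To illustrate why a rotating tag makes range-only relative localization observable, the following sketch recovers a peer robot's 2D position via Gauss-Newton from ranges measured at different tag offsets (the lever-arm length, sample count, and positions are invented for illustration; the paper's distributed estimators are more involved):

```python
import numpy as np

def estimate_relative_position(samples, x0, iters=50):
    """Gauss-Newton on range-only measurements. Each sample pairs the
    rotating tag's 2D lever-arm offset with the measured range; the
    changing offset supplies the persistency of excitation that makes
    the peer's relative position observable from ranges alone."""
    x = np.asarray(x0, dtype=float)
    offsets = np.array([o for o, _ in samples])
    ranges = np.array([m for _, m in samples])
    for _ in range(iters):
        u = x + offsets                    # candidate antenna-to-antenna vectors
        d = np.linalg.norm(u, axis=1)
        r = d - ranges                     # range residuals
        J = u / d[:, None]                 # d(range)/d(x): unit direction vectors
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x

# Hypothetical setup: peer robot at (3, 2), tag rotating on a 0.2 m arm.
p_true = np.array([3.0, 2.0])
angles = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
samples = [(0.2 * np.array([np.cos(a), np.sin(a)]),
            np.linalg.norm(p_true + 0.2 * np.array([np.cos(a), np.sin(a)])))
           for a in angles]
x_est = estimate_relative_position(samples, x0=(2.5, 1.5))
```

With a static tag (all offsets identical) the Jacobian rows would be identical and the normal matrix singular; the rotation is exactly what restores full rank.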
|
|
13:30-15:00, Paper TuBT24-NT.7 | Add to My Program |
Asynchronous Multiple LiDAR-Inertial Odometry Using Point-Wise Inter-LiDAR Uncertainty Propagation |
|
Jung, Minwoo | Seoul National University |
Jung, Sangwoo | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Range Sensing, SLAM, Mapping
Abstract: In recent years, multiple Light Detection and Ranging (LiDAR) systems have grown in popularity due to their enhanced accuracy and stability from the increased field of view (FOV). However, integrating multiple LiDARs can be challenging, attributable to temporal and spatial discrepancies. Common practice is to transform points among sensors while requiring strict time synchronization or approximating transformation among sensor frames. Unlike existing methods, we elaborate the inter-sensor transformation using continuous time (CT) inertial measurement unit (IMU) modeling and derive associated ambiguity as a point-wise uncertainty. This uncertainty, modeled by combining the state covariance with the acquisition time and point range, allows us to alleviate the strict time synchronization and to overcome FOV difference. The proposed method has been validated on both public and our datasets and is compatible with various LiDAR manufacturers and scanning patterns. We open-source the code for public access at https://github.com/minwoo0611/MA-LIO.
|
|
13:30-15:00, Paper TuBT24-NT.8 | Add to My Program |
AutoMerge: A Framework for Map Assembling and Smoothing in City-Scale Environments |
|
Yin, Peng | City University of Hong Kong |
Zhao, Shiqi | University of California San Diego |
Lai, Haowen | University of Pennsylvania |
Ge, Ruohai | Carnegie Mellon University |
Zhang, Ji | Carnegie Mellon University |
Choset, Howie | CMU |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Map Merging, Mapping, Multi-Robot Systems, SLAM
Abstract: In the era of advancing autonomous driving and increasing reliance on geospatial information, high-precision mapping demands not only accuracy but also flexible construction. Current approaches mainly rely on expensive mapping devices, which are time-consuming for city-scale map construction and vulnerable to erroneous data associations without accurate GPS assistance. We present AutoMerge, a novel framework for merging large-scale maps that surpasses these limitations, which (i) provides robust place recognition performance despite differences in both translation and viewpoint, (ii) is capable of identifying and discarding incorrect loop closures caused by perceptual aliasing, and (iii) effectively associates and optimizes large-scale and numerous map segments in real-world scenarios. AutoMerge utilizes multi-perspective fusion and adaptive loop closure detection for accurate data associations, and it uses incremental merging to assemble large maps from individual trajectory segments given in random order and with no initial estimations. Furthermore, AutoMerge performs pose-graph optimization after assembling the segments to smooth the merged map globally. We demonstrate AutoMerge
|
|
TuBT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization II |
|
|
Chair: Civera, Javier | Universidad De Zaragoza |
Co-Chair: Milford, Michael J | Queensland University of Technology |
|
13:30-15:00, Paper TuBT25-NT.1 | Add to My Program |
TP3M: Transformer-Based Pseudo 3D Image Matching with Reference Image |
|
Han, Liming | China Unicom |
Liu, Zhaoxiang | China Unicom |
Lian, Shiguo | China Unicom |
Keywords: Localization, SLAM
Abstract: Image matching remains challenging in scenes with large viewpoint or illumination changes, or with low texture. In this paper, we propose a Transformer-based pseudo-3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image, and matches them to the 2D features extracted from the destination image by coarse-to-fine 3D matching. Our key discovery is that by introducing the reference image, the source image's fine points are screened and their feature descriptors are further enriched from 2D to 3D, which improves matching performance with the destination image. Experimental results on multiple datasets show that the proposed method achieves state-of-the-art performance on the tasks of homography estimation, pose estimation and visual localization, especially in challenging scenes.
|
|
13:30-15:00, Paper TuBT25-NT.2 | Add to My Program |
Adaptive Outlier Thresholding for Bundle Adjustment in Visual SLAM |
|
Fontan, Alejandro | Queensland University of Technology |
Civera, Javier | Universidad De Zaragoza |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization, SLAM
Abstract: State-of-the-art V-SLAM pipelines utilize robust cost functions and outlier rejection techniques to remove incorrect correspondences. However, these methods are typically fine-tuned to overfit certain benchmarks and struggle to adapt effectively to changes in the application domain or environmental conditions. This renders them impractical for many robotic applications in which robustness in a wide variety of conditions is essential. In this paper we introduce a novel distribution-based approach for online outlier rejection that reduces the necessity for scene-specific fine-tuning while simultaneously improving the overall SLAM performance. Through experiments across 3 different public datasets, we show that our approach consistently outperforms state-of-the-art methods in various real-world settings. Our code is available at https://github.com/alejandrofontan/ORB_SLAM2_Distribution
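As a generic sketch of the distribution-based idea (not the paper's actual estimator), the rejection threshold below is derived from a quantile of the residuals currently observed rather than from a fixed, benchmark-tuned constant; the quantile and inflation factor are hypothetical:

```python
import numpy as np

def adaptive_threshold(residuals, quantile=0.9, inflate=1.5):
    """Distribution-based outlier cutoff: the threshold adapts to the
    residual distribution actually observed in the current scene,
    instead of relying on a fixed, pre-tuned constant."""
    return inflate * np.quantile(residuals, quantile)

rng = np.random.default_rng(0)
inliers = np.abs(rng.normal(0.0, 1.0, 1000))   # well-behaved reprojection errors
outliers = rng.uniform(20.0, 40.0, 20)          # gross mismatches
residuals = np.concatenate([inliers, outliers])

t = adaptive_threshold(residuals, quantile=0.9)
kept = residuals[residuals <= t]                # all gross mismatches rejected
```

If the scene changes and the inlier noise level doubles, the quantile (and hence the cutoff) moves with it, which is the adaptivity a fixed chi-square-style threshold lacks.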
|
|
13:30-15:00, Paper TuBT25-NT.3 | Add to My Program |
From Satellite to Ground: Satellite Assisted Visual Localization with Cross-View Semantic Matching |
|
Guo, Xiyue | Zhejiang University |
Peng, Haocheng | Zhejiang University |
Hu, Junjie | The Chinese University of Hong Kong, Shenzhen |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Localization, SLAM
Abstract: One of the key challenges of visual Simultaneous Localization and Mapping (SLAM) in large-scale environments is how to effectively use global localization to correct the cumulative errors from long-term tracking. This challenge presents itself in two main aspects: first, the difficulty for robots in revisiting previous locations to perform loop closure, and second, the considerable memory resources required to maintain point-cloud-based global maps. Recent solutions have resorted to neural networks, using satellite images as references for ground-level localization. However, most of these methods merely provide cross-view patch-matching results, which makes integration with a SLAM system infeasible. To address these issues, we present a semantic-based cross-view localization method. This approach combines semantic information with a reward and penalty mechanism, enabling us to obtain a global probability map and achieve precise 3-degree-of-freedom (3-DoF) localization. Based on that, we develop a SLAM system that capitalizes on satellite imagery for global localization. This strategy effectively bridges the gap between SLAM and real-world coordinates while also substantially reducing accumulated errors. Our experimental results demonstrate that our global localization method significantly outperforms existing satellite-based systems. Moreover, in scenarios where the robot struggles to find loop closures, employing our localization method improves SLAM accuracy.
|
|
13:30-15:00, Paper TuBT25-NT.4 | Add to My Program |
Self-Supervised Learning of Monocular Visual Odometry and Depth with Uncertainty-Aware Scale Consistency |
|
Wang, Changhao | Northwestern Polytechnical University |
Zhang, Guanwen | Northwestern Polytechnical University |
Zhou, Wei | Northwestern Polytechnical University |
Keywords: Localization, SLAM
Abstract: The inherent scale ambiguity issue greatly limits the performance of monocular visual odometry. In recent years, a variety of methods have been proposed for self-supervised learning of ego-motion and depth estimation, incorporating specifically designed scale-consistency constraints that utilize estimated depth as a reference. However, these existing methods neglect the influence of the depth uncertainty introduced by the dominant photometric loss, which leads to unreliable depth estimation in difficult regions and detrimentally affects scale alignment. To solve these problems, we introduce a feature-based visual odometry learning system with an effective scale recovery strategy in this paper. Additionally, we propose a learning method to estimate the photometric-sensitive depth uncertainty for guiding the scale recovery. The proposed method is evaluated on KITTI odometry, and the experimental results demonstrate that our system can predict scale-consistent trajectories from monocular videos and achieves state-of-the-art performance. Moreover, the proposed method achieves competitive performance on KITTI depth estimation.
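The role of uncertainty in scale recovery can be illustrated with a weighted least-squares alignment between a monocular depth prediction and a reference (a generic sketch on invented data; the paper's photometric-sensitive uncertainty is learned, not hand-set as here):

```python
import numpy as np

def recover_scale(pred_depth, ref_depth, uncertainty):
    """Weighted least-squares scale s minimizing sum w*(s*pred - ref)^2,
    with w = 1/uncertainty^2 so unreliable depths count less."""
    w = 1.0 / (uncertainty ** 2 + 1e-12)
    return np.sum(w * pred_depth * ref_depth) / np.sum(w * pred_depth ** 2)

rng = np.random.default_rng(1)
true_depth = rng.uniform(1.0, 10.0, 500)
pred = true_depth / 4.0                     # monocular prediction, off by scale 4
sigma = np.full(500, 0.1)
sigma[:50] = 5.0                            # low-confidence (e.g. textureless) region
pred[:50] += rng.normal(0.0, 3.0, 50)       # which is indeed corrupted

s = recover_scale(pred, true_depth, sigma)  # close to the true scale of 4
```

Without the weights, the corrupted region would bias the recovered scale; with them, the 10% unreliable pixels contribute almost nothing, which is the intuition behind uncertainty-guided scale alignment.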
|
|
13:30-15:00, Paper TuBT25-NT.5 | Add to My Program |
Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments |
|
García-Hernández, Alberto | Universidad De Zaragoza |
Giubilato, Riccardo | German Aerospace Center (DLR) |
Strobl, Klaus H. | German Aerospace Center (DLR) |
Civera, Javier | Universidad De Zaragoza |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Localization, SLAM, Mapping
Abstract: Perceptual aliasing and weak textures pose significant challenges to the task of place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, called UMF (standing for Unifying Local and Global Multimodal Features), that 1) leverages multi-modality through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that uses local feature matching to re-order the top-k candidates retrieved with a global representation. Our experiments, particularly on sequences captured in a planetary-analogue environment, show that UMF significantly outperforms previous baselines in these challenging, aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also explore its performance on the widely used RobotCar dataset for broader applicability. Code and models are available at https://github.com/DLR-RM/UMF.
|
|
13:30-15:00, Paper TuBT25-NT.6 | Add to My Program |
RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments |
|
Chen, Zhiqiang | Sun Yat-Sen University |
Chen, Hongbo | Sun Yat-Sen University |
Qi, Yuhua | Sun Yat-Sen University |
Zhong, Shipeng | Sun Yat-Sen University |
Feng, Dapeng | Sun Yat-Sen University |
Jin, Wu | UESTC |
Wen, Weisong | Hong Kong Polytechnic University |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Localization, SLAM, Range Sensing
Abstract: LiDAR-based localization is valuable for applications like mining surveys and underground facility maintenance. However, existing methods can struggle when dealing with uninformative geometric structures in challenging scenarios. This paper presents RELEAD, a LiDAR-centric solution designed to address scan-matching degradation. Our method enables degeneracy-free point cloud registration by solving constrained ESIKF updates in the front end and incorporates multisensor constraints, even when dealing with outlier measurements, through graph optimization based on Graduated Non-Convexity (GNC). Additionally, we propose a robust Incremental Fixed Lag Smoother (rIFL) for efficient GNC-based optimization. RELEAD has undergone extensive evaluation in degenerate scenarios and has outperformed existing state-of-the-art LiDAR-Inertial odometry and LiDAR-Visual-Inertial odometry methods.
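Graduated Non-Convexity (GNC) can be sketched in one dimension, here as a robust mean with a Geman-McClure cost rather than the paper's multisensor graph optimization (all constants are illustrative): the surrogate cost starts near-convex (large mu) and is annealed toward the robust cost, re-solving a weighted least-squares problem at each step.

```python
import numpy as np

def gnc_gm_mean(z, c=1.0):
    """Robust scalar estimate via GNC with a Geman-McClure cost.
    mu starts large (near-convex surrogate) and anneals toward 1,
    downweighting outliers gradually instead of needing a good init."""
    x = np.mean(z)                               # naive initialization
    r = z - x
    mu = max(2.0 * np.max(r ** 2) / c ** 2, 1.0)
    for _ in range(100):
        w = (mu * c ** 2 / (r ** 2 + mu * c ** 2)) ** 2   # GNC-GM weights in [0,1]
        x = np.sum(w * z) / np.sum(w)                     # weighted least squares
        r = z - x
        if mu <= 1.0:
            break
        mu = max(mu / 1.4, 1.0)                           # anneal the surrogate
    return x

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(5.0, 0.1, 80),   # inlier measurements
                    rng.normal(50.0, 1.0, 20)]) # gross outliers
x_robust = gnc_gm_mean(z, c=0.5)                # near 5.0; plain mean is ~14
```

The same weight-then-re-solve loop applies when the unknowns are poses and the residuals come from odometry factors; the scalar case just makes the annealing behavior easy to see.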
|
|
13:30-15:00, Paper TuBT25-NT.7 | Add to My Program |
Semantic-Focused Patch Tokenizer with Multi-Branch Mixer for Visual Place Recognition |
|
Xu, Zhenyu | CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems |
Ziliang, Ren | Dongguan University of Technology |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Jie, Lou | China Nuclear Power Operations Co., Ltd |
Tao, Dacheng | The University of Sydney |
Cheng, Jun | Shenzhen Institutes of Advanced Technology |
Keywords: Localization, SLAM, Recognition
Abstract: Visual Place Recognition (VPR) is critical for navigation and loop closure in autonomous driving tasks, mitigating the impact of shift errors caused by dynamic changes in the environment. Due to the limited ability of backbone networks and extreme environmental changes, current methods fail to capture the foundational semantic details that carry the distinctive attributes needed for unique place identification. To address this problem, we propose a new visual token-guided VPR framework that contains a semantic-focused patch tokenizer and a multi-branch Mixer. To mitigate interference from place-unrelated objects, the semantic-focused patch tokenizer exploits attention-based channel selection and spatial partitioning, which efficiently captures important semantic information within the channels and preserves spatial relationships among the backbone features. To extract abstract features with spatial structure information, the multi-branch Mixer utilizes a multi-branch structure to aggregate local and global position information, improving the robustness of global representations to environmental changes. Experimental results demonstrate that our method outperforms state-of-the-art methods, achieving 85.3% Recall@1 on the MSLS_val dataset and 59.1% Recall@1 on the Nordland dataset when using ResNet18 as the backbone.
|
|
13:30-15:00, Paper TuBT25-NT.8 | Add to My Program |
FF-LINS: A Consistent Frame-To-Frame Solid-State-LiDAR-Inertial State Estimator |
|
Tang, Hailiang | Wuhan University |
Zhang, Tisheng | Wuhan University |
Niu, Xiaoji | Wuhan University |
Wang, Liqiang | Wuhan University |
Wei, Linfu | Wuhan University |
Jingnan, Liu | Wuhan University |
Keywords: Localization, SLAM, Visual-Inertial SLAM
Abstract: Most existing LiDAR-inertial navigation systems are based on frame-to-map registration, leading to inconsistency in state estimation. The newest solid-state LiDARs with non-repetitive scanning patterns make it possible to achieve a consistent LiDAR-inertial estimator by employing frame-to-frame data association. In this letter, we propose a robust and consistent frame-to-frame LiDAR-inertial navigation system (FF-LINS) for solid-state LiDARs. With INS-centric LiDAR frame processing, the keyframe point-cloud map is built from the accumulated point clouds to construct the frame-to-frame data association. The LiDAR frame-to-frame and inertial measurement unit (IMU) preintegration measurements are tightly integrated using factor graph optimization, with online calibration of the LiDAR-IMU extrinsic and time-delay parameters. Experiments on public and private datasets demonstrate that the proposed FF-LINS achieves superior accuracy and robustness compared to state-of-the-art systems. Furthermore, the LiDAR-IMU extrinsic and time-delay parameters are estimated effectively, and the online calibration notably improves the pose accuracy.
|
|
13:30-15:00, Paper TuBT25-NT.9 | Add to My Program |
VioLA: Aligning Videos to 2D LiDAR Scans |
|
Chao, Jun-Jee | University of Minnesota |
Engin, Kazim Selim | Samsung Research America |
Chavan-Dafle, Nikhil | Samsung Research America |
Lee, Bhoram | SRI International |
Isler, Volkan | University of Minnesota |
Keywords: Localization, View Planning for SLAM, RGB-D Perception
Abstract: We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed point completion module improves the pose registration performance by up to 20%.
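Two of the steps above, slicing the reconstruction at a fixed height and registering the slice to a 2D map, can be sketched as follows (a toy example with known correspondences; VioLA's actual registration must also establish the matches, and the cloud, height, and pose are invented for illustration):

```python
import numpy as np

def slice_at_height(points, z0, tol=0.1):
    """Keep 3D points within a band around height z0 and drop z,
    yielding a 2D slice comparable to a 2D LiDAR scan."""
    mask = np.abs(points[:, 2] - z0) < tol
    return points[mask, :2]

def align_2d(src, dst):
    """Closed-form 2D rigid alignment (Kabsch/Procrustes) between
    corresponded point sets; real pipelines use e.g. ICP to find matches."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dc - R @ sc
    return R, t

rng = np.random.default_rng(3)
cloud = rng.uniform(-5.0, 5.0, (2000, 3))       # stand-in reconstructed scene
sl = slice_at_height(cloud, z0=1.0)

theta = 0.3                                      # ground-truth pose of the slice
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
scan = sl @ R_true.T + t_true                    # the "LiDAR map" view of the slice
R, t = align_2d(sl, scan)                        # recovers R_true, t_true
```

The recovered (R, t) is exactly the camera-to-map pose the paper seeks; the inpainting and depth-completion stages exist to make the slice dense enough for this registration to succeed.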
|
|
TuBT26-NT Oral Session, NT-G404 |
Add to My Program |
Mapping I |
|
|
Chair: Tan, U-Xuan | Singapore University of Techonlogy and Design |
|
13:30-15:00, Paper TuBT26-NT.1 | Add to My Program |
Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps |
|
Luo, Katie | Cornell University |
Weng, Xinshuo | NVIDIA Corporation |
Wang, Yan | NVIDIA |
Wu, Shuang | Nvidia |
Li, Jie | Toyota Research Institute |
Weinberger, Kilian | Cornell University |
Wang, Yue | USC |
Pavone, Marco | Stanford University |
Keywords: Mapping, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: Autonomous driving has traditionally relied heavily on costly and labor-intensive High Definition (HD) maps, hindering scalability. In contrast, Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative. In this work, we systematically explore the effect of SD maps for real-time lane-topology understanding. We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SMERF, to leverage priors in SD maps for the lane-topology prediction task. This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods without bells and whistles and can be immediately incorporated into any Transformer-based lane-topology method. Code is available at https://github.com/NVlabs/SMERF.
|
|
13:30-15:00, Paper TuBT26-NT.2 | Add to My Program |
3QFP: Efficient Neural Implicit Surface Reconstruction Using Tri-Quadtrees and Fourier Feature Positional Encoding |
|
Sun, Shuo | Orebro University |
Mielle, Malcolm | Schindler |
Lilienthal, Achim J. | Orebro University |
Magnusson, Martin | Örebro University |
Keywords: Mapping, Deep Learning for Visual Perception, SLAM
Abstract: Neural implicit surface representations are currently receiving a lot of interest as a means to achieve high-fidelity surface reconstruction at a low memory cost, compared to traditional explicit representations. However, state-of-the-art methods still struggle with excessive memory usage and non-smooth surfaces. This is particularly problematic in large-scale applications with sparse inputs, as is common in robotics use cases. To address these issues, we first introduce a sparse structure, tri-quadtrees, which represents the environment using learnable features stored in three planar quadtree projections. Second, we concatenate the learnable features with a Fourier feature positional encoding. The combined features are then decoded into signed distance values through a small multi-layer perceptron. We demonstrate that this approach facilitates smoother reconstruction with a higher completion ratio and fewer holes. Compared to two recent baselines, one implicit and one explicit, our approach requires only 10%–50% as much memory while achieving competitive quality.
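The Fourier feature positional encoding mentioned above can be sketched in a few lines (the band count and the [0,1] coordinate range are illustrative choices, not the paper's settings):

```python
import numpy as np

def fourier_features(x, num_bands=6):
    """Map coordinates x in [0,1]^d to [sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0..num_bands-1, giving a small MLP access to high-frequency
    detail that raw low-dimensional coordinates cannot express."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi     # (num_bands,)
    ang = x[..., None] * freqs                      # (..., d, num_bands)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)           # (..., d * 2 * num_bands)

pts = np.random.default_rng(4).uniform(0.0, 1.0, (8, 3))  # 3D query points
feat = fourier_features(pts, num_bands=6)                  # shape (8, 36)
```

In a pipeline like the one described, this encoding would be concatenated with the learnable quadtree features before the signed-distance MLP.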
|
|
13:30-15:00, Paper TuBT26-NT.3 | Add to My Program |
Towards Large-Scale Incremental Dense Mapping Using Robot-Centric Implicit Neural Representation |
|
Liu, Jianheng | Harbin Institute of Technology Shenzhen, P.R. China |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Mapping, Deep Learning Methods, Range Sensing
Abstract: Large-scale dense mapping is vital in robotics, digital twins, and virtual reality. Recently, implicit neural mapping has shown remarkable reconstruction quality. However, incremental large-scale mapping with implicit neural representations remains problematic due to low efficiency, limited video memory, and the catastrophic forgetting phenomenon. To counter these challenges, we introduce the Robot-centric Implicit Mapping (RIM) technique for large-scale incremental dense mapping. This method employs a hybrid representation, encoding shapes with implicit features via a multi-resolution voxel map and decoding signed distance fields through a shallow MLP. We advocate for a robot-centric local map to boost model training efficiency and curb the catastrophic forgetting issue. A decoupled scalable global map is further developed to archive learned features for reuse and maintain constant video memory consumption. Validation experiments demonstrate our method's exceptional quality, efficiency, and adaptability across diverse scales and scenes over advanced dense mapping methods using range sensors. Our system's code will be accessible at https://github.com/HITSZ-NRSL/RIM.git.
|
|
13:30-15:00, Paper TuBT26-NT.4 | Add to My Program |
Camera Relocalization in Shadow-Free Neural Radiance Fields |
|
Xu, Shiyao | Institute for AI Industry Research, Tsinghua University |
Liu, Caiyun | Institute for AI Industry Research, Tsinghua University |
Chen, Yuantao | Xi'an University of Architecture and Technology |
Zhu, Zhenxin | Beihang University |
Yan, Zike | Tsinghua University, Peking University |
Shi, Yongliang | Tsinghua University |
Zhao, Hao | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Keywords: Mapping, Localization, Recognition
Abstract: Camera relocalization is a crucial problem in computer vision and robotics. Recent advancements in neural radiance fields (NeRFs) have shown promise in synthesizing photo-realistic images. Several works have utilized NeRFs for refining camera poses, but they do not account for lighting changes that can affect scene appearance and shadow regions, causing a degraded pose optimization process. In this paper, we propose a two-staged pipeline that normalizes images with varying lighting and shadow conditions to improve camera relocalization. We implement our scene representation upon a hash-encoded NeRF, which significantly speeds up the pose optimization process. To account for the noisy image gradient computation problem in grid-based NeRFs, we further propose a re-devised truncated dynamic low-pass filter (TDLF) and a numerical gradient averaging technique to smooth the process. Experimental results on several datasets with varying lighting conditions demonstrate that our method achieves state-of-the-art results in camera relocalization under varying lighting conditions. Code and data will be made publicly available.
|
|
13:30-15:00, Paper TuBT26-NT.5 | Add to My Program |
QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds |
|
Wu, Ji | Wuhan University |
Yu, Huai | Wuhan University |
Yang, Wen | Wuhan University |
Xia, Gui-Song | Wuhan University |
Keywords: Mapping, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: This paper presents a novel framework to learn a concise geometric primitive representation for 3D point clouds. Different from representing each type of primitive individually, we focus on the challenging problem of how to achieve a concise and uniform representation robustly. We employ quadrics to represent diverse primitives with only 10 parameters and propose the first end-to-end learning-based framework, namely QuadricsNet, to parse quadrics in point clouds. The relationships between the quadrics' mathematical formulation and geometric attributes, including the type, scale and pose, are insightfully integrated for effective supervision of QuadricsNet. Besides, a novel pattern-comprehensive dataset with quadrics segments and objects is collected for training and evaluation. Experiments demonstrate the effectiveness of our concise representation and the robustness of QuadricsNet. Our code is available at https://github.com/MichaelWu99-lab/QuadricsNet.
|
|
13:30-15:00, Paper TuBT26-NT.6 | Add to My Program |
ERASOR++: Height Coding Plus Egocentric Ratio Based Dynamic Object Removal for Static Point Cloud Mapping |
|
Zhang, Jiabao | Zhejiang University |
Zhang, Yu | Zhejiang University |
Keywords: Mapping, Range Sensing
Abstract: Mapping plays a crucial role in localization and navigation within automatic systems. However, the presence of dynamic objects in 3D point cloud maps generated from scan sensors can introduce map distortion and long traces, thereby posing challenges for accurate mapping and navigation. To address this issue, we propose ERASOR++, an enhanced approach based on the Egocentric Ratio of Pseudo Occupancy for effective dynamic object removal. To begin, we introduce the Height Coding Descriptor, which combines height difference and height layer information to encode the point cloud. Subsequently, we propose the Height Stack Test, Ground Layer Test, and Surrounding Point Test methods to precisely and efficiently identify the dynamic bins within the point cloud, thus overcoming the limitations of prior approaches. Through extensive evaluation on open-source datasets, our approach demonstrates superior performance in terms of precision and efficiency compared to existing methods. Furthermore, the techniques described in our work hold promise for addressing various challenging tasks through subsequent adaptation.
|
|
13:30-15:00, Paper TuBT26-NT.7 | Add to My Program |
H2-Mapping: Real-Time Dense Mapping Using Hierarchical Hybrid Representation |
|
Jiang, Chenxing | The Hong Kong University of Science and Technology |
Zhang, Hanwen | Sun Yat-Sen University |
Liu, Peize | The Hong Kong University of Science and Technology, Robotic Inst |
Yu, Zehuan | Hong Kong University of Science and Technology |
Cheng, Hui | Sun Yat-Sen University |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Mapping, RGB-D Perception, Visual Learning
Abstract: Constructing a high-quality dense map in real-time is essential for robotics, AR/VR, and digital twin applications. As Neural Radiance Fields (NeRFs) greatly improve mapping performance, in this paper, we propose a NeRF-based mapping method that enables higher-quality reconstruction and real-time capability even on edge computers. Specifically, we propose a novel hierarchical hybrid representation that leverages implicit multiresolution hash encoding aided by explicit octree SDF priors, describing the scene at different levels of detail. This representation allows for fast scene geometry initialization and makes scene geometry easier to learn. Besides, we present a coverage-maximizing keyframe selection strategy to address the forgetting issue and enhance mapping quality, particularly in marginal areas. To the best of our knowledge, our method is the first to achieve high-quality NeRF-based mapping on edge computers of handheld devices and quadrotors in real-time. Experiments demonstrate that our method outperforms existing NeRF-based mapping methods in geometry accuracy, texture realism, and time consumption.
|
|
13:30-15:00, Paper TuBT26-NT.8 | Add to My Program |
Uncertainty-Aware 3D Object-Level Mapping with Deep Shape Priors |
|
Liao, Ziwei | University of Toronto |
Yang, Jun | University of Toronto |
Qian, Jingxing | University of Toronto |
Schoellig, Angela P. | TU Munich |
Waslander, Steven Lake | University of Toronto |
Keywords: Mapping, Semantic Scene Understanding, RGB-D Perception
Abstract: 3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. We propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea is to leverage a learnt generative model for a category of object shapes as priors and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the estimated shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving substantial improvements over state-of-the-art methods. Our code is available at https://github.com/TRAILab/UncertainShapePose.
|
|
13:30-15:00, Paper TuBT26-NT.9 | Add to My Program |
RoboHop: Segment-Based Topological Map Representation for Open-World Visual Navigation |
|
Garg, Sourav | University of Adelaide |
Rana, Krishan | Queensland University of Technology |
Hosseinzadeh, Mehdi | The Australian Institute for Machine Learning (AIML) -- the Univ |
Mares, Lachlan | University of Adelaide |
Sünderhauf, Niko | Queensland University of Technology |
Dayoub, Feras | The University of Adelaide |
Reid, Ian | University of Adelaide |
Keywords: Mapping, Vision-Based Navigation, Semantic Scene Understanding
Abstract: Mapping is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on `image segments', which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a `continuous sense of a place', defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of `hops over segments' and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during mapping and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level `hopping' based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/.
|
|
TuBT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping II |
|
|
Chair: Lan, Xuguang | Xi'an Jiaotong University |
Co-Chair: Maeda, Yusuke | Yokohama National University |
|
13:30-15:00, Paper TuBT27-NT.1 | Add to My Program |
Grasp Manipulation Relationship Detection Based on Graph Sample and Aggregation |
|
Luo, Jiayuan | Xi'an Jiaotong University |
Liu, YaXin | Xi'an Jiaotong University |
Wang, Han | Xi'an Jiaotong University |
Ding, Mengyuan | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Grasping, Manipulation Planning
Abstract: In multi-object stacking scenarios, exploring the relationships among objects and determining the correct sequence of operations are crucial for robotic manipulation. However, previous algorithms inefficiently combine global and local information, often focusing solely on the local features of objects or the interactions of object features at a global level. This approach leads to an imbalanced distribution of features and the generation of redundant or missing relationships in complex scenes, such as multi-object stacking and partial occlusion. To address this issue, we have developed a grasp manipulation relationship detection algorithm called Graph Sampling Aggregation Network for Visual Manipulation Relationship Detection (GSAGED). This algorithm assists robots in detecting targets in complex scenes and determining the appropriate grasping order. Firstly, the Positional Encoding Module in GSAGED enhances object feature information by considering global contexts. Secondly, the Graph Sampling Aggregation method effectively integrates global and local information, alleviating the imbalanced distribution of features. Finally, we applied the developed algorithm to a physical robot for grasping. Experimental results on the Visual Manipulation Relationship Dataset (VMRD) and the large-scale relational grasp dataset named REGRAD demonstrate that our method significantly improves the accuracy of relationship detection in complex scenes and exhibits robust generalization capabilities in real-world applications.
|
|
13:30-15:00, Paper TuBT27-NT.2 | Add to My Program |
Acoustic Soft Tactile Skin (AST Skin) |
|
Rajendran, S. Vishnu | University of Lincoln |
Mandil, Willow | University of Lincoln |
Nazari, Kiyanoush | University of Lincoln |
Parsons, Simon | University of Lincoln |
Ghalamzan Esfahani, Amir Masoud | University of Surrey |
Keywords: In-Hand Manipulation, Force and Tactile Sensing, Force Control
Abstract: This paper presents a novel acoustic soft tactile (AST) skin technology operating with sound waves. In this innovative approach, the sound waves generated by a speaker travel in channels embedded in a soft membrane, get modulated by the deformation of the channel when pressed by an external force, and are received by a microphone at the end of the channel. The sensor leverages regression and classification methods for estimating the normal force and its contact location. Our sensor can be affixed to any robot part, e.g., end effectors or arm. We tested several regression and classifier methods to learn the relation between sound wave modulation, the applied force, and its location, and picked the best-performing models for force and location predictions. The best skin configurations yield more than 93% of force estimates within a ±1.5 N tolerance over a 0-30 N range, and contact locations with over 96% accuracy. We also demonstrated the performance of AST Skin technology for a real-time gripping force control application.
|
|
13:30-15:00, Paper TuBT27-NT.3 | Add to My Program |
Domain Randomization for Sim2real Transfer of Automatically Generated Grasping Datasets |
|
Huber, Johann | ISIR, Sorbonne Université |
Hélénon, François | Sorbonne Université |
Watrelot, Hippolyte Christian Sébastien | Sorbonne Université ISIR |
Ben Amar, Faiz | Université Pierre Et Marie Curie, Paris 6 |
Doncieux, Stéphane | Sorbonne University |
Keywords: Grasping, Evolutionary Robotics, Data Sets for Robot Learning
Abstract: Robotic grasping refers to making a robotic system pick an object by applying forces and torques on its surface. Many recent studies use data-driven approaches to address grasping, but the sparse reward nature of this task makes the learning process challenging to bootstrap. To avoid constraining the operational space, an increasing number of works propose grasping datasets to learn from. But most of them are limited to simulations. The present paper investigates how automatically generated grasps can be exploited in the real world. More than 7000 reach-and-grasp trajectories have been generated with Quality-Diversity (QD) methods on 3 different arms and grippers, including parallel fingers and a dexterous hand, and tested in the real world. Analysis of the collected measurements shows correlations between several Domain Randomization-based quality criteria and sim-to-real transferability. Key challenges regarding the reality gap for grasping have been identified, stressing matters on which researchers on grasping should focus in the future. A QD approach has finally been proposed for making grasps more robust to domain randomization, resulting in a transfer ratio of 84% on the Franka Research 3 arm.
|
|
13:30-15:00, Paper TuBT27-NT.4 | Add to My Program |
Kinematic Synergy Primitives for Human-Like Grasp Motion Generation |
|
Starke, Julia | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Grasping, Multifingered Hands, Human and Humanoid Motion Analysis and Synthesis
Abstract: Grasping with five-fingered humanoid hands is a complex control problem. Throughout the entire grasping motion, all finger joints need to be coordinated to achieve a stable grasp. Grasp synergies provide a simplified, low-dimensional representation of grasp postures and motions that can be used for the description of human grasps as well as the generation of novel, human-like grasps. However, the abstract synergy representation complicates the association of relevant high-level grasp parameters, such as the grasp type, final posture, or grasp speed. Therefore, it is difficult to control these grasp characteristics in the synergy space. This paper presents an adaptable representation for kinematic grasping motions in synergy space that allows the generation of novel, human-like grasps under direct control of high-level grasp parameters. It is based on via-point movement primitives trained on synergy trajectories of human grasping motions. The representation using synergy primitives allows for a straightforward adaptation of grasp characteristics while preserving the essential grasping motion learned from human demonstration. The kinematic synergy primitives have a low reproduction error of 3.9% of the maximum finger joint angle and are able to generate successful grasps on a simulated human hand and a real prosthetic hand.
|
|
13:30-15:00, Paper TuBT27-NT.5 | Add to My Program |
VFAS-Grasp: Closed Loop Grasping with Visual Feedback and Adaptive Sampling |
|
Piacenza, Pedro | Samsung Research America |
Yuan, Jiacheng | University of Minnesota |
Huh, Jinwook | Samsung |
Isler, Volkan | University of Minnesota |
Keywords: Grasping, Perception for Grasping and Manipulation
Abstract: We consider the problem of closed-loop robotic grasping and present a novel planner which uses Visual Feedback and an uncertainty-aware Adaptive Sampling strategy (VFAS) to close the loop. At each iteration, our method VFAS-Grasp builds a set of candidate grasps by generating random perturbations of a seed grasp. The candidates are then scored using a novel metric which combines a learned grasp-quality estimator, the uncertainty in the estimate and the distance from the seed proposal to promote temporal consistency. Additionally, we present two mechanisms to improve the efficiency of our sampling strategy: We dynamically scale the sampling region size and number of samples in it based on past grasp scores. We also leverage a motion vector field estimator to shift the center of our sampling region. We demonstrate that our algorithm can run in real time (20 Hz) and is capable of improving grasp performance for static scenes by refining the initial grasp proposal. We also show that it can enable grasping of slow moving objects, such as those encountered during human to robot handover.
|
|
13:30-15:00, Paper TuBT27-NT.6 | Add to My Program |
The Fractal Hand-II: Reviving a Classic Mechanism for Contemporary Grasping Challenges |
|
Tisdale, Malcolm | The California Institute of Technology |
Burdick, Joel | California Institute of Technology |
Keywords: Grasping, Grippers and Other End-Effectors, Multifingered Hands
Abstract: This paper and its companion propose a new fractal robotic gripper, drawing inspiration from the century-old Fractal Vise. Its unusual synergistic properties allow it to passively conform to diverse objects using only one actuator. Designed to be easily integrated with prevailing parallel jaw grippers, it alleviates the complexities tied to perception and grasp planning, especially when dealing with unpredictable object poses and geometries. We extend the foundational principles of the Fractal Vise to a broader class of gripping mechanisms and address the limitations that had led to its obscurity. Two Fractal Fingers, coupled with a closing actuator, can form an adaptive and synergistic Fractal Hand. We articulate a design methodology for low-cost, easy-to-fabricate, large workspace, and compliant Fractal Fingers. The companion paper delves into the kinematics and grasping properties of a specific class of Fractal Fingers and Hands.
|
|
13:30-15:00, Paper TuBT27-NT.7 | Add to My Program |
ICGNet: A Unified Approach for Instance-Centric Grasping |
|
Zurbrügg, René | ETH Zürich |
Liu, Yifan | ETH Zurich |
Engelmann, Francis | ETH Zurich |
Kumar, Suryansh | ETH Zurich |
Hutter, Marco | ETH Zurich |
Patil, Vaishakh | RSL ETH Zurich |
Yu, Fisher | ETH Zürich |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Accurate grasping is the key to several robotic tasks including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps. These grasps need to be compliant with the local object geometry. Second, for each proposed grasp, the robot needs to reason about the interactions with other objects in the scene. Finally, the robot must compute a collision-free grasp trajectory while taking into account the geometry of the target object. Most grasp detection algorithms directly predict grasp poses in a monolithic fashion, which does not capture the composability of the environment. In this paper, we introduce an end-to-end architecture for object-centric grasping. The method uses pointcloud data from a single arbitrary viewing direction as an input and generates an instance-centric representation for each partially observed object in the scene. This representation is further used for object reconstruction and grasp detection in cluttered table-top scenes. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets, indicating superior performance for grasping and reconstruction. Additionally, we demonstrate real-world applicability by decluttering scenes with varying numbers of objects. Videos and Code at icgraspnet.github.io
|
|
13:30-15:00, Paper TuBT27-NT.8 | Add to My Program |
The Grasp Reset Mechanism: An Automated Apparatus for Conducting Grasping Trials |
|
DuFrene, Kyle | Oregon State University |
Nave, Keegan | Oregon State University |
Campbell, Joshua | Southwest Research Institute |
Balasubramanian, Ravi | Oregon State University |
Grimm, Cindy | Oregon State University |
Keywords: Grasping, Mechanism Design, Performance Evaluation and Benchmarking
Abstract: Advancing robotic grasping and manipulation requires the ability to test algorithms and/or train learning models on large numbers of grasps. Towards the goal of more advanced grasping, we present the Grasp Reset Mechanism (GRM), a fully automated apparatus for conducting large-scale grasping trials. The GRM automates the process of resetting a grasping environment, repeatably placing an object in a fixed location and controllable 1-D orientation. It also collects data and swaps between multiple objects enabling robust dataset collection with no human intervention. We also present a standardized state machine interface for control, which allows for integration of most manipulators with minimal effort. In addition to the physical design and corresponding software, we include a dataset of 1,020 grasps. The grasps were created with a Kinova Gen3 robot arm and Robotiq 2F-85 Adaptive Gripper to enable training of learning models and to demonstrate the capabilities of the GRM. The dataset includes ranges of grasps conducted across four objects and a variety of orientations. Manipulator states, object pose, video, and grasp success data are provided for every trial.
|
|
13:30-15:00, Paper TuBT27-NT.9 | Add to My Program |
Model-Based Runtime Monitoring with Interactive Imitation Learning |
|
Liu, Huihan | University of Texas, Austin |
Dass, Shivin | UT Austin |
Martín-Martín, Roberto | University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Deep Learning Methods
Abstract: Robot learning methods have recently made great strides but generalization and robustness challenges still hinder their widespread deployment. Failing to detect potential failures and learn to solve them renders state-of-the-art learning systems not combat-ready for high-stakes tasks. Recent advancements in interactive imitation learning have proposed a promising framework for human-robot teaming, enabling the robots to operate safely and to continually improve their performances through deployment data. Nonetheless, existing methods typically require constant human supervision and preemptive feedback, limiting their usability in realistic domains. In this work, we aim to endow a robot with the ability to monitor and detect errors during runtime task execution. We introduce MoMo, a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures. Unlike prior work that cannot foresee future failures or requires failure experiences for training, MoMo learns a latent-space dynamics model and a failure classifier that combined enable MoMo to simulate future action outcomes and detect out-of-distribution states and high-risk situations preemptively. We train MoMo within an interactive imitation learning framework, where it continually updates the model from the experiences of the human-robot team collected from trustworthy deployments. Consequently, our method reduces the human workload needed over time while ensuring reliable task execution. We demonstrate that MoMo outperforms the baselines across system-level and unit-test metrics, with on average 23% and 40% higher success rates in simulation and on physical hardware, respectively. More information at https://ut-austin-rpl.github.io/sirius-runtime-monitor/
|
|
TuBT28-NT Oral Session, NT-G4 |
Add to My Program |
Grippers and Other End-Effectors II |
|
|
Chair: Stuart, Hannah | UC Berkeley |
Co-Chair: Sakaino, Sho | University of Tsukuba |
|
13:30-15:00, Paper TuBT28-NT.1 | Add to My Program |
The Fractal Hand--I: A Non-Anthropomorphic, but Synergistic, Adaptable Gripper |
|
Burdick, Joel | California Institute of Technology |
Tisdale, Malcolm | The California Institute of Technology |
Keywords: Grasping, Grippers and Other End-Effectors, Multifingered Hands
Abstract: This paper introduces a novel Fractal Hand robotic gripper. The hand has only 1 actuator, but 2^{n+1}-1 joints, where n is a design parameter that defines the depth of the fingers' tree structures. The hand is synergistic in its operation (because its joint movements are highly coupled through the hand's interaction with the grasped object), but it is not anthropomorphic. The basic finger and hand geometry, governing kinematics, and quasi-static mechanics of rigid and compliant versions of the hand are developed. These analyses remarkably show that under very mild constraints on the hand design, the hand is compliantly stable at every equilibrium condition. Therefore, the Fractal Hand adapts to a very wide range of planar objects with a single design. A companion paper introduces a design methodology for this new class of robot hands, and multiple prototypes.
|
|
13:30-15:00, Paper TuBT28-NT.2 | Add to My Program |
The Double-Scoop Gripper: A Tendon-Driven Soft-Rigid End-Effector for Food Handling Exploiting Constraints in Narrow Spaces |
|
Franco, Leonardo | University of Siena |
Turco, Enrico | Istituto Italiano Di Tecnologia |
Bo, Valerio | Istituto Italiano Di Tecnologia |
Pozzi, Maria | University of Siena |
Malvezzi, Monica | University of Siena |
Prattichizzo, Domenico | University of Siena |
Salvietti, Gionata | University of Siena |
Keywords: Grippers and Other End-Effectors, Grasping, Soft Robot Applications
Abstract: Food handling is a challenging task for robotic grippers, which are required to manipulate highly deformable and fragile items that can be easily damaged. Moreover, ingredients for the preparation of the different dishes are usually stored in small containers that are often not easily accessible. This paper introduces an innovative soft-rigid, tendon-driven gripper: the Double-Scoop Gripper (DSG). Its two-fingered design exploits a specialized structure to cope with constrained spaces (e.g., containers in narrow shelves). The DSG can delicately grasp objects of various shapes by employing two scoop-shaped fingertips that can form a single plate when the fingers are flexed. Data obtained from an on-board camera are used to detect the food item features and plan the grasping strategy that best exploits the possible environmental constraints, regulating the opening of the two fingers and the approaching direction of the gripper. DSG capabilities are verified with experiments conducted using real food ingredients within a pick-and-place setup to evaluate both the grasping and the releasing capability of the gripper. Obtained results are promising and suggest that this approach could be particularly advantageous in the context of automated food serving.
|
|
13:30-15:00, Paper TuBT28-NT.3 | Add to My Program |
Co-Designing Manipulation Systems Using Task-Relevant Constraints |
|
Vaish, Apoorv | TU Berlin |
Brock, Oliver | Technische Universität Berlin |
Keywords: Grippers and Other End-Effectors, Hardware-Software Integration in Robotics, Methods and Tools for Robot System Design
Abstract: A robotic system's hardware and control policy must be co-optimized to ensure they complement each other to interact robustly with the environment. However, this combined search is extremely high-dimensional and intractable without a suitable underlying representation. This paper uses environmental constraints to structure the co-design space for manipulation. We show that task-relevant constraints encode regions of the search space containing reasonable co-design solutions. Furthermore, this underlying representation renders a co-design space amenable to gradient-based optimization. For efficient search, we present the co-design Jacobian that describes how the robot's motion varies with control as well as hardware design changes. This Jacobian exploits the structure induced by environmental constraints for iterative design updates in the co-design space. Using these two conceptual tools, we co-design manipulators, grippers, and multi-fingered hands, showing that environmental constraints are an effective representation for co-designing diverse manipulation systems. Our methodology scales well with increased co-design parameters, rendering the co-design of complex, high-dimensional manipulation systems feasible.
|
|
13:30-15:00, Paper TuBT28-NT.4 | Add to My Program |
Squirrel-Inspired Tendon-Driven Passive Gripper for Agile Landing |
|
Wang, Stanley | University of California, Berkeley |
Kuang, Duyi | University of California, Berkeley |
Lee, Sebastian | University of California, Berkeley |
Full, Robert | University of California at Berkeley |
Stuart, Hannah | UC Berkeley |
Keywords: Grippers and Other End-Effectors, Biologically-Inspired Robots, Actuation and Joint Mechanisms
Abstract: Squirrels exhibit agile leaping between tree branches, often using non-prehensile gripping with compliant and passively adaptive fingers. We aim to test the utility of such gripping in agile robotic maneuvering. In the present study, we first examine the parametric design of a squirrel-inspired underactuated gripper for passive landing on impact. We fix the geometry of the gripper and vary the joint stiffness and contact conditions. We find that stiffer fingers with soft foam pads enlarge the landing sufficiency region. Specifically, friction appears to enlarge horizontal error tolerance, while joint stiffness and pad damping allow for higher impact speeds. Thus, these features should be considered in the design of future agile robot hands and feet that include high impact landings on rods with pose inaccuracy.
|
|
13:30-15:00, Paper TuBT28-NT.5 | Add to My Program |
HASHI: Highly Adaptable Seafood Handling Instrument for Manipulation in Industrial Settings |
|
Allison, Austin | Northeastern University |
Hanson, Nathaniel | Massachusetts Institute of Technology |
Wicke, Sebastian | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Grippers and Other End-Effectors, Dexterous Manipulation, Kinematics
Abstract: The seafood processing industry provides fertile ground for robotics to impact the future-of-work from multiple perspectives including productivity, worker safety, and quality of work life. The robotics research challenge is the realization of flexible and reliable manipulation of soft, deformable, slippery, spiky and scaly objects. In this paper, we propose a novel robot end effector, called HASHI, that employs chopstick-like appendages for precise and dexterous manipulation. This gripper is capable of in-hand manipulation by rotating its two constituent sticks relative to each other and offers control of objects in all three axes of rotation by imitating human use of chopsticks. HASHI delicately positions and orients food through embedded 6-axis force-torque sensors. We derive and validate the kinematic model for HASHI, as well as demonstrate grip force and torque readings from the sensorization of each chopstick. We also evaluate the versatility of HASHI through grasping trials of a variety of real and simulated food items with varying geometry, weight, and firmness.
|
|
13:30-15:00, Paper TuBT28-NT.6 | Add to My Program |
All the Feels: A Dexterous Hand with Large-Area Tactile Sensing |
|
Bhirangi, Raunaq Mahesh | Carnegie Mellon University |
DeFranco, Abigail | Carnegie Mellon University |
Adkins, Jacob | University of Alberta |
Majidi, Carmel | Carnegie Mellon University |
Gupta, Abhinav | Carnegie Mellon University |
Hellebrekers, Tess | Meta AI Research |
Kumar, Vikash | Meta AI |
Keywords: Grippers and Other End-Effectors, Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: High cost and lack of reliability have precluded the widespread adoption of dexterous hands in robotics. Furthermore, the lack of a viable tactile sensor capable of sensing over the entire area of the hand impedes the rich, low-level feedback that would improve learning of dexterous manipulation skills. This paper introduces an inexpensive, modular, and robust platform - the D'Manus - aimed at resolving these challenges while satisfying the large-scale data collection demands of deep robot learning paradigms. Studies on human manipulation point to the criticality of low-level tactile feedback in performing everyday dexterous tasks. The D'Manus comes with ReSkin sensing on the entire surface of the palm as well as the fingertips. We also demonstrate the generalizability of tactile models trained with the fully integrated system in a tactile-aware task - bin-picking and sorting. Code, documentation, design files, detailed assembly instructions, trained models, task videos, and all supplementary materials required to recreate the setup can be found at https://sites.google.com/view/dmanus.
|
|
13:30-15:00, Paper TuBT28-NT.7 | Add to My Program |
Soft and Rigid Object Grasping with Cross-Structure Hand Using Bilateral Control-Based Imitation Learning |
|
Yamane, Koki | University of Tsukuba |
Saigusa, Yuki | University of Tsukuba |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Imitation Learning, Force Control, Grasping
Abstract: Object grasping is an important ability required for various robot tasks. In particular, tasks that require precise force adjustment during operation, such as grasping an unknown object or using a grasped tool, are difficult for humans to program in advance. Recently, AI-based algorithms that can imitate human force skills have been actively explored as a solution. In particular, bilateral control-based imitation learning achieves human-level motion speeds with environmental adaptability, requiring only human demonstrations and no programming. However, owing to hardware limitations, its grasping performance remains limited, and tasks that involve grasping various objects are yet to be achieved. Here, we developed a cross-structure hand to grasp various objects. We experimentally demonstrated that the integration of bilateral control-based imitation learning and the cross-structure hand is effective for grasping various objects and harnessing tools.
|
|
13:30-15:00, Paper TuBT28-NT.8 | Add to My Program |
GRASP: Grocery Robot’s Adhesion and Suction Picker |
|
Hajj-Ahmad, Amar | Stanford University |
Kaul, Lukas | Toyota Research Institute |
Matl, Carolyn | Toyota Research Institute |
Cutkosky, Mark | Stanford University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Biomimetics
Abstract: We present a solution to the separate challenges faced by suction cups and gecko adhesives for one-sided grasping of heavy, irregular items. The gripping technology combines suction with adhesion for grasping and placing a wide range of objects in packed spaces. Applications include shopping and restocking in retail and warehouse settings where products vary in size and weight and are packed tightly, which limits access. A single suction cup is compact enough to reach and grasp the smallest items (down to 5 cm in size) but cannot provide the shear force needed for handling bulky items. Gecko-inspired adhesion provides extra lifting capability for objects up to 2.3 kg, using a 7.6 x 12.7 cm adhesive swatch – 2.5x heavier than with suction alone. The adhesive is fabricated on a flexible nylon fabric. A small fan blows gently to help the fabric conform to irregular surfaces prior to lifting.
|
|
13:30-15:00, Paper TuBT28-NT.9 | Add to My Program |
Improved Generalization of Probabilistic Movement Primitives for Manipulation Trajectories |
|
Yao, Xueyang | University of Waterloo |
Chen, Yinghan | University of Waterloo |
Tripp, Bryan Patrick | University of Waterloo |
Keywords: Imitation Learning, Learning from Demonstration
Abstract: Imitation learning methods have proven effective in learning robotic tasks by leveraging multiple human-controlled demonstrations. However, existing approaches often struggle to generalize across a wide range of tasks, such as extrapolating to unseen object locations, incorporating via-point modulation, accurately modeling orientation, handling trajectories with multiple options, and capturing aiming actions. In this study, we propose a novel framework that combines ideas from task-parameterized Gaussian mixture models and probabilistic movement primitives to address these limitations and satisfy all the aforementioned properties within a single framework. We conduct comprehensive evaluations of our approach on four real-life tasks: pick-and-place, water pouring, shooting a hockey puck into a net, and sweeping.
|
|
TuBT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection I |
|
|
Chair: Martín-Martín, Roberto | University of Texas at Austin |
Co-Chair: Yang, Yongliang | Shenyang Institute of Automation, CAS |
|
13:30-15:00, Paper TuBT29-NT.1 | Add to My Program |
Road Obstacle Detection Based on Unknown Objectness Scores |
|
Noguchi, Chihiro | Toyota Motor Corporation |
Ohgushi, Toshiaki | Toyota Motor Corporation |
Yamanaka, Masao | Toyota Motor Corporation |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: The detection of unknown traffic obstacles is vital to ensure safe autonomous driving. Standard object-detection methods cannot identify unknown objects that are not included in the predefined categories. This is because object-detection methods are trained to assign a background label to pixels corresponding to unknown objects. To address this problem, the pixel-wise anomaly-detection approach has attracted increased research attention. Anomaly-detection techniques, such as uncertainty estimation and perceptual difference from reconstructed images, make it possible to identify pixels of unknown objects as out-of-distribution (OoD) samples. However, when applied to images with many unknowns and complex components, such as driving scenes, these methods often exhibit unstable performance. The purpose of this study is to achieve stable performance for detecting unknown objects by incorporating ideas from object detection into pixel-wise anomaly-detection methods. To achieve this goal, we adopt a semantic-segmentation network with a sigmoid head that simultaneously provides pixel-wise anomaly scores and objectness scores. Our experimental results show that the objectness scores play an important role in improving detection performance. Based on these results, we propose a novel anomaly score that integrates these two scores, which we term the unknown objectness score. Quantitative evaluations show that the proposed method outperforms state-of-the-art methods on publicly available datasets.
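The abstract integrates a pixel-wise anomaly score with an objectness score into a single unknown objectness score, but does not state the combination rule. The sketch below uses a simple per-pixel product purely for illustration (an assumption, not the paper's published formulation): a pixel is flagged only when it is both anomalous and object-like, which suppresses anomalies on background clutter.

```python
import numpy as np

def unknown_objectness_score(anomaly, objectness):
    """Combine a pixel-wise anomaly map with a pixel-wise objectness map.

    A product is one natural instantiation (illustrative assumption):
    background pixels with high anomaly but low objectness are damped,
    while anomalous object-like pixels keep a high score.
    """
    return anomaly * objectness

# A high anomaly score on a low-objectness (background) pixel ends up
# lower than the same anomaly score on an object-like pixel.
anomaly = np.array([[0.9, 0.9], [0.1, 0.1]])
objectness = np.array([[0.05, 0.95], [0.5, 0.5]])
score = unknown_objectness_score(anomaly, objectness)
```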
|
|
13:30-15:00, Paper TuBT29-NT.2 | Add to My Program |
PVTransformer: Point-To-Voxel Transformer for Scalable 3D Object Detection |
|
Leng, Zhaoqi | Waymo LLC |
Sun, Pei | Waymo |
He, Tong | Waymo LLC |
Anguelov, Dragomir | Waymo |
Tan, Mingxing | Waymo Research |
Keywords: Object Detection, Segmentation and Categorization
Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
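The core idea in the abstract is replacing the PointNet max-pool with an attention module for point-to-voxel aggregation while keeping permutation invariance. A minimal single-head sketch of that contrast, assuming a learned per-voxel query `query` and projections `Wk`, `Wv` (stand-ins for learned parameters; the paper's actual module is more elaborate):

```python
import numpy as np

def maxpool_aggregate(point_feats):
    """PointNet-style aggregation: a symmetric max over a voxel's points.
    This is the pooling step the abstract identifies as a bottleneck."""
    return point_feats.max(axis=0)

def attention_aggregate(point_feats, query, Wk, Wv):
    """Attention-based point-to-voxel aggregation (illustrative sketch).
    A query attends over the N points in the voxel; the softmax-weighted
    sum is permutation-invariant but more expressive than a max."""
    keys = point_feats @ Wk                        # (N, d)
    vals = point_feats @ Wv                        # (N, d)
    logits = keys @ query / np.sqrt(len(query))    # (N,)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                   # softmax attention weights
    return w @ vals                                # aggregated voxel feature (d,)
```

Both functions return the same result under any reordering of the input points, which is the invariance the abstract says the design respects.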
|
|
13:30-15:00, Paper TuBT29-NT.3 | Add to My Program |
Object-Centric Cross-Modal Feature Distillation for Event-Based Object Detection |
|
Li, Lei | ETH Zurich |
Liniger, Alexander | ETH Zurich |
Millhaeusler, Mario | Huawei Zurich |
Tsiminaki, Vagia | Huawei Zurich |
Li, Yuanyou | Huawei |
Dai, Dengxin | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we propose a cross-modality feature distillation method that can focus on regions where the knowledge distillation works best to shrink the detection performance gap between these two modalities. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as the teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.
|
|
13:30-15:00, Paper TuBT29-NT.4 | Add to My Program |
Hierarchical Point Attention for Indoor 3D Object Detection |
|
Shu, Manli | University of Maryland, College Park |
Xue, Le | Salesforce Research |
Yu, Ning | Netflix |
Martín-Martín, Roberto | University of Texas at Austin |
Xiong, Caiming | Salesforce Inc |
Goldstein, Tom | University of Maryland |
Niebles, Juan Carlos | Stanford University |
Xu, Ran | Salesforce |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: 3D object detection is an essential vision technique for various applications, such as augmented reality and domestic robots. Transformers, as versatile network architectures, have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer restrains its ability to learn features at different scales. This limitation makes transformer detectors perform worse on smaller objects and affects their reliability in indoor environments, where small objects are the majority. This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors. First, we propose Aggregated Multi-Scale Attention (MS-A), which builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning. Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals. Both attention operations are model-agnostic network modules that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor detection benchmarks. By plugging our proposed modules into state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.
|
|
13:30-15:00, Paper TuBT29-NT.5 | Add to My Program |
Frame Fusion with Vehicle Motion Prediction for 3D Object Detection |
|
Li, Xirui | Shanghai Jiao Tong University |
Wang, Feng | TuSimple |
Wang, Naiyan | TuSimple |
Ma, Chao | Shanghai Jiao Tong University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Automation, AI-Based Methods
Abstract: In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction. In the same way, history detections should contribute to future detections. In this paper, we propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history detection frames. In FrameFusion, we "forward" history frames to the current frame and apply weighted Non-Maximum Suppression on dense bounding boxes to obtain a fused frame with merged boxes. To "forward" frames, we use vehicle motion models to estimate the future pose of the bounding boxes. Our method is flexible in motion model selection. We explore three motion models in our work and show how the unicycle model and the bicycle model improve turning cases. On the Waymo Open Dataset, our FrameFusion method consistently improves the performance of various 3D detectors by about 2.0 vehicle level 2 APH with negligible latency and slightly enhances the performance of the temporal fusion method MPPNet. We also conduct extensive experiments on motion model selection.
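The two mechanisms in the abstract, forwarding history boxes with a motion model and merging via weighted NMS, can be sketched in 2D. This is a minimal illustration, not the paper's implementation: it uses the simplest constant-velocity model (the paper also explores unicycle and bicycle models) and axis-aligned `[x1, y1, x2, y2]` boxes.

```python
import numpy as np

def forward_boxes(boxes, velocities, dt):
    """'Forward' history boxes to the current frame under a constant-
    velocity motion model (illustrative; the paper compares three models)."""
    moved = boxes.copy()
    moved[:, [0, 2]] += velocities[:, [0]] * dt  # shift x1, x2
    moved[:, [1, 3]] += velocities[:, [1]] * dt  # shift y1, y2
    return moved

def iou(box, boxes):
    """IoU of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def weighted_nms(boxes, scores, iou_thr=0.5):
    """Weighted NMS: overlapping boxes are merged by confidence-weighted
    averaging rather than discarded, producing the fused frame."""
    order = np.argsort(scores)[::-1]
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    merged = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        group = (iou(boxes[i], boxes) >= iou_thr) & ~used
        used |= group
        merged.append(np.average(boxes[group], axis=0, weights=scores[group]))
    return np.array(merged)
```

Two strongly overlapping detections collapse into a single box whose coordinates lean toward the higher-confidence member, while distant boxes survive untouched.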
|
|
13:30-15:00, Paper TuBT29-NT.6 | Add to My Program |
FG-PFE: Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection |
|
Park, Konyul | Hanyang University |
Kim, Yecheol | Hanyang University |
Koh, Junho | Hanyang University |
Park, Byungwoo | Hanyang University |
Choi, Jun Won | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, AI-Based Methods, Deep Learning Methods
Abstract: Autonomous vehicles require real-time, high-performance 3D object detectors to guarantee system robustness and safety. Recent point cloud-based 3D object detectors are mainly categorized into three types based on input representation: point-based, voxel-based, and pillar-based. Among these, pillar-based models are most suitable for onboard deployment due to their light architecture. Despite their advantages, pillar-based methods often underperform voxel-based and point-based methods, largely due to their coarse representation and simplistic architecture design. While most recent research has aimed to improve the backbone network to address this performance gap, we argue that there is still room for improvement in the Pillar Feature Encoding (PFE) stage. We demonstrate that with sufficient representational power, pillar-based methods can achieve performance comparable to other representations. To achieve this, we introduce fine-grained pillar feature encoding (FG-PFE), which utilizes spatio-temporal virtual (STV) grids for fine-grained representation. We also present an attentive pillar aggregation module designed to selectively aggregate essential pillar features. Extensive experiments conducted on the nuScenes dataset show that our FG-PFE not only requires less computational power but also achieves significant performance gains compared to the baseline.
|
|
13:30-15:00, Paper TuBT29-NT.7 | Add to My Program |
Efficient Semantic Segmentation for Compressed Video |
|
Cai, Jiaxin | Fuzhou University |
Li, Qi | Fuzhou University |
Shen, Yulin | Fuzhou University |
Pan, Jia | University of Hong Kong |
Liu, Wenxi | Fuzhou University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Robots, constrained by limited onboard computing resources, often encounter situations wherein high-resolution and high-bit-rate videos captured by their cameras necessitate compression before further analysis. In this paper, we propose a novel video semantic segmentation paradigm for compressed video. Specifically, our framework draws inspiration from the principle of the Wavelet Transform, and thus we design a network structure, WTDecomNet, approximating the decomposition of a high-resolution image into its low-resolution counterpart and axial details. The aim is to preserve the image content well through decomposition and maintain model efficiency by obtaining semantics from the low-resolution image. To this end, we propose an efficient axial subband approximation module for extracting axial details and a lightweight temporal alignment module for associating keyframes and non-keyframes of compressed video. Through comprehensive experiments, we show that our model achieves state-of-the-art performance on public benchmarks. In particular, on CamVid, compared to the baseline, our proposed model reduces the computational overhead by 70% while improving mIoU by 4%.
|
|
13:30-15:00, Paper TuBT29-NT.8 | Add to My Program |
Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving |
|
Zhili, Chen | Hong Kong University of Science and Technology |
Pham, Trung Kien | The Hong Kong University of Science and Technology |
Ye, Maosheng | HKUST |
Shen, Zhiqiang | MBZUAI |
Chen, Qifeng | HKUST |
Keywords: Object Detection, Segmentation and Categorization
Abstract: We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. Traditional point-based 3D object detectors often employ architectures that rely on a progressive downsampling of points. While this method effectively reduces computational demands and increases receptive fields, it compromises the preservation of crucial non-local information for accurate 3D object detection, especially in complex driving scenarios. To address this, we introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector by efficiently modeling longer-range inter-dependency while incurring only negligible overhead. Concretely, the Cross-Cluster Shifting operation enhances the conventional design by shifting partial channels from neighboring clusters, which enables richer interaction with non-local regions and thus enlarges the receptive field of clusters. We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime efficiency.
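The shifting mechanism the abstract describes, moving a fraction of channels in from neighboring clusters so each cluster sees non-local information, can be sketched minimally. Here `neighbor_idx` (one neighbor per cluster) and `shift_ratio` are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def cross_cluster_shift(cluster_feats, neighbor_idx, shift_ratio=0.25):
    """Replace the first `shift_ratio` fraction of each cluster's channels
    with the corresponding channels of a neighboring cluster, mixing
    non-local information into the representation (illustrative sketch).

    cluster_feats: (C, d) array, one feature vector per cluster.
    neighbor_idx:  (C,) index of a neighboring cluster for each cluster.
    """
    d = cluster_feats.shape[1]
    k = max(1, int(d * shift_ratio))           # number of shifted channels
    out = cluster_feats.copy()
    out[:, :k] = cluster_feats[neighbor_idx, :k]  # borrowed channels
    return out
```

The remaining channels are untouched, so the operation enlarges each cluster's effective receptive field at the cost of a single gather and copy.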
|
|
13:30-15:00, Paper TuBT29-NT.9 | Add to My Program |
BEE-Net: Bridging Semantic and Instance with Gated Encoding and Edge Constraint for Efficient Panoptic Segmentation |
|
Huang, Xinyang | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Sun, Yunpeng | Lotus Robotics |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Ye, Gang | Lotus Robotics |
Xiao, Yang | Lotus Technology Ltd |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Bo | Lotus Technology Ltd |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Panoptic segmentation is a challenging perception task that can help robots comprehensively perceive the surrounding environment. In this task, we notice that semantic, instance, and panoptic segmentation have rich relations, which, however, are rarely explored. In this work, we propose a novel network that bridges panoptic, instance, and semantic segmentation to delve into these reciprocal relations. To make the semantic and instance branches benefit from each other, we design a novel Gated Encoding (GE) module that incorporates complementary cues between the semantic and instance heads through a gated mechanism. In addition, a novel edge-aware consistency constraint among the edges of each task is presented, which exhaustively exploits geometric constraints to boost the segmentation quality of challenging edges. Experimental results on the Cityscapes and MS-COCO datasets demonstrate that our approach achieves state-of-the-art performance in an efficient CNN-based paradigm, attaining a balance between accuracy and efficiency.
|
|
TuBT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics I |
|
|
Chair: Gonzalez Arenas, Montserrat | Google |
Co-Chair: Busam, Benjamin | Technical University of Munich |
|
13:30-15:00, Paper TuBT30-NT.1 | Add to My Program |
Toward AI-Enabled Commercial Telepresence Robots to Combine Home Care Needs and Affordability |
|
Beraldo, Gloria | National Research Council of Italy |
De Benedictis, Riccardo | CNR-ISTC |
Cesta, Amedeo | CNR -- National Research Council of Italy, ISTC |
Fracasso, Francesca | National Research Council of Italy |
Cortellessa, Gabriella | CNR -- National Research Council of Italy, ISTC |
Keywords: AI-Enabled Robotics, Human-Centered Robotics, Social HRI
Abstract: As life expectancy increases, social and health assistance requires sustainable and affordable solutions, possibly usable from one's own domestic environment. In this article, we propose a transformer-based approach combined with a task-planning system and enhanced with several AI sub-modules, designed to run on low-cost telepresence robots in order to support more advanced and autonomous assistance services. The proposed system dynamically generates and autonomously adapts heterogeneous robotic actions according to the information that emerges during the interaction. The AI-enhanced telepresence robot was assessed in an unstructured domestic environment by 10 users. The results show an accuracy of more than 95% with respect to the robot's expected functioning. The participants judged the system efficient, useful, and intuitive, and showed a positive inclination to re-use the robot in the future. Such outcomes derive both from proper coordination among the heterogeneous AI sub-modules in the system and from its capability to rapidly co-adapt the interaction.
|
|
13:30-15:00, Paper TuBT30-NT.2 | Add to My Program |
SliceIt! - a Dual Simulator Framework for Learning Robot Food Slicing |
|
Beltran-Hernandez, Cristian Camilo | Omron Sinic X |
Erbetti, Nicolas | Omron Sinic X |
Hamaya, Masashi | OMRON SINIC X Corporation |
Keywords: AI-Enabled Robotics, Domestic Robotics, Hardware-Software Integration in Robotics
Abstract: Cooking robots can enhance the home experience by reducing the burden of daily chores. However, these robots must perform their tasks dexterously and safely in shared human environments, especially when handling dangerous tools such as kitchen knives. This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. More specifically, our goal is to enable a collaborative robot or industrial robot arm to perform food-slicing tasks by adapting to varying material properties using compliance control. Our approach involves using Reinforcement Learning (RL) to train a robot to compliantly manipulate a knife by reducing the contact forces exerted by the food items and by the cutting board. However, training the robot in the real world can be inefficient and dangerous, and can result in a lot of food waste. Therefore, we propose SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation. Following a real2sim2real approach, our framework consists of collecting a small amount of real food-slicing data, calibrating our dual simulation environment (a high-fidelity cutting simulator and a robotic simulator), learning compliant control policies in the calibrated simulation environment, and finally deploying the policies on the real robot.
|
|
13:30-15:00, Paper TuBT30-NT.3 | Add to My Program |
SG-Bot: Object Rearrangement Via Coarse-To-Fine Robotic Imagination on Scene Graphs |
|
Zhai, Guangyao | Technical University of Munich |
Cai, Xiaoni | Technical University of Munich |
Huang, Dianye | Technical University of Munich |
Di, Yan | Technical University of Munich |
Manhardt, Fabian | Google |
Tombari, Federico | Technische Universität München |
Navab, Nassir | TU Munich |
Busam, Benjamin | Technical University of Munich |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Object rearrangement is pivotal in robotic-environment interactions, representing a significant capability in embodied AI. In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation. Unlike previous methods that rely on either known goal priors or zero-shot large models, SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics, seamlessly blending the consideration of commonsense knowledge with automatic generation capabilities. SG-Bot employs a three-fold procedure of observation, imagination, and execution to adeptly address the task. Initially, objects are discerned and extracted from a cluttered scene during observation. These objects are first coarsely organized and depicted within a scene graph, guided by either commonsense or user-defined criteria. This scene graph then informs a generative model, which forms a fine-grained goal scene considering the shape information from the initial scene and object semantics. Finally, for execution, the initial and envisioned goal scenes are matched to formulate robotic action policies. Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.
|
|
13:30-15:00, Paper TuBT30-NT.4 | Add to My Program |
Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems? |
|
Chen, Yongchao | Harvard University |
Arkin, Jacob | Massachusetts Institute of Technology |
Zhang, Yang | IBM |
Roy, Nicholas | Massachusetts Institute of Technology |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: AI-Enabled Robotics, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. Please see project website for prompts, videos, and codes.
|
|
13:30-15:00, Paper TuBT30-NT.5 | Add to My Program |
Object-Centric Instruction Augmentation for Robotic Manipulation |
|
Wen, Junjie | East China Normal University |
Zhu, Yichen | Midea Group |
Zhu, MinJie | East China Normal University |
Li, Jinming | Shanghai University |
Xu, Zhiyuan | Midea Group |
Che, Zhengping | Midea Group |
Shen, Chaomin | East China Normal University |
Peng, Yaxin | Shanghai University |
Liu, Dong | Midea Group (Shanghai) Co., Ltd |
Feng, Feifei | Midea Group |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: Humans interpret scenes by recognizing both the identities and positions of objects in their observations. For a robot to perform tasks such as "pick and place", understanding both what the objects are and where they are located is crucial. While the former has been extensively discussed in the literature that uses the large language model to enrich the text descriptions, the latter remains underexplored. In this work, we introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instruction with position cues. We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction, thus aiding the policy network in mastering actions for versatile manipulation. Additionally, we present a feature reuse mechanism to integrate the vision-language features from off-the-shelf pre-trained MLLM into policy networks. Through a series of simulated and real-world robotic tasks, we demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
|
|
13:30-15:00, Paper TuBT30-NT.6 | Add to My Program |
Learning to Play Foosball: System and Baselines |
|
Moos, Janosch | TU Darmstadt, Institute for Mechatronic Systems |
Derstroff, Cedric | Technische Universität Darmstadt |
Schröder, Niklas | TU Darmstadt, Institute for Mechatronic Systems |
Clever, Debora | TU Darmstadt, Institute of Mechatronic Systems |
Keywords: AI-Enabled Robotics, Industrial Robots, Machine Learning for Robot Control
Abstract: This work stages Foosball as a versatile platform for advancing scientific research, particularly in the realm of robot learning. We present an automated Foosball table along with its simulated counterpart, showcasing a diverse range of challenges through example tasks within the Foosball environment. Initial findings are shared using a simple baseline approach. Foosball constitutes a versatile learning environment with the potential to yield cutting-edge research in various fields of artificial intelligence and machine learning, notably robust learning, while also extending its applicability to industrial robotics and automation setups. To transform our physical Foosball table into a research-friendly system, we augmented it with a two-degree-of-freedom kinematic chain controlling the goalkeeper rod as an initial setup, with the intention of extending it to the full game as soon as possible. Our experiments reveal that a realistic simulation is essential for mastering complex robotic tasks, yet translating these accomplishments to the real system remains challenging and is often accompanied by a performance decline. This emphasizes the critical importance of research in this direction. To this end, we spotlight the automated Foosball table as an invaluable tool, possessing numerous desirable attributes, to serve as a demanding learning environment for advancing robotics and automation research.
|
|
13:30-15:00, Paper TuBT30-NT.7 | Add to My Program |
Language-Conditioned Robotic Manipulation with Fast and Slow Thinking |
|
Zhu, MinJie | East China Normal University |
Zhu, Yichen | Midea Group |
Li, Jinming | Shanghai University |
Wen, Junjie | East China Normal University |
Xu, Zhiyuan | Midea Group |
Che, Zhengping | Midea Group |
Shen, Chaomin | East China Normal University |
Peng, Yaxin | Shanghai University |
Liu, Dong | Midea Group (Shanghai) Co., Ltd |
Feng, Feifei | Midea Group |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Planning, Scheduling and Coordination
Abstract: Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, from simple "pick-and-place" operations to tasks requiring intent recognition and visual reasoning. Inspired by the dual-process theory in cognitive science, which suggests two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and route decisions to one of two systems based on instruction type. Our RFST consists of two key components: 1) an instruction discriminator that determines which system should be activated for the current user instruction, and 2) a slow-thinking system comprising a fine-tuned vision-language model aligned with the policy networks, which allows the robot to recognize user intention or perform reasoning tasks. To assess our methodology, we built a dataset featuring real-world trajectories, capturing actions ranging from spontaneous impulses to tasks requiring deliberate contemplation. Our results, in both simulation and real-world scenarios, confirm that our approach adeptly manages intricate tasks that demand intent recognition and reasoning.
|
|
13:30-15:00, Paper TuBT30-NT.8 | Add to My Program |
How to Prompt Your Robot: A Prompt Book for Manipulation Skills with Code As Policies |
|
Gonzalez Arenas, Montserrat | Google |
Xiao, Ted | Google |
Singh, Sumeet | Google |
Jain, Vidhi | Carnegie Mellon University |
Ren, Allen Z. | Princeton University |
Vuong, Quan | UC San Diego |
Varley, Jacob | Google |
Herzog, Alexander | X, Inc. (Google) |
Leal, Isabel | Google Deepmind |
Kirmani, Sean | Google DeepMind |
Prats, Mario | Google |
Sadigh, Dorsa | Stanford University |
Sindhwani, Vikas | Google Brain, NYC |
Rao, Kanishka | Google |
Liang, Jacky | Google |
Zeng, Andy | Google DeepMind |
Keywords: AI-Enabled Robotics, Motion Control, Mobile Manipulation
Abstract: Large Language Models (LLMs) have demonstrated the ability to perform semantic reasoning, planning, and code writing for robotics tasks. However, most methods rely on pre-existing primitives, which heavily limits their scalability to new scenarios. Additionally, they use an example-based prompting style in which the LLM is provided few-shot examples of robot code. This makes it challenging for LLMs to implicitly infer task information, constraints, and API usage from examples alone. Meanwhile, research outside robotics has successfully studied instruction-based prompting, where providing LLMs with API documentation and detailed descriptions can improve code synthesis capabilities. However, it is not clear how to document robotics tasks, and naively providing full robot APIs presents a challenge to context-length limits in LLMs. In this work, we discuss how to combine different LLM prompting styles to write code for new manipulation skills. First, we evaluate different prompting styles across 3 robots in a high-level sorting task, and present a collection of empirical observations: (i) including both instructions and examples improves performance, (ii) interleaving state predictions in the examples helps reasoning, and (iii) instruction-based prompting benefits from human feedback. Our observations lead to a prompt recipe we refer to as PromptBook that combines example-based, instruction-based, and chain-of-thought prompting to write robot code, as well as a method to build the prompt leveraging LLMs and human feedback. Second, we show PromptBook can write code for new low-level manipulation skills on the fly, zero-shot. The prompt extracts motion trajectories from LLMs that the robot can execute directly with an IK controller. Finally, we evaluate the new skills on a mobile manipulator, achieving an 83% success rate at picking and 50-71% at opening drawers.
|
|
13:30-15:00, Paper TuBT30-NT.9 | Add to My Program |
A Multifidelity Sim-To-Real Pipeline for Verifiable and Compositional Reinforcement Learning |
|
Neary, Cyrus | The University of Texas at Austin |
Ellis, Christian | University of Massachusetts |
Samyal, Aryaman Singh | The University of Texas at Austin |
Lennon, Craig | United States Army Research Laboratory |
Topcu, Ufuk | The University of Texas at Austin |
Keywords: AI-Enabled Robotics, Reinforcement Learning
Abstract: We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, in order to deploy reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows for the independent training and testing of the corresponding subtask policies, while simultaneously providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies using a multifidelity simulation pipeline, the framework not only allows for efficient RL training, but also for a refinement of the subtasks and their interfaces in response to challenges arising from discrepancies between simulation and reality. In an experimental case study we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.
|
|
TuBT31-NT Oral Session, NT-G7 |
Add to My Program |
Factory/Assembly Automation |
|
|
Chair: Liu, Fei | Chongqing University |
Co-Chair: Wan, Weiwei | Osaka University |
|
13:30-15:00, Paper TuBT31-NT.1 | Add to My Program |
Bridging the Sim-To-Real Gap with Dynamic Compliance Tuning for Industrial Insertion |
|
Zhang, Xiang | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Li, Hui | Autodesk Research |
Keywords: Assembly, Compliance and Impedance Control, Machine Learning for Robot Control
Abstract: Contact-rich manipulation tasks often exhibit a large sim-to-real gap. For instance, industrial assembly tasks frequently involve tight insertions where the clearance is less than 0.1 mm and can even be negative when dealing with a deformable receptacle. This narrow clearance leads to complex contact dynamics that are difficult to model accurately in simulation, making it challenging to transfer simulation-learned policies to real-world robots. In this paper, we propose a novel framework for robustly learning manipulation skills for real-world tasks using only simulated data. Our framework consists of two main components: the ``Force Planner'' and the ``Gain Tuner''. The Force Planner is responsible for planning both the robot motion and desired contact forces, while the Gain Tuner dynamically adjusts the compliance control gains to accurately track the desired contact forces during task execution. The key insight of this work is that by adaptively adjusting the robot's compliance control gains during task execution, we can modulate contact forces in the new environment, generating trajectories similar to those seen in simulation and thereby narrowing the sim-to-real gap. Experimental results show that our method, trained in simulation on a generic square peg-and-hole task, can generalize to a variety of real-world insertion tasks involving narrow or even negative clearances, all without requiring any fine-tuning.
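The gain-adaptation idea in the abstract above can be illustrated with a deliberately simplified sketch. This is not the paper's actual Gain Tuner; the function name, the integral-style adaptation law, and all constants below are hypothetical. A scalar compliance gain is nudged in proportion to the contact-force tracking error:

```python
def tune_gain(k, f_desired, f_measured, eta=0.05, k_min=0.1, k_max=10.0):
    """Nudge a scalar compliance gain so the realized contact force tracks
    the planned one: soften when force overshoots, stiffen when it
    undershoots. A hypothetical stand-in for a learned gain tuner.
    """
    k = k + eta * (f_desired - f_measured)   # integral-style adaptation
    return min(max(k, k_min), k_max)         # keep the gain in a safe range
```

In a toy environment where the measured force is proportional to the gain, repeated calls drive the gain toward the value that realizes the desired force.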
|
|
13:30-15:00, Paper TuBT31-NT.2 | Add to My Program |
Compliant Peg-In-Hole Assembly Using a Very Soft Wrist |
|
Zhang, Qi | Osaka University |
Hu, Zhengtao | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Assembly, Compliant Assembly, Compliant Joints and Mechanisms
Abstract: This paper proposes using a highly compliant soft wrist to improve the performance of robotic peg-in-hole assembly in uncertain environments. In contrast to past research in this field, which has used force control with relatively low compliance, we propose a method in which searching and aligning motions are easily realized by taking advantage of the wrist's high compliance under gravity. Our proposed PiH strategy is completely passive: after the peg is trapped in the hole during the hole-searching process using a spherical helix trajectory, the peg is guaranteed to be automatically inserted into the hole by the effect of gravity and wrist compliance, provided the configuration of the peg lies within the no-escapable area. The no-escapable area is obtained from a potential analysis that considers the contact state in combination with the wrist compliance space. The effectiveness of the proposed method is experimentally verified using pegs of various shapes and sizes.
|
|
13:30-15:00, Paper TuBT31-NT.3 | Add to My Program |
6D Pose Estimation Based on 3D Edge Binocular Reprojection Optimization for Robotic Assembly |
|
Li, Dong | Chongqing University |
Mu, Quan | Foreign Environmental Cooperation Center, Ministry of Ecology An |
Yuan, Yilin | Chongqing University |
Wu, Shiwei | Chongqing University |
Hong, Hualin | Chongqing University |
Tian, Ye | Chongqing University |
Jiang, Qian | Chongqing University |
Liu, Fei | Chongqing University |
Keywords: Assembly, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Accurate 6D pose estimation of objects is important for robotic assembly. This letter presents a novel method for achieving high-precision 6D pose estimation by exploiting the reprojection of 3D edges onto binocular RGB image pairs. Our proposed method encompasses three phases: detection, pose initialization, and pose refinement. In the detection phase, an existing detector is employed to identify the objects within the image pairs. Subsequently, the object image patch of interest is extracted and fed into an encoder-decoder network that leverages edge maps and RGB images for the purpose of initial pose estimation. To refine the initial pose and achieve precise 6D pose estimation, we introduce a novel binocular edge-map-based nonlinear optimization technique. Our primary contributions entail an improved initial pose estimation network and a novel pose optimization technique. The improved network is dedicated to enhancing the accuracy of initial pose estimation, while the optimization technique focuses on refining the precision of the estimations. Experimental results demonstrate the effectiveness of our method, yielding an average translation precision of 0.48 mm and rotation precision of 0.45 degrees. Consequently, our proposed method can be seamlessly integrated into robotic manipulation platforms to successfully execute diverse assembly tasks.
|
|
13:30-15:00, Paper TuBT31-NT.4 | Add to My Program |
ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility |
|
Tian, Yunsheng | MIT |
Willis, Karl | Autodesk |
Al Omari, Bassel | University of Waterloo |
Luo, Jieliang | Autodesk Research |
Ma, Pingchuan | MIT CSAIL |
Li, Yichen | MIT |
Javid, Farhad | Autodesk Research |
Gu, Edward | MIT |
Jacob, Joshua | MIT CSAIL |
Sueda, Shinjiro | Texas A&M University |
Li, Hui | Autodesk Research |
Chitta, Sachin | Autodesk Inc |
Matusik, Wojciech | MIT |
Keywords: Assembly, Planning, Scheduling and Coordination, Intelligent and Flexible Manufacturing
Abstract: The automated assembly of complex products requires a system that can automatically plan a physically feasible sequence of actions for assembling many parts together. In this paper, we present ASAP, a physics-based planning approach for automatically generating such a sequence for general-shaped assemblies. ASAP accounts for gravity to design a sequence where each sub-assembly is physically stable with a limited number of parts being held and a support surface. We apply efficient tree search algorithms to reduce the combinatorial complexity of determining such an assembly sequence. The search can be guided by either geometric heuristics or graph neural networks trained on data with simulation labels. Finally, we show the superior performance of ASAP at generating physically realistic assembly sequence plans on a large dataset of hundreds of complex product assemblies. We further demonstrate the applicability of ASAP on both simulation and real-world robotic setups. Project website: asap.csail.mit.edu
|
|
13:30-15:00, Paper TuBT31-NT.5 | Add to My Program |
Simulation-Based Approach for Automatic Roadmap Design in Multi-AGV Systems (I) |
|
Žužek, Tena | University of Ljubljana |
Vrabič, Rok | Faculty of Mechanical Engineering, University of Ljubljana |
Zdesar, Andrej | University of Ljubljana |
Škulj, Gašper | University of Ljubljana |
Banfi, Igor | Epilog D.o.o |
Bošnak, Matevž | Faculty of Electrical Engineering, University of Ljubljana |
Zaletelj, Viktor | Epilog D.o.o |
Klancar, Gregor | University of Ljubljana |
Keywords: Logistics, Factory Automation, Path Planning for Multiple Mobile Robots or Agents
Abstract: This paper addresses the problem of establishing efficient intralogistic systems, focusing on the generation of roadmaps on a given layout and the coordination of multiple Automated Guided Vehicles (AGVs). A simulation-based approach for automatic roadmap design is proposed. An event-based simulator is developed that uses ant-colony inspired optimization to generate roadmaps tailored to the specific characteristics of a given intralogistic problem, i.e., the plant layout, fleet size, statistical description of tasks, dispatching algorithm, etc. The generated solutions are evaluated with a Multi-Agent Path Finding (MAPF) simulator that uses a Safe Interval Path Planning (SIPP) algorithm. By analysing the system throughput, the optimal fleet size for the system is proposed. The approach is validated through various examples and benchmarked against existing methods in the literature.
|
|
13:30-15:00, Paper TuBT31-NT.6 | Add to My Program |
MM4MM: Map Matching Framework for Multi-Session Mapping in Ambiguous and Perceptually-Degraded Environments |
|
Wu, Zhenyu | Nanyang Technological University |
Wang, Wei | Nanyang Technological University |
Zhao, Chunyang | Nanyang Technological University |
Yue, Yufeng | Beijing Institute of Technology |
Zhang, Jun | Nanyang Technological University |
Shen, Hongming | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Logistics, Mapping, Probability and Statistical Methods
Abstract: Multi-session mapping serves as the pre-requisite for autonomous robots to fulfill various long-term tasks (e.g., map updating, navigation, collaboration). However, it is challenging to implement multi-session mapping in enclosed or partially enclosed ambiguous environments (e.g., long corridors, industrial warehouses). Existing solutions either depend heavily on the matching of elementary geometric features (e.g., points, lines, and planes), which tends to fail in environments with ambiguous geometric features, or depend on a given guess of the initial transformation matrix between multiple single-session maps, which is not always obtainable or sufficiently accurate. The ambient magnetic field exhibits ubiquity and high distinctiveness at different locations, which makes it suitable for estimating the initial transformation matrix. Thus, this paper proposes a novel probabilistic magnetic-aware Map Matching framework for Multi-session Mapping, namely MM4MM, to estimate the relative transformation of multiple single-session maps and to build globally consistent maps in ambiguous and perceptually-degraded environments. The key novelties of this work are the design of the hierarchical probabilistic map matching framework and a Particle Swarm Optimization strategy to associate the magnetic data of multiple sessions. Evaluations on both simulated and real-world experiments demonstrate the greatly improved utility, accuracy, and robustness of multi-session mapping over the comparative methods.
|
|
13:30-15:00, Paper TuBT31-NT.7 | Add to My Program |
Learning Generalizable Patrolling Strategies through Domain Randomization of Attacker Behaviors |
|
Diaz Alvarenga, Carlos | University of California at Merced |
Basilico, Nicola | University of Milan |
Carpin, Stefano | University of California, Merced |
Keywords: Surveillance Robotic Systems, Planning, Scheduling and Coordination
Abstract: Graph-patrolling problems in the adversarial domain typically embed models and assumptions about how hostile events, from which an environment must be protected, are generated at a specific time and location. Relying upon such attacker models prevents algorithms from synthesizing strategies that can generalize to different settings, providing good performance under different and uncertain scenarios. In this paper, we propose a first method to deal with adversarial patrolling using a data-driven approach. We cast the problem in an RL setting where the reward function is based on the ability to neutralize attacks that can follow an unknown strategy and that, hence, can be viewed as a black-box component. We apply a policy gradient framework for optimizing action probabilities under such a reward model, showing how effective patrolling strategies can be obtained from repeated attack-defense interactions between a patrolling agent and an attacker. Our results show that the data-driven patroller can effectively provide protection against multiple, diverse attacker behaviors.
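Treating the attacker as a black box that only returns a reward is the classic REINFORCE setting. A minimal single-step sketch (all names, the softmax policy, and the hyperparameters are illustrative, not taken from the paper) might look like:

```python
import numpy as np

def reinforce_patrol(attack_fn, n_vertices, episodes=2000, lr=0.1, seed=0):
    """Train a softmax policy over which vertex to patrol, treating the
    attacker purely as a black box that returns a reward signal.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_vertices)            # policy logits
    baseline = 0.0                          # running-average reward baseline
    for _ in range(episodes):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        a = rng.choice(n_vertices, p=p)     # sample a patrol action
        r = attack_fn(a, rng)               # 1.0 if the attack was neutralized
        grad = -p                           # grad of log pi(a | theta) ...
        grad[a] += 1.0                      # ... for a softmax policy
        theta += lr * (r - baseline) * grad
        baseline += 0.05 * (r - baseline)
    p = np.exp(theta - theta.max())
    return p / p.sum()
```

With a deterministic attacker that always strikes one vertex, the learned distribution concentrates on that vertex; richer attacker behaviors simply change what `attack_fn` returns, without changing the training loop.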
|
|
13:30-15:00, Paper TuBT31-NT.8 | Add to My Program |
Combining Coordination and Independent Coverage in Multirobot Graph Patrolling |
|
Diaz Alvarenga, Carlos | University of California at Merced |
Basilico, Nicola | University of Milan |
Carpin, Stefano | University of California, Merced |
Keywords: Surveillance Robotic Systems, Planning, Scheduling and Coordination
Abstract: Graph patrolling algorithms provide effective strategies to coordinate mobile robots in the context of autonomous surveillance of valuable assets. Optimizing patrolling strategies often aims at bounding the time between subsequent visits to a vertex -- a measure known in the literature as idleness. In the domain of multi-robot patrolling, two approaches have received the most attention thus far. The first involves coordinating all robots to follow a shared patrolling strategy covering the entire graph, while the second approach partitions the environment into disjoint areas that are then assigned to individual robots. Starting from these existing solutions, in this paper we introduce a new method bridging these two complementary approaches. Our technique splits the vertices of the graph into a partition that includes a shared portion of the environment patrolled collectively by all robots, along with disjoint areas allocated exclusively to individual robots. This problem is formulated in terms of minimizing the maximum weighted idleness of the graph and is shown to be NP-hard. Then, we describe an exact solution for the problem and also propose various heuristics to efficiently compute solutions for large problem instances. We evaluate and compare the proposed techniques in simulation and show that our methods in most cases produce better patrolling strategies when compared to classic solutions. Moreover, for small problem instances where the exact solution can be found, we demonstrate that our proposed heuristic has a competitive performance ratio.
|
|
13:30-15:00, Paper TuBT31-NT.9 | Add to My Program |
Longitudinal Control Volumes: A Novel Centralized Estimation and Control Framework for Distributed Multi-Agent Sorting Systems |
|
Maier, James | Carnegie Mellon University |
Sriganesh, Prasanna | Carnegie Mellon University |
Travers, Matthew | Carnegie Mellon University |
Keywords: Sustainable Production and Service Automation, Process Control, Sensor-based Control
Abstract: Centralized control of a multi-agent system improves upon distributed control, especially when multiple agents share a common task, e.g., sorting different materials in a recycling facility. Traditionally, each agent in a sorting facility is tuned individually, which leads to suboptimal performance if one agent is less efficient than the others. Centralized control overcomes this bottleneck by leveraging global system state information, but it can be computationally expensive. In this work, we propose a novel framework called Longitudinal Control Volumes (LCV) to model the flow of material in a recycling facility. We then employ a Kalman Filter that incorporates local measurements of materials into a global estimate of the material flow in the system. We utilize a model predictive control algorithm that optimizes the rate of material flow using the global state estimate in real time. We show that our proposed framework outperforms distributed control methods by 40-100 percent in simulation and physical experiments.
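As a rough illustration of the estimation half of such a pipeline (a scalar stand-in, not the paper's LCV model; the function name and all noise parameters below are assumed), one Kalman filter step that fuses a local material measurement into a flow estimate is:

```python
def kf_step(x, P, u, z, F=1.0, B=1.0, H=1.0, Q=0.01, R=0.1):
    """One scalar Kalman filter step: predict the material level carried
    forward by the flow model, then fuse the local sensor reading z.
    All model and noise parameters here are illustrative.
    """
    x_pred = F * x + B * u                  # predicted material level
    P_pred = F * P * F + Q                  # predicted variance
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)   # measurement update
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

Repeated calls with a steady measurement pull the estimate toward the sensed value while the variance settles at a small steady-state level; a downstream MPC would then consume `x_new` as its state.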
|
|
TuBT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems II |
|
|
Chair: Triebel, Rudolph | German Aerospace Center (DLR) |
Co-Chair: Fang, Zhengru | City University of Hong Kong |
|
13:30-15:00, Paper TuBT32-NT.1 | Add to My Program |
A Safety-Adapted Loss for Pedestrian Detection in Autonomous Driving |
|
Lyssenko, Maria | Robert Bosch GmbH, University of Munich |
Pimplikar, Piyush | Robert Bosch GmbH, Corporate Research, Germany |
Bieshaar, Maarten | Robert Bosch GmbH |
Nozarian, Farzad | DFKI |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Intelligent Transportation Systems, AI-Based Methods, Computer Vision for Transportation
Abstract: In safety-critical domains like autonomous driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As raw evaluation metrics are not an adequate safety indicator, recent works leverage domain knowledge to identify safety-relevant VRU and back-annotate the criticality of the interaction to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNNs penalize all misdetections equally, irrespective of their importance for the safe driving task. Hence, to mitigate the occurrence of safety-critical failure cases like false negatives, a safety-aware training strategy is needed to enhance the detection performance for critical pedestrians. In this paper, we propose a novel, safety-adapted loss variation that leverages the estimated per-pedestrian criticality during training. To this end, we exploit the reachable-set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-adapted loss function mitigates the misdetection of safety-critical pedestrians while maintaining robust performance for the general case, i.e., safety-irrelevant pedestrians.
|
|
13:30-15:00, Paper TuBT32-NT.2 | Add to My Program |
PCB-RandNet: Rethinking Random Sampling for LiDAR Semantic Segmentation in Autonomous Driving Scene |
|
Han, Xian-Feng | Southwest University |
Cheng, Huixian | Southwest University |
Jiang, Hang | Southwest University |
He, Dehong | Southwest University |
Xiao, Guo-Qiang | Southwest University |
Keywords: Intelligent Transportation Systems, AI-Based Methods, Semantic Scene Understanding
Abstract: Fast and efficient semantic segmentation of large-scale LiDAR point clouds is a fundamental problem in autonomous driving. To achieve this goal, existing point-based methods mainly adopt a Random Sampling strategy to process large-scale point clouds. However, our quantitative and qualitative studies have found that Random Sampling may be less suitable for the autonomous driving scenario, since the LiDAR points follow an uneven or even long-tailed distribution across the space, which prevents the model from capturing sufficient information from points in different distance ranges and reduces the model's learning capability. To alleviate this problem, we propose a new Polar Cylinder Balanced Random Sampling method that enables the downsampled point clouds to maintain a more balanced distribution and improves segmentation performance under different spatial distributions. In addition, a sampling consistency loss is introduced to further improve the segmentation performance and reduce the model's variance under different sampling methods. Extensive experiments confirm that our approach produces excellent performance on both SemanticKITTI and SemanticPOSS benchmarks, achieving a 2.8% and 4.0% improvement, respectively.
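The core idea of distance-balanced downsampling can be sketched as follows. This is a simplified planar-range binning, not the paper's polar cylinder partition, and the function name and parameters are hypothetical:

```python
import numpy as np

def balanced_random_sample(points, n_out, n_bins=4, r_max=50.0, seed=0):
    """Downsample an (N, 3) point cloud so each radial distance bin
    contributes (up to) the same number of points, instead of plain
    uniform sampling that over-represents dense near-sensor regions.
    """
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(points[:, :2], axis=1)       # planar range per point
    edges = np.linspace(0.0, r_max, n_bins + 1)
    per_bin = n_out // n_bins                       # equal budget per bin
    chosen = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((r >= lo) & (r < hi))[0]
        k = min(per_bin, len(idx))                  # a bin may be sparse
        if k:
            chosen.append(rng.choice(idx, size=k, replace=False))
    return points[np.concatenate(chosen)]
```

Compared with uniform sampling, far-away (long-tail) points are guaranteed a fixed share of the output, which is the property the balanced-sampling argument above relies on.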
|
|
13:30-15:00, Paper TuBT32-NT.3 | Add to My Program |
STT: Stateful Tracking with Transformers for Autonomous Driving |
|
Jing, Longlong | Waymo |
Yu, Ruichi | Waymo |
Chen, Xu | Waymo |
Zhao, Zhengli | UCI |
Sheng, Shiwei | Waymo |
Graber, Colin | Waymo |
Chen, Qi | Johns Hopkins University |
Li, Qinru | University of California San Diego |
Wu, Shangxuan | Waymo |
Deng, Han | Waymo LLC |
Lee, Sangjin | Waymo |
Sweeney, Chris | Waymo LLC |
He, Qiurui | Waymo LLC |
Hung, Wei-Chih | Waymo |
He, Tong | Waymo LLC |
Zhou, Xingyi | Google Research |
Moussavi, Farshid | Waymo |
Guo, Zijian | Waymo |
Zhou, Yin | Waymo |
Tan, Mingxing | Waymo Research |
Yang, Weilong | Waymo |
Li, Congcong | Waymo Inc |
Keywords: Intelligent Transportation Systems, AI-Enabled Robotics, AI-Based Methods
Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their present states, such as velocity and acceleration. Existing works frequently focus on the association task while either neglecting the model's performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scene while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through a long-term history of detections and is jointly optimized for both data association and state estimation tasks. Since standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks across the wider spectrum of object states, we extend them with new metrics, S-MOTA and MOTPS, that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
|
|
13:30-15:00, Paper TuBT32-NT.4 | Add to My Program |
SmartCooper: Vehicle Collaborative Perception under Adaptive Fusion and Judger Mechanism |
|
Zhang, Yuang | Tsinghua University |
An, Haonan | Nanyang Technological University |
Fang, Zhengru | City University of Hong Kong |
Xu, Guowen | City University of Hong Kong |
Zhou, Yuan | Nanyang Technological University |
Chen, Xianhao | The University of Hong Kong |
Fang, Yuguang | City University of Hong Kong |
Keywords: Intelligent Transportation Systems, Automation Technologies for Smart Cities, Computer Vision for Transportation
Abstract: In recent years, autonomous driving has garnered significant attention due to its potential for improving road safety through collaborative perception among connected and autonomous vehicles (CAVs). However, time-varying channel conditions in vehicular transmission environments demand dynamic allocation of communication resources. Moreover, in the context of collaborative perception, it is important to recognize that not all CAVs contribute valuable data, and some CAV data even have detrimental effects on collaborative perception. In this paper, we introduce SmartCooper, an adaptive collaborative perception framework that incorporates communication optimization and a judger mechanism to facilitate CAV data fusion. Our approach begins with optimizing the connectivity of vehicles while considering communication constraints. We then train a learnable encoder to dynamically adjust the compression ratio based on the channel state information (CSI). Subsequently, we devise a judger mechanism to filter the detrimental image data reconstructed by adaptive decoders. We evaluate the effectiveness of our proposed algorithm on the OpenCOOD platform. Our results demonstrate a substantial reduction in communication costs by 23.10% compared to the non-judger scheme. Additionally, we achieve a significant improvement in the average precision of Intersection over Union (AP@IoU) by 7.15% compared with state-of-the-art schemes.
|
|
13:30-15:00, Paper TuBT32-NT.5 | Add to My Program |
A Neural-Evolutionary Algorithm for Autonomous Transit Network Design |
|
Holliday, Andrew | McGill University |
Dudek, Gregory | McGill University |
Keywords: Intelligent Transportation Systems, Automation Technologies for Smart Cities, Optimization and Optimal Control
Abstract: Planning a public transit network is a challenging optimization problem, but essential in order to realize the benefits of autonomous buses. We propose a novel algorithm for planning networks of routes for autonomous buses. We first train a graph neural net model as a policy for constructing route networks, and then use the policy as one of several mutation operators in an evolutionary algorithm. We evaluate this algorithm on a standard set of benchmarks for transit network design, and find that it outperforms the learned policy alone by up to 20% and a plain evolutionary algorithm approach by up to 53% on realistic benchmark instances.
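The hybrid of a learned construction policy with evolutionary search can be caricatured in a few lines: a toy one-plus-one loop on a bitstring, where a greedy operator stands in for the trained policy. Everything here (names, the 50/50 operator mix, the bitstring objective) is illustrative, not the paper's method:

```python
import numpy as np

def neuro_evolve(n_bits=32, generations=300, seed=0):
    """Tiny (1+1) evolutionary loop with two mutation operators: a plain
    random bit flip, and a 'policy' operator standing in for a learned
    construction policy (here it greedily flips a 0-bit to 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n_bits)
    fitness = x.sum()                               # objective: count of ones
    for _ in range(generations):
        child = x.copy()
        if rng.random() < 0.5:                      # random mutation
            child[rng.integers(n_bits)] ^= 1
        else:                                       # policy-guided mutation
            zeros = np.where(child == 0)[0]
            if len(zeros):
                child[rng.choice(zeros)] = 1
        if child.sum() >= fitness:                  # (1+1) selection
            x, fitness = child, child.sum()
    return x
```

The point of the design is that the learned operator injects strong, structured moves while the random operator preserves exploration; in the paper the "policy" slot is filled by the trained graph neural net.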
|
|
13:30-15:00, Paper TuBT32-NT.6 | Add to My Program |
UDE-Based Robust Control of a Quadrotor-Slung-Load System |
|
Wang, Yanhu | Shanghai Jiao Tong University |
Yu, Gan | Shanghai Jiao Tong University |
Xie, Wei | Shanghai Jiao Tong University |
Zhang, Weidong | Shanghai JiaoTong University |
Silvestre, Carlos | University of Macau |
Keywords: Intelligent Transportation Systems, Robust/Adaptive Control
Abstract: This article addresses the robust trajectory tracking problem for a Quadrotor-Slung-Load System (QSLS), which consists of a point-mass load and a rigid-body quadrotor connected by an inelastic cable. To construct the controller, we employ the backstepping technique and propose an Uncertainty and Disturbance Estimator (UDE) to compensate for uncertainties arising from imprecise model parameters and exogenous time-varying disturbances affecting both the quadrotor and the load. The main feature of the UDE is its ability to convert the robust control problem into a low-pass filter design in the frequency domain, which generates an estimate of the lumped uncertainties. To streamline the design process, we utilize a coordinate transformation strategy that converts the QSLS into a configuration that resembles the dynamics of a typical quadrotor system. The proposed controller ensures uniform ultimate boundedness of the closed-loop errors in the presence of time-varying exogenous disturbances, while guaranteeing asymptotic stability when disturbances are zero. Finally, we present comprehensive simulation and experimental results to validate the effectiveness and robustness of the proposed solution.
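The low-pass-filter view of the UDE mentioned in the abstract can be written, for a generic system, as follows. This is a textbook UDE sketch under assumed dynamics, not the paper's QSLS-specific derivation:

```latex
\dot{x} = f(x) + b\,u + d(t)
\quad\Rightarrow\quad
d = \dot{x} - f(x) - b\,u,
\qquad
\hat{d} = G_f(s)\,\bigl(\dot{x} - f(x) - b\,u\bigr),
\quad
G_f(s) = \frac{1}{T s + 1}.
```

The control law cancels the estimate $\hat{d}$, so the robust-control problem reduces to choosing the bandwidth $1/T$ of the strictly proper low-pass filter $G_f(s)$.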
|
|
13:30-15:00, Paper TuBT32-NT.7 | Add to My Program |
Are You a Robot? Detecting Autonomous Vehicles from Behavior Analysis |
|
Maresca, Fabio | NEC Laboratories Europe GmbH |
Grazioli, Filippo | NEC Laboratories Europe GmbH |
Albanese, Antonio | Flyhound Co |
Sciancalepore, Vincenzo | NEC Laboratories Europe GmbH |
Negri, Gianpiero | Amazon Global Robotics - EU Innovation Lab |
Costa-Perez, Xavier | NEC Laboratories Europe |
Keywords: Intelligent Transportation Systems, Behavior-Based Systems, AI-Based Methods
Abstract: The tremendous hype around autonomous driving is eagerly calling for emerging and novel technologies to support advanced mobility use cases. As car manufacturers keep developing SAE level 3+ systems to improve the safety and comfort of passengers, traffic authorities need to establish new procedures to manage the transition from human-driven to fully-autonomous vehicles while providing a feedback-loop mechanism to fine-tune envisioned autonomous systems. Thus, a way to automatically profile autonomous vehicles and differentiate them from human-driven ones is a must. In this paper, we present a fully-fledged framework that monitors active vehicles using camera images and state information in order to determine whether vehicles are autonomous, without requiring any active notification from the vehicles themselves. Essentially, it builds on the cooperation among vehicles, which share the data they acquire on the road to feed a machine learning model that identifies autonomous cars. We extensively tested our solution and created the NexusStreet dataset, by means of the CARLA simulator, employing an autonomous driving control agent and a steering wheel maneuvered by licensed drivers. Experiments show it is possible to discriminate the two behaviors by analyzing video clips with an accuracy of ∼80%, which improves up to ∼93% when the target's state information is available. Lastly, we deliberately degraded the state information to observe how the framework performs under non-ideal data collection conditions.
|
|
13:30-15:00, Paper TuBT32-NT.8 | Add to My Program |
RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud |
|
Pan, Zhijun | Royal College of Art |
Ding, Fangqiang | University of Edinburgh |
Zhong, Hantao | University of Cambridge |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in the 3D world thus plays a pivotal role in applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision of moving objects, largely surpassing the performance of the state of the art. We release our code and model at https://github.com/LJacksonPan/RaTrack.
|
|
13:30-15:00, Paper TuBT32-NT.9 | Add to My Program |
Mixed Traffic Control and Coordination from Pixels |
|
Villarreal, Michael | University of Tennessee, Knoxville |
Poudel, Bibek | University of Tennessee Knoxville |
Pan, Jia | University of Hong Kong |
Li, Weizi | University of Tennessee, Knoxville |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Computer Vision for Transportation
Abstract: Traffic congestion is a persistent problem in our society. Existing methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that require domain expertise and hand engineering for each road network's observation space. Additionally, precise observations use global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. We consider image observations, a modality that has not been extensively explored for mixed traffic control via RL, as the alternative: 1) images do not require a complete re-imagination of the observation space from environment to environment; 2) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve competitive performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precise observations, e.g., up to an 8% increase in average vehicle velocity in the merge environment, despite only using local traffic information as opposed to global traffic information.
|
|
TuBL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster II |
|
|
|
13:30-15:00, Paper TuBL-EX.1 | Add to My Program |
Learning User-Specific Control Policies for Lower-Limb Exoskeletons Using Gaussian Process Regression |
|
Shahrokhshahi, Ahmadreza | Simon Fraser University |
Khadiv, Majid | Technical University of Munich |
Mansouri, Saeed | Simon Fraser University |
Arzanpour, Siamak | Simon Fraser University |
Park, Edward J. | Simon Fraser University |
Keywords: Prosthetics and Exoskeletons, Machine Learning for Robot Control, Humanoid and Bipedal Locomotion
Abstract: Robotic exoskeletons provide a viable means for enabling individuals with limited or no walking ability to traverse various surfaces with maximal external support to the patient's body. However, to achieve effective performance, it is crucial to consider anatomical differences in body size and shape among users. In this paper, we propose a framework to infer adapted user-specific policies using a small dataset from past experiments performed with twelve users wearing a lower-limb self-balancing exoskeleton. Our framework utilizes Gaussian Process Regression (GPR) to learn a mapping between user characteristics and control policy parameters. We also propose to use hindsight data relabeling to improve the performance of the controller. We experimentally test the output of the GPR model on new users and demonstrate its effectiveness in predicting user-specific walking parameters that lead to high performance. We also compare the performance of this control policy with an expert-tuned policy and show that our framework can reach comparable results without the need to perform expensive and unsafe tuning of the controller for new users.
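The core GPR step, mapping user characteristics to controller parameters, can be sketched with a minimal NumPy implementation. The kernel, noise level, and feature choices below are assumptions for illustration, not the authors' setup:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gpr_predict(X, y, Xq, ls=1.0, noise=1e-4):
    """Minimal GP regression: posterior mean and std at query points Xq.
    A stand-in for the paper's GPR step; hyperparameters are assumptions."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))   # training covariance + noise
    Ks = rbf(X, Xq, ls)                          # train/query cross-covariance
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(rbf(Xq, Xq, ls)) - (Ks * v).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Here `X` would hold normalized user characteristics (e.g. height, mass) from past experiments and `y` a tuned walking parameter; the posterior standard deviation flags new users who lie far from the training set.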
|
|
13:30-15:00, Paper TuBL-EX.2 | Add to My Program |
A Novel Material Handling System for Transporting Large-Size Components Using Multiple Collaborative Autonomous Mobile Robots |
|
Qi, Lipeng | Xi'an Jiaotong University |
Yan, Chao-Bo | Xi'an Jiaotong University |
Zhang, Meng | Xi'an Jiaotong University |
Hu, Jianchen | Xi'an Jiaotong University |
Keywords: Cooperating Robots, Multi-Robot Systems, Optimization and Optimal Control
Abstract: Autonomous mobile robots (AMRs) are playing an important role in factory logistics. However, for large-size components, it is hard for a single AMR to move them due to its limited capabilities. Cobots (i.e., collaborative AMRs), which can provide flexible, reliable, and cost-effective solutions for transporting large-size components such as aircraft wings and wind turbine blades, have attracted increasing attention and interest recently. In this paper, a novel material handling system for transporting large-size parts using multiple collaborative AMRs is proposed. The virtual structure method is adopted to model the kinematics of the cobot formation. The system works under low-latency industrial routers or 5G base stations, and each AMR uses LiDAR or a motion capture system for positioning. The control algorithm, based on model predictive control (MPC), takes into account the formation shape and the smoothness of the motion by designing an appropriate objective function. Simulations and experiments demonstrate the effectiveness of the control method. The distance errors between AMRs are within ±2cm with LiDAR and ±4mm with the motion capture system. The movement of the AMRs is smooth, ensuring safe transportation of the loads.
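An MPC objective that trades off formation shape against motion smoothness, as the abstract describes, might look like the following stage cost. Names and weights are illustrative assumptions; the paper's exact formulation is not reproduced here:

```python
import numpy as np

def formation_cost(positions, leader, offsets, controls, w_shape=1.0, w_smooth=0.1):
    """Illustrative MPC-style cost for a virtual-structure formation:
    each AMR should sit at leader + its rigid offset, and successive
    control inputs should not change abruptly."""
    shape_err = sum(np.sum((p - (leader + d)) ** 2)
                    for p, d in zip(positions, offsets))
    smoothness = np.sum(np.diff(controls, axis=0) ** 2)
    return w_shape * shape_err + w_smooth * smoothness
```

An MPC solver would minimize the sum of such terms over the prediction horizon; tightening `w_shape` enforces the formation (and hence small inter-AMR distance errors), while `w_smooth` penalizes jerky motion.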
|
|
13:30-15:00, Paper TuBL-EX.3 | Add to My Program |
A Multimodal Soft Gripper with Variable Stiffness and Variable Gripping Range Based on MASH Actuator |
|
Li, Dannuo | National University of Singapore |
Zhou, Xuanyi | National University of Singapore |
Xiong, Quan | National University of Singapore |
Yeow, Chen-Hua | National University of Singapore |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft pneumatic actuators with integrated strain-limiting layers have been predominant components of soft gripper technology for decades. However, owing to their intrinsic strain-limiting layer design, these soft grippers possess a singular gripping functionality, rendering them incapable of adapting to diverse gripping tasks with different strategies. Building on our previous work, we introduce a novel soft gripper that offers variable stiffness, an adjustable gripping range, and multifunctionality. The MASH actuator-based soft gripper can expand its gripping range up to threefold compared to the original configuration and ensures a secure grip by enhancing stiffness when handling heavy objects. Moreover, it supports multitasking gripping through specific gripping strategy control.
|
|
13:30-15:00, Paper TuBL-EX.4 | Add to My Program |
Autonomous Grasping Control of Multi-Fingered Robot Hand for Unseen Objects Via Vision-Language Model |
|
Heo, Si-Hwan | Korea Institute of Science and Technology |
Hwang, Donghyun | Korea Institute of Science and Technology |
Yang, Sungwook | Korea Institute of Science and Technology |
Keywords: Grasping, Multifingered Hands, Control Architectures and Programming
Abstract: With the advent of large-scale vision-language models (VLMs) capable of perceiving situations and making nuanced decisions, robotic systems have increasingly utilized them to handle our daily living environments. This study introduces an autonomous grasping control framework leveraging recent VLM advancements, with a particular focus on multi-fingered robot hand applications. This approach aims to broaden VLM decision-making capabilities, extending beyond one-dimensional gripper motions focused on stability and success rates to context-based versatile grasping. Our decision-making architecture comprises four models: GPT4-vision, GPT4-turbo, Grounding DINO with SAM, and DSQnet. The first two models are instrumental in developing the grasping strategy, while the latter two translate this strategy into spatial information the robotic system can interpret. The GPT models assess the grasping situation via RGB images, with user requirements input as prompts. Additionally, constraint prompts are used to orient the VLM towards considering itself a robotic agent with a hand. Consequently, the GPT model determines which object to grasp and the appropriate posture for doing so. An edge-cloud robot control framework was developed to apply this decision-making capability in real-world scenarios. With our proposed concept and framework, we demonstrate the ability to grasp various unseen objects within the same scene differently, based on user requirements.
|
|
13:30-15:00, Paper TuBL-EX.5 | Add to My Program |
Learning Manipulation Skills for Cosmetic Services |
|
Duan, Anqing | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Service Robotics, Learning from Demonstration, Task and Motion Planning
Abstract: The increasing deployment of robots has significantly enhanced automation levels across a wide and diverse range of industries. This paper investigates the automation challenges of laser-based dermatology procedures in the beauty industry. This group of related manipulation tasks involves delivering energy from a cosmetic laser onto the skin in repetitive patterns. To automate this procedure, we propose to use a robotic manipulator and endow it with the dexterity of a skilled dermatology practitioner through a learning-from-demonstration framework. To ensure that the cosmetic laser can properly deliver the energy onto the skin surface of an individual, we develop a novel structured prediction-based imitation learning algorithm with the merit of handling geometric constraints. Notably, our proposed algorithm effectively tackles the imitation challenges associated with quasi-periodic motions, a common feature of many laser-based cosmetic tasks. The conducted real-world experiments illustrate the performance of our robotic beautician in mimicking realistic dermatological procedures. Our new method is shown to not only replicate the rhythmic movements from the provided demonstrations but also to adapt the acquired skills to previously unseen scenarios and subjects.
|
|
13:30-15:00, Paper TuBL-EX.6 | Add to My Program |
Rapid and Energy-Efficient Stiffness Control of Continuum Robots with Modular Electropermanent Magnet Joints |
|
Song, ChangSeob | Korea Institute of Science and Technology |
Yang, Sungwook | Korea Institute of Science and Technology |
Hwang, Donghyun | Korea Institute of Science and Technology |
Keywords: Compliance and Impedance Control, Compliant Joints and Mechanisms, Surveillance Robotic Systems
Abstract: Current continuum manipulators for confined spaces often lack the ability to adjust their stiffness. Variable stiffness joints have been explored to address this, but existing solutions suffer from slow stiffening or require bulky tethered actuation systems. This work introduces a novel variable stiffness joint that utilizes electropermanent magnets (EPMs) for rapid and high-rigidity stiffening. EPMs are switchable magnets that require only switching energy to maintain their stiffness state. Our design combines EPMs with magneto-rheological elastomers to maximize stiffening torque. By leveraging the high energy density of NdFeB magnets, the EPM joint achieves significant rigidity variation in continuum manipulators. This innovation has the potential to improve robots that perform maintenance and inspection tasks in confined environments such as semiconductor fabrication plants and aircraft engines.
|
|
13:30-15:00, Paper TuBL-EX.7 | Add to My Program |
Exploring Robotic Arm Movement Profiles: How Movement Shapes User Perception |
|
Liberman-Pincu, Ela | Ben-Gurion University of the Negev |
Oron Gilad, Tal | Ben-Gurion University of the Negev |
Keywords: Human Factors and Human-in-the-Loop, Human-Robot Collaboration, Motion and Path Planning
Abstract: This study examines the impact of motion design on user perceptions of and interactions with robotic arms, which are expanding from industrial applications to daily human tasks. The research investigates how different motion components influence user preferences. An online questionnaire allowed participants to watch videos of a robot performing laundry sorting tasks and then select descriptive terms for each robot's behavior. Additionally, they rated various robot behavior profiles based on their willingness to use them. The outcomes revealed that motion modality significantly affects perceptions of a robot's reliability, professionalism, and innovation. In addition, high levels of erratic behavior were found to be associated with friendliness, reliability, and intelligence, and generally received more positive views. This indicates a complex relationship between motion patterns and human perception, where innovation is appreciated but not at the cost of operational functionality. The challenge lies in balancing innovative movement with perceived reliability and professionalism.
|
|
13:30-15:00, Paper TuBL-EX.8 | Add to My Program |
Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-Aware Structure-Of-Interest Planning |
|
Qi, Jiaming | Centre for Transformative Garment Production, Hong Kong |
Zhou, Peng | The University of Hong Kong |
Zheng, Pai | The Hong Kong Polytechnic University |
Wu, Hongmin | Institute of Intelligent Manufacturing, Guangdong Academy of Sciences |
Lee, Hoi-Yin | The Hong Kong Polytechnic University |
Lu, Liang | University of Hong Kong |
Yang, Chenguang | University of Liverpool |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Pan, Jia | University of Hong Kong |
Keywords: Manipulation Planning, Dual Arm Manipulation, Perception for Grasping and Manipulation
Abstract: Bagging operations, common in packaging and assisted living applications, are challenging due to a bag's complex deformable properties. To address this, we develop a robotic system for automated bagging tasks using an adaptive structure-of-interest (SOI) manipulation approach. Our method relies on real-time visual feedback to dynamically adjust manipulation without requiring prior knowledge of bag materials or dynamics. We present a robust pipeline featuring state estimation for SOIs using Gaussian Mixture Models (GMM), SOI generation via optimization-based bagging techniques, SOI motion planning with Constrained Bidirectional Rapidly-exploring Random Trees (CBiRRT), and dual-arm manipulation coordinated by Model Predictive Control (MPC). Experiments demonstrate the system's ability to achieve precise, stable bagging of various objects using adaptive coordination of the manipulators. The proposed framework advances the capability of dual-arm robots to perform more sophisticated automation of common tasks involving interactions with deformable objects.
|
|
13:30-15:00, Paper TuBL-EX.9 | Add to My Program |
Time-Delay Compensation for Delayed Acceleration Input in CACC Using Non-Collocated Sensing of Inter-Vehicle Distance |
|
Yavuz, Ahmet | Keio University |
Kubo, Ryogo | Keio University |
Keywords: Intelligent Transportation Systems, Control Architectures and Programming, Cooperating Robots
Abstract: Cooperative adaptive cruise control (CACC) enables a shorter inter-vehicle distance (IVD) by using wireless communications and radar-based IVD measurements. The leader-follower CACC architecture with a Smith predictor (SP) has been established to compensate for communication delays and further decrease the IVDs. By changing from the front-IVD measurements of the leader-follower architecture to rear-IVD measurements, the number of communication paths is halved. However, the leader-follower architecture with SP lacks flexibility because each vehicle is managed by its preceding vehicle. This study proposes introducing the SP into a general one-vehicle look-ahead architecture with non-collocated sensing of the IVD to synchronize feedforward-based acceleration control and feedback-based IVD control, which results in higher flexibility than the conventional methods while keeping the benefit of fewer communication paths. The simulation results show that the proposed method compensates for the communication delay as in the leader-follower architecture with SP and rear-IVD measurements.
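The Smith predictor idea, letting the controller act on an undelayed output estimate built from an internal model plus a delayed copy, can be sketched in discrete time. The first-order model below is an assumption for illustration, not the paper's CACC vehicle dynamics:

```python
from collections import deque

class SmithPredictor:
    """Illustrative discrete-time Smith predictor. An internal delay-free
    model y[k+1] = a*y[k] + b*u[k] runs alongside a delayed copy of its
    output; adding the measured (delayed) plant output and subtracting the
    delayed model output gives the controller an undelayed estimate."""

    def __init__(self, a, b, delay_steps):
        self.a, self.b = a, b
        self.y_model = 0.0                            # delay-free model state
        self.buf = deque([0.0] * delay_steps, maxlen=delay_steps)

    def output_estimate(self, u, y_measured):
        y_delayed = self.buf[0]                       # model output, delay steps ago
        est = self.y_model + (y_measured - y_delayed) # SP correction term
        self.y_model = self.a * self.y_model + self.b * u
        self.buf.append(self.y_model)
        return est
```

When the model matches the plant exactly, the correction term vanishes and `est` equals the delay-free model output, which is what lets the feedback loop behave as if the delay were outside the loop.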
|
|
13:30-15:00, Paper TuBL-EX.10 | Add to My Program |
Paint with the Sun: A Robotic System for Heliography |
|
Hu, Luyin | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Sensor-based Control, Motion Control, Art and Entertainment Robotics
Abstract: We introduce a robotic system for heliography, which refers to painting with the sun. To automate this manipulation task, we developed a unique method that merges thermal servoing with vision-guided controls. Our robotic system is equipped with a magnifying lens affixed to its end-effector, complemented by thermal and visual sensing equipment. The performance of our method is evaluated with a series of heliography experiments.
|
|
13:30-15:00, Paper TuBL-EX.11 | Add to My Program |
Vision-Based Collaborative Robot Automation and Voice Control System Using Mobile Robot Arms |
|
Kim, Hanjun | Seoul National University |
Ahn, Sung-Hoon | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Human-Robot Collaboration, Object Detection, Segmentation and Categorization, Natural Dialog for HRI
Abstract: A collaborative robot is a robot that interacts with and performs tasks alongside workers, which can increase work efficiency. However, currently available commercial collaborative robot solutions have limited functionality and disadvantages such as inflexibility in unexpected situations. To address this problem, this study aims to develop a collaborative robot automation process driven by voice commands and implement it in the workspace. In addition, we propose a new strategy to enhance robots' ability to recognize their surroundings and to ensure the safety of people on the job. The main content of the study consists of two parts. First, we develop a process in which the robot accepts work instructions from the operator through voice commands and accurately understands them. Second, the robot leverages camera vision systems and autonomous driving platforms to recognize its surroundings and create optimal work plans to carry out its work. This study is expected to contribute to expanding the use of collaborative robots in industrial sites and improving work efficiency. In addition, the safety features using voice command and vision systems are expected to be applicable to various applications.
|
|
13:30-15:00, Paper TuBL-EX.12 | Add to My Program |
RNN-Based Shared Control for Enhanced Sense of Agency in Robotic Teleoperation |
|
Morita, Tomoya | Nagoya University |
Armleder, Simon | Technische Universität München |
Iino, Hiroto | Waseda University |
Aoyama, Tadayoshi | Nagoya University |
Cheng, Gordon | Technical University of Munich |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: Telerobotics and Teleoperation, Machine Learning for Robot Control, Embodied Cognitive Science
Abstract: Robotic teleoperation, where humans operate robots at remote locations, removes temporal and spatial barriers and enables physical interaction with people and objects. Proactive robot operation is expected to improve the operator's motivation and sense of self-efficacy. However, when manipulation interventions that support task execution occur, the sense of agency (SoA), i.e., the perception that the movement of the observed object is caused by oneself, decreases. Therefore, a method of manipulation intervention that maintains a high SoA is required. We propose a shared control method in which the robot motion does not deviate significantly from the operator's intention, achieved by predicting the next-time-step input in real time with an RNN model trained on a skilled operator's operations. We experimented with pouring tasks under different conditions, using a system in which two remote robot arms were controlled by a VR device, the operator's hand and head position and posture were synchronized with the robot, and the robot's viewpoint was shared. Experimental results confirmed that in some cases, the proposed method corrects the operation input temporally and spatially so that it approaches the trajectory of the skilled operator. Although no significant improvement in task performance was obtained with the proposed method, it was confirmed that the proposed method maintained a higher SoA than motion playback of the skilled operator's operation.
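The prediction-plus-correction structure the abstract describes can be sketched as follows. The Elman cell, the linear blend, and the alpha value are illustrative assumptions, not the paper's trained model or exact correction law:

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, Wo):
    """One Elman-RNN step: update the hidden state from the current input
    and emit a prediction of the next-time-step input."""
    h_next = np.tanh(Wx @ x + Wh @ h)
    return h_next, Wo @ h_next

def shared_control(u_operator, u_predicted, alpha=0.7):
    """Blend the operator's command with the model's prediction of a skilled
    operator's command; a larger alpha keeps more operator authority (and,
    plausibly, a higher sense of agency)."""
    return alpha * np.asarray(u_operator) + (1 - alpha) * np.asarray(u_predicted)
```

At each control cycle the RNN would consume the recent operator inputs, predict the skilled operator's next input, and the blended command would be sent to the robot arms.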
|
|
13:30-15:00, Paper TuBL-EX.13 | Add to My Program |
Autonomous Orientation Control of Forceps Based on Real-Time Action Segmentation in Robotic Surgery |
|
Yamada, Yutaro | Nagoya University |
Colan, Jacinto | Nagoya University |
Davila, Ana | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: AI-Based Methods, Recognition, Surgical Robotics: Laparoscopy
Abstract: Robotic Minimally Invasive Surgery (RMIS) systems offer only complementary assistance, resulting in a high workload for surgeons. While surgeons have high cognitive abilities, they may experience fatigue and have limited dexterity. In contrast, robots can move with higher dexterity and precision without fatigue but have limited perception and decision-making. To combine their abilities, we propose a shared control system that recognizes the surgeon's actions and provides autonomous robotic assistance. Our system consists of a real-time action segmentation system and an assistance system. The segmentation system recognizes the surgeon's actions using a hierarchical clustering framework based on visual and kinematic data. The assistance system provides the appropriate forceps orientation for the recognized actions. We evaluate the system using a pick-and-place task. The action segmentation system achieves an average accuracy of 85.3%, with a processing time of 44.5 ms per output, demonstrating its ability to accurately and swiftly recognize actions. A subject experiment comparing task completion time and mental workload with and without assistance shows significant reductions in both metrics, suggesting its effectiveness in reducing operational difficulty and cognitive workload. In conclusion, our shared control system recognizes the surgeon's actions and autonomously controls forceps orientation, enhancing operational efficiency and easing cognitive demands in robotic surgery.
|
|
13:30-15:00, Paper TuBL-EX.14 | Add to My Program |
TripletLoc: One-Shot Global Localization Using Semantic Triplet in Large-Scale Urban Environment |
|
Ma, Weixin | The Hong Kong Polytechnic University |
Sun, Yuxiang | City University of Hong Kong |
Yin, Huan | Hong Kong University of Science and Technology |
Su, Zhongqing | The Hong Kong Polytechnic University |
Keywords: Localization, Range Sensing, SLAM
Abstract: This study introduces a framework, TripletLoc, for fast and robust registration of a single LiDAR scan globally to a large-scale reference map. In contrast to the conventional method using place recognition and point cloud registration, TripletLoc generates correspondences between lightweight semantics in environments directly, which is close to how humans perceive the world. To achieve this, instances from the single query scan and the large-scale reference map are used to construct two semantic graphs, respectively. A novel semantic triplet-based histogram descriptor is proposed to match instances in the query scan to instances in the reference map. Then, graph-theoretic outlier pruning is employed to obtain inlier correspondences from the raw instance-to-instance correspondences for robust 6-DoF pose estimation. We evaluate our pipeline extensively on a public large-scale dataset, HeliPR, which covers diverse and complex scenarios in urban environments. Experimental results demonstrate that our method can achieve fast and robust global localization under diverse and challenging environments, with high memory efficiency.
|
|
13:30-15:00, Paper TuBL-EX.15 | Add to My Program |
From '鏡花水月' (Mirror Flower, Water Moon) to Visual Contrastive Prospective Learning for UAV Indoor Autonomous Navigation |
|
Chang, Yingxiu | University of Hull |
Cheng, Yongqiang | University of Sunderland |
Murray, John Christopher | University of Sunderland |
Khalid, Muhammad | University of Hull |
Manzoor, Umar | University of Sunderland |
Keywords: Aerial Systems: Perception and Autonomy, AI-Based Methods, Representation Learning
Abstract: '鏡花水月' (Flower in the Mirror, Moon in the Water) is metaphorically used to depict something that can be seen but is untouchable (illusions of the real world). The mirror and water surfaces can be regarded as a latent space that consists of representations generated from reality and connected towards illusions. Assuming there is a mirror that can foresee the future, the mirror surface (i.e., latent space) essentially requires a representation generated from the latest observations (i.e., hallucinating future representations), which is subsequently decoded to hallucinate future scenarios. Hallucinating future representations (i.e., prospective representation learning) has demonstrated better action prediction for robot motion and path planning. Therefore, this poster focuses on obtaining prospective-aware representations from the latest sequential images by leveraging the advantages of contrastive learning to benefit the performance of vision-based mapless UAV indoor navigation. The model and video are available at: https://github.com/Yingxiu-Chang/MulSCPL.
|
|
13:30-15:00, Paper TuBL-EX.16 | Add to My Program |
Inchworm-Like Biomimetic Magnetic-Driven Robotic Shell for Capsule Endoscope in Intestinal Tract |
|
Yu, Xinkai | Harbin Institute of Technology (Shenzhen) |
Wang, Jiaole | Harbin Institute of Technology, Shenzhen |
Su, Jingran | Department of Gastroenterology, Qilu Hospital of Shandong University |
Song, Shuang | Harbin Institute of Technology (Shenzhen) |
Keywords: Medical Robots and Systems, Mechanism Design, Biologically-Inspired Robots
Abstract: Wireless capsule endoscopy has become a widely utilized tool for diagnosing intestinal diseases, yet its passive movement alongside intestinal peristalsis limits its effectiveness. To overcome this limitation, we propose an innovative solution in the form of an inchworm-like biomimetic magnetic-driven robotic shell tailored for capsule endoscopy within the intestinal tract. Our robotic shell employs a magnetic torsion spring (MTS) mechanism to enable extension and contraction motions under the influence of an external magnetic field. Furthermore, flexible bristles on its surface facilitate inchworm-like locomotion through a differential friction effect (DFE). A thorough analysis of the forces and torques involved in both the MTS-driven and crawling locomotion processes has been conducted to refine design and control strategies. We have fabricated a prototype of the robotic shell measuring 16mm in diameter and 31.3mm in length, integrating a commercial capsule endoscope for experimental validation. Rigorous testing across various environments, including phantoms and in-vitro intestine scenarios, demonstrates the active locomotion and crawling capabilities of the robotic shell. Experimental results indicate its effectiveness in advancing within the porcine intestine, achieving an average speed of 3.00mm/sec.
|
|
13:30-15:00, Paper TuBL-EX.17 | Add to My Program |
Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions |
|
Korekata, Ryosuke | Keio University |
Kaneda, Kanta | Keio University |
Nagashima, Shunya | Keio University |
Imai, Yuto | Keio University |
Sugiura, Komei | Keio University |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, AI-Enabled Robotics
Abstract: Our objective is to develop a domestic service robot (DSR) capable of following open-vocabulary instructions to transport everyday objects to the designated pieces of furniture. We specifically focus on a method that retrieves images of target objects and receptacles from pre-exploration images. Our proposed approach allows for retrieving images of both target objects and receptacles using multimodal foundation models. To validate our approach, we have developed a new dataset comprising real-world images from diverse building environments and instructions obtained from crowdsourcing. The results indicate that our method outperforms existing approaches in standard image retrieval metrics. Moreover, we showcase the effectiveness of our method on a standard DSR platform, achieving an 82% task success rate despite the zero-shot transfer setting.
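The retrieval step described above reduces to ranking candidate images by the similarity between their embeddings and the instruction embedding. A cosine-similarity sketch with toy vectors follows; a real system would use multimodal foundation-model features, and the vectors below are invented for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def rank_images(instruction_emb, image_embs):
    # Indices of candidate images, best match first.
    return sorted(range(len(image_embs)),
                  key=lambda i: cosine(instruction_emb, image_embs[i]),
                  reverse=True)

instr = [1.0, 0.0, 0.5]                      # embedded instruction (toy)
images = [[0.0, 1.0, 0.0],                   # pre-exploration image embeddings
          [0.9, 0.1, 0.4],
          [0.5, 0.5, 0.5]]
ranking = rank_images(instr, images)
```

Running the same ranking twice, once over target-object candidates and once over receptacle candidates, yields the two retrieval lists the abstract describes.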
|
|
13:30-15:00, Paper TuBL-EX.18 | Add to My Program |
Bionic Hierarchical-Embodiment Design for Soft Robot Actuation-Perception Synergy |
|
Xu, Zhidong | HIT |
Shi, Peipei | State Key Laboratory of Robotics and Systems, Harbin Institute O |
Cao, Liyong | HIT |
Yan, Jihong | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Actuation-sensing integration is one of the keys to performance improvement for soft robots. However, the highly nonlinear electro-mechanical coupling between actuators and sensors usually compromises their function after integration. Inspired by the hierarchical-embodiment structures of octopus tentacles, an actuation-perception synergy design strategy is proposed. The soft module with three preprogrammed shapes presents high-sensitivity (GF = 17.3) proprioception and sequentially variable stiffness (5.7×) without compromising the original compliance. The sensitivity of the proprioceptors can be accurately predicted and regulated by optimizing their topological structures with a proposed numerical design method. With the help of actuation-sensing functional synergy, a soft crawler can creep through a narrow space, and a gripper can successfully recognize randomly placed objects with 1 mm resolution.
|
|
13:30-15:00, Paper TuBL-EX.19 | Add to My Program |
CAD-Informed Uncertainty-Aware Robotic Assembly Sequences |
|
Kiyokawa, Takuya | Osaka University |
Rodriguez Brena, Ismael Valentin | German Aerospace Center (DLR) |
Nottensteiner, Korbinian | German Aerospace Center (DLR) |
Lehner, Peter | German Aerospace Center (DLR) |
Eiband, Thomas | German Aerospace Center (DLR) |
Roa, Maximo A. | German Aerospace Center (DLR) |
Harada, Kensuke | Osaka University |
Keywords: Assembly, Planning, Scheduling and Coordination, Planning under Uncertainty
Abstract: This study addresses a multi-objective optimization problem in the planning of uncertainty-aware sequence and motion for mechanical products with intricate structures and numerous contact areas. The proposed pipeline involves planning several elements, including assembly order of parts, object placement pose, grasp, and arm trajectory. To generate an optimized sequence and motion that satisfies multiple conditions under mandatory requirements, we use a multi-objective optimization algorithm inspired by Non-Dominated Sorting Genetic Algorithm III (NSGA-III), along with contact-rich robotic assembly-oriented constraints and objective functions. The proposed pipeline takes as input the CAD models of robot hardware, workspace, and assembled parts, conducts 3D geometrical and physical simulations of assembly motions, and then optimizes the assembly plan, including part order, object placement pose, state transition, grasp, and trajectory for the real robot to execute. The key component of the proposed pipeline is the uncertainty-aware ConCERRT-based state transition planner. Our experiments on assembly planning for a chainsaw product demonstrated that the proposed method can generate constraint-satisfied assembly plans with a 100% success rate while lowering uncertainty in simulations.
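NSGA-III, like its predecessors, is built on non-dominated sorting: a plan survives into the first front if no other plan beats it in every objective. A minimal sketch for a minimization problem over toy objective vectors (not the paper's actual assembly objectives):

```python
def dominates(a, b):
    # Minimization: a dominates b if it is no worse in every objective
    # and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(points):
    # The non-dominated (Pareto) front: points no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy objective vectors, e.g., (execution time, residual uncertainty).
plans = [(1, 5), (2, 2), (5, 1), (4, 4), (3, 3)]
front = first_front(plans)
```

NSGA-III then selects within fronts using reference directions to preserve diversity across many objectives; that step is omitted here for brevity.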
|
|
13:30-15:00, Paper TuBL-EX.20 | Add to My Program |
Insect-Inspired Perception and Navigation Systems |
|
Ye, Lingjian | Chinese Academy of Sciences |
Zhou, Yimin | Chinese Academy of Sciences |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: Biomimetics, Perception-Action Coupling, Vision-Based Navigation
Abstract: For the obstacle avoidance and navigation problems of robots, a central tension lies between limited computational resources and the need for efficient behavior, which motivates the exploration of biological responsive mechanisms, i.e., simple insect neural structures. Insects demonstrate remarkable abilities in interacting with dynamic cluttered scenes, such as collision avoidance and navigation. Inspired by the visual intelligence of insects, we study the environmental adaptability of Lobula Giant Movement Detector (LGMD) and Direction Selective Neuron (DSN) perception models based on feedforward inhibition and feedback membrane-potential-gradient perception gain mechanisms. Subsequently, these models are combined with a biomimetic navigation model based on the PI mechanism to achieve rapid obstacle avoidance and real-time navigation in unknown environments. The effectiveness and efficiency of the biomimetic perception model are validated in real-world scenarios, while the effectiveness of the fused model is tested in simulated scenarios. Future improvements will involve exploring unified control mechanisms for biological perception and motion decision-making, or alterations in the fusion of different strategies.
|
|
13:30-15:00, Paper TuBL-EX.21 | Add to My Program |
Cooperative Control of Two Magnetically Driven Microrobots for Automated Assembly |
|
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Xu, Sheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Cooperating Robots
Abstract: In this work, we propose a novel cooperative control strategy for two magnetic millirobots, based on their velocity difference, for the automatic manipulation and assembly of objects. The velocity difference between the two millirobots is modeled and controlled via the inclination angle of the external oscillating magnetic field, allowing the pair to cooperatively transport and steer arbitrarily shaped objects in contact. Controlling the positions and formations of two millirobots simultaneously in the same magnetic field makes the system underactuated, because there are more states than control inputs. A cooperative control method for the two-millirobot team, inspired by differential wheels, is proposed to address this underactuation, enabling closed-loop control of the two millirobots for collaborative object transport. Considering the efficiency and collision avoidance requirements of high-precision assembly of multiple objects, an optimal task allocation method based on the shortest total transport path and an improved real-time sampling-based path planning method are proposed. Experiments demonstrated that the millirobot team could cooperatively transport objects 4 times heavier and 26 times larger than a single robot can handle, under closed-loop path-following control. Furthermore, the millirobot team could autonomously assemble 3 strip objects into the desired shape with a position error of less than 2.16 mm and an orientation error of less than 5.07 degrees.
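The differential-wheel analogy can be made concrete with the standard differential-drive relations, where the mean speed advances the shared object and the speed difference rotates it. This is a generic kinematic sketch under assumed quantities (`v1`, `v2`, separation `d`); the paper's actual controller maps these speeds through the oscillating-field inclination angle, which is not reproduced here.

```python
def object_twist(v1: float, v2: float, d: float):
    """Differential-wheel analogy for two millirobots pushing a shared object.

    v1, v2: speeds of the two robots (m/s); d: their separation (m).
    Returns (linear velocity of the object center, angular rate about it).
    """
    v = 0.5 * (v1 + v2)    # the object center advances at the mean speed
    omega = (v2 - v1) / d  # the speed difference rotates the object
    return v, omega

# Robots at 2 mm/s and 4 mm/s, spaced 10 mm apart.
v, omega = object_twist(0.002, 0.004, 0.01)
```

Driving both robots at equal speed translates the object; commanding a speed difference steers it, which is the workaround for underactuation that the abstract describes.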
|
|
13:30-15:00, Paper TuBL-EX.22 | Add to My Program |
Magnetic Small-Scale Fish-Like Robot Motion Control by Broad Learning System for Obstacle Avoidance |
|
Xu, Sheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Learning from Demonstration, Automation at Micro-Nano Scales, Learning from Experience
Abstract: A new control policy using the broad learning system (BLS) is proposed for robot point-reaching motion control. A sample application, steering a magnetic small-scale robot past obstacles, is designed. The proposed learning method avoids detailed mathematical modeling of the dynamic system. The parameter constraints of the BLS-based controller are derived using Lyapunov theory, and the final motion controller is obtained through a constrained training process. Finally, the effectiveness of the proposed method is demonstrated by the convergence of the artificial magnetic fish's motion to the targeted area while successfully avoiding obstacles.
|
|
13:30-15:00, Paper TuBL-EX.23 | Add to My Program |
Tension Maintenance Mechanism for Robust Control of Twisted String Actuation-Based Hyper-Redundant Manipulator |
|
Cho, Minjae | KAIST |
Yi, Yesung | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Redundant Robots
Abstract: Hyper-redundant manipulators have been developed for hazardous-environment exploration due to their flexibility and high agility in the workspace. In this research, we designed a hyper-redundant manipulator by integrating Twisted String Actuators (TSAs) and Rolling Contact Joints (RCJs) to overcome the limitations of traditional cable-driven systems, such as difficulties with long-distance power transmission, and to achieve high payload capability with a compact design. To prevent instantaneous tension loss due to slack, and to maintain the contraction ratio of the TSA with respect to motor rotation for robust control of the manipulator, we proposed a tension maintenance mechanism using compression springs at the distal end of the manipulator. Additionally, to reduce losses from string contact friction, spring sheaths were inserted along the joint holes. Our approach enhances the repeatability and position controllability of the manipulator. We noted a 57.7% reduction of error in repeatability tests, along with 35.9% and 38.8% improvements in piecewise position control accuracy and precision, respectively, compared to a conventional manipulator, leading to enhanced controllability. We also experimentally verified that the proposed manipulator can maintain its trajectory with a variance of less than 2.83% under payloads up to 1600 g. Overall, our manipulator has the potential to expand the range of exploration environments in which robots can be used by simultaneously demonstrating large payload capacity and controllability.
|
|
13:30-15:00, Paper TuBL-EX.24 | Add to My Program |
An Eight-Neuron Network for Quadruped Locomotion with Hip-Knee Joint Control |
|
Liu, Xiyan | Zhejiang University |
Keywords: Legged Robots, Biomimetics, Motion Control
Abstract: The gait generator, capable of producing rhythmic signals for coordinating multiple joints, is crucial for quadruped robot locomotion control. In biology, its counterpart is the Central Pattern Generator, a small neural network of interacting neurons. Inspired by this architecture, researchers have designed artificial neural networks composed of simulated neurons or oscillator equations. However, existing designs often overlook the relationship between the spatiotemporal symmetry of the signals and the structural symmetry of the network, which leads to fewer gait patterns than their biological counterparts produce. Additionally, the architectures are relatively simple, limiting the controllable degrees of freedom. Lastly, neurons are modeled as oscillators, so gait transitions require altering the coupling relationships between oscillators rather than adjusting neuron stimulation. In this paper, we utilize symmetry theory to design an eight-neuron network composed of Stein neuron models, which achieves five gaits and coordinated control of the hip and knee joints. We validate its signal stability through numerical simulations, revealing various results and patterns encountered when implementing gait transitions via neuron stimulation. Based on these, we design four gait transition strategies. Using a commercial quadruped robot model, we demonstrate the feasibility of this network by implementing motion control and gait transitions through a simple mapping method.
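The idea of a small coupled network locking onto a gait pattern can be sketched with Kuramoto-style phase oscillators, a common stand-in for neuron-based CPG models. The Stein neuron dynamics, the eight-neuron topology, and hip-knee coordination of the paper are not reproduced; the four-oscillator trot example and all gains below are illustrative assumptions.

```python
import math

def cpg_step(phases, omega, K, desired, dt):
    """One Euler step of phase oscillators coupled to lock a target gait.

    Each oscillator is pulled toward the phase offsets of the desired gait:
    at the locked state, phases[j] - phases[i] = desired[j] - desired[i].
    """
    n = len(phases)
    out = []
    for i in range(n):
        coupling = sum(
            math.sin(phases[j] - phases[i] - (desired[j] - desired[i]))
            for j in range(n))
        out.append((phases[i] + dt * (omega + K * coupling)) % (2 * math.pi))
    return out

# Trot: diagonal legs in phase, lateral legs half a cycle apart.
trot = [0.0, math.pi, math.pi, 0.0]
phases = [0.1, 0.5, 2.0, 4.0]          # arbitrary initial phases
for _ in range(4000):                  # 20 s at dt = 5 ms
    phases = cpg_step(phases, omega=2 * math.pi, K=2.0, desired=trot, dt=0.005)
```

Switching the `desired` offset vector mid-run changes the gait, a coupling-based transition of the kind the abstract contrasts with stimulation-based transitions.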
|
|
13:30-15:00, Paper TuBL-EX.25 | Add to My Program |
Design, Modelling and Control of a Soft Reconfigurable Gripper Using Reinforcement Learning |
|
Vatsal, Vighnesh | Tata Consultancy Services |
George, Nijil | TCS Research & Innovation |
Lima, Rolif | TCS Research |
Das, Kaushik | TATA Consultancy Service |
Keywords: Grasping, Soft Robot Applications, Deep Learning in Grasping and Manipulation
Abstract: Grasping and manipulation are fundamental capabilities for robots. Conventional approaches, such as rigid parallel-jaw or vacuum-suction grippers, are unable to handle the full spectrum of objects, especially fragile or deformable ones. Soft robotic grasping has emerged as a possible solution for these types of objects in applications such as agriculture and retail stores. Leveraging recent advances in materials, actuators, and simulation tools, we have developed a cable-driven soft robotic gripper that can actuate each finger individually and dynamically change its finger configuration to adapt to a target object. In order to achieve stable grasps for a variety of objects, we modelled the gripper in a PyBullet simulation environment, ensuring fidelity of the compliant elements with the physical system through a vision-based system identification procedure. We then adapted an existing vision-based grasp planner and applied it in a reinforcement learning framework by expanding the state space to include the deformations and reconfigurations of the fingers. This allows for the exploration and discovery of grasp synergies that can handle irregularly-shaped objects, while ensuring safety through the compliant nature of the fingers. After training this framework in simulation, we aim to deploy it in the real world with closed-loop control using haptic sensors and to test it on fragile objects in a retail setting.
|
|
13:30-15:00, Paper TuBL-EX.26 | Add to My Program |
C-Arm Unleashed: Intuitive Inter-Operative Positioning of C-Arms Using Wearable Gesture Detection |
|
Ouyang, Jingyu | FAU Erlangen-Nuernberg |
Egle, Fabio Andre | Assisitive Intelligent Robotics Lab, Department AIBE, FAU Erlang |
Igney, Claudia | Siem |
Mutzke, Thomas | Siemens Healthineers |
Dahmani, Chiheb | Siemens Healthineers |
Castellini, Claudio | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Thuerauf, Sabine | Friedrich-Alexander-University Erlangen-Nuremberg |
Keywords: Neurorobotics, Intention Recognition, Medical Robots and Systems
Abstract: Imaging equipment used in surgical and interventional procedures requires accurate positioning and often anticipative or very reactive intra-operative adjustment to ensure optimal clinical outcomes. The common way to control such imaging systems is via joysticks and buttons operated by the physicians themselves or, more frequently, by their radiographers (or radiologic technologists). However, in the latter case, communication between both can be easily affected by the loud and distracting environment of the operating room, the lack of experience of the radiographers, or the stressful emergency character of certain situations. Miscommunication and misunderstandings concerning the positioning of the imaging equipment can lead to suboptimal image guidance, longer procedure times, and even clinical mistakes. Different alternative concepts, where physicians are able to control the imaging systems by themselves and without assistance, have already been tested and reported. However, these approaches, including voice control, were neither robust nor safe enough to be adopted into clinical routine. We introduce a new wearable EMG- and IMU-based intuitive control concept for imaging systems, especially mobile surgical C-arms, to address the mentioned shortcomings. The developed control algorithm was evaluated in a TAC test with a simulation. Apart from the TAC test, we also assessed workload, usability, likeability, and perceived safety.
|
|
TuCA1-CC Award Session, CC-Main Hall |
Add to My Program |
Medical Robotics |
|
|
Chair: Lueth, Tim C. | Technical University of Munich |
Co-Chair: Althoefer, Kaspar | Queen Mary University of London |
|
16:30-18:00, Paper TuCA1-CC.1 | Add to My Program |
Exoskeleton-Mediated Physical Human-Human Interaction for a Sit-To-Stand Rehabilitation Task |
|
Vianello, Lorenzo | Shirley Ryan Ability Lab |
Kucuktabak, Emek Baris | Northwestern University, Shirley Ryan Ability Lab |
Short, Matthew | Northwestern University, Shirley Ryan AbilityLab |
Lhoste, Clément | Northwestern University |
Amato, Lorenzo | Scuola Superiore Sant'Anna |
Lynch, Kevin | Northwestern University |
Pons, Jose L. | Shirley Ryan AbilityLab |
Keywords: Rehabilitation Robotics, Physical Human-Robot Interaction, Prosthetics and Exoskeletons
Abstract: Sit-to-Stand (StS) is a fundamental daily activity that can be challenging for stroke survivors due to strength, motor control, and proprioception deficits in their lower limbs. Existing therapies involve repetitive StS exercises, but these can be physically demanding for therapists, while assistive devices may limit patient participation and hinder motor learning. To address these challenges, this work proposes the use of two lower-limb exoskeletons to mediate physical interaction between therapists and patients during a StS rehabilitative task. This approach offers several advantages, including improved therapist-patient interaction, safety enforcement, and performance quantification. The whole-body control of the two exoskeletons transmits online feedback between the two users while simultaneously assisting movement and ensuring balance, thereby helping subjects with greater difficulty. In this study we present the architecture of the framework and discuss key technical choices made in its design.
|
|
16:30-18:00, Paper TuCA1-CC.2 | Add to My Program |
Intraoperatively Iterative Hough Transform Based In-Plane Hybrid Control of Arterial Robotic Ultrasound for Magnetic Catheterization |
|
Li, Zhengyang | University of Macau |
Yeerbulati, Magejiang | University of Macau |
Xu, Qingsong | University of Macau |
Keywords: Compliant Joints and Mechanisms, Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems
Abstract: This paper presents an intraoperatively iterative Hough transform (IHT) based in-plane hybrid control of extracorporeal ultrasound (US) guided magnetic catheterization for arterial intervention. One unique aspect is that both control and tracking of the arterial robotic ultrasound end-effector are implemented to improve performance. Firstly, the magnetic catheter model and the hybrid visual/force servoing control scheme of the extracorporeal ultrasound-integrated tracking arm (EUTA) are derived based on the interaction Jacobian matrix and impedance modeling. Meanwhile, we implement a method for tracking the in-plane catheter tip in ultrasound and detecting vascular boundaries, utilizing an intensity-level iterative Hough transform with Iterative End-Point Fitting (IEPF). The effectiveness of the proposed control and tracking method has been verified through in vitro experimental studies of catheter steering in a soft tissue-imitating phantom. Results show that an average steering error of 0.56 mm and a signal-to-noise ratio (SNR) of 12.2 are obtained for the ultrasound imaging at high synchronization, along with a low target loss rate (15.8%) and constant-force tracking (2.50±1.02 N).
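The Hough transform at the heart of the method votes each edge pixel into a (ρ, θ) accumulator via ρ = x·cos θ + y·sin θ and reads off peaks as lines (e.g., vessel boundaries). A minimal voting sketch on a few hand-picked pixels follows; the data are toy coordinates, not ultrasound intensities, and the paper's intensity-level, iterative variant would weight votes by pixel intensity rather than counting.

```python
import math
from collections import Counter

def hough_peak(points, theta_steps=180):
    """Vote each point into a discretized (rho, theta) accumulator and
    return the strongest line: rho = x*cos(theta) + y*sin(theta)."""
    acc = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] += 1
    return acc.most_common(1)[0]

# Four pixels on the vertical line x = 2 (a stand-in for a vessel wall).
(rho, t), votes = hough_peak([(2, 0), (2, 1), (2, 2), (2, 3)])
```

All four pixels agree on the cell ρ = 2, θ = 0, so it collects the full four votes and is returned as the detected line.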
|
|
16:30-18:00, Paper TuCA1-CC.3 | Add to My Program |
Efficient Model Learning and Adaptive Tracking Control of Magnetic Micro-Robots for Non-Contact Manipulation |
|
Jia, Yongyi | Tsinghua University |
Miao, Shu | Tsinghua University |
Zhou, Junjian | Shenyang Institute of Automation, Chinese Academy of Sciences |
Jiao, Niandong | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Li, Xiang | Tsinghua University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Magnetic microrobots can be navigated by an external magnetic field to autonomously move within living organisms with complex and unstructured environments. Potential applications include drug delivery, diagnostics, and therapeutic interventions. Existing techniques commonly impart magnetic properties to the target object, or drive the robot to contact and then manipulate the object, both of which can induce physical damage. This paper considers a non-contact formulation, where the robot spins to generate a repulsive field that pushes the object without physical contact. Under such a formulation, the main challenge is that the motion model between the magnetic-field input and the output velocity of the target object is commonly unknown and difficult to analyze. To address this, this paper proposes a data-driven solution. A neural network is constructed to efficiently estimate the motion model. Then, an approximate model-based optimal control scheme is developed to push the object to track a time-varying trajectory while maintaining non-contact operation through distance constraints. Furthermore, a straightforward planner is introduced to assess the adaptability of non-contact manipulation in a cluttered unstructured environment. Experimental results are presented to show the tracking and navigation performance of the proposed scheme.
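The data-driven pipeline, learning the input-to-velocity map from interaction data and then inverting it for control, can be sketched with a lookup-table surrogate in place of the paper's neural network. The actuation map, sweep range, and target velocity below are invented for illustration.

```python
def collect_data(actuate, inputs):
    # Excite the system over a sweep of field inputs, recording the
    # resulting object velocities (the "motion model" training data).
    return [(u, actuate(u)) for u in inputs]

def control_for(data, v_des):
    # Invert the learned map: pick the input whose recorded velocity is
    # closest to the desired one (a nearest-sample stand-in for inverting
    # a trained network inside an optimal controller).
    return min(data, key=lambda sample: abs(sample[1] - v_des))[0]

# Hypothetical unknown map from field input to object velocity.
def true_map(u):
    return 0.8 * u + 0.1 * u ** 2

inputs = [i / 1000.0 - 1.0 for i in range(2001)]   # sweep u over [-1, 1]
data = collect_data(true_map, inputs)
u_cmd = control_for(data, 0.5)   # input expected to push the object at 0.5
```

Repeating the inversion at every control step against a time-varying reference velocity yields a crude trajectory-tracking loop; the paper's scheme additionally enforces the non-contact distance constraints.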
|
|
16:30-18:00, Paper TuCA1-CC.4 | Add to My Program |
Design and Implementation of a Robotized Hand-Held Dissector for Endoscopic Pulmonary Endarterectomy |
|
Zhu, Runfeng | The Hong Kong Polytechnic University |
Hou, Xilong | Hong Kong Institute of Science and Innovation Chinese Academy Of |
Huang, Wei | CAIR |
Du, Lei | Sichuan University |
Wu, Zhong | West China Hospital, Sichuan University |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Chu, Henry | The Hong Kong Polytechnic University |
Zhao, Qing xiang | Hong Kong Institute of Science & Innovation, Centre for Artifici |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Mechanism Design, Force Control
Abstract: Pulmonary endarterectomy for severe chronic disease requires a dissector to delicately remove proliferative intima located deep within the pulmonary artery. This work proposes a novel endoscopic robotized steerable dissector for this surgery, enabling easier access to curved, deep artery branches. The handheld surgical dissector also provides suction and visualization for surgeons to enhance effectiveness. The steerable section is a cable-driven hinged structure; through an antagonistic mechanism regulating the cable tension, the overall stiffness is adjusted to adapt to various surroundings. The mapping between actuation space and shape configuration, together with a tip force estimation model, is established for a closed-loop control scheme, achieving adaptive positioning and safe surgery. Experiments first demonstrate the feasibility of the proposed models, and ex vivo trials validate the usability and effectiveness of the robotized dissector.
|
|
16:30-18:00, Paper TuCA1-CC.5 | Add to My Program |
Colibri5: Real-Time Monocular 5-DoF Trocar Pose Tracking for Robot-Assisted Vitreoretinal Surgery |
|
Dehghani, Shervin | TUM |
Sommersperger, Michael | Technical University of Munich |
Saleh, Mahdi | Technical University Munich |
Alikhani, Alireza | Augen Klinik Und Poliklinik, Klinikum Rechts Der Isar Der Techn |
Busam, Benjamin | Technical University of Munich |
Gehlbach, Peter | Johns Hopkins Medical Institute |
Iordachita, Ioan Iulian | Johns Hopkins University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Computer Vision for Medical Robotics, Vision-Based Navigation, Visual Tracking
Abstract: Retinal surgery is a complex medical procedure that requires high-precision dexterity to perform delicate instrument maneuvers with sub-millimeter accuracy. Minimizing manual tremor and achieving precise, repeatable execution of surgical tasks have motivated the development of robotic platforms to overcome the limitations of manual surgery. However, specific tasks, such as instrument insertion through the trocar, are more challenging in robotic surgery than in conventional manual procedures, since the robot control is often optimized for navigation inside the eye. This challenges the integration of robotic systems, creating a high cognitive load on the operator and prolonging the surgery time. Moreover, misalignment of the robot's remote center of motion (RCM) and the trocar position during the procedure can lead to excessive forces between the instrument and the trocar, potentially causing patient trauma. Precise and rapid localization of the trocars enables the automation of the insertion procedure and dynamic compensation of eye motion. In this work, we present a real-time marker-less method for 3D pose tracking of the trocar, achieved with only a single monocular camera. Our experiments show promising results towards real-time trocar pose estimation and tracking, achieving an average error of 3 degrees in trocar orientation estimation at an average processing rate of 15 fps. This could serve as a foundation to improve the automation, integration, and efficiency of robotic systems for retinal surgery. The dataset created for this work is made publicly available.
|
|
16:30-18:00, Paper TuCA1-CC.6 | Add to My Program |
Hybrid Volitional Control of a Robotic Transtibial Prosthesis Using a Phase Variable Impedance Controller |
|
Posh, Ryan | University of Notre Dame |
Tittle, Jonathan Allen | University of Notre Dame |
Kelly, David | University of Notre Dame |
Schmiedeler, James | University of Notre Dame |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Prosthetics and Exoskeletons
Abstract: For robotic transtibial prosthesis control, the global tibia kinematics can be used to monitor gait cycle progression and command smooth and continuous actuation. In this work, these global tibia kinematics define a phase variable impedance controller (PVIC), which is implemented as the non-volitional base controller within a hybrid volitional control framework (PVI-HVC). The gait progression estimation and biomechanic performance of one able-bodied individual walking on a robotic ankle prosthesis via a bypass adapter are compared for three control schemes: a benchmark passive controller, PVIC, and PVI-HVC. The different actuation of each had a direct effect on the global tibia kinematics, but the average deviation between the estimated and ground-truth gait percentages was 1.6%, 1.8%, and 2.1%, respectively, for each controller. Both PVIC and PVI-HVC produced good agreement with able-bodied kinematic and kinetic references. As designed, PVI-HVC results were similar to those of PVIC when the user exerted low volitional intent, but yielded higher peak plantarflexion, peak torque, and peak power when the user commanded high volitional input in late stance. This additional torque and power also allowed the user to volitionally and continuously achieve activities beyond level walking, such as ascending ramps, avoiding obstacles, standing on tip-toes, and tapping the foot. In this way, PVI-HVC offers the kinetic and kinematic performance of PVIC during level-ground walking, along with the freedom to volitionally pursue alternative activities.
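As a generic illustration of the phase-variable-impedance idea, one can index stiffness, damping, and equilibrium angle by a phase estimated from the tibia state. The polar-angle phase construction is a common choice in the phase-variable literature, and every gain and schedule below is an assumption, not the authors' identified parameters.

```python
import math

def gait_phase(tibia_angle: float, tibia_rate: float, c: float = 0.1) -> float:
    # Monotonic phase estimate in [0, 1): polar angle of the (angle,
    # scaled rate) point, a common phase-variable construction.
    return math.atan2(-c * tibia_rate, tibia_angle) % (2 * math.pi) / (2 * math.pi)

def impedance_torque(theta: float, theta_dot: float, phase: float) -> float:
    # Phase-indexed impedance law:
    #   tau = k(phi) * (theta_eq(phi) - theta) - b * theta_dot
    k = 3.0 + 2.0 * math.sin(2 * math.pi * phase)    # stiffness schedule (illustrative)
    b = 0.1                                          # damping (illustrative)
    theta_eq = 0.2 * math.cos(2 * math.pi * phase)   # equilibrium-angle schedule
    return k * (theta_eq - theta) - b * theta_dot
```

Because torque is a continuous function of the phase variable rather than of discrete gait events, actuation stays smooth across the stride, which is the property the abstract attributes to PVIC.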
|
|
TuCA2-CC Award Session, CC-301 |
Add to My Program |
Multi-Robot Systems |
|
|
Chair: Kelly, Jonathan | University of Toronto |
Co-Chair: Sabattini, Lorenzo | University of Modena and Reggio Emilia |
|
16:30-18:00, Paper TuCA2-CC.1 | Add to My Program |
Do We Run Large-Scale Multi-Robot Systems on the Edge? More Evidence for Two-Phase Performance in System Size Scaling |
|
Kuckling, Jonas | University of Konstanz |
Luckey, Robin | Institute of Computer Engineering, University of Lübeck |
Avrutin, Viktor | Institute for Systems Theory and Automatic Control, University O |
Vardy, Andrew | Memorial University of Newfoundland |
Reina, Andreagiovanni | Université Libre De Bruxelles |
Hamann, Heiko | University of Konstanz |
Keywords: Swarm Robotics, Multi-Robot Systems
Abstract: With increasing numbers of mobile robots arriving in real-world applications, more robots coexist in the same space, interact, and possibly collaborate. Methods to provide such systems with system-size scalability are known, for example, from swarm robotics. Example strategies are self-organizing behavior, a strictly decentralized approach, and limiting robot-robot communication. Despite applying such strategies, any multi-robot system breaks down above a certain critical system size (i.e., number of robots), as too many robots share a resource (e.g., space, a communication channel). We provide additional evidence, based on simulations, that at these critical system sizes the system performance separates into two phases: nearly optimal and minimal performance. We speculate that in real-world applications configured for optimal system size, the supposedly high-performing system may actually live on borrowed time, as it is on a transient to breakdown. We provide two modeling options (based on queueing theory and a population model) that may help to support this reasoning.
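The queueing-theory modeling option the authors mention can be illustrated with the textbook M/M/1 mean sojourn time, W = 1/(μ − λ): as the number of robots n pushes the aggregate arrival rate λ = n·λ₀ toward the service rate μ of a shared resource, waiting time diverges, producing a sharp two-phase transition at a critical system size. The rates below are illustrative, not taken from the paper's simulations.

```python
def mm1_sojourn(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system, W = 1 / (mu - lambda).

    Diverges as utilization approaches 1: the shared resource (space,
    communication channel) saturates and performance collapses.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # unstable queue: past the critical system size
    return 1.0 / (service_rate - arrival_rate)

# One shared channel serving 10 requests/s; each robot generates 1 request/s.
waits = {n: mm1_sojourn(n * 1.0, 10.0) for n in (5, 9, 11)}
```

Below saturation (n = 5) the wait is short and grows mildly; just below the critical size (n = 9) it is already five times longer; past it (n = 11) the queue is unstable, mirroring the near-optimal/minimal two-phase picture.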
|
|
16:30-18:00, Paper TuCA2-CC.2 | Add to My Program |
Learning for Dynamic Subteaming and Voluntary Waiting in Heterogeneous Multi-Robot Collaborative Scheduling |
|
Jose, Williard Joshua | University of Massachusetts Amherst |
Zhang, Hao | University of Massachusetts Amherst |
Keywords: Multi-Robot Systems, Imitation Learning
Abstract: Coordinating heterogeneous robots is essential for autonomous multi-robot teaming. To execute a set of dependent tasks as quickly as possible, and to complete tasks that cannot be addressed by individual robots, it is necessary to form subteams that can collaboratively finish the tasks. It is also advantageous for robots to wait for teammates and tasks to become available in order to form better subteams or reduce the overall completion time. To enable both abilities, we introduce a new graph learning approach that formulates heterogeneous collaborative scheduling as a bipartite matching problem that maximizes a reward matrix learned via imitation learning. We design a novel graph attention transformer network (GATN) that represents the problem of collaborative scheduling as a bipartite graph, and integrates both local and global graph information to estimate the reward matrix using graph attention networks and transformers. By relaxing the constraint of one-to-one correspondence in bipartite matching, our approach allows multiple robots to address the same task as a subteam. Our approach also enables voluntary waiting by introducing an idle task that the robots can select to wait. Experimental results have shown that our approach well addresses heterogeneous collaborative scheduling with dynamic subteam formation and voluntary waiting, and outperforms the previous and baseline methods.
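The core matching step can be sketched as maximizing the total entry sum of a reward matrix over robot-task assignments. This toy version searches exhaustively (fine for small teams; a polynomial-time matcher would be used in practice) and models voluntary waiting with an extra idle column; the matrix values are invented, not learned GATN outputs.

```python
from itertools import permutations

def best_assignment(reward):
    """Bipartite matching that maximizes total reward (exhaustive search).

    reward[i][j]: learned score for robot i taking task j.
    Returns (assignment tuple, total reward).
    """
    n = len(reward)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(reward[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm, best_score

# Rows: robots; columns: two real tasks plus an "idle" task for waiting.
R = [[5.0, 1.0, 0.5],
     [2.0, 4.0, 0.5],
     [1.0, 1.0, 3.0]]
match, total = best_assignment(R)   # robot 2 takes the idle column: it waits
```

Relaxing the one-to-one correspondence, as the paper does, would additionally let several robots select the same task column to form a subteam; the sketch keeps the strict matching for brevity.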
|
|
16:30-18:00, Paper TuCA2-CC.3 | Add to My Program |
Asynchronous Distributed Smoothing and Mapping Via On-Manifold Consensus ADMM |
|
McGann, Daniel | Carnegie Mellon University |
Lassak, Kyle | Astrobotic Technology, Inc |
Kaess, Michael | Carnegie Mellon University |
Keywords: Multi-Robot SLAM, SLAM, Distributed Robot Systems
Abstract: In this paper we present a fully distributed, asynchronous, and general purpose optimization algorithm for Consensus Simultaneous Localization and Mapping (CSLAM). Multi-robot teams require that agents have timely and accurate solutions to their state as well as the states of the other robots in the team. To optimize this solution we develop a CSLAM back-end based on Consensus ADMM called MESA (Manifold, Edge-based, Separable ADMM). MESA is fully distributed to tolerate failures of individual robots, asynchronous to tolerate communication delays and outages, and general purpose to handle any CSLAM problem formulation. We demonstrate that MESA exhibits superior convergence rates and accuracy compared to existing state-of-the-art CSLAM back-end optimizers.
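The consensus ADMM pattern underlying a back-end like this can be illustrated on a toy scalar problem. This sketch is not MESA (which operates on manifolds and factor graphs); it only shows the local-update / consensus-update / dual-update loop, with each agent holding a private cost.

```python
import numpy as np

# Toy consensus ADMM: N agents each hold a private value a_i and jointly
# minimize sum_i (x - a_i)^2. The optimum is the mean of the a_i.
a = np.array([1.0, 2.0, 4.0, 9.0])
rho = 1.0
x = np.zeros_like(a)   # local copies, one per agent
u = np.zeros_like(a)   # scaled dual variables
z = 0.0                # consensus variable

for _ in range(100):
    # Local x-updates: closed form for quadratic local costs
    # argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2.
    x = (2.0 * a + rho * (z - u)) / (2.0 + rho)
    # Consensus update and dual ascent.
    z = np.mean(x + u)
    u = u + x - z

print(z)  # converges to a.mean() == 4.0
```

In a distributed deployment each x-update runs on its own robot, and only (x_i + u_i) needs to be communicated, which is what makes the pattern tolerant of per-robot scheduling.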
|
|
16:30-18:00, Paper TuCA2-CC.4 | Add to My Program |
Uncertainty-Bounded Active Monitoring of Unknown Dynamic Targets in Road-Networks with Minimum Fleet |
|
Wang, Shuaikang | Peking University |
Kantaros, Yiannis | Washington University in St. Louis |
Guo, Meng | Peking University |
Keywords: Multi-Robot Systems, Task and Motion Planning, Integrated Planning and Control
Abstract: Fleets of unmanned robots can be beneficial for the long-term monitoring of large areas, e.g., to monitor wild flocks, detect intruders, or perform search and rescue. Monitoring numerous dynamic targets in a collaborative and efficient way is a challenging problem that requires online coordination and information fusion. The majority of existing works either assume a passive all-to-all observation model to minimize the summed uncertainties over all targets by all robots, or optimize over the joint discrete actions while neglecting the dynamic constraints of the robots and the unknown behaviors of the targets. This work proposes an online task and motion coordination algorithm that ensures an explicitly bounded estimation uncertainty for the target states, while minimizing the average number of active robots. The robots have limited-range perception and can actively track only a limited number of targets simultaneously, whose future control decisions are all unknown. The algorithm includes: (i) the assignment of monitoring tasks, modeled as a flexible-size multiple vehicle routing problem with time windows (m-MVRPTW), given the predicted target trajectories with uncertainty measures in the road network; and (ii) nonlinear model predictive control (NMPC) for optimizing the robot trajectories under uncertainty and safety constraints. It is shown that the robots can switch between active and inactive roles dynamically online as required by the unknown monitoring task. The proposed methods are validated via large-scale simulations of up to 100 robots and targets.
|
|
16:30-18:00, Paper TuCA2-CC.5 | Add to My Program |
Observer-Based Distributed MPC for Collaborative Quadrotor-Quadruped Manipulation of a Cable-Towed Load |
|
Xu, Shaohang | Huazhong University of Science and Technology |
Wang, Yi'an | Huazhong University of Science and Technology |
Zhang, Wentao | Huazhong University of Science and Technology |
Ho, Chin Pang | City University of Hong Kong |
Zhu, Lijun | Huazhong University of Science and Technology |
Keywords: Distributed Robot Systems, Swarm Robotics, Legged Robots
Abstract: This paper presents a collaborative quadrotor-quadruped robot system for the manipulation of a cable-towed payload. In particular, we aim to address the challenge posed by the unknown dynamics of the cable-towed payload. To this end, we first propose novel dynamic models for both the quadrotor and the quadruped robot, taking into account the nonlinear robot dynamics and the uncertainties associated with the cable-towed load. Moreover, we design observers for the hybrid interaction between the robots and the payload. Theoretically, the convergence of these observers is analyzed using Lyapunov functions under mild technical assumptions. Finally, we seamlessly integrate the dynamic models and the observers into a distributed Model Predictive Control (MPC) framework with kinematic limitations and collision avoidance constraints. The proposed system is validated through challenging field experiments in indoor and outdoor environments, involving push disturbances, varying and unknown payloads, uneven terrains, etc.
|
|
TuCT1-CC Oral Session, CC-303 |
Add to My Program |
Planning under Uncertainty III |
|
|
Chair: Montijano, Eduardo | Universidad De Zaragoza |
Co-Chair: Ishigami, Genya | Keio University |
|
16:30-18:00, Paper TuCT1-CC.1 | Add to My Program |
Multi-Sample Long Range Path Planning under Sensing Uncertainty for Off-Road Autonomous Driving |
|
Schmittle, Matt | University of Washington |
Baijal, Rohan | University of Washington |
Hou, Brian | University of Washington |
Srinivasa, Siddhartha | University of Washington |
Boots, Byron | University of Washington |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. This makes the planning problem tractable, but the cost of following the planned path in the real world may be different than in the determinized world. This causes collisions if the determinized world optimistically ignores obstacles, or causes unnecessarily long routes if the determinized world pessimistically imagines more obstacles. We aim to be robust to uncertainty over potential worlds while still achieving the efficiency benefits of determinization. We evaluate algorithms for dynamic replanning on a large real-world dataset of challenging long-range planning problems from the DARPA RACER program. Our method, Dynamic Replanning via Evaluating and Aggregating Multiple Samples (DREAMS), outperforms other determinization-based approaches in terms of combined traversal time and collision cost. https://sites.google.com/cs.washington.edu/dreams/
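The contrast between single-world determinization and evaluating over multiple sampled worlds can be sketched in miniature. Everything here is illustrative (the corridor, the costs, and the sampling count are invented): each candidate path is scored by its average cost across worlds sampled from per-cell obstacle probabilities, rather than against one determinized map.

```python
import numpy as np

# Per-cell obstacle probabilities on a tiny 1-D corridor, and two
# candidate paths given as cell-index sequences (illustrative values).
rng = np.random.default_rng(1)
p_obstacle = np.array([0.05, 0.9, 0.05, 0.05, 0.05])
paths = {"through": [0, 1, 2], "around": [0, 3, 4, 2]}

COLLISION_COST = 100.0
STEP_COST = 1.0
K = 200  # number of sampled worlds

def aggregated_cost(path):
    # Sample K binary worlds, then average the path's cost across them.
    worlds = rng.random((K, len(p_obstacle))) < p_obstacle
    costs = []
    for w in worlds:
        c = STEP_COST * len(path)
        if any(w[cell] for cell in path):
            c += COLLISION_COST
        costs.append(c)
    return float(np.mean(costs))

best = min(paths, key=lambda name: aggregated_cost(paths[name]))
print(best)  # the detour avoids the likely obstacle
```

An optimistic determinization (thresholding p_obstacle at 0.95, say) would pick the short "through" path and collide most of the time; the aggregate over samples prefers the slightly longer detour.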
|
|
16:30-18:00, Paper TuCT1-CC.2 | Add to My Program |
Perceptual Factors for Environmental Modeling in Robotic Active Perception |
|
Morilla-Cabello, David | Universidad De Zaragoza |
Westheider, Jonas | University of Bonn |
Popovic, Marija | University of Bonn |
Montijano, Eduardo | Universidad De Zaragoza |
Keywords: Planning under Uncertainty, Reactive and Sensor-Based Planning, Semantic Scene Understanding
Abstract: Accurately assessing the potential value of new sensor observations is a critical aspect of planning for active perception. This task is particularly challenging when reasoning about high-level scene understanding using measurements from vision-based neural networks. Due to appearance-based reasoning, the measurements are susceptible to several environmental effects such as the presence of occluders, variations in lighting conditions, and redundancy of information due to similarity in appearance between nearby viewpoints. To address this, we propose a new active perception framework incorporating an arbitrary number of perceptual effects in planning and fusion. Our method models the correlation with the environment by a set of general functions termed perceptual factors to construct a perceptual map, which quantifies the aggregated influence of the environment on candidate viewpoints. This information is seamlessly incorporated into the planning and fusion processes by adjusting the uncertainty associated with measurements to weigh their contributions. We evaluate our perceptual maps in a simulated environment that reproduces environmental conditions common in robotics applications. Our results show that, by accounting for environmental effects within our perceptual maps, we improve state estimation by selecting better viewpoints and by correctly weighting measurement noise affected by environmental factors. We furthermore deploy our approach on a ground robot to showcase its applicability to real-world active perception missions.
|
|
16:30-18:00, Paper TuCT1-CC.3 | Add to My Program |
Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment |
|
Puthumanaillam, Gokul | University of Illinois Urbana-Champaign |
Liu, Xiangyu | University of Cyprus |
Mehr, Negar | University of California Berkeley |
Ornik, Melkior | University of Illinois Urbana-Champaign |
Keywords: Planning under Uncertainty, Planning, Scheduling and Coordination, Autonomous Agents
Abstract: Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring a partially observable, time-varying environment. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
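The idea of weighted memory for time-varying transition estimates can be sketched with a simple recency weighting. This is an illustrative stand-in for MPSE, not the paper's estimator: each stored transition is discounted by how long ago it was observed, so recent experience dominates the estimate of a drifting transition probability.

```python
from collections import defaultdict

# Recency-weighted estimate of a time-varying transition distribution
# p_t(s' | s, a): an observation from time t carries weight GAMMA**(t_now - t).
GAMMA = 0.9

def estimate(memory, s, a, t_now):
    """memory: list of (t, s, a, s_next) tuples; returns p(s_next | s, a)."""
    weights = defaultdict(float)
    for (t, ms, ma, ms_next) in memory:
        if (ms, ma) == (s, a):
            weights[ms_next] += GAMMA ** (t_now - t)
    total = sum(weights.values())
    return {sn: w / total for sn, w in weights.items()} if total else {}

# The environment drifts: early on (s0, a) led to s1; lately it leads to s2.
memory = [(0, "s0", "a", "s1"), (1, "s0", "a", "s1"),
          (8, "s0", "a", "s2"), (9, "s0", "a", "s2")]
probs = estimate(memory, "s0", "a", t_now=10)
print(probs)  # s2 dominates because its observations are recent
```

An unweighted count-based estimate would give s1 and s2 equal probability here; the recency weighting is what lets the estimator track the drift.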
|
|
16:30-18:00, Paper TuCT1-CC.4 | Add to My Program |
Choosing the Right Tool for the Job: Online Decision Making Over SLAM Algorithms |
|
Nashed, Samer | University of Massachusetts Amherst |
Grupen, Rod | University of Massachusetts |
Zilberstein, Shlomo | University of Massachusetts |
Keywords: Planning under Uncertainty, SLAM, Localization
Abstract: Nearly all state-of-the-art SLAM algorithms are designed to exploit patterns in data from specific sensing modalities, such as time-of-flight and structured light depth sensors, or RGB cameras. This specialization increases localization accuracy in domains where the given modality detects many high-quality features, but comes at the cost of decreasing performance in other, less favorable environments. For robotic systems that may experience a wide variety of sensing conditions, this difficulty in generalization presents a significant challenge. In this paper, we propose running several computationally cheap SLAM front ends in parallel and choosing the most promising feature set online. This problem is similar to the Algorithm Selection Problem (ASP), but has several complicating factors that preclude application of existing methods. We first provide an extension of the ASP formalism that captures the unique challenges in the SLAM setting, and then, based on this formalism, we propose modeling the SLAM ASP as a partially observable Markov decision process (POMDP). Our experiments show that dynamically selecting SLAM front ends, even myopically, improves localization robustness compared to selecting a static front end, and that using a POMDP policy provides even greater improvement.
|
|
16:30-18:00, Paper TuCT1-CC.5 | Add to My Program |
ASPIRe: An Informative Trajectory Planner with Mutual Information Approximation for Target Search and Tracking |
|
Zhou, Kangjie | Peking University |
Wu, Pengying | Peking University |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Gao, Han | Peking University |
Ma, Ji | Peking University |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Liu, Chang | Peking University |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: This paper proposes an informative trajectory planning approach, namely, adaptive particle filter tree with sigma point-based mutual information reward approximation (ASPIRe), for mobile target search and tracking (SAT) in cluttered environments with limited sensing field of view. We develop a novel sigma point-based approximation to accurately estimate mutual information (MI) for general, non-Gaussian distributions utilizing particle representation of the belief state, while simultaneously maintaining high computational efficiency. Building upon the MI approximation, we develop the Adaptive Particle Filter Tree (APFT) approach with MI as the reward, which features belief state tree nodes for informative trajectory planning in continuous state and measurement spaces. An adaptive criterion is proposed in APFT to adjust the planning horizon based on the expected information gain. Simulations and physical experiments demonstrate that ASPIRe achieves real-time computation and outperforms benchmark methods in terms of both search efficiency and estimation accuracy.
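The deterministic sigma points at the heart of such an approximation can be sketched with the standard unscented transform applied to a particle set. This is an illustrative stand-in (standard unscented weights, a moment-matched Gaussian fit), not the paper's MI estimator itself.

```python
import numpy as np

def sigma_points(particles, kappa=1.0):
    """Fit a moment-matched Gaussian to the particles, then place the
    standard 2n+1 unscented-transform sigma points and weights."""
    n = particles.shape[1]
    mean = particles.mean(axis=0)
    cov = np.cov(particles, rowvar=False)
    # Matrix square root of (n + kappa) * cov via Cholesky.
    L = np.linalg.cholesky((n + kappa) * cov)
    pts = [mean]
    for i in range(n):
        pts.append(mean + L[:, i])
        pts.append(mean - L[:, i])
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    weights = np.array([w0] + [wi] * (2 * n))
    return np.array(pts), weights

rng = np.random.default_rng(2)
particles = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.3], [0.3, 2.0]], 5000)
pts, w = sigma_points(particles)
print(pts.shape, w.sum())  # 2n+1 = 5 points in 2-D, weights summing to 1
```

By construction the weighted sigma points reproduce the particle mean and covariance exactly, which is why a handful of them can replace thousands of particles when evaluating an information reward inside a planning tree.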
|
|
16:30-18:00, Paper TuCT1-CC.6 | Add to My Program |
Preprocessing-Based Planning for Utilizing Contacts in Semi-Structured High-Precision Insertion Tasks |
|
Saleem, Muhammad Suhail | Carnegie Mellon University |
Veerapaneni, Rishi | Carnegie Mellon University |
Likhachev, Maxim | Carnegie Mellon University |
Keywords: Planning under Uncertainty, Manipulation Planning
Abstract: In manipulation tasks like plug insertion or assembly that have low tolerance to errors in pose estimation (errors of the order of 2mm cause task failure), the utilization of touch/contact modality can aid in accurately localizing the object of interest. Motivated by this, in this work we model high-precision insertion tasks as planning problems under pose uncertainty, where we effectively utilize the occurrence of contacts (or the lack thereof) as observations to reduce uncertainty and reliably complete the task. We present a preprocessing-based planning framework for high-precision insertion in repetitive and time-critical settings, where the set of initial pose distributions (identified by a perception system) is finite. The finite set allows us to enumerate the possible planning problems that can be encountered online and preprocess a database of policies. Due to the computational complexity of constructing this database, we propose a general experience-based POMDP solver, E-RTDP-Bel, that uses the solutions of similar planning problems as experience to speed up planning queries and use it to efficiently construct the database. We show that the developed algorithm speeds up database creation by over a factor of 100, making the process computationally tractable. We demonstrate the effectiveness of the proposed framework in a real-world plug insertion task in the presence of port position uncertainty and an assembly task in simulation in the presence of pose uncertainty.
|
|
16:30-18:00, Paper TuCT1-CC.7 | Add to My Program |
Vision-Based Uncertainty-Aware Motion Planning Based on Probabilistic Semantic Segmentation |
|
Römer, Ralf | Technical University of Munich |
Lederer, Armin | Technical University of Munich |
Tesfazgi, Samuel | Technical University of Munich |
Hirche, Sandra | Technische Universität München |
Keywords: Planning under Uncertainty, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: For safe operation, a robot must be able to avoid collisions in uncertain environments. Existing approaches for motion planning under uncertainties often assume parametric obstacle representations and Gaussian uncertainty, which can be inaccurate or result in trajectories with excessive cost. While visual perception can deliver a more accurate representation of the environment, its use for safe motion planning is limited by the inherent miscalibration of neural networks and the challenge of obtaining adequate datasets. To address these limitations, we propose to employ ensembles of deep semantic segmentation networks trained with massively augmented datasets to ensure reliable probabilistic occupancy information. For avoiding conservatism during motion planning, we directly employ the probabilistic perception in a scenario-based path planning approach. A velocity scheduling scheme is applied to the path to ensure a safe motion despite tracking inaccuracies. We demonstrate the effectiveness of the massive data augmentation in combination with deep ensembles and the proposed scenario-based planning approach in comparisons to state-of-the-art methods and validate our framework in an experiment with a human hand as obstacle.
|
|
16:30-18:00, Paper TuCT1-CC.8 | Add to My Program |
Chance-Constrained Multi-Robot Motion Planning under Gaussian Uncertainties |
|
Theurkauf, Anne | University of Colorado Boulder |
Kottinger, Justin | University of Colorado Boulder |
Ahmed, Nisar | University of Colorado Boulder |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Motion and Path Planning, Multi-Robot Systems, Planning under Uncertainty
Abstract: We consider a chance-constrained multi-robot motion planning problem in the presence of Gaussian motion and sensor noise. Our proposed algorithm, CC-K-CBS, leverages the scalability of kinodynamic conflict-based search (K-CBS) in conjunction with the efficiency of Gaussian belief trees as used in the Belief-A framework, and inherits the completeness guarantees of Belief-A's low-level sampling-based planner. We also develop three different methods for robot-robot probabilistic collision checking, which trade off computation with accuracy. Our algorithm generates motion plans driving each robot from its initial to goal state while accounting for uncertainty evolution with chance-constrained safety guarantees. Benchmarks compare computation time to conservatism of the collision checkers, in addition to characterizing the performance of the planner as a whole. Results show that CC-K-CBS scales up to 30 robots.
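A baseline robot-robot probabilistic collision check of the kind compared in this abstract can be sketched via Monte Carlo sampling: the relative position of two robots with Gaussian position uncertainty is itself Gaussian, and collision probability is the mass of that Gaussian inside a disk. This is an illustrative baseline, not one of the paper's three checkers.

```python
import numpy as np

def collision_probability(mu1, cov1, mu2, cov2, radius_sum, n=100_000, seed=0):
    """Monte Carlo estimate of P(||p1 - p2|| < radius_sum) for two
    disk robots with Gaussian-distributed positions."""
    rng = np.random.default_rng(seed)
    # Relative position: mean mu1 - mu2, covariance cov1 + cov2.
    d = rng.multivariate_normal(np.subtract(mu1, mu2),
                                np.add(cov1, cov2), size=n)
    return float(np.mean(np.linalg.norm(d, axis=1) < radius_sum))

cov = 0.01 * np.eye(2)  # illustrative position covariance for each robot
p_far = collision_probability([0, 0], cov, [1.0, 0], cov, radius_sum=0.3)
p_near = collision_probability([0, 0], cov, [0.2, 0], cov, radius_sum=0.3)
print(p_far, p_near)  # far pair ~0, near pair substantially likely to collide
```

A chance-constrained planner would reject any joint state where this probability exceeds the allowed risk bound; the trade-off the paper benchmarks is between such sampling-based accuracy and cheaper, more conservative bounds.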
|
|
16:30-18:00, Paper TuCT1-CC.9 | Add to My Program |
Uncertainty-Aware Trajectory Planning: Using Uncertainty Quantification and Propagation in Traversability Prediction of Planetary Rovers |
|
Takemura, Reiya | Keio University |
Ishigami, Genya | Keio University |
Keywords: Motion and Path Planning, Space Robotics and Automation, Wheeled Robots
Abstract: Motion planning for a planetary rover involves assessing robotic traversability so that the rover can travel safely without mobility hazards. While conventional planners have primarily assessed a rover traversability index with predetermined threshold values, the traversability index cannot be precisely predicted because of measurement uncertainty in onboard mapping and motion uncertainty. This study presents an uncertainty-aware trajectory planning algorithm for the rover on rough and loose terrains. The planning algorithm involves new metrics that quantify heteroscedastic uncertainties in the rover traversability prediction model, which are dependent on terrain characteristics and the robot's state and control. Further, uncertainty propagation extends the uncertainty metrics to explicitly consider the growth of uncertainty over time steps. The uncertainty metrics are used to assess tree extensions of the sampling-based search algorithm, enabling the trajectory planner to avoid the unexpected risk of vehicle rollover and extremely high slip. A simulation study confirms that the proposed algorithm achieves up to a 20% reduction in the probability of mobility hazards on realistic, challenging terrains.
|
|
TuCT2-CC Oral Session, CC-311 |
Add to My Program |
Joint Mechanism |
|
|
Chair: Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Co-Chair: Sadeghian, Hamid | Technical University of Munich |
|
16:30-18:00, Paper TuCT2-CC.1 | Add to My Program |
Perching and Grasping Using a Passive Dynamic Bioinspired Gripper |
|
Firouzeh, Amir | EPFL |
Lee, Jongeun | Seoul National University |
Yang, Hyunsoo | Seoul National University |
Lee, Dongjun | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Compliant Joint/Mechanism, Grippers and Other End-Effectors, Underactuated Robots, Aerial Systems: Mechanics and Control
Abstract: The ability to grasp objects broadens the application range of unmanned aerial vehicles (UAVs) by allowing interactions with the environment. The difficulty in performing a mid-air grasp is the high probability of impact between the UAV’s foot and the target. For a successful grasp, the foot must smoothly absorb the energy of impact and simultaneously engage with the target in a short period of time. We present a bioinspired passive dynamic foot in which the claws are actuated solely by the impact energy. Our gripper simultaneously resolves the issue of smooth absorption of the impact energy and fast closure of the claws by linking the motion of an ankle linkage and the claws through soft tendons. We study the dynamics of impact and use the stiffness of the tendon as our design/control parameter to adjust the mechanics of the gripper for smooth recycling of the impact energy. Our gripper closes within 45 milliseconds after initial contact with the impacting object without requiring any controller or actuation energy. An electro-adhesive locking mechanism attached to the tendon locks the claws within 20 milliseconds after reaching the closed configuration. We demonstrated the effectiveness of the proposed gripper.
|
|
16:30-18:00, Paper TuCT2-CC.2 | Add to My Program |
Self-Sensing Feedback Control of an Electrohydraulic Robotic Shoulder |
|
Christoph, Clemens Claudio | ETH Zürich |
Kazemipour, Amirhossein | ETH Zürich |
Vogt, Michel Ryan | ETH Zürich |
Zhang, Yu | ETH Zurich |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Compliant Joints and Mechanisms, Biologically-Inspired Robots, Motion Control
Abstract: The human shoulder, with its glenohumeral joint, tendons, ligaments, and muscles, allows for the execution of complex tasks with precision and efficiency. However, current robotic shoulder designs lack the compliance and compactness inherent in their biological counterparts. A major limitation of these designs is their reliance on external sensors like rotary encoders, which restrict mechanical joint design and introduce bulk to the system. To address this constraint, we present a bio-inspired antagonistic robotic shoulder with two degrees of freedom powered by self-sensing hydraulically amplified self-healing electrostatic actuators. Our artificial muscle design decouples the high-voltage electrostatic actuation from the pair of low-voltage self-sensing electrodes. This approach allows for proprioceptive feedback control of trajectories in the task space while eliminating the necessity for any additional sensors. We assess the platform's efficacy by comparing it to a feedback control based on position data provided by a motion capture system. The study demonstrates closed-loop controllable robotic manipulators based on an inherent self-sensing capability of electrohydraulic actuators. The proposed architecture can serve as a basis for complex musculoskeletal joint arrangements.
|
|
16:30-18:00, Paper TuCT2-CC.3 | Add to My Program |
Design and Validation of a Variable Stiffness Spiral Cam Actuator |
|
Auer, Matthew | Arizona State University |
Joglekar, Suhrud Parag | Arizona State University |
Lee, Hyunglae | Arizona State University |
Keywords: Compliant Joints and Mechanisms, Compliance and Impedance Control
Abstract: This study presents the design and validation of a variable stiffness actuator incorporating multiple cam mechanisms. The actuator is intended for use in walking assistance, focusing on assisting individuals with diminished ankle function. This study highlights the advantages of variable stiffness actuators over traditional and other modern actuators in mobility assistance. The working principles of the proposed Variable Stiffness Spiral Cam Actuator (VS-SCA) are described, focusing on the cantilever beams with adjustable supports, main cam mechanism, and symmetric support positioning architecture utilizing an Archimedean spiral cam. The design and fabrication process are discussed, considering system design considerations, cantilever beam design, cam design, and spiral cam design. The analytical methodology used for validation is also presented, which connects the subsystems of the actuator and allows for the determination of effective torsional stiffness. The experimental validation showed that the VS-SCA provides a range of stiffness from 20 to 75 Nm/rad for dorsiflexion, necessary for providing ankle assistance during the push-off phase of walking, while maintaining low stiffness (4 - 12 Nm/rad) for plantarflexion not to hinder natural ankle motion in the swing phase.
|
|
16:30-18:00, Paper TuCT2-CC.4 | Add to My Program |
Hybrid Force-Position Control of an Elastic Tendon-Driven Scrubbing Robot (TEDSR) |
|
Harmatz, Noah | Rutgers University |
Zahra, Alina | Rutgers University |
Abdelmalak, Amir | Rutgers |
Purohit, Shivam | Rutgers University |
Shin, Trevor | Rutgers University |
Mazzeo, Aaron | Rutgers University |
Keywords: Compliant Joints and Mechanisms, Tendon/Wire Mechanism, Soft Sensors and Actuators
Abstract: There is a lack of cleaning robots dedicated to the scrubbing of contaminated surfaces. Contaminated surfaces in domestic and industrial settings typically require manual scrubbing, which can be costly or hazardous. To address the opportunity to automate the scrubbing of surfaces, this work focuses on the use of series elastic actuators, which can apply consistent trajectories of scrubbing force. Consistent force during scrubbing increases the rate of removal of a contaminant. An elastic robot with rigid links and low-stiffness joints can perform friction-based cleaning of surfaces with complex geometries while maintaining consistent scrubbing force. This study uses a hybrid force-position control scheme and a low-cost elastic robot to perform scrubbing, and observes the relationship between joint stiffness in the robot and disturbance rejection for force-based control during scrubbing. There is growing demand for automated sanitization systems in hospitals, food-processing plants, and other settings where surface cleanliness is important.
|
|
16:30-18:00, Paper TuCT2-CC.5 | Add to My Program |
Investigation on the Multi-Solution Problem of the Kinetostatics of Cable-Driven Continuum Manipulators |
|
Dai, Yicheng | Harbin Institute of Technology (Shenzhen) |
Li, Zuan | Harbin Institute of Technology (Shenzhen) |
Wang, Xin | Harbin Institute of Technology, Shenzhen |
Yuan, Han | Harbin Institute of Technology |
Keywords: Tendon/Wire Mechanism, Kinematics
Abstract: Cable-driven continuum manipulators have gained considerable attention due to their high dexterity and inherent structural compliance, making them a popular research topic. However, previous studies have overlooked the kinetostatics of these manipulators, which can result in a multi-solution problem. This issue is critical, as multiple equilibrium states can lead to erroneous estimations of the manipulator's profile. To address this issue, a kinetostatic model is presented, and simulations based on both the interval analysis method and the commonly used floating-point optimization algorithm are conducted under the same actuating forces and external loads. Results show that there are multiple solutions to the kinetostatics of cable-driven continuum manipulators with either constant or variable cross sections. This paper fills a gap in the current literature and offers valuable insights for researchers in the field of cable-driven continuum manipulators.
|
|
16:30-18:00, Paper TuCT2-CC.6 | Add to My Program |
Optimization Design Method of Tendon-Sheath Transmission Path under Curvature Constraint |
|
Li, Yanan | Tsinghua University |
Lu, Weining | Department of Automation, Tsinghua University |
Liu, Yu | Harbin Institute of Technology |
Meng, Deshan | Sun Yat-Sen University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Tendon/Wire Mechanism, Mechanism Design, Contact Modeling, Optimization and Optimal Control
Abstract: The tendon-sheath mechanism is finding increasingly broad application in the field of precision machinery. However, the contact friction between the tendon and sheath seriously degrades transmission accuracy. When friction is unavoidable, optimizing the tendon transmission path to reduce tension loss and elastic deformation becomes an important research direction. In this paper, the influence of the tendon transmission path on tension and displacement transmission is characterized using two parameters related to the curvature of the path: the total bending angle and the equivalent tendon length. Then, based on optimal control theory and the minimum principle, transmission path solutions are obtained for minimum tension loss, minimum tendon deformation, and the coupling of tension and displacement; a numerical optimization method verifies the correctness of the proposed theory. Finally, an optimized design of a tendon-constrained synchronous rotation mechanism for a manipulator is carried out, and the linkage performance is greatly improved by optimizing the transmission path.
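The dependence of tension loss on the total bending angle mentioned in this abstract is commonly modeled with the classic capstan (Euler-Eytelwein) friction equation, where output tension decays exponentially with the accumulated bending angle. The sketch below uses that standard model for illustration; the friction coefficient and routing angles are invented, and the paper's full analysis also covers elastic deformation.

```python
import math

def output_tension(t_in, mu, bend_angles_rad):
    """Capstan model: T_out = T_in * exp(-mu * theta_total), where
    theta_total is the sum of all bending angles along the path."""
    theta_total = sum(bend_angles_rad)
    return t_in * math.exp(-mu * theta_total)

# Two hypothetical routings with the same endpoints: a gentle path bending
# 90 degrees in total versus a convoluted one bending 270 degrees.
mu = 0.2
gentle = output_tension(100.0, mu, [math.pi / 2])
convoluted = output_tension(100.0, mu, [math.pi / 2, math.pi / 2, math.pi / 2])
print(f"{gentle:.1f} N vs {convoluted:.1f} N")  # ~73.0 N vs ~39.0 N
```

The exponential form is why minimizing the total bending angle is such an effective path-optimization objective: each extra bend multiplies the loss rather than adding to it.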
|
|
16:30-18:00, Paper TuCT2-CC.7 | Add to My Program |
Stability Analysis of Tendon Driven Continuum Robots and Application to Active Softening |
|
Peyron, Quentin | Inria and CRIStAL UMR CNRS 9189, University of Lille |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Tendon/Wire Mechanism, Modeling, Control, and Learning for Soft Robots, Kinematics, Elastic Stability of Continuum Robots
Abstract: Tendon driven continuum robots are often considered for navigating through and operating in cluttered environments. While their compliance allows them to conform safely to obstacles, it also leads them to buckle under tendon actuation. In this work, we perform for the first time an extensive elastic stability analysis of these robots for arbitrary planar designs. The buckling phenomena are investigated and analyzed using bifurcation diagrams, complementing the current state of the art and adding new knowledge about robots composed of n spacer disks. We show the existence of multiple robot configurations with different shapes, achievable with the same actuation inputs. A global stability criterion is also established which links the critical tendon force, up to which the robot is stable, to the design parameters. Finally, the buckling phenomena are used to actively soften the robot for a better compromise between compliance and payload. An open-loop control strategy is proposed, which can theoretically decrease the stiffness to zero while maintaining the same robot shape. Experimentally, the robot is made 4 times more compliant than it is nominally using tendon actuation only.
|
|
16:30-18:00, Paper TuCT2-CC.8 | Add to My Program |
Elasto-Static Modelling and Identification of a Deployable Cable-Driven Parallel Robot with Compliant Masts |
|
Zake, Zane | IRT Jules Verne |
Caro, Stéphane | CNRS/LS2N |
Keywords: Tendon/Wire Mechanism, Parallel Robots, Compliant Joints and Mechanisms
Abstract: Some cable-driven parallel robots (CDPRs) can be rapidly deployed on-site. To achieve such deployability, the fixed frame is usually substituted by four masts. However, not having any rigid fixture between the masts reduces the overall stiffness of the CDPR. This paper introduces a CDPR called Rocaspect that has four compliant masts. The robot's behavior and accuracy are evaluated experimentally, and three different mast models are proposed.
|
|
16:30-18:00, Paper TuCT2-CC.9 | Add to My Program |
Torque Transmission in Double-Tendon Sheath Driven Actuators for Application in Exoskeletons |
|
Pérez-Suay, Daniel | Technical University of Munich |
Li, Yu | Technical University of Munich |
Sadeghian, Hamid | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Tendon/Wire Mechanism, Prosthetics and Exoskeletons, Actuation and Joint Mechanisms
Abstract: Bowden cables serve as essential components in various mechanical systems, facilitating power transmission from remote actuators to specific destinations. The pretension of Bowden cables profoundly influences system performance, notably in terms of friction. This study investigates the effects of cable pretension and shape on friction and torque efficiency. A custom-designed testbed, comprising integrated actuator units, pulleys, and a novel pretension mechanism connected by Bowden cables, is used to conduct experimental tests under varying parameters. This work adopts an integrated approach of experimentation, modeling, and validation, offering preliminary insights into the torque transmission characteristics of tendon-driven actuator systems. Additionally, the precise model exhibits excellent conformity across a broad range of shapes and provides initial insights into hysteresis modeling attributable to cable material properties.
|
|
TuCT3-CC Oral Session, CC-313 |
Add to My Program |
Big Data in Robotics and Automation |
|
|
Chair: Bohg, Jeannette | Stanford University |
Co-Chair: Johns, Edward | Imperial College London |
|
16:30-18:00, Paper TuCT3-CC.1 | Add to My Program |
OpenBot-Fleet: A System for Collective Learning with Real Robots |
|
Müller, Matthias | Intel Labs |
Brahmbhatt, Samarth Manoj | Intel Corporation |
Deka, Ankur | Intel Labs |
Leboutet, Quentin | Intel Labs |
Hafner, David | Intel Labs |
Koltun, Vladlen | Intel Labs |
Keywords: Big Data in Robotics and Automation, Machine Learning for Robot Control, Deep Learning for Visual Perception
Abstract: We introduce OpenBot-Fleet, a comprehensive open-source cloud robotics system for navigation. OpenBot-Fleet uses smartphones for sensing, local compute, and communication, Google Firebase for secure cloud storage and off-board compute, and a robust yet low-cost wheeled robot to act in real-world environments. The robots collect task data and upload it to the cloud, where navigation policies can be learned either offline or online and can then be sent back to the robot fleet. In our experiments, we distribute 72 robots to a crowd of workers who operate them in homes, and show that OpenBot-Fleet can learn robust navigation policies that generalize to unseen homes with >80% success rate. OpenBot-Fleet represents a significant step forward in cloud robotics, making it possible to deploy large continually learning robot fleets in a cost-effective and scalable manner. All materials can be found at https://www.openbot.org/
|
|
16:30-18:00, Paper TuCT3-CC.2 | Add to My Program |
WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting |
|
Chen, Kan | Waymo LLC |
Ge, Runzhou | Waymo LLC |
Qiu, Hang | University of California, Riverside |
Al-Rfou, Rami | Waymo |
Qi, Charles Ruizhongtai | Waymo |
Zhou, Xuanyu | Waymo |
Yang, Zoey Zeyu | Waymo |
Ettinger, Scott | Waymo |
Sun, Pei | Waymo |
Leng, Zhaoqi | Waymo LLC |
Baniodeh, Mustafa | Waymo LLC |
Bogun, Ivan | Cruise LLC |
Wang, Weiyue | University of Southern California |
Tan, Mingxing | Waymo Research |
Anguelov, Dragomir | Waymo |
Keywords: Big Data in Robotics and Automation, Data Sets for Robot Learning, Motion and Path Planning
Abstract: Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. The new augmented dataset, WOMD-LiDAR, consists of over 100,000 scenes, each spanning 20 seconds and containing well-synchronized and calibrated high-quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to the Waymo Open Dataset (WOD), the WOMD-LiDAR dataset contains 100x more scenes. Furthermore, we integrate the LiDAR data into the motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data brings improvement in the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.
|
|
16:30-18:00, Paper TuCT3-CC.3 | Add to My Program |
Increasing the Absolute Position Accuracy of Industrial Robots by Means of a Deep Continual Evidential Regression Model |
|
Uhlmann, Eckart | TU Berlin, Institute for Machine Tools and Factory Management |
Polte, Mitchel | TU Berlin, Institute for Machine Tools and Factory Management |
Blumberg, Julian | TU Berlin, Institute for Machine Tools and Factory Management |
Yin, Sheng | TU Berlin, Institute for Machine Tools and Factory Management |
Wang, Gang | Chongqing University |
Keywords: Deep Learning Methods, Continual Learning, Industrial Robots
Abstract: The use of industrial robots represents a key technology for increasing productivity and efficiency in manufacturing. However, their low absolute position accuracy still prevents industrial robots from broadly substituting for machine tools. In this paper, a data-driven method for accuracy enhancement of industrial robots under consideration of kinematic, elastic, and thermal effects is presented. A continual learning algorithm is proposed, which allows the model to be trained in a process-parallel manner without suffering from catastrophic forgetting. Furthermore, the model is able to determine confidence intervals of the prediction values and thus supports further processing in safety-relevant applications. The effectiveness of the model is demonstrated using a large data stream with about 3,000 real data points. As a result, we show that the absolute position accuracy of the industrial robot can be improved by 96% with the proposed method.
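The confidence intervals mentioned above can be illustrated with a toy view of an evidential-regression output: each position-error prediction carries a mean and a variance, from which an interval follows. This is a schematic sketch only; the paper's deep continual evidential model is far richer, and the function name and the numbers below are illustrative assumptions.

```python
import math

def confidence_interval(mu, var, z=1.96):
    """Return the mu +/- z*sigma interval for a Gaussian predictive density.

    mu/var are the predicted mean and variance; z=1.96 gives ~95% coverage.
    """
    half = z * math.sqrt(var)
    return mu - half, mu + half

# e.g. a predicted residual position error of 0.10 mm with variance 0.0004 mm^2
lo, hi = confidence_interval(0.10, 0.0004)
```

A downstream safety check would then compare `hi` against the application's tolerance rather than trusting the point prediction `mu` alone.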
|
|
16:30-18:00, Paper TuCT3-CC.4 | Add to My Program |
SpawnNet: Learning Generalizable Visuomotor Skills from Pre-Trained Network |
|
Lin, Xingyu | UC Berkeley |
So, John Ian | Stanford University |
Mahalingam, Sashwat | University of California, Berkeley |
Liu, Fangchen | University of California, Berkeley |
Abbeel, Pieter | UC Berkeley |
Keywords: Sensorimotor Learning, Representation Learning, Big Data in Robotics and Automation
Abstract: The existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, bringing the potential of learning policies that generalize in diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives. Still, the generalization capabilities of the learned policies and the advantages over well-tuned baselines remain unclear from prior studies. In this work, we present a focused study of the generalization capabilities of the pre-trained visual representations at the categorical level. We identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning and then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real experiments, we show significantly better categorical generalization compared to prior approaches in imitation learning settings. Open-sourced code and videos can be found on our website: https://xingyu-lin.github.io/spawnnet/.
|
|
16:30-18:00, Paper TuCT3-CC.5 | Add to My Program |
RoboAgent: Generalization and Efficiency in Robot Manipulation Via Semantic Augmentations and Action Chunking |
|
Bharadhwaj, Homanga | Carnegie Mellon University |
Vakil, Jay | Meta |
Sharma, Mohit | Carnegie Mellon University |
Gupta, Abhinav | Carnegie Mellon University |
Tulsiani, Shubham | Carnegie Mellon University |
Kumar, Vikash | Meta AI |
Keywords: Big Data in Robotics and Automation, Imitation Learning, Data Sets for Robot Learning
Abstract: The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such a universal agent requires an efficient framework capable of generalization but within a reasonable data budget. In this paper, we develop an efficient framework (MT-ACT) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies from small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using task commands. Using merely 7500 demonstrations, we are able to train a single policy, RoboAgent, capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, MT-ACT outperforms prior methods by over 40% in unseen situations while being more sample efficient. See https://robopen.github.io/ for video results and appendix.
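The action-chunking idea above can be sketched minimally: the policy predicts a short chunk of future actions at every step, and overlapping predictions for the same timestep are averaged (temporal ensembling). The names and the averaging rule below are illustrative assumptions; MT-ACT itself is a transformer policy, not shown here.

```python
def execute_with_chunking(policy, horizon):
    """policy(t) -> list of future actions predicted at time t (the chunk)."""
    pending = [[] for _ in range(horizon)]
    for t in range(horizon):
        for i, action in enumerate(policy(t)):
            if t + i < horizon:
                pending[t + i].append(action)
    # Temporal ensembling: average every prediction made for each timestep.
    return [sum(acts) / len(acts) for acts in pending]

# A dummy policy whose chunk at time t is [t, t]; later timesteps are
# smoothed across the overlapping chunks that covered them.
actions = execute_with_chunking(lambda t: [float(t)] * 2, horizon=3)
```

Chunking reduces the effective decision frequency (helping with compounding errors in imitation learning), while ensembling keeps execution smooth.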
|
|
16:30-18:00, Paper TuCT3-CC.6 | Add to My Program |
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models |
|
Kapelyukh, Ivan | Imperial College London |
Ren, Yifei | Imperial College London |
Alzugaray, Ignacio | Imperial College London |
Johns, Edward | Imperial College London |
Keywords: Big Data in Robotics and Automation, Perception for Grasping and Manipulation, Semantic Scene Understanding
Abstract: We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place. This enables language-conditioned rearrangement to be performed zero-shot, without needing to collect a training dataset of example arrangements. Results on a series of real-world tasks show that this framework is robust to distractors, controllable by language, capable of understanding complex multi-object relations, and readily applicable to both tabletop and 6-DoF rearrangement tasks.
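The render-and-score loop at the heart of a Dream2Real-style pipeline can be sketched as below: virtually rearrange objects, render each candidate arrangement, let a VLM score it against the instruction, and pick the best. The `render` and `vlm_score` callables are stand-in stubs, not the paper's actual components.

```python
def pick_arrangement(candidates, render, vlm_score, instruction):
    """Score every virtual arrangement and return the highest-scoring one."""
    scored = [(vlm_score(render(c), instruction), c) for c in candidates]
    return max(scored)[1]

# Stub renderer and scorer: the "image" is just the description string, and
# the "VLM" scores 1.0 if the instruction text appears in it.
best = pick_arrangement(
    candidates=["fork left of plate", "fork on plate"],
    render=lambda c: c,
    vlm_score=lambda img, instr: 1.0 if instr in img else 0.0,
    instruction="left of",
)
```

The selected arrangement would then be recreated in the real world with pick-and-place.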
|
|
16:30-18:00, Paper TuCT3-CC.7 | Add to My Program |
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning |
|
Yang, Jingyun | Stanford University |
Sobol Mark, Max | Stanford University |
Vu, Brandon | Stanford University |
Sharma, Archit | Stanford University |
Bohg, Jeannette | Stanford University |
Finn, Chelsea | Stanford University |
Keywords: Reinforcement Learning, Continual Learning, Big Data in Robotics and Automation
Abstract: The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet. However, reinforcement learning often requires significant human effort in the form of manual reward specification or environment resets, even if the policy is pre-trained. We introduce RoboFuME, a reset-free fine-tuning system that pre-trains a multi-task manipulation policy from diverse datasets of prior experiences and self-improves online to learn a target task with minimal human intervention. Our insights are to utilize calibrated offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy in the presence of distribution shifts and leverage pre-trained vision language models (VLMs) to build a robust reward classifier for autonomously providing reward signals during the online fine-tuning process. In a diverse set of five real robot manipulation tasks, we show that our method can incorporate data from an existing robot dataset collected at a different institution and improve on a target task within as little as 3 hours of autonomous real-world experience. We also demonstrate in simulation experiments that our method outperforms prior works that use different RL algorithms or different approaches for predicting rewards. Project website: https://robofume.github.io
|
|
16:30-18:00, Paper TuCT3-CC.8 | Add to My Program |
Scaling Motion Forecasting Models with Ensemble Distillation |
|
Ettinger, Scott | Waymo |
Goel, Kratarth | Waymo |
Srivastava, Avikalp | Waymo |
Al-Rfou, Rami | Waymo |
Keywords: Deep Learning Methods, Autonomous Agents, Big Data in Robotics and Automation
Abstract: Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting systems subject to limited compute budgets by combining model ensemble and distillation techniques. The use of ensembles of deep neural networks has been shown to improve generalization accuracy in many application domains. We first demonstrate significant performance gains by creating a large ensemble of optimized single models. We then develop a generalized framework to distill motion forecasting model ensembles into small student models which retain high performance with a fraction of the computing cost. For this study we focus on the task of motion forecasting using real world data from autonomous driving systems. We develop ensemble models that are very competitive on the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards. From these ensembles, we train distilled student models which have high performance at a fraction of the compute costs. These experiments demonstrate distillation from ensembles as an effective method for improving accuracy of predictive models for robotic systems with limited compute budgets.
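The distillation step described above can be sketched generically: the student regresses toward the aggregated predictions of the ensemble. This is a schematic stand-in; the paper's actual losses and trajectory representations differ, and the function names are assumptions.

```python
def ensemble_target(predictions):
    """Average equal-length trajectories (lists of floats) element-wise."""
    n = len(predictions)
    return [sum(vals) / n for vals in zip(*predictions)]

def distill_loss(student, target):
    """Mean squared error between the student's trajectory and the target."""
    return sum((s - t) ** 2 for s, t in zip(student, target)) / len(target)

# Two teacher trajectories produce the distillation target [0.0, 2.0]; a
# student that matches it incurs zero loss.
target = ensemble_target([[0.0, 1.0], [0.0, 3.0]])
loss = distill_loss([0.0, 2.0], target)
```

The point of the technique is that the small student, trained on such targets, retains much of the ensemble's accuracy at a fraction of the onboard compute cost.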
|
|
16:30-18:00, Paper TuCT3-CC.9 | Add to My Program |
Is It a Bug? Understanding Physical Unit Mismatches in Robot Software |
|
Canelas, Paulo | Carnegie Mellon University |
Tabor, Trenton | Carnegie Mellon University |
Ore, John-Paul | North Carolina State University |
Fonseca, Alcides | LASIGE, Faculdade De Ciências Da Universidade De Lisboa |
Le Goues, Claire | Carnegie Mellon University |
Timperley, Christopher Steven | Carnegie Mellon University |
Keywords: Software Tools for Robot Programming
Abstract: Robot software is abundant with variables that represent real-world physical units (e.g., meters, seconds). Operations over different units (e.g., adding meters and seconds) may be incorrect and can lead to dangerous system misbehaviors; manually detecting such mistakes is challenging. Current software analysis techniques identify such mismatches using dimensional analysis rules and ROS-specific assumptions to analyze the source code. However, these techniques ignore the fact that physical unit mismatches in robotics code are often intentional (e.g., when operating a differential drive robot), resulting in false positive bug reports that can impede robotics developer trust and productivity. In this work, we study how developers introduce physical unit mismatches by manually inspecting 180 errors detected by the software analysis technique Phys. We identify three types of physical unit mismatches and present a taxonomy of eight high-level categories of how these errors manifest. We find that developers often make unforced and paradigmatic physical unit mismatches through differential drives, small angle approximations, and controls. We draw insights on current development to inform future research to better detect, categorize, and address meaningful physical unit mismatches.
|
|
TuCT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems III |
|
|
Chair: Amato, Nancy | University of Illinois |
Co-Chair: Nikolakopoulos, George | Luleå University of Technology |
|
16:30-18:00, Paper TuCT4-CC.1 | Add to My Program |
Behavior Tree Capabilities for Dynamic Multi-Robot Task Allocation with Heterogeneous Robot Teams |
|
Heppner, Georg | FZI Forschungszentrum Informatik |
Oberacker, David | FZI Forschungszentrum Informatik |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Multi-Robot Systems, Control Architectures and Programming, Behavior-Based Systems
Abstract: While individual robots are becoming increasingly capable, the complexity of expected missions increases exponentially in comparison. To cope with this complexity, heterogeneous teams of robots have become a significant research interest in recent years. Making effective use of the robots and their unique skills in a team is challenging. Dynamic runtime conditions often make static task allocations infeasible, requiring a dynamic, capability-aware allocation of tasks to team members. To this end, we propose and implement a system that allows a user to specify missions using Behavior Trees (BTs), which can then, at runtime, be dynamically allocated to the current robot team. The system allows an individual robot's capabilities to be statically modeled within our ros_bt_py BT framework. It offers a runtime auction system to dynamically allocate tasks to the most capable robot in the current team. The system leverages utility values and pre-conditions to ensure that the allocation improves the overall mission execution quality while preventing faulty assignments. To evaluate the system, we simulated a find-and-decontaminate mission with a team of three heterogeneous robots and analyzed the utilization and overall mission times as metrics. Our results show that our system can improve the overall effectiveness of a team while allowing for intuitive mission specification and flexibility in the team composition.
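The utility-and-pre-condition auction described above can be sketched as follows. This is a hypothetical illustration, not the ros_bt_py API: each robot bids its utility for a task, bids from robots whose pre-conditions fail are rejected, and the highest remaining bid wins.

```python
def allocate(task, robots):
    """Return the name of the highest-utility robot whose pre-conditions hold."""
    valid_bids = [
        (r["utility"](task), r["name"])
        for r in robots
        if all(pc(task) for pc in r["preconditions"])
    ]
    if not valid_bids:
        return None  # no capable robot in the current team
    return max(valid_bids)[1]

# Illustrative team: the UAV bids higher but has a payload pre-condition.
robots = [
    {"name": "ugv", "utility": lambda t: 0.4, "preconditions": [lambda t: True]},
    {"name": "uav", "utility": lambda t: 0.9,
     "preconditions": [lambda t: t["weight_kg"] < 1.0]},
]

winner_light = allocate({"weight_kg": 0.5}, robots)  # UAV wins on utility
winner_heavy = allocate({"weight_kg": 5.0}, robots)  # UAV filtered out
```

Pre-conditions prevent faulty assignments; utilities decide among the remaining capable robots, which is the mechanism the abstract credits for improved mission execution quality.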
|
|
16:30-18:00, Paper TuCT4-CC.2 | Add to My Program |
Multi-Robot Cooperative Navigation in Crowds: A Game-Theoretic Learning-Based Model Predictive Control Approach |
|
Le, Viet-Anh | University of Delaware |
Tadiparthi, Vaishnav | Honda Research Institute |
Chalaki, Behdad | Honda Research Institute USA, Inc |
Nourkhiz Mahjoub, Hossein | Honda Research Institute US |
D'sa, Jovin | Honda Research Institute, USA |
Moradi-Pari, Ehsan | Honda Research Institute |
Malikopoulos, Andreas | Cornell University |
Keywords: Multi-Robot Systems, Human-Aware Motion Planning, Social HRI
Abstract: In this paper, we develop a control framework for the coordination of multiple robots as they navigate through crowded environments. Our framework comprises a local model predictive control (MPC) for each robot and a social long short-term memory model that forecasts pedestrians' trajectories. We formulate the local MPC for each individual robot to include both individual and shared objectives, where the latter encourage the emergence of coordination among robots. Next, we consider the multi-robot navigation and human-robot interaction, respectively, as a potential game and a two-player game, then employ an iterative best response approach to solve the resulting optimization problems in a centralized and distributed fashion. Finally, we demonstrate the effectiveness of coordination among robots in simulated crowd navigation.
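The iterative best response scheme mentioned above can be illustrated on a two-player toy: each player repeatedly minimizes its own cost with the other's plan held fixed, converging to an equilibrium. This is a scalar schematic under made-up quadratic costs, not the paper's MPC games.

```python
def best_response(x0, y0, rounds=50):
    """Alternate unilateral minimizations until the plans stop changing.

    Toy costs: cost_x = (x - (1 - y/2))**2 and cost_y = (y - (1 - x/2))**2,
    so each best response has a closed form.
    """
    x, y = x0, y0
    for _ in range(rounds):
        x = 1.0 - 0.5 * y  # argmin over x with y held fixed
        y = 1.0 - 0.5 * x  # argmin over y with x held fixed
    return x, y

x, y = best_response(0.0, 0.0)  # converges toward the equilibrium (2/3, 2/3)
```

In the paper's setting, each "best response" is itself an MPC solve, but the fixed-point structure is the same.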
|
|
16:30-18:00, Paper TuCT4-CC.3 | Add to My Program |
Multi-Robot Human-In-The-Loop Control under Spatiotemporal Specifications |
|
Zhang, Yixiao | KTH Royal Institute of Technology |
Nan Fernandez-Ayala, Victor | KTH Royal Institute of Technology |
Dimarogonas, Dimos V. | KTH Royal Institute of Technology |
Keywords: Multi-Robot Systems, Human-Robot Collaboration, Robotics and Automation in Agriculture and Forestry
Abstract: In this work, we present a coordination strategy tailored for scenarios involving multiple agents and tasks. We devise a range of tasks using signal temporal logic (STL), each earmarked for specific agents. These tasks are then imposed through control barrier function (CBF) constraints to ensure completion. To extend existing methodologies, our framework adeptly manages interactions among multiple agents. This extension is facilitated by leveraging nonlinear model predictive control (NMPC) to compute trajectories that avoid collisions. An integral aspect of our approach is the integration of a human-in-the-loop (HIL) model. This model enables real-time integration of human directives into the coordination process. A novel task allocation protocol is embedded within the framework to guide this process. We substantiate our methodology through a series of experiments, which corroborate the viability and relevance of our algorithms.
|
|
16:30-18:00, Paper TuCT4-CC.4 | Add to My Program |
Hypergraph-Based Multi-Robot Task and Motion Planning |
|
Motes, James | University of Illinois Urbana-Champaign |
Chen, Tan | Michigan Technological University |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Morales, Marco | University of Illinois at Urbana-Champaign & Instituto Tecnológ |
Amato, Nancy | University of Illinois |
Keywords: Multi-Robot Systems, Task Planning, Motion and Path Planning, Cooperating Robots
Abstract: We present a multi-robot task and motion planning method that, when applied to the rearrangement of objects by manipulators, results in solution times up to three orders of magnitude faster than existing methods and successfully plans for problems with up to twenty objects, more than three times as many objects as comparable methods. We achieve this improvement by decomposing the planning space to consider manipulators alone, objects, and manipulators holding objects. We represent this decomposition with a hypergraph where vertices are decomposed elements of the planning spaces and hyperarcs are transitions between elements. Existing methods use graph-based representations where vertices are full composite spaces and edges are transitions between these. Using the hypergraph reduces the representation size of the planning space-for multi-manipulator object rearrangement, the number of hypergraph vertices scales linearly with the number of either robots or objects, while the number of hyperarcs scales quadratically with the number of robots and linearly with the number of objects. In contrast, the number of vertices and edges in graph representations scales exponentially with either.
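The scaling contrast above can be made concrete with a back-of-the-envelope count. The formulas below are a simplified model of the decomposition (robots alone, objects, and robot-object "holding" pairs versus a Cartesian product of per-entity modes), not the paper's exact vertex counts.

```python
def hypergraph_vertices(n_robots, n_objects):
    # Decomposed spaces: each robot alone + each object + each
    # (robot, object) holding pair -- polynomial growth.
    return n_robots + n_objects + n_robots * n_objects

def composite_vertices(n_robots, n_objects, modes_per_entity=2):
    # A composite-space graph takes the Cartesian product of per-entity
    # modes, so its size grows exponentially with the number of entities.
    return modes_per_entity ** (n_robots + n_objects)

# For a 3-manipulator, 20-object problem (the paper's largest scale):
small = hypergraph_vertices(3, 20)   # tens of vertices
large = composite_vertices(3, 20)    # millions of composite states
```

Even under this crude model, the gap of several orders of magnitude makes clear why the decomposed representation plans problems the composite one cannot.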
|
|
16:30-18:00, Paper TuCT4-CC.5 | Add to My Program |
Measurement-Limited Multi-Agent, Relative Pose Estimation for On-Orbit Inspection |
|
Mercier, Mark | Air Force Institute of Technology |
Curtis, David | Air Force Institute of Technology |
Taylor, Clark | Air Force Institute of Technology |
Keywords: Space Robotics and Automation, Multi-Robot Systems, Autonomous Vehicle Navigation
Abstract: Relative navigation methods are a critical enabling technology for the next generation of autonomous spacecraft conducting close proximity operations. This is especially true for multi-agent inspection operations, in which safety concerns, including intra-agent or agent-target collisions, are serious. Additionally, in an on-orbit servicing operation, various failure modes of the target may result in unreliable a priori knowledge or cooperation from the target. The main contribution of this work is the demonstration of a method for multi-agent, relative pose estimation that is robust to A) sensor blinding and B) dynamic uncertainty. This objective is accomplished by leveraging GTSAM, an existing toolbox for the formulation of factor graphs, along with an algorithm for the efficient, real-time solution of such factor graphs, iSAM2. This estimation method is demonstrated in an example scenario with uncertain dynamics and sensor blinding due to sun position. Results revealed that the iSAM2-based method is capable of handling sensor blinding through leveraging an inter-agent range measurement, despite a dynamically uncertain environment.
|
|
16:30-18:00, Paper TuCT4-CC.6 | Add to My Program |
Dynamic Targeting of Satellite Observations Incorporating Slewing Costs and Complex Observation Utility |
|
Kangaslahti, Akseli | University of Michigan |
Candela, Alberto | NASA Jet Propulsion Laboratory, Caltech |
Swope, Jason | Jet Propulsion Laboratory, California Institute of Technology |
Yue, Qing | Jet Propulsion Laboratory, California Institute of Technology |
Chien, Steve | Jet Propulsion Laboratory |
Keywords: Space Robotics and Automation, Planning, Scheduling and Coordination
Abstract: Maximizing the utility of limited Earth observing satellite resources is a difficult ongoing problem. Dynamic Targeting is an approach to this challenge that intelligently plans and executes primary sensor observations based on information from a lookahead sensor. However, current implementations have failed to account for realistic satellite operational constraints and have used static utility for repeat observations of the same target. To address these limitations, we implement a more general Dynamic Targeting framework that comprises a physics-based slew model, a dynamic model of observation utility, and an algorithm for gathering high-utility observations. To demonstrate this framework, we also supply complex dynamic utility models that are applicable to many missions and new algorithms for intelligently scheduling observations with slewing restrictions and changing utility, including a greedy algorithm and a depth-first search algorithm. To evaluate these algorithms, we test their performance in simulated runs over two datasets, comparing against an algorithm representative of most scheduling algorithms aboard Earth science missions today as well as against an intractable upper bound. We show that our algorithms have great potential to improve science return from Earth science missions.
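The greedy variant can be sketched as below: at each step, pick the reachable target with the highest current utility, charge slew time against the planning horizon, and decay a target's utility on every revisit. All numbers, names, and the 1-D slew model are made-up simplifications of the paper's framework.

```python
def greedy_schedule(targets, horizon, slew_rate=1.0, decay=0.5):
    """targets: {name: (pointing_angle, base_utility)}.

    Greedily observe the feasible target of highest current utility; each
    revisit multiplies that target's utility by `decay` (diminishing returns).
    """
    angle, t, total, visits, plan = 0.0, 0.0, 0.0, {}, []
    while t < horizon:
        best = None
        for name, (a, u) in targets.items():
            cost = abs(a - angle) / slew_rate          # slew time to target
            util = u * decay ** visits.get(name, 0)    # decayed repeat utility
            if t + cost < horizon and (best is None or util > best[0]):
                best = (util, cost, name, a)
        if best is None:
            break                                      # nothing reachable in time
        util, cost, name, a = best
        angle, t, total = a, t + cost + 1, total + util  # +1: observation time
        visits[name] = visits.get(name, 0) + 1
        plan.append(name)
    return plan, total

plan, total = greedy_schedule({"A": (0.0, 1.0), "B": (2.0, 0.9)}, horizon=6)
```

Note how the decay steers the schedule away from endlessly re-observing the single best target, which is exactly the static-utility failure mode the abstract criticizes.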
|
|
16:30-18:00, Paper TuCT4-CC.7 | Add to My Program |
RecNet: An Invertible Point Cloud Encoding through Range Image Embeddings for Multi-Robot Map Sharing and Reconstruction |
|
Stathoulopoulos, Nikolaos | Luleå University of Technology |
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Koval, Anton | Luleå University of Technology |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Multi-Robot Systems, AI-Based Methods, Localization
Abstract: In the field of resource-constrained robots and the need for effective place recognition in multi-robotic systems, this article introduces RecNet, a novel approach that concurrently addresses both challenges. The core of RecNet's methodology involves a transformative process: it projects 3D point clouds into range images, compresses them using an encoder-decoder framework, and subsequently reconstructs the range image, restoring the original point cloud. Additionally, RecNet utilizes the latent vector extracted from this process for efficient place recognition tasks. This approach not only achieves comparable place recognition results but also maintains a compact representation, suitable for sharing among robots to reconstruct their collective maps. The evaluation of RecNet encompasses an array of metrics, including place recognition performance, the structural similarity of the reconstructed point clouds, and the bandwidth transmission advantages derived from sharing only the latent vectors. Our proposed approach is assessed using both a publicly available dataset and field experiments, confirming its efficacy and potential for real-world applications.
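The first stage of a RecNet-style pipeline, projecting a 3D point cloud into a range image, can be sketched with a plain spherical projection. The tiny resolution and vertical field of view below are illustrative choices, not the paper's parameters.

```python
import math

def project_to_range_image(points, width=8, height=4, v_fov=(-0.4, 0.4)):
    """Project (x, y, z) points into a height x width image of ranges."""
    img = [[0.0] * width for _ in range(height)]
    v_lo, v_hi = v_fov
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)                       # azimuth in [-pi, pi]
        pitch = math.asin(z / r)                     # elevation angle
        u = int((yaw + math.pi) / (2 * math.pi) * width) % width
        v = int((pitch - v_lo) / (v_hi - v_lo) * height)
        if 0 <= v < height:
            img[v][u] = r                            # store the range value
    return img

# A point straight ahead and a point directly behind the sensor.
img = project_to_range_image([(1.0, 0.0, 0.0), (-2.0, 0.0, 0.0)])
```

The resulting dense 2D image is what the encoder-decoder compresses; inverting the projection per pixel restores the point cloud.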
|
|
TuCT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Tracking |
|
|
Chair: Kheddar, Abderrahmane | CNRS-AIST |
Co-Chair: Pathak, Sarthak | Chuo University |
|
16:30-18:00, Paper TuCT5-CC.1 | Add to My Program |
Object Permanence Filter for Robust Tracking with Interactive Robots |
|
Peng, Shaoting | University of Pennsylvania |
Wang, Margaret | Massachusetts Institute of Technology |
Shah, Julie A. | MIT |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Visual Tracking, Cognitive Control Architectures, Visual Servoing
Abstract: Object permanence, which refers to the concept that objects continue to exist even when they are no longer perceivable through the senses, is a crucial aspect of human cognitive development. In this work, we seek to incorporate this understanding into interactive robots by proposing a set of assumptions and rules to represent object permanence in multi-object, multi-agent interactive scenarios. We integrate these rules into the particle filter, resulting in the Object Permanence Filter (OPF). For multi-object scenarios, we propose an ensemble of K interconnected OPFs, where each filter predicts plausible object tracks that are resilient to missing, noisy, and kinematically or dynamically infeasible measurements. Through several interactive scenarios, we demonstrate that the proposed OPF approach provides robust tracking in human-robot interactive tasks agnostic to measurement type, even in the presence of prolonged and complete occlusion. Project webpage: https://opfilter.github.io/
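The object-permanence idea can be illustrated with a 1-D constant-velocity track that keeps predicting through a full occlusion instead of being dropped. This is a schematic scalar sketch, not the paper's particle-filter implementation; the gain and motion model are made up.

```python
def track(measurements, x0=0.0, v=1.0, gain=0.5):
    """measurements: list of floats or None (fully occluded frames).

    Predict with the motion model every frame; correct only when the
    object is actually observed, so the track survives occlusion.
    """
    x, estimates = x0, []
    for z in measurements:
        x += v                        # motion-model prediction every step
        if z is not None:             # correct only when a measurement exists
            x += gain * (z - x)
        estimates.append(x)
    return estimates

# Object visible, fully occluded for two frames, then reappearing nearby.
est = track([1.0, 2.0, None, None, 5.1])
```

During the `None` frames the estimate coasts along the motion model (3.0, then 4.0), so when the object reappears at 5.1 the track re-associates smoothly rather than restarting.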
|
|
16:30-18:00, Paper TuCT5-CC.2 | Add to My Program |
Zero-Shot Open-Vocabulary Tracking with Large-Scale Pre-Trained Models |
|
Chu, Wen-Hsuan | Carnegie Mellon University |
Harley, Adam | Stanford University |
Tokmakov, Pavel | CMU |
Dave, Achal | Toyota Research Institute |
Guibas, Leonidas | Stanford University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Keywords: Visual Tracking, Deep Learning for Visual Perception
Abstract: Object tracking is central to robot perception and scene understanding, allowing robots to parse a video stream in terms of moving objects with names. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking? In this paper, we combine an open-vocabulary detector, segmenter, and dense optical flow estimator, into a model that tracks and segments any object in 2D videos. Given a monocular video input, our method predicts object and part mask tracks with associated language descriptions, rebuilding the pipeline of Tractor with modern large pre-trained models for static image detection and segmentation: we detect open-vocabulary object instances and propagate their boxes from frame to frame using a flow-based motion model, refine the propagated boxes with the box regression module of the visual detector, and prompt an open-world segmenter with the refined box to segment the objects. We decide the termination of an object track based on the objectness score of the propagated boxes as well as forward-backward optical flow consistency. We re-identify objects across occlusions using deep feature matching. We show that our model achieves strong performance on multiple established benchmarks, and can produce reasonable tracks in manipulation data. In particular, our model outperforms previous state-of-the-art in UVO and BURST, benchmarks for open-world object tracking and segmentation, despite never being explicitly trained for tracking. We hope that our approach can serve as a simple and extensible framework for future research and enable imitation learning from videos with unconventional objects.
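The forward-backward optical-flow consistency test used above to terminate tracks can be sketched in one dimension: warp a point forward with the forward flow, warp it back with the backward flow, and keep the track only if it returns near its start. Real flow fields are dense 2-D arrays; the lambdas below are toy stand-ins.

```python
def fb_consistent(p, forward_flow, backward_flow, tol=1.0):
    """Return True if forward-then-backward warping brings p back within tol."""
    q = p + forward_flow(p)          # warp forward one frame
    p_back = q + backward_flow(q)    # warp back again
    return abs(p_back - p) <= tol

ok = fb_consistent(3.0, lambda p: 2.0, lambda q: -2.0)   # flows agree
bad = fb_consistent(3.0, lambda p: 2.0, lambda q: 4.0)   # occlusion/lost track
```

A large round-trip error signals that the pixel became occluded or the flow is unreliable, which is the paper's cue to end the object track.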
|
|
16:30-18:00, Paper TuCT5-CC.3 | Add to My Program |
Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking |
|
Feng, Shihao | Zhengzhou University |
Liang, Pengpeng | Zhengzhou University |
Gao, Jin | Institute of Automation Chinese Academy of Sciences |
Cheng, Erkang | Nullmax Inc |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Point cloud-based 3D object tracking is an important task in autonomous driving. Though great advances regarding Siamese-based 3D tracking have been made recently, it remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data. Instead of performing correlation of the two branches at just one point in the network, in this paper, we present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage based on sparse pillars. More specifically, in each stage, self-attention is first applied to each branch separately to capture the non-local context information. Then, cross-attention is used to inject the template information into the search area. This strategy allows the feature learning of the search area to be aware of the template while keeping the individual characteristics of the template intact. To enable the network to easily preserve the information learned at different stages and ease the optimization, for the search area, we densely connect the initial input sparse pillars and the output of each stage to all subsequent stages and the target localization network, which converts pillars to bird’s eye view (BEV) feature maps and predicts the state of the target with a small densely connected convolution network. Deep supervision is added to each stage to further boost the performance as well.
|
|
16:30-18:00, Paper TuCT5-CC.4 | Add to My Program |
Refining Pre-Trained Motion Models |
|
Sun, Xinglong | Stanford & UIUC |
Harley, Adam | Stanford University |
Guibas, Leonidas | Stanford University |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Human Detection and Tracking
Abstract: Given the difficulty of manually annotating motion in video, the current best motion estimation methods are trained with synthetic data, and therefore struggle somewhat due to a train/test gap. Self-supervised methods hold the promise of training directly on real video, but typically perform worse. These include methods trained with warp error (i.e., color constancy) combined with smoothness terms, and methods that encourage cycle-consistency in the estimates (i.e., tracking backwards should yield the opposite trajectory as tracking forwards). In this work, we take on the challenge of improving state-of-the-art supervised models with self-supervised training. We find that when the initialization is supervised weights, most existing self-supervision techniques actually make performance worse instead of better, which suggests that the benefit of seeing the new data is overshadowed by the noise in the training signal. Focusing on obtaining a "clean" training signal from real-world unlabelled video, we propose to separate label-making and training into two distinct stages. In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency. This produces a sparse but accurate pseudo-labelling of the video. In the second stage, we fine-tune the model to reproduce these outputs, while also applying augmentations on the input. We complement this boot-strapping method with simple techniques that densify and re-balance the pseudo-labels, ensuring that we do not merely train on "easy" tracks. We show that our method yields reliable gains over fully-supervised methods in real videos, for both short-term (flow-based) and long-range (multi-frame) pixel tracking. Our code can be found here: https://github.com/AlexSunNik/refining-motion-code.
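The stage-one pseudo-label selection described above (keep only motion estimates that survive a cycle-consistency check) can be sketched as follows; the data layout and threshold are illustrative assumptions, not the paper's code:

```python
def select_pseudo_labels(fwd_tracks, bwd_tracks, thresh=2.0):
    """Stage-1 pseudo-labelling: keep only tracks that survive a
    forward-backward cycle-consistency check.

    fwd_tracks: list of trajectories [(x, y), ...] tracked forward in time.
    bwd_tracks: the same points re-tracked backwards from the final frame.
    A track is kept when re-tracking lands near the original start point,
    yielding a sparse but (hopefully) accurate pseudo-label set.
    """
    kept = []
    for f, b in zip(fwd_tracks, bwd_tracks):
        sx, sy = f[0]          # original start position
        ex, ey = b[-1]         # position after tracking back to frame 0
        if ((sx - ex) ** 2 + (sy - ey) ** 2) ** 0.5 < thresh:
            kept.append(f)
    return kept
```

Stage two would then fine-tune the model to reproduce the kept tracks under input augmentations, with densification and re-balancing applied to avoid training only on "easy" tracks.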
|
|
16:30-18:00, Paper TuCT5-CC.5 | Add to My Program |
SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking |
|
Papais, Sandro | University of Toronto |
Ren, Robert | University of Toronto |
Waslander, Steven Lake | University of Toronto |
Keywords: Visual Tracking, Intelligent Transportation Systems, Probability and Statistical Methods
Abstract: Modern robotic systems are required to operate in dense dynamic environments, requiring highly accurate real-time track identification and estimation. For 3D multi-object tracking, recent approaches process a single measurement frame recursively with greedy association and are prone to errors in ambiguous association decisions. Our method, Sliding Window Tracker (SWTrack), yields more accurate association and state estimation by batch processing many frames of sensor data while being capable of running online in real-time. The most probable track associations are identified by evaluating all possible track hypotheses across the temporal sliding window. A novel graph optimization approach is formulated to solve the multidimensional assignment problem with lifted graph edges introduced to account for missed detections and graph sparsity enforced to retain real-time efficiency. We evaluate our SWTrack implementation on the NuScenes autonomous driving dataset to demonstrate improved tracking performance.
|
|
16:30-18:00, Paper TuCT5-CC.6 | Add to My Program |
UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking |
|
Lee, Chang Won | University of Toronto |
Waslander, Steven Lake | University of Toronto |
Keywords: Visual Tracking, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: Multi-object tracking (MOT) methods have seen a significant boost in performance recently, due to strong interest from the research community and steadily improving object detection methods. The majority of tracking methods, which follow the tracking-by-detection (TBD) paradigm, blindly trust the incoming detections with no sense of their associated localization uncertainty. This lack of uncertainty awareness poses a problem in safety-critical tasks such as autonomous driving where passengers could be put at risk due to erroneous detections that have propagated to downstream tasks, including MOT. While there are existing works in probabilistic object detection that predict the localization uncertainty around the boxes, no work in 2D MOT for autonomous driving has studied whether these estimates are meaningful enough to be leveraged effectively in object tracking. We introduce UncertaintyTrack, a collection of extensions that can be applied to multiple TBD trackers to account for localization uncertainty estimates from probabilistic object detectors. Experiments on the Berkeley Deep Drive MOT dataset show that the combination of our method and informative uncertainty estimates reduces the number of ID switches by around 19% and improves mMOTA by 2-3%.
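One natural way to exploit detector-reported localization uncertainty in a tracking-by-detection association step is Mahalanobis gating, sketched below. This is a generic illustration under assumed box parameterization and chi-square threshold, not the UncertaintyTrack extensions themselves:

```python
import numpy as np

def mahalanobis_gate(track_boxes, det_boxes, det_covs, gate=9.488):
    """Gate track-detection pairs using the detector's localization
    covariance instead of plain IoU / L2 distance.

    track_boxes: (T, 4) predicted track states (cx, cy, w, h).
    det_boxes:   (D, 4) incoming detections.
    det_covs:    (D, 4, 4) per-detection covariance from a
                 probabilistic object detector.
    Returns a (T, D) boolean gating matrix; 9.488 is the chi-square
    95% threshold for 4 degrees of freedom.
    """
    T, D = len(track_boxes), len(det_boxes)
    ok = np.zeros((T, D), dtype=bool)
    for j in range(D):
        inv = np.linalg.inv(det_covs[j])
        for i in range(T):
            d = track_boxes[i] - det_boxes[j]
            # Squared Mahalanobis distance under the detection covariance.
            ok[i, j] = float(d @ inv @ d) < gate
    return ok
```

A confidently-localized detection (small covariance) is gated tightly, while an uncertain one is allowed to match tracks farther away.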
|
|
16:30-18:00, Paper TuCT5-CC.7 | Add to My Program |
Humanoid Loco-Manipulations Using Combined Fast Dense 3D Tracking and SLAM with Wide-Angle Depth-Images (I) |
|
Chappellet, Kevin | CNRS |
Murooka, Masaki | AIST |
Caron, Guillaume | CNRS |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Visual Tracking, Perception for Grasping and Manipulation, Humanoid Robot Systems
Abstract: To efficiently achieve complex humanoid loco-manipulation tasks in industrial contexts, we propose a combined vision-based tracker-localization interplay integrated as part of a task-space whole-body optimization control. To achieve good perception complementarity between manipulation and localization, a new fast dense 3D model-based tracking using wide-angle depth image is developed and used in conjunction with a simultaneous localization and mapping software. Our approach allows humanoid robots, targeted for industrial manufacturing, to manipulate and assemble large-scale objects while walking. It is assessed with experiments consisting of rolling and assembling in an unwinder a heavy and wide bobbin, using bimanual grasping and bipedal locomotion at the same time. This experimental use-case is found in some large-scale manufacturing where bobbins are rolled with various materials (cables, papers, rubbers, etc.). The same experiments are made using two different humanoid robots of the same family.
|
|
16:30-18:00, Paper TuCT5-CC.8 | Add to My Program |
LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking |
|
Wei, Qingmao | Guangdong University of Technology |
Zeng, Bi | Guangdong University of Technology |
Liu, Jianqi | Guangdong University of Technology |
He, Li | Southern University of Science and Technology |
Zeng, Guotian | Guangdong University of Technology |
Keywords: Visual Tracking, Recognition, Computer Vision for Automation
Abstract: The recent advancements in transformer-based visual trackers have led to significant progress, attributed to their strong modeling capabilities. However, as performance improves, running latency correspondingly increases, presenting a challenge for real-time robotics applications, especially on edge devices with computational constraints. In response to this, we introduce LiteTrack, an efficient transformer-based tracking model optimized for high-speed operations across various devices. It achieves a more favorable trade-off between accuracy and efficiency than the other lightweight trackers. The main innovations of LiteTrack encompass: 1) asynchronous feature extraction and interaction between the template and search region for better feature fusion and cutting redundant computation, and 2) pruning encoder layers from a heavy tracker to refine the balance between performance and speed. As an example, our fastest variant, LiteTrack-B4, achieves 65.2% AO on the GOT-10k benchmark, surpassing all preceding efficient trackers, while running over 100 fps with ONNX on the Jetson Orin NX edge device. Moreover, our LiteTrack-B9 reaches competitive 72.2% AO on GOT-10k and 82.4% AUC on TrackingNet, and operates at 171 fps on an NVIDIA 2080Ti GPU. The code and demo materials will be available at https://github.com/TsingWei/LiteTrack.
|
|
TuCT6-CC Oral Session, CC-414 |
Add to My Program |
RGB-D Sensing and Perception II |
|
|
Co-Chair: Carlone, Luca | Massachusetts Institute of Technology |
|
16:30-18:00, Paper TuCT6-CC.1 | Add to My Program |
WeatherDepth: Curriculum Contrastive Learning for Self-Supervised Depth Estimation under Adverse Weather Conditions |
|
Wang, JiYuan | Beijing Jiaotong University |
Lin, Chunyu | Beijing Jiaotong University |
Nie, Lang | Beijing Jiaotong University |
Huang, Shujuan | Beijing Jiaotong University |
Pan, Xing | Haomo Zhixing |
Ai, Rui | Haomo AI Technology Co., Ltd |
Zhao, Yao | Beijing Jiaotong University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Visual Learning
Abstract: Depth estimation models have shown promising performance on clear scenes but fail to generalize to adverse weather conditions due to illumination variations, weather particles, etc. In this paper, we propose WeatherDepth, a self-supervised robust depth estimation model with curriculum contrastive learning, to tackle performance degradation in complex weather conditions. Concretely, we first present a progressive curriculum learning scheme with three simple-to-complex curricula to gradually adapt the model from clear to relative adverse, and then to adverse weather scenes. It encourages the model to gradually grasp beneficial depth cues against the weather effect, yielding smoother and better domain adaptation. Meanwhile, to prevent the model from forgetting previous curricula, we integrate contrastive learning into different curricula. By drawing reference knowledge from the previous course, our strategy establishes a depth consistency constraint between different courses toward robust depth estimation in diverse weather. Besides, to reduce manual intervention and better adapt to different models, we design an adaptive curriculum scheduler to automatically search for the best timing for course switching. In the experiment, the proposed solution is proven to be easily incorporated into various architectures and demonstrates state-of-the-art (SoTA) performance on both synthetic and real weather datasets. Source code and data are available at https://github.com/wangjiyuan9/WeatherDepth.
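An adaptive curriculum scheduler of the kind described above, which searches for the switching time automatically rather than using a fixed schedule, might look like the following plateau-based sketch (class name, patience, and tolerance are illustrative assumptions):

```python
class CurriculumScheduler:
    """Advance to the next curriculum stage (e.g. clear -> mildly
    adverse -> adverse weather) when the monitored validation metric
    stops improving for `patience` consecutive evaluations."""

    def __init__(self, n_stages=3, patience=2, min_delta=1e-3):
        self.stage = 0
        self.n_stages = n_stages
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stall = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.stall = val_loss, 0   # still improving
        else:
            self.stall += 1                        # plateau detected
        if self.stall >= self.patience and self.stage < self.n_stages - 1:
            self.stage += 1                        # switch course
            self.best, self.stall = float("inf"), 0
        return self.stage
```

The training loop would query `step()` after each validation pass and swap in the next (harder) weather curriculum when the returned stage changes.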
|
|
16:30-18:00, Paper TuCT6-CC.2 | Add to My Program |
Collaborative Decision-Making Using Spatiotemporal Graphs in Connected Autonomy |
|
Gao, Peng | University of Massachusetts Amherst |
Shen, Yu | University of Maryland |
Lin, Ming C. | University of Maryland at College Park |
Keywords: RGB-D Perception, Multi-Robot Systems
Abstract: Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents' observations is very challenging. In this paper, we propose a novel collaborative decision-making method that efficiently and effectively integrates collaborators' representations to control the ego vehicle in accident-prone scenarios. Our approach formulates collaborative decision-making as a classification problem. We first represent sequences of raw observations as spatiotemporal graphs, which significantly reduce the package size to share among connected vehicles. Then we design a novel spatiotemporal graph neural network based on heterogeneous graph learning, which analyzes spatial and temporal connections of objects in a unified way for collaborative decision-making. We evaluate our approach using a high-fidelity simulator that considers realistic traffic, communication bandwidth, and vehicle sensing among connected autonomous vehicles. The experimental results show that our representation achieves over 100x reduction in the shared data size that meets the requirements of communication bandwidth for connected autonomous driving. In addition, our approach achieves over 30% improvements in driving safety.
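A minimal version of the spatiotemporal-graph representation that makes this bandwidth reduction possible is sketched below: nodes are per-frame detections, spatial edges connect nearby objects within a frame, and temporal edges connect the same object across consecutive frames. The data layout and radius are illustrative assumptions, not the paper's format:

```python
def build_spatiotemporal_graph(frames, spatial_radius=10.0):
    """frames: list of dicts {obj_id: (x, y)}, one dict per time step.
    Returns (nodes, spatial_edges, temporal_edges); sharing this compact
    graph instead of raw sensor observations is what shrinks the message."""
    nodes, spatial, temporal = [], [], []
    index = {}                       # (t, obj_id) -> node index
    for t, dets in enumerate(frames):
        for obj_id, (x, y) in dets.items():
            index[(t, obj_id)] = len(nodes)
            nodes.append((t, obj_id, x, y))
    for t, dets in enumerate(frames):
        ids = list(dets)
        # Spatial edges: objects close to each other in the same frame.
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                (x1, y1), (x2, y2) = dets[ids[i]], dets[ids[j]]
                if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= spatial_radius:
                    spatial.append((index[(t, ids[i])], index[(t, ids[j])]))
        # Temporal edges: same object id across consecutive frames.
        if t > 0:
            for obj_id in dets:
                if (t - 1, obj_id) in index:
                    temporal.append((index[(t - 1, obj_id)], index[(t, obj_id)]))
    return nodes, spatial, temporal
```

A heterogeneous graph neural network would then process the two edge types with separate message-passing functions.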
|
|
16:30-18:00, Paper TuCT6-CC.3 | Add to My Program |
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds |
|
Jin, David | MIT |
Karmalkar, Sushrut | University of Wisconsin-Madison |
Zhang, Harry Haolun | MIT |
Carlone, Luca | Massachusetts Institute of Technology |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization, Sensor Fusion
Abstract: We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.
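The EM scheme for multi-model registration described above can be sketched concretely. The E-step soft-assigns each putative correspondence to the motion that best explains it, and the M-step refits each motion; here the M-step is instantiated with a weighted Kabsch (Procrustes) solver, which is a standard choice but an assumption on my part, as are the noise scale and initialization:

```python
import numpy as np

def fit_rigid(P, Q, w):
    """Weighted Kabsch: rigid (R, t) minimizing sum_i w_i ||R p_i + t - q_i||^2."""
    w = w / (w.sum() + 1e-12)
    pc, qc = (w[:, None] * P).sum(0), (w[:, None] * Q).sum(0)
    H = (w[:, None] * (P - pc)).T @ (Q - qc)        # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc

def em_multi_registration(P, Q, k=2, iters=20, sigma=0.1):
    """EM over k rigid motions given putative correspondences (p_i, q_i).
    E-step: responsibilities from per-model residuals; M-step: weighted Kabsch."""
    n = len(P)
    rng = np.random.default_rng(0)
    resp = rng.random((n, k))
    resp /= resp.sum(1, keepdims=True)
    models = []
    for _ in range(iters):
        models = [fit_rigid(P, Q, resp[:, j]) for j in range(k)]
        err = np.stack([np.linalg.norm(Q - (P @ R.T + t), axis=1)
                        for R, t in models], axis=1)
        resp = np.exp(-err ** 2 / (2 * sigma ** 2)) + 1e-12
        resp /= resp.sum(1, keepdims=True)
    return models, resp.argmax(1)
```

Outlier correspondences could be handled by adding a uniform "background" component to the responsibilities; the paper's theoretical convergence conditions are not reproduced here.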
|
|
16:30-18:00, Paper TuCT6-CC.4 | Add to My Program |
A Robust Deformable Linear Object Perception Pipeline in 3D: From Segmentation to Reconstruction |
|
Zhaole, Sun | Tsinghua University, the University of Edinburgh, Intel Lab Chin |
Zhou, Hang | Shanghai University |
Li, Nanbo | University of Edinburgh |
Chen, Longfei | University of Edinburgh |
Zhu, Jihong | University of York |
Fisher, Robert | University of Edinburgh |
Keywords: RGB-D Perception, Perception for Grasping and Manipulation
Abstract: 3D perception of deformable linear objects (DLOs) is crucial for DLO manipulation. However, perceiving DLOs in 3D from a single RGBD image is challenging. Previous DLO perception methods fail to extract a decent 3D DLO model due to different textures, occlusions, sparse and false depth information. To address these problems and provide a more robust DLO state estimation for downstream tasks like tracking and manipulation, this paper proposes a 3D DLO perception pipeline to first segment a DLO in 2D images and post-process masks to eliminate false positive segmentation, reconstruct the DLO in 3D space to predict the occluded part of the DLO, and physically smooth the reconstructed DLO. By testing on a synthetic DLO dataset and further validating on a real-world dataset with seven different DLOs, we demonstrate that the proposed method is an effective and robust 3D perception pipeline solution with better performance on 2D DLO segmentation and 3D DLO reconstruction compared to State-of-the-Art algorithms.
|
|
16:30-18:00, Paper TuCT6-CC.5 | Add to My Program |
Unlocking the Performance of Proximity Sensors by Utilizing Transient Histograms |
|
Sifferman, Carter | University of Wisconsin-Madison |
Wang, Yeping | University of Wisconsin-Madison |
Gupta, Mohit | University of Wisconsin-Madison |
Gleicher, Michael | University of Wisconsin - Madison |
Keywords: RGB-D Perception, Range Sensing
Abstract: We provide methods which recover planar scene geometry by utilizing the transient histograms captured by a class of close-range time-of-flight (ToF) distance sensors. A transient histogram is a one dimensional temporal waveform which encodes the arrival time of photons incident on the ToF sensor. Typically, a sensor processes the transient histogram using a proprietary algorithm to produce distance estimates, which are commonly used in several robotics applications. Our methods utilize the transient histogram directly to enable recovery of planar geometry more accurately than is possible using only proprietary distance estimates, and consistent recovery of the albedo of the planar surface, which is not possible with proprietary distance estimates alone. This is accomplished via a differentiable rendering pipeline, which simulates the transient imaging process, allowing direct optimization of scene geometry to match observations. To validate our methods, we capture 3,800 measurements of eight planar surfaces from a wide range of viewpoints, and show that our method outperforms the proprietary-distance-estimate baseline by an order of magnitude in most scenarios. We demonstrate a simple robotics application which uses our method to sense the distance to and slope of a planar surface from a sensor mounted on the end effector of a robot arm.
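The analysis-by-synthesis idea in this abstract (simulate the transient histogram from hypothesized scene parameters, then optimize those parameters to match the observation) can be illustrated with a toy one-parameter forward model. The Gaussian pulse model, grid search, and all constants below are simplifying assumptions, not the paper's rendering pipeline:

```python
import numpy as np

def render_histogram(dist, albedo, bins=64, bin_width=0.05, sigma=1.5):
    """Toy forward model: a planar target at distance `dist` produces a
    Gaussian-shaped pulse whose bin position encodes distance and whose
    amplitude encodes albedo."""
    b = np.arange(bins)
    return albedo * np.exp(-0.5 * ((b - dist / bin_width) / sigma) ** 2)

def fit_scene(observed, d_grid=None):
    """Recover (distance, albedo) by matching rendered histograms to the
    observation: grid search over distance with the albedo solved in
    closed form by least squares at each candidate."""
    if d_grid is None:
        d_grid = np.linspace(0.2, 3.0, 281)
    best = (None, None, np.inf)
    for d in d_grid:
        g = render_histogram(d, 1.0)
        a = float((observed * g).sum() / (g * g).sum())   # least-squares albedo
        r = float(((a * g - observed) ** 2).sum())
        if r < best[2]:
            best = (d, a, r)
    return best[:2]
```

The actual method optimizes richer plane geometry through a differentiable renderer; this sketch only shows why matching the full waveform recovers albedo, which a single distance estimate cannot.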
|
|
16:30-18:00, Paper TuCT6-CC.6 | Add to My Program |
VG4D: Vision-Language Model Goes 4D Video Recognition |
|
Deng, Zhichao | Sun Yat-Sen University |
Li, Xiangtai | Peking University |
Li, Xia | ETH Zurich |
Tong, Yunhai | Peking University |
Zhao, Shen | Sun Yat-Sen University |
Liu, Mengyuan | Peking University |
Keywords: RGB-D Perception, Recognition, Deep Learning for Visual Perception
Abstract: Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained visual concepts that can be transferred to various downstream tasks. However, effectively integrating VLM into the domain of 4D point clouds remains an unresolved problem. In this work, we propose the Vision-Language Models Goes 4D (VG4D) framework to transfer VLM knowledge from visual-text pre-trained models to a 4D point cloud network. Our approach involves aligning the 4D encoder's representation with a VLM learning a shared visual and text space from training on large-scale image-text pairs. By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance. To enhance the 4D encoder, we modernize the classic dynamic point cloud backbone and propose an improved version of PSTNet, im-PSTNet, which can efficiently model point cloud videos. Experiments demonstrate that our method achieves state-of-the-art performance for action recognition on both the NTU RGB+D 60 and NTU RGB+D 120 datasets.
|
|
16:30-18:00, Paper TuCT6-CC.7 | Add to My Program |
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning |
|
Gu, Qiao | University of Toronto |
Kuwajerwala, Alihusein | University of Toronto |
Morin, Sacha | Université De Montréal, Mila |
Jatavallabhula, Krishna Murthy | MIT |
Sen, Bipasha | International Institute of Information Technology |
Agarwal, Aditya | IIIT Hyderabad |
Rivera, Corban | Johns Hopkins University Applied Physics Lab |
Paul, William | Johns Hopkins University Applied Physics Lab |
Ellis, Kirsty | Mila, Université De Montréal |
Chellappa, Rama | Johns Hopkins University |
Gan, Chuang | IBM |
de Melo, Celso | CCDC US Army Research Laboratory |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Torralba, Antonio | MIT |
Shkurti, Florian | University of Toronto |
Paull, Liam | Université De Montréal |
Keywords: RGB-D Perception, Semantic Scene Understanding, Mapping
Abstract: For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. To explore the full scope of our experiments and results, we encourage readers to visit our project webpage.
|
|
16:30-18:00, Paper TuCT6-CC.8 | Add to My Program |
TrackDLO: Tracking Deformable Linear Objects under Occlusion with Motion Coherence |
|
Xiang, Jingyi | University of Illinois at Urbana-Champaign |
Dinkel, Holly | University of Illinois at Urbana-Champaign |
Zhao, Harry | University of Illinois at Urbana-Champaign |
Gao, Naixiang | University of Illinois at Urbana-Champaign |
Coltin, Brian | Carnegie Mellon University |
Smith, Trey | NASA Ames Research Center |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Keywords: RGB-D Perception, Visual Tracking, Perception for Grasping and Manipulation
Abstract: The TrackDLO algorithm estimates the shape of a Deformable Linear Object (DLO) under occlusion from a sequence of RGB-D images. TrackDLO is vision-only and runs in real-time. It requires no external state information from physics modeling, simulation, visual markers, or contact as input. The algorithm improves on previous approaches by addressing three common scenarios which cause tracking failure: tip occlusion, mid-section occlusion, and self-occlusion. This is achieved through the application of Motion Coherence Theory to impute the spatial velocity of occluded nodes, the use of the topological geodesic distance to track self-occluding DLOs, and the introduction of a non-Gaussian kernel that only penalizes lower-order spatial displacement derivatives to reflect DLO physics. Improved real-time DLO tracking under mid-section occlusion, tip occlusion, and self-occlusion is demonstrated experimentally. The source code and demonstration data are publicly released.
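The Motion Coherence idea used above to impute velocities of occluded nodes can be sketched as a Gaussian-weighted average of visible-node velocities, so that nearby nodes move coherently. This is an illustrative simplification (Euclidean distance stands in for the geodesic distance along the DLO, and the kernel is the plain Gaussian rather than the paper's non-Gaussian kernel):

```python
import math

def impute_velocities(nodes, velocities, visible, beta=2.0):
    """nodes: list of (x, y) node positions along the DLO.
    velocities: list of (vx, vy), meaningful only where visible[i] is True.
    Occluded nodes receive a kernel-weighted average of visible velocities."""
    out = []
    for i, v in enumerate(velocities):
        if visible[i]:
            out.append(v)
            continue
        wsum, vx, vy = 0.0, 0.0, 0.0
        for j, vj in enumerate(velocities):
            if not visible[j]:
                continue
            dx = nodes[i][0] - nodes[j][0]
            dy = nodes[i][1] - nodes[j][1]
            # Gaussian kernel: closer visible nodes dominate the estimate.
            w = math.exp(-(dx * dx + dy * dy) / (2.0 * beta ** 2))
            wsum += w
            vx += w * vj[0]
            vy += w * vj[1]
        out.append((vx / wsum, vy / wsum) if wsum > 0 else (0.0, 0.0))
    return out
```

In TrackDLO proper, substituting topological geodesic distance for Euclidean distance is what lets the tracker handle self-occluding configurations.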
|
|
16:30-18:00, Paper TuCT6-CC.9 | Add to My Program |
Long-Tailed 3D Semantic Segmentation with Adaptive Weight Constraint and Sampling |
|
Lahoud, Jean | MBZUAI |
Khan, Fahad | Linkoping University |
Cholakkal, Hisham | MBZUAI |
Anwer, Rao | MBZUAI |
Khan, Salman | CSIRO |
Keywords: RGB-D Perception, Recognition, Deep Learning for Visual Perception
Abstract: Existing 3D understanding datasets typically provide annotations for a limited number of object classes, with sufficient examples per class. However, real-world object classes are not equally represented in practical settings, leading to poor performance on rarely-occurring categories if the class imbalance is neglected. In this work, we address the challenge of 3D semantic segmentation with a long-tail distribution of classes. Common methods to reduce class imbalance during training include data re-sampling, loss re-weighting, and transfer learning. In contrast, our work proposes to effectively utilize network classifier weights in 3D models to balance the training on long-tail class distributions. While previous work in the 2D domain has studied imposing constraints on the classifier weights to regularize the training, it is sensitive to hyper-parameter choices and has not yet been explored for the 3D domain. To address these challenges, our work proposes adaptive regularization for frequent classes and sampling-based regularization for rare classes that alleviate the need to manually select thresholds and can dynamically focus training on the hard classes. Our experiments on the large-scale ScanNet200 benchmark show that our method achieves improved performance, surpassing methods that rely on re-sampling, re-weighting, and pre-training.
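One simple form of classifier-weight regularization in the spirit of the above is to pull the weight norms of frequent classes toward the mean norm, leaving rare classes to be handled by the sampling-based term. This sketch is an illustrative assumption on my part, not the paper's adaptive scheme:

```python
import numpy as np

def regularize_classifier(W, class_counts, freq_ratio=0.5):
    """W: (C, F) classifier weight matrix, one row per class.
    class_counts: per-class training frequencies.
    Rescales the weight rows of the most frequent classes so their norms
    match the mean norm, reducing the frequent-class bias of the logits."""
    norms = np.linalg.norm(W, axis=1)
    target = norms.mean()
    order = np.argsort(class_counts)[::-1]        # most frequent first
    n_freq = int(len(order) * freq_ratio)
    W = W.copy()
    for c in order[:n_freq]:
        W[c] *= target / (norms[c] + 1e-12)
    return W
```

Applied periodically during training, such a step prevents frequent classes from accumulating disproportionately large classifier norms, which is one known driver of long-tail bias.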
|
|
TuCT7-CC Oral Session, CC-416 |
Add to My Program |
Learning in Localization and Navigation |
|
|
Chair: Cattaneo, Daniele | University of Freiburg |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
16:30-18:00, Paper TuCT7-CC.1 | Add to My Program |
BioSLAM: A Bio-Inspired Lifelong Memory System for General Place Recognition |
|
Yin, Peng | City University of Hong Kong |
Abuduweili, Abulikemu | Carnegie Mellon University |
Zhao, Shiqi | University of California San Diego |
Xu, Lingyun | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Lifelong Learning, Localization, Deep Learning in Robotics and Automation, SLAM
Abstract: We present BioSLAM, a lifelong SLAM framework for learning various new appearances incrementally and maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget the previously visited areas when trained with new arrivals. For humans, researchers discover that there exists a memory replay mechanism in the brain to keep the neuron active for previous events. Inspired by this discovery, BioSLAM designs a gated generative replay to control the robot’s learning behavior based on the feedback rewards. Specifically, BioSLAM provides a novel dual-memory mechanism for maintenance: 1) a dynamic memory to efficiently learn new observations and 2) a static memory to balance new-old knowledge. When the agent is encountered with different appearances under new domains, the complete processing pipeline can help to incrementally update the place recognition ability, robust to the increasing complexity of long-term place recognition. We demonstrate BioSLAM in three incremental SLAM scenarios: 1) a 120km city-scale trajectory with LiDAR-based inputs, 2) a multi-visited 4.5
|
|
16:30-18:00, Paper TuCT7-CC.2 | Add to My Program |
Efficient Hierarchical Reinforcement Learning for Mapless Navigation with Predictive Neighbouring Space Scoring (I) |
|
Gao, Yan | Cardiff University |
Wu, Jing | Cardiff University |
Yang, Xintong | Cardiff University |
Ji, Ze | Cardiff University |
Keywords: Reinforcement Learning, Collision Avoidance, Motion and Path Planning
Abstract: Solving reinforcement learning-based mapless navigation tasks is challenging due to their sparse reward and long decision horizon nature. Hierarchical reinforcement learning has the ability to leverage knowledge at different abstract levels and is thus preferred in complex mapless navigation tasks. However, it is computationally expensive and inefficient to learn navigation end-to-end from raw high-dimensional sensor data, such as Lidar or RGB cameras. The use of subgoals based on a compact intermediate representation is therefore preferred for dimension reduction. This work proposes an efficient HRL-based framework to achieve this with a novel scoring method, named Predictive Neighbouring Space Scoring (PNSS). The PNSS model estimates the explorable space for a given position of interest based on the current robot observation. The PNSS values for a few candidate positions around the robot provide a compact and informative state representation for subgoal selection. We study the effects of different candidate position layouts and demonstrate that our layout design facilitates higher performance in longer-range tasks. Moreover, a penalty term is introduced in the reward function for the high-level policy, so that the subgoal selection process takes the performance of the low-level policy into consideration. Comprehensive evaluations demonstrate that using the proposed PNSS module consistently improves performance over the use of Lidar only or Lidar and encoded RGB features.
|
|
16:30-18:00, Paper TuCT7-CC.3 | Add to My Program |
Learning Diverse Skills for Local Navigation under Multi-Constraint Optimality |
|
Cheng, Jin | ETH Zurich |
Vlastelica, Marin | Max Planck Institute for Intelligent Systems |
Kolev, Pavel | Max Planck Institute for Intelligent Systems |
Li, Chenhao | ETH Zurich |
Martius, Georg | Max Planck Institute for Intelligent Systems |
Keywords: Reinforcement Learning, Legged Robots, Motion and Path Planning
Abstract: Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing multiple constraints on the reward terms. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
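The attract-repel reward term motivated by the Van der Waals force can be illustrated with a Lennard-Jones-style shaping function over the distance between two policies' behavior embeddings. The exact functional form and constants below are illustrative assumptions, not the paper's term:

```python
def attract_repel_reward(d, r0=1.0, eps=1e-6):
    """Lennard-Jones-style shaping on the distance d between a pair of
    policy embeddings: strongly repels policies that are too similar
    (d < r0), mildly attracts policies that drift too far apart, and
    peaks (value 1) at the preferred separation d == r0."""
    x = r0 / (d + eps)
    return -(x ** 12 - 2 * x ** 6)
```

Summed over policy pairs and added to the constrained task objective, such a term controls the diversity level: larger r0 pushes the skill set to spread out further in behavior space.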
|
|
16:30-18:00, Paper TuCT7-CC.4 | Add to My Program |
Snake Robot with Tactile Perception Navigates on Large-Scale Challenging Terrain |
|
Jiang, Shuo | Northeastern University |
Salagame, Adarsh | Northeastern University |
Ramezani, Alireza | Northeastern University |
Wong, Lawson L.S. | Northeastern University |
Keywords: Reinforcement Learning, Force and Tactile Sensing, Motion and Path Planning
Abstract: Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we propose a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architecture, wherein the high level orchestrates global navigation strategies while the low level uses curriculum learning for local navigation maneuvers. Because the significant computational demands of collision detection in whole-body tactile sensing severely compromise the simulator's efficiency, we adopted a distributed training pattern to mitigate this efficiency reduction. We evaluated the navigation performance of the snake robot in complex large-scale cave exploration with challenging terrains, demonstrating improvements in motion efficiency and evidencing the efficacy of tactile perception in terrain-adaptive locomotion of snake robots.
|
|
16:30-18:00, Paper TuCT7-CC.5 | Add to My Program |
RaLF: Flow-Based Global and Metric Radar Localization in LiDAR Maps |
|
Nayak, Abhijeet | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Localization, Computer Vision for Transportation
Abstract: Localization is paramount for autonomous robots. While camera and LiDAR-based approaches have been extensively investigated, they are affected by adverse illumination and weather conditions. Therefore, radar sensors have recently gained attention due to their intrinsic robustness to such conditions. In this paper, we propose RaLF, a novel deep neural network-based approach for localizing radar scans in a LiDAR map of the environment, by jointly learning to address both place recognition and metric localization. RaLF is composed of radar and LiDAR feature encoders, a place recognition head that generates global descriptors, and a metric localization head that predicts the 3-DoF transformation between the radar scan and the map. We tackle the place recognition task by learning a shared embedding space between the two modalities via cross-modal metric learning. Additionally, we perform metric localization by predicting pixel-level flow vectors that align the query radar scan with the LiDAR map. We extensively evaluate our approach on multiple real-world driving datasets and show that RaLF achieves state-of-the-art performance for both place recognition and metric localization. Moreover, we demonstrate that our approach can effectively generalize to different cities and sensor setups than the ones used during training. We make the code and trained models publicly available at http://ralf.cs.uni-freiburg.de.
|
|
16:30-18:00, Paper TuCT7-CC.6 | Add to My Program |
VPE-SLAM: Neural Implicit Voxel-Permutohedral Encoding for SLAM |
|
Zhang, Zhiyao | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Shen, You | Northeastern University |
Rong, Lei | Northeastern University |
Wang, Sizhan | Northeastern University |
Ouyang, Xin | Northeastern University |
Li, Yulong | Northeastern University |
Keywords: Deep Learning Methods, SLAM, Mapping
Abstract: NeRF can reconstruct incredibly realistic environmental maps in dense simultaneous localization and mapping, providing robots with more comprehensive scene map information. However, NeRF often struggles with geometric distortions in indoor reconstructions. To correct geometric distortions, we develop VPE-SLAM, based on the proposed voxel-permutohedral encoding, which can incrementally reconstruct maps of unknown scenes. Specifically, voxel-permutohedral encoding combines a sparse voxel feature grid created by an octree and multi-resolution permutohedral tetrahedral feature grids to represent the scene effectively. Especially when dealing with object edges, our method can effectively encode the geometry and texture of edges by the hybrid structural grid. We propose a novel local bundle adjustment module that utilizes a sliding window mechanism to manage adjacent keyframes requiring optimization. Furthermore, the proposed method establishes local map consistency by repeatedly optimizing keyframes that were initially under-optimized through a compensation strategy. The consistency of the local map can enhance the adaptability of our method to challenging scenes. Extensive experiments demonstrate that our method can achieve accurate camera tracking and produce high-quality reconstruction results on the Replica and ScanNet datasets. The source code will be available at https://github.com/NeuCV-IRMI/VPE-SLAM.
|
|
16:30-18:00, Paper TuCT7-CC.7 | Add to My Program |
Zero-Shot Wireless Indoor Navigation through Physics-Informed Reinforcement Learning |
|
Yin, Mingsheng | New York University |
Li, Tao | New York University |
Lei, Haozhe | New York University |
Hu, Yaqi | NYU |
Rangan, Sundeep | New York University |
Zhu, Quanyan | New York University |
Keywords: Reinforcement Learning, Motion and Path Planning
Abstract: The growing focus on indoor robot navigation utilizing wireless signals has stemmed from the capability of these signals to capture high-resolution angular and temporal measurements. Prior heuristic-based methods, based on radio frequency (RF) propagation, are intuitive and generalizable across simple scenarios, yet fail to navigate in complex environments. On the other hand, end-to-end (e2e) deep reinforcement learning (RL) can explore a rich class of policies, delivering surprising performance when facing complex wireless environments. However, the price to pay is the astronomical amount of training samples, and the resulting policy, without fine-tuning (zero-shot), is unable to navigate efficiently in new scenarios unseen in the training phase. To equip the navigation agent with sample-efficient learning and zero-shot generalization, this work proposes a novel physics-informed RL (PIRL) where a distance-to-target-based cost (standard in e2e) is augmented with physics-informed reward shaping. The key intuition is that wireless environments vary, but physics laws persist. After learning to utilize the physics information, the agent can transfer this knowledge across different tasks and navigate in an unknown environment without fine-tuning. The proposed PIRL is evaluated using a wireless digital twin (WDT) built upon simulations of a large class of indoor environments from the AI Habitat dataset augmented with electromagnetic radiation simulation for wireless signals. It is shown that the PIRL significantly outperforms both e2e RL and heuristic-based solutions in terms of generalization and performance. Source code is available at https://github.com/Panshark/PIRL-WIN.
|
|
16:30-18:00, Paper TuCT7-CC.8 | Add to My Program |
An Environmental-Complexity-Based Navigation Method Based on Hierarchical Deep Reinforcement Learning |
|
Chen, Pengbin | Harbin Institute of Technology, Shenzhen |
Liu, Qi | Harbin Institute of Technology |
Li, Yanjie | Harbin Institute of Technology (Shenzhen) |
Ma, Shuaikang | Harbin Institute of Technology, Shenzhen |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, AI-Enabled Robotics
Abstract: Navigation methods based on deep reinforcement learning (RL) have recently exhibited superior performance, particularly for navigation in dynamic environments. However, most existing methods solely rely on deep neural network feature encoders to extract features from raw LiDAR data, lacking an explicit representation of environmental structure. This limitation hinders effective environmental representation and interpretability, constraining navigation performance improvement. To solve this problem, we propose two quantitative metrics based on laser scans, which explicitly represent environmental complexity and show great interpretability. Furthermore, we propose an environmental-complexity-based navigation method based on hierarchical deep RL with the proposed metrics. Experimental results show that the proposed method achieves better navigation performance than baselines, especially in challenging scenarios with corners and dynamic obstacles.
|
|
16:30-18:00, Paper TuCT7-CC.9 | Add to My Program |
Pre-Trained Masked Image Model for Mobile Robot Navigation |
|
Sharma, Vishnu D. | University of Maryland |
Singh, Anukriti | University of Maryland |
Tokekar, Pratap | University of Maryland |
Keywords: Representation Learning, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: 2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially when training data is scarce. We share more qualitative results at https://raaslab.org/projects/MIM4Robots.
|
|
TuCT8-CC Oral Session, CC-418 |
Add to My Program |
Reinforcement Learning II |
|
|
Chair: Gao, Sicun | UCSD |
|
16:30-18:00, Paper TuCT8-CC.1 | Add to My Program |
Active Automotive Augmented Reality Displays Using Reinforcement Learning |
|
Ryu, Ju-Hyeok | Seoul National University |
Kim, Chan | Seoul National University |
Kim, Seong-Woo | Seoul National University |
Keywords: Reinforcement Learning, AI-Based Methods, Autonomous Agents
Abstract: Automobiles, traditionally valued as a means of transportation, face growing demand for home-like spatial functions. Advances in technology are making this possible, and as a result, the total display area in vehicles is increasing. This is leading to increased demand for augmented reality displays in vehicles. In order to enhance driving convenience and safety, automotive augmented reality displays, e.g., head-up displays, have garnered attention and are gradually being deployed. However, when vehicles encounter uneven roads, vertical vibrations lead to mismatches between external physical objects and augmented reality overlay images, adversely affecting the AR display's visibility. Resolving the problem is quite challenging because the optical system operates on a magnification design and is highly sensitive due to its multifunctional nature involving reflection and refraction through an intermediate medium. This paper aims to address the newly emerging problem of vertical mismatches in automotive AR displays. To tackle this issue, we begin by defining the problem and then examine the effectiveness of traditional control methods, on-policy and off-policy reinforcement learning as potential solutions. Finally, we validate our approach through experiments, demonstrating a significant reduction in vertical mismatches and an improvement in the overall visibility of automotive AR displays. Our findings provide valuable insights for enhancing driving convenience and safety in real-world conditions.
|
|
16:30-18:00, Paper TuCT8-CC.2 | Add to My Program |
Extremum-Seeking Action Selection for Accelerating Policy Optimization |
|
Chang, Ya-Chien | University of California San Diego |
Gao, Sicun | UCSD |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and estimating policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In such cases, most samples taken from the ambient action space generate low-value trajectories that hardly contribute to policy improvement, resulting in slow or failed learning. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC). On each action sampled from stochastic policies, we apply sinusoidal perturbations and query for estimated Q-values as the response signal. Based on ESC, we then dynamically improve the sampled actions to be closer to nearby optima before applying them to the environment. Our method can easily be added to standard policy optimization to improve learning efficiency, as we demonstrate in various control learning environments.
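The core ESC loop described in the abstract — perturb the sampled action sinusoidally, read the estimated Q-value as the response signal, and integrate the demodulated gradient estimate — can be sketched in a few lines. This is a minimal illustration with a hypothetical 1-D quadratic Q-function standing in for the learned critic; the function names, dither schedule, and step sizes are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def q_value(action):
    # Hypothetical stand-in for the critic's Q-estimate: a smooth
    # function with a single maximum at action = 0.7.
    return -(action - 0.7) ** 2

def esc_refine(action, q, steps=200, amp=0.05, gain=0.05):
    """Nudge a sampled action toward a nearby optimum of q via
    extremum-seeking: apply a sinusoidal dither, demodulate the
    Q-value response into a gradient estimate, and ascend."""
    a = action
    for t in range(steps):
        s = np.sin(t)                                  # sinusoidal perturbation
        grad_est = s * (q(a + amp * s) - q(a)) / amp   # demodulated response
        a += gain * grad_est                           # step toward the optimum
    return a

refined = esc_refine(0.0, q_value)  # a sampled action of 0.0 drifts toward 0.7
```

In the actual method this refinement happens before the action is applied to the environment, so it composes with any stochastic policy without changing the policy-gradient machinery.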
|
|
16:30-18:00, Paper TuCT8-CC.3 | Add to My Program |
Privacy Risks in Reinforcement Learning for Household Robots |
|
Li, Miao | Carnegie Mellon University |
Ding, Wenhao | Carnegie Mellon University |
Zhao, Ding | Carnegie Mellon University |
Keywords: Reinforcement Learning, Agent-Based Systems, Autonomous Agents
Abstract: The prominence of embodied Artificial Intelligence (AI), which empowers robots to navigate, perceive, and engage within virtual environments, has attracted significant attention, owing to the remarkable advances in computer vision and large language models. Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information. However, the issue of privacy leakage in embodied AI tasks, particularly concerning reinforcement learning algorithms, has not received adequate consideration in research. This paper aims to address this gap by proposing an attack on the training process of the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervisory signals. The choice of using gradients for the attack is motivated by the fact that commonly employed federated learning techniques solely utilize gradients computed based on private user data to optimize models, without storing or transmitting the data to public servers. Nevertheless, these gradients contain sufficient information to potentially expose private data. To validate our approach, we conducted experiments on the AI2THOR simulator and evaluated our algorithm on active perception, a prevalent task in embodied AI. The experimental results demonstrate the effectiveness of our method in successfully reconstructing all information from the data in 120 room layouts. Check our website for videos.
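The leakage exploited above is easiest to see in the simplest case: for a single linear layer trained on one sample, the shared weight gradient is the outer product of the output residual and the private input, so the input can be read off exactly. The sketch below demonstrates only this well-known building block of gradient inversion, with illustrative toy data, not the paper's full attack on value- and gradient-based RL.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)        # private input (e.g., an observed state)
t = rng.normal(size=3)        # private supervisory signal
W = rng.normal(size=(3, 4))   # linear-layer weights
b = rng.normal(size=3)        # linear-layer bias

# Gradients a federated client would share for loss 0.5 * ||W @ x + b - t||^2
r = W @ x + b - t             # per-output residual
grad_W = np.outer(r, x)       # dL/dW = residual (outer) input
grad_b = r                    # dL/db = residual

# Inversion: row i of grad_W equals grad_b[i] * x, so dividing recovers x.
i = int(np.argmax(np.abs(grad_b)))
x_recovered = grad_W[i] / grad_b[i]
```

Deeper networks and batched updates make the recovery an optimization problem rather than a closed form, but the same information is present in the gradients.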
|
|
16:30-18:00, Paper TuCT8-CC.4 | Add to My Program |
MAexp: A Generic Platform for RL-Based Multi-Agent Exploration |
|
Zhu, Shaohao | Zhejiang University |
Zhou, Jiacheng | Zhejiang University |
Chen, Anjun | Zhejiang University |
Bai, Mingming | College of Control Science and Engineering, Zhejiang University |
Chen, Jiming | Zhejiang University |
Xu, Jinming | Zhejiang University |
Keywords: Reinforcement Learning, Performance Evaluation and Benchmarking, Cooperating Robots
Abstract: The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization. Existing platforms suffer from the inefficiency in sampling and the lack of diversity in Multi-Agent Reinforcement Learning (MARL) algorithms across different scenarios, restraining their widespread applications. To fill these gaps, we propose MAexp, a generic platform for multi-agent exploration that integrates a broad range of state-of-the-art MARL algorithms and representative scenarios. Moreover, we employ point clouds to represent our exploration scenarios, leading to high-fidelity environment mapping and a sampling speed approximately 40 times faster than existing platforms. Furthermore, equipped with an attention-based Multi-Agent Target Generator and a Single-Agent Motion Planner, MAexp can work with arbitrary numbers of agents and accommodate various types of robots. Extensive experiments are conducted to establish the first benchmark featuring several high-performance MARL algorithms across typical scenarios for robots with continuous actions, which highlights the distinct strengths of each algorithm in different scenarios.
|
|
16:30-18:00, Paper TuCT8-CC.5 | Add to My Program |
Improving Offline Reinforcement Learning with Inaccurate Simulators |
|
Hou, Yiwen | University of Science and Technology of China |
Sun, Haoyuan | University of Science and Technology of China |
Ma, Jinming | University of Science and Technology of China |
Wu, Feng | University of Science and Technology of China |
Keywords: Reinforcement Learning
Abstract: Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment in robotics. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, the data collected from the inaccurate simulator cannot be directly used in offline RL due to the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address this, we propose a novel approach to better combine the offline dataset and the inaccurate simulation data. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method can benefit more from both the inaccurate simulator and limited offline datasets to achieve better performance than the state-of-the-art methods.
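The discriminator-based reweighting step mentioned above has a standard form: if D(s) estimates the probability that a state came from the offline (real) data rather than the simulator, then w(s) = D(s) / (1 − D(s)) is an importance weight that corrects simulated samples toward the real state distribution. The sketch below demonstrates this with two 1-D Gaussians and the analytic Bayes-optimal discriminator standing in for a trained GAN discriminator; all names and distributions are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
real_states = rng.normal(0.0, 1.0, size=2000)  # offline-dataset states
sim_states = rng.normal(0.5, 1.0, size=2000)   # inaccurate-simulator rollouts

def gauss(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def discriminator(s):
    # Bayes-optimal D(s) = p_real(s) / (p_real(s) + p_sim(s)); a trained
    # discriminator approximates this from samples.
    p_real, p_sim = gauss(s, 0.0), gauss(s, 0.5)
    return p_real / (p_real + p_sim)

d = discriminator(sim_states)
w = d / (1.0 - d)          # importance weights, equal to p_real / p_sim
w /= w.sum()               # normalize for a weighted average

# Reweighted simulator data matches the real mean (~0.0), not the sim mean (~0.5).
reweighted_mean = float(np.sum(w * sim_states))
```

The same ratio trick lets simulator transitions enter the offline objective as if they were drawn from the real state distribution.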
|
|
16:30-18:00, Paper TuCT8-CC.6 | Add to My Program |
REFORMA: Robust REinFORceMent Learning Via Adaptive Adversary for Drones Flying under Disturbances |
|
Hsu, Hao-Lun | Duke University |
Meng, Haocheng | Duke University |
Luo, Shaocheng | Duke University |
Dong, Juncheng | Duke University |
Tarokh, Vahid | Duke University |
Pajic, Miroslav | Duke University |
Keywords: Reinforcement Learning
Abstract: In this work, we introduce REFORMA, a novel robust reinforcement learning (RL) approach to design controllers for unmanned aerial vehicles (UAVs) robust to unknown disturbances during flights. These disturbances, typically due to wind turbulence, electromagnetic interference, temperature extremes, and many other sources of external physical interference, are highly dynamic and difficult to model. REFORMA can perform real-time online adaptation to these disturbances and generate appropriate velocity actions as countermeasures to stabilize the drone. REFORMA consists of two components: a base policy trained completely in simulation using model-free RL and an adaptation module trained via supervised learning with on-policy datasets. By varying the disturbance strength in the adaptation module, i.e., adopting an adaptive adversary, the policy is then able to handle extreme cases when the velocity of the drone is immediately affected by disturbances. Finally, we demonstrate the effectiveness of our method through extensive simulated experiments. To the best of our knowledge, REFORMA is the first robust RL approach that uses adaptive adversaries to tackle uncertain disturbances in drone tasks.
|
|
16:30-18:00, Paper TuCT8-CC.7 | Add to My Program |
Brain-Inspired Hyperdimensional Computing in the Wild: Lightweight Symbolic Learning for Sensorimotor Controls of Wheeled Robots |
|
Kwon, Hyukjun | DGIST |
Kim, Kangwon | DGIST |
Lee, Junyoung | DGIST |
Lee, Hyunsei | DGIST |
Kim, Jiseung | DGIST |
Kim, Jinhyung | HYU(Hanyang University) |
Kim, Taehyeong | Coga-Robotics |
Kim, Yong Nyeon | Hanyang University |
Ni, Yang | University of California, Irvine |
Imani, Mohsen | University of California Irvine |
Suh, Il Hong | Hanyang University |
Kim, Yeseong | DGIST |
Keywords: Bioinspired Robot Learning, Representation Learning, Reinforcement Learning
Abstract: Efficiency and performance are significant challenges in applying Machine Learning (ML) to robotics, especially in energy-constrained real-world scenarios. In this context, Hyperdimensional Computing (HDC) offers an energy-efficient alternative but has been underexplored in robotics. We introduce ReactHD, an HDC-based framework tailored for perception-action-based learning for sensorimotor controls of robot tasks. ReactHD employs hypervectors to encode sensory inputs and learn the suitable high-dimensional pattern for robot actions. It also integrates two HD-based lightweight symbolic learning paradigms: HDC-based supervised learning by demonstration (HDC-IL) and HD-Reinforcement Learning (HDC-RL). It enables robots to exhibit precisely situated reactive behaviors in complex environments. Our empirical evaluations show that ReactHD achieves robust and accurate learning outcomes comparable to state-of-the-art deep learning while substantially improving performance and energy efficiency by 14.2× and 15.3×, respectively. To the best of our knowledge, ReactHD is the first HDC-based framework deployed in real-world settings.
|
|
16:30-18:00, Paper TuCT8-CC.8 | Add to My Program |
Learning for Deformable Linear Object Insertion Leveraging Flexibility Estimation from Visual Cues |
|
Li, Mingen | University of California, San Diego |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Reinforcement Learning
Abstract: Manipulation of deformable linear objects (DLOs), including iron wire, rubber, silk, and nylon rope, is ubiquitous in daily life. These objects exhibit diverse physical properties, such as Young's modulus and bending stiffness. Such diversity poses challenges for developing generalized manipulation policies. However, previous research has limited its scope to single-material DLOs and engaged in time-consuming data collection for state estimation. In this paper, we propose a two-stage manipulation approach consisting of a material property (e.g., flexibility) estimation and policy learning for DLO insertion with reinforcement learning. Firstly, we design a flexibility estimation scheme that characterizes the properties of different types of DLOs. The ground-truth flexibility data is collected in simulation to train our flexibility estimation module. During the manipulation, the robot interacts with the DLOs to estimate flexibility by analyzing their visual configurations. Secondly, we train a policy conditioned on the estimated flexibility to perform challenging DLO insertion tasks. Our pipeline trained with diverse insertion scenarios achieves an 85.6% success rate in simulation and 66.67% in real robot experiments. Please refer to our project page: https://lmeee.github.io/DLOInsert/
|
|
16:30-18:00, Paper TuCT8-CC.9 | Add to My Program |
Reinforcement Learning of Action and Query Policies with LTL Instructions under Uncertain Event Detector |
|
Hatanaka, Wataru | Ricoh Company, Ltd |
Yamashina, Ryota | Ricoh Company, Ltd |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Reinforcement Learning, Planning under Uncertainty
Abstract: Reinforcement learning (RL) with linear temporal logic (LTL) objectives can allow robots to carry out symbolic event plans in unknown environments. Most existing methods assume that the event detector can accurately map environmental states to symbolic events; however, uncertainty is inevitable for real-world event detectors. Such uncertainty in an event detector generates multiple branching possibilities on LTL instructions, confusing action decisions. Moreover, the queries to the uncertain event detector, necessary for the task's progress, may increase the uncertainty further. To cope with those issues, we propose an RL framework, Learning Action and Query over Belief LTL (LAQBL), to learn an agent that can consider the diversity of LTL instructions due to uncertain event detection while avoiding task failure due to unnecessary event-detection queries. Our framework simultaneously learns 1) an embedding of belief LTL, which represents multiple branching possibilities on LTL instructions, using a graph neural network, 2) an action policy, and 3) a query policy which decides whether or not to query the event detector. Simulations in a 2D grid world and image-input robotic inspection environments show that our method successfully learns actions to follow LTL instructions even with uncertain event detectors.
|
|
TuCT9-CC Oral Session, CC-419 |
Add to My Program |
Vision-Based Navigation and Learning |
|
|
Chair: Sugiura, Komei | Keio University |
Co-Chair: Gu, Jason | Dalhousie University |
|
16:30-18:00, Paper TuCT9-CC.1 | Add to My Program |
Guided by the Way: The Role of On-The-Route Objects and Scene Text in Enhancing Outdoor Navigation |
|
Sun, Yanjun | Keio University |
Qiu, Yue | National Institute of Advanced Industrial Science and Technology |
Aoki, Yoshimitsu | Keio University |
Kataoka, Hirokatsu | National Institute of Advanced Industrial Science and Technology |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, AI-Based Methods
Abstract: In outdoor environments, Vision-and-Language Navigation (VLN) requires an agent to rely on multi-modal cues from real-world urban environments and natural language instructions. While existing outdoor VLN models predict actions using a combination of panorama and instruction features, this approach ignores objects in the environment and learns dataset biases that cause navigation failures. According to our preliminary findings, most instances of navigation failure in previous models were due to turning or stopping at the wrong place. In contrast, humans frequently and intuitively use identifiable objects or store names as reference landmarks, ensuring accurate turns and stops, especially in unfamiliar places. To close this gap, we propose an Object-Attention VLN (OAVLN) model that helps the agent focus on relevant objects during training and understand the environment better. Our model outperforms previous methods in all evaluation metrics under both seen and unseen scenarios on two existing benchmark datasets, Touchdown and map2seq.
|
|
16:30-18:00, Paper TuCT9-CC.2 | Add to My Program |
PlaceNav: Topological Navigation through Place Recognition |
|
Suomela, Lauri Aleksanteri | Tampere University |
Kalliola, Jussi Oskari | Tampere University |
Edelman, Harry | Tampere University |
Kamarainen, Joni-Kristian | Tampere University of Technology |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Localization
Abstract: Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by robots of different types. However, the navigation methods' performance is still limited by the scarcity of suitable training data and they suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new method obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.
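The Bayesian filtering step mentioned above can be sketched as a discrete Bayes filter over the indices of the route's topological nodes: a motion prediction assuming the robot stays at or advances past its current subgoal, followed by a correction with place-recognition similarity scores. The code below is a generic illustration of such temporal filtering for subgoal selection, with made-up numbers, not PlaceNav's exact formulation.

```python
import numpy as np

def bayes_filter_step(belief, likelihood):
    """One filtering step over route-node indices: predict (the robot
    stays put or advances one node with equal probability), then
    correct with per-node place-recognition similarity scores."""
    n = len(belief)
    predicted = np.zeros(n)
    for j, b in enumerate(belief):
        predicted[j] += 0.5 * b                  # stay at node j
        predicted[min(j + 1, n - 1)] += 0.5 * b  # advance one node
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# A spurious similarity peak far along the route (node 9) is suppressed,
# because the motion model keeps the belief near the previous node (2).
belief = np.zeros(10); belief[2] = 1.0
likelihood = np.full(10, 0.1); likelihood[3] = 0.9; likelihood[9] = 1.0
posterior = bayes_filter_step(belief, likelihood)  # argmax at node 3
```

This temporal consistency is what prevents a single perceptually aliased subgoal image from yanking the robot off its route.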
|
|
16:30-18:00, Paper TuCT9-CC.3 | Add to My Program |
Aligning Knowledge Graph with Visual Perception for Object-Goal Navigation |
|
Xu, Nuo | Zhejiang Lab |
Wang, Wen | Zhejiang Lab |
Yang, Rong | Zhejiang Lab |
Qin, Mengjie | Zhejiang Lab |
Lin, Zheyuan | Zhejiang Lab |
Song, Wei | Zhejiang Lab |
Zhang, Chunlong | Zhejiang Lab |
Gu, Jason | Dalhousie University |
Li, Chao | Zhejiang Lab |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Reinforcement Learning
Abstract: Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph representation of the scenes, which results in misalignment with visual images. To provide more accurate and coherent scene descriptions and address this misalignment issue, we propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Technically, our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability. We extensively evaluate our method using the AI2-THOR simulator and conduct a series of experiments to demonstrate the effectiveness and efficiency of our navigator.
|
|
16:30-18:00, Paper TuCT9-CC.4 | Add to My Program |
Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation |
|
Wang, Jiaming | National University of Singapore |
Soh, Harold | National University of Singapore |
Keywords: Vision-Based Navigation, Deep Learning Methods
Abstract: In this work, we focus on object search tasks within unexplored environments. We introduce a framework centered around the Probable Object Location (POLo) score. Utilizing a 3D object probability map, the POLo score allows the agent to make data-driven decisions for efficient object search. We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score. Our approach addresses critical limitations of both end-to-end reinforcement learning methods, which suffer from memory decay over long-horizon tasks, and traditional map-based methods that neglect visibility constraints. Our experiments, involving the first phase of the Open-Vocabulary Mobile Manipulation (OVMM) 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods, including end-to-end RL techniques and prior map-based strategies. To provide a comprehensive evaluation, we introduce new performance metrics that offer insights into the efficiency and effectiveness of various agents in object goal navigation.
|
|
16:30-18:00, Paper TuCT9-CC.5 | Add to My Program |
Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill |
|
Cai, Wenzhe | Southeast University |
Huang, Siyuan | Shanghai Jiao Tong University |
Cheng, Guangran | Southeast University |
Long, Yuxing | Peking University |
Gao, Peng | Shanghai AI Lab |
Sun, Changyin | Southeast University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Deep Learning Methods, Imitation Learning
Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. More details are accessible via our project website https://sites.google.com/view/pixnav/.
|
|
16:30-18:00, Paper TuCT9-CC.6 | Add to My Program |
GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors |
|
Knights, Joshua Barton | Queensland University of Technology |
Hausler, Stephen | CSIRO |
Sridharan, Sridha | Queensland University of Technology |
Fookes, Clinton | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Keywords: Vision-Based Navigation, Deep Learning Methods, Recognition
Abstract: LiDAR place recognition approaches based on deep learning suffer from significant performance degradation when there is a shift between the distribution of training and test datasets, often requiring re-training the networks to achieve peak performance. However, obtaining accurate ground truth data for new training data can be prohibitively expensive, especially in complex or GPS-deprived environments. To address this issue we propose GeoAdapt, which introduces a novel auxiliary classification head to generate pseudo-labels for re-training on unseen environments in a self-supervised manner. GeoAdapt uses geometric consistency as a prior to improve the robustness of our generated pseudo-labels against domain shift, improving the performance and reliability of our Test-Time Adaptation approach. Comprehensive experiments show that GeoAdapt significantly boosts place recognition performance across moderate to severe domain shifts, and is competitive with fully supervised test-time adaptation approaches. Our code is available at https://github.com/csiro-robotics/GeoAdapt.
|
|
16:30-18:00, Paper TuCT9-CC.7 | Add to My Program |
ViPlanner: Visual Semantic Imperative Learning for Local Navigation |
|
Roth, Pascal | ETH Zurich |
Nubert, Julian | ETH Zürich |
Yang, Fan | ETH Zurich |
Mittal, Mayank | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Vision-Based Navigation, Integrated Planning and Learning, Semantic Scene Understanding
Abstract: Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a 38.02% decrease in traversability cost compared to purely geometric approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.
|
|
16:30-18:00, Paper TuCT9-CC.8 | Add to My Program |
UIVNAV: Underwater Information-Driven Vision-Based Navigation Via Imitation Learning |
|
Lin, Xiaomin | University of Maryland |
Karapetyan, Nare | Woods Hole Oceanographic Institution |
Joshi, Kaustubh | University of Maryland College Park |
Liu, Tianchen | University of Maryland, College Park |
Chopra, Nikhil | University of Maryland, College Park |
Yu, Miao | University of Maryland |
Tokekar, Pratap | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Vision-Based Navigation, Marine Robotics, Visual Learning
Abstract: Autonomous navigation in the underwater environment is challenging due to limited visibility, dynamic changes, and the lack of a cost-efficient, accurate localization system. We introduce UIVNav, a novel end-to-end underwater navigation solution designed to navigate robots over Objects of Interest (OOI) while avoiding obstacles, all without relying on localization. UIVNav utilizes imitation learning and draws inspiration from the navigation strategies employed by human divers, who do not rely on localization. UIVNav consists of the following phases: (1) generating an intermediate representation (IR) and (2) training the navigation policy based on human-labeled IR. By training the navigation policy on IR instead of raw data, the second phase is domain-invariant --- the navigation policy does not need to be retrained if the domain or the OOI changes. We demonstrate this within simulation by deploying the same navigation policy to survey two distinct Objects of Interest (OOIs): oyster and rock reefs. We compared our method with complete coverage and random walk methods, showing that our approach is more efficient in gathering information for OOIs while avoiding obstacles. The results show that UIVNav chooses to visit areas with larger expanses of oysters or rocks with no prior information about the environment or localization. Moreover, a robot using UIVNav surveys on average 36% more oysters than the complete coverage method when traveling the same distance. We also demonstrate the feasibility of real-time deployment of UIVNav in pool experiments with a BlueROV underwater robot for surveying a bed of oyster shells.
|
|
16:30-18:00, Paper TuCT9-CC.9 | Add to My Program |
Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning |
|
Karnan, Haresh | The University of Texas at Austin |
Yang, Elvin | University of Michigan, Ann Arbor |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Vision-Based Navigation, Representation Learning, Autonomous Vehicle Navigation
Abstract: Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data re-collection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain preferences within the inertial-proprioceptive-tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain-awarE Robot Navigation (PATERN), a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial-proprioceptive-tactile measurements from the robot’s observations to a representation space and performs a nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN’s capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
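As a rough, hypothetical sketch of the nearest-neighbor step described above (the names and distance metric are assumptions; PATERN's learned representation space is not reproduced here), a novel terrain's feature vector can inherit the operator preference of its closest labeled neighbor:

```python
import math

# Rough sketch of nearest-neighbor preference extrapolation (hypothetical
# simplification: PATERN's learned inertial-proprioceptive-tactile
# representation is replaced by raw feature vectors and Euclidean distance).
# A novel terrain inherits the operator preference of the closest labeled
# terrain in the feature space.

def nearest_preference(labeled, query):
    """labeled: list of (feature_vector, preference); query: feature vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, preference = min(labeled, key=lambda fp: dist(fp[0], query))
    return preference
```

In the actual method, the mapping into the representation space is learned; only the final nearest-neighbor lookup resembles this sketch.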
|
|
TuCT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Sensors and Actuators I |
|
|
Co-Chair: Katzschmann, Robert Kevin | ETH Zurich |
|
16:30-18:00, Paper TuCT10-CC.1 | Add to My Program |
A Biomorphic Whisker Sensor for Aerial Tactile Applications |
|
Ye, Chaoxiang | Delft University of Technology |
de Croon, Guido | TU Delft |
Hamaza, Salua | TU Delft |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing, Deep Learning Methods
Abstract: Unmanned air vehicles (UAVs) have traditionally been considered “eyes in the sky” that can move in three dimensions and need to avoid any contact with their environment. On the contrary, contact should not be considered a problem, but an opportunity to expand the range of UAV applications. In this paper, we designed, fabricated, and characterized a whisker sensor unit based on MEMS barometers suitable for tactile localization on UAVs, featuring light weight, low stiffness, high sensitivity, a broad sensing range, and scalability. Then, for the challenging task of contact point localization, we propose a Recurrent Multi-output Network (RMN) for predicting 3D contact points under continuous contact conditions to address the problems of non-linearity, hysteresis, and non-injective mapping between signals and contact points by considering time series. In addition, we propose an azimuth prediction loss function which reduces the RMSE by 3.24° compared to L1 loss. Finally, we conduct experiments on a linear stage to validate the 3D contact point localization capability of the proposed whisker system and model. The results show that our localization can achieve excellent performance, with an inference time of 1.4 ms and a mean error of only 9.18 mm in Euclidean distance within 3D space, laying a robust foundation for future implementation of tactile localization on UAVs. The design files, dataset, and source code are available on: https://github.com/BioMorphic-Intelligence-Lab/Whisker-3D-Localization.
|
|
16:30-18:00, Paper TuCT10-CC.2 | Add to My Program |
Embedded Air Channels Transform Soft Lattices into Sensorized Grippers |
|
Zhang, Annan | Massachusetts Institute of Technology |
Chin, Lillian | UT Austin |
Tong, Daniel L | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Soft Sensors and Actuators, Grippers and Other End-Effectors, Perception for Grasping and Manipulation
Abstract: Sensing plays a pivotal role in robotic manipulation, dictating the accuracy and versatility with which objects are handled. Vision-based sensing methods often suffer from fabrication complexity and low durability, while approaches that rely on direct measurements on the gripper often have limited resolution and are difficult to scale. Here, we present a soft robotic gripper made out of two cubic lattices that are sensorized by embedding air channels within the structure. The lattices are 3D printed from a single build material, simplifying the fabrication process. The flexibility of this approach offers significant control over sensor and lattice design, while the pressure-based internal sensing provides measurements with minimal disruption to the grasping surface. With only 12 sensors, 6 per lattice, this gripper can estimate an object's weight and location and offer new insights into grasp parameters like friction coefficients and grasp force.
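As a toy illustration of the idea (not the authors' estimator; the per-sensor gain and the centroid rule are assumptions), pressure readings at known sensor positions within a lattice can yield a weight estimate and a force-weighted contact location:

```python
# Toy illustration (not the authors' estimator; the calibrated per-sensor
# gain and the centroid rule are assumptions): with pressure readings at
# known lattice sensor positions, a gain converts pressure to normal force;
# the object's weight is the force sum and the contact location is the
# force-weighted centroid of the sensor positions.

def estimate_weight_and_location(readings, positions, gain=1.0):
    """readings: pressure per sensor; positions: (x, y) per sensor."""
    forces = [gain * p for p in readings]
    total = sum(forces)
    if total == 0:
        return 0.0, None  # nothing in contact
    x = sum(f * px for f, (px, _) in zip(forces, positions)) / total
    y = sum(f * py for f, (_, py) in zip(forces, positions)) / total
    return total, (x, y)
```

With 12 embedded channels (6 per lattice), even a coarse estimator of this kind has enough redundancy to localize a single contact patch.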
|
|
16:30-18:00, Paper TuCT10-CC.3 | Add to My Program |
Towards Automatic Design of Soft Pneumatic Actuators: Inner Structure Design Using CNN Model and Bézier Curve-Based Genetic Algorithm |
|
Mosser, Loïc | Icube - Université De Strasbourg |
Barbé, Laurent | University of Strasbourg, ICube CNRS |
Rubbert, Lennart | INSA - Strasbourg |
Renaud, Pierre | ICube |
Keywords: Soft Sensors and Actuators, Hydraulic/Pneumatic Actuators
Abstract: This paper describes the development of a method for the design of soft pneumatic actuators, focusing on the use of a deep learning model to explore the design space with a genetic algorithm. In particular, we propose to perform the automatic synthesis of the inner structure of pneumatic actuators using Bézier curves and Gaussian Mixture Points, yielding a simple representation of the actuator genotype. This makes it possible to represent a wide variety of structures and to take into account the presence of the actuator's pneumatic supply. We show that a CNN model can be used effectively in conjunction with FEM: FEM initially trains the CNN model and checks its accuracy, while the CNN model reduces the computational cost, offering sufficient accuracy during synthesis thanks to transfer learning. Two case studies outline the capacity to generate geometrically complex designs, such as a double-helix network for a twisting actuator. Possible extensions and further uses are also discussed.
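De Casteljau's algorithm is the standard way to evaluate the Bézier curves that such a genotype encodes; the sketch below shows that decoding step (the genotype layout itself is a hypothetical simplification of the paper's representation):

```python
# De Casteljau's algorithm for evaluating a Bézier curve at parameter t,
# the standard decoding step for a Bézier-based genotype (using a short
# control-point list as the genotype follows the paper's idea; this exact
# layout is an illustrative assumption).

def de_casteljau(control_points, t):
    """Evaluate a Bézier curve defined by control_points at t in [0, 1]."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeatedly interpolate between consecutive points until one remains.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]
```

For example, the quadratic curve with control points (0,0), (1,1), (2,0) evaluates to (1.0, 0.5) at t = 0.5; a genetic algorithm mutates the control points while this decoder stays fixed.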
|
|
16:30-18:00, Paper TuCT10-CC.4 | Add to My Program |
Design of a Rigid-Soft Hybrid Robotic Glove with Force Sensing Function |
|
Li, Hexin | Harbin Institute of Technology |
Jiang, Li | Harbin Institute of Technology |
Zhen, Ruichen | Harbin Institute of Technology |
Cheng, Ming | Harbin Institute of Technology |
Ding, Kehan | Harbin Institute of Technology |
Keywords: Soft Sensors and Actuators, Rehabilitation Robotics, Soft Robot Applications
Abstract: Soft robotic gloves can not only provide timely, effective, safe and cheap rehabilitation training for patients with impaired movement function of the hand, but also assist in completing daily grasping activities. However, most soft robotic gloves are composed entirely of flexible structures. Although these offer high flexibility and safety, they suffer from poor fit and low output force. To solve these problems, this paper draws on the structure of the human hand to design an articulated rigid-soft hybrid robotic glove, which combines the advantages of rigid and soft robotic gloves and offers high flexibility, high output force and good fit. In addition, soft robotic gloves generally lack the ability to sense the force between the human hand and the glove. Therefore, this paper designs an arrayed flexible force sensor and studies its structure, signal acquisition and fabrication process. Finally, a complete test platform was built to test the performance of the rigid-soft hybrid robotic glove with force sensing function. The test results show that the robotic glove has good fit and high output force, can effectively assist training and grasping, and can sense the contact force.
|
|
16:30-18:00, Paper TuCT10-CC.5 | Add to My Program |
SoftER: A Spiral Soft Robotic Ejector for Sorting Applications |
|
Zournatzis, Ilias | Hellenic Mediterranean University |
Kalaitzakis, Sotiris | Hellenic Mediterranean University |
Polygerinos, Panagiotis | Hellenic Mediterranean University |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Soft Robot Materials and Design
Abstract: For over thirty years, optical belt-drive configurations have been used in the food industry to automatically sort produce at high operating speeds. Despite their benefits, these multi-component assemblies are prone to faults, difficult to clean, and require frequent maintenance that halts the production lines for a considerable amount of time. In this paper, we adopt the abundantly occurring spiral motions encountered in nature and translate them to the proof-of-concept design and development of a soft pneumatic actuator (SPA), the SoftER. This novel actuator has the ability to rapidly unwind when pressurized to deliver impact forces. We explore this inherently low-cost and simple design and its potential to replace current systems based on the results of an application case study presented in this paper. Simulation-driven optimization methods are leveraged, utilizing quasi-static and dynamic finite element method models, to create a scalable framework for selecting the best-performing design parameters for each application. Using rapid manufacturing processes, the optimized actuator is constructed, and physical testing validates its high-speed and impact force delivering capabilities.
|
|
16:30-18:00, Paper TuCT10-CC.6 | Add to My Program |
Design and Validation of Soft Sliding Structure with Adjustable Stiffness for Ankle Sprain Prevention |
|
Ham, Seoyeon | Hanyang University |
Paing, Soe Lin | Arizona State University |
Kang, Brian Byunghyun | Sejong University |
Lee, Hyunglae | Arizona State University |
Kim, Wansoo | Hanyang University ERICA |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Wearable Robotics
Abstract: This study presents the design and validation of a soft sliding stiffness structure with a soft-rigid layer sliding mechanism. It aims to mitigate ankle sprains and address the progression of chronic ankle instability by providing stiffness support. The soft-rigid layer sliding mechanism of the structure is designed to achieve a wide range of stiffness while maintaining a compact form factor. The structure incorporates rigid retainer pieces within each layer, which allows for sliding within a hollow cuboid structure and enables modulation of stiffness. An analytical model is presented to investigate the variations in stiffness resulting from the different sliding states. The stiffness characteristics of the structure were validated through both bench tests and human subject tests. The gradual sliding of the structure’s layer resulted in an increase in stiffness, aligning with the analytical model’s predictions. At the most rigid stage (0% alignment), the stiffness exhibited a significant increase of 111.1% compared to the most flexible stage (100% alignment). Additionally, the human subject testing demonstrated a stiffness increase of up to 93.8%. These results underscore the potential applicability of the soft sliding structure in ankle support applications.
|
|
16:30-18:00, Paper TuCT10-CC.7 | Add to My Program |
Capacitive Origami Sensing Modules for Measuring Force in a Neurosurgical, Soft Robotic Retractor |
|
Van Lewen, Daniel | Boston University |
Wang, Catherine | Boston University |
Lee, Hun Chan | Boston University |
Devaiah, Anand | Boston University |
Upadhyay, Urvashi | Boston University |
Russo, Sheila | Boston University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In neurosurgery, soft robots have the potential to introduce significant benefits over traditional metal tools for their ability to safely interact with delicate tissues. In this paper, we introduce a proof-of-concept soft, capacitive origami sensing module (OSM) that can measure forces during neurosurgical retraction. Using origami-inspired design and fabrication principles, the OSM is easily folded and integrated within a soft robotic retractor that interacts with brain tissue to generate a surgical workspace upon actuation. We demonstrate the individual OSM signal response to forces and folding. We further characterize the OSM response within a fully-assembled soft robotic retractor to both folding and the application of forces over 0-5 N, showing a 0.38 N average prediction error and a resolution of 0.25 N. The sensing capability of the retractor is validated on an in-vitro model, demonstrating a prediction error of 0.06 N and its proposed operation during neurosurgery.
|
|
16:30-18:00, Paper TuCT10-CC.8 | Add to My Program |
A Soft Miniaturized Continuum Robot with 3D Shape Sensing Via Functionalized Soft Optical Waveguides |
|
Del Bono, Viola | Boston University |
McCandless, Max | Boston Children's Hospital |
Juliá Wise, Frank | Boston University |
Russo, Sheila | Boston University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In this paper, we present a fully soft miniaturized continuum robot that integrates 3D optical shape sensing through functionalized tubing used as soft optical waveguides. The sensor is fabricated by laser patterning off-the-shelf medical tubing, allowing for bidirectional responses to large curvatures in two bending directions and enabling 3D shape sensing and tip tracking of the continuum robot. The robot is able to bend and sense its own shape up to a curvature of 44.7 m^-1, corresponding to a bending angle of 102°, with high-accuracy tracking capabilities that result in an average tracking error of 3.08 mm, i.e., 7.7% of the robot length. The robot's functionality was shown in validation experiments, including real-time shape prediction through a graphical user interface.
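Assuming constant curvature, the reported figures are mutually consistent: bending angle θ = κL, and a 3.08 mm tracking error stated to be 7.7% of the robot length implies L ≈ 40 mm, so the peak curvature of 44.7 m^-1 reproduces roughly the reported 102°:

```python
import math

# Consistency check of the reported numbers, assuming constant curvature
# (bending angle theta = curvature kappa * arc length L). The robot length
# is inferred from the tracking figures: 3.08 mm is stated to be 7.7% of
# the length, so L = 0.04 m, and kappa = 44.7 m^-1 then yields about the
# reported 102 degrees.

length_m = 0.00308 / 0.077           # inferred robot length: 0.04 m
theta_rad = 44.7 * length_m          # constant-curvature bending angle
theta_deg = math.degrees(theta_rad)  # roughly 102 degrees, as reported
```

The constant-curvature assumption is the usual simplification for single-segment continuum robots; the paper's shape-sensing model need not be limited to it.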
|
|
16:30-18:00, Paper TuCT10-CC.9 | Add to My Program |
Continuously Estimate and Control Prosthetic Grip Force by an Optical Waveguide Sensor |
|
Ju, Linhang | Beihang University |
Jia, Hanze | Beihang University |
Shi, Yanjun | Beihang University |
Ding, Xilun | Beijing Univerisity of Aeronautics and Astronautics |
Feng, Yanggang | Beihang University |
Zhang, Wuxiang | Beihang University |
Keywords: Intention Recognition, Sensor-based Control, Force Control
Abstract: The emergence of intelligent prostheses has facilitated the life and work of disabled patients. The interaction aspect of prostheses has become a prominent research topic in the field of rehabilitation robotics. However, most existing prosthetic interaction methods focus on using myoelectricity to classify finite gestures rather than on continuous (infinite) force detection, which greatly limits the scenarios in which prostheses can be used. In this study, a novel optical waveguide sensor was used to collect muscle deformation information from the human arm for continuous control of the prosthetic grip force. The optical waveguide sensor was embedded with carbon fiber to limit the stretching of the waveguide, making the sensor sensitive to bending deformation. Compared with EMG, the accuracy of continuous grip force control based on the optical waveguide sensor is higher. The R-squared values for prosthetic grip force and hand grip force were 0.867 and 0.9724 in the periodic and sustained grip force experiments, respectively. The results suggest that the proposed method could provide a new approach to the interaction of prostheses.
|
|
TuCT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception III |
|
|
Chair: Zhao, Na | SUTD |
Co-Chair: Kwon, Heesung | DEVCOM Army Research Laboratory |
|
16:30-18:00, Paper TuCT11-CC.1 | Add to My Program |
UAV-Sim: NeRF-Based Synthetic Data Generation for UAV-Based Perception |
|
Maxey, Christopher | University of Maryland, Army Research Laboratory |
Choi, Jaehoon | University of Maryland, College Park |
Lee, Hyungtae | US Army Research Laboratory |
Manocha, Dinesh | University of Maryland |
Kwon, Heesung | DEVCOM Army Research Laboratory |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: Tremendous variations coupled with large degrees of freedom in UAV-based imaging conditions lead to a significant lack of data for adequately learning UAV-based perception models. Using various synthetic renderers in conjunction with perception models to create synthetic data is a prevalent way to augment learning in the ground-based imaging domain. However, severe challenges in the austere UAV-based domain require distinctive solutions to image synthesis for data augmentation. In this work, we leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image synthesis, especially from high altitudes, capturing salient scene attributes. Finally, we demonstrate that a considerable performance boost is achieved when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data rather than on real or synthetic data separately.
|
|
16:30-18:00, Paper TuCT11-CC.2 | Add to My Program |
Contrastive Learning for Enhancing Robust Scene Transfer in Vision-Based Agile Flight |
|
Xing, Jiaxu | University of Zurich |
Bauersfeld, Leonard | University of Zurich (UZH) |
Song, Yunlong | University of Zurich |
Xing, Chunwei | ETH Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy, Imitation Learning
Abstract: Scene transfer for vision-based mobile robotics applications is a highly relevant and challenging problem. The utility of a robot greatly depends on its ability to perform a task in the real world, outside of a well-controlled lab environment. Existing scene transfer end-to-end policy learning approaches often suffer from poor sample efficiency or limited generalization capabilities, making them unsuitable for mobile robotics applications. This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment. Control policies relying on the embedding are able to operate in unseen environments without the need for finetuning in the deployment environment. We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight. Extensive simulation and real-world experiments demonstrate that our approach successfully generalizes beyond the training domain and outperforms all baselines.
|
|
16:30-18:00, Paper TuCT11-CC.3 | Add to My Program |
Watching the Air Rise: Learning-Based Single-Frame Schlieren Detection |
|
Achermann, Florian | ETH Zurich, ASL |
Haug, Julian Andreas | ETH Zurich |
Zumsteg, Tobias | ETH Zürich |
Lawrance, Nicholas | CSIRO Data61 |
Chung, Jen Jen | The University of Queensland |
Kolobov, Andrey | Microsoft Research |
Siegwart, Roland | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Energy and Environment-Aware Automation, Aerial Systems: Perception and Autonomy
Abstract: Detecting air flows caused by phenomena such as heat convection is valuable in multiple scenarios, including leak identification and locating thermal updrafts for extending UAV flight duration. Unfortunately, the heat signature of these flows is often too subtle to be seen by a thermal camera. While convection also leads to fluctuations in air density and hence causes so-called schlieren – intensity and color variations in images – existing techniques such as Background-oriented schlieren (BOS) allow detecting them only against a known background and from a static camera, making these approaches unsuitable for moving vehicles. In this work we demonstrate the feasibility of visualizing air movement by predicting the corresponding schlieren-induced optical flow from a single greyscale image captured by a moving camera against an unfamiliar background. We first record and label a set of optical flows in an indoor setup using standard BOS techniques. We then train a convolutional neural network (CNN) by applying the previously collected optical flow distortions to a dataset containing a mixture of real and synthetically generated images to predict the two-dimensional optical flow from a single image. Finally, we evaluate our approach on the task of extracting the optical flow caused by schlieren from both a static and moving camera on previously unseen flow patterns and background images.
|
|
16:30-18:00, Paper TuCT11-CC.4 | Add to My Program |
High-Throughput Visual Nano-Drone to Nano-Drone Relative Localization Using Onboard Fully Convolutional Networks |
|
Crupi, Luca | IDSIA USI-SUPSI |
Giusti, Alessandro | IDSIA USI-SUPSI |
Palossi, Daniele | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Micro/Nano Robots, Vision-Based Navigation
Abstract: Relative drone-to-drone localization is a fundamental building block for any swarm operation. We address this task in the context of miniaturized nano-drones, i.e., 10 cm in diameter, which show ever-growing interest due to novel use cases enabled by their reduced form factor. The price for their versatility comes with limited onboard resources, i.e., sensors, processing units, and memory, which limits the complexity of the onboard algorithms. A traditional solution to overcome these limitations is represented by lightweight deep learning models directly deployed aboard nano-drones. This work tackles the challenging relative pose estimation between nano-drones using only a gray-scale low-resolution camera and an ultra-low-power System-on-Chip (SoC) hosted onboard. We present a vertically integrated system based on a novel vision-based fully convolutional neural network (FCNN), which runs at 39 Hz within 101 mW onboard a Crazyflie nano-drone extended with the GWT GAP8 SoC. We compare our FCNN against three State-of-the-Art (SoA) systems. Considering the best-performing SoA approach, our model results in an R^2 improvement from 32 to 47% on the horizontal image coordinate and from 18 to 55% on the vertical image coordinate, on a real-world dataset of 30k images. Finally, our in-field tests show a reduction of the average tracking error of 37% compared to a previous SoA work and an endurance performance up to the entire battery lifetime of 4 min.
|
|
16:30-18:00, Paper TuCT11-CC.5 | Add to My Program |
End-To-End Semi-Supervised 3D Instance Segmentation with PCTeacher |
|
Li, Linfeng | Nanyang Technological University |
Zhao, Na | SUTD |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: 3D instance segmentation is a fundamental and critical task for enabling robots to operate effectively in unstructured 3D environments. In order to address the challenges posed by the high demand for large-scale annotated data and the limited availability of such data in the context of 3D instance segmentation, we study the semi-supervised 3D instance segmentation problem and propose a novel end-to-end framework based on the mean teacher paradigm, named PCTeacher. Our PCTeacher generates both point-level and cluster-level pseudo labels to harness knowledge from unlabeled data. It notably enhances the training stability through end-to-end training and improves pseudo-label quality. Specifically, for point-level pseudo labels, PCTeacher employs a multi-view fusion strategy to achieve higher precision and recall. Regarding cluster-level pseudo labels, it introduces a hybrid grouping strategy to generate more potential proposals and utilizes a point-cluster agreement-based thresholding (PCAT) mechanism to fully exploit cluster-level pseudo labels. By combining and strengthening both point-level and cluster-level pseudo labels, our PCTeacher achieves state-of-the-art performance on two benchmark datasets across multiple labeled data ratios with a more compact network compared to the existing method.
|
|
16:30-18:00, Paper TuCT11-CC.6 |
PAg-NeRF: Towards Fast and Efficient End-To-End Panoptic 3D Representations for Agricultural Robotics |
|
Smitt, Claus | University of Bonn |
Halstead, Michael Allan | Bonn University |
Zimmer, Patrick | University of Bonn |
Läbe, Thomas | University of Bonn |
Guclu, Esra | University of Bonn |
Stachniss, Cyrill | University of Bonn |
McCool, Christopher Steven | University of Bonn |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Agricultural Automation
Abstract: Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF, a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, our system is able to output scene geometry, photo-realistic renders, and 3D consistent panoptic representations with consistent instance IDs. We evaluate this novel system in a very challenging horticultural scenario and in doing so demonstrate an end-to-end trainable system that can make use of noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach, the peak signal-to-noise ratio improves from 21.34 dB to 23.37 dB, while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to improve inference time by more than a factor of 2 while being memory efficient, with approximately 12 times fewer parameters. Code, data and interactive results are available at https://claussmitt.com/pagnerf
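The PSNR figures above follow the standard definition, 10·log10(MAX^2 / MSE), for images with peak value MAX. A minimal sketch of that metric (not the authors' code):

```python
import numpy as np

def psnr(img_a, img_b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((np.asarray(img_a, float) - np.asarray(img_b, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, the roughly 2 dB gain reported above corresponds to a reduction of the mean squared render error by about a factor of 1.6.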
|
|
16:30-18:00, Paper TuCT11-CC.7 |
GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints |
|
Cheng, Anqi | Nanyang Technological University (NTU) |
Yang, Zhiyuan | Nanyang Technological University (NTU) |
Zhu, Haiyue | Agency for Science, Technology and Research (A*STAR) |
Mao, Kezhi | Nanyang Technological University (NTU) |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, RGB-D Perception
Abstract: Self-supervised depth estimation has evolved into an image reconstruction task that minimizes a photometric loss. While recent methods have made strides in indoor depth estimation, they often produce inconsistent depth estimates in textureless areas and unsatisfactory depth discrepancies at object boundaries. To address these issues, we propose GAM-Depth, developed upon two novel components: a gradient-aware mask and semantic constraints. The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions by allocating weights based on gradient magnitudes. The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries, leveraging a co-optimization network and proxy semantic labels derived from a pretrained segmentation model. Experimental studies on three indoor datasets, including NYUv2, ScanNet, and InteriorNet, show that GAM-Depth outperforms existing methods and achieves state-of-the-art performance, signifying a meaningful step forward in indoor depth estimation. Our code will be available at https://github.com/AnqiCheng1234/GAM-Depth.
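The abstract does not give the exact weighting function, but the idea of allocating supervision weights from gradient magnitudes can be sketched as follows; the normalization and the `floor` parameter here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def gradient_aware_weights(image, floor=0.1):
    """Illustrative per-pixel supervision weights from gradient magnitude.

    High-gradient pixels (edges, texture) get weight near 1; textureless
    regions get at least `floor`, so they still receive some supervision.
    GAM-Depth's actual weighting scheme may differ.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    norm = mag / (mag.max() + 1e-8)
    return np.clip(norm, floor, 1.0)
```

On a step-edge image, the edge pixels receive weight close to 1 while flat regions fall back to the floor value.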
|
|
16:30-18:00, Paper TuCT11-CC.8 |
RoboKeyGen: Robot Pose and Joint Angles Estimation Via Diffusion-Based 3D Keypoint Generation |
|
Tian, Yang | Peking University |
Zhang, Jiyao | Peking University |
Huang, Guowei | Huawei Technologies Co., Ltd |
Wang, Bin | Noah's Ark Lab, Huawei |
Wang, Ping | Peking University |
Pang, Jiangmiao | Shanghai AI Laboratory |
Dong, Hao | Peking University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Calibration and Identification
Abstract: Estimating robot pose and joint angles is pivotal in advanced robotics, underpinning applications like robot collaboration and online hand-eye calibration. However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality. Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These approaches often falter in terms of performance or efficiency and grapple with the cross-camera gap problem. This paper presents a novel framework that bifurcates the high-dimensional prediction challenge into two manageable subtasks: detecting 2D keypoints and lifting 2D keypoints to 3D. This separation promises enhanced performance without sacrificing the efficiency innate to keypoint-based techniques. A vital component of our method is the lifting of 2D projections to 3D keypoints. Common deterministic regression methods may falter when faced with uncertainties from 2D detection errors or self-occlusions. Leveraging the robust modeling potential of diffusion models, we reframe this issue as a conditional 3D keypoints generation task. To bolster cross-camera adaptability, we introduce the Normalized Camera Coordinate Space (NCCS), ensuring alignment of estimated 2D keypoints across varying camera intrinsics. Experimental results demonstrate that the proposed method outperforms the state-of-the-art render&compare method RoboPose and achieves higher inference speed. Furthermore, the tests accentuate our method's robust cross-camera generalisation capabilities. We intend to release both the dataset and code.
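The NCCS idea of aligning 2D keypoints across varying camera intrinsics builds on standard pinhole normalization, mapping pixel coordinates through the inverse intrinsic matrix. A sketch of that baseline step (the paper's full NCCS formulation may add further normalization on top):

```python
import numpy as np

def normalize_keypoints(uv, K):
    """Map pixel keypoints (N, 2) to normalized camera coordinates.

    Applies the standard pinhole normalization (u - cx)/fx, (v - cy)/fy,
    which removes the dependence on a specific camera's intrinsics K.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    uv = np.asarray(uv, dtype=float)
    return np.stack([(uv[:, 0] - cx) / fx, (uv[:, 1] - cy) / fy], axis=1)
```

After this mapping, the same 3D ray produces the same normalized coordinate regardless of focal length or principal point, which is what makes cross-camera generalization tractable.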
|
|
16:30-18:00, Paper TuCT11-CC.9 |
Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development |
|
Zhao, Runkai | University of Sydney |
Heng, Yuwen | Baidu ACG |
Wang, Heng | University of Sydney |
Gao, Yuanda | Baidu ACG |
Liu, Shilei | Baidu ACG |
Yao, Changhao | Shanghai Jiao Tong University |
Chen, Jiawen | Baidu ACG |
Cai, Weidong | University of Sydney |
Keywords: Deep Learning for Visual Perception, Visual Learning, Recognition
Abstract: Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making. However, their application in 3D lane detection for effective driving environment perception is hindered by the lack of comprehensive LiDAR datasets. The sparse nature of LiDAR point cloud data prevents an efficient manual annotation process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation. Unlike existing datasets confined to a frontal perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the ego vehicle, capturing complex lane patterns in both urban and highway environments. We leverage the geometric traits of lane lines and the intrinsic spatial attributes of LiDAR data to design a simple yet effective automatic annotation pipeline for generating finer lane labels. To propel future research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird’s Eye View (BEV) based lane identification. Experimental results indicate that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D lane detection task on the K-Lane dataset and our LiSV-3DLane.
|
|
TuCT12-CC Oral Session, CC-503 |
Deep Learning in Grasping and Manipulation III |
|
|
Chair: Watanabe, Tetsuyou | Kanazawa University |
|
16:30-18:00, Paper TuCT12-CC.1 |
STOPNet: Multiview-Based 6-DoF Suction Detection for Transparent Objects on Production Lines |
|
Kuang, Yuxuan | Peking University |
Han, Qin | Peking University |
Li, Danshi | New York University |
Dai, Qiyu | Peking University |
Ding, Lian | Huawei Cloud Computing Technologies Co., Ltd |
Sun, Dong | Huawei Cloud Computing Technologies Co., Ltd |
Zhao, Hanlin | Huawei Cloud Computing Technologies Co., Ltd |
Wang, He | Peking University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry; in contrast, we propose a novel framework, based on multiview stereo, that reconstructs the scene on the production line from RGB input alone. Compared to existing works, our method not only reconstructs the whole 3D scene in order to obtain high-quality 6-DoF suction poses in real time but also generalizes to novel environments, novel arrangements, and novel objects, including challenging transparent objects, both in simulation and the real world. Extensive experiments in simulation and the real world show that our method significantly surpasses the baselines and has better generalizability, which caters to practical industrial needs.
|
|
16:30-18:00, Paper TuCT12-CC.2 |
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation |
|
Vecerik, Mel | DeepMind |
Doersch, Carl | Google DeepMind |
Yang, Yi | Google DeepMind |
Davchev, Todor Bozhinov | DeepMind |
Aytar, Yusuf | DeepMind |
Zhou, Guangyao | Google DeepMind |
Hadsell, Raia | DeepMind |
Agapito, Lourdes | University College London |
Scholz, Jonathan | Google Deepmind |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Perception-Action Coupling
Abstract: For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
|
|
16:30-18:00, Paper TuCT12-CC.3 |
Learning Extrinsic Dexterity with Parameterized Manipulation Primitives |
|
Yang, Shih-Min | Örebro University |
Magnusson, Martin | Örebro University |
Stork, Johannes A. | Orebro University |
Stoyanov, Todor | Örebro University |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Grasping
Abstract: Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98% of experimental trials.
|
|
16:30-18:00, Paper TuCT12-CC.4 |
Learning Active Manipulation to Target Shapes with Model-Free, Long-Horizon Deep Reinforcement Learning |
|
Sivertsvik, Matias | Norwegian University of Science and Technology |
Sumskiy, Kirill | Norwegian University of Science and Technology |
Misimi, Ekrem | SINTEF Ocean |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: We investigate the active manipulation of objects using model-free and long-horizon DRL (Deep Reinforcement Learning) to achieve target shapes. Our proposed approach uses visual observations consisting of segmented images to mitigate the sim-to-real gap. We address a long-horizon manipulation task requiring a sequence of accurate actions to achieve the target shapes, using a robot arm with an RGB-D camera in eye-in-hand configuration and an elongated, volumetric, elastoplastic object. Similar objects are found in the food, marine, and manufacturing domains. The aim is to actively manipulate the object into an arbitrary target shape using image observations. We trained a DRL agent using PPO (Proximal Policy Optimization) by running 768 parallel actors in simulation, for a total of 1.2M environment interactions, and tested this on 200 unseen target deformations. Within three attempts, 82% of the trials achieved greater than 90% overlap with the 200 target shapes. By relying on segmentation images as a visual observation space, we successfully transferred the agent to the real world without supplementary training. Our approach needs neither real-world manipulation examples nor fine-tuning in the real world. The robustness of our approach was demonstrated in simulation and experimentally validated in the real world for specific manipulation tasks, achieving a 94.2% mean zero-shot overlap success rate on previously unseen target shapes.
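The abstract does not define its overlap metric precisely; intersection-over-union between the achieved and target segmentation masks is one common choice, sketched below as an illustration (the paper's exact metric may differ):

```python
import numpy as np

def shape_overlap(mask_a, mask_b):
    """Intersection-over-union of two binary shape masks.

    One common definition of 'overlap' between an achieved and a target
    shape; returns a value in [0, 1].
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty shapes trivially coincide
    return np.logical_and(a, b).sum() / union
```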
|
|
16:30-18:00, Paper TuCT12-CC.5 |
GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects |
|
Yu, Qiaojun | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Liu, Wenhai | Shanghai Jiao Tong University |
Hao, Ce | University of California, Berkeley |
Liu, Liu | Hefei University of Technology |
Shao, Lin | National University of Singapore |
Wang, Weiming | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Learning Categories and Concepts
Abstract: Articulated objects like cabinets and doors are widespread in daily life. However, directly manipulating 3D articulated objects is challenging because they have diverse geometrical shapes, semantic categories, and kinetic constraints. Prior works mostly focused on recognizing and manipulating articulated objects with specific joint types. They can either estimate the joint parameters or distinguish suitable grasp poses to facilitate trajectory planning. Although these approaches have succeeded on certain types of articulated objects, they lack generalizability to unseen objects, which significantly impedes their application in broader scenarios. In this paper, we propose a novel framework, Generalizable Articulation Modeling and Manipulation for Articulated Objects (GAMMA), which learns both articulation modeling and grasp pose affordance from diverse articulated objects across different categories. In addition, GAMMA adopts adaptive manipulation to iteratively reduce modeling errors and enhance manipulation performance. We train GAMMA with the PartNet-Mobility dataset and evaluate it with comprehensive experiments in SAPIEN simulation and on a real-world Franka robot. Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects. Images, videos, and code are published on the project website at: http://sites.google.com/view/gamma-articulation
|
|
16:30-18:00, Paper TuCT12-CC.6 |
Efficient End-To-End Detection of 6-DoF Grasps for Robotic Bin Picking |
|
Liu, Yushi | University Tübingen |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Yu, Zehao | University of Tübingen |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Spies, Markus | Bosch Center for Artificial Intelligence |
Anh Vien, Ngo | Bosch GmbH |
Geiger, Andreas | Max Planck Institute for Intelligent Systems, Tübingen |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Grasping
Abstract: Bin picking is an important building block for many robotic systems, in logistics, production, and household use cases. In recent years, machine learning methods for the prediction of 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches only consider a single ground-truth grasp orientation at a grasp location during training and can therefore only predict limited grasp orientations, which leads to a reduced number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. Thereby, we also consider the grasp uncertainty, enhancing the model's robustness to noisy inputs. As a result, given a single top-down view depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around 90% in simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, the proposed approach exhibits its usability in real robot experiments without any refinement steps, even when only trained on a synthetic dataset, due to the probabilistic grasp distribution modeling.
|
|
16:30-18:00, Paper TuCT12-CC.7 |
Contact Energy Based Hindsight Experience Prioritization |
|
Sayar, Erdi | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
D'Eramo, Carlo | University of Würzburg |
Oguz, Ozgur S. | Bilkent University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Learning from Experience
Abstract: Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms due to the inefficiency in collecting successful experiences. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories and replacing the desired goal with one of the achieved states, so that any failed trajectory can contribute to learning. However, HER chooses failed trajectories uniformly, without taking into account which ones might be the most valuable for learning. In this paper, we address this problem and propose a novel approach, Contact Energy Based Prioritization (CEBP), to select samples from the replay buffer based on the rich information carried by contact, leveraging the touch sensors in the robot's gripper and object displacement. Our prioritization scheme favors sampling of contact-rich experiences, which are arguably the ones providing the largest amount of information. We evaluate our proposed approach on various sparse-reward robotic tasks and compare it with state-of-the-art methods. We show that our method surpasses or performs on par with those methods on robot manipulation tasks. Finally, we deploy the trained policy from our method to a real Franka robot for a pick-and-place task and observe that the robot solves the task successfully. The videos and code are publicly available at: https://erdiphd.github.io/HER_force/
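The two mechanisms the abstract combines, HER's goal relabeling and a contact-based sampling priority, can be sketched as follows. The relabeling follows HER's well-known "final" strategy; the priority function is an illustrative stand-in, since the paper's exact contact-energy computation is not given here:

```python
import numpy as np

def her_relabel(trajectory, reward_fn):
    """Relabel a failed trajectory with its final achieved state as the goal
    (HER 'final' strategy), so the episode yields useful reward signal."""
    new_goal = trajectory[-1]["achieved"]
    return [
        {**step, "goal": new_goal, "reward": reward_fn(step["achieved"], new_goal)}
        for step in trajectory
    ]

def contact_priorities(contact_energies, eps=1e-6):
    """Sampling probabilities proportional to per-trajectory contact energy
    (illustrative stand-in for CEBP's prioritization scheme)."""
    e = np.asarray(contact_energies, dtype=float) + eps
    return e / e.sum()
```

A replay buffer would then draw trajectories with these probabilities instead of uniformly, biasing updates toward contact-rich experience.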
|
|
16:30-18:00, Paper TuCT12-CC.8 |
ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera |
|
Shi, Jun | Samsung Research China – Beijing (SRC-B) |
A, Yong | Samsung Research China – Beijing (SRC-B) |
Jin, Yixiang | Samsung Research China – Beijing (SRC-B) |
Li, Dingzhe | Beihang University |
Niu, Haoyu | University of Chinese Academy of Sciences |
Jin, Zhezhu | Samsung Research Institute China – Beijing (SRC-B) |
Wang, He | Peking University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Grasping
Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This issue is important, yet it remains unsolved within the field of robotics due to the failure of depth cameras to recover their accurate geometry. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo network for transparent object reconstruction, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D based grasp detection methods, which heavily depend on depth restoration networks and the quality of depth maps generated by depth cameras, our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction. We create an extensive synthetic dataset through domain randomization, based on GraspNet-1Billion. Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping in both simulation and the real world via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs. Project page: https://pku-epic.github.io/ASGrasp
|
|
16:30-18:00, Paper TuCT12-CC.9 |
An Offline Learning of Behavior Correction Policy for Vision-Based Robotic Manipulation |
|
Dong, Qingxiuxiong | Toshiba Corporation |
Kaneko, Toshimitsu | Toshiba Corporation |
Sugiyama, Masashi | The University of Tokyo |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Imitation Learning
Abstract: Offline learning usually requires a large dataset for training. In this paper, we focus on vision-based robotic manipulation tasks and utilize certain task properties to achieve offline learning with a small dataset. We propose a two-stage agent consisting of a tentative decision stage and a correction stage, where the tentative decision stage determines a tentative action from the original camera image, and the correction stage determines a correction to the tentative action based on the cropped image according to the tentative action. The correction stage utilizes task properties to obtain the cropped image with task-relevant features, enabling efficient correction. In particular, the training of the two stages can be performed individually, which enables a straightforward application of general offline learning algorithms. We conduct experiments by combining the two-stage agent with conventional offline reinforcement learning and imitation learning algorithms. In both cases, we benchmark the proposed method using RLBench and demonstrate that the task performance is significantly improved by the correction stage.
|
|
TuCT13-AX Oral Session, AX-201 |
Social HRI |
|
|
Chair: Rehm, Matthias | Aalborg University |
Co-Chair: Sheng, Weihua | Oklahoma State University |
|
16:30-18:00, Paper TuCT13-AX.1 |
Training a Non-Cooperator to Identify Vulnerabilities and Improve Robustness for Robot Navigation |
|
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Zhang, Xuyang | University of Science and Technology of China |
Yao, Shunyi | University of Science and Technology of China |
Chen, Yu'an | University of Science and Technology of China |
Chen, Guangda | NetEase |
Hua, Bei | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Social HRI, Reinforcement Learning, Collision Avoidance
Abstract: Autonomous mobile robots have become popular in various applications coexisting with humans, which requires robots to navigate efficiently and safely in crowd environments with diverse pedestrians. Pedestrians may cooperate with the robot by actively avoiding it, or may ignore the robot while walking, while some pedestrians, denoted as non-cooperators, may try to block the robot. It is also challenging to identify potential vulnerabilities of a navigation policy, i.e., situations in which the robot may cause a collision, in various crowd environments, which reduces the reliability and safety of the robot. In this paper, we propose a deep reinforcement learning (DRL) approach to train a policy simulating the behavior of non-cooperators, which can effectively identify vulnerabilities of a navigation policy. We evaluate the approach both on the ROS navigation stack with DWA and on a DRL-based navigation policy, identifying useful vulnerabilities of both navigation policies for further improvements. Moreover, these non-cooperators play a game with the DRL-based navigation policy, and we can then improve the robustness of the navigation policy by retraining it in the sense of asymmetric self-play. We evaluate the retrained navigation policy in various crowd environments with diverse pedestrians. The experimental results show that the approach can improve the robustness of the navigation policy. The source code for the training and the simulation platform is released online at h
|
|
16:30-18:00, Paper TuCT13-AX.2 |
Toward Grounded Commonsense Reasoning |
|
Kwon, Minae | Stanford University |
Hu, Hengyuan | Facebook |
Myers, Vivek | UC Berkeley |
Karamcheti, Siddharth | Stanford University |
Dragan, Anca | University of California Berkeley |
Sadigh, Dorsa | Stanford University |
Keywords: AI-Based Methods, Social HRI, Human-Centered Automation
Abstract: Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not appropriate to disassemble the sports car and put it away as part of the “tidying.” How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable commonsense reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and actively gather information from the environment that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and vision language model (VLM) to help a robot actively perceive its environment to perform grounded commonsense reasoning. To evaluate our framework at scale, we release the MESSYSURFACES dataset which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MESSYSURFACES benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/grounded_commonsense_reasoning/.
|
|
16:30-18:00, Paper TuCT13-AX.3 |
The Effect of Rejection Strategy on Trust and Shopping Choices in Robot-Assisted Shopping |
|
Rehm, Matthias | Aalborg University |
Krummheuer, Antonia | Aalborg University |
Gomez Cubero, Carlos | Aalborg University |
Keywords: Social HRI, Acceptability and Trust, Human-Centered Robotics
Abstract: In this paper, we investigate how a customer-facing service robot can support decision making in shopping interactions. In this role, a robot sometimes needs to reject a customer's choice. We therefore investigate different rejection strategies with the goal of changing customer behavior. The implemented strategies were developed based on an ethnographic study of assisted shopping and tested in a lab experiment with 31 participants. The experiment showed significant differences in trust ratings and decision-making depending on the employed strategy.
|
|
16:30-18:00, Paper TuCT13-AX.4 |
Planning of Explanations for Robot Navigation |
|
Halilovic, Amar | Ulm University |
Krivic, Senka | University of Sarajevo |
Keywords: Social HRI, Human-Centered Robotics, Motion and Path Planning
Abstract: The choices made by autonomous robots in social settings bear consequences for humans and their presumptions of robot behavior. Explanations can serve to alleviate detrimental impacts on humans and amplify their comprehension of robot decisions. We model the process of explanation generation for robot navigation as an automated planning problem considering different possible explanation attributes. Our visual and textual explanations of a robot's navigation are influenced by the robot's personality. Moreover, they account for different contextual, environmental, and spatial characteristics. We present the results of a user study demonstrating that users are more satisfied with multimodal than unimodal explanations. Additionally, our findings reveal low user satisfaction with explanations of a robot with extreme personality traits. In conclusion, we deliberate on potential future research directions and the associated constraints. Our work advocates for fostering socially adept and safe autonomous robot navigation.
|
|
16:30-18:00, Paper TuCT13-AX.5 |
Learning Crowd Behaviors in Navigation with Attention-Based Spatial-Temporal Graphs |
|
Zhou, Yanying | University of Bonn |
Garcke, Jochen | Universität Bonn |
Keywords: Social HRI, Human-Aware Motion Planning, Deep Learning Methods
Abstract: Safe and efficient navigation in dynamic environments shared with humans remains an open and challenging task for mobile robots. Previous works have shown the efficacy of using reinforcement learning frameworks to train policies for efficient navigation. However, their performance deteriorates when crowd configurations change, i.e. become larger or more complex. Thus, it is crucial to fully understand the complex, dynamic, and sophisticated interactions of the crowd resulting in proactive and foresighted behaviors for robot navigation. In this paper, a novel deep graph learning architecture based on attention mechanisms is proposed, which leverages the spatial-temporal graph to enhance robot navigation. We employ spatial graphs to capture the current spatial interactions, and through the integration with RNN, the temporal graphs utilize past trajectory information to infer the future intentions of each agent. The spatial-temporal graph reasoning ability allows the robot to better understand and interpret the relationships between agents over time and space, thereby making more informed decisions. Compared to previous state-of-the-art methods, our method demonstrates superior robustness in terms of safety, efficiency, and generalization in various challenging scenarios.
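The attention mechanism at the core of the spatial graph can be illustrated with standard scaled dot-product attention, in which the robot's query vector attends over per-agent features; this is a generic sketch, not the authors' network:

```python
import numpy as np

def attention_pool(query, keys, values):
    """Scaled dot-product attention over agent features.

    `query` is the robot's feature vector (d,), `keys`/`values` are
    per-agent features (n_agents, d). Returns a weighted summary of the
    agents, with weights given by softmax of the scaled similarities.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)      # (n_agents,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over agents
    return weights @ values                 # attention-weighted summary
```

Agents whose keys align with the robot's query dominate the pooled summary, which is what lets the policy focus on the pedestrians most relevant to its next decision.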
|
|
16:30-18:00, Paper TuCT13-AX.6 | Add to My Program |
Grounding Conversational Robots on Vision through Dense Captioning and Large Language Models |
|
Grassi, Lucrezia | University of Genova |
Hong, Zhouyang | University of Genoa |
Recchiuto, Carmine Tommaso | University of Genova |
Sgorbissa, Antonio | University of Genova |
Keywords: Social HRI, Robot Companions, Natural Dialog for HRI
Abstract: This work explores a novel approach to empowering robots with visual perception capabilities using textual descriptions. Our approach involves the integration of GPT-4 with dense captioning, enabling robots to perceive and interpret the visual world through detailed text-based descriptions. To assess both user experience and the technical feasibility of this approach, experiments were conducted with human participants interacting with a Pepper robot equipped with visual capabilities. The results affirm the viability of the proposed approach, enabling effective vision-based conversations despite processing-time limitations.
|
|
16:30-18:00, Paper TuCT13-AX.7 | Add to My Program |
Exploring the Impact of Narrator Type on Response Latency and Utterance Length During Interactive Storytelling |
|
Bakhoda, Iman | Intelligent Robotics Laboratory, Oakland University, Michigan |
Shahverdi, Pourya | Oakland University, Michigan, USA |
Rousso, Katelyn | Intelligent Robotics Lab, Oakland University, Michigan |
Klotz, Justin | Intelligent Robotics Laboratory, Oakland University, Michigan |
Louie, Wing-Yue Geoffrey | Oakland University |
Keywords: Social HRI, Design and Human Factors, Human-Centered Robotics
Abstract: The inexorable progress of technology has brought forth an era in which robots increasingly integrate into human life, which necessitates an understanding of human-robot interaction (HRI). This study examines HRI within interactive storytelling contexts. Through a between-subject experiment with 28 participants, we assessed response latency and utterance lengths in response to interactive story narrations delivered by either a human or a robot. Findings indicated that participants displayed longer response latency and shorter utterances when interacting with the robot narrator, whereas in the human condition they displayed shorter response latency and longer utterances. These observations suggest significant differences in cognitive and communicative strategies in human-human versus human-robot interactions. The results underscore the challenges and potential of designing social robots that are time-sensitive in interacting with humans. Future explorations should focus on the cognitive and emotional drivers behind these interactions.
|
|
16:30-18:00, Paper TuCT13-AX.8 | Add to My Program |
Design of Embodied Mediator Haru for Remote Cross Cultural Communication |
|
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Szapiro, Deborah | University of Technology Sydney |
Cooper, Sara | Honda Research Institute Japan |
Bougria Sanchez, Nabil | Universidad Pablo De Olavide |
Pérez, Guillermo | 4i Intelligent Insights |
Nichols, Eric | Honda Research Institute Japan |
Giménez-Figueroa, Javier | Universidad Pablo De Olavide |
Perez-Moleron, Jose Manuel | Universidad Pablo De Olavide |
Peavy, Matthew | Universidad Pablo De Olavide |
Serrano, Daniel | Eurecat |
Merino, Luis | Universidad Pablo De Olavide |
Keywords: Social HRI, Human-Robot Collaboration, Design and Human Factors
Abstract: Social robots for children have focused mainly on conventional education domains such as teaching language, science, and math, while applications focusing on the enhancement of cultural competency are quite scarce. In this paper, we present a prototype of a robot-mediation framework for cross-cultural communication. This framework paves the way for a social robot to act as a mediator between groups of schoolchildren from different countries. First, we conducted a participatory design activity with an interdisciplinary team, resulting in the extraction of the design, robot roles, and technical requirements. Based on these requirements, we built the robot-mediation system prototype. We conducted a pilot study using the system with groups of high school children in Japan and Australia and our results show the potential of the system to drive children's interest in communicating, sharing, and discussing cultural themes with their remote peers through the social robot.
|
|
16:30-18:00, Paper TuCT13-AX.9 | Add to My Program |
ChatAdp: ChatGPT-Powered Adaptation System for Human-Robot Interaction |
|
Su, Zhidong | Oklahoma State University |
Sheng, Weihua | Oklahoma State University |
Keywords: Social HRI, Reinforcement Learning
Abstract: Different people have different preferences when it comes to human-robot interaction. Therefore, it is desirable for the robot to adapt its actions to fit users' preferences. Human feedback is essential to facilitating robot adaptation. However, when the task is complex or the robot action space is large, it requires a large amount of user feedback. ChatGPT is a powerful generative AI tool based on large language models (LLMs), which possesses a significant corpus of information obtained from human society and exhibits robust proficiency in the comprehension and acquisition of natural language. Therefore, in this paper, we propose a ChatGPT-powered adaptation system (ChatAdp) for human-robot interaction which requires less user feedback to achieve a good adaptation result. In the proposed ChatAdp, we use ChatGPT as a user simulator to provide feedback. We evaluated ChatAdp in a case study for context-aware conversation adaptation. The results are very promising. Our proposed method can achieve a mean success rate of 92% on the user's natural language-described preferences after receiving 33 rounds of feedback from a user on average, which is only 2% of the number of states covered by the user preferences and outperforms the two baseline methods.
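The simulator-in-the-loop idea can be sketched with a toy adaptation loop: an epsilon-greedy learner receives reward from a simulated user instead of a real one. The stub function below stands in for the LLM-based simulator, and all action names are hypothetical; this is not the authors' system, only an illustration of learning from simulated feedback:

```python
import random

def simulated_user(action):
    """Stub standing in for an LLM-based user simulator: returns positive
    feedback only for the (hypothetical) preferred action."""
    return 1.0 if action == "dim_lights" else 0.0

def adapt(actions, rounds=200, eps=0.1, seed=0):
    """Epsilon-greedy adaptation driven entirely by simulated feedback."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}   # estimated value of each action
    n = {a: 0 for a in actions}     # visit counts
    for _ in range(rounds):
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        r = simulated_user(a)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # incremental mean update
    return max(q, key=q.get)

best = adapt(["dim_lights", "play_music", "open_blinds"])
```

Replacing `simulated_user` with a call to a real language model would reproduce the general shape of the approach, with real user feedback needed only for validation.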
|
|
TuCT14-AX Oral Session, AX-202 |
Add to My Program |
Rehabilitation Robotics |
|
|
Chair: Nakata, Yoshihiro | The University of Electro-Communications |
Co-Chair: Stroppa, Fabio | Kadir Has University |
|
16:30-18:00, Paper TuCT14-AX.1 | Add to My Program |
The Impact of Evolutionary Computation on Robotic Design: A Case Study with an Underactuated Hand Exoskeleton |
|
Akbas, Baris | Kadir Has University |
Soylemez, Aleyna | Kadir Has University |
Yuksel, Huseyin Taner | Kadir Has University |
Zyada, Mazhar Eid | Kadir Has University |
Sarac, Mine | Kadir Has University |
Stroppa, Fabio | Kadir Has University |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Optimization and Optimal Control
Abstract: Robotic exoskeletons can enhance human strength and aid people with physical disabilities. However, designing them to ensure safety and optimal performance presents significant challenges. Developing exoskeletons should incorporate specific optimization algorithms to find the best design. This study investigates the potential of Evolutionary Computation (EC) methods in robotic design optimization, with an underactuated hand exoskeleton (U-HEx) used as a case study. We propose improving the performance and usability of the U-HEx design, which was initially optimized using a naive brute-force approach, by integrating EC techniques such as the Genetic Algorithm and the Big Bang-Big Crunch Algorithm. Comparative analysis revealed that EC methods consistently yield more precise and optimal solutions than brute force in a significantly shorter time. This allowed us to improve the optimization by increasing the number of variables in the design, which was impossible with naive methods. The results show significant improvements in terms of the torque magnitude the device transfers to the user, enhancing its efficiency. These findings underline the importance of performing proper optimization while designing exoskeletons, as well as providing a significant improvement to this specific robotic design.
|
|
16:30-18:00, Paper TuCT14-AX.2 | Add to My Program |
Design & Systematic Evaluation of Power Transmission Efficiency of an Ankle Exoskeleton for Walking Post-Stroke |
|
Cooper, Myles | Harvard University |
Canete, Santiago | Harvard University |
Eckert-Erdheim, Asa | Harvard University |
Kimberley, Aidan | Harvard University |
Siviy, Christopher | Harvard University School of Engineering and Applied Sciences |
Baker, Teresa | Boston University |
Ellis, Terry | Boston University |
Slade, Patrick | Harvard University |
Walsh, Conor James | Harvard University |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Physically Assistive Devices
Abstract: Community-based locomotor training post-stroke has shown improvements in independent ambulation by increasing dose, intensity, and specificity of walking practice. Robotic ankle exoskeletons hold the potential to facilitate continued rehabilitation at home, but understanding what aspects of the design are most relevant for successful translation to the community presents a challenge. Here, we design a portable rigid ankle exoskeleton to use as a research platform for investigating the effect of assistance on post-stroke gait during overground, community-based walking. We first test our device with stroke survivors and validate its potential for future community use. We then present a systematic method for quantifying power transmission losses at each transmission stage from the battery to the wearer, using data gathered from walking trials with healthy participants. Our evaluation method revealed inefficiencies in power transfer at the interface level, likely resulting from the compliance in the structural components of the system, which motivates future redesign considerations. Overall, our method provides a framework to identify and characterize the components that must be redesigned to lower exoskeleton weight and maximize performance.
|
|
16:30-18:00, Paper TuCT14-AX.3 | Add to My Program |
Achieving Mechanical Transparency Using Fusion Hybrid Linear Actuator for Shoulder Flexion and Extension in Exoskeleton Robot |
|
Shimoyama, Takuma | Graduate School of Informatics and Engineering, the University O |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Teramae, Tatsuya | ATR Computational Neuroscience Laboratories |
Nakata, Yoshihiro | The University of Electro-Communications |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics
Abstract: Recently, the importance of mechanical transparency in human-assistive robots has grown. Traditionally, its primary goal was minimizing interaction forces during assistance. However, under this conventional definition, mechanical transparency was not considered when an interaction force was required during assistance. This research focuses on achieving mechanical transparency within the context of shoulder motion in upper extremity exoskeletons for rehabilitation. Our primary goal is maintaining interaction forces at target values, even with motion disturbances. To this end, we developed a shoulder actuation testbed for exoskeletons, incorporating a fusion hybrid linear actuator distinguished by high back-drivability, robust torque generation capability, and safety features. To attain mechanical transparency, we created a model for calculating the required joint torque, accounting for gravitational dynamics, and subsequently determined the necessary actuator output. The system characteristics were evaluated based on the joint torque generated by the actuator. The actuator utilized pneumatic pressure to generate force and compensated for kinetic friction using electromagnetic forces. The results showed that compensation by the electromagnetic force reduced the root-mean-square error of the torque to less than 60% of that with pneumatic pressure alone. This demonstrated the ability to generate consistent torque with high robustness to motion disturbances.
|
|
16:30-18:00, Paper TuCT14-AX.4 | Add to My Program |
Imitation Learning-Based System for the Execution of Self-Paced Robotic-Assisted Passive Rehabilitation Exercises |
|
Escarabajal, Rafael J. | Universidad Politécnica De Valencia |
Pulloquinga, José Luis | Universidad Politécnica De Valencia |
Zamora-Ortiz, Pau | Universitat Politècnica De València |
Valera, Angel | Universidad Politécnica De Valencia |
Mata, Vicente | Universidad Politécnica De Valencia |
Valles, Marina | Universitat Politècnica De València |
Keywords: Rehabilitation Robotics, Imitation Learning, Parallel Robots
Abstract: The development of robotic-assisted rehabilitation exercises involving physical human-robot interaction requires extreme care since an injured limb may be in physical contact with the robot, so compliant behavior is imperative for these tasks. Typical approaches involve force control schemes like admittance controllers that allow humans to adapt the motion. However, when the patient's limb has limited mobility or is potentially injured, unintentional forces may occur during the robot's trajectory that could be incompatible with these controllers. This paper addresses a new way of generating compliant trajectories for passive rehabilitation exercises, considering that previous positions of the trajectory are attainable for the patient, so reversing the trajectory is a safe operation. Since there is no clear way to optimize such a goal due to the physiological variability among patients, the condition of reversal is based on imitation learning by taking the analogous healthy limb of the patient as a reference and encoding the forces using Gaussian Mixture Regression, and reversibility is accomplished by means of Reversible Dynamic Movement Primitives. The system allows for self-paced rehabilitation exercises by back-and-forth movements along the trajectory according to the patient's reaction, and it has been successfully applied to a 4-DOF parallel robot for lower-limb rehabilitation.
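The reversibility idea can be sketched with the standard second-order goal attractor that underlies Dynamic Movement Primitives, omitting the learned forcing term for brevity: reversing a motion amounts to rolling the attractor out again with start and goal swapped. This is a simplified illustration, not the paper's Reversible DMP formulation:

```python
def rollout(y0, g, tau=1.0, alpha=25.0, beta=6.25, dt=0.01, steps=200):
    """DMP transformation system without its forcing term: a critically
    damped attractor that converges from y0 toward the goal g."""
    y, z = y0, 0.0
    traj = [y]
    for _ in range(steps):
        z += dt * alpha * (beta * (g - y) - z) / tau  # z is scaled velocity
        y += dt * z / tau
        traj.append(y)
    return traj

forward = rollout(0.0, 1.0)           # exercise executed forward
backward = rollout(forward[-1], 0.0)  # reversal: swap start and goal
```

In the full method, a forcing term learned from the healthy limb shapes the path, and the reversal is triggered online by the measured interaction forces.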
|
|
16:30-18:00, Paper TuCT14-AX.6 | Add to My Program |
An Adaptable Ankle Trajectory Generation Method for Lower-Limb Exoskeletons by Means of Safety Constraints Computation and Minimum Jerk Planning |
|
Giannattasio, Raffaele | Italian Institute of Technology |
Maludrottu, Stefano | Italian Institute of Technology |
Zinni, Gaia | Istituto Italiano Di Tecnologia |
De Momi, Elena | Politecnico Di Milano |
Laffranchi, Matteo | Istituto Italiano Di Tecnologia |
De Michieli, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Medical Robots and Systems
Abstract: This paper presents a method to compute smooth ankle trajectories for lower limb exoskeletons with powered ankle joints. The proposed approach defines ankle trajectories using four polynomial functions, each representing one of the four primary phases of gait. These polynomials are computed according to different safety constraints. During the single support phase, ground contact constraints are enforced. In the swing phase, an optimization problem is solved to achieve minimum jerk planning while respecting a set of equality and inequality constraints designed to minimize the risk of stumbling. The approach focuses on enabling the ankle joint to smoothly adapt in real-time to different walking styles defined by user-selected gait parameters such as step length and clearance. The primary aim is to improve the user experience by producing a secure and comfortable walking pattern. To validate the effectiveness of the proposed method, the new ankle trajectories were tested on a group of healthy volunteers using the TWIN lower limb exoskeleton.
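For the unconstrained point-to-point case, minimum-jerk planning has a textbook closed form: a quintic polynomial with zero velocity and acceleration at both endpoints. The sketch below shows that baseline profile (the paper's swing-phase planner additionally enforces stumbling-avoidance constraints, which this sketch omits):

```python
def min_jerk(x0, xf, T, t):
    """Minimum-jerk position at time t for a move of duration T:
    zero velocity and acceleration at both endpoints."""
    s = min(max(t / T, 0.0), 1.0)  # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

def min_jerk_velocity(x0, xf, T, t):
    """Analytic velocity of the minimum-jerk profile."""
    s = min(max(t / T, 0.0), 1.0)
    return (xf - x0) / T * (30 * s**2 - 60 * s**3 + 30 * s**4)

# Illustrative ankle swing: 0 to 0.3 rad over a 0.8 s swing phase.
ankle = [min_jerk(0.0, 0.3, 0.8, k * 0.8 / 100) for k in range(101)]
```

The constrained version in the paper replaces this closed form with a jerk-minimizing optimization whose inequality constraints keep the toe clear of the ground.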
|
|
16:30-18:00, Paper TuCT14-AX.7 | Add to My Program |
Controlling FES of Arm Movements Using Physics-Informed Reinforcement Learning Via Co-Kriging Adjustment |
|
Wannawas, Nat | Imperial College London |
Diaz-Pintado, Clara | Imperial College London |
Narayan, Jyotindra | Imperial College / University of Bayreuth |
Faisal, Aldo | Imperial College London |
Keywords: Rehabilitation Robotics, Reinforcement Learning, Model Learning for Control
Abstract: Upper limb paralysis affects quality of life. Functional Electrical Stimulation (FES) offers a solution to restore lost motor functions. Yet, there remain challenges in controlling FES to induce arbitrary arm movements. Reinforcement learning (RL) emerges as a promising method for controlling arm movement with success in simulation. However, challenges remain in translating the successes into real-world settings. One dominant challenge is the sample efficiency of RL. This study presents a practical RL setup to control FES for arm movements. We also present a flexible method, called co-kriging adjustment (CKA), which combines a biomechanical simulator and real data to build an accurate model of the real system. We demonstrate our RL-based control on a 2-DoF planar setting where the subject's arm, placed on a frictionless supporter, is stimulated to perform point-to-point reaching. By using 90 seconds of real interaction data, our RL-based control can perform the reaching with an average error over the workspace of 5.5 cm. Beyond the application of FES, our method can be extended to other control systems, propelling RL towards general uses in the real world.
|
|
16:30-18:00, Paper TuCT14-AX.8 | Add to My Program |
Adaptive Control for Triadic Human-Robot-FES Collaboration in Gait Rehabilitation: A Pilot Study |
|
Christou, Andreas | The University of Edinburgh |
del-Ama, Antonio J. | Rey Juan Carlos University |
Moreno, Juan C. | Cajal Institute, CSIC |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Rehabilitation Robotics, Wearable Robotics, Prosthetics and Exoskeletons
Abstract: The hybridisation of robot-assisted gait training and functional electrical stimulation (FES) can provide numerous physiological benefits to neurological patients. However, the design of an effective hybrid controller poses significant challenges. In this over-actuated system, it is extremely difficult to find the right balance between robotic assistance and FES that will provide personalised assistance, prevent muscle fatigue and encourage the patient’s active participation in order to accelerate recovery. In this paper, we present an adaptive hybrid robot-FES controller to address this challenge and enable triadic collaboration between the patient, the robot and FES. A patient-driven controller is designed where the voluntary movement of the patient is prioritised and assistance is provided using FES and the robot in a hierarchical order depending on the patient’s performance and their muscles’ fitness. The performance of this hybrid adaptive controller is tested in simulation and on one healthy subject. Our results indicate an increase in tracking performance with lower overall assistance, and less muscle fatigue when the hybrid adaptive controller is used, compared to its non-adaptive equivalent. This suggests that our hybrid adaptive controller may be able to adapt to the behaviour of the user to provide assistance as needed and prevent the early termination of physical therapy due to muscle fatigue.
|
|
16:30-18:00, Paper TuCT14-AX.9 | Add to My Program |
Stretch with Stretch: Physical Therapy Exercise Games Led by a Mobile Manipulator |
|
Lamsey, Matthew | Georgia Institute of Technology |
Wells, Meredith | Emory University School of Medicine |
Tan, You Liang | Georgia Institute of Technology |
Beatty, Madeline | Georgia Institute of Technology |
Liu, Zexuan | University of Michigan |
Majumdar, Arjun | Georgia Institute of Technology |
Washington, Kendra | The Georgia Institute of Technology |
Feldman, Jerry | Parkinson's Foundation |
Kuppuswamy, Naveen | Toyota Research Institute |
Nguyen, Elizabeth | Long School of Medicine |
Wallenstein, Arielle | Emory University |
Kemp, Charles C. | Hello Robot Inc |
Hackney, Madeleine Eve | Emory University |
Keywords: Physical Human-Robot Interaction, Rehabilitation Robotics, Human-Centered Robotics
Abstract: Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use in homes and clinics. We present Stretch with Stretch (SWS), a novel robotic system for leading stretching exercise games for older adults with PD. SWS consists of a compact and lightweight mobile manipulator (Hello Robot Stretch RE1) that visually and verbally guides users through PT exercises. The robot's soft end effector serves as a target that users repetitively reach towards and press with a hand, foot, or knee. For each exercise, target locations are customized for the individual via a visually estimated kinematic model, a haptically estimated range of motion, and the person's exercise performance. The system includes sound effects and verbal feedback from the robot to keep users engaged throughout a session and augment physical exercise with cognitive exercise. We conducted a user study for which people with PD (n=10) performed 6 exercises with the system. Participants perceived the SWS to be useful and easy to use. They also reported mild to moderate perceived exertion (RPE).
|
|
TuCT15-AX Oral Session, AX-203 |
Add to My Program |
Intention Recognition |
|
|
Chair: Figueroa, Nadia | University of Pennsylvania |
Co-Chair: Detry, Renaud | KU Leuven |
|
16:30-18:00, Paper TuCT15-AX.1 | Add to My Program |
Subject-Independent Estimation of Continuous Movements Using CNN-LSTM for a Home-Based Upper Limb Rehabilitation System |
|
Li, He | Beijing Institute of Technology |
Guo, Shuxiang | Kagawa University |
Bu, Dongdong | Beijing Institute of Technology |
Wang, Hanze | Beijing Institute of Technology |
Kawanishi, Masahiko | Kagawa University |
Keywords: Intention Recognition, Machine Learning for Robot Control, Rehabilitation Robotics
Abstract: Exoskeleton-assisted home-based rehabilitation plays a vital role in upper limb rehabilitation of stroke patients in the early stage. The surface electromyography (sEMG)-based control can facilitate friendly interactions between individuals and rehabilitation exoskeletons. The exoskeleton can also meet the requirements of home-based rehabilitation, including affordability, portability, safety, and active participation. Although various systems have been proposed to enhance upper limb training, few studies addressed the inter-subject variability of sEMG signals, which limits the generalization capability of the intention estimation model. In this paper, a subject-independent continuous motion estimation method combining convolutional neural networks (CNN) and long short-term memory (LSTM) is proposed, which is applied to a home-based bilateral training system. The sEMG-driven CNN-LSTM model builds the relationship between sEMG signals and continuous movements. To verify the effectiveness of the CNN-LSTM model in achieving subject-independent estimation, the offline estimation results of the backpropagation neural network, CNN, and CNN-LSTM are compared. Moreover, the online intention estimation and the real-time control are performed, and the estimation angle error and time delay are controlled at approximately 10° and 300 ms, proving the feasibility of the subject-independent estimation method and its availability in the upper-limb rehabilitation system.
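A common first step in sEMG-driven estimators of this kind is sliding-window feature extraction over the raw signal before it reaches the network; the sketch below shows windowed root-mean-square (RMS) amplitude, a standard sEMG feature. This is a generic preprocessing illustration, not the authors' pipeline, and the window sizes are illustrative:

```python
import math

def rms_windows(signal, win, step):
    """Sliding-window RMS features from one sEMG channel: one scalar
    amplitude estimate per window, in temporal order."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append(math.sqrt(sum(x * x for x in w) / win))
    return feats

# 1 s of a constant-amplitude test signal at 1 kHz:
# 200-sample windows with a 50-sample step yield overlapping features.
sig = [0.5] * 1000
feats = rms_windows(sig, win=200, step=50)
```

In a CNN-LSTM pipeline, stacks of such per-channel window features would form the input sequence that the recurrent layers map to continuous joint angles.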
|
|
16:30-18:00, Paper TuCT15-AX.2 | Add to My Program |
Robot Trajectron: Trajectory Prediction-Based Shared Control for Robot Manipulation |
|
Song, Pinhao | KU Leuven |
Li, Pengteng | Shenzhen University |
Aertbelien, Erwin | KU Leuven |
Detry, Renaud | KU Leuven |
Keywords: Intention Recognition, Probabilistic Inference, Deep Learning Methods
Abstract: We address the problem of (a) predicting the trajectory of an arm reaching motion, based on a few seconds of the motion's onset, and (b) leveraging this predictor to facilitate shared-control manipulation tasks, easing the cognitive load of the operator by assisting them in their anticipated direction of motion. Our novel intent estimator, dubbed the Robot Trajectron (RT), produces a probabilistic representation of the robot's anticipated trajectory based on its recent position, velocity and acceleration history. Taking arm dynamics into account allows RT to capture the operator's intent better than other SOTA models that only use the arm's position, making it particularly well-suited to assist in tasks where the operator's intent is susceptible to change. We derive a novel shared-control solution that combines RT's predictive capacity with a representation of the locations of potential reaching targets. Our experiments demonstrate RT's effectiveness in both intent estimation and shared-control tasks. We will make the code and data supporting our experiments publicly available at https://gitlab.kuleuven.be/detry-lab/public/robot-trajectron.
|
|
16:30-18:00, Paper TuCT15-AX.3 | Add to My Program |
On the Feasibility of EEG-Based Motor Intention Detection for Real-Time Robot Assistive Control |
|
Choi, Ho Jin | University of Pennsylvania |
Das, Satyajeet | University of Pennsylvania |
Peng, Shaoting | University of Pennsylvania |
Bajcsy, Ruzena | Univ of California, Berkeley |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Brain-Machine Interfaces, Intention Recognition, Physically Assistive Devices
Abstract: This paper explores the feasibility of employing EEG-based intention detection for real-time robot assistive control. We focus on predicting and distinguishing motor intentions of left/right arm movements by presenting: i) an offline data collection and training pipeline, used to train a classifier for left/right motion intention prediction, and ii) an online real-time prediction pipeline leveraging the trained classifier and integrated with an assistive robot. Central to our approach is a rich feature representation composed of the tangent space projection of time-windowed sample covariance matrices from EEG filtered signals and derivatives; allowing for a simple SVM classifier to achieve unprecedented accuracy and real-time performance. In pre-recorded real-time settings (160 Hz), a peak accuracy of 86.88% is achieved, surpassing prior works. In robot-in-the-loop settings, our system successfully detects intended motion solely from EEG data with 70% accuracy, triggering a robot to execute an assistive task. We provide a comprehensive evaluation of the proposed classifier.
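The tangent-space feature map can be illustrated for a 2×2 covariance matrix: take the matrix logarithm of the symmetric positive-definite (SPD) matrix (here projected at the identity reference point for brevity; the full method whitens by a reference covariance first) and vectorize its upper triangle with the usual √2 weighting on the off-diagonal. A simplified sketch of the standard Riemannian feature, not the authors' exact pipeline:

```python
import math

def logm_spd2(a, b, c):
    """Matrix logarithm of the 2x2 SPD matrix [[a, b], [b, c]] via
    eigendecomposition: V diag(log l) V^T."""
    if abs(b) < 1e-12:  # already diagonal: log acts entrywise
        return [[math.log(a), 0.0], [0.0, math.log(c)]]
    mean = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
    l1, l2 = mean + rad, mean - rad          # eigenvalues, l1 >= l2 > 0
    n = math.hypot(b, l1 - a)
    u, v = b / n, (l1 - a) / n               # unit eigenvector for l1
    m11 = math.log(l1) * u * u + math.log(l2) * v * v
    m12 = (math.log(l1) - math.log(l2)) * u * v
    m22 = math.log(l1) * v * v + math.log(l2) * u * u
    return [[m11, m12], [m12, m22]]

def tangent_features(cov):
    """Upper-triangle vectorization with sqrt(2) off-diagonal weighting:
    the flat feature vector a linear classifier (e.g. SVM) consumes."""
    m = logm_spd2(cov[0][0], cov[0][1], cov[1][1])
    return [m[0][0], math.sqrt(2) * m[0][1], m[1][1]]

feat = tangent_features([[2.0, 1.0], [1.0, 2.0]])
```

With more channels the same recipe applies to the n×n sample covariance of each EEG window, producing n(n+1)/2 features per window.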
|
|
16:30-18:00, Paper TuCT15-AX.4 | Add to My Program |
Microexpression to Macroexpression: Facial Expression Magnification by Single Input |
|
Song, Yaqi | Southwest University |
Chen, Tong | Southwest University |
Li, Shigang | Hiroshima City University |
Li, Jianfeng | Southwest University |
Keywords: Gesture, Posture and Facial Expressions, Intention Recognition
Abstract: Microexpressions are expressions that people inadvertently express, and therefore often represent a person's true emotion. However, because they have low intensity and short duration, they are hard to recognize correctly. In this paper, we propose a deep learning magnification method to generate macroexpressions from a single microexpression image. In the first stage, we extract the expression information from a single microexpression image. Then, we combine the ideas of CycleGAN and optical-flow consistency to model the extracted expression features as the optical flow field between the neutral face and microexpressions. To extract a reliable optical flow field from the expression information, we design an optical flow refiner. In the second stage, we adopt an encoder-decoder network and let it learn to magnify the optical flow. Finally, the magnified optical flow guides the microexpression images to generate macroexpression images. We compare our single-input-based network with current two-frame-input-based networks. The results show that our method performs better, even on wild images. We fed our magnified images directly into a simple ResNet18 network for recognition, achieving a competitive score under the MEGC2019 standard, compared with recent complex recognition networks.
|
|
16:30-18:00, Paper TuCT15-AX.5 | Add to My Program |
Looking Inside Out: Anticipating Driver Intent from Videos |
|
Kung, Yung-Chi | The University of Texas at Austin |
Zhang, Arthur | University of Texas at Austin |
Wang, Junmin | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Intention Recognition, Intelligent Transportation Systems, Human-Robot Collaboration
Abstract: Anticipating driver intention is an important task when vehicles of mixed and varying levels of human/machine autonomy share roadways. Driver intention can be leveraged to improve road safety, such as warning surrounding vehicles in the event the driver is attempting a dangerous maneuver. In this work, we propose a novel method of utilizing both in-cabin and external camera data to improve state-of-the-art performance in predicting future driver actions. Compared to existing methods, our approach explicitly extracts object and road-level features from external camera data, which we demonstrate are important features for predicting driver intention. Using our handcrafted features as inputs for both a transformer and a long short-term memory (LSTM)-based architecture, we empirically show that jointly utilizing in-cabin and external features improves performance compared to using in-cabin features alone. Furthermore, our models predict driver maneuvers more accurately and sooner than existing approaches, with an accuracy of 87.5% and an average prediction time of 4.35 seconds before the maneuver takes place. We release our model configurations and training scripts on https://github.com/ykung83/Driver-Intent-Prediction.
|
|
16:30-18:00, Paper TuCT15-AX.6 | Add to My Program |
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots |
|
Rivkin, Dmitriy | None |
Kakodkar, Nikhil Rajiv | McGill University |
Hogan, Francois | Massachusetts Institute of Technology |
Hamed Baghi, Bobak | Unaffiliated |
Dudek, Gregory | McGill University |
Keywords: Intention Recognition, AI-Enabled Robotics, Semantic Scene Understanding
Abstract: This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics. Unlike most prior work where navigation directives are provided as simple imperative commands (e.g., "go to the fridge"), we examine implicit directives obtained through conversational interactions. We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types. We demonstrate that a robot using our method CARTIER (Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots) can parse descriptive language queries up to 42% more reliably than existing LLM-enabled methods by exploiting the ability of LLMs to interpret the user interaction in the context of the objects in the scenario.
|
|
16:30-18:00, Paper TuCT15-AX.7 | Add to My Program |
A Novel Hybrid Unsupervised Domain Adaptation Method for Cross-Subject Joint Angle Estimation from Surface Electromyography |
|
Wang, Long | Xi'an Jiaotong University |
Li, Xiaoling | Xi'an Jiaotong University |
Chen, Zhangyi | Xi'an Jiaotong University |
Sun, Zhipeng | Xi'an Jiaotong University |
Xue, Jingyi | Xi'an Jiaotong University |
Zhang, Shiwen | Xi’an Jiaotong University |
Sun, Wei | Xi'an Jiaotong University |
Chen, Guimin | Xi'an Jiaotong University |
Sun, Jiajia | Xi'an Jiaotong University |
Keywords: Intention Recognition, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: Individual physiological differences constrain the cross-user application of joint angle estimation models based on surface electromyography (sEMG) signals. Current cross-user methods for myoelectric joint angle estimation often involve the use of angle or optical sensors, which increases the training burden and costs for new users. To enable new users to perform joint angle estimation solely using sEMG sensors, this study proposes a hybrid unsupervised domain adaptation (UDA) network that combines multi-order metric and adversarial mechanisms (MADAN). MADAN aims to minimize the distribution differences of sEMG signals between the source and target subjects by first aligning the domain distributions and then increasing domain confusion. The effectiveness of MADAN is validated through cross-subject wrist joint angle estimation involving 10 subjects, with an estimation frequency of 20 Hz. The results demonstrate that MADAN achieves a significantly higher average coefficient of determination (0.8688 ± 0.0307) than other advanced UDA cross-subject estimation methods, such as TCDA (0.6534 ± 0.234) and ADANN (0.6655 ± 0.2255). Notably, MADAN requires only 20 seconds of unlabeled samples from the target subject for training. This work is expected to alleviate the cost and training burden for new users performing myoelectric continuous motion estimation.
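For reference, the coefficient of determination used above to report estimation quality can be computed as follows. This is a generic sketch of the standard R² definition, not the authors' evaluation code, and the example angle values are invented:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Perfect prediction gives R^2 = 1; always predicting the mean gives R^2 = 0.
angles = np.array([10.0, 20.0, 30.0, 40.0])
print(r_squared(angles, angles))                     # 1.0
print(r_squared(angles, np.full(4, angles.mean())))  # 0.0
```

An R² of 0.8688 thus means the estimator explains about 87% of the variance in the measured joint angle.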
|
|
16:30-18:00, Paper TuCT15-AX.8 | Add to My Program |
A Novel Benchmarking Paradigm and a Scale and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction |
|
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Intention Recognition, Performance Evaluation and Benchmarking, Intelligent Transportation Systems
Abstract: In this paper, we present a new paradigm for evaluating egocentric pedestrian trajectory prediction algorithms. Based on various contextual information, we extract driving scenarios for a meaningful and systematic approach to identifying challenges for prediction models. In this regard, we also propose a new metric for more effective ranking within the scenario-based evaluation. We conduct extensive empirical studies of existing models on these scenarios to expose shortcomings and strengths of different approaches. The scenario-based analysis highlights the importance of using multimodal sources of information and challenges caused by inadequate modeling of ego-motion and scale of pedestrians. To this end, we propose a novel egocentric trajectory prediction model that benefits from multimodal sources of data fused in an effective and efficient step-wise hierarchical fashion and two auxiliary tasks designed to learn a more robust representation of scene dynamics. We conduct empirical evaluation on common benchmark datasets and show that our model not only achieves state-of-the-art performance, but also significantly improves performance by up to 39% in challenging scenarios, such as high ego-speed, compared to prior art.
|
|
16:30-18:00, Paper TuCT15-AX.9 | Add to My Program |
A 3D Vector Field and Gaze Data Fusion Framework for Hand Motion Intention Prediction in Human-Robot Collaboration |
|
Jayasuriya, Maleen | University of Technology Sydney |
Hu, Gibson | University of Technology, Sydney |
Le, Dinh Dang Khoa | The University of Technology Sydney |
Ang, Karyne | UTS Robotics Institute |
Sankaran, Shankar | UTS Robotics Institute |
Liu, Dikai | University of Technology, Sydney |
Keywords: Intention Recognition, Human-Robot Collaboration, Multi-Modal Perception for HRI
Abstract: In human-robot collaboration (HRC) settings, hand motion intention prediction (HMIP) plays a pivotal role in ensuring prompt decision-making, safety, and an intuitive collaboration experience. Precise and robust HMIP with low computational resources remains a challenge due to the stochastic nature of hand motion and the diversity of HRC tasks. This paper proposes a framework that combines hand trajectories and gaze data to foster robust, real-time HMIP with minimal to no training. A novel 3D vector field method is introduced for hand trajectory representation, leveraging minimum jerk trajectory predictions to discern potential hand motion endpoints. This is statistically combined with gaze fixation data using a weighted Naive Bayes Classifier (NBC). Acknowledging the potential variances in saccadic eye motion due to factors like fatigue or inattentiveness, we incorporate stationary gaze entropy to gauge visual concentration, thereby adjusting the contribution of gaze fixation to the HMIP. Empirical experiments substantiate that the proposed framework robustly predicts intended endpoints of hand motion before at least 50% of the trajectory is completed. It also successfully exploits gaze fixations when the human operator is attentive and mitigates its influence when the operator loses focus. A real-time implementation in a construction HRC scenario (collaborative tiling) showcases the intuitive nature and potential efficiency gains to be leveraged by introducing the proposed HMIP into HRC contexts. The open-source implementation of the framework is made available at https://github.com/maleenj/hmip_ros.git.
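The minimum-jerk prediction idea can be illustrated with the standard fifth-order minimum-jerk profile. Scoring candidate endpoints against an observed partial trajectory, as below, is a deliberately simplified 1-D stand-in for the paper's 3D vector field and Bayesian gaze fusion; all numbers are invented:

```python
import numpy as np

def min_jerk(x0, xf, T, t):
    """Classic minimum-jerk position profile from x0 to xf over duration T."""
    tau = np.clip(t / T, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s

def endpoint_error(observed, t_obs, x0, candidate, T):
    """Mean squared mismatch between the observed partial trajectory and a
    minimum-jerk motion toward a candidate endpoint (smaller is better)."""
    pred = min_jerk(x0, candidate, T, t_obs)
    return float(np.mean((observed - pred) ** 2))

x0, xf, T = 0.0, 1.0, 2.0
t = np.linspace(0.0, 1.0, 20)            # first 50% of the motion
obs = min_jerk(x0, xf, T, t)             # synthetic observation
errs = {c: endpoint_error(obs, t, x0, c, T) for c in (0.5, 1.0, 1.5)}
print(min(errs, key=errs.get))           # the true endpoint, 1.0
```

This matches the abstract's claim in spirit: a minimum-jerk hypothesis can discriminate endpoints well before the trajectory is complete.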
|
|
TuCT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing III |
|
|
Chair: Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Co-Chair: Liu, Jialun | The University of Sheffield |
|
16:30-18:00, Paper TuCT16-AX.1 | Add to My Program |
Contrastive Learning-Based Attribute Extraction Method for Enhanced Terrain Classification |
|
Liu, Xiao | Harbin Institute of Technology, Shenzhen |
Chen, Hongjin | Harbin Institute of Technology Shenzhen |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Force and Tactile Sensing, Field Robots, Representation Learning
Abstract: The outdoor environment has many uneven surfaces that put the robot at risk of sinking or tipping over. Recognizing the type of terrain can help the robot avoid risks and choose an appropriate gait. One of the critical problems is how to extract terrain-related knowledge from sensor data collected as the robot traverses the ground. Many existing vision-based approaches are limited in directly perceiving the intrinsic properties of various terrains. The intuitive approach entails directly analyzing data recorded by the robot's proprioceptive sensors. However, it faces challenges in being specific to certain robot leg configurations or in the lack of interpretability of the extracted features. In this paper, a terrain attribute extraction algorithm based on contrastive learning is proposed. It leverages the haptic data generated from the interaction between the robot's legs and the terrain to automatically extract terrain attributes. The results demonstrate that the attributes extracted using this method strongly correlate with the actual softness of the terrain. Furthermore, these attributes play an important role in achieving high accuracy in terrain classification tasks.
|
|
16:30-18:00, Paper TuCT16-AX.2 | Add to My Program |
Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing |
|
Liu, Yun | Tsinghua University |
Xu, Xiaomeng | Stanford University |
Chen, Weihang | Tsinghua University |
Yuan, Haocheng | Northwestern Polytechnical University |
Wang, He | Peking University |
Xu, Jing | Tsinghua University |
Chen, Rui | Tsinghua University |
Yi, Li | Tsinghua University |
Keywords: Force and Tactile Sensing, Sensor Fusion, Visual Tracking
Abstract: When manipulating an object to accomplish complex tasks, humans rely on both vision and touch to keep track of the object's 6D pose. However, most existing object pose tracking systems in robotics rely exclusively on visual signals, which hinders a robot's ability to manipulate objects effectively. To address this limitation, we introduce TEG-Track, a tactile-enhanced 6D pose tracking system that can track previously unseen objects held in hand. From consecutive tactile signals, TEG-Track optimizes object velocities from marker flows when slippage does not occur, or regresses velocities using a slippage estimation network when slippage is detected. The estimated object velocities are integrated into a geometric-kinematic optimization scheme to enhance existing visual pose trackers. To evaluate our method and to facilitate future research, we construct a real-world dataset for visual-tactile in-hand object pose tracking. Experimental results demonstrate that TEG-Track consistently enhances state-of-the-art generalizable 6D pose trackers in synthetic and real-world scenarios. Our code and dataset are available at https://github.com/leolyliu/TEG-Track.
|
|
16:30-18:00, Paper TuCT16-AX.3 | Add to My Program |
UnfoldIR: Tactile Robotic Unfolding of Cloth |
|
Proesmans, Remko | Ghent University |
Verleysen, Andreas | Ghent University |
Wyffels, Francis | Ghent University |
Keywords: Force and Tactile Sensing, Sensor-based Control, Dual Arm Manipulation
Abstract: Robotic unfolding of cloth is challenging due to the wide range of textile materials and their ability to deform in unpredictable ways. Previous work has focused almost exclusively on visual feedback to solve this task. We present UnfoldIR ("unfolder"), a dual-arm robotic system relying on infrared (IR) tactile sensing and cloth manipulation heuristics to achieve in-air unfolding of randomly crumpled rectangular textiles by means of edge tracing. The system achieves >85% coverage on multiple textiles of different sizes and textures. After unfolding, at least three corners are visible in 83.3% to 94.7% of cases. Given these strong "tactile-only" results, we argue that the fusion of both tactile and visual sensing can bring cloth unfolding to a new level of performance.
|
|
16:30-18:00, Paper TuCT16-AX.4 | Add to My Program |
3D Force and Contact Estimation for a Soft-Bubble Visuotactile Sensor Using FEM |
|
Peng, Jing-Chen | University of Illinois at Urbana-Champaign |
Yao, Shaoxiong | University of Illinois Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: Soft-bubble tactile sensors have the potential to capture dense contact and force information across a large contact surface. However, it is difficult to extract contact forces directly from observing the bubble surface because local contacts change the global surface shape significantly due to membrane mechanics and air pressure. This paper presents a model-based method of reconstructing dense contact forces from the bubble sensor's internal RGBD camera and air pressure sensor. We present a finite element model of the force response of the bubble sensor that uses a linear plane stress approximation that only requires calibrating 3 variables. Our method is shown to reconstruct normal and shear forces significantly more accurately than the state-of-the-art, with comparable accuracy for detecting the contact patch, and with very little calibration data.
|
|
16:30-18:00, Paper TuCT16-AX.5 | Add to My Program |
A Detachable FBG-Based Contact Force Sensor for Capturing Gripper-Vegetable Interactions |
|
Lai, Wenjie | Nanyang Technological University |
Liu, Jiajun | Nanyang Technological University |
Sim, Bing Rui | Nanyang Technological University |
Tan, Ming Rui Joel | Nanyang Technological University |
Hegde, Chidanand | Nanyang Technological University, Singapore |
Magdassi, Shlomo | Hebrew University of Jerusalem |
Phee, Louis | Nanyang Technological University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Agricultural Automation
Abstract: Vertical farming, a sustainable approach to urban agriculture, has garnered attention for its land use optimization and enhanced food production capabilities. The adoption of automation in vertical farming is a pivotal response to labor shortages, addressing the need for increased efficiency, particularly in labor-intensive tasks like harvesting. Although soft robotic grippers offer significant promise for delicately handling fragile objects, the absence of sensors has hindered their full potential to execute precise and secure grasping. To address this challenge, we present a new solution: a detachable Fiber Bragg Grating (FBG)-based flexible contact force sensor to capture gripper-vegetable interactions. The sensing module was 3D printed using soft material, and the FBG fiber was attached to the module using epoxy. In evaluation tests, this lightweight sensor demonstrated a wide measurement range of up to 9.87 N, a high sensitivity of 141.7 pm/N, good repeatability, and a hysteresis of 7.96%. Compared to commercial load cells, our sensor achieves a small measurement RMSE of 0.41 N and a percentage error of 4.15%. The sensor was integrated into two 3D-printed soft robotic grippers to enable real-time monitoring of dynamic contact force during vegetable harvesting in vertical farming scenarios. By reflecting contact status, this sensor provides a promising glimpse into the future of agricultural automation, enhancing operational efficiency and strengthening situational awareness and decision-making capabilities in vertical farms. Beyond agriculture, the versatility of this sensor extends to applications in areas such as warehousing, logistics, and the food and beverage industry.
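Using the calibration figures reported above (141.7 pm/N sensitivity, 9.87 N range), a linear FBG force readout can be sketched as follows. The linearity assumption and the sample wavelength shifts are illustrative, not taken from the paper:

```python
def force_from_shift(delta_lambda_pm, sensitivity_pm_per_N=141.7):
    """Linear FBG calibration: contact force (N) from the Bragg wavelength
    shift (pm), using the sensitivity reported in the abstract."""
    return delta_lambda_pm / sensitivity_pm_per_N

# A 141.7 pm shift maps to 1 N; roughly 1399 pm corresponds to the
# sensor's 9.87 N upper range.
print(round(force_from_shift(141.7), 3))   # 1.0
print(round(force_from_shift(1398.6), 2))  # 9.87
```

In practice an FBG readout would also compensate for temperature-induced wavelength shifts, which this one-parameter sketch ignores.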
|
|
16:30-18:00, Paper TuCT16-AX.6 | Add to My Program |
SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear |
|
Song, Ziwu | Tsinghua University |
Yu, Ran | Tsinghua University |
Zhang, Xuan | Tsinghua University |
Sou, Kit Wa | Tsinghua University |
Mu, Shilong | Tsinghua University |
Peng, Dengfeng | Shenzhen University |
Zhang, Xiao-Ping | Ryerson University |
Ding, Wenbo | Tsinghua University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Multi-Modal Perception for HRI
Abstract: Most vision-based tactile sensors use elastomer deformation to infer tactile information, which cannot sense some modalities, like temperature. As an important part of human tactile perception, temperature sensing can help robots better interact with the environment. In this work, we propose a novel multi-modal vision-based tactile sensor, SATac, which can simultaneously perceive information on temperature, pressure, and shear. SATac utilizes the thermoluminescence of strontium aluminate to sense a wide range of temperatures with exceptional resolution. Additionally, the pressure and shear can also be perceived by analyzing the Voronoi diagram. A series of experiments are conducted to verify the performance of our proposed sensor. We also discuss the possible application scenarios and demonstrate how SATac could benefit robot perception capabilities.
|
|
16:30-18:00, Paper TuCT16-AX.7 | Add to My Program |
Optimizing Multi-Touch Textile and Tactile Skin Sensing through Circuit Parameter Estimation |
|
Su, Bo Ying | Carnegie Mellon University |
Wu, Yuchen | Carnegie Mellon University |
Wen, Chengtao | Siemens |
Liu, Changliu | Carnegie Mellon University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Touch in HRI
Abstract: Tactile and textile skin technologies have become increasingly important for enhancing human-robot interaction and allowing robots to adapt to different environments. Despite notable advancements, there are ongoing challenges in skin signal processing, particularly in achieving both accuracy and speed in dynamic touch sensing. This paper introduces a new framework that poses the touch sensing problem as an estimation problem for resistive sensor arrays. Utilizing a Regularized Least Squares objective function, which estimates the resistance distribution of the skin, we enhance touch sensing accuracy and mitigate ghosting effects, where false or misleading touches may be registered. Furthermore, our study presents a streamlined skin design that simplifies manufacturing processes without sacrificing performance. Experimental outcomes substantiate the effectiveness of our method, showing a 26.9% improvement in multi-touch force-sensing accuracy for the tactile skin.
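A regularized least squares estimate of the kind the abstract describes can be sketched as a ridge regression. The readout matrix and touch pattern below are synthetic stand-ins, not the paper's circuit model of the skin:

```python
import numpy as np

def regularized_least_squares(A, b, lam=1e-3):
    """Ridge solution of x = argmin ||Ax - b||^2 + lam * ||x||^2,
    via the normal equations (A^T A + lam I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Toy 2x2 taxel grid: recover per-taxel values from 8 aggregate readouts.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))            # stand-in readout model
x_true = np.array([0.0, 1.0, 0.0, 0.5])    # two simultaneous touches
b = A @ x_true                             # noiseless measurements
x_hat = regularized_least_squares(A, b, lam=1e-6)
print(np.round(x_hat, 3))                  # ~ [0.  1.  0.  0.5]
```

The regularizer is what suppresses spurious "ghost" solutions when the readout matrix is ill-conditioned, which is the gist of the accuracy improvement claimed above.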
|
|
16:30-18:00, Paper TuCT16-AX.8 | Add to My Program |
CushSense: Soft, Stretchable, and Comfortable Tactile Sensing Skin for Physical Human-Robot Interaction |
|
Xu, Boxin | Cornell University |
Zhong, Luoyan | Cornell University |
Zhang, Grace | Cornell University |
Liang, Xiaoyu | Cornell University |
Virtue, Diego | Cornell University |
Madan, Rishabh | Cornell University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Force and Tactile Sensing, Touch in HRI, Physical Human-Robot Interaction
Abstract: Whole-arm tactile feedback is crucial for robots to ensure safe physical interaction with their surroundings. In this paper, we introduce CushSense, a fabric-based soft and stretchable tactile-sensing skin designed for physical human-robot interaction (pHRI) tasks such as robotic caregiving. Utilizing a combination of stretchable fabric and hyper-elastic polymer, CushSense identifies contacts by monitoring capacitive changes due to skin deformation. CushSense is cost-effective and easy to fabricate. We detail the sensor design and fabrication process, provide characterization results showcasing its sensing proficiency, and present a user study underscoring its perceived safety and comfort for the assistive task of limb manipulation. We open source all sensor-related resources on our project website.
|
|
16:30-18:00, Paper TuCT16-AX.9 | Add to My Program |
Augmenting Tactile Simulators with Real-Like and Zero-Shot Capabilities |
|
Azulay, Osher | Tel Aviv University |
Mizrahi, Alon | Tel-Aviv University |
Curtis, Nimrod | Tel-Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Force and Tactile Sensing, Transfer Learning, Perception for Grasping and Manipulation
Abstract: Simulating tactile perception could potentially leverage the learning capabilities of robotic systems in manipulation tasks. However, the reality gap of simulators for high-resolution tactile sensors remains large. Models trained on simulated data often fail in zero-shot inference and require fine-tuning with real data. In addition, work on high-resolution sensors commonly focuses on those with flat surfaces, while 3D round sensors are essential for dexterous manipulation. In this paper, we propose a bi-directional Generative Adversarial Network (GAN) termed SightGAN. SightGAN builds on the original CycleGAN while adding two loss components aimed at accurately reconstructing background and contact patterns, including small contact traces. The proposed SightGAN learns real-to-sim and sim-to-real processes over difference images. It is shown to generate real-like synthetic images while maintaining accurate contact positioning. The generated images can be used to train zero-shot models for newly fabricated sensors. Consequently, the resulting sim-to-real generator could be built on top of the tactile simulator to provide a real-world framework. Potentially, the framework can be used to train, for instance, reinforcement learning policies for manipulation tasks. The proposed model is verified in extensive experiments with test data collected from real sensors and is also shown to maintain embedded force information within the tactile images.
|
|
TuCT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots III |
|
|
Chair: Yan, Cong | Ritsumeikan University |
Co-Chair: Lin, Pei-Chun | National Taiwan University |
|
16:30-18:00, Paper TuCT17-AX.1 | Add to My Program |
Investigating Stability Outcomes across Diverse Gait Patterns in Quadruped Robots: A Comparative Analysis |
|
Ju, Zhongjin | Yanshan University |
Wei, Ke | Yanshan University |
Jin, Lei | Yanshan University |
Xu, Yundou | Parallel Robot and Mechatronic System Laboratory of Hebei Province |
Keywords: Legged Robots, Methods and Tools for Robot System Design, Motion Control
Abstract: Quadruped robots have gained attention for their potential to navigate various terrains. However, the stability of these robots under different gait sequences remains an open question. This study investigates the relationship between gait sequences and the motion stability of quadruped robots, assuming flat terrain for the purposes of the analysis. Utilizing mathematical models based on screw theory, we examine the stability margins associated with different leg movement sequences. Notably, our findings confirm that the sequence most commonly observed in both natural and robotic contexts indeed offers optimal stability. The study also scrutinizes the influence of the robot's structural parameters and gait configuration on its motion stability. These results provide a theoretical foundation for the design and stability control of quadruped robots, setting the stage for future work on more complex terrains.
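As a concrete illustration of the kind of quantity such an analysis evaluates, the classic static stability margin is the minimum distance from the center of mass's ground projection to the edges of the support polygon. This toy sketch uses made-up foot positions and is far simpler than the paper's formulation:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Euclidean distance from 2D point p to segment ab."""
    p, a, b = map(np.asarray, (p, a, b))
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def stability_margin(com_xy, support_polygon):
    """Minimum distance from the CoM ground projection to the support
    polygon edges; assumes com_xy lies inside the (ordered) polygon."""
    n = len(support_polygon)
    return min(point_segment_dist(com_xy, support_polygon[i],
                                  support_polygon[(i + 1) % n])
               for i in range(n))

# Three feet in stance form a support triangle; CoM at its centroid.
feet = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
com = (1.0, 2.0 / 3.0)
print(round(stability_margin(com, feet), 3))   # 0.596
```

Comparing this margin across the feasible leg-lift sequences of a gait is one simple way to rank them by static stability.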
|
|
16:30-18:00, Paper TuCT17-AX.2 | Add to My Program |
Pedipulate: Enabling Manipulation Skills Using a Quadruped Robot's Leg |
|
Arm, Philip | ETH Zurich |
Mittal, Mayank | ETH Zurich |
Kolvenbach, Hendrik | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Mobile Manipulation, Reinforcement Learning
Abstract: Legged robots have the potential to become vital in maintenance, home support, and exploration scenarios. In order to interact with and manipulate their environments, most legged robots are equipped with a dedicated robot arm, which means additional mass and mechanical complexity compared to standard legged robots. In this work, we explore pedipulation - using the legs of a legged robot for manipulation. By training a reinforcement learning policy that tracks position targets for one foot, we enable a dedicated pedipulation controller that is robust to disturbances, has a large workspace through whole-body behaviors, and can reach far-away targets with gait emergence, enabling loco-pedipulation. By deploying our controller on a quadrupedal robot using teleoperation, we demonstrate various real-world tasks such as door opening, sample collection, and pushing obstacles. We demonstrate load carrying of more than 2.0 kg at the foot. Additionally, the controller is robust to interaction forces at the foot, disturbances at the base, and slippery contact surfaces. Videos of the experiments are available at https://sites.google.com/leggedrobotics.com/pedipulate.
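A minimal sketch of the kind of foot position-target tracking objective described above: the Gaussian reward shape and the σ value are common choices in legged-robot reinforcement learning, not taken from the paper:

```python
import numpy as np

def foot_tracking_reward(p_foot, p_target, sigma=0.2):
    """Gaussian-kernel position-tracking reward: 1 at the target,
    decaying smoothly with squared distance (sigma in meters)."""
    d2 = float(np.sum((np.asarray(p_foot) - np.asarray(p_target)) ** 2))
    return float(np.exp(-d2 / sigma**2))

target = [0.3, 0.0, 0.4]
print(foot_tracking_reward([0.3, 0.0, 0.4], target))   # 1.0 at the target
print(foot_tracking_reward([0.5, 0.0, 0.4], target) < 1.0)  # True
```

A smooth, bounded reward like this gives the policy gradient signal everywhere in the workspace, which is one reason such kernels are popular for target-reaching tasks.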
|
|
16:30-18:00, Paper TuCT17-AX.3 | Add to My Program |
An Efficient Model Based Approach on Learning Agile Motor Skills without Reinforcement |
|
Shi, Haojie | Chinese University of Hong Kong |
Li, Tingguang | The Chinese University of Hong Kong |
Zhu, Qingxu | Tencent |
Sheng, Jiapeng | Shandong University |
Han, Lei | Tencent Robotics X |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Legged Robots, Model Learning for Control, Imitation Learning
Abstract: Learning-based methods have improved the locomotion skills of quadruped robots through deep reinforcement learning. However, the sim-to-real gap and low sample efficiency still limit skill transfer. To address this issue, we propose a model-based supervised learning framework that combines a world model with a policy network. We train a differentiable world model to predict future states and use it to train a Variational Autoencoder (VAE)-based policy network through supervised learning to imitate the natural behavior of real animals. This approach significantly diminishes the requirement for large amounts of real interaction data, since it solely focuses on training the world model while allowing rapid policy updates through supervised learning. We also develop a high-level network to track diverse commands and trajectories. We initially train the policy within a simulation environment and subsequently fine-tune it using a physical robot. Simulated results show a tenfold increase in sample efficiency compared to reinforcement learning methods such as PPO. Transitioning to real-world testing, our policy achieves proficient command-following performance with only a two-minute data collection period and generalizes well to new speeds and paths.
|
|
16:30-18:00, Paper TuCT17-AX.4 | Add to My Program |
Adaptive Model Predictive Control with Data-Driven Error Model for Quadrupedal Locomotion |
|
Zeng, Xuanqi | Chinese University of Hong Kong |
Zhang, Hongbo | The Chinese University of Hong Kong |
Yue, Linzhu | The Chinese University of Hong Kong |
Song, Zhitao | The Chinese University of Hong Kong |
Zhang, Lingwei | Hong Kong Centre for Logistics Robotics |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Legged Robots, Motion Control, Dynamics
Abstract: Model Predictive Control (MPC) relies heavily on the robot model for its control law. However, a gap always exists between the reduced-order control model, with its uncertainties, and the real robot, which degrades performance. To address this issue, we propose a controller that integrates a data-driven error model into traditional MPC for quadruped robots. Our approach leverages real-world sensor data to compensate for defects in the control model. Specifically, we employ the Autoregressive Moving Average Vector (ARMAV) model to construct a state error model of the quadruped robot from data. The predicted state errors are then used to adjust the future robot states predicted by MPC. With this approach, the proposed controller provides more accurate inputs to the system, enabling it to achieve desired states even in the presence of inaccurate model parameters or disturbances. The proposed controller can partially eliminate the disparity between the model and the real-world robot, thereby enhancing the locomotion performance of quadruped robots. We validate our method through simulations and real-world experiments on a large quadruped robot carrying a 20 kg unmodeled payload (84% of its body weight).
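To illustrate the idea of compensating a reduced-order control model with a data-driven error model, here is a deliberately simplified scalar sketch that fits an AR(1) error model in place of the paper's ARMAV model; the dynamics and the "unmodeled payload" disturbance are invented for the example:

```python
import numpy as np

# Nominal (reduced-order) model misses a constant disturbance.
def nominal_step(x):
    return 0.9 * x

def true_step(x):
    return 0.9 * x + 0.05   # stand-in for an unmodeled payload effect

# Collect one-step prediction errors, as if from onboard sensors.
x = 1.0
errors = []
for _ in range(50):
    x_next = true_step(x)
    errors.append(x_next - nominal_step(x))
    x = x_next

# Fit a scalar AR(1) error model e_{k+1} = a * e_k + c by least squares.
e = np.array(errors)
E = np.column_stack([e[:-1], np.ones(len(e) - 1)])
a, c = np.linalg.lstsq(E, e[1:], rcond=None)[0]

# Corrected prediction = nominal model + predicted error.
e_pred = a * e[-1] + c
x_pred = nominal_step(x) + e_pred
print(round(x_pred, 4), round(true_step(x), 4))   # nearly identical
```

In an MPC setting, the same correction would be applied to each predicted state along the horizon before solving the optimal control problem.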
|
|
16:30-18:00, Paper TuCT17-AX.5 | Add to My Program |
Layered Control for Cooperative Locomotion of Two Quadrupedal Robots: Centralized and Distributed Approaches |
|
Kim, Jeeseop | Caltech |
Fawcett, Randall | Virginia Polytechnic Institute and State University |
Kamidi, Vinay | Virginia Tech |
Ames, Aaron | Caltech |
Akbari Hamed, Kaveh | Virginia Tech |
Keywords: Legged Robots, Motion Control, Optimization and Optimal Control, Multi-Contact Whole-Body Motion Planning and Control
Abstract: This paper presents a layered control approach for real-time trajectory planning and control of robust cooperative locomotion by two holonomically constrained quadrupedal robots. A novel interconnected network of reduced-order models, based on the single rigid body (SRB) dynamics, is developed for trajectory planning purposes. At the higher level of the control architecture, two different model predictive control (MPC) algorithms are proposed to address the optimal control problem of the interconnected SRB dynamics: centralized and distributed MPCs. The distributed MPC assumes two local quadratic programs that share their optimal solutions according to a one-step communication delay and an agreement protocol. At the lower level of the control scheme, distributed nonlinear controllers are developed to impose the full-order dynamics to track the prescribed reduced-order trajectories generated by MPCs. The effectiveness of the control approach is verified with extensive numerical simulations and experiments for the robust and cooperative locomotion of two holonomically constrained A1 robots with different payloads on variable terrains and in the presence of disturbances. It is shown that the distributed MPC has a performance similar to that of the centralized MPC, while the computation time is reduced significantly.
|
|
16:30-18:00, Paper TuCT17-AX.6 | Add to My Program |
RL + Model-Based Control: Using On-Demand Optimal Control to Learn Versatile Legged Locomotion |
|
Kang, Dongho | ETH Zurich |
Cheng, Jin | ETH Zurich |
Zamora Mora, Miguel Angel | ETH Zurich |
Zargarbashi, Fatemeh | ETH Zurich |
Coros, Stelian | ETH Zurich |
Keywords: Legged Robots, Motion Control, Reinforcement Learning
Abstract: This letter presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for the RL policy to imitate, leading to the development of robust control policies that can be learned reliably. Furthermore, by utilizing realistic simulation data that captures whole-body dynamics, RL effectively overcomes the inherent limitations in reference motions imposed by modeling simplifications. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method showcases its capability to generalize reference motions and effectively handle more complex locomotion tasks that may pose challenges for the simplified model, thanks to RL's flexibility. Additionally, our framework effortlessly supports the training of control policies for robots with diverse dimensions, eliminating the necessity for robot-specific adjustments in the reward function and hyperparameters.
|
|
16:30-18:00, Paper TuCT17-AX.7 | Add to My Program |
Generation of Steady Wheel Gait for Planar X-Shaped Walker with Reaction Wheel |
|
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Sedoguchi, Taiki | Japan Advanced Institute of Science and Technology |
Yan, Cong | Ritsumeikan University |
Keywords: Legged Robots, Motion Control, Underactuated Robots
Abstract: This paper addresses the problem of realizing a novel robotic bipedal locomotion called wheel gait, which is achieved by rotating the stance and swing legs in the same direction. First, a model of a planar 3-DOF X-shaped walker with a reaction wheel is introduced, and its mathematical equations are described. Second, the condition for stabilizing the zero dynamics is formulated as requiring that the time integral of the control input to the reaction wheel over one step be zero, and a control system achieving this is designed based on continuous-time output deadbeat control. Third, a typical steady wheel gait of the linearized model is numerically generated, and its extension to the nonlinear model is discussed. Although the nonlinear model is nonlinear only in the gravity term, numerical simulations show a significant gap between it and the linearized model. Through analysis of typical nonlinear wheel gaits, the difficulty of achieving the same walking speed as the linearized model is discussed.
|
|
16:30-18:00, Paper TuCT17-AX.8 | Add to My Program |
Trajectory Optimization Strategy That Considers Body Tip-Over Stability, Limb Dynamics, and Motion Continuity in Legged Robots |
|
Lu, Kuan-Lun | National Taiwan University |
Chang, I-Chia | Purdue University |
Yu, Wei-Shun | National Taiwan University |
Lin, Pei-Chun | National Taiwan University |
Keywords: Legged Robots, Multi-Contact Whole-Body Motion Planning and Control, Optimization and Optimal Control
Abstract: We propose a limb trajectory planning method that considers both body and limb dynamics in robots, particularly suitable for those with non-trivial limb mass. To simplify the complexity and computation cost of using the full-body dynamics of the limbs, a reduced-order model that can simulate the dynamic characteristics of the original limb is proposed. The performance of the model is experimentally validated using an exemplary single leg-wheel of the leg-wheel transformable robot. The limb trajectory optimization is developed using a genetic algorithm that considers many aspects, including body and limb dynamics, limb workspace, limb motion continuity, body tip-over stability, and power consumption. The performance of the proposed limb trajectory planning strategy is experimentally validated using the same leg-wheel transformable robot, and the results confirm the effectiveness of the strategy.
|
|
TuCT18-AX Oral Session, AX-206 |
Add to My Program |
Motion Control III |
|
|
Chair: Matsubara, Takamitsu | Nara Institute of Science and Technology |
Co-Chair: Ott, Christian | TU Wien |
|
16:30-18:00, Paper TuCT18-AX.1 | Add to My Program |
Hybrid Force-Impedance Control for Fast End-Effector Motions |
|
Iskandar, Maged | German Aerospace Center - DLR |
Ott, Christian | TU Wien |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Siciliano, Bruno | Univ. Napoli Federico II |
Dietrich, Alexander | German Aerospace Center (DLR) |
Keywords: Compliance and Impedance Control, Force Control, Motion Control
Abstract: Controlling the contact force on various surfaces is essential in many robotic applications, such as service tasks or industrial use cases. Classical impedance and hybrid motion-force control approaches are mostly employed for these kinds of physical interaction scenarios. In this work, an extended Cartesian impedance control algorithm is developed that includes geometrical constraints and enables explicit force tracking in a hybrid manner. The unified framework features compliant behavior in the free (motion) task directions and explicit force tracking in the constrained directions. Advantageously, the force subspace in the contact direction is fully dynamically decoupled from the motion subspace. Experimental validation with a torque-controlled robotic manipulator on both flat and curved surfaces demonstrates the performance during highly dynamic desired trajectories and confirms the theoretical claims of the approach.
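The hybrid decomposition this abstract describes — explicit force tracking in the constrained directions, impedance behavior in the free (motion) directions — can be sketched with a selection matrix. This is an illustrative sketch under assumed names and gains (`hybrid_wrench`, `kf`), not the authors' controller:

```python
import numpy as np

def hybrid_wrench(S, f_des, f_meas, x_err, xd_err, K, D, kf):
    """Illustrative hybrid force-impedance law (not the paper's algorithm).

    S selects the constrained (force-controlled) task directions;
    the complementary projector (I - S) spans the free (motion)
    directions, where a compliant spring-damper law acts.
    """
    I = np.eye(S.shape[0])
    # Explicit force tracking in the constrained subspace.
    f_cmd = S @ (f_des + kf * (f_des - f_meas))
    # Cartesian impedance behavior in the motion subspace.
    f_imp = (I - S) @ (K @ x_err + D @ xd_err)
    return f_cmd + f_imp
```

Because `S` and `I - S` are complementary projectors, the force and motion terms act in dynamically separate subspaces, mirroring the decoupling claimed in the abstract.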
|
|
16:30-18:00, Paper TuCT18-AX.2 | Add to My Program |
A Reinforcement Learning-Based Control Strategy for Robust Interaction of Robotic Systems with Uncertain Environments |
|
Sacerdoti, Diletta | University of Modena and Reggio-Emilia |
Benzi, Federico | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Compliance and Impedance Control, Force Control, Reinforcement Learning
Abstract: In the context of interaction with unmodelled systems, it becomes imperative for a robot controller to possess the capability to dynamically adjust its actions in real time, enhancing its resilience in the face of fluctuating environmental conditions. This adaptation process must be performed in a stability-preserving fashion and must resourcefully exploit the knowledge acquired during the interaction process. In this article, we propose a novel control strategy based on the synergistic usage of state-of-the-art passivity-based control and Deep Reinforcement Learning (DRL). The concept of an energy tank is used to provide stability guarantees for the interaction controller in uncertain environments, while an online learning policy makes it possible to properly estimate the requirements of the task and adapt the controller accordingly, thus achieving stability and performance simultaneously. The proposed architecture is successfully validated through simulations and experiments with a collaborative manipulator in a surface polishing task.
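The energy-tank mechanism this abstract relies on can be sketched in a few lines: the learned action is permitted only while the tank holds enough energy, which is what preserves passivity. A minimal scalar sketch with assumed names and a simplified power balance, not the authors' implementation:

```python
def tank_step(E, u_des, xdot, dt, E_min=0.1):
    """One update of an energy-tank gate (illustrative sketch).

    E      : current tank energy [J]
    u_des  : force the (learned) policy wants to apply
    xdot   : end-effector velocity along that force
    The power extracted from the tank is P = u_des * xdot; the
    action is allowed only while the tank stays above E_min.
    """
    P = u_des * xdot            # power drawn from (or fed to) the tank
    if E - P * dt < E_min:      # tank nearly depleted: gate the action
        return E, 0.0
    return E - P * dt, u_des
```

In a full controller the gate is usually smooth rather than on/off, and dissipated energy refills the tank; both refinements keep the same passivity argument.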
|
|
16:30-18:00, Paper TuCT18-AX.3 | Add to My Program |
Stiffness-Based Hybrid Motion/Force Control for Cable-Driven Serpentine Manipulator |
|
Li, Wenshuo | Harbin Institute of Technology |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Huang, Peisheng | Harbin Institute of Technology, Shenzhen |
Lin, Boyang | Harbin Institute of Technology |
Liang, Bin | Tsinghua University |
Keywords: Force Control, Integrated Planning and Control, Motion Control
Abstract: In recent years, there has been a growing demand for robotic manipulators to perform tasks in various unstructured environments and situations requiring precision and force control. However, traditional robotic arms have limitations in fully leveraging their advantages in such scenarios. To address this demand, we have designed a cable-driven serpentine manipulator (CDSM) that combines force and precision motion control. This control method allows for precise manipulation of forces and torques at the end-effector, particularly in applications like electric vehicle charging and narrow-space exploration. It also enables independent control in multiple configurations. We achieve force-position hybrid control in task space, ensuring accurate control of end-effector force while achieving precise position control in other directions. Additionally, we implement joint angle closed-loop control in joint space to reduce the impact of cable elasticity deformation and friction on joint motion accuracy. Finally, servo control is applied at the lowest motor level. This paper investigates the modeling, sensing, and control of CDSM within a unified framework of hybrid motion/force control. Through experiments and simulations, we demonstrate the high accuracy and practicality of this control method in various scenarios.
|
|
16:30-18:00, Paper TuCT18-AX.4 | Add to My Program |
Reinforcement Learning for Reduced-Order Models of Legged Robots |
|
Chen, Yu-Ming | University of Pennsylvania |
Bui, Hien | University of Pennsylvania |
Posa, Michael | University of Pennsylvania |
Keywords: Model Learning for Control, Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: Model-based approaches for planning and control of bipedal locomotion have a long history of success. They can provide stability and safety guarantees while being effective in accomplishing many locomotion tasks. Model-free reinforcement learning, on the other hand, has gained much popularity in recent years due to computational advancements. It can achieve high performance in specific tasks, but it lacks physical interpretability and flexibility in re-purposing the policy for a different set of tasks. For instance, we can initially train a neural network (NN) policy using velocity commands as inputs. However, to handle new task commands like desired hand or footstep locations at a desired walking velocity, we must retrain a new NN policy. In this work, we attempt to bridge the gap between these two bodies of work on a bipedal platform. We formulate a model-based reinforcement learning problem to learn a reduced-order model (ROM) within a model predictive control (MPC) framework. Results show a 49% improvement in viable task region size and a 21% reduction in motor torque cost. All videos and code are available at https://sites.google.com/view/ymchen/research/rl-for-roms.
|
|
16:30-18:00, Paper TuCT18-AX.5 | Add to My Program |
K-BMPC: Derivative-Based Koopman Bilinear Model Predictive Control for Tractor-Trailer Trajectory Tracking with Unknown Parameters |
|
Wang, Zehao | Shanghai Jiao Tong University |
Zhang, Han | Shanghai Jiao Tong University |
Wang, Jingchuan | Shanghai Jiao Tong University |
Keywords: Model Learning for Control, Motion Control
Abstract: Nonlinear dynamics complicate controller design for control-affine systems such as tractor-trailer vehicles, especially when the parameters in the dynamics are unknown. To address this, we propose a derivative-based lifting function construction method and show that the corresponding infinite-dimensional Koopman bilinear model over the lifting functions is equivalent to the original control-affine system. Further, we analyze the propagation and bounds of state prediction errors caused by truncation of the derivative order. The identified finite-dimensional Koopman bilinear model then serves as the predictive model. Koopman Bilinear Model Predictive Control (K-BMPC) is proposed to solve the trajectory tracking problem. We linearize the bilinear model around the estimate of the lifted state and control input, so that the bilinear model predictive control problem is approximated by a quadratic program; the estimate is updated at each iteration until convergence is reached. Moreover, we implement our algorithm on a tractor-trailer system, taking into account longitudinal and side-slip effects. Open-loop simulation shows that the proposed Koopman bilinear model captures the dynamics with unknown parameters and has good prediction performance. Closed-loop tracking results show that the proposed K-BMPC exhibits elevated tracking precision with commendable computational efficiency. The experimental results demonstrate the feasibility of K-BMPC.
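A finite-dimensional Koopman bilinear predictor of the kind identified here has the form z+ = A z + B u + Σ_i u_i H_i z; linearizing it around a lifted-state estimate is what reduces the MPC problem to a quadratic program. A minimal sketch of the one-step predictor (the matrix names are assumptions, and in practice A, B, and H_i are identified from data):

```python
import numpy as np

def koopman_bilinear_step(z, u, A, B, H):
    """One-step prediction with a finite-dimensional Koopman bilinear
    model (illustrative sketch):
        z_next = A @ z + B @ u + sum_i u[i] * (H[i] @ z)
    z : lifted state, u : control input, H : list of bilinear matrices.
    """
    z_next = A @ z + B @ u
    for i, Hi in enumerate(H):
        z_next += u[i] * (Hi @ z)
    return z_next
```

The bilinear terms `u[i] * (H[i] @ z)` are exactly what a plain linear lifted model would discard; K-BMPC keeps them and handles them by iterative linearization.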
|
|
16:30-18:00, Paper TuCT18-AX.6 | Add to My Program |
Learning to Shape by Grinding: Cutting-Surface-Aware Model-Based Reinforcement Learning |
|
Hachimine, Takumi | Nara Institute of Science and Technology |
Morimoto, Jun | ATR Computational Neuroscience Labs |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Model Learning for Control, Reinforcement Learning, Manipulation Planning
Abstract: Object shaping by grinding is a crucial industrial process in which a rotating grinding belt removes material. Object-shape transition models are essential to achieving automation by robots; however, learning such a complex model that depends on process conditions is challenging because it requires a significant amount of data, and the irreversible nature of the removal process makes data collection expensive. This paper proposes a cutting-surface-aware Model-Based Reinforcement Learning (MBRL) method for robotic grinding. Our method employs a cutting-surface-aware model as the object's shape transition model, which in turn is composed of a geometric cutting model and a cutting-surface-deviation model, based on the assumption that the robot action can specify the cutting surface made by the tool. Furthermore, according to the grinding resistance theory, the cutting-surface-deviation model does not require raw shape information, making the model's dimensions smaller and easier to learn than a naive shape transition model directly mapping the shapes. Through evaluation and comparison by simulation and real robot experiments, we confirm that our MBRL method can achieve high data efficiency for learning object shaping by grinding and also provide generalization capability for initial and target shapes that differ from the training data.
|
|
16:30-18:00, Paper TuCT18-AX.7 | Add to My Program |
Adaptive Contact-Implicit Model Predictive Control with Online Residual Learning |
|
Huang, Wei-Cheng | University of Pennsylvania, GRASP Lab |
Aydinoglu, Alp | University of Pennsylvania |
Jin, Wanxin | Arizona State University |
Posa, Michael | University of Pennsylvania |
Keywords: Model Learning for Control, Robust/Adaptive Control, Dexterous Manipulation
Abstract: The hybrid nature of multi-contact robotic systems, due to making and breaking contact with the environment, creates significant challenges for high-quality control. Existing model-based methods typically rely on either good prior knowledge of the multi-contact model or significant offline model tuning effort, resulting in low adaptability and robustness. In this paper, we propose a real-time adaptive multi-contact model predictive control framework, which enables online adaptation of the hybrid multi-contact model and continuous improvement of control performance for contact-rich tasks. This framework includes an adaptation module, which continuously learns a residual of the hybrid model to minimize the gap between the prior model and reality, and a real-time multi-contact MPC controller. We demonstrated the effectiveness of the framework in synthetic examples and applied it on hardware to solve contact-rich manipulation tasks, where a robot uses its end-effector to roll different unknown objects on a table to track given paths. The hardware experiments show that, starting from a rough prior model, the multi-contact MPC controller adapts itself on-the-fly at an adaptation rate of around 20 Hz and successfully manipulates previously unknown objects with non-smooth surface geometries. Accompanying media can be found at: https://sites.google.com/view/adaptive-contact-implicit-mpc/home
|
|
16:30-18:00, Paper TuCT18-AX.8 | Add to My Program |
DRIVE: Data-Driven Robot Input Vector Exploration |
|
Baril, Dominic | Université Laval |
Deschênes, Simon-Pierre | Université Laval |
Coupal, Luc | Université Laval |
Goffin, Cyril | EPFL |
Lépine, Julien | Université Laval |
Giguère, Philippe | Université Laval |
Pomerleau, Francois | Université Laval |
Keywords: Model Learning for Control, Software Tools for Benchmarking and Reproducibility, Field Robots
Abstract: An accurate motion model is a fundamental component of most autonomous navigation systems. While much work has been done on improving model formulation, no standard protocol exists for gathering the empirical data required to train models. In this work, we address this issue by proposing Data-driven Robot Input Vector Exploration (DRIVE), a protocol that enables characterizing uncrewed ground vehicle (UGV) input limits and gathering empirical model training data. We also propose a novel learned slip approach that outperforms similar acceleration learning approaches. Our contributions are validated through an extensive experimental evaluation, accumulating over 7 km and 1.8 h of driving data over three distinct UGVs and four terrain types. We show that our protocol offers increased predictive performance over common human-driven data-gathering protocols. Furthermore, our protocol converges with 46 s of training data, almost four times less than the shortest human data-gathering protocol. We show that the operational limit for our model is reached in extreme slip conditions encountered on surfaced ice. DRIVE is an efficient way of characterizing UGV motion in its operational conditions. Our code and dataset are available online at https://github.com/norlab-ulaval/DRIVE.
|
|
16:30-18:00, Paper TuCT18-AX.9 | Add to My Program |
Redundancy Resolution at Position Level |
|
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Sachtler, Arne | Technical University of Munich (TUM) |
Keywords: Redundant Robots, Dynamics, Kinematics, Compliance and Impedance Control
Abstract: Increasing robotic systems' degrees of freedom (DoFs) makes them more versatile and flexible. This usually renders the system kinematically redundant: the main manipulation or interaction task does not fully determine its joint maneuvers. Additional constraints or objectives are required to solve the underdetermined control and planning problems. State-of-the-art approaches arrange tasks in a hierarchy and decouple lower-priority tasks from higher-priority tasks at the velocity or torque level using projectors. We develop an approach to redundancy resolution and decoupling at the position level by determining subspaces of the configuration space independent of the primary task. We call them orthogonal foliations because they are, in a certain sense, orthogonal to the task self-motion manifolds. The approach provides better insight into the topological properties of robot kinematics and control problems, allowing a global view. A condition for the existence of orthogonal foliations is derived. If the condition is not satisfied, we can still find approximate solutions by numerical optimization. Coordinates can be defined on these orthogonal foliations and used as additional task variables for control. We show in simulations that we can control the system using these coordinates without the need for projectors, and we validate the approach experimentally on a seven-DoF robot.
|
|
TuCT19-NT Oral Session, NT-G301 |
Add to My Program |
Medical Robots III |
|
|
Chair: Arata, Jumpei | Kyushu University |
Co-Chair: Ren, Hongliang | National University of Singapore |
|
16:30-18:00, Paper TuCT19-NT.1 | Add to My Program |
A Force-Driven and Vision-Driven Hybrid Control Method of Autonomous Laparoscope-Holding Robot |
|
Fang, Jin | Hefei University of Technology |
Li, Ling | Hefei University of Technology |
Li, Xiaojian | Hefei University of Technology |
Mo, Hangjie | City University of Hong Kong |
Guo, Pengxin | Hefei University of Technology |
Xiao, Xilin | Hefei University of Technology |
Qu, Yanwei | Hefei University of Technology |
Keywords: Medical Robots and Systems, Human-Robot Collaboration
Abstract: Laparoscope-holding robots significantly enhance the stability and precision of visualization in minimally invasive surgeries. Most existing robots of this kind depend on visual servo systems and struggle with efficient, rapid adjustments in the field-of-view (FOV), especially when identifying organs and needles outside the FOV. This paper presents a laparoscope-holding robot system capable of employing both vision-driven and force-driven mechanisms for continuous and large-scale FOV adjustments, respectively. The system features an integrated tactile handle, enabling the reception of human-robot interaction forces during surgical navigation. We propose a hybrid control method that leverages both force and vision inputs for laparoscopic FOV adjustments. This approach integrates a virtual wrench, generated from visual information, and an interaction wrench, obtained from the tactile handle, into the robot's dynamic model, which complies with remote center of motion constraints. The interaction wrench's gain is adjusted with the gripping force on the integrated tactile handle, ensuring that unintended movements caused by accidental contacts are prevented, thus safeguarding operational safety. The proposed method eliminates the need to switch control modes, enabling simultaneous visual tracking and tactile interaction guidance. Experimental results demonstrate that the proposed method not only allows for FOV adjustments with surgical instrument guiding but also adapts well to large-scale FOV adjustment tasks.
|
|
16:30-18:00, Paper TuCT19-NT.2 | Add to My Program |
Inconstant Curvature Kinematics of Parallel Continuum Robot without Static Model |
|
Zhang, Tao | Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Keywords: Medical Robots and Systems, Kinematics, Flexible Robotics
Abstract: In the study of minimally invasive surgical robots, a mini parallel continuum robot has shown a motion advantage after passing through a long and winding working channel. However, due to the interaction forces between the elastic wires of the parallel robot during motion generation, the constant-curvature assumption introduces modeling errors, making the current geometric kinematic model unreliable. A more accurate kinematic model is therefore needed in the absence of a complicated static model, which this paper aims to provide. A simulation in ANSYS is carried out, and the shape of one of the driving wires during bending is fitted by a two-segment polynomial curve. The position of the distal wrist tip can then be calculated from the curve shape. To verify the accuracy of the proposed model, bending simulations and experiments are carried out, and its accuracy is compared with that of the kinematic model based on the constant-curvature assumption. The results show that the proposed model yields more accurate results, especially as the driving wire displacement increases. For a 10 mm parallel robot, when the displacements of the two pairs of wires are both 3.0 mm, the errors of the two models are 0.42 mm and 5.79 mm (4.2% and 57.9%), respectively.
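The core fitting idea — approximate the bent wire by two joined polynomial segments and read the tip position off the distal segment — can be sketched as follows. The polynomial degrees, the split point, and the function name are illustrative assumptions, not the paper's values:

```python
import numpy as np

def tip_from_two_segment_fit(s, x, split, deg=3):
    """Fit a sampled wire centerline x(s) with two polynomial segments
    joined at arc length `split`, then evaluate the distal tip
    (illustrative of the two-segment fitting idea only).
    s : arc-length samples, x : corresponding coordinate samples.
    """
    m = s <= split
    p_proximal = np.polyfit(s[m], x[m], deg)    # proximal segment fit
    p_distal = np.polyfit(s[~m], x[~m], deg)    # distal segment fit
    return np.polyval(p_distal, s[-1])          # tip at the last sample
```

In the paper the fitted curve shape comes from ANSYS simulation data; here any sampled centerline serves to illustrate the computation.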
|
|
16:30-18:00, Paper TuCT19-NT.3 | Add to My Program |
A Novel SEA-Based Haptic Interface for Robot-Assisted Vascular Interventional Surgery |
|
Yan, Yonggan | Beijing Institute of Technology |
Guo, Shuxiang | Kagawa University |
Lyu, Chuqiao | Beijing Institute of Technology |
Guo, Jian | Shenzhen Institute of Advanced Biomedical Robot Co., Ltd |
Wang, Jian | Shenzhen Institute of Advanced Biomedical Robot Co., Ltd |
Yang, Pengfei | Changhai Hospital |
Zhang, Yongwei | Changhai Hospital |
Zhang, Yongxin | Changhai Hospital |
Liu, Jianmin | Changhai Hospital |
Keywords: Medical Robots and Systems, Mechanism Design, Force Control
Abstract: Robot-assisted vascular interventional surgery can isolate interventionists and X-ray radiation, and improve surgical accuracy. However, the leader side outside the operating room still has problems such as incomplete collection of operating information and unrealistic tactile feedback. The main objective of this paper is to design a haptic interface that can simultaneously capture the force-position information of the interventionists and generate force to assist the interventionists in performing surgeries on the leader side. It can capture the interventionists' delivery displacement, twisting angle, clamping force, and provide real-time force feedback. A leader-follower bidirectional force feedback control strategy was proposed. Based on this strategy, on the one hand, the interventionist perceives the multi-modal information fed back from the follower side, makes judgments, and actively adjusts the surgical operation. On the other hand, the interventionist controls the grasping state of the instruments remotely to control the safety operating force threshold. Finally, the experimental setup was built and a series of evaluation experiments were performed. The experimental results verified the feasibility of the designed haptic interface. It can generate dynamic and accurate force feedback and realize leader-follower grasping force control.
|
|
16:30-18:00, Paper TuCT19-NT.4 | Add to My Program |
A Miniature 1R1T Precision Manipulator with Remote Center of Motion for Minimally Invasive Surgery |
|
Suzuki, Hiroyuki | Sony Computer Science Laboratories, Inc |
Keywords: Medical Robots and Systems, Mechanism Design, Parallel Robots
Abstract: In robotic-assisted minimally invasive surgery, the remote center of motion (RCM) achieves precision and safe manipulation of surgical devices through the insertion point into the patient’s body. One of the RCM configurations, one-rotation and one-translation (1R1T) RCM based on a closed-loop design, enables two-degrees-of-freedom transmission from the proximal end of the robotic arm to the distal end. This feature offers important advantages, particularly in enhancing safety by minimizing physical contact risks with patients or other surgical tools owing to the simplified layout near the surgical field. However, conventional 1R1T RCM robots typically employ complex structures with numerous joints consisting of pin-and-hole mating mechanisms. This complexity can increase the overall size of the robot and compromise motion precision. This study presents a miniature 1R1T precision manipulator with RCM, ORIGANOID (dimensions: W60 x D120 x H30 mm, weight: 12.6 g). The robotic arm features flexure hinges, eliminates clearance issues, and can be fabricated using an origami-inspired robotic approach. Furthermore, using a novel backlash-free coupling method, the robotic arm could be easily attached and detached from the drive units. A prototype was fabricated and experimentally validated. The results demonstrated that high-resolution motion could be achieved within 10 µm. Furthermore, a demonstration using an eyeball model confirmed the successful implementation of 1R1T RCM.
|
|
16:30-18:00, Paper TuCT19-NT.5 | Add to My Program |
A Three-Dimensional Compliant Bowtie-Shaped Mechanical Amplifier to Magnify Coaxial Displacement in a Confined Space |
|
Im, Jintaek | DGIST |
Jang, Eunsil | Daegu Gyeongbuk Institute of Science and Technology |
Song, Cheol | DGIST |
Keywords: Medical Robots and Systems, Mechanism Design
Abstract: This paper proposes a novel form of three-dimensional coaxial bowtie-shaped mechanical amplifier. The proposed model incorporates a lever mechanism into the Sarrus linkage structure, allowing the target plate to move along one axis with amplified displacement in a parallel manner. The amplifier was assembled after machining the components on a 3-axis computer numerical control machine. A flexible hinge was incorporated into the amplifier design for simplified fabrication and reduced friction in the actuation mechanism. Castigliano’s theorem is used to build a mathematical model of the proposed mechanical amplifier, and the performance was validated through finite element analysis and prototype fabrication. We achieved an amplification ratio of ×8.44, resulting in axial displacement of up to 86 µm. The demonstrated amplifier is expected to be applicable to compact microsurgical robots or biomedical imaging apparatus requiring coaxial displacement amplification in confined spaces.
|
|
16:30-18:00, Paper TuCT19-NT.6 | Add to My Program |
A Magnetic Continuum Robot with In-Situ Magnetic Reprogramming Capability |
|
Xue, Junnan | The Chinese University of Hong Kong |
Zhang, Moqiu | The Chinese University of Hong Kong |
Liu, Xurui | The Chinese University of Hong Kong |
Zhu, Jiaqi | Huazhong University of Science and Technology |
Cao, Yanfei | The Chinese University of Hong Kong |
Zhang, Li | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Micro/Nano Robots, Soft Robot Materials and Design
Abstract: Magnetic continuum robots (MCRs) have shown great potential in minimally invasive interventions because they can be actively and remotely navigated through complex in vivo environments. However, the deformation capability of current MCRs is limited by fixed magnetization configurations, preventing them from accessing hard-to-reach areas. This is because, under a global magnetic field, a fixed magnetization configuration exposes the magnets on the MCR to coupled magnetic forces and torques, resulting in a lack of controllable degrees of freedom. Here, we introduce a reprogrammable magnetic continuum robot (RMCR) enabled by magnetic reprogramming modules (MRMs). Actuated by shape memory alloys, the magnetic moment direction of the MRMs can be selectively reprogrammed in real time and in situ. This magnetic reprogramming capability enables the RMCR to achieve complex shape transformations. Results show that the range of motion of the RMCR's tip increases by 193% compared with a regular MCR. Moreover, MRMs on the RMCR can achieve active attraction and separation under simple magnetic fields. The reprogramming process of the RMCR is theoretically investigated. A design methodology for MRMs is then proposed, and the fabrication process of the RMCR is described in detail. Furthermore, a kinematic model of the RMCR is established, simulated, and experimentally validated.
|
|
16:30-18:00, Paper TuCT19-NT.7 | Add to My Program |
Design and Control of a Magnetically-Actuated Anti-Interference Microrobot for Targeted Therapeutic Delivery |
|
Qin, Yanding | Nankai University |
Cai, Zhuocong | Nankai University |
Han, Jianda | Nankai University |
Keywords: Medical Robots and Systems, Micro/Nano Robots
Abstract: This paper proposes an anti-interference targeted therapeutic delivery microrobot, in which targeted therapeutic delivery to a lesion site in the human intestine is operated by an external magnetic field. The robot is composed of a shell and a targeted delivery mechanism. Under actuation by the external magnetic field, the spiral structure on the outer surface of the shell can actively move the robot back and forth in the intestine. The internally embedded targeted delivery mechanism is fixed with a radially magnetized O-type permanent magnet. It not only realizes flexible movement in the fluid environment, but also provides intestinal anchoring and targeted therapeutic delivery against the constant peristalsis of the human intestine. The proposed robot is designed to precisely achieve the intended drug concentration and treatment effect at the lesion site despite intestinal peristalsis. A series of simulations and experiments are conducted to evaluate the feasibility of the developed robot. In the experiments, the robot releases drugs after anchoring at the lesion site. Finally, ex vivo experiments are carried out on fresh porcine intestines.
|
|
16:30-18:00, Paper TuCT19-NT.8 | Add to My Program |
Design and Visual Servoing Control of a Hybrid Dual-Segment Flexible Neurosurgical Robot for Intraventricular Biopsy |
|
Chen, Jian | University of Chinese Academy of Sciences |
Chen, Mingcong | City University of Hong Kong |
Zhao, Qing xiang | Hong Kong Institute of Science & Innovation, Centre for Artifici |
Wang, Shuai | HKPU (The Hong Kong Polytechnic University) |
Wang, Yihe | Centre for Artificial Intelligence and Robotics |
Xiao, Ying | Institute of Automation, Chinese Academy of Sciences |
Hu, Jian | Institute of Automation, Chinese Academy of Sciences |
Chan, Tat-Ming | Prince of Wales Hospital |
Yeung, Kam Tong Leo | Department of Surgery Faculty of Medicine the Chinese University |
Chan, David Yuen Chung | Department of Surgery Faculty of Medicine the Chinese University |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Actuation and Joint Mechanisms
Abstract: Traditional rigid endoscopes struggle to flexibly treat tumors located deep in the brain, and low operability and fixed viewing angles limit their use. This study introduces MicroNeuro, a novel dual-segment flexible robotic endoscope designed to perform biopsies with dexterous surgical manipulation deep in the brain. To account for the uncertainty of the control model, image-based visual servoing with online robot Jacobian estimation has been implemented to enhance motion accuracy. Furthermore, applying model predictive control with constraints significantly bolsters the flexible robot's ability to adaptively track mobile objects and resist external interference. Experimental results underscore that the proposed control system enhances motion stability and precision. Phantom testing substantiates its considerable potential for deployment in neurosurgery.
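Online Jacobian estimation for image-based visual servoing is commonly done with a Broyden rank-one update driven by observed actuation and feature increments; the paper's exact estimator may differ. A minimal sketch:

```python
import numpy as np

def broyden_update(J, dq, ds, alpha=0.5):
    """Broyden rank-one update of an image Jacobian estimate
    (a standard online estimation scheme, shown for illustration).

    J  : current estimate mapping actuation increments dq to
         observed image-feature increments ds
    alpha : step size in (0, 1], trading adaptation speed for noise
            sensitivity
    """
    denom = float(dq @ dq)
    if denom < 1e-12:          # no motion: nothing to learn from
        return J
    # Correct J along the direction of the observed motion only.
    return J + alpha * np.outer(ds - J @ dq, dq) / denom
```

Each update corrects the estimate only along the direction actually excited by `dq`, so the Jacobian converges as the robot explores different motion directions.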
|
|
16:30-18:00, Paper TuCT19-NT.9 | Add to My Program |
Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments Using Fiber Bragg Grating Sensors |
|
Schwarz, Alexander | Johns Hopkins University |
Mehrfard, Arian | Johns Hopkins University |
Amirkhani, Golchehr | Johns Hopkins University |
Phalen, Henry | Johns Hopkins University |
Ma, Justin | Johns Hopkins University |
Grupp, Robert | Johns Hopkins University |
Martin-Gomez, Alejandro | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Flexible Robotics
Abstract: Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and subsequently reconstructing the shape using optimization algorithms. This optimization, however, is under-constrained and may be ill-posed for complex shapes, falling into local minima. In this work, we introduce a novel method capable of directly estimating a CDM's shape from FBG sensor wavelengths using a deep neural network. In addition, we propose the integration of uncertainty estimation to address the critical issue of uncertainty in neural network predictions. Neural network predictions are unreliable when the input sample is outside the training distribution or corrupted by noise. Recognizing such deviations is crucial when integrating neural networks within surgical robotics, as inaccurate estimations can pose serious risks to the patient. We present a robust method that not only improves the precision upon existing techniques for FBG-based shape estimation but also incorporates a mechanism to quantify the models' confidence through uncertainty estimation. We validate the uncertainty estimation through extensive experiments, demonstrating its effectiveness and reliability on out-of-distribution (OOD) data, adding an additional layer of safety and precision to minimally invasive surgical robotics.
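One standard way to obtain the kind of confidence signal this abstract describes is to aggregate several independently trained predictors and treat their disagreement as uncertainty: large spread suggests an out-of-distribution or corrupted input. A hedged sketch of that deep-ensemble idea (the paper's mechanism may differ, e.g. MC dropout):

```python
import numpy as np

def predict_with_uncertainty(models, wavelengths):
    """Illustrative ensemble-style uncertainty estimation.

    models      : list of independently trained predictors, each
                  mapping FBG wavelengths to a shape estimate
    wavelengths : input sensor reading (array)
    Returns the mean prediction and the per-output standard
    deviation; a large std flags low confidence (possibly OOD).
    """
    preds = np.stack([m(wavelengths) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)
```

A downstream safety layer can then reject or down-weight shape estimates whose std exceeds a calibrated threshold, which is the protective behavior the abstract motivates.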
|
|
TuCT20-NT Oral Session, NT-G302 |
Add to My Program |
Search & Rescue Robotics in Fields |
|
|
Chair: Sattar, Junaed | University of Minnesota |
Co-Chair: Tadakuma, Kenjiro | Tohoku University |
|
16:30-18:00, Paper TuCT20-NT.1 | Add to My Program |
Autonomous Robotic Re-Alignment for Face-To-Face Underwater Human-Robot Interaction |
|
Kutzke, Demetrious T. | University of Minnesota - Twin Cities |
Wariar, Ashwin | University of Minnesota Twin-Cities |
Sattar, Junaed | University of Minnesota |
Keywords: Robotics in Hazardous Fields, Human-Aware Motion Planning, Visual Servoing
Abstract: The use of autonomous underwater vehicles (AUVs) to accomplish traditionally challenging and dangerous tasks has proliferated thanks to advances in sensing, navigation, manipulation, and on-board computing technologies. Utilizing AUVs in underwater human-robot interaction (UHRI) has seen comparatively little growth due to limitations in bi-directional communication and significant technical hurdles in bridging the gap between terrestrial interaction strategies and those possible in the underwater domain. A necessary component of UHRI is a system for safe robot-diver approach that establishes face-to-face communication while accounting for non-standard human body poses. In this work, we introduce a stereo vision system for enhancing UHRI that utilizes three-dimensional reconstruction from stereo image pairs and machine learning for localizing human joint estimates. We then establish a convention for a coordinate system that encodes the direction the human is facing with respect to the camera coordinate frame. This allows automatic setpoint computation that preserves human body scale and can be used as input to an image-based visual servo control scheme. We show that our setpoint computations tend to agree both quantitatively and qualitatively with experimental setpoint baselines. The methodology introduced shows promise for enhancing UHRI by improving robotic perception of human orientation underwater.
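The 3D reconstruction step rests on the standard rectified-stereo relation Z = f·B/d. A minimal sketch of lifting a detected joint pixel to camera coordinates; the focal length, baseline, and pixel values below are illustrative, not the paper's calibration:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from the rectified-stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

def backproject(u, v, cx, cy, focal_px, depth_m):
    """Back-project pixel (u, v) at a known depth into camera coordinates."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)

# Example: 800 px focal length, 10 cm baseline, 40 px disparity -> 2 m depth.
z = stereo_depth(800.0, 0.10, 40.0)
point = backproject(420.0, 300.0, 400.0, 300.0, 800.0, z)
```

Applying this to each detected joint yields the 3D joint estimates from which a facing direction can be encoded.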
|
|
16:30-18:00, Paper TuCT20-NT.2 | Add to My Program |
Mechanism Design for New Sensors Field Deployment by LineRanger Powerline Robot |
|
Richard, Pierre-Luc | Hydro-Quebec Research Institute |
Bellemare, Jonathan | IREQ Hydro-Québec Research Institute |
Hamelin, Philippe | Hydro-Quebec Research Institute |
Hébert, Camille | Hydro-Québec's Research Institute |
Lambert, Ghislain | Hydro-Quebec Research Institute |
Lavoie, Samuel | Hydro-Quebec Research Institute |
Leprohon, Sebastien | IREQ Hydro-Québec Research Institute |
Montfrond, Matthieu | Hydro-Québec Research Institute |
Nourry, Marion | IREQ Hydro-Québec Research Institute |
Sartor, Alex | Hydro-Quebec Research Institute |
Pouliot, Nicolas | IREQ Hydro-Québec Research Institute |
Keywords: Robotics in Hazardous Fields, Mechanism Design, Field Robots
Abstract: Powerline robots are slowly becoming key tools for electric utilities. Contrary to drones, which are usually limited to inspection tasks, wheeled robots like LineRanger can support a broader range of applications. In this paper, a suite of mechanical devices is featured, as several new asset management tasks were recently added to LineRanger's capabilities. While previous applications focused on non-contact inspection (visual, electromagnetic, etc.), the new tasks at hand involve reaching adjacent conductors to probe line components with a micro-ohmmeter, installing and retrieving custom-built sensors for multi-day line monitoring, and assessing the surface properties of aging conductors to refine their thermal model and optimize line capacity during heat waves. All three applications were recently field-validated on LineRanger, and mechanical design insights are presented for each module.
|
|
16:30-18:00, Paper TuCT20-NT.3 | Add to My Program |
Translational Disturbance Rejection for Jet-Actuated Flying Continuum Robots on Mobile Bases |
|
Maezawa, Yukihiro | Tohoku University |
Ambe, Yuichi | Osaka University |
Yamauchi, Yu | Akita Prefectural University |
Konyo, Masashi | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Search and Rescue Robots, Aerial Systems: Mechanics and Control, Flexible Robotics
Abstract: Although continuum robots have the potential to operate in narrow areas by changing their shapes and propelling their bodies, they easily vibrate under sudden or periodic external forces. Suppressing vibrations is difficult in our jet-actuated continuum robot because the movements of its mobile base cannot be controlled by the same system as the movements of the robot, and mobile base oscillation increases the risk of resonance. In this study, disturbance rejection was realized for the Dragon Firefighter, a jet-actuated flying continuum robot on a mobile base, for rapid and safe fire extinguishing using a 4-m-long flying fire hose consisting of two nozzle units and flexible hoses. An H-infinity-based disturbance-rejection controller was designed to suppress the vibration of the head nozzle unit posture against the acceleration of the mobile base. The robot parameters were then identified from tensile tests and dynamic excitation experiments. Dynamic simulations confirmed that the controller reduced the peak gain of the frequency response by approximately 2 dB for various robot shapes. Robot experiments confirmed that the proposed method reduced the peak gain of the frequency response by approximately 3 dB, which increased the extra injection range of the nozzle by approximately 16%.
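The reported improvements are reductions in the peak of a frequency-response magnitude. As a hedged illustration of what a 2-3 dB peak-gain reduction means, here is a sweep over a standard second-order resonance where added damping lowers the peak; the natural frequency and damping ratios are illustrative, not the identified Dragon Firefighter parameters:

```python
import math

def mag_db(wn, zeta, w):
    """Magnitude in dB of a standard second-order response at frequency w."""
    num = wn ** 2
    den = math.sqrt((wn ** 2 - w ** 2) ** 2 + (2 * zeta * wn * w) ** 2)
    return 20 * math.log10(num / den)

def peak_gain_db(wn, zeta, n=2000):
    """Peak of the frequency response, found by a dense sweep up to 3*wn."""
    return max(mag_db(wn, zeta, 3 * wn * (k + 1) / n) for k in range(n))

# Raising the damping ratio from 0.05 to 0.07 lowers the resonant peak by
# roughly 3 dB, the same kind of reduction reported in the experiments.
before = peak_gain_db(10.0, 0.05)
after = peak_gain_db(10.0, 0.07)
reduction = before - after
```

The closed-form peak for small damping is 1/(2ζ√(1-ζ²)), so the sweep is only a numerical convenience here.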
|
|
16:30-18:00, Paper TuCT20-NT.4 | Add to My Program |
Cellular-Enabled Collaborative Robots Planning and Operations for Search-And-Rescue Scenarios |
|
Romero, Arnau | I2CAT Foundation |
Delgado, Carmen | I2CAT Foundation |
Zanzi, Lanfranco | NEC Laboratories |
Suarez, Raul | Universitat Politecnica De Catalunya (UPC) |
Costa, Xavier | I2CAT Foundation |
Keywords: Search and Rescue Robots, Energy and Environment-Aware Automation, Planning, Scheduling and Coordination
Abstract: Mission-critical operations, particularly in the context of Search-and-Rescue (SAR) and emergency response situations, demand optimal performance and efficiency from every component involved to maximize the success probability of such operations. In these settings, cellular-enabled collaborative robotic systems have emerged as invaluable assets, assisting first responders in several tasks, ranging from victim localization to hazardous area exploration. However, a critical limitation in the deployment of cellular-enabled collaborative robots in SAR missions is their energy budget, primarily supplied by batteries, which directly impacts their task execution and mobility. This paper tackles this problem and proposes a search-and-rescue framework for cellular-enabled collaborative robot use cases that, taking as input the size of the area to be explored, the robot fleet size, the robots' energy profiles, the required exploration rate, and the target response time, finds the minimum number of robots able to meet the SAR mission goals and the paths they should follow to explore the area. Our results i) show that first responders can rely on a SAR cellular-enabled robotics framework when planning mission-critical operations to make informed decisions with limited resources, and ii) illustrate the trade-off between the number of robots, the explored area, and the response time depending on the type of robot: wheeled vs. quadruped.
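The fleet-sizing idea can be illustrated with a back-of-envelope model: each robot covers area at some rate for as long as its battery and the deadline allow. A minimal sketch, with all parameter names and numbers assumed for illustration (the paper's framework also plans the exploration paths, which this sketch omits):

```python
import math

def min_robots(area_m2, coverage_rate_m2_per_s, battery_s, response_s):
    """Smallest fleet that can sweep `area_m2` within the target response time.

    Each robot covers ground at `coverage_rate_m2_per_s` and can operate for
    at most min(battery_s, response_s) seconds. Illustrative model only.
    """
    usable_s = min(battery_s, response_s)
    per_robot = coverage_rate_m2_per_s * usable_s
    if per_robot <= 0:
        raise ValueError("a robot must cover a positive area")
    return math.ceil(area_m2 / per_robot)

# 10,000 m^2 area, 2 m^2/s per robot, 30 min battery, 20 min deadline.
fleet = min_robots(10_000, 2.0, 1800, 1200)
```

Tightening the response time or shrinking the battery raises the required fleet size, which is the trade-off the paper's results quantify per robot type.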
|
|
16:30-18:00, Paper TuCT20-NT.5 | Add to My Program |
STAGE: Scalable and Traversability-Aware Graph Based Exploration Planner for Dynamically Varying Environments |
|
Patel, Akash | Luleå University of Technology |
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Search and Rescue Robots, Field Robots, Autonomous Vehicle Navigation
Abstract: In this article, we propose a novel navigation framework that leverages a two-layered graph representation of the environment for efficient large-scale exploration, while integrating a novel uncertainty-awareness scheme to handle dynamic scene changes in previously explored areas. The framework is structured around a novel goal-oriented graph representation, consisting of i) a local sub-graph layer and ii) a global graph layer. The local sub-graphs encode local volumetric gain locations as frontiers, based on direct pointcloud visibility, allowing fast graph building and path planning. Additionally, the global graph is built in an efficient way, using node-edge information exchange only on overlapping regions of sequential sub-graphs. Different from state-of-the-art graph-based exploration methods, the proposed approach efficiently re-uses sub-graphs built in previous iterations to construct the global navigation layer. Another merit of the proposed scheme is the ability to handle scene changes (e.g. blocked pathways), adaptively updating the obstructed part of the global graph from traversable to not-traversable. This operation involves oriented sampling of the space around a path segment in the global graph layer, removing the respective edges from connected nodes of the global graph in case of obstruction. As such, the exploration behavior directs the robot to follow another route during the global re-positioning phase through pathway updates in the global graph. Finally, we showcase the performance of the method both in simulation runs and in a real-world deployment involving a legged robot carrying camera and lidar sensors.
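The traversability update described above (flipping an obstructed global-graph edge to not-traversable and re-routing) can be sketched with a plain adjacency structure and breadth-first search; the graph and node names below are illustrative:

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search over a dict-of-sets adjacency structure."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable

def mark_blocked(adj, a, b):
    """Flip an edge from traversable to not-traversable by removing it."""
    adj[a].discard(b)
    adj[b].discard(a)

# Global-graph sketch: two 3-node routes from A to D exist until B-D
# is obstructed, after which re-planning must route through C.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
before = shortest_path(graph, "A", "D")
mark_blocked(graph, "B", "D")
after = shortest_path(graph, "A", "D")
```

The actual framework samples oriented space around the path segment to decide which edges to remove; this sketch only shows the edge-removal and re-routing step.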
|
|
16:30-18:00, Paper TuCT20-NT.6 | Add to My Program |
Computation-Aware Multi-Object Search in 3D Space Using Submodular Tree |
|
Li, Yan-Shuo | National Central University |
Tseng, Kuo-Shih | National Central University |
Keywords: Search and Rescue Robots, Human Detection and Tracking, Aerial Systems: Applications
Abstract: Searching for targets in a 3D environment can be formulated as a submodular maximization problem with routing constraints. However, it involves solving two NP-hard problems: the maximal coverage and traveling salesman problems. Since the time constraint is critical for search problems, this research proposes a Computation-Aware Search for Multiple Objects (CASMO) algorithm to further consider the computational time in the cost constraints. Due to submodularity, the greedy algorithm achieves \frac{1}{2}(1-\frac{1}{e})\overline{OPT}, where \overline{OPT} is the approximate optimum. The experiment results show that the proposed algorithms outperform state-of-the-art approaches in multi-object search.
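The greedy step behind such guarantees picks, at each iteration, the candidate with the largest marginal coverage gain. A minimal sketch of greedy maximal coverage (the routing and computation-time costs that CASMO also accounts for are omitted; viewpoint names and cell sets are illustrative):

```python
def greedy_max_coverage(candidate_views, budget):
    """Greedy submodular maximization for a coverage objective.

    `candidate_views` maps a viewpoint name to the set of cells it observes;
    pick up to `budget` viewpoints, each time taking the largest marginal
    gain. For plain budgeted coverage this greedy rule carries the classic
    (1 - 1/e)-style approximation guarantee.
    """
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(candidate_views,
                   key=lambda v: len(candidate_views[v] - covered))
        if not candidate_views[best] - covered:
            break  # no remaining viewpoint adds anything new
        chosen.append(best)
        covered |= candidate_views[best]
    return chosen, covered

views = {
    "v1": {1, 2, 3, 4},
    "v2": {3, 4, 5},
    "v3": {5, 6},
    "v4": {1, 2},
}
picked, seen = greedy_max_coverage(views, budget=2)
```

Greedy first takes v1 (four new cells), then v3 (two new cells), covering all six cells; v2's overlap with v1 makes its marginal gain smaller.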
|
|
16:30-18:00, Paper TuCT20-NT.7 | Add to My Program |
Multi-Robot Search in a 3D Environment with Intersection System Constraints |
|
Li, Yan-Shuo | National Central University |
Tseng, Kuo-Shih | National Central University |
Keywords: Search and Rescue Robots, Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: Efficient task allocation is a challenge for multi-robot search. The multi-robot search problem is reformulated as submodular maximization subject to intersection system constraints. The objective function is submodular and consists of a coverage function to cover environments and a balancing function to efficiently dispatch robots. The intersection system is composed of routing and clustering constraints. The experiment results show that the proposed approach outperforms state-of-the-art methods in multi-robot search.
|
|
16:30-18:00, Paper TuCT20-NT.8 | Add to My Program |
Wireless Communication Infrastructure Building for Mobile Robot Search and Inspection Missions |
|
Zoula, Martin | Czech Technical University in Prague |
Faigl, Jan | Czech Technical University in Prague |
Keywords: Sensor Networks, Search and Rescue Robots, Networked Robots
Abstract: In the paper, we address wireless communication infrastructure building by relay placement, based on approaches utilized in wireless sensor networks. The problem is motivated by search and inspection missions with mobile robots, where known sensing ranges may be exploited. We investigate relay placement that establishes network connectivity to support robust flood-based communication routing. The proposed method decomposes the given area into Open space and Corridor space, where specific deployment patterns allow for guaranteed k-connectivity, making the resulting network redundant while keeping channel utilization bounded. In particular, a hexagonal tessellation coverage pattern with 3-connectivity is investigated in Open space and a linear 4-connectivity pattern in Corridor space, respectively. The proposed approach is empirically evaluated in a realistic scenario and, based on the reported results, is found to be superior to the existing stochastic randomized dual sampling schema.
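The Open-space pattern places relays on a hexagonal lattice so that every point lies within range of some relay. A minimal sketch of generating such positions, assuming a rectangular area and the textbook lattice spacing of sqrt(3) times the range (the paper's exact pattern and parameters may differ):

```python
import math

def hex_relay_positions(width, height, comm_range):
    """Relay centers on a hexagonal lattice covering a width x height area.

    With column spacing sqrt(3)*R, row spacing 1.5*R, and odd rows offset by
    half a column, every interior point lies within `comm_range` of a relay.
    Illustrative geometry only.
    """
    dx = math.sqrt(3) * comm_range
    dy = 1.5 * comm_range
    positions = []
    row, y = 0, 0.0
    while y <= height:
        x = dx / 2 if row % 2 else 0.0
        while x <= width:
            positions.append((x, y))
            x += dx
        y += dy
        row += 1
    return positions

relays = hex_relay_positions(100.0, 100.0, 10.0)
```

The spacing comes from the hexagonal Voronoi cell: a lattice with nearest-neighbor distance sqrt(3)·R has circumradius R, so coverage disks of radius R tile the interior without gaps.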
|
|
16:30-18:00, Paper TuCT20-NT.9 | Add to My Program |
RB5 Low-Cost Explorer: Implementing Autonomous Long-Term Exploration on Low-Cost Robotic Hardware |
|
Seewald, Adam | Yale University |
Chancán, Marvin | Yale University |
McCann, Connor | Harvard University |
Noh, Seonghoon | Yale University |
Fallahi, Omeed | Yale University |
Castillo, Hector | Yale University |
Abraham, Ian | Yale University |
Dollar, Aaron | Yale University |
Keywords: Engineering for Robotic Systems, Software-Hardware Integration for Robot Systems
Abstract: This systems paper presents the design and implementation of RB5, a wheeled robot for autonomous long-term exploration with fewer and cheaper sensors. Requiring just an RGB-D camera and low-power computing hardware, the system consists of an experimental platform with rocker-bogie suspension. It operates in unknown and GPS-denied environments and on indoor and outdoor terrains. The exploration methodology extends frontier- and sampling-based exploration with a path-following vector field and a state-of-the-art SLAM algorithm. The methodology allows the robot to explore its surroundings at lower update frequencies, enabling the use of lower-performing and lower-cost hardware while still retaining good autonomous performance. The approach further includes a methodology for interacting with a remotely located human operator based on an inexpensive long-range and low-power communication technology from the internet-of-things domain (i.e., LoRa) and a customized communication protocol. The results and the feasibility analysis show the possible applications and limitations of the approach.
|
|
TuCT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Microrobots |
|
|
Chair: Morishima, Keisuke | Osaka University |
Co-Chair: Perez-Arancibia, Nestor O | Washington State University (WSU) |
|
16:30-18:00, Paper TuCT21-NT.1 | Add to My Program |
Takeoff of a 2.1g Fully Untethered Tailless Flapping-Wing Micro Aerial Vehicle with Integrated Battery |
|
Ozaki, Takashi | Toyota Central R&D Labs. Inc |
Ohta, Norikazu | Company |
Jimbo, Tomohiko | Toyota Central R&D Labs., Inc |
Hamaguchi, Kanae | Toyota Central R&D Labs., Inc |
Keywords: Aerial Systems: Mechanics and Control, Biologically-Inspired Robots, Biomimetics
Abstract: Insect-scale micro-aerial vehicles (MAVs) are becoming increasingly important for sensing and mapping spatially constrained environments. However, achieving untethered flight powered by a small and lightweight on-board energy source remains a challenge. In this study, we successfully demonstrated the untethered takeoff of a flapping-wing MAV powered by a commercially available LiPo battery for a short duration without stability control. By incorporating a high-efficiency direct-drive piezoelectric actuator and an optimized control circuit for pulse-width modulation and charge recovery, the electronics realize over 5 min of practical operation with a thrust of 1.5 times the vehicle's own weight, while offering wireless communication capability and sensors for attitude estimation. Our MAV has a total mass of 2.1 g, an eight-fold reduction compared to the lightest battery-powered tailless flapping-wing MAV currently available, making this the first report of a battery-powered tailless flapping-wing MAV with insect-scale weight.
|
|
16:30-18:00, Paper TuCT21-NT.2 | Add to My Program |
Design and Optimization of a Miniature Locust-Inspired Stable Jumping Robot |
|
Xu, Yi | Beijing Institute of Technology |
Jin, Yanzhou | Beijing Institute of Technology |
Zhang, Weitao | Beijing Institute of Technology |
Si, Yunhao | Beijing Institute of Technology |
Zhang, Yulai | Beijing Institute of Technology |
Li, Chang | Beijing Institute of Technology |
Shi, Qing | Beijing Institute of Technology |
Keywords: Biologically-Inspired Robots, Biomimetics, Mechanism Design
Abstract: Jumping is a key locomotion mode for miniature robots, but it is difficult for a robot to jump a long distance without flipping. To solve this problem, we develop a miniature locust-inspired jumping robot with a body length of 10 cm and a weight of 60 g. On the basis of the extracted skeletal muscle movement of a locust, we make full use of the Stephenson six-bar mechanism in designing a jumping leg to achieve power amplification. Moreover, we carry out a two-step optimization of the mechanism parameters to achieve high jumping energy (first step) by optimizing the storage and dissipation of energy and then high jumping stability (second step) by optimizing the force characteristics. A series of experimental tests shows that the robot can jump to a height three times its body length and a distance seven times its body length. Remarkably, the jumping height and distance relative to the body length of our jumper exceed those of other robots with stable mechanisms by 30% and 33%, respectively. Meanwhile, our robot has a high degree of stability, which allows it to maintain a proper aerial orientation without flipping.
|
|
16:30-18:00, Paper TuCT21-NT.3 | Add to My Program |
Multi-Modal Jumping and Crawling in an Autonomous, Springtail-Inspired Microrobot |
|
Singh, Shashwat | Carnegie Mellon University |
Temel, Zeynep | Carnegie Mellon University |
St. Pierre, Ryan | University at Buffalo |
Keywords: Biologically-Inspired Robots, Biomimetics, Micro/Nano Robots
Abstract: Springtails are tiny arthropods that crawl and jump. They jump by temporarily storing elastic energy in resilin elastic cuticular structures and releasing that energy to accelerate a tail, called a furca, propelling them in the air. This paper presents an autonomous, springtail-inspired microrobot that can crawl and jump. The microrobot has a mass of 980 mg and stands 13 mm tall, and has on-board sensing, computation, and power, enabling autonomy. The microrobot was designed with a super-elastic shape memory alloy (SMA) spring that is manually loaded to store elastic energy. The on-board sensing and computation triggers an actuator at the jump frequency range that unlatches the spring, launching the microrobot into the air at speeds up to 3.171 m/s. At the same time, the microrobot is capable of crawling, when actuated at frequencies lower or higher than the jump frequency range, demonstrating autonomous multi-modal locomotion. This work opens up new pathways toward autonomy in multi-modal microrobots.
|
|
16:30-18:00, Paper TuCT21-NT.4 | Add to My Program |
High-Speed Interfacial Flight of an Insect-Scale Robot |
|
Gao, Hang | Cornell University |
Jung, Sunghwan | Cornell University |
Helbling, E. Farrell | Cornell University |
Keywords: Biologically-Inspired Robots, Micro/Nano Robots
Abstract: Several insect species are able to locomote across the air-water interface by leveraging surface tension to remain above the water surface. A subset of these insects, such as the stonefly and waterlily beetle, flap their wings to actively move around the two-dimensional surface — a locomotion strategy referred to as interfacial flight. Here, we present an insect-scale robot, the gamma-bot, inspired by these interfacial fliers. The robot is composed of a flapping-wing vehicle that generates a thrust force parallel to the water surface and three passive legs that utilize surface tension to support the body mass and maintain contact with the air-water interface. We developed and validated a simple model to characterize the drag forces acting on the vehicle and estimate the robot's velocity. This 112 mg robot can reach maximum velocities of 0.9 m/s (corresponding to 15 body lengths per second) and can initiate both left and right turns, demonstrating high maneuverability along the air-water interface. In addition, the robot can carry an additional 419 mg, enabling future sensing, control, and power autonomy.
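A drag model of this kind balances thrust against quadratic drag, which sets the top speed. A hedged sketch of that balance; the density, drag coefficient, reference area, and thrust below are illustrative stand-ins, not the paper's identified values:

```python
import math

def top_speed(thrust_n, rho, drag_coeff, ref_area_m2):
    """Steady-state speed where quadratic drag balances thrust:
    T = 0.5 * rho * Cd * A * v^2  =>  v = sqrt(2 * T / (rho * Cd * A)).
    """
    return math.sqrt(2 * thrust_n / (rho * drag_coeff * ref_area_m2))

# Illustrative numbers only: ~0.4 mN of thrust against water drag on a
# sub-mm^2 wetted reference area gives a top speed near 0.9 m/s.
v = top_speed(4.0e-4, 1000.0, 2.0, 5.0e-7)
```

Because speed scales with the square root of thrust, doubling thrust raises the top speed by only about 41%, which is why drag characterization matters for predicting velocity.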
|
|
16:30-18:00, Paper TuCT21-NT.5 | Add to My Program |
VLEIBot: A New 45-Mg Swimming Microrobot Driven by a Bioinspired Anguilliform Propulsor |
|
Blankenship, Elijah | Washington State University |
Trygstad, Conor | Washington State University |
Gonçalves, Francisco | Washington State University |
Perez-Arancibia, Nestor O | Washington State University (WSU) |
Keywords: Biologically-Inspired Robots, Micro/Nano Robots, Marine Robotics
Abstract: This paper presents the VLEIBot (Very Little Eel-Inspired roBot), a 45-mg/23-mm³ microrobotic swimmer that is propelled by a bioinspired anguilliform propulsor. The propulsor is excited by a single 6-mg high-work-density (HWD) microactuator and undulates periodically due to wave propagation phenomena generated by fluid-structure interaction (FSI) during swimming. The microactuator is composed of a carbon-fiber beam, which functions as a leaf spring, and shape-memory alloy (SMA) wires, which deform cyclically when excited periodically using Joule heating. The VLEIBot can swim at speeds as high as 15.1 mm·s⁻¹ (0.33 Bl·s⁻¹) when driven with a heuristically optimized propulsor. To improve maneuverability, we evolved the VLEIBot design into the 90-mg/47-mm³ VLEIBot+, which is driven by two propulsors and is fully controllable in the two-dimensional (2D) space. The VLEIBot+ can swim at speeds as high as 16.1 mm·s⁻¹ (0.35 Bl·s⁻¹) when driven with heuristically optimized propulsors, and achieves turning rates as high as 0.28 rad·s⁻¹ when tracking path references. The measured root-mean-square (RMS) values of the tracking errors are as low as 4 mm.
|
|
16:30-18:00, Paper TuCT21-NT.6 | Add to My Program |
Direct Learning of Home Vector Direction for Insect-Inspired Robot Navigation |
|
Firlefyn, Michiel Vital M | TU Delft |
Hagenaars, Jesse | Delft University of Technology |
de Croon, Guido | TU Delft |
Keywords: Biologically-Inspired Robots, Vision-Based Navigation, Bioinspired Robot Learning
Abstract: Insects have long been recognized for their ability to navigate and return home using visual cues from their nest's environment. However, the precise mechanism underlying this remarkable homing skill remains a subject of ongoing investigation. Drawing inspiration from the learning flights of honey bees and wasps, we propose a robot navigation method that directly learns the home vector direction from visual percepts during a learning flight in the vicinity of the nest. After learning, the robot will travel away from the nest, come back by means of odometry, and eliminate the resultant drift by inferring the home vector orientation from the currently experienced view. Using a compact convolutional neural network, we demonstrate successful learning in both simulated and real forest environments, as well as successful homing control of a simulated quadrotor. The average errors of the inferred home vectors in general stay well below the 90° required for successful homing, and below 24° if all images contain sufficient texture and illumination. Moreover, we show that the trajectory followed during the initial learning flight has a pronounced impact on the network's performance. A higher density of sample points in proximity to the nest results in a more consistent return. Code and data are available at https://mavlab.tudelft.nl/learning_to_home.
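Homing success in the abstract is stated in terms of the angular error of the inferred home vector. A minimal sketch of that error metric and the 90° success criterion (the vector values are illustrative):

```python
import math

def angular_error_deg(v_true, v_pred):
    """Unsigned angle between two 2D home-vector directions, in degrees."""
    a = math.atan2(v_true[1], v_true[0]) - math.atan2(v_pred[1], v_pred[0])
    a = (a + math.pi) % (2 * math.pi) - math.pi  # wrap into (-pi, pi]
    return abs(math.degrees(a))

def homing_succeeds(err_deg):
    """Per the abstract, errors below 90 degrees still let the robot home."""
    return err_deg < 90.0

# A predicted vector tilted slightly off the true heading: ~17 degrees error,
# well within both the 90-degree and the 24-degree bands mentioned above.
err = angular_error_deg((1.0, 0.0), (1.0, 0.3))
```

Errors under 90° keep the corrected heading pointed into the half-plane containing the nest, so odometry drift shrinks on each correction rather than growing.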
|
|
16:30-18:00, Paper TuCT21-NT.7 | Add to My Program |
A Dragonfly-Inspired Flapping Wing Robot Mimicking Force Vector Control Approach |
|
Liu, Fangyuan | Beihang University |
Li, Song | Beihang University |
Xiang, Jinwu | Beihang University |
Li, Daochun | Beihang University |
Tu, Zhan | Beihang University |
Keywords: Biomimetics, Biologically-Inspired Robots, Mechanism Design
Abstract: Dragonflies show impressive flying skills, achieving both high efficiency and agility. They can perform distinctive flight maneuvers, such as flying backwards, which has recently been shown to be achieved through a "force vectoring" mechanism. In this paper, to explore the agile flight ability of dragonflies on man-made flapping-wing systems, we designed, optimized, and fabricated a dragonfly-inspired flapping wing robot (DFWR) with inclinable stroke-plane control degrees of freedom. The proposed platform employs a four-wing configuration, in which each wing integrates an extra servo motor to enable rotation of the flapping plane and imitate the "force vectoring" mechanism. Referring to the flapping kinematics of dragonflies, the installation angle and wing pitch angle of the proposed DFWR are optimized for total lift and energy consumption through multiobjective optimization based on the NSGA-II method. The "force vector" produced by the proposed platform is illustrated through both theoretical and experimental methods. Moreover, the feasibility of the design is further verified through a series of operation validation experiments. Such a robot has the potential to provide a highly biomimetic platform for validating studies of Odonata flight mechanisms as well as related on-board applications such as bio-inspired vision.
|
|
16:30-18:00, Paper TuCT21-NT.8 | Add to My Program |
Microrobotic Flight Enabled by Ultralight Ion Thrusters with High Thrust-To-Weight Ratio and Low Fabrication Cost |
|
Gu, Yang | Nanjing University of Posts and Telecommunications |
Cai, Xianfa | Nanjing University of Posts and Telecommunications |
Thakuri, Khadga | University of Vermont |
Yang, Wenyu | Huazhong University of Science and Technology |
Guo, Yufeng | Nanjing University of Posts and Telecommunications |
Li, Wei | University of Vermont |
Keywords: Micro/Nano Robots, Mechanism Design, Aerial Systems: Mechanics and Control
Abstract: Flying microrobots have garnered growing research interest owing to their technological intricacies and suitability for various applications leveraging their miniaturized size. Electrohydrodynamic (EHD) thrust offers advantages by generating propulsion without moving parts, but real-world use is limited by insufficient thrust generation, manufacturing challenges, fragility, and cost. This work presents the design and development of an optimized ion-propelled flying microrobot that excels in low weight, high thrust-to-weight ratio, and cost efficiency. Regarding design, multiphysics simulations guided structural optimization to increase thrust while decreasing weight. For materials, metal-coated polyethylene terephthalate (PET) film was selected to leverage the combined merits of metal conductivity and polymer flexibility, light weight, and low cost, enabling further weight reduction, easy assembly, robustness, and cost-effectiveness. Various experiments, including voltage-current measurements, ionic wind speed, thrust quantification, and airflow visualization, directed design refinements and validated performance. Through structural optimization, the maximum wind speed attained 2.18 m/s. Flight demonstrations with payloads showed that the microrobot can fly stably at its inherent 16 mg weight while carrying an additional 72 mg load, achieving a record thrust-to-weight ratio of 5.5. These results open possibilities for incorporating microelectronics to enable autonomous flight functionality.
|
|
16:30-18:00, Paper TuCT21-NT.9 | Add to My Program |
A Modular Biological Neural Network-Based Neuro-Robotic System Via Local Chemical Stimulation and Calcium Imaging |
|
Chen, Zhe | Beijing Institute of Technology |
Chen, Xie | Beijing Institute of Technology |
Shimoda, Shingo | RIKEN |
Huang, Qiang | Beijing Institute of Technology |
Shi, Qing | Beijing Institute of Technology |
Fukuda, Toshio | Nagoya University |
Sun, Tao | Beijing Institute of Technology |
Keywords: Neurorobotics, Cyborgs
Abstract: Embodying in vitro biological neural networks (BNNs) with robots to explore the rise of intelligence in these simpler models and to endow robots with biological intelligence has been attracting increasing attention in the fields of neuroscience and robotics. However, current research suffers from unstable sensory-motor mapping due to the random wiring of neurons seeded on multi-electrode arrays (MEAs). Therefore, here we propose a modular BNN (mBNN)-based neuro-robotic system via local chemical stimulation and calcium recording. In this system, reliable evoked sensory-motor mapping (success rate > 89%) from the sensory to the motor area in the mBNN was demonstrated. It is achieved in the mBNNs by combining global chemical modulation (for suppressing spontaneous signal transmission) and local chemical stimulation (for inducing the evoked signal transmission). The neural signals of the motor area of the BNN are recorded by calcium imaging, analyzed, and decoded to control the motion state of the mobile robot in real-time. The sensory signals of the robot are encoded and transmitted to the sensory area of the BNN, closing the loop. This system presents a platform to investigate how information is processed and transmitted in mBNNs, and also to examine the influence of local and global chemical modulation on within-network signal transmission.
|
|
TuCT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics III |
|
|
Chair: Hollinger, Geoffrey | Oregon State University |
Co-Chair: Quattrini Li, Alberto | Dartmouth College |
|
16:30-18:00, Paper TuCT22-NT.1 | Add to My Program |
An Augmented Catenary Model for Underwater Tethered Robots |
|
Filliung, Martin | CNRS LIS, COSMER Laboratory, Université De Toulon |
Drupt, Juliette | Université De Toulon |
Peraud, Charly | COSMER Laboratory, Université De Toulon |
Dune, Claire | Université De Toulon |
Boizot, Nicolas | Université De Toulon |
Comport, Andrew Ian | CNRS-I3S/UNS |
Anthierens, Cedric | Universite De Toulon |
Hugel, Vincent | University of Toulon |
Keywords: Marine Robotics
Abstract: This paper examines the relevance of using catenary-based curves to model cables in underwater tethered robotic applications in order to take into account the influence of hydrodynamic damping. To this end, an augmented catenary-based model is introduced to deal with the dynamical effects of surge motion, sway motion, or a combination of both on a cable. Experimental studies are carried out with eight cables of varying stiffness, weight, and buoyancy. One end of the cable is fixed, while the other end is moved by the underwater robot. The obtained results help to determine which cables and which dynamics are compatible with a fair estimation of the cable shape through the proposed models.
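For reference, the classical (unaugmented) catenary that the model builds on can be computed from the span and cable length alone. A minimal sketch, solving for the catenary parameter by bisection (the numbers are illustrative; the paper's augmented model additionally accounts for hydrodynamic damping):

```python
import math

def catenary_parameter(span, length):
    """Solve 2*a*sinh(span/(2*a)) = length for the catenary parameter a.

    The arc length of y = a*cosh(x/a) over a level span is
    2*a*sinh(span/(2*a)), which decreases toward `span` as a grows,
    so bisection on a converges to the unique root.
    """
    if length <= span:
        raise ValueError("cable must be longer than the straight-line span")
    f = lambda a: 2 * a * math.sinh(span / (2 * a)) - length
    lo, hi = span / 50.0, span * 1e6  # slack extreme vs. near-taut extreme
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # computed length too long: flatten the curve
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sag(span, length):
    """Mid-span sag of the cable below its (level) attachment points."""
    a = catenary_parameter(span, length)
    return a * (math.cosh(span / (2 * a)) - 1)

# A 12 m cable across a 10 m gap hangs with roughly 2.9 m of sag.
s = sag(10.0, 12.0)
```

The augmented model in the paper perturbs this static shape with terms driven by the robot's surge and sway motion; the static catenary is the baseline it is compared against.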
|
|
16:30-18:00, Paper TuCT22-NT.2 | Add to My Program |
A Hybrid Dynamical Model for Robotic Underwater Vehicles When Submerged or Surfaced: Approach and Preliminary Evaluation |
|
Hunt, James | Johns Hopkins University |
Whitcomb, Louis | The Johns Hopkins University |
Keywords: Marine Robotics, Field Robots
Abstract: This paper reports a numerical method for modeling underwater vehicle (UV) interactions with the free surface using a finite-dimensional dynamical plant model. Although finite-dimensional plant models of fully submerged UV behavior are well-established, they are unable to model the ubiquitous condition of a UV operating at or near the free surface. We report a Monte Carlo-based hybrid model approach for calculating the buoyancy and righting moment of a partially or fully submerged UV in order to model interactions with the free surface. We also report a preliminary evaluation of the hybrid model in numerical simulations, comparing the hybrid model's performance to that of a model for fully submerged UVs and to the experimentally observed behavior of an actual vehicle while fully submerged and while interacting with the free surface. The results of this preliminary study suggest that the proposed hybrid approach may offer a simple and practical method for modeling UV behavior when submerged or interacting with the free surface.
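The Monte Carlo buoyancy idea can be illustrated on a simple hull shape: sample points in the hull's bounding box and count those that are both inside the hull and below the waterline. A minimal sketch for a spherical hull (the paper applies the approach to an actual vehicle geometry; the shape, sample count, and convention below are illustrative):

```python
import math
import random

def submerged_volume_mc(radius, depth_of_center, n=200_000, seed=1):
    """Monte Carlo estimate of the submerged volume of a spherical hull.

    The water surface is the plane z = 0 (z up) and the hull center sits at
    z = depth_of_center (negative when below the surface). Points sampled in
    the bounding cube count when inside the hull and below the waterline.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        z = rng.uniform(-radius, radius)
        inside = x * x + y * y + z * z <= radius * radius
        underwater = z + depth_of_center < 0
        if inside and underwater:
            hits += 1
    cube_volume = (2 * radius) ** 3
    return cube_volume * hits / n

# Half-submerged unit sphere: the estimate approaches (2/3)*pi m^3.
v = submerged_volume_mc(1.0, 0.0)
```

Multiplying the estimated volume by fluid density and gravity gives the buoyant force, and taking the centroid of the accepted samples gives the center of buoyancy needed for the righting moment.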
|
|
16:30-18:00, Paper TuCT22-NT.3 | Add to My Program |
Surfing Algorithm: Agile and Safe Transition Strategy for Hybrid Aerial Underwater Vehicle in Waves |
|
Bi, Yuanbo | Shanghai Jiao Tong University |
Jin, Yufei | Shanghai Jiao Tong University |
Zhou, Hexiong | Shanghai Jiao Tong University |
Bai, YuLin | Shanghai Jiao Tong University |
Lyu, Chenxin | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiaotong University |
Keywords: Marine Robotics, Field Robots, Aerial Systems: Perception and Autonomy, Motion Control
Abstract: Agile and safe trans-domain transition in waves is a promising capability but also the primary bottleneck of the hybrid aerial underwater vehicle (HAUV). In this article, the surfing algorithm is proposed to search for the dynamic window that facilitates takeoff in waves while avoiding hazardous waves. For the first time, the cross-domain window, i.e., the vehicle at the wave crest and heading downstream, is characterized and defined. The novel surfing algorithm consists of the gradient perceptron, time-limited momentum gradient search, heading server, and initial conditions. Numerical simulations and experiments in regular and irregular waves reveal the effectiveness of the algorithm. The algorithm ensures a healthy initial attitude and minimal wave disturbance during takeoff, thus alleviating the thrust distraction from stability recovery and uncertainty. The average transition time and energy cost are reduced by 59.2% and 26.1% compared with random take-off cases, and the locomotion is smooth, graceful, and low-risk. Compared with the adaptive robust controller, this article provides an ingenious and enlightening strategy from the perspective of harnessing waves.
|
|
16:30-18:00, Paper TuCT22-NT.4 | Add to My Program |
ReefGlider: A Highly Maneuverable Vectored Buoyancy Engine Based Underwater Robot |
|
Macauley, Kevin | University of Wisconsin-Madison |
Cai, Levi | Massachusetts Institute of Technology |
Adamczyk, Peter G. | University of Wisconsin - Madison |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Keywords: Marine Robotics, Field Robots, Environment Monitoring and Management
Abstract: There exists a capability gap in the design of currently available autonomous underwater vehicles (AUVs). Most AUVs use a set of thrusters, and optionally control surfaces, to control their depth and pose. AUVs utilizing thrusters can be highly maneuverable, making them well-suited to operate in complex environments such as in close proximity to coral reefs. However, they are inherently power-inefficient and produce significant noise and disturbance. Underwater gliders, on the other hand, use changes in buoyancy and center of mass, in combination with a control surface, to move around. They are extremely power-efficient but not very maneuverable. Gliders are designed for long-range missions that do not require precision maneuvering. Furthermore, since gliders only activate the buoyancy engine for small time intervals, they do not disturb the environment and can also be used for passive acoustic observations. In this paper we present ReefGlider, a novel AUV that uses only buoyancy for control but is still highly maneuverable, thanks to additional buoyancy control devices. ReefGlider bridges the gap between the capabilities of thruster-driven AUVs and gliders. These combined characteristics make ReefGlider ideal for tasks such as long-term visual and acoustic monitoring of coral reefs. We present the overall design and implementation of the system, as well as an analysis of some of its capabilities.
|
|
16:30-18:00, Paper TuCT22-NT.5 | Add to My Program |
Robust Model Predictive Control with Control Barrier Functions for Autonomous Surface Vessels |
|
Wang, Wei | University of Wisconsin-Madison |
Xiao, Wei | MIT |
Gonzalez-Garcia, Alejandro | KU Leuven |
Swevers, Jan | KU Leuven |
Ratti, Carlo | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Marine Robotics, Field Robots, Optimization and Optimal Control
Abstract: In autonomous robot navigation, the trajectories from path planners are considered to be safe regions, and deviations could endanger vessels. Model Predictive Control (MPC) stands as a popular choice for trajectory tracking problems as it naturally addresses operational constraints, such as dynamics and control constraints. Nevertheless, achieving robustness in changing environments like oceans and rivers, which are constantly subject to significant external disturbances, remains an ongoing challenge for MPC. It must consistently keep the system within a predefined safe region (such as a reference trajectory) even in the presence of model inaccuracies and perturbations. To address this challenge, we present a robust model predictive control strategy utilizing Control Barrier Functions (CBFs), which increases the disturbance-rejection abilities. We verify our method on an autonomous surface vessel in simulation and natural waters, both with external disturbances. Specifically, compared with the traditional MPC method, our proposed MPC-CBF strategy reduces tracking errors by 17.82% and 40.26% in simulations and field experiments, respectively. Although the control effort slightly increases by 7.78% and 4.20%, respectively, these results clearly demonstrate the enhanced resilience of MPC-CBF to disturbances.
|
|
16:30-18:00, Paper TuCT22-NT.6 | Add to My Program |
Design and Optimization of a Multimode Amphibious Robot with Propeller-Leg |
|
Ma, Xinmeng | Harbin Engineering University |
Wang, Gang | Harbin Engineering University |
Kaixin, Liu | Harbin Engineering University |
Keywords: Marine Robotics, Field Robots, Robotics in Hazardous Fields
Abstract: This paper describes a novel multimode motion robot named SHOALBOT, which performs multimode operations with only one type of propulsion device and can work flexibly in amphibious environments. Robots that work in water need to minimize the number of drive components to improve reliability and reduce communication pressure. Our unique design enables the robot, relying on a single propulsion device named the propeller-leg, to run rapidly on the seashore and seabed and to swim with multiple degrees of freedom in the water (using only four driving elements). We analyzed and optimized the propeller-leg through simulation combined with open water tests; according to the test results, the propeller-leg's thrust in the water before and after optimization differs by 400%. We also minimized the difference between the forward and reverse thrust of the propeller-leg to improve the stability of the robot's movement, reducing the difference from 25% to 3%. This paper provides sufficient technical detail, and a series of experiments validates that the SHOALBOT has excellent movement ability in amphibious environments.
|
|
16:30-18:00, Paper TuCT22-NT.7 | Add to My Program |
Underwater Dome-Port Camera Calibration: Modeling of Refraction and Offset through N-Sphere Camera Model |
|
Roznere, Monika | Dartmouth College |
Pediredla, Adithya | Dartmouth College |
Lensgraf, Samuel | Dartmouth College |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Quattrini Li, Alberto | Dartmouth College |
Keywords: Marine Robotics, Calibration and Identification
Abstract: The optical effects that are observed in underwater imagery are more complex than those in air. This is partially because we enclose most underwater cameras in a watertight enclosure, such as a hemispheric dome window. We then observe optical issues including the distortion effects of the lens, e.g., wide-angle field-of-view (FOV), the refractive effects at the enclosure (water-acrylic and acrylic-air) interfaces, and offset effects of a non-centered camera with respect to the dome. In this paper, we present the N-Sphere (NS) and Shifted N-Sphere (S-NS) camera models, tailored to cameras and lenses mounted in water-tight dome enclosures. The proposed camera models treat each layer of effects as a ‘sphere’ onto which a 3D point is projected. Furthermore, the S-NS model includes additional parameters to address camera offset variability. The versatility of the NS model makes it applicable to various lenses, as validated with fisheye (FOV >120 deg) and wide-FOV (FOV ~120 deg) lenses. We validated our models with different in-water calibration sequences, lenses, and housing setups, as well as with comparisons with other state-of-the-art camera models. Additionally, we demonstrated the performance of our proposed models in an example stereo-based visual odometry application. The low computational load of the proposed models makes them ideal for integration into real-time visual navigation and reconstruction frameworks. We provide full math derivations of the proposed models as well as example C++ header files for easy incorporation in independent projects.
|
|
16:30-18:00, Paper TuCT22-NT.8 | Add to My Program |
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration |
|
Ling, Li | KTH Royal Institute of Technology |
Zhang, Jun | KTH Royal Institute of Technology |
Bore, Nils | KTH Royal Institute of Technology |
Folkesson, John | KTH |
Wåhlin, Anna | University of Gothenburg |
Keywords: Marine Robotics, Software Tools for Benchmarking and Reproducibility, Data Sets for SLAM
Abstract: Deep learning has shown promising results for multiple 3D point cloud registration datasets. However, in the underwater domain, most registration of multibeam echo-sounder (MBES) point cloud data is still performed using classical methods from the iterative closest point (ICP) family. In this work, we curate and release DotsonEast, a semi-synthetic MBES registration dataset constructed from data collected by an autonomous underwater vehicle (AUV) in West Antarctica. Using this dataset, we systematically benchmark the performance of two classical and four learning-based methods. The experimental results show that the learning-based methods work well for coarse alignment and are better at recovering rough transforms consistently at high overlap (20-50%). In comparison, GICP (a variant of ICP) performs well for fine alignment and is better across all metrics at extremely low overlap (10%). To the best of our knowledge, this is the first work to benchmark both learning-based and classical registration methods on an AUV-based MBES dataset. To facilitate future research, both the code and data are made available online.
|
|
16:30-18:00, Paper TuCT22-NT.9 | Add to My Program |
Angler: An Autonomy Framework for Intervention Tasks with Lightweight Underwater Vehicle Manipulator Systems |
|
Palmer, Evan | Oregon State University |
Holm, Chris | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Keywords: Marine Robotics, Software, Middleware and Programming Environments, Control Architectures and Programming
Abstract: Developing autonomous intervention capabilities for lightweight underwater vehicle manipulator systems (UVMS) has garnered significant attention in recent years because of the opportunity for these systems to reduce intervention operating costs. Developing autonomous UVMS capabilities is challenging, however, because of the lack of available standardized software frameworks and pipelines. Previous works offer simulation environments and deployment pipelines for underwater vehicles, but fall short of providing a complete UVMS software framework. We address this gap by creating Angler: a software framework for developing localization, control, and decision-making algorithms with support for sim-to-real transfer. We validate this framework by implementing a state-of-the-art control architecture and demonstrate the ability to perform station keeping with a mean error below 0.25 m and waypoint tracking with an average final error of 0.398 m.
|
|
TuCT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Mechanics and Control III |
|
|
Chair: Foong, Shaohui | Singapore University of Technology and Design |
Co-Chair: Chirarattananon, Pakpong | City University of Hong Kong |
|
16:30-18:00, Paper TuCT23-NT.1 | Add to My Program |
Hitchhiker: A Quadrotor Aggressively Perching on a Moving Inclined Surface Using Compliant Suction Cup Gripper (I) |
|
Liu, Sensen | Shanghai Jiao Tong University |
Wang, Zhaoying | Shanghai Jiao Tong University |
Sheng, Xinjun | Shanghai Jiao Tong University |
Dong, Wei | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Perching on the surface of moving objects, like vehicles, could extend the flight time and range of quadrotors. Suction cups are usually adopted for surface attachment due to their durability and large adhesive force. To seal on a surface, suction cups must be aligned with the surface and possess proper relative tangential velocity. However, quadrotors' attitude and relative velocity errors would become significant when the object's surface is moving and inclined. To address this problem, we proposed a real-time trajectory planning algorithm. The time-optimal aggressive trajectory is efficiently generated through multimodal search in a dynamic time-domain. The velocity errors relative to the moving surface are alleviated. To further adapt to the residual errors, we design a compliant gripper using self-sealing cups. Multiple cups in different directions are integrated into a wheel-like mechanism to increase the tolerance to attitude errors. The wheel mechanism also eliminates the requirement of matching the attitude and tangential velocity. Extensive tests are conducted to perch on static and moving surfaces at various inclinations. Results demonstrate that our proposed system enables a quadrotor to reliably perch on moving inclined surfaces (up to 1.07 m/s and 90°) with a success rate of 70% or higher. The efficacy of the trajectory planner is also validated.
|
|
16:30-18:00, Paper TuCT23-NT.2 | Add to My Program |
Harnessing the Differential Flatness of Monocopter Dynamics for the Purpose of Trajectory Tracking in a Stable Invertible Coaxial Actuated ROtorcraft (SICARO) |
|
Tang, Emmanuel | Singapore University of Technology & Design |
Ang, Wei Jun | Singapore University of Technology & Design |
Tan, Kian Wee | Singapore University of Technology & Design |
Foong, Shaohui | Singapore University of Technology and Design |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Biologically-Inspired Robots
Abstract: In this paper, the dynamics of an emerging class of rotating nature-inspired micro aerial vehicles known as the Monocopter is shown to be differentially flat. By exploiting this property, trajectory tracking can now be implemented on Monocopters via feed-forward terms computed from the trajectory. To demonstrate this, a Monocopter in the form of a Stable Invertible Coaxial Actuated ROtorcraft (SICARO) is chosen to harness this approach fully. The SICARO is capable of flying with either side of the wing facing up, and this feature also determines the craft’s direction of rotation about its body Z axis. In addition, it has the unique feature of a coaxial motor configuration that allows for a pitching-up moment regardless of the wing side facing up. The computed feed-forward terms are fused into a cascaded nonlinear controller on the craft to ensure its effectiveness in tracking trajectories. Lastly, the flight experiments extend to both sides of the wing to validate this method as being applicable to trajectory tracking for Monocopters such as the SICARO, which has an extended range of flying capabilities.
|
|
16:30-18:00, Paper TuCT23-NT.3 | Add to My Program |
Passive Aligning Physical Interaction of Fully-Actuated Aerial Vehicles for Pushing Tasks |
|
Hui, Tong | Technical University of Denmark |
Cuniato, Eugenio | ETH Zurich |
Pantic, Michael | ETH Zürich |
Tognon, Marco | Inria Rennes |
Fumagalli, Matteo | Danish Technical University |
Siegwart, Roland | ETH Zurich |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Dynamics
Abstract: Recently, the utilization of aerial manipulators for performing pushing tasks in non-destructive testing (NDT) applications has seen significant growth. Such operations entail physical interactions between the aerial robotic system and the environment. End-effectors with multiple contact points are often used for placing NDT sensors in contact with a surface to be inspected. Aligning the NDT sensor with the work surface while preserving contact requires that all available contact points at the end-effector tip are in contact with the work surface. With a standard full-pose controller, attitude errors often occur due to perturbations caused by modeling uncertainties, sensor noise, and environmental uncertainties. Even small attitude errors can cause a loss of contact points between the end-effector tip and the work surface. To preserve full alignment amidst these uncertainties, we propose a control strategy that selectively deactivates angular motion control and enables direct force control in specific directions. In particular, we derive two essential conditions to be met such that the robot can passively align with flat work surfaces, achieving full alignment through rotation along the non-actively controlled axes. Additionally, these conditions serve as hardware design and control guidelines for effectively integrating the proposed control method for practical usage. Real-world experiments are conducted to validate both the control design and the guidelines.
|
|
16:30-18:00, Paper TuCT23-NT.4 | Add to My Program |
Modeling and Control of PADUAV: A Passively Articulated Dual UAVs Platform for Aerial Manipulation |
|
Sun, Jiali | Beijing Institute of Technology |
Wang, Kaidi | Beijing Institute of Technology |
Shi, Chuanbeibei | Univeristy of Bristol |
Li, Xiujia | Beijing Institute of Technology |
Yi, Xiaojian | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Sun, Fuchun | Tsinghua University |
Dong, Yiqun | Nanyang Technological University |
Keywords: Aerial Systems: Mechanics and Control, Control Architectures and Programming, Aerial Systems: Applications
Abstract: In this paper, we introduce PADUAV, a novel 5-DOF aerial platform designed to overcome the limitations of traditional tiltrotor vehicles. PADUAV features a unique mechanical design that incorporates two off-the-shelf quadrotors passively articulated to a rigid frame. This innovation enables free pitch rotation without mechanical constraints like cable winding, significantly enhancing its capabilities for various tasks. To control PADUAV's 5 degrees of freedom, we propose a versatile and straightforward 5-DOF geometric tracking control strategy that generates 2D force and 3D torque. A decomposition approach is designed to distribute the output to the torque and thrust commands for each subplane, with no need for complex optimization. We validate our approach through three simulation experiments conducted in the Gazebo environment, leveraging the utilities provided by the RotorS simulator. These experiments not only demonstrate the feasibility of our platform but also provide new perspectives for future aerial platform development, particularly in terms of simulation-based approaches.
|
|
16:30-18:00, Paper TuCT23-NT.5 | Add to My Program |
Particle Filter with Stable Embedding for State Estimation of the Rigid Body Attitude System on the Set of Unit Quaternions |
|
Jang, Hee-Deok | Korea Advanced Institute of Science Technology |
Park, Jae-Hyeon | Korea Advanced Institute of Science and Technology (KAIST) |
Chang, Dong Eui | KAIST |
Keywords: Aerial Systems: Mechanics and Control, Kinematics, Localization
Abstract: This paper presents a novel method for state estimation of the rigid body attitude system evolving on the manifold S^3, which is crucial in robotics and drone applications. We introduce a particle filter with stable embedding that extends the system into Euclidean space while ensuring the stability of the manifold. Our particle filter with stable embedding enables accurate state estimation by maintaining estimated state values in close proximity to the manifold, while requiring significantly fewer computational resources than the standard exponential-map-based method that keeps state estimates on the manifold. Furthermore, our method facilitates the application of the usual techniques designed for particle filters in Euclidean spaces to the manifold system, as is, without any modification. The accuracy and the efficiency of our particle filter are confirmed both by simulation and by real drone experiments.
|
|
16:30-18:00, Paper TuCT23-NT.6 | Add to My Program |
End-To-End Reinforcement Learning for Time Optimal Quadcopter Flight |
|
Ferede, Robin | TU Delft |
De Wagter, Christophe | Delft University of Technology |
Izzo, Dario | European Space Agency |
de Croon, Guido | TU Delft |
Keywords: Aerial Systems: Mechanics and Control, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that gives direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E showcases a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting end-to-end reinforcement learning's potential. The performance drop observed from simulation to reality shows potential for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.
|
|
16:30-18:00, Paper TuCT23-NT.7 | Add to My Program |
Quadrolltor: A Reconfigurable Quadrotor with Controlled Rolling and Turning |
|
Jia, Huaiyuan | City University of Hong Kong |
Ding, Runze | City University of Hongkong |
Dong, Kaixu | City University of Hong Kong |
Bai, Songnan | City University of Hong Kong |
Chirarattananon, Pakpong | City University of Hong Kong |
Keywords: Aerial Systems: Mechanics and Control, Mechanism Design, Wheeled Robots
Abstract: This letter reports Quadrolltor, an aerial robot with the ability to roll and turn. Existing bimodal quadrotors feature cylindrical rolling cages that are rotationally decoupled from the robot's main rigid body. In contrast, the proposed robot employs passively reconfigurable structures to enable the second mode of locomotion, tightly coupling the attitude of the robot to the rolling cage. The benefits are precise rolling and turning control as well as improved rolling efficiency. Experiments were conducted to comprehensively validate the hybrid locomotion. The robot leveraged its superior maneuverability in the rolling mode to take photos of the surroundings at different tilting and panning angles to construct a panoramic image. In addition, the power measurements show a significant reduction in the cost of transport brought by rolling, equating to a 15-fold extension in operational range.
|
|
16:30-18:00, Paper TuCT23-NT.8 | Add to My Program |
Robust Control for Bidirectional Thrust Quadrotors under Instantaneously Drastic Disturbances |
|
Chen, Zujian | Shenzhen University |
Shaolin, Mo | Sun Yat-Sen University |
Zhang, Botao | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Li, Jiyu | College of Engineering, South China Agricultural University |
Keywords: Aerial Systems: Mechanics and Control, Robust/Adaptive Control, Aerial Systems: Applications
Abstract: Quadrotors may crash and cause severe accidents under instantaneously drastic disturbances. To mitigate the effect of such disturbances, these critical issues should be considered: efficient disturbance observation and compensation, full attitude controllability, and instant output power generation of the quadrotor. In this paper, to keep the quadrotor stable even under suddenly drastic disturbances, a novel control framework is presented that integrates the advantages of active disturbance rejection control (ADRC) and geometric control for a quadrotor with bidirectional thrust capabilities. Moreover, to strengthen adaptability under significant disturbances, a novel switching strategy is introduced into the control framework by virtue of the quadrotor’s bidirectional thrust capabilities. The ADRC scheme is active when the disturbances are within a preset range; alternatively, if the disturbances surpass that range and the desired control is beyond the ultimate output of the quadrotor, the quadrotor compliantly responds by executing a 180° flip into reverse flight to handle such drastic disturbances. Numerical and real-world experiments demonstrate that the proposed robust control strategy has superior performance and adapts to instantaneously drastic disturbances.
|
|
TuCT24-NT Oral Session, NT-G402 |
Add to My Program |
Field Robot Systems |
|
|
Chair: Shim, David Hyunchul | KAIST |
Co-Chair: Liarokapis, Minas | The University of Auckland |
|
16:30-18:00, Paper TuCT24-NT.1 | Add to My Program |
Topological Exploration Using Segmented Map with Keyframe Contribution in Subterranean Environments |
|
Kim, Boseong | Korea Advanced Institute of Science and Technology (KAIST) |
Seong, Hyunki | KAIST |
Shim, David Hyunchul | KAIST |
Keywords: Field Robots, Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: Existing exploration algorithms mainly generate frontiers using random sampling or motion primitive methods within a specific sensor range or search space. However, frontiers generated within constrained spaces lead to back-and-forth maneuvers in large-scale environments, thereby diminishing exploration efficiency. To address this issue, we propose a method that utilizes a 3D dense map to generate Segmented Exploration Regions (SERs) and generates frontiers from a global-scale perspective. In particular, this paper presents a novel topological map generation approach that fully utilizes the Line-of-Sight (LOS) features of LiDAR sensor points to enhance exploration efficiency inside large-scale subterranean environments. Our topological map contains the contributions of the keyframes that generate each SER, enabling rapid exploration by switching between local path planning and global path planning to each frontier. The proposed method generated a larger explored volume than the state-of-the-art algorithm in a large-scale simulation environment, demonstrating a 62% improvement in explored volume increment performance. For validation, we conducted field tests using UAVs in real subterranean environments, demonstrating the efficiency and speed of our method.
|
|
16:30-18:00, Paper TuCT24-NT.2 | Add to My Program |
A Powerline Inspection UAV Equipped with Dexterous, Lockable Gripping Mechanisms for Autonomous Perching and Contact Rolling |
|
Lynch, Angus | University of Auckland |
Duguid, Corey | University of Auckland |
Buzzatto, Joao | The University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Field Robots, Aerial Systems: Mechanics and Control
Abstract: Inspection of powerlines is a hard problem that requires humans to operate in remote locations and dangerous conditions. This paper proposes a quadcopter unmanned aerial vehicle (UAV) equipped with rolling-capable perching mechanisms and a depth-vision system for the purpose of autonomous power line inspection. The perching mechanism grips onto the power line, allowing the UAV to withstand external forces such as wind disturbances. Once engaged and applying the desired gripping force, the perching mechanism requires no power through the use of a ratcheting serial elastic transmission, allowing the UAV to perch indefinitely. The depth-vision system automates the perching and unperching procedures by estimating the position and pose of the UAV relative to the powerline. These measurements are sent to a local position controller that guides the UAV to and from the power line. Once perched, rollers in the fingers of the perching mechanism drive the UAV along the powerline, providing a close-up platform for inspection equipment. The proposed system was tested in an outdoor testing environment and shown to autonomously perch and unperch from a steel cable. The gripper's force application was analysed, and the UAV's robust powerless perching was demonstrated by a total disconnect of power while perched. These results suggest that such a system could be a valuable tool for the upkeep of electricity networks.
|
|
16:30-18:00, Paper TuCT24-NT.3 | Add to My Program |
GIRA: Gaussian Mixture Models for Inference and Robot Autonomy |
|
Goel, Kshitij | Carnegie Mellon University |
Tabib, Wennie | Carnegie Mellon University |
Keywords: Field Robots, Aerial Systems: Perception and Autonomy, Multi-Robot SLAM
Abstract: This paper introduces the open-source framework, GIRA, which implements fundamental robotics algorithms for reconstruction, pose estimation, and occupancy modeling using compact generative models. Compactness enables perception in the large by ensuring that the perceptual models can be communicated through low-bandwidth channels during large-scale mobile robot deployments. The generative property enables perception in the small by providing high-resolution reconstruction capability. These properties address perception needs for diverse robotic applications, including multi-robot exploration and dexterous manipulation. State-of-the-art perception systems construct perceptual models via multiple disparate pipelines that reuse the same underlying sensor data, which leads to increased computation, redundancy, and complexity. GIRA bridges this gap by providing a unified perceptual modeling framework using Gaussian mixture models (GMMs) as well as a novel systems contribution, which consists of GPU-accelerated functions to learn GMMs 10-100x faster compared to existing CPU implementations. Because few GMM-based frameworks are open-sourced, this work seeks to accelerate innovation and broaden adoption of these techniques.
|
|
16:30-18:00, Paper TuCT24-NT.4 | Add to My Program |
Hybrid Trajectory Optimization for Autonomous Terrain Traversal of Articulated Tracked Robots |
|
Xu, Zhengzhe | Harbin Institute of Technology, Shenzhen |
Chen, Yanbo | Tsinghua University |
Jian, Zhuozhu | Tsinghua University |
Tan, Junbo | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Field Robots, Autonomous Vehicle Navigation, Optimization and Optimal Control
Abstract: Autonomous terrain traversal of articulated tracked robots can reduce operator cognitive load to enhance task efficiency and facilitate extensive deployment. We present a novel hybrid trajectory optimization method aimed at generating efficient, stable, and smooth traversal motions. To achieve this, we develop a planar robot-terrain contact model and divide the robot’s motion into hybrid modes of driving and traversing. By using a generalized coordinate description, the configuration space dimension is reduced, which facilitates real-time planning. The hybrid trajectory optimization is transcribed into a nonlinear programming problem and divided into subproblems to be solved in a receding-horizon planning fashion. Mode switching is facilitated by associating optimized motion durations with a predefined traversal sequence. A multi-objective cost function is formulated to further improve the traversal performance. Additionally, map sampling, terrain simplification, and tracking controller modules are integrated into the autonomous terrain traversal system. Our approach is validated in simulation and real-world scenarios with the Searcher robotic platform. Comparative experiments with expert operator control and state-of-the-art methods show advantages in terms of time and energy efficiency, stability, and smoothness of motion.
|
|
16:30-18:00, Paper TuCT24-NT.5 | Add to My Program |
X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments |
|
Tuna, Turcan | ETH Zurich, Robotic Systems Lab |
Nubert, Julian | ETH Zürich |
Nava Chocrón, Yoshua Alfredo | ANYbotics AG |
Khattak, Shehryar | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Field Robots, Localization, Mapping, Point Cloud Registration
Abstract: LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate registration performance and push optimization toward divergence along weakly constrained directions. To overcome this issue, this work proposes i) a robust multi-category (non-)localizability detection module, and ii) a localizability-aware constrained ICP optimization module, and couples both in a unified manner. The proposed localizability detection is achieved by utilizing the correspondences between the scan and the map to analyze the alignment strength against the principal directions of the optimization as part of its multi-category LiDAR localizability analysis. In the second part, this localizability analysis is then integrated into the scan-to-map point cloud registration to generate drift-free pose updates by enforcing controlled updates or leaving the degenerate directions of the optimization unchanged. The proposed method is thoroughly evaluated and compared to state-of-the-art methods in simulation and during real-world experiments, underlining the gain in performance and reliability.
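The localizability analysis above amounts to checking how strongly the registration constraints cover each principal direction of the optimization. A minimal sketch of that idea for point-to-plane ICP, where each surface normal constrains translation along itself; this is an illustrative reconstruction, not the paper's implementation, and the relative eigenvalue threshold is an assumed value:

```python
import numpy as np

def degenerate_directions(normals, rel_thresh=0.1):
    """Find weakly constrained translation directions of point-to-plane ICP.

    The eigen-spectrum of the normal covariance reveals directions with
    little alignment strength; `rel_thresh` is an illustrative choice.
    """
    H = normals.T @ normals / len(normals)   # 3x3 constraint covariance
    w, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
    return V[:, w < rel_thresh * w.max()]    # columns = degenerate directions
```

In a long featureless corridor most normals point at the walls, so the along-corridor direction comes out degenerate; a constrained solver can then damp or freeze updates along those columns, as X-ICP does for its detected localizability categories.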
|
|
16:30-18:00, Paper TuCT24-NT.6 | Add to My Program |
Seabed Intervention with an Underwater Legged Robot |
|
Picardi, Giacomo | Instituto De Ciencias Del Mar (ICM)—Consejo Superior De Investig |
Astolfi, Anna | Scuola Superiore Sant'Anna |
Calisti, Marcello | The University of Lincoln |
Keywords: Field Robots, Marine Robotics, Legged Robots
Abstract: Efficiently performing intervention tasks underwater is crucial in various commercial and scientific sectors; however, propeller-driven vehicles face limitations due to their floating nature. In Remotely Operated Vehicle (ROV) operations, this can be compensated for by the skill of the operator, but such operations come with high costs. Autonomous Underwater Vehicles (AUVs) have shown promise instead, but demonstrated intervention tasks are limited to controlled environments or docked operation. To address these limitations, we focused on the use of Underwater Legged Robots (ULRs), which offer greater stability and agile seabed mobility thanks to their legged propulsion system. This paper presents the field demonstration of teleoperated pick-and-place tasks using the ULR SILVER2, for which a novel stance control, Graphical User Interface (GUI), and tendon-driven gripper have been developed based on the lessons learned through several hours of field use. The methodology is validated through four field trials, including missions in both shallow water and open sea environments. The trials involve picking and placing various objects, such as plastic bottles, bags, and cans. The results demonstrate successful teleoperated object grasping and manipulation in real-world conditions, with collection times ranging from a few minutes to around ten minutes. Overall, this research contributes to advancing the capabilities of ULRs and lays the foundation for future underwater intervention missions in various scientific and industrial applications, aligning with the goals of the Decade of Ocean Science for Sustainable Development.
|
|
16:30-18:00, Paper TuCT24-NT.7 | Add to My Program |
Predicting against the Flow: Boosting Source Localization by Means of Field Belief Modeling Using Upstream Source Proximity |
|
Busch, Finn Lukas | Hamburg University of Technology |
Bauschmann, Nathalie | Hamburg University of Technology |
Haddadin, Sami | Technical University of Munich |
Seifried, Robert | Hamburg University of Technology |
Duecker, Daniel Andre | Technical University of Munich (TUM) |
Keywords: Field Robots, Marine Robotics, Robotics in Hazardous Fields
Abstract: Time-effective and accurate source localization with mobile robots is crucial in safety-critical scenarios, e.g. leakage detection. This becomes particularly challenging in realistic cluttered scenarios, i.e. in the presence of complex current flows or wind. Traditional methods often fall short due to simplifications or limited onboard resources. We propose to combine source localization with a Gaussian Markov Random Field (GMRF). This allows us to improve source localization hypotheses by building on the GMRF's concentration and flow field beliefs, which are continuously updated by gathered measurements. We introduce the upstream source proximity (USP) as a natural metric that exploits the joint knowledge represented in the field belief's concentration and flow field, i.e. predicting sources upstream. As a result, our method yields a computationally efficient source localization and field belief module providing substantially more stable gradients than conventional concentration gradient-based methods. We demonstrate the suitability of our approach in a series of numerical experiments covering complex source location scenarios. With regard to computational requirements, the method achieves update rates of 10 Hz on a Raspberry Pi 4B.
|
|
16:30-18:00, Paper TuCT24-NT.8 | Add to My Program |
A Turning Radius Prediction Scheme for Sailing Robots under Complex Marine Environment |
|
Qi, Weimin | The Chinese University of Hong Kong, Shenzhen |
Sun, Qinbo | The Chinese University of Hong Kong, Shenzhen |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Field Robots, Marine Robotics, Underactuated Robots
Abstract: This paper presents a strategy for predicting the turning radius of a sailing robot with consideration of aerodynamic and hydrodynamic interferences from the marine environment. The turning radius is initially obtained based on three consecutive designated points during the turning process, which is regarded as the baseline method. Subsequently, on the basis of our constructed turning datasets, a model is trained using Gaussian process regression (GPR) to achieve radius prediction. The feasibility and effectiveness of the proposed scheme have been validated in both simulation and experiments (conducted with OceanVoy as shown in Fig. 1). Under experimental circumstances, the Mean Absolute Error (MAE) of the turning radius produced by the trained prediction model is 0.58 m. Furthermore, it has been observed that during long-term sailing covering a distance of 1200 km, apart from wind speed and robot velocity, the tidal range also has a significant impact on the navigation of sailing robots.
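The regression step can be illustrated with a minimal Gaussian process predictor written directly in NumPy. The kernel, hyperparameters, and the two input features (wind speed and robot velocity) are illustrative assumptions standing in for the paper's trained model and dataset:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_predict(X, y, Xq, noise=1e-2, ls=1.0):
    # Standard GP regression: predictive mean and standard deviation.
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ls)
    mean = Ks @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq, ls) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Hypothetical samples: (wind speed [m/s], robot velocity [m/s]) -> radius [m].
X = np.array([[3.0, 1.2], [5.0, 1.0], [7.0, 0.8], [4.0, 1.5]])
y = np.array([4.1, 5.6, 7.3, 4.8])
mean, std = gp_predict(X, y, np.array([[4.5, 1.1]]))
```

Beyond the point prediction, the returned standard deviation quantifies how far the queried sailing conditions lie from the training data, which is what makes GPR attractive for this task.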
|
|
16:30-18:00, Paper TuCT24-NT.9 | Add to My Program |
A Vision-Based Autonomous UAV Inspection Framework for Unknown Tunnel Construction Sites with Dynamic Obstacles |
|
Xu, Zhefan | Carnegie Mellon University |
Chen, Baihan | Carnegie Mellon University |
Zhan, Xiaoyang | Carnegie Mellon University |
Xiu, Yumeng | Carnegie Mellon University |
Suzuki, Christopher | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Field Robots, Motion and Path Planning, Aerial Systems: Perception and Autonomy
Abstract: Tunnel construction using the drill-and-blast method requires the 3D measurement of the excavation front to evaluate underbreak locations. Considering the inspection and measurement task's safety, cost, and efficiency, deploying lightweight autonomous robots, such as unmanned aerial vehicles (UAVs), becomes more necessary and popular. Most previous works use a prior map for inspection viewpoint determination and do not consider dynamic obstacles. To maximally increase the level of autonomy, this paper proposes a vision-based UAV inspection framework for dynamic tunnel environments without using a prior map. Our approach utilizes a hierarchical planning scheme, decomposing the inspection problem into different levels. The high-level decision maker first determines the task for the robot and generates the target point. Then, the mid-level path planner finds the waypoint path and optimizes the collision-free static trajectory. Finally, the static trajectory is fed into the low-level local planner to avoid dynamic obstacles and navigate to the target point. Additionally, our framework contains a novel dynamic map module that can simultaneously track dynamic obstacles and represent static obstacles based on an RGB-D camera. After inspection, the Structure-from-Motion (SfM) pipeline is applied to generate the 3D shape of the target. To the best of our knowledge, this is the first time autonomous inspection has been realized in unknown and dynamic tunnel environments.
|
|
TuCT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization III |
|
|
Chair: Tardos, Juan D. | Universidad De Zaragoza |
Co-Chair: Montiel, J.M.M | I3A. Universidad De Zaragoza |
|
16:30-18:00, Paper TuCT25-NT.1 | Add to My Program |
WayIL: Image-Based Indoor Localization with Wayfinding Maps |
|
Kwon, Obin | Seoul National University |
Jung, Dongki | NAVER LABS |
Kim, Youngji | NAVER Labs |
Ryu, Soohyun | NAVER LABS |
Yeon, Suyong | NAVER LABS |
Oh, Songhwai | Seoul National University |
Lee, Donghwan | NAVER LABS |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: This paper tackles a localization problem in large-scale indoor environments with wayfinding maps. A wayfinding map abstractly portrays the environment, and humans can localize themselves based on the map. However, when it comes to using it for robot localization, large geometrical discrepancies between the wayfinding map and the real world make it hard to use conventional localization methods. Our objective is to estimate a robot pose within a wayfinding map, utilizing RGB images from perspective cameras. We introduce two different imagination modules which are inspired by how humans can comprehend and interpret their surroundings for localization purposes. These modules jointly learn how to effectively observe the first-person-view (FPV) world to interpret bird's-eye-view (BEV) maps. Providing explicit guidance to the two imagination modules significantly improves the precision of the localization system. We demonstrate the effectiveness of the proposed approach using real-world datasets, which are collected from various large-scale crowded indoor environments. The experimental results show that, in 85% of scenarios, the proposed localization system can estimate its pose within 3 m in large indoor spaces. Project Site: https://rllab-snu.github.io/projects/WayIL/
|
|
16:30-18:00, Paper TuCT25-NT.2 | Add to My Program |
TransAPR: Absolute Camera Pose Regression with Spatial and Temporal Attention |
|
Qiao, Chengyu | Zhejiang University |
Xiang, Zhiyu | Zhejiang University |
Fan, Yuangang | Zhejiang University |
Bai, Tingming | Zhejiang University |
Zhao, Xijun | China North Vehicle Research Institute, China North Artificial I |
Fu, Jingyun | Zhejiang University |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: Visual relocalization aims to estimate the absolute camera pose from an image or sequential images. Recent works tackle this problem by exploiting deep neural networks to regress camera poses. However, spatial and temporal clues from sequential images still remain underexplored, resulting in inaccurate poses and large outliers. In this work, we introduce a novel vision Transformer based absolute pose regression model, TransAPR, to tackle this problem. Upon the traditional CNN backbone, we design Transformer based spatial and temporal fusion modules respectively to realize sufficient feature interaction among the neighboring images in the sequence. A hierarchical feature aggregation (HFA) module is further designed to aggregate multi-scale and multi-level features in the pose regressor. Benefiting from these delicate designs, our model is able to generate reliable image representations for absolute pose regression, resulting in more robust localization under challenging environments. We conduct extensive experiments on various indoor and outdoor datasets and show that our method achieves state-of-the-art performance.
|
|
16:30-18:00, Paper TuCT25-NT.3 | Add to My Program |
Globalizing Local Features: Image Retrieval Using Shared Local Features with Pose Estimation for Faster Visual Localization |
|
Wenzheng, Song | Tohoku University |
Yan, Ran | Megvii |
Lei, Boshu | University of Pennsylvania |
Okatani, Takayuki | Tohoku University |
Keywords: Localization, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Visual localization is an important sub-task in SfM and visual SLAM that involves estimating a 6-DoF camera pose for an input query image relative to a given 3D model of the environment. The most accurate approach is a hierarchical one that splits the task into two stages: image retrieval and camera pose estimation. Each stage requires different image features, with global features compactly encoding holistic image information for the first stage and local features encoding the appearance around salient image points for the second stage. While existing methods use independent networks to extract these features, one for global and one for local, this strategy is suboptimal in terms of computational efficiency. In this paper, we propose a novel approach that achieves state-of-the-art inference accuracy with significantly improved efficiency. Our approach’s core component is SuperGF, a network that aggregates local features optimized for camera pose estimation to create a global feature that enables precise image retrieval. Through extensive experiments on the standard benchmark tests, we demonstrate that the method offers a better trade-off between accuracy and computational cost.
|
|
16:30-18:00, Paper TuCT25-NT.4 | Add to My Program |
Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization |
|
Chen, Le | Max Planck Institute for Intelligent Systems |
Chen, Weirong | ETH Zurich |
Wang, Rui | Technical University of Munich |
Pollefeys, Marc | ETH Zurich |
Keywords: Localization, Deep Learning for Visual Perception, Visual Learning
Abstract: As a promising approach to visual localization, scene coordinate regression (SCR) has seen tremendous progress in the past decade. Most recent methods usually adopt neural networks to learn the mapping from image pixels to 3D scene coordinates, which requires a vast amount of annotated training data. We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for SCR. Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain, which can hinder the regression accuracy or bring unnecessary computational costs with redundant data. This paper addresses these challenges in three ways: (1) A NeRF is designed to separately predict uncertainties for the rendered color and depth images, which reveal data reliability at the pixel level. (2) SCR is formulated as deep evidential learning with epistemic uncertainty, which is used to evaluate information gain and scene coordinate quality. (3) Based on these three types of uncertainty, a novel view selection policy is formed that significantly improves data efficiency. Experiments on public datasets demonstrate that our method could select the samples that bring the most information gain and promote the performance with the highest efficiency.
|
|
16:30-18:00, Paper TuCT25-NT.5 | Add to My Program |
JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition |
|
Berton, Gabriele | Politecnico Di Torino |
Trivigno, Gabriele | Polytechnic of Turin |
Caputo, Barbara | Sapienza University |
Masone, Carlo | Politecnico Di Torino |
Keywords: Localization, Deep Learning for Visual Perception, Visual Learning
Abstract: Visual Place Recognition aims at recognizing previously visited places by relying on visual clues, and it is used in robotics applications for SLAM and localization. Since typically a mobile robot has access to a continuous stream of frames, this task is naturally cast as a sequence-to-sequence localization problem. Nevertheless, obtaining sequences of labelled data is much more expensive than collecting isolated images, which can be done in an automated way with little supervision. As a mitigation to this problem, we propose a novel Joint Image and Sequence Training protocol (JIST) that leverages large uncurated sets of images through a multi-task learning framework. With JIST we also introduce SeqGeM, an aggregation layer that revisits the popular GeM pooling to produce a single robust and compact embedding from a sequence of single-frame embeddings. We show that our model is able to outperform the previous state of the art while being faster, using descriptors that are 8 times smaller, having a lighter architecture, and allowing it to process sequences of various lengths. The code is available at https://github.com/ga1i13o/JIST
|
|
16:30-18:00, Paper TuCT25-NT.6 | Add to My Program |
OptiState: State Estimation of Legged Robots Using Gated Networks with Transformer-Based Vision and Kalman Filtering |
|
Schperberg, Alexander | University of California Los Angeles |
Tanaka, Yusuke | University of California, Los Angeles |
Mowlavi, Saviz | Mitsubishi Electric Research Laboratories |
Xu, Feng | UCLA |
Balaji, Bharathan | Amazon |
Hong, Dennis | UCLA |
Keywords: Localization, Deep Learning Methods, Legged Robots
Abstract: State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid-body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can also minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState
|
|
16:30-18:00, Paper TuCT25-NT.7 | Add to My Program |
Pose-Graph Attentional Graph Neural Network for Lidar Place Recognition |
|
Ramezani, Milad | CSIRO |
Wang, Liang | University of Queensland |
Knights, Joshua Barton | Queensland University of Technology |
Li, Zhibin | CSIRO |
Pounds, Pauline | The University of Queensland |
Moghadam, Peyman | CSIRO |
Keywords: Localization, Deep Learning Methods, Recognition
Abstract: This paper proposes a pose-graph attentional graph neural network, called P-GAT, which compares (key)nodes between sequential and non-sequential sub-graphs for place recognition tasks, as opposed to the common frame-to-frame retrieval problem formulation currently implemented in state-of-the-art place recognition methods. P-GAT uses the maximum spatial and temporal information between neighbour point cloud descriptors, generated by an existing encoder, utilising the concept of pose-graph SLAM. Leveraging intra- and inter-attention and a graph neural network, P-GAT relates point clouds captured in nearby locations in Euclidean space and their embeddings in feature space. Experimental results on large-scale publicly available datasets demonstrate the effectiveness of our approach in scenes lacking distinct features and when training and testing environments have different distributions (domain adaptation). Further, an exhaustive comparison with the state of the art shows clear performance gains. Code is available at https://github.com/csiro-robotics/P-GAT.
|
|
16:30-18:00, Paper TuCT25-NT.8 | Add to My Program |
ColonMapper: Topological Mapping and Localization for Colonoscopy |
|
Morlana, Javier | Universidad De Zaragoza, CIF: ESU5018001G, C/ Pedro Cerbuna 12 |
Tardos, Juan D. | Universidad De Zaragoza |
Montiel, J.M.M | I3A. Universidad De Zaragoza |
Keywords: Localization, Mapping, Deep Learning for Visual Perception
Abstract: We propose a topological mapping and localization system able to operate on real human colonoscopies, despite significant shape and illumination changes. The map is a graph where each node codes a colon location by a set of real images, while edges represent traversability between nodes. For close-in-time images, where scene changes are minor, place recognition can be successfully managed with the recent transformer-based local feature matching algorithms. However, under long-term changes --such as different colonoscopies of the same patient-- feature-based matching fails. To address this, we train on real colonoscopies a deep global descriptor achieving high recall with significant changes in the scene. The addition of a Bayesian filter boosts the accuracy of long-term place recognition, enabling relocalization in a previously built map. Our experiments show that ColonMapper is able to autonomously build a map and localize against it in two important use cases: localization within the same colonoscopy or within different colonoscopies of the same patient. Code will be available upon acceptance.
|
|
16:30-18:00, Paper TuCT25-NT.9 | Add to My Program |
Simultaneous Localization and Actuation Using Electromagnetic Navigation Systems |
|
von Arx, Denis | ETH Zurich |
Fischer, Cedric | ETH Zurich |
Torlakcik, Harun | ETHZ |
Pané, Salvador | ETH Zurich |
Nelson, Bradley J. | ETH Zurich |
Boehler, Quentin | ETH Zurich |
Keywords: Localization, Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Electromagnetic Actuation
Abstract: Remote magnetic navigation provides a promising approach for improving the maneuverability and safety of surgical tools, such as catheters and endoscopes, in complex anatomies. The lack of existing localization systems compatible with this modality, beyond fluoroscopy and its harmful ionizing radiation, impedes its translation to clinical practice. To address this challenge, we propose a localization method that achieves full pose estimation by superimposing oscillating magnetic fields for localization onto actuation fields generated by an electromagnetic navigation system. The resulting magnetic field is measured using a three-axis magnetic field sensor embedded in the magnetic device to be localized. The method is evaluated on a three-coil system, and simultaneous actuation and localization is demonstrated with a magnetic catheter prototype with a Hall-effect sensor embedded at its tip. We demonstrate position estimation with mean accuracy and precision below 1 mm, and orientation estimation with mean errors below 2 deg at 10 Hz in a workspace of 80 x 80 x 60 mm.
|
|
TuCT26-NT Oral Session, NT-G404 |
Add to My Program |
Mapping II |
|
|
Co-Chair: Zhang, Fu | University of Hong Kong |
|
16:30-18:00, Paper TuCT26-NT.1 | Add to My Program |
Scene Action Maps: Behavioural Maps for Navigation without Metric Information |
|
Loo, Joel | National University of Singapore |
Hsu, David | National University of Singapore |
Keywords: Mapping
Abstract: Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., “follow the corridor” or “turn right”, while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information are available at: https://scene-action-maps.github.io
|
|
16:30-18:00, Paper TuCT26-NT.2 | Add to My Program |
Continuous Occupancy Mapping in Dynamic Environments Using Particles |
|
Chen, Gang | Delft University of Technology |
Dong, Wei | Shanghai Jiao Tong University |
Peng, Peng | Shanghai Jiao Tong University |
Alonso-Mora, Javier | Delft University of Technology |
Zhu, Xiangyang | Shanghai Jiao Tong University |
Keywords: Mapping, Aerial Systems: Perception and Autonomy, Collision Avoidance, Dynamic Environment
Abstract: Particle-based dynamic occupancy maps were proposed in recent years to model the obstacles in dynamic environments. Current particle-based maps describe the occupancy status in discrete grid form and suffer from the grid size problem, wherein a large grid size is unfavorable for motion planning while a small grid size lowers efficiency and causes gaps and inconsistencies. To tackle this problem, this paper generalizes the particle-based map into continuous space and builds an efficient 3D egocentric local map. A dual-structure subspace division paradigm, composed of a voxel subspace division and a novel pyramid-like subspace division, is proposed to propagate particles and update the map efficiently with the consideration of occlusions. The occupancy status of an arbitrary point in the map space can then be estimated with the particles' weights. To reduce the noise in modeling static and dynamic obstacles simultaneously, an initial velocity estimation approach and a mixture model are utilized. Compared to the grid-form particle-based map, our map enables continuous occupancy estimation and substantially improves the mapping performance at different resolutions.
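The key idea of querying occupancy at an arbitrary point from weighted particles, rather than reading a discrete grid cell, can be sketched as a kernel-weighted sum. The Gaussian kernel, bandwidth, and final clamp below are illustrative simplifications of the paper's continuous formulation:

```python
import numpy as np

def occupancy_at(query, particles, weights, bandwidth=0.3):
    """Estimate occupancy at an arbitrary 3D point from weighted particles.

    Nearby particles contribute their weight through a Gaussian kernel;
    the kernel shape and the clamp are illustrative choices.
    """
    d2 = ((particles - query) ** 2).sum(axis=1)
    density = float((weights * np.exp(-0.5 * d2 / bandwidth ** 2)).sum())
    return min(1.0, density)  # clamp to a pseudo-probability
```

Unlike a fixed-resolution grid, the same particle set can answer queries at any point and any effective resolution, which is the property the paper exploits for motion planning.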
|
|
16:30-18:00, Paper TuCT26-NT.3 | Add to My Program |
Building Volumetric Beliefs for Dynamic Environments Exploiting Map-Based Moving Object Segmentation |
|
Mersch, Benedikt | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Chen, Xieyuanli | National University of Defense Technology |
Vizzo, Ignacio | Dexory |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Mapping, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: Mobile robots that navigate in unknown environments need to be constantly aware of the dynamic objects in their surroundings for mapping, localization, and planning. It is key to reason about moving objects in the current observation and at the same time to also update the internal model of the static world to ensure safety. In this paper, we address the problem of jointly estimating moving objects in the current 3D LiDAR scan and a local map of the environment. We use sparse 4D convolutions to extract spatio-temporal features from the scan and local map and segment all 3D points into moving and non-moving ones. Additionally, we propose to fuse these predictions in a probabilistic representation of the dynamic environment using a Bayes filter. This volumetric belief models which parts of the environment can be occupied by moving objects. Our experiments show that our approach outperforms existing moving object segmentation baselines and even generalizes to different types of LiDAR sensors. We demonstrate that our volumetric belief fusion can increase the precision and recall of moving object segmentation and even retrieve previously missed moving objects in an online mapping scenario.
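The probabilistic fusion of per-scan moving/non-moving predictions can be illustrated with a standard recursive log-odds Bayes update per voxel. This is a generic sketch of such a filter, not the paper's code, and the clamping bounds are illustrative assumptions:

```python
import math

def logodds(p):
    # Convert a probability to log-odds form.
    return math.log(p / (1.0 - p))

def update_moving_belief(l_prior, p_moving, l_min=-4.0, l_max=4.0):
    # Fuse one scan's segmentation probability into the voxel's belief;
    # clamping keeps the belief responsive to later contradicting evidence.
    l = l_prior + logodds(p_moving)
    return max(l_min, min(l_max, l))

def belief_prob(l):
    # Convert log-odds back to a probability.
    return 1.0 - 1.0 / (1.0 + math.exp(l))
```

Starting from an uninformed belief (log-odds 0, i.e. probability 0.5), a few consecutive confident "moving" predictions drive the voxel's belief toward 1, while the clamp bounds how entrenched the belief can become.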
|
|
16:30-18:00, Paper TuCT26-NT.4 | Add to My Program |
Fast and Robust Normal Estimation for Sparse LiDAR Scans |
|
Bogoslavskyi, Igor | Magic Leap |
Zampogiannis, Konstantinos | Magic Leap |
Phan, Raymond | Magic Leap |
Keywords: Mapping, Localization, SLAM
Abstract: Light Detection and Ranging (LiDAR) technology has proven to be an important part of many robotics systems. Surface normals estimated from LiDAR data are commonly used for a variety of tasks in such systems. As most of today's mechanical LiDAR sensors produce sparse data, estimating normals from a single scan in a robust manner poses difficulties. In this paper, we address the problem of estimating normals for sparse LiDAR data, avoiding the typical issue of smoothing out the normals in high curvature areas. Mechanical LiDARs rotate a set of rigidly mounted lasers. One firing of such a set of lasers produces an array of points where each point's neighbor is known due to the known firing pattern of the scanner. We use this knowledge to connect these points to their neighbors and label them using the angles of the lines connecting them. When estimating normals at these points, we only consider points with the same label as neighbors. This allows us to avoid smoothing normals across high curvature boundaries. We evaluate our approach on various data, both self-recorded and publicly available, acquired using various sparse LiDAR sensors. We show that using our method for normal estimation leads to normals that are more robust in areas with high curvature, which leads to maps of higher quality. We also show that our method only incurs a linear-factor runtime overhead with respect to a lightweight baseline normal estimation procedure and is therefore suited for operation in computationally demanding environments.
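The neighbour-labeling step can be sketched in 2D: consecutive points along one firing are joined, the joining segments are classified by direction, and a new label begins wherever the direction jumps, so normals are later estimated only from same-label neighbours. This is an illustrative reconstruction of the idea, not the authors' implementation, and the angle threshold is an assumed value:

```python
import numpy as np

def segment_labels(points, angle_jump_deg=30.0):
    """Label an ordered 2D scan line so labels change at sharp corners.

    Angle wrap-around at +/-180 degrees is ignored in this sketch.
    """
    diffs = np.diff(points, axis=0)
    angles = np.degrees(np.arctan2(diffs[:, 1], diffs[:, 0]))
    seg = [0]
    for i in range(1, len(angles)):
        seg.append(seg[-1] + int(abs(angles[i] - angles[i - 1]) > angle_jump_deg))
    # Each point inherits the label of the segment arriving at it; the first
    # point shares the first segment's label.
    return np.array([seg[0]] + seg)
```

Normals at points whose neighbours carry a different label can then be skipped or estimated one-sided, which is what prevents the smoothed-corner artefact the abstract describes.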
|
|
16:30-18:00, Paper TuCT26-NT.5 | Add to My Program |
OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds |
|
Liu, Bonan | HKUST(GZ) |
Zhao, Guoyang | HKUST(GZ) |
Jiao, Jianhao | University College London |
Cai, Guang | The Hong Kong University of Science and Technology |
Li, Chengyang | The Hong Kong University of Science and Technology (Guangzhou) |
Yin, Handi | The Hong Kong University of Science and Technology (Guangzhou), |
Wang, Yuyang | Hong Kong University of Science and Technology (Guangzhou) |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Hui, Pan | Hong Kong University of Science and Technology |
Keywords: Mapping, Omnidirectional Vision, SLAM
Abstract: A colored point cloud, as a simple and efficient 3D representation, has many advantages in various fields, including robotic navigation and scene reconstruction. This representation is now commonly used in 3D reconstruction tasks relying on cameras and LiDARs. However, fusing data from these two types of sensors is handled poorly in many existing frameworks, leading to unsatisfactory mapping results, mainly due to inaccurate camera poses. This paper presents OmniColor, a novel and efficient algorithm to colorize point clouds using an independent 360-degree camera. Given a LiDAR-based point cloud and a sequence of panorama images with initial coarse camera poses, our objective is to jointly optimize the poses of all frames for mapping images onto geometric reconstructions. Our pipeline works in an off-the-shelf manner that does not require any feature extraction or matching process. Instead, we find optimal poses by directly maximizing the photometric consistency of LiDAR maps. In experiments, we show that our method can overcome the severe visual distortion of omnidirectional images and greatly benefit from the wide field of view (FOV) of 360-degree cameras to reconstruct various scenarios with accuracy and stability. The code will be released at https://github.com/liubonan123/OmniColor/.
|
|
16:30-18:00, Paper TuCT26-NT.6 | Add to My Program |
Gaussian Process Mapping of Uncertain Building Models with GMM As Prior |
|
Zou, Qianqian | Leibniz University Hannover |
Brenner, Claus | Leibniz University Hannover |
Sester, Monika | Leibniz University Hannover, Institute of Cartography and Geoinf |
Keywords: Mapping, Probability and Statistical Methods, Range Sensing
Abstract: Mapping with uncertainty representation is required in many research domains, especially for localization. Although there are many investigations regarding the uncertainty of the pose estimation of an ego-robot with map information, the quality of the reference maps is often neglected. To avoid potential problems caused by the errors of maps and a lack of uncertainty quantification, an adequate uncertainty measure for the maps is required. In this paper, uncertain building models with abstract map surfaces using Gaussian Processes (GPs) are proposed to describe the map uncertainty in a probabilistic way. To reduce the redundant computation for simple planar objects, extracted facets from a Gaussian Mixture Model (GMM) are combined with an implicit GP map, also employing local GP-block techniques. The proposed method is evaluated on LiDAR point clouds of city buildings collected by a mobile mapping system. Compared to the performance of other methods such as Octomap, Gaussian Process Occupancy Map (GPOM) and Bayesian Generalized Kernel Inference (BGKOctomap), our method achieves a higher Precision-Recall AUC for the evaluated buildings.
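The abstract's central idea, representing map uncertainty with a Gaussian Process so that the posterior variance quantifies confidence in the surface, can be illustrated with a minimal 1-D GP regression. This is a textbook-style sketch under assumed kernel and noise parameters; the paper's GMM facet extraction and local GP-block techniques are omitted.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_query, noise=0.01):
    """Standard GP regression: posterior mean and variance at query points.
    The posterior variance is the per-point map uncertainty."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_query, x_train)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# A flat wall facet observed with noise: the GP recovers the surface and
# reports low uncertainty near observations, high uncertainty far away.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 40)
y = 2.0 + 0.05 * rng.normal(size=x.shape)        # wall at height 2 m
mean, var = gp_predict(x, y, np.array([2.0, 10.0]))
```

The query at 10.0 m lies far outside the observed facet, so its variance reverts toward the prior, exactly the behavior a localization system can use to discount unreliable map regions.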
|
|
16:30-18:00, Paper TuCT26-NT.7 | Add to My Program |
Occupancy Grid Mapping without Ray-Casting for High-Resolution LiDAR Sensors |
|
Cai, Yixi | University of Hong Kong |
Kong, Fanze | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zhu, Fangcheng | The University of Hong Kong |
Lin, Jiarong | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Mapping, Range Sensing, Aerial Systems: Perception and Autonomy, LiDAR Perception
Abstract: This article presents an efficient occupancy mapping framework for high-resolution LiDAR sensors, termed D-Map. The framework introduces three main novelties to address the computational efficiency challenges of occupancy mapping. Firstly, we use a depth image to determine the occupancy state of regions instead of the traditional ray casting method. Secondly, we introduce an efficient on-tree update strategy on a tree-based map structure. Thirdly, we remove known grids from the map at each update by leveraging the low false alarm rate of LiDAR sensors. To support our design, we provide theoretical analyses of the accuracy of the depth image projection and time complexity of occupancy updates. Furthermore, we conduct extensive benchmark experiments on various LiDAR sensors in both public and private datasets. Our framework demonstrates superior efficiency in comparison with other state-of-the-art methods while maintaining comparable mapping accuracy and high memory efficiency. We demonstrate two real-world applications of D-Map for real-time occupancy mapping using a high-resolution LiDAR. In addition, we open-source the implementation of D-Map on GitHub.
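The first novelty above, replacing ray casting with a depth-image lookup, can be sketched in a 2-D toy: project each candidate cell into a one-row depth image along its bearing, then compare its range against the measured depth. This is an illustrative sketch, not the D-Map implementation; the field of view, tolerance, and labels are assumed values.

```python
import numpy as np

def classify_cells(cells, depth_image, fov=np.pi / 2, eps=0.1):
    """Classify 2-D cell centers as 'free', 'occupied', or 'unknown' by
    projecting them into a 1-row depth image instead of casting rays.
    A cell closer than the measured depth along its bearing is free."""
    n = len(depth_image)
    labels = []
    for x, y in cells:
        bearing = np.arctan2(y, x)
        if abs(bearing) > fov / 2:
            labels.append("unknown")               # outside the sensor FOV
            continue
        col = int((bearing + fov / 2) / fov * (n - 1))
        r = np.hypot(x, y)
        measured = depth_image[col]
        if r < measured - eps:
            labels.append("free")                  # in front of the surface
        elif r < measured + eps:
            labels.append("occupied")              # on the surface
        else:
            labels.append("unknown")               # occluded behind it
    return labels

# A wall 5 m ahead across the whole FOV.
depth = np.full(90, 5.0)
labels = classify_cells([(2.0, 0.0), (5.0, 0.0), (8.0, 0.0)], depth)
```

The efficiency gain is that each cell costs one projection and one comparison, independent of the range, whereas ray casting walks every voxel between sensor and hit point.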
|
|
16:30-18:00, Paper TuCT26-NT.8 | Add to My Program |
RH-Map: Online Map Construction Framework of Dynamic Object Removal Based on 3D Region-Wise Hash Map Structure |
|
Yan, Zihong | Tsinghua University |
Wu, Xiaoyi | Harbin Institute of Technology, Shenzhen |
Jian, Zhuozhu | Tsinghua University |
Lan, Bin | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Mapping, Range Sensing, Autonomous Vehicle Navigation
Abstract: Mobile robots navigating in outdoor environments frequently encounter the issue of undesired traces left by dynamic objects and manifested as obstacles on the map, impeding robots from achieving accurate localization and effective navigation. To tackle the problem, a novel map construction framework based on a 3D region-wise hash map structure (RH-Map) is proposed, consisting of front-end scan refresh and back-end removal modules, which realizes real-time map construction and online dynamic object removal (DOR). First, a two-layer 3D region-wise hash map structure for map management is employed for effective online DOR. Then, in scan refresh, region-wise ground plane estimation (R-GPE) is proposed for incrementally estimating and preserving ground information, and Scan-to-Map Removal (S2M-R) is proposed to discriminate and remove dynamic objects. Moreover, a lightweight back-end removal module maintaining keyframes is proposed for further DOR. As experimentally verified on SemanticKITTI, our proposed framework yields promising performance on online DOR of map construction compared with state-of-the-art methods. We also validate the proposed framework in real-world environments. The source code is released to the community: https://github.com/YZH-bot/RH-Map.
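The two-layer region-wise hash map that underpins the framework can be sketched minimally: points hash first into coarse regions, then into fine voxels inside each region, so removing a dynamic object's trace can drop a whole region in one dictionary operation. This sketch is an assumption-laden illustration (the class name, sizes, and hit-count payload are invented here), not the RH-Map code.

```python
from collections import defaultdict

class RegionHashMap:
    """Minimal two-layer hash map: region key -> {voxel key: hit count}.
    Dynamic-object removal can then discard a whole region in O(1)
    instead of erasing its voxels one by one."""

    def __init__(self, region_size=2.0, voxel_size=0.1):
        self.region_size = region_size
        self.voxel_size = voxel_size
        self.regions = defaultdict(dict)

    def _keys(self, p):
        region = tuple(int(c // self.region_size) for c in p)
        voxel = tuple(int(c // self.voxel_size) for c in p)
        return region, voxel

    def insert(self, p):
        region, voxel = self._keys(p)
        self.regions[region][voxel] = self.regions[region].get(voxel, 0) + 1

    def remove_region(self, p):
        """Erase every voxel in the region containing p (e.g. a region
        flagged as holding a dynamic object's trace)."""
        region, _ = self._keys(p)
        self.regions.pop(region, None)

    def occupied(self, p):
        region, voxel = self._keys(p)
        return voxel in self.regions.get(region, {})

m = RegionHashMap()
m.insert((0.5, 0.5, 0.5))
m.insert((3.5, 0.5, 0.5))
m.remove_region((0.6, 0.6, 0.6))   # same region as the first point
```

Hashing at two granularities is what makes region-level operations (ground-plane estimates, batch removal) cheap while retaining voxel-level occupancy queries.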
|
|
16:30-18:00, Paper TuCT26-NT.9 | Add to My Program |
Photometric LiDAR and RGB-D Bundle Adjustment |
|
Di Giammarino, Luca | Sapienza University of Rome |
Giacomini, Emanuele | Sapienza University of Rome |
Brizi, Leonardo | Sapienza University of Rome |
Salem, Omar Ashraf Ahmed Khairy | Sapienza University |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: Mapping, Range Sensing, SLAM
Abstract: The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Mapping (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now offer higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typical effective global refinement techniques employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel BA photometric strategy that accounts for both RGB-D and LiDAR in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par with or better than other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation.
|
|
TuCT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping III |
|
|
Chair: Kumar, Vikash | Meta AI |
Co-Chair: Ye, Qi | Zhejiang University |
|
16:30-18:00, Paper TuCT27-NT.1 | Add to My Program |
InterRep: A Visual Interaction Representation for Robotic Grasping |
|
Cui, Yu | Zhejiang University |
Ye, Qi | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Chen, Anjun | Zhejiang University |
Li, Gaofeng | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Grasping, Representation Learning, Reinforcement Learning
Abstract: Recently, pre-trained vision models have gained significant attention in motor control, showcasing impressive performance across diverse robotic learning tasks. While previous works predominantly concentrate on the significance of the pre-training phase, the equally important task of extracting more effective representations based on existing pre-trained visual models remains unexplored. To better leverage the representation capabilities of pre-trained models for robotic grasping, we propose InterRep, a novel interaction representation method that possesses not only the strengths of pre-trained models, known for their robustness in noisy environments and their proficiency in recognizing essential features, but also the capacity of capturing dynamic interaction details and local geometric features during the grasping process. Based on the novel representation, we introduce a deep reinforcement learning method to learn generalizable grasping policies. The experimental results demonstrate that our proposed representation outperforms the baselines in terms of both training speed and generalization. For the generalized grasping tasks with dexterous robotic hands, our method boasts a success rate nearly 20% higher than methods using the global features of the entire image from pre-trained models. In addition, our proposed representation method demonstrates promising performance when applied to a different robotic hand and task. It also exhibits excellent performance on real robots with a success rate of 70%.
|
|
16:30-18:00, Paper TuCT27-NT.2 | Add to My Program |
MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation |
|
Lancaster, Patrick | Meta AI |
Hansen, Nicklas | University of California San Diego |
Rajeswaran, Aravind | Meta AI |
Kumar, Vikash | Meta AI |
Keywords: Reinforcement Learning, Sensorimotor Learning, Imitation Learning
Abstract: Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit sites.google.com/view/modemv2 for videos and more details.
|
|
16:30-18:00, Paper TuCT27-NT.3 | Add to My Program |
Towards Feasible Dynamic Grasping: Leveraging Gaussian Process Distance Field, SE(3) Equivariance, and Riemannian Mixture Models |
|
Choi, Ho Jin | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Grasping, Machine Learning for Robot Control, Perception for Grasping and Manipulation
Abstract: This paper introduces a novel approach to improve robotic grasping in dynamic environments by integrating Gaussian Process Distance Fields (GPDF), SE(3) equivariant networks, and Riemannian Mixture Models. The aim is to enable robots to grasp moving objects effectively. Our approach comprises three main components: object shape reconstruction, grasp sampling, and implicit grasp pose selection. GPDF accurately models the shape of objects, which is essential for precise grasp planning. SE(3) equivariance ensures that the sampled grasp poses are equivariant to the object's pose changes, enhancing robustness in dynamic scenarios. Riemannian Gaussian Mixture Models are employed to assess reachability, providing feasible and adaptable grasping strategies. Feasible grasp poses are targeted by novel task or joint space reactive controllers formulated using Gaussian Mixture Models and Gaussian Processes. This method resolves the challenge of discrete grasp pose selection, enabling smoother grasping execution. Experimental validation confirms the effectiveness of our approach in generating feasible grasp poses and achieving successful grasps in dynamic environments. By integrating these advanced techniques, we present a promising solution for enhancing robotic grasping capabilities in real-world scenarios.
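The distance-field component can be illustrated with the standard analytic trick used by GP-based distance fields: regress a squared-exponential "occupancy" field from surface samples and invert the kernel to recover a Euclidean distance, since k(d) = exp(-d^2 / (2 l^2)) implies d = l * sqrt(-2 ln k). The sketch below is a deliberately simplified stand-in, not the paper's GPDF: taking the max kernel response reduces it to an exact nearest-neighbor distance, whereas a real GPDF regresses over all surface points jointly.

```python
import numpy as np

def gpdf(query, surface, length=0.5):
    """Distance field from surface samples via kernel inversion.
    occ is the strongest squared-exponential response among samples;
    inverting the kernel recovers the distance to the nearest sample."""
    d2 = np.sum((query[:, None, :] - surface[None, :, :]) ** 2, axis=-1)
    occ = np.exp(-0.5 * d2 / length ** 2).max(axis=1)
    occ = np.clip(occ, 1e-12, 1.0)       # guard the logarithm
    return length * np.sqrt(-2.0 * np.log(occ))

# Surface samples on a unit circle; the recovered field should be ~|r - 1|.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
queries = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
dist = gpdf(queries, circle)
```

A smooth, differentiable distance field like this is what lets grasp samplers and reactive controllers query object proximity anywhere in the workspace.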
|
|
16:30-18:00, Paper TuCT27-NT.4 | Add to My Program |
A Surprisingly Efficient Representation for Multi-Finger Grasping |
|
Yan, Hengxu | Shanghai Jiao Tong University |
Fang, Hao-Shu | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Grasping, Multifingered Hands, Grippers and Other End-Effectors
Abstract: The problem of grasping objects using a multi-finger hand has received significant attention in recent years. However, it remains challenging to handle a large number of unfamiliar objects in real and cluttered environments. In this work, we propose a representation that can be effectively mapped to the multi-finger grasp space. Based on this representation, we develop a simple decision model that generates accurate grasp quality scores for different multi-finger grasp poses using only hundreds to thousands of training samples. We demonstrate that our representation performs well on a real robot and achieves a success rate of 78.64% after training with only 500 real-world grasp attempts and 87% with 4500 grasp attempts. Additionally, we achieve a success rate of 84.51% in a dynamic human-robot handover scenario using a multi-finger hand.
|
|
16:30-18:00, Paper TuCT27-NT.5 | Add to My Program |
GrainGrasp: Dexterous Grasp Generation with Fine-Grained Contact Guidance |
|
Zhao, Fuqiang | Dalian University of Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Liu, Qian | Dalian University of Technology |
Keywords: Grasping, Multifingered Hands, Perception for Grasping and Manipulation
Abstract: One goal of dexterous robotic grasping is to allow robots to handle objects with the same level of flexibility and adaptability as humans. However, it remains a challenging task to generate an optimal grasping strategy for dexterous hands, especially when it comes to delicate manipulation and accurate adjustment of the desired grasping poses for objects of varying shapes and sizes. In this paper, we propose a novel dexterous grasp generation scheme called GrainGrasp that provides fine-grained contact guidance for each fingertip. In particular, we employ a generative model to predict separate contact maps for each fingertip on the object point cloud, effectively capturing the specifics of finger-object interactions. In addition, we develop a new dexterous grasping optimization algorithm that solely relies on the point cloud as input, eliminating the necessity for complete mesh information of the object. By leveraging the contact maps of different fingertips, the proposed optimization algorithm can generate precise and determinable strategies for human-like object grasping. Experimental results confirm the efficiency of the proposed scheme.
|
|
16:30-18:00, Paper TuCT27-NT.6 | Add to My Program |
Regrasping on Printed Circuit Boards with the Smart Suction Cup |
|
Lee, Jungpyo | University of California, Berkeley |
Sun, Zheng | The Chinese University of Hong Kong |
Dong, Zhipeng | Northeastern University |
Chen, Fei | The Chinese University of Hong Kong |
Stuart, Hannah | UC Berkeley |
Keywords: Grasping, Force and Tactile Sensing, Grippers and Other End-Effectors
Abstract: The disposal of waste electrical and electronic equipment (WEEE) presents a sustainability challenge, particularly for waste printed circuit boards (PCBs). PCBs are challenging to sort out from other waste materials in part because traditional industrial end-effectors struggle to reliably grip these irregularly shaped objects with unmodeled surface-mounted components. Vision-based separators, while effective for object categorization, face challenges with identifying precise grasp points on PCB surfaces. This paper studies regrasping control to enhance suction cup grasping performance on PCBs, addressing issues arising from uneven surfaces and intricate features that interfere with suction sealing. We categorize PCBs into two recycling levels – with large surface features intact or removed – and conduct experiments on both stationary and conveyor belt setups with realistic vision-based grasp planners. Results show that jumping regrasping improves the pick-and-place success rate. Haptically driven jumping – using the Smart Suction Cup – is especially useful for unprocessed waste PCBs with large surface mount parts. The proposed method offers a promising solution to enhance the efficiency and reliability of robotic grasping in recycling applications.
|
|
16:30-18:00, Paper TuCT27-NT.7 | Add to My Program |
Anthropomorphic Grasping with Neural Object Shape Completion |
|
Hidalgo Carvajal, Diego Xavier | Technical University of Munich |
Chen, Hanzhi | Technical University of Munich (TUM) |
Bettelani, Gemma Carolina | Technical University of Munich |
Jung, Jaesug | Technical University of Munich |
Zavaglia, Melissa | Technische Universität München |
Busse, Laura | Division of Neuroscience, Faculty of Biology, LMU Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Grasping, Multifingered Hands, Deep Learning in Grasping and Manipulation
Abstract: The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, where dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing their full geometry from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach has significantly improved the grasping success rates of baselines with only partial reconstruction by nearly 30% and achieved over 150 successful grasps with three different object categories. This demonstrates our approach’s consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.
|
|
16:30-18:00, Paper TuCT27-NT.8 | Add to My Program |
Statistical Stratification and Benchmarking of Robotic Grasping Performance |
|
Denoun, Brice | The Shadow Robot Company |
Hansard, Miles | Queen Mary University of London |
Leon, Beatriz | Shadow Robot Company |
Jamone, Lorenzo | Queen Mary University London |
Keywords: Grasping, Performance Evaluation and Benchmarking, Probability and Statistical Methods, Dexterous Manipulation
Abstract: Robotic grasping is fundamental to many real-world applications, and new approaches must be systematically evaluated. However, in most cases, the performance of a specific approach is assessed by simply counting the number of successful attempts in a given task, and this success rate is then compared to those of other solutions, without taking into account the random variability across different experiments (e.g. due to sensor noise, or variations in object placement). To address this issue, we classify the observed performance into qualitatively ordered outcomes, thereby stratifying the results. We then show how to analyse these results in a statistical framework which accounts for the variability between experiments. The advantages of our approach are demonstrated in the practical comparison of four grasp planning algorithms. In particular, we show that the proposed approach allows us to carry out several distinct evaluations from a single set of experiments, without having to repeat the data collection process. We demonstrate that differences between the algorithms, which would not be apparent from overall success rates, can be identified and evaluated.
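The idea of stratifying grasp trials into ordered outcomes and then accounting for experiment-to-experiment variability can be illustrated with a simple bootstrap. This is a generic sketch of the statistical principle, not the paper's specific framework; the outcome coding and the mean-score statistic are assumptions made for the example.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=5000, seed=0):
    """Bootstrap 95% confidence interval for the difference in mean
    ordinal outcome score between two grasp planners. Outcomes are coded
    on an ordered scale, e.g. 0 = miss, 1 = unstable grasp, 2 = stable grasp."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a, len(a)).mean()
                    - rng.choice(b, len(b)).mean())
    return np.percentile(diffs, [2.5, 97.5])

# Planner A is mostly stable; planner B often only achieves unstable grasps.
# A raw success rate (stable grasps only) would discard the middle stratum.
planner_a = np.array([2] * 70 + [1] * 20 + [0] * 10)
planner_b = np.array([2] * 40 + [1] * 40 + [0] * 20)
lo, hi = bootstrap_diff_ci(planner_a, planner_b)
```

An interval entirely above zero supports a genuine difference between the planners, rather than one attributable to random variability across trials.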
|
|
16:30-18:00, Paper TuCT27-NT.9 | Add to My Program |
The Hydra Hand: A Mode-Switching Underactuated Gripper with Precision and Power Grasping Modes |
|
Chappell, Digby | Imperial College London |
Bello, Fernando | Imperial College London |
Kormushev, Petar | Imperial College London |
Rojas, Nicolas | Imperial College London |
Keywords: Grasping, Multifingered Hands, Grippers and Other End-Effectors
Abstract: Human hands are able to grasp a wide range of object sizes, shapes, and weights, reshaping and altering their apparent stiffness to move from compliant power grasping to rigid precision grasping. Achieving similar versatility in robotic hands remains a challenge, which has often been addressed by adding extra controllable degrees of freedom, tactile sensors, or specialised extra grasping hardware, at the cost of control complexity and robustness. We introduce a novel reconfigurable four-fingered two-actuator underactuated gripper, the Hydra Hand, that switches between compliant power and rigid precision grasps using a single motor, while generating grasps via a single hydraulic actuator, exhibiting adaptive grasping between finger pairs and enabling the power grasping of two objects simultaneously. The mode switching mechanism and the hand's kinematics are presented and analysed, and performance is tested on two grasping benchmarks: one focused on rigid objects, and the other on items of clothing. The Hydra Hand is shown to excel at grasping large and irregular objects and small objects with its respective compliant power and rigid precision configurations. The hand's versatility is then showcased by executing the challenging manipulation task of safely grasping and placing a bunch of grapes, and then plucking a single grape from the bunch.
|
|
TuCT28-NT Oral Session, NT-G4 |
Add to My Program |
In-Hand Manipulation |
|
|
Chair: Tao, Lingfeng | Oklahoma State University |
Co-Chair: Ciocarlie, Matei | Columbia University |
|
16:30-18:00, Paper TuCT28-NT.1 | Add to My Program |
Quasi-Static Soft Fixture Analysis of Rigid and Deformable Objects |
|
Dong, Yifei | KTH |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Manipulation Planning, Grasping, Task and Motion Planning
Abstract: We present a sampling-based approach to reasoning about the caging-based manipulation of rigid and a simplified class of deformable 3D objects subject to energy constraints. Towards this end, we propose the notion of soft fixtures, extending earlier work on energy-bounded caging to include a broader set of energy function constraints, such as gravitational and elastic potential energy of 3D deformable objects. Previous methods focused on establishing provably correct algorithms to compute lower bounds or analytically exact estimates of escape energy for a very restricted class of known objects with low-dimensional configuration spaces, such as planar polygons. We instead propose a practical sampling-based approach that is applicable in higher-dimensional configuration spaces, but produces only a sequence of upper-bound estimates, which nonetheless appear to converge rapidly to the actual escape energy. We present 8 simulation experiments demonstrating the applicability of our approach to various complex quasi-static manipulation scenarios. Quantitative results indicate the effectiveness of our approach in providing upper-bound estimates for escape energy in quasi-static manipulation scenarios. Two real-world experiments also show that the computed normalized escape energy estimates appear to correlate strongly with the probability of escape of an object under randomized pose perturbation.
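Why sampled paths yield upper bounds on escape energy can be shown in a 1-D toy: any path from the caged configuration to free space certifies that escape costs at most the highest potential encountered along it, so the minimum over sampled paths can only shrink toward the true barrier height. This is an illustrative sketch under an assumed potential landscape, not the paper's sampling planner.

```python
import numpy as np

def escape_energy_upper_bound(potential, start, goal_region, n_paths=100, seed=0):
    """Sampling-based upper bound on escape energy: each sampled path from
    the start to the free region certifies that escaping costs at most
    (max potential along the path) - (potential at the start). The minimum
    over sampled paths is a monotonically shrinking upper bound."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_paths):
        goal = rng.uniform(*goal_region)
        waypoints = np.linspace(start, goal, 200) + rng.normal(0.0, 0.05, 200)
        barrier = potential(waypoints).max() - potential(np.array([start]))[0]
        best = min(best, barrier)
    return best

# 1-D toy landscape: a well at x = -1, a barrier of height 1 at x = 0,
# and free space for |x| >= 1. The true escape energy is 1.
U = lambda x: np.where(np.abs(x) < 1.0, np.cos(np.pi * x / 2.0) ** 2, 0.0)
bound = escape_energy_upper_bound(U, start=-1.0, goal_region=(2.0, 3.0))
```

Each path can only over- or exactly estimate the barrier, never undercut it (up to path discretization), which is the sense in which the sequence of estimates converges from above.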
|
|
16:30-18:00, Paper TuCT28-NT.2 | Add to My Program |
In-Hand Rolling Manipulation Based on Ball-On-Cloth System |
|
Ichikura, Hinano | Osaka University |
Higashimori, Mitsuru | Osaka University |
Keywords: In-Hand Manipulation, Flexible Robotics, Grippers and Other End-Effectors
Abstract: This paper presents a novel in-hand rolling manipulation method in which a ball on a cloth attached to fingertips is controlled using flexible and adaptive deformation of the cloth. First, an analytical model of the ball-on-cloth system is introduced. The shape of the cloth is simplified, and the rolling constraint of the ball on the cloth is defined focusing on the lowest point of the ball. Next, the relationship between the input to the cloth anchor point and the position of the lowest point of the ball is expressed by a linear approximation. Then, the input to generate the desired rolling orbit is designed. Next, as an example of utilizing the rolling orbits, a manipulation method to rotate the ball around a vertical axis is developed. Finally, a multi-fingered hand with a piece of cloth attached to the fingertips is developed, and the effectiveness of the proposed system is experimentally verified.
|
|
16:30-18:00, Paper TuCT28-NT.3 | Add to My Program |
A Linkage-Driven Underactuated Robotic Hand for Adaptive Grasping and In-Hand Manipulation (I) |
|
Li, Guotao | Institute of Automation Chinese Academy of Sciences |
Liang, Xu | Institute of Automation, Chinese Academy of Sciences |
Gao, Yifan | North China University of Technology |
Su, Tingting | North China University of Technology |
Liu, Zhijie | Beihang University |
Hou, Zeng-Guang | Chinese Academy of Science |
Keywords: Multifingered Hands, In-Hand Manipulation, Grasping
Abstract: The development of robotic hands that can imitate human movements has always been an important research topic. In this paper, a linkage-driven underactuated three-finger hand is proposed to imitate the flexion/extension (f/e) and abduction/adduction (a/a) motions of the human hand. The robotic hand has three identical underactuated fingers, each of which contains an underactuated planar linkage, a spherical four-bar mechanism, and a set of bevel gears. The spherical four-bar mechanism is designed to provide 2-degree-of-freedom actuation, driving the f/e and a/a motions of the proximal joint simultaneously. Based on screw theory, the kinematic model of the spherical mechanism is established, and the maximum available workspace index (MAW) of the spherical mechanism is proposed to evaluate the workspace with the same adduction and abduction angle ranges. The effects of the parameters of the spherical mechanism on the MAW and the transmission efficiency are obtained, and the parameters of the spherical mechanism are optimized. The optimization results show that the MAW of the spherical mechanism can be increased by up to 3.5 times. Finally, experiments are carried out to show that the proposed robotic hand can perform simultaneous adaptive grasping and in-hand manipulation.
|
|
16:30-18:00, Paper TuCT28-NT.4 | Add to My Program |
Curriculum-Based Sensing Reduction in Simulation to Real-World Transfer for In-Hand Manipulation |
|
Tao, Lingfeng | Oklahoma State University |
Zhang, Jiucai | Guangzhou Automotive Group R&D Center, Silicon Valley |
Zheng, Qiaojie | Colorado School of Mines |
Zhang, Xiaoli | Colorado School of Mines |
Keywords: In-Hand Manipulation, Multifingered Hands, Reinforcement Learning
Abstract: Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Asymmetric Actor-Critic approaches are used for Sim2Real to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted through an empirically defined one-step curtail. Small feature reduction does not sufficiently remove the actor’s features, which may still cause difficulty setting up the physical system, while large feature reduction may cause difficulty and inefficiency in policy training. To address this issue, we propose Curriculum-based Sensing Reduction to enable the actor to start with the same rich feature space as the critic and then get rid of the hard-to-extract features step-by-step for higher training performance and better adaptation to the real-world feature space. The reduced features are replaced with random signals from a Deep Random Generator to remove the dependency between the output and the removed features and avoid creating new dependencies. The methods are evaluated on the Allegro robot hand in a real-world in-hand manipulation experiment. The results show that our methods have faster training and higher task performance than baselines and can solve real-world tasks when selected tactile features are reduced.
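The curriculum mechanic described above, replacing hard-to-extract features with random signals one stage at a time while keeping the input dimensionality fixed, can be sketched minimally. The class name, feature names, and plain Gaussian noise (in place of the paper's Deep Random Generator) are assumptions made for this illustration.

```python
import numpy as np

class CurriculumSensingReducer:
    """Step-by-step feature reduction for Sim2Real: the policy starts with
    the full (simulation-only) observation and, stage by stage, hard-to-get
    features are replaced with random signals so the policy learns not to
    depend on them, while the input dimensionality stays fixed."""

    def __init__(self, feature_names, removal_order, seed=0):
        self.names = list(feature_names)
        self.removal_order = list(removal_order)   # e.g. tactile removed first
        self.stage = 0
        self.rng = np.random.default_rng(seed)

    def advance(self):
        """Move to the next curriculum stage (remove one more feature)."""
        self.stage = min(self.stage + 1, len(self.removal_order))

    def observe(self, obs):
        """Replace removed features with noise drawn independently of obs,
        so no new input-output dependency can form."""
        obs = dict(obs)
        for name in self.removal_order[: self.stage]:
            obs[name] = self.rng.normal(size=np.shape(obs[name]))
        return obs

reducer = CurriculumSensingReducer(
    feature_names=["joint_pos", "object_pose", "tactile"],
    removal_order=["tactile", "object_pose"],
)
full = {"joint_pos": np.zeros(16), "object_pose": np.zeros(7), "tactile": np.zeros(12)}
stage0 = reducer.observe(full)          # everything still real
reducer.advance()
stage1 = reducer.observe(full)          # tactile replaced with noise
```

Because the replacement signal is independent of the state, gradients through the removed channels carry no information, which is what gradually weans the actor off them.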
|
|
16:30-18:00, Paper TuCT28-NT.5 | Add to My Program |
Geometric Fabrics: A Safe Guiding Medium for Policy Learning |
|
Van Wyk, Karl | NVIDIA |
Handa, Ankur | NVidia |
Makoviichuk, Viktor | NVIDIA |
Guo, Yijie | University of Michigan, Ann Arbor |
Allshire, Arthur | University of Toronto |
Ratliff, Nathan | NVIDIA |
Keywords: In-Hand Manipulation, Reinforcement Learning, Dynamics
Abstract: Robot policies are always subject to complex, second-order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
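The notion of letting policy actions steer artificial second-order dynamics, rather than command targets directly, can be sketched with a toy system: an attractor toward the policy's target, a repulsive barrier term, and damping. This is a generic illustration of behavioral dynamics, not a geometric fabric in the paper's formal sense (which requires specific geometric structure); all gains and the obstacle are assumed values.

```python
import numpy as np

def behavioral_step(x, v, target, obstacle, dt=0.01):
    """One Euler step of simple artificial second-order dynamics: an
    attractor toward the policy's action target, a repulsive barrier term
    that fades with distance from the obstacle, and damping. Abrupt
    (bang-bang) changes of `target` still produce smooth, bounded motion
    because actions only steer the attractor, not the state directly."""
    attract = 4.0 * (target - x)
    d = x - obstacle
    repel = 0.05 * d / (np.linalg.norm(d) ** 3 + 1e-9)
    damping = -3.0 * v
    a = attract + repel + damping
    return x + dt * v, v + dt * a

x, v = np.zeros(2), np.zeros(2)
target = np.array([1.0, 0.0])
obstacle = np.array([0.5, 0.3])
for _ in range(2000):
    x, v = behavioral_step(x, v, target, obstacle)
```

However roughly the RL policy switches `target`, the state only ever changes through the integrated dynamics, which is what makes the induced action space safe for a real robot.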
|
|
16:30-18:00, Paper TuCT28-NT.6 | Add to My Program |
Robust In-Hand Manipulation with Extrinsic Contacts |
|
Liang, Boyuan | University of California, Berkeley |
Ota, Kei | Tokyo Institute of Technology |
Tomizuka, Masayoshi | University of California |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Keywords: In-Hand Manipulation, Dexterous Manipulation, Manipulation Planning
Abstract: We present in-hand manipulation tasks where a robot moves an object in grasp, maintains its external contact mode with the environment, and adjusts its in-hand pose simultaneously. The proposed manipulation task leads to complex contact interactions which can be very susceptible to uncertainties in kinematic and physical parameters. Therefore, we propose a robust in-hand manipulation method, which consists of two parts. First, an in-gripper mechanics model computes a naive motion cone assuming all parameters are precise. Then, a robust planning method refines the motion cone to maintain the desired contact mode regardless of parametric errors. Real-world experiments were conducted to illustrate the accuracy of the mechanics model and the effectiveness of the robust planning framework in the presence of kinematics parameter errors.
|
|
16:30-18:00, Paper TuCT28-NT.7 | Add to My Program |
Dexterous In-Hand Manipulation by Guiding Exploration with Simple Sub-Skill Controllers |
|
Khandate, Gagan | Columbia University |
Mehlman, Cameron | Columbia University |
Wei, Xingsheng | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Sensorimotor Learning
Abstract: Recently, reinforcement learning has led to dexterous manipulation skills of increasing complexity. Nonetheless, learning these skills in simulation still exhibits poor sample efficiency, which stems from the fact that these skills are learned from scratch without the benefit of any domain expertise. In this work, we aim to improve the sample efficiency of learning dexterous in-hand manipulation skills using controllers available via domain knowledge. To this end, we design simple sub-skill controllers and demonstrate improved sample efficiency using a framework that guides exploration toward relevant state space by following actions from these controllers. We are the first to demonstrate learning hard-to-explore finger-gaiting in-hand manipulation skills without the use of an exploratory reset distribution.
|
|
16:30-18:00, Paper TuCT28-NT.8 | Add to My Program |
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing |
|
Yuan, Ying | Tsinghua University |
Che, Haichuan | University of California San Diego |
Qin, Yuzhe | UC San Diego |
Huang, Binghao | University of California, San Diego |
Yin, Zhao-Heng | University of California, Berkeley |
Lee, Kang-Won | Dongguk University |
Wu, Yi | Tsinghua University |
Lim, Soo-Chul | Dongguk University |
Wang, Xiaolong | UC San Diego |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Sensor Fusion
Abstract: Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance.
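One way to read the point cloud-based tactile representation described above is that tactile readings are lifted into the same spatial format as visual points, so a single point-cloud encoder can consume both. The sketch below is a hypothetical minimal version of that idea — the `(x, y, z, is_tactile)` encoding is invented here and is not the paper's exact representation.

```python
def fuse_visuotactile(visual_pts, tactile_pts):
    """Merge visual and tactile readings into one labeled point cloud.

    Each point becomes (x, y, z, is_tactile) so both modalities share one
    spatial representation. Illustrative sketch only.
    """
    cloud = [(x, y, z, 0.0) for (x, y, z) in visual_pts]
    cloud += [(x, y, z, 1.0) for (x, y, z) in tactile_pts]
    return cloud

# A visual point on the object and a nearby tactile contact point.
merged = fuse_visuotactile([(0.10, 0.00, 0.20)], [(0.09, 0.01, 0.19)])
```

Keeping both modalities in one coordinate frame is what lets the downstream policy reason jointly about what it sees and what it touches.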
|
|
16:30-18:00, Paper TuCT28-NT.9 | Add to My Program |
Adaptive Fingers Coordination for Robust Grasp and In-Hand Manipulation under Disturbances and Unknown Dynamics |
|
Khadivar, Farshad | EPFL |
Billard, Aude | EPFL |
Keywords: Dexterous Manipulation, Grasping, Robust/Adaptive Control of Robotic Systems, Coupled Dynamical Systems
Abstract: We present a control framework for achieving a robust object grasp and manipulation in hand. In-hand manipulation remains a demanding task as the object is never stable and task success relies on carefully synchronizing the fingers' dynamics. Indeed, fingers must simultaneously generate motion while maintaining contact with the object and, by staying within the hand's frame, ensuring that the object remains manipulable. These challenges are exacerbated once the hand gets disturbed or when the internal dynamics of the manipulated object are unknown, such as when it is filled with liquid moving during manipulation. We present a control strategy based on coupled dynamical systems, whereby the fingers move in synchronization using an intermediate dynamic responsible for coordinating fingers. To adapt to changes in forces due to model uncertainties and unexpected disturbances, we employ an adaptive torque-controller combined with a joint impedance regulator that guarantees high tracking accuracy while adapting to dynamic changes. We validate the approach in multiple experiments on a 16 degrees-of-freedom robotic hand grasping and manipulating objects with different
|
|
TuCT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection II |
|
|
Chair: Lu, Chris Xiaoxuan | University College London |
Co-Chair: Dayoub, Feras | The University of Adelaide |
|
16:30-18:00, Paper TuCT29-NT.1 | Add to My Program |
Robust 3D Object Detection from LiDAR-Radar Point Clouds Via Cross-Modal Feature Augmentation |
|
Deng, Jianning | University of Edinburgh |
Chan, King Wah Gabriel | Connecticut College |
Zhong, Hantao | University of Cambridge |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Learning
Abstract: This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination. Our proposed approach is agnostic to the direction of hallucination between LiDAR and 4D radar. We introduce multiple alignments on both spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation. Specifically, spatial alignment is proposed to deal with the geometry discrepancy for better instance matching between LiDAR and radar. The feature alignment step further bridges the intrinsic attribute gap between the sensing modalities and stabilizes the training. The trained object detection models can handle difficult detection cases better, even though only single-modal data is used as the input during the inference stage. Extensive experiments on the View-of-Delft (VoD) dataset show that our proposed method outperforms the state-of-the-art (SOTA) methods for both radar and LiDAR object detection while maintaining competitive efficiency in runtime.
|
|
16:30-18:00, Paper TuCT29-NT.2 | Add to My Program |
Predicting Class Distribution Shift for Reliable Domain Adaptive Object Detection |
|
Chapman, Nicolas Harvey | Queensland University of Technology |
Dayoub, Feras | The University of Adelaide |
Browne, Will | Queensland University of Technology |
Lehnert, Christopher | Queensland University of Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Learning
Abstract: Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous approaches to UDA-OD based on self-training have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also impact the likelihood that different objects will occur, termed class distribution shift. Motivated by this, we propose a framework for explicitly addressing class distribution shift to improve pseudo-label reliability in self-training. Our approach uses the domain invariance and contextual understanding of a pre-trained joint vision and language model to predict the class distribution of unlabelled data. By aligning the class distribution of pseudo-labels with this prediction, we provide weak supervision of pseudo-label accuracy. To further account for low-quality pseudo-labels early in self-training, we propose an approach to dynamically adjust the number of pseudo-labels per image based on model confidence. Our method outperforms state-of-the-art approaches on several benchmarks, including a 4.7 mAP improvement when facing challenging class distribution shift.
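The alignment step described above — matching the class distribution of retained pseudo-labels to a predicted distribution — can be sketched as a per-class quota on the highest-confidence detections. This is an illustrative reading under invented data structures (`(class, confidence)` tuples, a `budget` parameter), not the paper's implementation.

```python
def align_pseudo_labels(pseudo, predicted_dist, budget):
    """Keep the highest-confidence pseudo-labels per class so that the
    retained counts follow a predicted class distribution.

    pseudo: list of (class_name, confidence) detections.
    predicted_dist: class -> probability (e.g., from a vision-language model).
    budget: total number of pseudo-labels to keep. Illustrative sketch only.
    """
    kept = []
    for cls, p in predicted_dist.items():
        quota = round(p * budget)
        ranked = sorted((c for c in pseudo if c[0] == cls),
                        key=lambda c: c[1], reverse=True)
        kept.extend(ranked[:quota])
    return kept

pseudo = [("car", 0.9), ("car", 0.8), ("car", 0.4), ("person", 0.7)]
kept = align_pseudo_labels(pseudo, {"car": 0.5, "person": 0.5}, budget=2)
```

Without the quota, the over-represented "car" class would dominate the pseudo-labels; the predicted distribution acts as the weak supervision signal.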
|
|
16:30-18:00, Paper TuCT29-NT.3 | Add to My Program |
LSSAttn: Towards Dense and Accurate View Transformation for Multi-Modal 3D Object Detection |
|
Jiang, Qi | Shanghai Jiaotong University |
Sun, Hao | National University of Singapore |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Fusing the camera and LiDAR information in the unified BEV representation serves as the elegant paradigm for the 3D detection tasks. Current multi-modal fusion methods in BEV can be categorized into LSS-based and Transformer-based in terms of their view transformation. The former leverages inaccurate depth prediction and massive pseudo points for perspective-to-BEV transformation while the latter only fetches sparse image features to the BEV representation. To overcome their shortcomings, an optimized view transformation is proposed, which can be easily modulated into the LSS-based methods. The proposed module capitalizes on the LSS mechanism to establish dense associations between perspective pixels and BEV grids. It utilizes the attention mechanism to compute similarity scores for each associated pair during feature aggregation. Starting from the BEVFusion baseline, we further introduce (1) cross-attention within the associated subsets to transfer image features into the BEV, and (2) a multi-scale feature fusion mechanism for LSS-based view transformation. Extensive experiments on nuScenes validate the effectiveness and efficiency of our proposed module, which achieves an increase of 1.3% in mAP compared to the baseline model.
|
|
16:30-18:00, Paper TuCT29-NT.4 | Add to My Program |
Learning Temporal Cues by Predicting Objects Move for Multi-Camera 3D Object Detection |
|
Moon, Seokha | Korea University |
Park, Hongbeen | Korea University |
Lee, Jaekoo | Kookmin University |
Kim, Jinkyu | Korea University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: In autonomous driving and robotics, there is a growing interest in utilizing short-term historical data to enhance multi-camera 3D object detection, leveraging the continuous and correlated nature of input video streams. Recent work has focused on spatially aligning BEV-based features over timesteps. However, this is often limited as its gain does not scale well with long-term past observations. To address this, we advocate for supervising a model to predict objects' poses given past observations, thus explicitly guiding it to learn objects' temporal cues. To this end, we propose a model called DAP (Detection After Prediction), consisting of a two-branch network: (i) a branch responsible for forecasting the current objects' poses given past observations and (ii) another branch that detects objects based on the current and past observations. The features predicting the current objects from branch (i) are fused into branch (ii) to transfer predictive knowledge. We conduct extensive experiments with the large-scale nuScenes datasets, and we observe that utilizing such predictive information significantly improves the overall detection performance. Our model can be used plug-and-play, showing consistent performance gain.
|
|
16:30-18:00, Paper TuCT29-NT.5 | Add to My Program |
Improved Yolov5: HIC-Yolov5 for Small Object Detection |
|
Tang, Shiyi | Ocean University of China |
Zhang, Shu | Ocean University of China |
Fang, Yini | Hong Kong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Aerial Systems: Applications
Abstract: Small object detection has been a challenging problem in the field of object detection. Previous work has made improvements specifically for small objects, such as adding attention blocks or changing the whole structure of feature fusion networks. However, there is still room for improvement, and the computational cost remains large, which hinders real-time object detection. An improved Yolov5 model, HIC-Yolov5, is proposed to address these problems. Firstly, by adding an additional prediction head for small objects, higher-resolution feature maps can be used directly to detect small targets. Secondly, an involution block is adopted between the backbone and neck to increase the channel information of the feature map. Moreover, an attention mechanism named CBAM is applied at the end of the backbone, which not only decreases the computation cost compared with previous works but also emphasizes important information in both the channel and spatial domains. Finally, we show that the improved Yolov5 algorithm improves mAP@[.5:.95] by 6.42% and mAP@0.5 by 9.38% on the VisDrone-2019-DET dataset.
|
|
16:30-18:00, Paper TuCT29-NT.6 | Add to My Program |
CLIPUNetr: Assisting Human-Robot Interface for Uncalibrated Visual Servoing Control with CLIP-Driven Referring Expression Segmentation |
|
Jiang, Chen | University of Alberta |
Yang, Yuchen | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Servoing
Abstract: The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, which is a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr - a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while utilizing its ``U-shaped'' encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
|
|
16:30-18:00, Paper TuCT29-NT.7 | Add to My Program |
C2FDrone: Coarse-To-Fine Drone-To-Drone Detection Using Vision Transformer Networks |
|
Rebbapragada, Sairam | IIT Hyderabad |
Panda, Pranoy | Fujitsu Research of India |
Balasubramanian, Vineeth | Indian Institute of Technology, Hyderabad |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods, Swarm Robotics
Abstract: A vision-based drone-to-drone detection system offers a cost-effective solution for a range of applications, including collision avoidance, countering hostile drones, and enhancing search-and-rescue operations. However, drone-to-drone detection presents a more intricate set of challenges compared to regular object detection. These challenges encompass the need to detect extremely small-sized objects, contend with strong distortion, handle severe occlusion, operate in uncontrolled environments, and execute real-time processing. While current methods attempt to address these issues by integrating multi-scale feature fusion and temporal information, we propose that these techniques may not be sufficiently equipped to handle extreme blur and minuscule objects. Instead, we put forth a novel coarse-to-fine detection strategy based on vision transformers to achieve precise drone detection. We assess the effectiveness of our approach through a series of comprehensive experiments conducted on three challenging drone-to-drone detection datasets. Our results demonstrate notable improvements, with F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Furthermore, we showcase its real-time processing capability by deploying our model on an edge-computing device. We will make our code repository publicly available.
|
|
16:30-18:00, Paper TuCT29-NT.8 | Add to My Program |
Better Monocular 3D Detectors with LiDAR from the Past |
|
You, Yurong | Cornell University |
Phoo, Cheng Perng | Cornell University |
Diaz-Ruiz, Carlos | Cornell University |
Luo, Katie | Cornell University |
Chao, Wei-Lun | Cornell University |
Campbell, Mark | Cornell University |
Hariharan, Bharath | Cornell University |
Weinberger, Kilian | Cornell University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, RGB-D Perception
Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gain (up to 9 AP) across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
|
|
16:30-18:00, Paper TuCT29-NT.9 | Add to My Program |
A Metacognitive Approach to Out-Of-Distribution Detection for Segmentation |
|
Gummadi, Meghna | University of Pennsylvania |
Kent, Cassandra | University of Pennsylvania |
Schmeckpeper, Karl | University of Pennslyvania |
Eaton, Eric | University of Pennsylvania |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Semantic Scene Understanding
Abstract: Despite outstanding semantic scene segmentation in closed-worlds, deep neural networks segment novel instances poorly, which is required for autonomous agents acting in an open world. To improve out-of-distribution (OOD) detection for segmentation, we introduce a metacognitive approach in the form of a lightweight module that leverages entropy measures, segmentation predictions, and spatial context to characterize the segmentation model’s uncertainty and detect pixel-wise OOD data in real-time. Additionally, our approach incorporates a novel method of generating synthetic OOD data in context with in-distribution data, which we use to fine-tune existing segmentation models with maximum entropy training. This further improves the metacognitive module’s performance without requiring access to OOD data while enabling compatibility with established pre-trained models. Our resulting approach can reliably detect OOD instances in a scene, as shown by state-of-the-art performance on OOD detection for semantic segmentation benchmarks.
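The entropy cue the metacognitive module leverages can be illustrated with a minimal per-pixel check: flag a pixel as out-of-distribution when the predictive entropy of its softmax output is high. The normalization by log(num_classes) and the 0.5 threshold are invented for this sketch; the actual module combines entropy with segmentation predictions and spatial context.

```python
import math

def pixel_ood_mask(softmax_maps, threshold=0.5):
    """Flag pixels whose normalized predictive entropy exceeds a threshold.

    softmax_maps: list of per-pixel class-probability lists. Entropy is
    normalized by log(num_classes) so the threshold lies in [0, 1].
    Minimal stand-in for one cue used by a metacognitive OOD detector.
    """
    mask = []
    for probs in softmax_maps:
        h = -sum(p * math.log(p) for p in probs if p > 0)
        mask.append(h / math.log(len(probs)) > threshold)
    return mask

# A confident pixel (in-distribution) vs. a near-uniform one (flagged OOD).
mask = pixel_ood_mask([[0.98, 0.01, 0.01], [0.34, 0.33, 0.33]])
```

The confident prediction has near-zero entropy and passes, while the near-uniform prediction is flagged — the basic signal that the learned module refines with context.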
|
|
TuCT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics II |
|
|
Chair: Gu, Jason | Dalhousie University |
Co-Chair: Honda, Kohei | Nagoya University |
|
16:30-18:00, Paper TuCT30-NT.1 | Add to My Program |
When to Replan? an Adaptive Replanning Strategy for Autonomous Navigation Using Deep Reinforcement Learning |
|
Honda, Kohei | Nagoya University |
Yonetani, Ryo | CyberAgent |
Nishimura, Mai | Omron Sinic X |
Kozuno, Tadashi | Omron Sinic X |
Keywords: AI-Enabled Robotics, Reinforcement Learning, Integrated Planning and Learning
Abstract: The hierarchy of global and local planners is one of the most commonly utilized system designs in autonomous robot navigation. While the global planner generates a reference path from the current to goal locations based on the pre-built map, the local planner produces a kinodynamic trajectory to follow the reference path while avoiding perceived obstacles. To account for unforeseen or dynamic obstacles not present on the pre-built map, ``when to replan'' the reference path is critical for the success of safe and efficient navigation. However, determining the ideal timing to execute replanning in such partially unknown environments remains an open question. In this work, we first conduct an extensive simulation experiment to compare several common replanning strategies and confirm that effective strategies are highly dependent on the environment as well as the global and local planners. Based on this insight, we then derive a new adaptive replanning strategy based on deep reinforcement learning, which can learn from experience to decide appropriate replanning timings in the given environment and planning setups. Our experimental results show that the proposed replanner can perform on par with or even better than the current best-performing strategies in multiple situations regarding navigation robustness and efficiency.
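The kind of hand-tuned replanning rule that such a learned strategy replaces can be sketched in a few lines: replan periodically, or whenever the robot drifts too far from the reference path. The thresholds and the two-feature interface below are made up for illustration; the learned replanner conditions on richer observations.

```python
def should_replan(time_since_replan, path_deviation,
                  period=5.0, deviation_thresh=1.0):
    """Baseline replanning trigger: replan on a fixed period, or when the
    robot deviates from the reference path beyond a threshold.
    (Illustrative rule; thresholds are invented.)
    """
    return time_since_replan >= period or path_deviation >= deviation_thresh
```

A deep RL replanner effectively learns when this boolean should fire for a given environment and planner pair, instead of relying on fixed `period` and `deviation_thresh` values.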
|
|
16:30-18:00, Paper TuCT30-NT.2 | Add to My Program |
Resolving Loop Closure Confusion in Repetitive Environments for Visual SLAM through AI Foundation Models Assistance |
|
Li, Hongzhou | Sun Yat-Sen University |
Yu, Sijie | Sun Yat-Sen University |
Zhang, Shengkai | Wuhan University of Technology |
Tan, Guang | Sun Yat-Sen University |
Keywords: AI-Enabled Robotics, SLAM, Semantic Scene Understanding
Abstract: In visual SLAM (VSLAM) systems, loop closure plays a crucial role in reducing accumulated errors. However, VSLAM systems relying on low-level visual features often suffer from the problem of perceptual confusion in repetitive environments, where scenes in different locations are incorrectly identified as the same. Existing work has attempted to introduce object-level features or artificial landmarks. The former approach struggles to distinguish visually similar but different objects, while the latter is both time-consuming and labor-intensive. This paper introduces a novel loop closure detection method that leverages pretrained AI foundation models to extract rich semantic information about specific types of objects (e.g., door numbers), referred to as semantic anchors, that help to distinguish similar scenes better. In settings such as office buildings, hotels, and warehouses, this approach helps to improve the robustness of loop closure detection. We validate the effectiveness of our method through experiments conducted in both simulated and real-world environments.
|
|
16:30-18:00, Paper TuCT30-NT.3 | Add to My Program |
Prepare the Chair for the Bear! Robot Imagination of Sitting Affordance to Reorient Previously Unseen Chairs |
|
Meng, Xin | National University of Singapore |
Wu, Hongtao | Bytedance |
Ruan, Sipu | National University of Singapore |
Chirikjian, Gregory | National University of Singapore |
Keywords: AI-Enabled Robotics, Simulation and Animation, Manipulation Planning
Abstract: In this letter, a paradigm for the classification and manipulation of novel objects is established and demonstrated through a real example of chairs. Our approach leverages the robot's understanding of object stability, perceptibility, and affordance to prepare previously unseen and randomly oriented chairs for a teddy bear to sit on. The teddy bear is a proxy for an elderly person, hospital patient, or child. By autonomously reconstructing a complete model of the object and inserting it into a physical simulator (i.e., the robot's "imagination"), the robot assesses whether the object is a chair and determines how to reorient it properly to be used. Experiment results show that our method achieves a high success rate on the real robot task of chair preparation. Also, it outperforms several baseline methods on the task of upright pose prediction for chairs. The developed system can be easily transferred to a wide variety of application scenarios, and illustrates a broader paradigm in affordance-based reasoning.
|
|
16:30-18:00, Paper TuCT30-NT.4 | Add to My Program |
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models |
|
Katara, Pushkal | Carnegie Mellon University |
Xian, Zhou | Carnegie Mellon University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Keywords: AI-Based Methods, AI-Enabled Robotics
Abstract: Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and to code up reward functions for reinforcement learning. Such human involvement is an important bottleneck in scaling up robot learning across diverse tasks and environments. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating the generation of 3D assets, task descriptions, task decompositions, and reward functions using large pre-trained generative models of language and vision. We generate 3D assets for simulation by lifting open-world 2D object-centric images to 3D using image diffusion models and querying LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we chain-of-thought prompt LLMs to map these to relevant task descriptions, temporal decompositions, and corresponding python reward functions for reinforcement learning. We show Gen2Sim succeeds in learning policies for diverse long-horizon tasks, where reinforcement learning with non-temporally decomposed reward functions fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development, and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition in RL. Our work contributes hundreds of simulated assets, tasks, and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
|
|
16:30-18:00, Paper TuCT30-NT.5 | Add to My Program |
FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models |
|
Zhang, Jiatao | Zhejiang University |
Tang, Lanling | University of Chinese Academy of Sciences |
Song, Yufan | Zhejiang University |
Meng, Qiwei | The Chinese University of Hong Kong |
Qian, Haofu | Zhejiang University |
Shao, Jun | Zhejiang University |
Song, Wei | Zhejiang Lab |
Zhu, Shiqiang | Zhejiang University |
Gu, Jason | Dalhousie University |
Keywords: AI-Based Methods, AI-Enabled Robotics, Agent-Based Systems
Abstract: Recent planning methods based on Large Language Models (LLMs) typically employ the in-context learning paradigm. Complex long-horizon planning tasks require more context (including instructions and demonstrations) to guarantee that the generated plan can be executed correctly. However, under such conditions, LLMs may overlook (i.e., be unfaithful to) the rules in the given context, resulting in generated plans that are invalid or even lead to dangerous actions. In this paper, we investigate the faithfulness of LLMs on complex long-horizon tasks. Inspired by human intelligence, we introduce a novel framework named FLTRNN. FLTRNN employs a language-based RNN structure to integrate task decomposition and memory management into LLM planning inference, which effectively improves the faithfulness of LLMs and makes the planner more reliable. We conducted experiments on VirtualHome household tasks. Results show that our model significantly improves faithfulness and success rates on complex long-horizon tasks.
|
|
16:30-18:00, Paper TuCT30-NT.6 | Add to My Program |
Drive Anywhere: Generalizable End-To-End Autonomous Driving with Multi-Modal Foundation Models |
|
Wang, Tsun-Hsuan | Massachusetts Institute of Technology |
Maalouf, Alaa | MIT |
Xiao, Wei | MIT |
Ban, Yutong | Massachusetts Institute of Technology |
Amini, Alexander | Massachusetts Institute of Technology |
Rosman, Guy | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: AI-Enabled Robotics, Sensor-based Control, Autonomous Agents
Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and the incorporation of latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundational model capabilities to create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness on out-of-distribution situations.
|
|
16:30-18:00, Paper TuCT30-NT.7 | Add to My Program |
AutoTAMP: Autoregressive Task and Motion Planning with LLMs As Translators and Checkers |
|
Chen, Yongchao | Harvard University |
Arkin, Jacob | Massachusetts Institute of Technology |
Dawson, Charles | MIT |
Zhang, Yang | IBM |
Roy, Nicholas | Massachusetts Institute of Technology |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: AI-Enabled Robotics, Task and Motion Planning, Semantic Scene Understanding
Abstract: For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks described by natural language. Recent advances in large language models (LLMs) have shown promise for translating natural language into robot action sequences for complex tasks. However, existing approaches either translate the natural language directly into robot trajectories or factor the inference process by decomposing language into task sub-goals and relying on a motion planner to execute each sub-goal. When complex environmental and temporal constraints are involved, inference over planning tasks must be performed jointly with motion plans using traditional task-and-motion planning (TAMP) algorithms, making factorization into subgoals untenable. Rather than using LLMs to directly plan task sub-goals, we instead perform few-shot translation from natural language task descriptions to an intermediate task representation that can then be consumed by a TAMP algorithm to jointly solve the task and motion plan. To improve translation, we automatically detect and correct both syntactic and semantic errors via autoregressive re-prompting, resulting in significant improvements in task completion. We show that our approach outperforms several methods using LLMs as planners in complex task domains. See our project website for prompts, videos, and code.
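The autoregressive re-prompting loop described above — detect an error in the translated task representation, feed it back, and ask the LLM to fix it — can be sketched as follows. The `llm` and `check` callables and the prompt wording are hypothetical placeholders; the actual prompts and checkers are on the project website.

```python
def translate_with_reprompting(task_nl, llm, check, max_rounds=3):
    """Translate natural language into an intermediate task representation,
    re-prompting with detected errors until the candidate passes a checker.

    llm(prompt) -> candidate string; check(candidate) -> None if valid,
    else an error message. Both are hypothetical stand-ins.
    """
    prompt = f"Translate to task representation: {task_nl}"
    candidate = llm(prompt)
    for _ in range(max_rounds):
        error = check(candidate)
        if error is None:
            return candidate  # passed syntactic/semantic checks
        # Append the failed attempt and its error, then ask for a fix.
        prompt = f"{prompt}\nPrevious attempt: {candidate}\nError: {error}\nFix it."
        candidate = llm(prompt)
    return candidate
```

The key design choice is that the checker's error message is part of the next prompt, so each round is conditioned on what went wrong rather than sampling blindly.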
|
|
16:30-18:00, Paper TuCT30-NT.8 | Add to My Program |
ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation |
|
Yokoyama, Naoki | Georgia Institute of Technology |
Clegg, Alexander | Georgia Institute of Technology |
Truong, Joanne | The Georgia Institute of Technology |
Undersander, Eric | Facebook AI Research |
Yang, Tsung-Yen | META |
Arnaud, Sergio | Meta |
Ha, Sehoon | Georgia Institute of Technology |
Batra, Dhruv | Georgia Tech / Facebook AI Research |
Rai, Akshara | Facebook AI Research |
Keywords: AI-Enabled Robotics, Reinforcement Learning, Deep Learning Methods
Abstract: We present Adaptive Skill Coordination (ASC) – an approach for accomplishing long-horizon tasks like mobile pick-and-place (i.e., navigating to an object, picking it, navigating to another location, and placing it). ASC consists of three components – (1) a library of basic visuomotor skills (navigation, pick, place), (2) a skill coordination policy that chooses which skill to use when, and (3) a corrective policy that adapts pre-trained skills in out-of-distribution states. All components of ASC rely only on onboard visual and proprioceptive sensing, without requiring detailed maps with obstacle layouts or precise object locations, easing real-world deployment. We train ASC in simulated indoor environments, and deploy it zero-shot (without any real-world experience or fine-tuning) on the Boston Dynamics Spot robot in eight novel real-world environments (one apartment, one lab, two microkitchens, two lounges, one office space, one outdoor courtyard). In rigorous quantitative comparisons in two environments, ASC achieves near-perfect performance (59/60 episodes, or 98%), while sequentially executing skills succeeds in only 44/60 (73%) episodes. Extensive perturbation experiments show that ASC is robust to hand-off errors, changes in the environment layout, dynamic obstacles (e.g., people), and unexpected disturbances. Supplementary videos at adaptiveskillcoordination.github.io.
|
|
16:30-18:00, Paper TuCT30-NT.9 | Add to My Program |
Forgetting in Robotic Episodic Long-Term Memory |
|
Plewnia, Joana | Karlsruhe Institute of Technology |
Peller-Konrad, Fabian | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Cognitive Control Architectures, Cognitive Modeling, Embodied Cognitive Science
Abstract: Artificial cognitive architectures traditionally rely on complex memory models to encode, store, and retrieve information. However, the conventional practice of transferring all data from working memory (WM) to long-term memory (LTM) leads to high data volumes and challenges in efficient information processing and access. Deciding what information to retain or discard within a robot’s LTM is particularly challenging since knowledge about future data utilization is absent. Drawing inspiration from human forgetting, this paper implements and evaluates novel forgetting techniques that allow consolidation in the robot’s LTM only when new information is encountered. The proposed approach combines fast filtering during data transfer to the robot’s LTM with slower yet more precise forgetting mechanisms that are periodically evaluated for offline data deletion inside the LTM. We compare different mechanisms, utilizing metrics such as data similarity, data age, and consolidation frequency. The efficacy of forgetting techniques is evaluated by comparing their performance in a task where two ARMAR robots search through their LTM for past object locations in episodic ego-centric images and robot state data. Experimental results show that our forgetting techniques significantly reduce the space requirements of a robot’s LTM while maintaining its capacity to successfully perform tasks relying on LTM information. Notably, similarity-based forgetting methods outperform frequency- and time-based approaches. The combination of online frequency-based, online similarity-based, offline similarity-based, and time-based decay methods shows superior performance compared to using individual forgetting strategies.
|
|
TuCT31-NT Oral Session, NT-G7 |
Add to My Program |
Intelligent and Flexible Manufacturing |
|
|
Chair: Liu, Lianqing | Shenyang Institute of Automation |
Co-Chair: Park, Yong-Lae | Seoul National University |
|
16:30-18:00, Paper TuCT31-NT.1 | Add to My Program |
Fixture Calibration with Guaranteed Bounds from a Few Correspondence-Free Surface Points |
|
Haugaard, Rasmus Laurvig | University of Southern Denmark |
Kim, Yitaek | University of Southern Denmark |
Iversen, Thorbjørn Mosekjær | The Maersk Mc-Kinney Moller Institute, University of Southern Denmark |
Keywords: Calibration and Identification, Software Tools for Robot Programming, Industrial Robots
Abstract: Calibration of fixtures in robotic work cells is essential but also time-consuming and error-prone, and poor calibration can easily lead to wasted debugging time in downstream tasks. Contact-based calibration methods let the user measure points on the fixture's surface with a tool tip attached to the robot's end effector. Most methods require the user to manually annotate correspondences on the CAD model; however, this is error-prone and a cumbersome user experience. We propose a correspondence-free alternative: the user simply measures a few points on the fixture's surface, and our method provides a tight superset of the poses that could explain the measured points. This naturally detects ambiguities related to symmetry and uninformative points and conveys this uncertainty to the user. Perhaps more importantly, it provides guaranteed bounds on the pose. The computation of such bounds is made tractable by the use of a hierarchical grid on SE(3). Our method is evaluated both in simulation and on a real collaborative robot, showing great potential for easier and less error-prone fixture calibration.
|
|
16:30-18:00, Paper TuCT31-NT.2 | Add to My Program |
Data-Driven Virtual Sensing for Probabilistic Condition Monitoring of Solenoid Valves (I) |
|
Vantilborgh, Victor | Ghent University |
Lefebvre, Tom | Ghent University |
Crevecoeur, Guillaume | Ghent University |
Keywords: Manufacturing, Maintenance and Supply Chains, Factory Automation
Abstract: There is an emerging industrial demand for predictive maintenance algorithms that exhibit high levels of predictive accuracy. Such condition monitoring tools must estimate dynamic quantities, such as Remaining Useful Lifetime (RUL) and State of Health (SOH), based on a typically restricted set of measurements that can be obtained in an operational setting. These quantities exhibit inherent stochasticity and can only be approximately determined a posteriori, after system failure. This paper proposes a generic prognostic tool for probabilistic condition monitoring of mechatronic systems, with the aim of improving the probabilistic prediction of condition metrics, specifically RUL and SOH. To this end, we propose to identify a Hidden Markov Model (HMM) from a fully instrumented measurement set that is only available for a restricted set of run-to-failure experiments, typically gathered in an R&D setting. Although RUL and SOH are artificial, retrospectively constructed metrics, we interpret them as physical measurements in order to identify accurate degradation dynamics. Once the degradation model is identified, we exploit the mathematical flexibility of the HMM framework to estimate, in real time, several of the no-longer-available dynamic quantities of interest from the limited set of measurements that are available in an operational setting. This modelling paradigm is known as virtual sensing.
|
|
16:30-18:00, Paper TuCT31-NT.3 | Add to My Program |
Digital Robot Judge (DR.J): Building a Task-Centric Performance Database of Real-World Manipulation with Electronic Task Boards |
|
So, Peter | Technical University of Munich |
Sarabakha, Andriy | Nanyang Technological University |
Wu, Fan | Technical University of Munich |
Culha, Utku | Technical University of Munich |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Factory Automation, Intelligent and Flexible Manufacturing, Sensor Networks
Abstract: Robotics aims to develop manipulation skills approaching human performance. However, skill complexity is often over- or underestimated based on individual experience, and the real-world performance gap is difficult or expensive to measure through in-person competitions. To bridge this gap, we propose a compact, Internet-connected, electronic task board to measure manipulation performance remotely; we call it the digital robot judge, or “DR.J.” By detecting key events on the board through performance circuitry, DR.J provides an alternative to transporting equipment to in-person competitions and serves as a portable test and data generation system that captures and grades performances, making comparisons less expensive. Data collected are automatically published on a web dashboard (WD) that provides a living performance benchmark. We share the results of a proof-of-concept electronic task board with industry-inspired tasks used in an international competition in 2021 and 2022 to benchmark localization, insertion, and disassembly tasks. We present data from 10 DR.J task boards, describe a method for deriving the relative task complexity (RTC) from timing data, and compare robot solutions with a human performer. In the best case, robots performed 9x faster than humans in specialized tasks but achieved only 16% of human speed across the full set of tasks. Finally, we present the design and software to replicate the electronic task board to promote task-centric benchmarking.
|
|
16:30-18:00, Paper TuCT31-NT.4 | Add to My Program |
Semi-Analytical Design of PDE Endpoint Controller for Flexible Manipulator with Non-Homogenous Boundary Conditions (I) |
|
Yaqubi, Sadeq | Tampere University |
Tahamipourzarandi, Seyedmohammad | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Flexible Robotics, Nonholonomic Mechanisms and Systems
Abstract: This study proposes a new semi-analytical design and implementation method for nonlinear partial differential equation (PDE) control of a flexible manipulator. The proposed scheme considers the effects of the boundary input force and gravity on the payload, which result in non-homogeneous boundary conditions. This objective is achieved through a model transformation scheme that homogenizes the boundary conditions, yielding semi-analytical solutions for the corresponding PDE model. The model transformation is assigned as a hybrid exponential–polynomial function whose coefficients are conveniently calculable without any additional boundary condition measurements. This eliminates the need for intensive numerical solvers (for example, methods based on finite element analysis) and allows the implementation of sophisticated PDE control schemes that consider fully nonlinear PDE models at high computation speed. The presented controller is robust to parametric model uncertainty due to its adaptive design. The precision and efficiency of calculating distributed states using the proposed model transformation are demonstrated on experimental data for the flexible manipulator, with a camera-based motion capture system as ground truth. The model transformation is also implemented numerically for the proposed nonlinear endpoint control method based on the original PDE model.
|
|
16:30-18:00, Paper TuCT31-NT.5 | Add to My Program |
Automated Sewing System Enabled by Machine Vision for Smart Garment Manufacturing |
|
Ku, Subyeong | Seoul National University |
Choi, HyunWoong | Seoul National University |
Kim, Ho-Young | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Intelligent and Flexible Manufacturing, Computer Vision for Manufacturing, Factory Automation
Abstract: This paper presents an automated sewing system designed for smart garment manufacturing, incorporating machine vision capabilities into a custom-built sewing machine. The vision system captures an image of the fabric pattern placed between two acrylic plates with a small opening, utilizing a deep learning model to detect and segment the opening, which represents the area of interest on the plate. Subsequently, a specialized algorithm detects a narrow seam line within the segmented image and generates a stitching path along the seam line at a consistent distance. The sewing machine then accurately stitches along the generated path automatically. The vision system achieves a spatial resolution of 68 µm per pixel. The custom-built sewing machine, controlled by an external computer, exhibits a spatial resolution of 10 µm, a translation speed of 60 mm/s, and an adjustable stitching interval ranging from 1 mm to 5 mm. The subsystems and components are interconnected using the Robot Operating System (ROS), enabling seamless communication and integration. The proposed system eliminates the need for human intervention, facilitating automated garment production, and is expected to play a critical role in realizing the vision of smart garment manufacturing.
|
|
16:30-18:00, Paper TuCT31-NT.6 | Add to My Program |
Segmentation and Coverage Planning of Freeform Geometries for Robotic Surface Finishing |
|
Schneyer, Stefan | German Aerospace Center (DLR) |
Sachtler, Arne | Technical University of Munich (TUM) |
Eiband, Thomas | German Aerospace Center (DLR) |
Nottensteiner, Korbinian | German Aerospace Center (DLR) |
Keywords: Intelligent and Flexible Manufacturing, Contact Modeling, Motion and Path Planning
Abstract: Surface finishing such as grinding or polishing is a time-consuming task, involves health risks for humans, and is still largely performed by hand. Due to the high curvatures of complex geometries, different areas of the surface cannot be optimally reached by a simple strategy using a tool with a relatively large and flat finishing disk. In this paper, a planning method is presented that uses a variable contact point on the finishing disk as an additional degree of freedom. Different strategies for covering the workpiece surface are used to optimize the surface finishing process and ensure the coverage of concave areas. To this end, an automatic segmentation method is developed to find areas with a uniform machining strategy based on the exact tool and workpiece geometry. Further, a method for planning coverage paths is presented, in which the contact area is modeled to realize an adaptive spacing between path lines. The approach was evaluated in simulation and practical experiments on the DLR SARA robot. The results show high coverage for complex freeform geometries and that adaptive spacing can optimize the overall process by reducing uncovered gaps and overlaps between coverage lines.
|
|
16:30-18:00, Paper TuCT31-NT.7 | Add to My Program |
Integrating Robot Assignment and Maintenance Management: A Multi-Agent Reinforcement Learning Approach for Holistic Control |
|
Bhatta, Kshitij | University of Virginia |
Chang, Qing | University of Virginia |
Keywords: Intelligent and Flexible Manufacturing, Manufacturing, Maintenance and Supply Chains, Planning, Scheduling and Coordination
Abstract: Modern manufacturing requires effective integration of production control and maintenance scheduling to improve productivity and quality. However, there have been few studies on this integrated control due to a lack of a comprehensive manufacturing system model. In response to this challenge, this paper presents a mathematical model framework for a mobile multi-skilled robot-operated manufacturing system that integrates three essential control aspects: robot assignment, maintenance scheduling, and product quality. To demonstrate the effectiveness of this approach, a control problem is solved in the Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework. Results show that the proposed integrated model outperforms models that consider only system-level parameters, as well as those that only address maintenance scheduling and quality-related parameters.
|
|
16:30-18:00, Paper TuCT31-NT.8 | Add to My Program |
Towards Fault-Tolerant Deployment of Mobile Robot Navigation in the Edge: An Experimental Study |
|
Mirus, Florian | Intel Labs |
Pasch, Frederik | Intel |
Scholl, Kay-Ulrich | Intel |
Keywords: Robot Safety, Failure Detection and Recovery, Industrial Robots
Abstract: Modern algorithms allow robots to reach greater levels of autonomy and fulfill more challenging tasks. However, on-board limitations in computational and battery resources hinder the deployment of such algorithms, particularly on mobile robots. Although offloading the majority of the algorithmic components to the edge, or even the cloud, offers an attractive option to leverage massive computing power in robotics applications, safety and reliability remain critical issues. This paper presents a minimalistic safety fallback mechanism for offloading mobile robot navigation to the edge that ensures safe, collision-free navigation even in the presence of failures in the connection between the on-board and edge devices. We show the effectiveness of our approach through extensive testing in three relevant scenarios in a simulated warehouse environment. Our experiments demonstrate the effects of different fallback strategies and show how our proposed approach ensures safety while allowing the robot to continue its mission during an interrupted connection, thus avoiding unnecessary downtime.
|
|
TuCT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems III |
|
|
Chair: Li, Dachuan | Tsinghua University |
Co-Chair: Tumova, Jana | KTH Royal Institute of Technology |
|
16:30-18:00, Paper TuCT32-NT.1 | Add to My Program |
Prompting Multi-Modal Tokens to Enhance End-To-End Autonomous Driving Imitation Learning with LLMs |
|
Duan, Yiqun | University of Technology Sydney |
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Intelligent Transportation Systems
Abstract: The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting a 'pure-language' strategy. In this research, we propose a hybrid end-to-end learning framework for autonomous driving that combines basic driving imitation learning with LLMs based on multi-modality prompt tokens. Instead of simply converting perception results from separately trained models into pure-language input, our novelty lies in two aspects: 1) the end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating the description bias introduced by separately pre-trained perception models; 2) rather than directly letting LLMs drive, this paper explores a hybrid setting in which LLMs help the driving model correct mistakes and handle complicated scenarios. The results of our experiments suggest that the proposed methodology attains a driving score of 49.21%, coupled with an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.
|
|
16:30-18:00, Paper TuCT32-NT.2 | Add to My Program |
Efficient and Differentiable Joint Conditional Prediction and Cost Evaluation for Tree-Structured Planning in Autonomous Driving |
|
Huang, Zhiyu | Nanyang Technological University |
Karkus, Peter | NVIDIA |
Ivanovic, Boris | NVIDIA |
Chen, Yuxiao | Nvidia Research |
Pavone, Marco | Stanford University |
Lv, Chen | Nanyang Technological University |
Keywords: Intelligent Transportation Systems, Deep Learning Methods
Abstract: Motion prediction and cost evaluation are vital components in the decision-making system of autonomous vehicles. However, existing methods often ignore the importance of cost learning and treat them as separate modules. In this study, we employ a tree-structured policy planner and propose a differentiable joint training framework for both ego-conditioned prediction and cost models, resulting in a direct improvement of the final planning performance. For conditional prediction, we introduce a query-centric Transformer model that performs efficient ego-conditioned motion prediction. For planning cost, we propose a learnable context-aware cost function with latent interaction features, facilitating differentiable joint learning. We validate our proposed approach using the real-world nuPlan dataset and its associated planning test platform. Our framework not only matches state-of-the-art planning methods but outperforms other learning-based methods in planning quality, while operating more efficiently in terms of runtime. We show that joint training delivers significantly better performance than separate training of the two modules. Additionally, we find that tree-structured policy planning outperforms the conventional single-stage planning approach.
|
|
16:30-18:00, Paper TuCT32-NT.3 | Add to My Program |
SIMMF: Semantics-Aware Interactive Multiagent Motion Forecasting for Autonomous Vehicle Driving |
|
Krishnan Nivash, Vidyaa | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Intelligent Transportation Systems, Deep Learning Methods
Abstract: Autonomous vehicles require motion forecasting of the surrounding multi-agents (pedestrians and vehicles) to make optimal decisions for navigation. Existing methods focus on techniques to utilize the positions and velocities of these agents and fail to capture semantic information from the scene. Moreover, to mitigate the increase in computational complexity associated with the number of agents in the scene, some works leverage Euclidean distance to prune far-away agents. However, a distance-based metric alone is insufficient to select relevant agents and accurately perform their predictions. To resolve these issues, we propose the Semantics-aware Interactive Multiagent Motion Forecasting (SIMMF) method to capture semantics along with spatial information and optimally select relevant agents for motion prediction. Specifically, we achieve this by implementing a semantics-aware selection of relevant agents from the scene and passing them through an attention mechanism to extract global encodings. These encodings, along with the agents' local information, are passed through an encoder to obtain time-dependent latent variables for a motion policy predicting the future trajectories. Our results show that the proposed approach outperforms state-of-the-art baselines and provides more accurate and scene-consistent predictions. The demonstration video is available at: https://youtu.be/Wjla071BPBA
|
|
16:30-18:00, Paper TuCT32-NT.4 | Add to My Program |
CausalAgents: A Robustness Benchmark for Motion Forecasting |
|
Sun, Liting | University of California, Berkeley |
Roelofs, Rebecca | Google Research, Brain Team |
Caine, Ben | Google Research |
Refaat, Khaled | Waymo |
Sapp, Benjamin | Waymo |
Ettinger, Scott | Waymo |
Chai, Wei | Waymo |
Keywords: Intelligent Transportation Systems, Deep Learning Methods, Performance Evaluation and Benchmarking
Abstract: As machine learning models become increasingly prevalent in motion forecasting for autonomous vehicles (AVs), it is critical to ensure that model predictions are safe and reliable. In this paper, we examine the robustness of motion forecasting to non-causal perturbations. We construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data. Specifically, we conduct an extensive labeling effort to identify causal agents, or agents whose presence influences human drivers' behavior, in the Waymo Open Motion Dataset (WOMD), and we use these labels to perturb the data by deleting non-causal agents from the scene. We evaluate a diverse set of state-of-the-art deep-learning models on our proposed benchmark and find that all evaluated models exhibit large shifts under non-causal perturbation: we observe a surprising 25-38% relative change in minADE as compared to the original. In addition, we investigate techniques to improve model robustness, including increasing the training dataset size and using targeted data augmentations that randomly drop non-causal agents throughout training. Finally, we release the causal agent labels as an extension to WOMD and the robustness benchmarks to aid the community in building more reliable and safe deep-learning models for motion forecasting.
|
|
16:30-18:00, Paper TuCT32-NT.5 | Add to My Program |
Highway-Driving with Safe Velocity Bounds on Occluded Traffic |
|
Nyberg, Truls | KTH Royal Institute of Technology |
van Haastregt, Jonne | KTH Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Intelligent Transportation Systems, Formal Methods in Robotics and Automation, Collision Avoidance
Abstract: Limited visibility and sensor occlusions pose pressing safety challenges for advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs). In this work, we pursue a balance: a method that ensures safety in occluded scenarios while preventing overly cautious behavior. We argue that such approaches are crucial for AVs' future, particularly when navigating alongside human drivers on highways at high speeds. To this end, we use reachability analysis to find safe velocity bounds on occluded traffic participants. Compared to state-of-the-art methods, we achieve velocity increases in more than 60% of the 230 cut-in scenarios from the highD dataset, without sacrificing safety.
|
|
16:30-18:00, Paper TuCT32-NT.6 | Add to My Program |
Generalizing Cooperative Eco-Driving Via Multi-Residual Task Learning |
|
Jayawardana, Vindula | Massachusetts Institute of Technology |
Li, Sirui | MIT |
Wu, Cathy | MIT |
Farid, Yashar | Toyota North America |
Oguchi, Kentaro | Toyota InfoTechnology Center, USA |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Motion and Path Planning
Abstract: Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control policies that generalize to multiple traffic scenarios is still a challenge. To address this, we introduce Multi-residual Task Learning (MRTL), a generic learning framework based on multi-task learning that, for a set of task scenarios, decomposes the control into nominal components that are effectively solved by conventional control methods and residual terms which are solved using learning. We employ MRTL for fleet-level emission reduction in mixed traffic using autonomous vehicles as a means of system control. By analyzing the performance of MRTL across nearly 600 signalized intersections and 1200 traffic scenarios, we demonstrate that it emerges as a promising approach to synergize the strengths of DRL and conventional methods in generalizable control.
|
|
16:30-18:00, Paper TuCT32-NT.7 | Add to My Program |
Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map |
|
Garces, Daniel | Harvard University |
Bhattacharya, Sushmita | Harvard University |
Bertsekas, Dimitri | MIT |
Gil, Stephanie | Harvard University |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Multi-Robot Systems
Abstract: In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a priori but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execution keeps the number of outstanding requests uniformly bounded over time. Although rollout-based approaches are well-suited for learning cooperative multiagent policies with considerations for future demand, applying such methods to a large urban environment can be computationally expensive due to the large number of taxis required for stability. We address this computational bottleneck by proposing an approximate multiagent rollout-based two-phase algorithm that reduces computational costs while still achieving a stable near-optimal policy. Our approach partitions the graph into sectors based on the predicted demand and the maximum number of taxis that can run sequentially given the user's computational resources. The algorithm then applies instantaneous assignment (IA) to re-balance taxis across sectors and a sector-wide multiagent rollout algorithm that is executed in parallel for each sector. We provide two main theoretical results: 1) we characterize the number of taxis m that is sufficient for IA to be stable; 2) we derive a necessary condition on m to maintain stability for IA as time goes to infinity. Our numerical results show that our approach achieves stability for an m that satisfies the theoretical conditions. We also empirically demonstrate that our proposed two-phase algorithm has performance equivalent to the one-at-a-time rollout over the entire map, but with significantly lower runtimes.
|
|
16:30-18:00, Paper TuCT32-NT.8 | Add to My Program |
Continual Driving Policy Optimization with Closed-Loop Individualized Curricula |
|
Niu, Haoyi | Tsinghua University |
Xu, Yizhou | Tsinghua University |
Jiang, Xingjian | Tsinghua University |
Hu, Jianming | Tsinghua University |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Reinforcement Learning
Abstract: The safety of autonomous vehicles (AVs) has been a long-standing top concern, stemming from the absence of rare, safety-critical scenarios in the long-tail naturalistic driving distribution. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, with a focus on generating high-risk driving scenarios and applying them to conduct safety-critical testing of AV models. However, limited work has explored the reuse of these extensive scenarios to iteratively improve AV models. Moreover, it remains intractable and challenging to filter through gigantic scenario libraries collected from other AV models with distinct behaviors in an attempt to extract transferable information for current AV improvement. Therefore, we develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into a set of standardized sub-modules for flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task, where it estimates the chance of AV failures in these scenarios at each iteration. Subsequently, by re-sampling from historical scenarios based on these failure probabilities, CLIC tailors individualized curricula for downstream training, aligning them with the evaluated capability of the AV. Accordingly, CLIC not only maximizes the utilization of the vast pre-collected scenario library for closed-loop driving policy optimization but also facilitates AV improvement by individualizing its training with more challenging cases drawn from those poorly organized scenarios. Experimental results clearly indicate that CLIC surpasses other curriculum-based training strategies, showing substantial improvement in managing risky scenarios, while still maintaining proficiency in handling simpler cases.
|
|
16:30-18:00, Paper TuCT32-NT.9 | Add to My Program |
Task-Driven Domain-Agnostic Learning with Information Bottleneck for Autonomous Steering |
|
Shen, Yu | University of Maryland |
Zheng, Laura | University of Maryland, College Park |
Zhou, Tianyi | University of Maryland, College Park |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Transfer Learning
Abstract: Environments for autonomous driving can vary from place to place, leading to challenges in designing a learning model for a new scene. Transfer learning can leverage knowledge from a learned domain to a new domain with limited data. In this work, we focus on end-to-end autonomous driving as the target task, consisting of both perception and control. We first utilize information bottleneck analysis to build a causal graph that defines our framework and the loss function; then we propose a novel domain-agnostic learning method for autonomous steering based on our analysis of training data, network architecture, and training paradigm. Experiments show that our method outperforms other SOTA methods.
|
|
TuCL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster III |
|
|
|
16:30-18:00, Paper TuCL-EX.1 | Add to My Program |
A Graph-Based Planner for Scalable Ergodic Inspection |
|
Wong, Benjamin | University of Washington |
Paine, Tyler | Massachusetts Institute of Technology |
Devasia, Santosh | University of Washington |
Banerjee, Ashis | University of Washington |
Keywords: Task and Motion Planning, Task Planning, Robotics in Hazardous Fields
Abstract: In this work, we present a method for performing ergodic exploration on a region graph using a rapidly mixing Markov chain. This method enables ergodic planning in spaces with arbitrary topology and at any scale. We have demonstrated in simulation that the ergodic planner outperforms both maximum-entropy exploration and random exploration in minimizing the maximum detection error in the context of anomaly detection in a confined space.
|
|
16:30-18:00, Paper TuCL-EX.2 | Add to My Program |
Active Pneumatic Control of Automated Picoinjection for Regulating Droplet Contents Inside Microfluidic Chips |
|
Wang, Jiahao | Shanghai University, School of Mechatronic Engineering and Autom |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Zhong, Songyi | Shanghai University |
Li, Long | Shanghai University |
Zhang, Quan | Shanghai University |
Yue, Tao | Shanghai University |
Fukuda, Toshio | Nagoya University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: We present a method for picoinjection in a microfluidic chip via pneumatic control. The microfluidic chip consists of a fluidic channel layer, an elastic film layer, and a pneumatic control layer. Numerical simulation results show that the venturi structure injects into the droplet with less pressure. When the voltage was increased from 0.1 V to 0.25 V, the diameter of the injected droplet increased; when the voltage signal continued to increase, the droplet diameter became unstable. The injected dose is affected by the injection phase velocity, the continuous phase velocity, and the dispersed phase velocity. The preparation of calcium alginate microcapsules was achieved by this approach.
|
|
16:30-18:00, Paper TuCL-EX.3 | Add to My Program |
Development of Multifunctional Legged Locomotion System Consisting of Two X-Shaped Walkers |
|
Sedoguchi, Taiki | Japan Advanced Institute of Science and Technology |
Komori, Mikito | Japan Advanced Institute of Science and Technology |
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Keywords: Legged Robots, Mechanism Design, Flexible Robotics
Abstract: The authors have been working on the wheel gait generation problem for planar X-shaped walkers, and recently showed that a telescopic-legged X-shaped walker can generate a stable wheel gait on a horizontal plane by asymmetrizing the impact posture. In this study, we propose a novel multifunctional legged locomotion system using the telescopic-legged X-shaped walker as the minimal component. Its biggest feature is the ability to combine or separate the X-shaped walkers depending on the environment and work purpose. As a prototype, we are currently developing an experimental machine whose minimal component is a simplified X-shaped walker that omits degrees of freedom at the hip joint. We will report on the latest progress.
|
|
16:30-18:00, Paper TuCL-EX.4 | Add to My Program |
Single Actuated Amphibious Mechanism with Undulation Fin and Leg-Wheel Structure |
|
Tian, Yang | Ritsumeikan University |
Ma, Shugen | Hong Kong University of Science and Technology (Guangzhou) |
Ohira, Takeki | Ritsumeikan University |
Zhang, Guoteng | Shandong University |
Keywords: Mechanism Design, Field Robots, Search and Rescue Robots
Abstract: Amphibious robots have garnered considerable attention and interest due to their unique ability to navigate both on land and underwater. In the development of these versatile robots, a wide range of methods for underwater and land locomotion can be considered. A recent breakthrough involves creating a hybrid mechanism that combines various simple functions, enhancing the robot's adaptability to diverse environments. However, as the complexity of the mechanism increases, so does the need for additional actuators, leading to intricate control systems and reduced robustness. This paper introduces an innovative mechanism that integrates an undulation fin, legs, and wheels, all operated by a single actuator, enabling the robot to achieve amphibious capabilities. With an analysis of several simple function mechanisms, a concept of the fin-leg-wheel combination is proposed. Subsequently, the fin-wheel structure is thoroughly analyzed. Finally, a leg structure is designed to complement the fin-wheel structure. The developed robot prototype is then subjected to rigorous field experiments across various environmental conditions.
|
|
16:30-18:00, Paper TuCL-EX.5 | Add to My Program |
Pure Pursuit Path Tracking for Reversing Tractor-Trailer Mobile Robot |
|
Sagong, Uihun | University of Seoul |
Park, JeongHyun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning
Abstract: The goal of this research is to achieve path tracking for a tractor-trailer mobile robot, which consists of a two-wheel differential drive (DD) type tractor and a passively connected two-wheel trailer. The pure pursuit algorithm is chosen as the path tracking method. However, the basic pure pursuit method can cause a jack-knife problem when driving in reverse. To overcome this, a reverse strategy that uses a ‘virtual tractor’ and ‘virtual trailer’ is presented. Additionally, we introduce a strategy for switching between forward and reverse motions. To check the performance, simulations were conducted in the Gazebo simulator. After generating the path using the ‘Hybrid A*’ algorithm, the pure pursuit algorithm with the proposed reverse strategy is implemented to track the path. It was confirmed that the robot followed the path and switched well between forward and reverse motions.
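The virtual-tractor construction is not detailed in the abstract; for reference, the core pure pursuit step that both the forward and reverse strategies build on, computing steering curvature toward a lookahead point, can be sketched as follows (frame conventions and names are illustrative):

```python
import math

def pure_pursuit_curvature(pose, lookahead_pt):
    """Standard pure pursuit: curvature of the circular arc from the robot
    pose (x, y, heading theta) to a lookahead point on the path.
    Angular velocity command = curvature * linear velocity."""
    x, y, theta = pose
    dx = lookahead_pt[0] - x
    dy = lookahead_pt[1] - y
    # lateral offset of the lookahead point in the robot frame
    y_r = -math.sin(theta) * dx + math.cos(theta) * dy
    L2 = dx * dx + dy * dy           # squared lookahead distance
    return 2.0 * y_r / L2            # kappa = 2 * y_r / L^2

# robot at origin heading +x: a point straight ahead gives zero curvature
assert abs(pure_pursuit_curvature((0, 0, 0), (1.0, 0.0))) < 1e-12
# a point to the left gives positive (left-turning) curvature
assert pure_pursuit_curvature((0, 0, 0), (1.0, 0.5)) > 0
```

In the paper's reverse strategy, this same computation would presumably be applied to a mirrored ‘virtual tractor’ pose so the trailer leads, avoiding the jack-knife instability of naive reversing.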
|
|
16:30-18:00, Paper TuCL-EX.6 | Add to My Program |
Adaptive Curriculum Learning Design Using Feedback for Collision Avoidance |
|
Hwang, Gyuyong | Tech University of Korea |
Choi, Jeongmin | Tech University of Korea |
Eoh, Gyuho | Tech University of Korea |
Keywords: Collision Avoidance, Reinforcement Learning, Deep Learning Methods
Abstract: This paper proposes an adaptive curriculum learning (CL) design method for collision avoidance using feedback from the deep reinforcement learning (DRL) training process. Previous research on DRL-based collision avoidance algorithms has encountered challenges such as long training times and difficulty in convergence due to sparse rewards. To address these issues, CL has been used to divide the target task into multiple subtasks for training. However, manual or random curriculum design often generates unnecessary subtasks that do not improve performance. Furthermore, a standardized curriculum design method for collision avoidance has not yet been presented. Therefore, this paper presents a method for learning collision avoidance through adaptive curriculum design that utilizes feedback from the training process. The proposed method differs from traditional CL in that the subtask is not predetermined before training. Instead, the curriculum is modified during training based on feedback obtained from validation environments. If a robot demonstrates high collision avoidance performance in a validation environment, it is validated in sequentially more challenging environments for rigorous evaluation. Conversely, if collision avoidance performance in the validation environment is low, the robot will either train in a new environment or train more in the existing environment, depending on the situation. Simulations and practical experiments were conducted for the proposed method.
|
|
16:30-18:00, Paper TuCL-EX.7 | Add to My Program |
Mobile Manipulator Motion Planner for Human-Robot Collaborative Task |
|
Choi, JungHyun | University of Seoul |
Sagong, Uihun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning, Human-Robot Collaboration, Human Detection and Tracking
Abstract: A mobile manipulator with mobility and manipulation capabilities suits human-robot collaborative work. We propose a virtual impedance energy field motion planner for the mobile manipulator. Through this motion planner, the non-holonomic mobile robot avoids obstacles and follows the defined path at the same time. Using manipulator impedance control, we verify compliant motion caused by human force in Gazebo simulation. The planner will be used in human-robot collaborative tasks.
|
|
16:30-18:00, Paper TuCL-EX.8 | Add to My Program |
Efficient Map Merging with Object-Plane Descriptors for Multi-Robot Systems |
|
Kim, Doyeon | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Distributed Robot Systems, Industrial Robots
Abstract: This paper proposes an object-plane descriptor to enhance the efficiency of map-merging in complex indoor environments for multi-robot systems. Unlike traditional methods that rely on feature matching—prone to errors in dynamic settings—this study utilizes object detection and plane segmentation to extract information from objects and planes. This information is then projected onto a 2D image, forming a robust descriptor that identifies common objects and transformation relationships for map-merging. In experiments, two robots randomly searched for seven objects placed in the environment; they successfully matched six pairs of objects, while one pair was missed due to viewpoint differences between the robots. Future research will aim to overcome these viewpoint differences and integrate more information into the object matching process.
|
|
16:30-18:00, Paper TuCL-EX.9 | Add to My Program |
Estimating the 3D Location of the Burr for Robotic Deburring Task |
|
Lee, Dongwoo | University of Seoul |
Kim, Yeongmin | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Computer Vision for Manufacturing
Abstract: We suggest a method to estimate the position of burrs on FRP parts for robotic deburring tasks. These parts, being black and glossy, limit the acquisition of precise 3D data through depth cameras. Furthermore, it is difficult to use the CAD model because of the heat shrinkage of the parts. In this paper, for robotic process automation, the location of burrs is estimated through methods using truncated SVD and feature point detection on 2D images. Finally, the method is verified through experiments.
|
|
16:30-18:00, Paper TuCL-EX.10 | Add to My Program |
Physical Backdoor Attack Can Jeopardize Driving with Vision-Large-Language Models |
|
Ni, Zhenyang | Shanghai Jiao Tong University |
Ye, Rui | Shanghai Jiao Tong University |
Wei, Yuxi | Shanghai Jiao Tong University |
Xiang, Zhen | University of Illinois, Urbana-Champaign |
Wang, Yanfeng | Shanghai Jiao Tong University |
Chen, Siheng | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Autonomous Agents, AI-Based Methods
Abstract: Vision-Large-Language-Models (VLMs) have great application prospects in autonomous driving. Despite the ability of VLMs to comprehend and make decisions in complex scenarios, their integration into safety-critical autonomous driving systems poses serious security risks. In this paper, we propose BadVLMDriver, the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects. Unlike existing backdoor attacks against VLMs that rely on digital modifications, BadVLMDriver uses common physical items, such as a football, to induce unsafe actions like sudden acceleration, highlighting a significant real-world threat to autonomous vehicle safety. To execute BadVLMDriver, we develop an automated pipeline utilizing natural language instructions to generate backdoor training samples with embedded malicious behaviors. This approach allows for flexible trigger and behavior selection, enhancing the stealth and practicality of the attack in diverse scenarios. We conduct extensive experiments to evaluate BadVLMDriver for two representative VLMs, five different trigger objects, and two types of malicious backdoor behaviors. BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon. Thus, BadVLMDriver not only demonstrates a critical security risk but also emphasizes the urgent need for developing robust defense mechanisms to protect against such vulnerabilities.
|
|
16:30-18:00, Paper TuCL-EX.11 | Add to My Program |
Actuation Constraints in Continuum Robotics Revisited: 2 Dof Manifold and Clarke Transform |
|
Grassmann, Reinhard M. | University of Toronto |
Senyk, Anastasiia | Ukrainian Catholic University |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Tendon/Wire Mechanism, Formal Methods in Robotics and Automation, Modeling, Control, and Learning for Soft Robots
Abstract: Displacement-actuated continuum robots, such as tendon/cable-actuated continuum robots, operate under constraints intrinsic to their mechanical design. Each segment is typically actuated by three or four tendons/cables/bellows in a symmetrical configuration. Traditional approaches have limited the mechanical design space to simplify adherence to actuation constraints. However, exploring designs with an arbitrary number of tendons/cables/bellows raises significant questions about managing these constraints effectively. This poster reexamines the actuation constraint in tendon-driven continuum robots, revealing unexpected links to Kirchhoff’s law and the Clarke transformation. We identify a two-dimensional manifold within the joint space and propose a novel linear transformation. This transformation, a generalization of the Clarke transformation, maps between n tendons and two local variables, effectively disentangling the constraints imposed by the tendons.
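A generalized Clarke transformation of the kind the poster describes maps n symmetrically arranged tendon displacements to two local variables. A minimal numerical sketch (the 2/n scaling convention is assumed here; the authors' exact normalization may differ):

```python
import numpy as np

def clarke_matrix(n):
    """Generalized Clarke transform for n symmetrically arranged tendons:
    maps n displacements to two local variables (alpha, beta).
    The 2/n scaling makes it invert the sinusoidal synthesis below."""
    ang = 2.0 * np.pi * np.arange(n) / n
    return (2.0 / n) * np.vstack([np.cos(ang), np.sin(ang)])

n = 5
M = clarke_matrix(n)

# Displacements on the 2-DOF manifold: q_k = a*cos(phi_k) + b*sin(phi_k)
a, b = 0.7, -0.3
ang = 2.0 * np.pi * np.arange(n) / n
q = a * np.cos(ang) + b * np.sin(ang)

alpha, beta = M @ q
assert np.allclose([alpha, beta], [a, b])   # the two local variables recover (a, b)
assert abs(q.sum()) < 1e-12                 # actuation constraint: displacements sum to zero
```

The zero-sum property of `q` is the Kirchhoff-like constraint the poster alludes to: n tendon displacements carry only two independent degrees of freedom per segment.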
|
|
16:30-18:00, Paper TuCL-EX.12 | Add to My Program |
Strain Sensor on Chip for Quantifying the Magnitudes of Tensile Stress on Cells |
|
Zhang, Yuyin | Shanghai University |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Zhong, Songyi | Shanghai University |
Li, Long | Shanghai University |
Zhang, Quan | Shanghai University |
Yue, Tao | Shanghai University |
Fukuda, Toshio | Nagoya University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: We designed a cell stretching platform with an in-situ sensor that can accurately measure the mechanical stimulation exerted on cells by the deformation of the vacuum cavity. The platform was applied to human cardiomyocytes (AC16) with cyclic strain (5%, 10%, 15%, 20%, 25%). We found that cyclic strain promoted cell growth and induced the arrangement of cells on the membrane to gradually unify in consistency, stabilizing at 15% amplitude stimulation; the effect was even stronger after 3 days of culture.
|
|
16:30-18:00, Paper TuCL-EX.13 | Add to My Program |
Multimodal Soft Optical Waveguide Sensor with Microstructured Core-Cladding Interface for Human-Robot Interaction |
|
Lee, Eunsu | Seoul National University |
Kim, Sungjin | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: The increasing demand for human-robot interaction across diverse applications highlights the need for safe and reliable sensing mechanisms. Soft optical waveguide sensors, benefiting from their intrinsic compliance along with immunity to electromagnetic interference and electrical safety, present a promising solution for HRI sensing. However, current solutions face challenges in detecting bidirectional stimuli and involve high computational costs for processing multifunctional signals. To overcome these challenges, this paper introduces a multimodal soft optical waveguide sensor with a microstructured core-cladding interface. The optimal roughness for the microstructures was determined by monitoring the normalized intensity changes within the target sensing range to achieve the anisotropic response for different roughness values. Furthermore, three colored core blocks (red, green, and blue) were introduced to provide additional cues for local pressure, considering the sensor's application as an HRI interface. Leveraging the combination of these design features, the proposed sensor is capable of simultaneous measurement of various deformations—bending with directions, angles, and local pressure—all achieved with a simple thresholding algorithm at low computational cost. The poster further details the experimental characterization to assess its performance under bending and local pressure, along with the demonstration of the sensor’s application in various teleoperation tasks.
|
|
16:30-18:00, Paper TuCL-EX.14 | Add to My Program |
Steering and Shape-Locking Mechanism for Soft Growing Robots with an Accessible Working Channel |
|
Lee, Sanghun | Korea Advanced Institute of Science and Technology |
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Soft growing robots offer numerous advantages, including the ability to navigate through tight spaces with their inherent flexibility and grow in length irrespective of external conditions. However, achieving high-curvature steering, shape locking, and securing the inner channel simultaneously presents a challenge. In this paper, we propose a novel mechanism that enables high-curvature steering and shape locking while simultaneously securing the inner channel. This is achieved by utilizing a main vine structure with lobed structured sub-vines.
|
|
16:30-18:00, Paper TuCL-EX.15 | Add to My Program |
Practical Methods for Reducing the Computational Time of Hybrid A* Algorithm |
|
Park, JeongHyun | University of Seoul |
Sagong, Uihun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning, Kinematics, Wheeled Robots
Abstract: The Hybrid A* algorithm is developed from the A* algorithm and creates a continuous path that a mobile robot can follow by considering the kinematics of the robot. However, as the search space grows from 2D (x, y) to 3D (x, y, θ), the search time increases significantly. Therefore, we introduce three practical methods that reduce search time, which many open-source implementations do not use. First, we changed the heuristic function algorithm and removed the case where the calculation time diverges. Second, we changed the data type used to store nodes. Third, we used a subgoal strategy to skip unnecessary node searching. By applying all three methods, the running time of Hybrid A* was greatly reduced.
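The abstract does not say which data type change the authors made; a common fix of this kind in Hybrid A* implementations is to hash discretized (x, y, θ) states into a dict for O(1) duplicate detection instead of scanning a list. A sketch under that assumption (resolutions and names are illustrative):

```python
import heapq
import math

def state_key(x, y, theta, xy_res=0.5, th_res=math.radians(15)):
    """Discretize a continuous (x, y, theta) state into a hashable key,
    so visited/open nodes live in a dict with O(1) lookup."""
    return (round(x / xy_res), round(y / xy_res),
            round(theta / th_res) % round(2 * math.pi / th_res))

open_heap = []                    # (f-cost, tie-breaker, state) priority queue
best_g = {}                       # key -> best cost-to-come found so far
counter = 0

def push(state, g, h):
    """Insert a state only if it improves on the best known cost for its cell."""
    global counter
    key = state_key(*state)
    if g < best_g.get(key, float("inf")):
        best_g[key] = g
        heapq.heappush(open_heap, (g + h, counter, state))
        counter += 1

push((0.0, 0.0, 0.0), g=0.0, h=10.0)
push((0.1, 0.1, 0.05), g=5.0, h=10.0)   # same cell with a worse g: rejected
assert len(open_heap) == 1
```

Keeping `best_g` keyed on discretized cells is what prevents the open list from ballooning with near-duplicate continuous states.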
|
|
16:30-18:00, Paper TuCL-EX.16 | Add to My Program |
Human Head Pose and Course Estimation for Robust Localization in Dynamic Environments |
|
Kang, Suhyeon | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Recognition, Localization, Human-Centered Robotics
Abstract: This paper deals with robust robot localization using human head pose and course estimation in dynamic environments. General robot localization and path planning algorithms do not consider dynamic objects, which can lead to incorrect localization. Therefore, in this paper, a robust localization method for dynamic environments is proposed that performs human head pose and course estimation and applies the resulting estimates. Finally, socially-compliant robot navigation for dynamic environments using the above localization results and learning human behavior patterns is proposed.
|
|
16:30-18:00, Paper TuCL-EX.17 | Add to My Program |
Formulation of Principles in International Humanitarian Law to Reduce Harm to Civilians Caused by Robotic Weapon Systems |
|
Tsujita, Teppei | National Defense Academy of Japan |
Sakuma, Yutaka | National Defense Academy of Japan |
Yamada, Shunsuke | National Defense Academy |
Eto, Ryosuke | National Defense Academy of Japan |
Kurosaki, Masahiro | National Defense Academy of Japan |
Keywords: Ethics and Philosophy, Robotics in Hazardous Fields
Abstract: Unfortunately, armed conflicts have always occurred, and civilians have been sacrificed in many situations. In recent armed conflicts, the use of new weapons based on robotic technology, such as unmanned aerial vehicles, called robotic weapon systems (RWS), has increased. It is necessary to consider how to regulate these new weapons. International humanitarian law (IHL) has been established to regulate warfare and to reduce the damage as much as possible. This study examines a system that advises RWS operators to comply with IHL to reduce the harm to civilians caused by teleoperated RWS attacks. Based on the images obtained by the RWS, the system distinguishes between civilians and combatants and provides advice on prohibiting attacks on civilians or avoiding excessive damage to civilians. This paper deals with a situation in which a country's military forces use RWSs to guard an area with civilian traffic around their base, to prevent combatants of the other country from entering the base. Formulations of the principles regulating warfare in such a situation were proposed. Interpreting the IHL articles, we calculated an index of the certainty that a subject is a combatant and the degree of effort made to obtain this certainty, based on the confusion matrix of a classifier distinguishing combatants from civilians. Numerical simulations were conducted to evaluate the civilian damage that could occur under the assumed conditions, and the validity of the proposed formulas was confirmed.
|
|
16:30-18:00, Paper TuCL-EX.18 | Add to My Program |
Target Position Regression from Navigation Instructions |
|
Hosomi, Naoki | Keio University |
Iioka, Yui | Keio University |
Hatanaka, Shumpei | Keio University |
Misu, Teruhisa | Honda Research Institute USA, Inc |
Yamada, Kentaro | Honda R&D Co., Ltd |
Sugiura, Komei | Keio University |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception
Abstract: In this study, we develop a model aimed at controlling vehicles through user-friendly interactions. Specifically, we focus on regressing destinations from front camera images and instructions for navigation. For example, given the instruction “Pull over next to that traffic cone,” the model should predict an appropriate stopping point next to the traffic cone. This task is challenging because a destination is an arbitrary position on a road, which lacks distinct visual features for identification. Therefore, it is important to identify a landmark mentioned in the instruction and localize a destination relative to the identified landmark. In this study, we propose a novel approach, Target Position Regressor, for predicting destinations. We introduce Absolute-Relative Position loss, which explicitly addresses relative positional relationships between destinations and landmarks. We also introduce Target Position Localizer, which models relationships between destinations and multimodal features. We constructed a new dataset for this task that addresses the scarcity of datasets with annotated destinations on front camera images. We validated our model on the dataset, and it outperformed a baseline method in terms of Root Mean Squared Error (RMSE), a metric commonly used to evaluate regression tasks.
|
|
16:30-18:00, Paper TuCL-EX.19 | Add to My Program |
FFHFlow: A Flow-Based Variational Approach for Dexterous Grasp Synthesis in Real Time |
|
Feng, Qian | Technical University of Munich |
Feng, Jianxiang | Technical University of Munich (TUM) |
Chen, Zhaopeng | University of Hamburg |
Triebel, Rudolph | German Aerospace Center (DLR) |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Deep Learning in Grasping and Manipulation, Grasping, Probabilistic Inference
Abstract: Synthesizing diverse and accurate grasps with multi-fingered hands is an important but challenging task in robotics. Previous efforts focusing on generative modeling fall short of capturing the multi-modal, high-dimensional grasp distribution precisely. To address this, we propose to exploit a special kind of Deep Generative Model (DGM) based on Normalizing Flows (NFs). In contrast to the Variational Autoencoder (VAE), another often-used DGM, this counteracts typical pitfalls such as mode collapse and mis-specified priors. Specifically, we develop two novel grasp samplers given partially observed point clouds for a Five-Finger Hand (FFH) based on NFs: FFHFlow-cnf and FFHFlow-lvm. The first is based on a single Conditional Normalizing Flow (cNF), which can generate diverse grasps but generalizes less well to test objects due to a deficient latent representation. The second employs a novel flow-based Deep Latent Variable Model (DLVM) to mitigate this problem. Comprehensive experiments in simulation and on hardware showcase the benefits of our models through quantitative and qualitative results, demonstrating both better quality and diversity in grasp generation against the VAE baseline. Additionally, a run-time comparison is conducted along with the ablation study, revealing the high potential of our proposed models for real-time applications.
|
|
16:30-18:00, Paper TuCL-EX.20 | Add to My Program |
A Non-Linear Model Predictive Task-Space Controller Satisfying Shape Constraints for Tendon-Driven Continuum Robots |
|
Hachen, Maximilian | University of Toronto |
Shentu, Chengnan | University of Toronto |
Lilge, Sven | University of Toronto Mississauga |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Medical Robots and Systems
Abstract: Tendon-Driven Continuum Robots (TDCRs) have the potential to be used in applications such as minimally invasive surgery or industrial inspection tasks in hard to reach places. In these cases, the robot usually has to enter narrow and confined spaces for which control and path planning techniques must be proposed. A suitable controller must be able to handle the non-linear kinematics and potential kinematic redundancy of TDCRs. In this work, a Model Predictive Control (MPC) approach is proposed to solve this challenge. Combined with a local controller to compensate for disturbances, the proposed algorithm is tested in simulation and on a TDCR prototype. The results show that our MPC implementation is fast enough to control a TDCR at a rate of 30 Hz, ensuring both closed loop stability and convergence across all tested configurations, while also avoiding collisions between the robot’s geometry and its surrounding environment.
|
|
16:30-18:00, Paper TuCL-EX.21 | Add to My Program |
Self-Sensing Joints for Microrobots |
|
Gu, Panlong | Zhejiang University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: In current research on microrobots, with the help of high-precision transmission systems and actuators, robots can be controlled with considerable precision. Nevertheless, due to the lack of sensors at the relevant scale, such robots cannot perceive the angle of key joints, compromising the controllability of the robot's transmission process. This paper designs a self-sensing joint that can be integrated into micro-robotic mechanisms. The joint utilizes cracks formed on a carbon nanotube film coated on the joint to perceive the bending angle. Additionally, a data augmentation method specifically tailored for this sensor is developed, enabling angle sensing with an MSE (mean squared error) of 0.15 degrees. Compared to existing solutions, this approach significantly enhances both the precision and repeatability of the sensors.
|
|
16:30-18:00, Paper TuCL-EX.22 | Add to My Program |
Can Your SLAM Survive under Perturbations? Customizable Perturbation Synthesis for Robust SLAM Benchmarking |
|
Xu, Xiaohao | University of Michigan |
Zhang, Tianyi | Carnegie Mellon University |
Wang, Sibo | University of Michigan |
Chen, Yongqi | University of Michigan |
Li, Ye | University of Michigan |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Huang, Xiaonan | University of Michigan |
Keywords: SLAM, Data Sets for SLAM, RGB-D Perception
Abstract: Robustness is a crucial factor for the successful deployment of robots in unstructured environments, particularly in the domain of Simultaneous Localization and Mapping (SLAM). Simulation-based benchmarks have emerged as a highly scalable approach for robustness evaluation compared to real-world data collection. However, crafting a challenging and controllable noisy world with diverse perturbations remains relatively under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. This pipeline incorporates customizable hardware setups, software components, and perturbed environments. In particular, we introduce comprehensive perturbation taxonomy along with a perturbation composition toolbox, allowing the transformation of clean simulations into challenging noisy environments. Utilizing the pipeline, we instantiate the Robust-SLAM benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced multi-modal SLAM models. Our extensive analysis uncovers the susceptibilities of existing SLAM models to real-world disturbance, despite their demonstrated accuracy in standard benchmarks. Our perturbation synthesis toolbox, SLAM robustness evaluation pipeline, and Robust-SLAM benchmark will be made publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation/.
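The abstract describes a perturbation composition toolbox that turns clean simulated frames into challenging noisy ones. A minimal sketch of what composable image perturbations can look like (the perturbation functions and parameters here are illustrative, not the authors' toolbox):

```python
import numpy as np

def gaussian_noise(sigma):
    """Additive sensor noise on a [0, 1] grayscale image."""
    return lambda img, rng: np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def pixel_dropout(p):
    """Randomly zero out a fraction p of pixels (e.g. dead sensor cells)."""
    return lambda img, rng: img * (rng.random(img.shape) >= p)

def compose(*perturbations):
    """Chain individual perturbations into one pipeline, mirroring the idea
    of composing a perturbation taxonomy over clean simulation output."""
    def pipeline(img, rng):
        for p in perturbations:
            img = p(img, rng)
        return img
    return pipeline

rng = np.random.default_rng(42)
clean = np.full((8, 8), 0.5)                  # stand-in for a clean rendered frame
noisy = compose(gaussian_noise(0.05), pixel_dropout(0.1))(clean, rng)
assert noisy.shape == clean.shape
assert noisy.min() >= 0.0 and noisy.max() <= 1.0
```

Because each perturbation is a plain callable, building "customizable noisy worlds" reduces to picking and ordering entries from a library of such functions.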
|
|
16:30-18:00, Paper TuCL-EX.23 | Add to My Program |
An Embedded Driving Style Recognition Approach: Leveraging Knowledge in Learning |
|
Zhang, Chaopeng | Beijing Institute of Technology |
Wang, Wenshuo | McGill University |
Ju, Zhiyang | Beijing Institute of Technology |
Chen, Zhaokun | Beijing Institute of Technology |
Venture, Gentiane | The University of Tokyo |
Xi, Junqiang | Beijing Institute of Technology |
Keywords: Human-Centered Automation, Human-Robot Collaboration, Design and Human Factors
Abstract: Online driving style recognition can enhance the customization of human-centric driving systems, thereby improving comfort, safety, and fuel economy. However, the limited performance of automotive-grade chips makes it highly challenging to compile and run complicated algorithms in real time. To overcome this bottleneck, this paper proposes an embedded method for recognizing driving styles, which is computationally efficient. This approach leverages experts' prior knowledge in learning algorithms and applies it to the electronic control unit (ECU) characterized by limited RAM. More specifically, the approach integrates knowledge-based rules and learning-based rules. The design of knowledge-based rules relies on the correlation between driving styles and vehicle dynamics. Learning-based rules are established as explicit hyperplanes extracted through hierarchical clustering and support vector machine analysis of the naturalistic driving behaviors exhibited by 100 drivers. These knowledge- and learning-based rules are then integrated into an embedded driving style recognition model. The resulting model is compiled into an executable file that operates within the vehicle's onboard ECU. The proposed method is validated through real vehicle testing in naturalistic driving settings, demonstrating a remarkable 94.4% level of subjective-objective consistency.
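The abstract describes integrating knowledge-based rules with learned rules reduced to explicit SVM hyperplanes, cheap enough for a RAM-limited ECU. A hedged sketch of such a rule cascade (all feature names, weights, and thresholds below are invented for illustration, not the paper's calibrated values):

```python
def recognize_style(mean_accel, accel_std, speed_var, w=(1.2, 2.0, 0.8), b=-1.5):
    """Two-stage driving style recognition:
    1) knowledge-based rule on vehicle dynamics (expert threshold);
    2) learning-based rule as an explicit hyperplane check w.x + b > 0,
       i.e. the sign of a pre-trained linear SVM decision function.
    Only a few multiply-adds at runtime, so no ML runtime is needed on the ECU."""
    # knowledge-based rule: very gentle dynamics short-circuit the classifier
    if mean_accel < 0.5 and accel_std < 0.2:
        return "mild"
    # learning-based rule: hyperplane extracted offline from driver data
    score = w[0] * mean_accel + w[1] * accel_std + w[2] * speed_var + b
    return "aggressive" if score > 0 else "moderate"

assert recognize_style(0.3, 0.1, 0.5) == "mild"
assert recognize_style(2.0, 1.0, 1.0) == "aggressive"
```

Compiling such closed-form rules to an executable is what lets the recognizer run in real time on an automotive-grade chip with limited RAM.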
|
|
16:30-18:00, Paper TuCL-EX.24 | Add to My Program |
Remote Position Control of Hydraulically Driven Actuator Via Water Transmission in a Thin and Long Tube |
|
Yoshimura, Shuto | The University of Electro-Communications |
Nakamura, Yuki | The University of Electro-Communications |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Nakata, Yoshihiro | The University of Electro-Communications |
Keywords: Hydraulic/Pneumatic Actuators, Motion Control, Sensor-based Control
Abstract: Using remotely operated robots is effective in environments where high radiation, dust prevalence, or underwater conditions prevent human access. However, protecting the electronic equipment on these robots introduces challenges, including increased costs and design constraints. To solve this problem, we have been studying robots without onboard electronics that use fluidic actuation, driving them through tubes from a remote location. By employing water, an incompressible fluid with low kinematic viscosity, as the working fluid, we can use thin, long, and flexible tubes, minimizing the physical impact on robot operation. We propose a remote position control method for water hydraulically driven actuators that accounts for the transmission line. We developed an experimental setup using a tube with an inner diameter of 2.5 mm and a length of 20 m and validated the effectiveness of our method through experiments. Our method reduced positional errors by 90%.
|
|
16:30-18:00, Paper TuCL-EX.25 | Add to My Program |
Cooperative Object Map Building for Multi-Robot Systems |
|
Jang, Insik | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Multi-Robot Systems, Cooperating Robots, Swarm Robotics
Abstract: Cooperative multi-robot systems are more efficient, flexible, and fault-tolerant than single robots. To realize these advantages, environmental information must be integrated. In this research, multiple robots exchange collected environmental information over a mesh network and then efficiently integrate it using the ICP (Iterative Closest Point) and NN (Nearest Neighbor) algorithms. The proposed method avoids repeated iterations of the ICP algorithm and overcomes the inevitable mismatches between estimated object positions caused by sensor errors.
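The nearest-neighbor integration step can be sketched as follows. This is an illustrative toy, not the authors' exact fusion rule: the matching tolerance and the simple position-averaging are assumptions, and map_a is assumed non-empty.

```python
# Hedged sketch: fuse two robots' object maps by nearest-neighbor matching.
# A point in map_b within `tol` of an existing object is treated as the same
# object (positions averaged); otherwise it is added as a new object.
import math

def merge_object_maps(map_a, map_b, tol=0.5):
    merged = [list(p) for p in map_a]  # start from robot A's map
    for q in map_b:
        dists = [math.dist(p, q) for p in merged]
        i = min(range(len(dists)), key=dists.__getitem__)  # nearest neighbor
        if dists[i] <= tol:
            # Same object seen by both robots: average the two estimates.
            merged[i] = [(a + b) / 2 for a, b in zip(merged[i], q)]
        else:
            merged.append(list(q))     # object only robot B has seen
    return merged

a = [(0.0, 0.0), (2.0, 2.0)]
b = [(0.1, 0.1), (5.0, 5.0)]
print(merge_object_maps(a, b))  # 3 objects: one fused, two distinct
```

The tolerance absorbs small sensor-error mismatches between the two robots' position estimates without any iterative alignment.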
|
|
16:30-18:00, Paper TuCL-EX.26 | Add to My Program |
Adaptive Short-Range Focal Sweep: Continuous Autofocus for Magnified Object Tracking Based on High-Speed Vision |
|
Zhang, Tianyi | Chiba University |
Shimasaki, Kohei | Hiroshima University |
Ishii, Idaku | Hiroshima University |
Namiki, Akio | Chiba University |
Keywords: Visual Servoing, Sensor-based Control, Visual Tracking
Abstract: This poster introduces a novel continuous autofocus (C-AF) approach for magnified object tracking. It uses a high-speed camera and a focus-tunable liquid lens to perform focal sweeps around the object's focus position. During each focal sweep, multiple images with different focus positions are captured; a focus measure algorithm then identifies and outputs the image with the highest focus value. What distinguishes our C-AF approach is that it adjusts the focal sweep range adaptively, guided by the object depth obtained through the depth-from-focus (DFF) technique. The approach can therefore continuously obtain well-focused images of the object with less redundancy in the input images. In our experimental setup, the high-speed camera captures images at 500 fps, while the liquid lens performs focal sweeps at 50 Hz, so 10 input images are captured during each sweep. The experimental results demonstrate the effectiveness of our approach in achieving well-focused images at a stable frame rate of 50 fps. This capability holds significant promise for applications in magnified object tracking.
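The sweep-selection step can be sketched as below. Gradient-based focus measures (e.g. Tenengrad or variance of Laplacian) are standard choices; here a sum of squared differences on toy 1-D "images" stands in for a real 2-D measure, and the sweep data are hypothetical.

```python
# Hedged sketch: among images captured at different focus positions during
# one sweep, keep the one that maximizes a focus (sharpness) measure.

def focus_measure(img):
    """Toy 1-D gradient energy: sharper images have stronger gradients."""
    return sum((b - a) ** 2 for a, b in zip(img, img[1:]))

def best_of_sweep(sweep):
    """sweep: list of (focus_position, image); returns the sharpest pair."""
    return max(sweep, key=lambda fp_img: focus_measure(fp_img[1]))

sweep = [
    (0.0, [10, 10, 10, 10]),   # defocused: flat intensities
    (0.5, [10, 40, 5, 30]),    # in focus: high contrast
    (1.0, [10, 15, 10, 15]),   # slightly defocused
]
pos, img = best_of_sweep(sweep)
print(pos)  # focus position with the highest measure
```

In the adaptive scheme described above, the winning focus position would also serve as the DFF depth estimate that re-centers (and narrows) the next sweep.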
|
|
TuE-EX Expo Session, Exhibition Hall |
Add to My Program |
ICRA EXPO Day 1 |
|
|
Chair: Ravankar, Ankit A. | Tohoku University |
Co-Chair: Salazar Luces, Jose Victorio | Tohoku University |
|
13:30-17:00, Paper TuE-EX.1 | Add to My Program |
A Multimodal Two-Wheeled Robot with Mode-Switching Motions under Active Balancing Control |
|
Sun, Botian | Peking University |
Lang, Qinglin | Peking University |
Li, Minghe | Peking University |
Wang, Xuefeng | Peking University |
|
13:30-17:00, Paper TuE-EX.2 | Add to My Program |
Asymmetric Stiffness Control for Contact-Rich Task |
|
Kato, Yasuhiro | Saitama University |
Tsuji, Toshiaki | Saitama University |
|
13:30-17:00, Paper TuE-EX.3 | Add to My Program |
Development of a Mobile Robotic System for Remote Autonomous Inspection and Digitizing of Industrial Plants with Critical Infrastructure |
|
Cuellar, Francisco | Pontificia Universidad Catolica del Peru |
Cabrera Yi, Eduardo Augusto | Pontificia Universidad Católica del Perú |
Jara Rios, Jose Alonso | Pontificia Universidad Católica del Perú |
Leiva, Martin | PUCP |
|
13:30-17:00, Paper TuE-EX.4 | Add to My Program |
Autonomous Robotic Assembly |
|
Ota, Kei | Tokyo Institute of Technology |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Jain, Siddarth | Mitsubishi Electric Research Laboratories (MERL) |
Yerazunis, William | Mitsubishi Electric Research Laboratory |
Corcodel, Radu | Mitsubishi Electric Research Laboratories |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
|
13:30-17:00, Paper TuE-EX.5 | Add to My Program |
CyberRunner: An Inexpensive Research and Education Robotics Platform |
|
Ramachandran Venkatapathy, Aswin Karthik | ETH Zürich |
Bi, Thomas | ETH Zurich |
D'Andrea, Raffaello | ETHZ |
|
13:30-17:00, Paper TuE-EX.6 | Add to My Program |
Leveraging Robot Swarms for Biodiversity Monitoring and Conservation |
|
Notomista, Gennaro | University of Waterloo |
Mayya, Siddharth | Amazon Robotics |
|
13:30-17:00, Paper TuE-EX.7 | Add to My Program |
Design and Execution of Expressive Arm Motions with a 6 DOF Arm and Blender |
|
Mercader, Alexandra Léna Victoria | École de Technologie Supérieure de Montréal |
Rochette, Audrey | Universite du Quebec a Montreal |
Dussault, Geneviève | UQAM |
St-Onge, David | Ecole de Technologie Superieure |
|
13:30-17:00, Paper TuE-EX.8 | Add to My Program |
An EXPO Demo of CloudGripper - an Open Cloud Robotics Testbed |
|
Zahid, Muhammad | KTH Royal Institute of Technology |
Pokorny, Florian T. | KTH Royal Institute of Technology |
|
13:30-17:00, Paper TuE-EX.9 | Add to My Program |
Demonstration of HASHI: Highly Adaptable Seafood Handling Instrument for Manipulation in Industrial Settings |
|
Allison, Austin | Northeastern University |
Hanson, Nathaniel | Massachusetts Institute of Technology |
Wicke, Sebastian | Northeastern University |
Padir, Taskin | Northeastern University |
|
13:30-17:00, Paper TuE-EX.10 | Add to My Program |
Stereohaptic Vibration: Out-Of-Body Localization of Virtual Vibration Source through Multiple Vibrotactile Stimuli on the Forearms |
|
Ohara, Gen | Tohoku University |
Kikuchi, Daiki | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-17:00, Paper TuE-EX.11 | Add to My Program |
RENKEI: Connecting Sleep and Assistive Robotics for Elderly Care |
|
Breuss, Alexander | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Manríquez-Cisterna, Ricardo | Tohoku University |
Gnarra, Oriella | ETH Zurich |
Fujs, Manuel | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Peña Queralta, Jorge | ETH Zürich |
Ejtehadi, Mehdi | ETH Zürich |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Paez Granados, Diego | ETH Zurich |
Hirata, Yasuhisa | Tohoku University |
Riener, Robert | Eidgenössische Technische Hochschule (ETH) Zürich |
|
13:30-17:00, Paper TuE-EX.12 | Add to My Program |
Supernumerary Robotic Limbs Integrated Onto Next Generation Space Suit Technology |
|
Ballesteros, Erik | Massachusetts Institute of Technology |
Lee, Sang-Yoep | Seoul National University |
Carpenter, Kalind | Jet Propulsion Laboratory |
Asada, Harry | MIT |
|
13:30-17:00, Paper TuE-EX.13 | Add to My Program |
Prototype of Adaptive Touch Walking Support Robot for Maximizing Human Physical Potential |
|
Terayama, Junya | Tohoku University |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
|
13:30-17:00, Paper TuE-EX.14 | Add to My Program |
Learning Agile Bipedal Motions on a Quadrupedal Robot |
|
Li, Yunfei | Tsinghua University |
Li, Jinhan | Tsinghua University |
Fu, Wei | Tsinghua University |
Wang, Chenghao | Technical University of Munich |
Wu, Yi | Tsinghua University |
|
13:30-17:00, Paper TuE-EX.15 | Add to My Program |
Inertial-Only Positioning for Human and Car Localization |
|
Wang, Han | Nanyang Technological University, Singapore |
Yuan, Shenghai | Nanyang Technological University |
Liu, Fen | Nanyang Technological University |
|
13:30-17:00, Paper TuE-EX.16 | Add to My Program |
Live Demonstration of Ringbot: Monocycle Robot with Legs |
|
Gim, Kevin | University of Illinois, Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
|
13:30-17:00, Paper TuE-EX.17 | Add to My Program |
Office Robot with Health Sensing Functions |
|
Yamamoto, Hana | Tohoku University |
Hanada, Hiroyasu | Tohoku University |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Motoe, Masashige | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Ohta, Koichi | Mitsui Fudosan Co., Ltd |
Nishida, Minoru | Mitsui Fudosan Co., Ltd |
Maekawa, Makiyo | Mitsui Fudosan Co., Ltd |
|
13:30-17:00, Paper TuE-EX.18 | Add to My Program |
Demonstration of Real-Time Shape Estimation of Elastic Rods Based on Force Sensing |
|
Tokuyama, Terumi | University of Tsukuba |
Mochiyama, Hiromi | University of Tsukuba |
|
13:30-17:00, Paper TuE-EX.19 | Add to My Program |
Learning to Estimate Incipient Slip with Tactile Sensing to Gently Grasp Objects |
|
Willemet, Laurence | TU Delft |
Vitrani, Giuseppe | Delft University of Technology (TU Delft) |
Boonstra, Dirk-Jan | Delft University of Technology |
Wiertlewski, Michael | TU Delft |
|
13:30-17:00, Paper TuE-EX.20 | Add to My Program |
Touch and Tech: A Live Demonstration of Interactive Prosthesis Technology |
|
Herneth, Christopher | Technical University Munich |
Fatoni, Muhammad Hilman | Munich Institute of Robotics and Machine Intelligence, Technical University of Munich |
Ganguly, Amartya | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
|
13:30-17:00, Paper TuE-EX.21 | Add to My Program |
Active Industrial Exoskeletons to Address Risk of Workplace Musculoskeletal Disorders |
|
Narayan, Ashwin | National University of Singapore |
Ofori, Seyram | National University of Singapore |
Bhattacharya, Shounak | National university of Singapore |
Sia, Cindy Ching Li | National University of Singapore |
Han, Shuaishuai | National University of Singapore |
Yu, Haoyong | National University of Singapore |
|
13:30-17:00, Paper TuE-EX.22 | Add to My Program |
Haptics-Net: Haptics-Mediated Swarm Rehabilitation Robotic System |
|
Sun, Chenyang | Southern University of Science and Technology |
Liu, Yudong | Southern University of Science and Technology |
Zhang, Mingming | Southern University of Science and Technology |
|
13:30-17:00, Paper TuE-EX.23 | Add to My Program |
Deep Reinforcement Learning Controller for a Furuta Pendulum |
|
Stępień, Maciej | AGH University of Krakow |
|
13:30-17:00, Paper TuE-EX.24 | Add to My Program |
Experience System of Physical Skills with a Collaborative Avatar Robot |
|
Nishimura, Takumi | Nagoya Institute of Technology |
Yukawa, Hikari | Nagoya Institute of Technology |
Minamizawa, Kouta | Keio University |
Tanaka, Yoshihiro | Nagoya Institute of Technology |
|
13:30-17:00, Paper TuE-EX.25 | Add to My Program |
ICRA Demonstration: Helix Soft Manipulator |
|
Stella, Francesco | EPFL |
Della Santina, Cosimo | TU Delft |
Hughes, Josie | EPFL |
|
13:30-17:00, Paper TuE-EX.26 | Add to My Program |
Johnbot - Swarm Dynamics of Simple Robots with Inherent Inhomogeneity |
|
Smith, John | University of Tokyo |
Ikegami, Takashi | University of Tokyo |
|
13:30-17:00, Paper TuE-EX.27 | Add to My Program |
Interactive Simulation of Dexterous Manipulation with Hand Tracking: Robot Hand Bolting and Multi-User Collaborative Simulation in VR |
|
Choi, Hyelim | Seoul National University |
Lee, Youngseon | Seoul National University |
Ji, Harim | Seoul National University |
Kim, Hyunsu | Seoul National University |
Heo, Jinuk | Seoul National University |
Park, Hyunreal | Seoul National University |
Lee, Somang | Seoul National University |
Lee, Minji | Seoul National University |
Lee, Jeongmin | Seoul National University |
Lee, Dongjun | Seoul National University |
| |