Last updated on October 8, 2023. This conference program is tentative and subject to change.
Technical Program for Monday October 2, 2023

MoAT1 Regular session, 140A
Semantic Scene Understanding

Chair: Weiland, James | University of Michigan
Co-Chair: Simonin, Olivier | INSA De Lyon

08:30-08:36, Paper MoAT1.1
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data
Zeller, Matthias | CARIAD SE
Behley, Jens | University of Bonn
Heidingsfeld, Michael | CARIAD SE
Stachniss, Cyrill | University of Bonn
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Scene understanding is crucial for autonomous robots in dynamic environments for predicting future states, avoiding collisions, and planning paths. Camera and LiDAR perception have made tremendous progress in recent years but face limitations under adverse weather conditions. To leverage the full potential of multi-modal sensor suites, radar sensors are essential for safety-critical tasks and are already installed in most new vehicles today. In this paper, we address the problem of semantic segmentation of moving objects in radar point clouds to enhance the perception of the environment with another sensor modality. Instead of aggregating multiple scans to densify the point clouds, we propose a novel approach based on the self-attention mechanism to accurately perform sparse, single-scan segmentation. Our approach, called Gaussian Radar Transformer, includes the newly introduced Gaussian transformer layer, which replaces the softmax normalization with a Gaussian function to decouple the contributions of individual points. To tackle the challenge of capturing long-range dependencies with transformers, we propose attentive up- and downsampling modules that enlarge the receptive field and capture strong spatial relations. We compare our approach to other state-of-the-art methods on the RadarScenes data set and show superior segmentation quality in diverse environments, even without exploiting temporal information.
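
A minimal numpy sketch of the core mechanism (a hedged reading of the abstract, not the authors' layer; the distance-based similarity, the value of sigma, and all names are assumptions): replacing softmax with an unnormalized Gaussian makes each attention weight depend only on its own query-key pair, so one point's contribution is no longer rescaled by all the others.

import numpy as np

def gaussian_attention(q, k, v, sigma=1.0):
    # Pairwise squared distances between queries and keys, shape (n_q, n_k).
    d2 = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)
    # Gaussian weights: no normalizing sum across keys, unlike softmax,
    # so each point's contribution is decoupled from the rest.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
out = gaussian_attention(q, k, v)  # shape (4, 8)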

08:36-08:42, Paper MoAT1.2
Mask-Based Panoptic LiDAR Segmentation for Autonomous Driving
Marcuzzi, Rodrigo | University of Bonn
Nunes, Lucas | University of Bonn
Wiesmann, Louis | University of Bonn
Behley, Jens | University of Bonn
Stachniss, Cyrill | University of Bonn
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Autonomous vehicles need to understand their surroundings geometrically and semantically to plan and act appropriately in the real world. Panoptic segmentation of LiDAR scans provides a description of the surroundings by unifying semantic and instance segmentation. It is usually solved in a bottom-up manner consisting of two steps: predicting the semantic class for each 3D point, then using this information to filter out "stuff" points and clustering the "thing" points to obtain instance segmentation. The clustering is a post-processing step that often needs hyperparameter tuning, which usually does not adapt to instances of different sizes or different datasets. To address this, we propose MaskPLS, an approach to perform panoptic segmentation of LiDAR scans in an end-to-end manner by predicting a set of non-overlapping binary masks and semantic classes, fully avoiding the clustering step. As a result, each mask represents a single instance belonging to a "thing" class or a complete "stuff" class. Experiments on SemanticKITTI show that the end-to-end learnable mask generation leads to superior performance compared to state-of-the-art heuristic approaches.
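
A minimal sketch of why predicting masks removes the clustering step (hypothetical shapes and names, not the MaskPLS code): once each point is assigned to its highest-scoring mask, the mask index itself serves as the instance ID.

import numpy as np

def panoptic_from_masks(mask_logits, mask_classes):
    # mask_logits: (M, N) scores of N points under M predicted binary masks.
    # mask_classes: (M,) semantic class predicted for each mask.
    owner = mask_logits.argmax(axis=0)   # winning mask per point
    semantics = mask_classes[owner]      # per-point semantic label
    instances = owner                    # mask index doubles as instance ID
    return semantics, instances

rng = np.random.default_rng(1)
sem, inst = panoptic_from_masks(rng.normal(size=(5, 100)),
                                np.array([0, 1, 1, 2, 3]))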

08:42-08:48, Paper MoAT1.3
SCENE: Reasoning about Traffic Scenes Using Heterogeneous Graph Neural Networks
Schmidt, Julian | Mercedes-Benz AG, Ulm University
Monninger, Thomas | Mercedes-Benz AG, University of Stuttgart
Rupprecht, Jan | Mercedes-Benz AG
Raba, David | Mercedes-Benz AG
Jordan, Julian | Mercedes-Benz AG
Frank, Daniel | University of Stuttgart
Staab, Steffen | University of Stuttgart
Dietmayer, Klaus | University of Ulm
Keywords: Semantic Scene Understanding, AI-Based Methods, Behavior-Based Systems
Abstract: Understanding traffic scenes requires considering heterogeneous information about dynamic agents and the static infrastructure. In this work, we propose SCENE, a methodology to encode diverse traffic scenes in heterogeneous graphs and to reason about these graphs using a heterogeneous Graph Neural Network encoder and task-specific decoders. The heterogeneous graphs, whose structures are defined by an ontology, consist of different nodes with type-specific node features and different relations with type-specific edge features. In order to exploit all the information given by these graphs, we propose to use cascaded layers of graph convolution. The result is an encoding of the scene. Task-specific decoders can be applied to predict desired attributes of the scene. Extensive evaluation on two diverse binary node classification tasks shows the main strength of this methodology: despite being generic, it even manages to outperform task-specific baselines. The further application of our methodology to the task of node classification in various knowledge graphs shows its transferability to other domains.

08:48-08:54, Paper MoAT1.4
Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Otsuki, Seitaro | Keio University
Ishikawa, Shintaro | Keio University
Sugiura, Komei | Keio University
Keywords: Transfer Learning, Semantic Scene Understanding, Multi-Modal Perception for HRI
Abstract: Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework. In this study, we propose a novel transfer learning approach for multimodal language understanding called Prototypical Contrastive Transfer Learning (PCTL), which uses a new contrastive loss called Dual ProtoNCE. We introduce PCTL to the task of identifying target objects in domestic environments according to free-form natural language instructions. To validate PCTL, we built new real-world and simulation datasets. Our experiment demonstrated that PCTL outperformed existing methods. Specifically, PCTL achieved an accuracy of 78.1%, whereas simple fine-tuning achieved an accuracy of 73.4%.

08:54-09:00, Paper MoAT1.5
Re-Thinking Classification Confidence with Model Quality Quantification
Pan, Yancheng | Peking University
Zhao, Huijing | Peking University
Keywords: Semantic Scene Understanding, Autonomous Agents
Abstract: Deep neural networks used for real-world classification tasks require high reliability and robustness. However, the softmax output of the network's last layer is often over-confident. We propose a novel confidence estimation method for deep classification models that takes model quality into account. Two metrics, MQ-Repres and MQ-Discri, are developed accordingly to evaluate model quality, and they also provide a new confidence estimate, called MQ-Conf, for online inference. We demonstrate the capability of the proposed method on 3D semantic segmentation tasks using three different deep networks. Through confusion analysis and feature visualization we show the rationality and reliability of the model quality quantification method.
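
The over-confidence referred to here is easy to reproduce: scaling the logits leaves the predicted class unchanged but drives the softmax toward certainty, which is why raw softmax scores are a poor confidence measure (illustrative snippet only; MQ-Conf itself is defined in the paper).

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.5])
print(softmax(z))      # ~[0.63, 0.23, 0.14]: moderate confidence
print(softmax(5 * z))  # ~[0.99, 0.01, 0.00]: same ranking, near-certainty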

09:00-09:06, Paper MoAT1.6
Self-Supervised Drivable Area Segmentation Using LiDAR’s Depth Information for Autonomous Driving
Ma, Fulong | The Hong Kong University of Science and Technology
Liu, Yang | The Hong Kong University of Science and Technology
Wang, Sheng | Hong Kong University of Science and Technology
Jin, Wu | UESTC
Qi, Weiqing | HKUST
Liu, Ming | Hong Kong University of Science and Technology
Keywords: Semantic Scene Understanding, Perception for Grasping and Manipulation, Mapping
Abstract: Drivable area segmentation is an essential component of the visual perception system for autonomous driving vehicles. Recent efforts in deep neural networks have significantly improved semantic segmentation performance for autonomous driving. However, most DNN-based methods need a large amount of data to train the models, and collecting large-scale datasets with manually labeled ground truth is costly, tedious, time-consuming and requires the availability of experts, often making DNN-based methods difficult to implement in real-world applications. Hence, in this paper, we introduce a novel module named automatic data labeler (ADL), which leverages a deterministic LiDAR-based method for ground plane segmentation and road boundary detection to create large datasets suitable for training DNNs. Furthermore, since the data generated by our ADL module is not as accurate as manually annotated data, we introduce uncertainty estimation to compensate for the gap between the human labeler and our ADL. Finally, we train the semantic segmentation neural networks using our automatically generated labels on the KITTI and KITTI-CARLA datasets. The experimental results demonstrate that our proposed ADL method not only achieves impressive performance compared to manual labeling but also exhibits more robust and accurate results than both traditional methods and state-of-the-art self-supervised methods.

09:06-09:12, Paper MoAT1.7
Vehicle Motion Forecasting Using Prior Information and Semantic-Assisted Occupancy Grid Maps
Asghar, Rabbia | INRIA / Univ. Grenoble Alpes
Diaz-Zapata, Manuel | Inria Grenoble
Rummelhard, Lukas | INRIA
Spalanzani, Anne | INRIA / Univ. Grenoble Alpes
Laugier, Christian | INRIA
Keywords: Semantic Scene Understanding, Deep Learning Methods, Autonomous Vehicle Navigation
Abstract: Motion prediction is a challenging task for autonomous vehicles due to uncertainty in the sensor data, the non-deterministic nature of the future, and the complex behavior of agents. In this paper, we tackle this problem by representing the scene as dynamic occupancy grid maps (DOGMs), associating semantic labels with the occupied cells, and incorporating map information. We propose a novel framework that combines deep-learning-based spatio-temporal and probabilistic approaches to predict multimodal vehicle behaviors. Contrary to conventional OGM prediction methods, the evaluation of our work is conducted against ground truth annotations. We experiment and validate our results on the real-world nuScenes dataset and show that our model predicts both static and dynamic vehicles better than OGM-based predictions. Furthermore, we perform an ablation study and assess the role of the semantic labels and the map in the architecture.

09:12-09:18, Paper MoAT1.8
Enhance Local Feature Consistency with Structure Similarity Loss for 3D Semantic Segmentation
Lin, Cheng-Wei | Department of Computer Science, National Yang Ming Chiao Tung University
Syu, Fang-Yu | Department of Computer Science, National Yang Ming Chiao Tung University
Pan, Yi-Ju | National Yang Ming Chiao Tung University
Chen, Kuan-Wen | National Yang Ming Chiao Tung University
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Recently, many research studies have been carried out on using deep learning methods for 3D point cloud understanding. However, results on 3D point cloud semantic segmentation still fall short of those in 2D research. One important reason is that 3D data has higher dimensionality but lacks comparably large datasets, making deep models difficult to optimize and prone to overfitting. To overcome this, an essential strategy is to provide more priors to the learning of deep models. In this paper, we focus on semantic segmentation for point clouds in the real world. To provide priors to the model, we propose a novel loss function, called Linearity and Planarity, to enhance local feature consistency in regions with similar local structure. Experiments show that the proposed method improves baseline performance on both indoor and outdoor datasets, e.g., S3DIS and Semantic3D.

09:18-09:24, Paper MoAT1.9
Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices
Son, Hojun | University of Michigan
Weiland, James | University of Michigan
Keywords: Semantic Scene Understanding, Embedded Systems for Robotic and Automation, Deep Learning for Visual Perception
Abstract: Semantic scene understanding is beneficial for mobile robots. Semantic information obtained through onboard cameras can improve robots' navigation performance. However, obtaining semantic information on small mobile robots with constrained power and computation resources is challenging. We propose a new lightweight convolutional neural network comparable to previous semantic segmentation algorithms for mobile applications. Our network achieved 73.06% mIoU on the Cityscapes validation set and 71.8% mIoU on the Cityscapes test set. Our model runs at 116 FPS with 1024x2048 input, 172 FPS with 1024x1024, and 175 FPS with 720x960 on an NVIDIA GTX 1080. We analyze model size, defined as the sum of the number of floating-point operations and the number of parameters. A smaller model size enables tiny mobile robot systems, which must run multiple tasks simultaneously, to work efficiently. Our model has the smallest model size compared to the real-time semantic segmentation convolutional neural networks ranked on the Cityscapes real-time benchmark and other high-performing, lightweight convolutional neural networks. On the CamVid test set, our model achieved a mIoU of 73.29% with Cityscapes pre-training, outperforming the accuracy of other lightweight convolutional neural networks. For mobile applicability, we measured frames per second on different low-compute devices. Our model runs at 35 FPS on a Jetson Xavier AGX, 21 FPS on a Jetson Xavier NX, and 14 FPS on an ASUS ROG gaming phone. 1024x2048 resolution is used for the Jetson devices, and 512x512 for the measurement on the phone. Our network did not use extra datasets such as ImageNet, Coarse Cityscapes, or Mapillary. Additionally, we did not use TensorRT to achieve fast inference speed. Compared to other real-time and lightweight CNNs, our model achieves significantly higher efficiency while balancing accuracy, inference speed, and model size.

09:24-09:30, Paper MoAT1.10
LiDAR-SGMOS: Semantics-Guided Moving Object Segmentation with 3D LiDAR
Gu, Shuo | Nanjing University of Science and Technology
Yao, Suling | Nanjing University of Science and Technology
Yang, Jian | Nanjing University of Science and Technology
Xu, Chengzhong | University of Macau
Kong, Hui | University of Macau
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: Most of the existing moving object segmentation (MOS) methods regard MOS as an independent task. In this paper, we associate the MOS task with semantic segmentation and propose a semantics-guided network for moving object segmentation (LiDAR-SGMOS). We first transform the range image and semantic features of the past scan into the range view of the current scan based on the relative pose between scans. The residual image is obtained by calculating the normalized absolute difference between the current and transformed range images. Then, we apply a Meta-Kernel based cross-scan fusion (CSF) module to adaptively fuse the range image and semantic features of the current scan with the residual image and transformed features. Finally, the fused features with rich motion and semantic information are processed to obtain reliable MOS results. We also introduce a residual image augmentation method to further improve MOS performance. Our method outperforms most LiDAR-MOS methods with only two sequential LiDAR scans as inputs on the SemanticKITTI MOS dataset.
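
A small numpy sketch of the residual image described above (following the common LiDAR-MOS convention; variable names and the validity handling are assumptions): the past range image is first re-projected into the current view, then compared pixel-wise.

import numpy as np

def residual_image(range_cur, range_past_transformed, eps=1e-6):
    # Normalized absolute difference between the current range image and the
    # past scan transformed into the current range view; large residuals are
    # a cue for motion.
    valid = (range_cur > eps) & (range_past_transformed > eps)
    res = np.zeros_like(range_cur)
    res[valid] = np.abs(range_cur[valid] - range_past_transformed[valid]) / range_cur[valid]
    return res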

09:30-09:36, Paper MoAT1.11
Robust Fusion for Bayesian Semantic Mapping
Morilla-Cabello, David | Universidad De Zaragoza
Mur Labadia, Lorenzo | University of Zaragoza
Martinez-Cantin, Ruben | University of Zaragoza
Montijano, Eduardo | Universidad De Zaragoza
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: The integration of semantic information in a map allows robots to better understand their environment and make high-level decisions. In the last few years, neural networks have shown enormous progress in their perception capabilities. However, when fusing multiple observations from a neural network in a semantic map, the network's inherent overconfidence on unknown data gives too much weight to outliers and decreases the robustness of the map. To mitigate this issue, we propose a novel robust fusion method for combining multiple Bayesian semantic predictions. Our method uses the uncertainty estimation provided by a Bayesian neural network to calibrate the way in which the measurements are fused. This is done by regularizing the observations to mitigate the problem of overconfident outlier predictions and by using the epistemic uncertainty to weigh their influence in the fusion, resulting in a different formulation of the probability distributions. We validate our robust fusion strategy by performing experiments on photo-realistic simulated environments and real scenes. In both cases, we use a network trained on different data to expose the model to varying data distributions. The results show that considering the model's uncertainty and regularizing the probability distribution of the observations results in better semantic segmentation performance and more robustness to outliers, compared with other methods.
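
One plausible sketch of the fusion rule outlined in the abstract (the exact regularization and weighting in the paper differ; alpha, the weight form, and all names are assumptions): the observation is smoothed toward uniform, then its influence in the Bayesian update is scaled down as epistemic uncertainty grows.

import numpy as np

def robust_fuse(prior, p_obs, epistemic_u, alpha=0.1):
    k = p_obs.shape[-1]
    p_reg = (1.0 - alpha) * p_obs + alpha / k   # soften over-confident outliers
    w = 1.0 / (1.0 + epistemic_u)               # low uncertainty -> strong update
    post = prior * p_reg ** w                   # tempered Bayesian fusion
    return post / post.sum()

post = robust_fuse(np.full(4, 0.25),
                   np.array([0.9, 0.05, 0.03, 0.02]), epistemic_u=2.0)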

09:36-09:42, Paper MoAT1.12
ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes
Ramachandruni, Kartik | Georgia Institute of Technology
Zuo, Max | Georgia Institute of Technology
Chernova, Sonia | Georgia Institute of Technology
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Object rearrangement is the problem of enabling a robot to identify the correct object placement in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve some desired goal state. Logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot in how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is not necessary in partially-arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement without any explicit user goal specification. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that utilizes contextual cues from a partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data are available at https://github.com/kartikvrama/consor.

09:42-09:48, Paper MoAT1.13
IDA: Informed Domain Adaptive Semantic Segmentation
Chen, Zheng | Indiana University Bloomington
Ding, Zhengming | Tulane University
Gregory, Jason M. | US Army Research Laboratory
Liu, Lantao | Indiana University
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Mixup-based data augmentation has been validated to be a critical stage in the self-training framework for unsupervised domain adaptive semantic segmentation (UDA-SS), which aims to transfer knowledge from a well-annotated (source) domain to an unlabeled (target) domain. Existing self-training methods usually adopt the popular region-based mixup techniques with a random sampling strategy, which unfortunately ignores the dynamic evolution of different semantics across various domains as training proceeds. To improve the UDA-SS performance, we propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes the data based on class-level segmentation performance, which aims to emphasize small-region semantics during mixup. In our IDA model, the class-level performance is tracked by an expected confidence score (ECS). We then use a dynamic schedule to determine the mixing ratio for data in different domains. Extensive experimental results reveal that our proposed method is able to outperform the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to Cityscapes.
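
A hedged sketch of ECS-guided class sampling for mixup (not the published IDA schedule; the temperature and softmax form are assumptions): classes with a low expected confidence score are mixed in more often, emphasizing small or hard semantics.

import numpy as np

def class_mix_probs(ecs, temperature=0.5):
    # Lower expected confidence -> higher probability of being sampled for mixup.
    logits = -np.asarray(ecs) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

ecs = np.array([0.9, 0.4, 0.7])   # hypothetical per-class confidence estimates
print(class_mix_probs(ecs))       # hardest class receives the largest weight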

09:48-09:54, Paper MoAT1.14
Self-Supervised Learning for Panoptic Segmentation of Multiple Fruit Flower Species
Siddique, Abubakar | Marquette University
Tabb, Amy | USDA-ARS-AFRS
Medeiros, Henry | University of Florida
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Incremental Learning
Abstract: Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to enhance the sensitivity of segmentation models to different flower species using automatically generated pseudo-labels. We employ a data augmentation and refinement approach to improve the accuracy of the model predictions. The augmented semantic predictions are then converted to panoptic pseudo-labels to iteratively train a multi-task model. The self-supervised model predictions can be refined with existing post-processing approaches to further improve their accuracy. An evaluation on a multi-species fruit tree flower dataset demonstrates that our method outperforms state-of-the-art models without computationally expensive post-processing steps, providing a new baseline for flower detection applications.

MoAT2 Regular session, 140B
Wearable and Assistive Devices

Chair: Audu, Musa L. | Case Western Reserve University
Co-Chair: Kong, Kyoungchul | Korea Advanced Institute of Science and Technology

08:30-08:36, Paper MoAT2.1
Combined Admittance Control with Type II Singularity Evasion for Parallel Robots Using Dynamic Movement Primitives (I)
Escarabajal, Rafael J. | Universidad Politécnica De Valencia
Pulloquinga, José Luis | Universidad Politécnica De Valencia
Valera, Angel | Universidad Politécnica De Valencia
Mata, Vicente | Universidad Politécnica De Valencia
Valles, Marina | Universitat Politècnica De València
Castillo-García, Fernando J. | Universidad De Castilla-La Mancha
Keywords: Rehabilitation Robotics, Parallel Robots, Compliance and Impedance Control, Dynamic Movement Primitives
Abstract: This paper addresses a new way of generating compliant trajectories for control using movement primitives to allow physical human-robot interaction where parallel robots (PRs) are involved. PRs are suitable for tasks requiring precision and performance because of their robust behavior. However, two fundamental issues must be resolved to ensure safe operation: i) the force exerted on the human must be controlled and limited, and ii) Type II singularities should be avoided to keep complete control of the robot. We offer a unified solution under the Dynamic Movement Primitives (DMP) framework to tackle both tasks simultaneously. DMPs are used to get an abstract representation for movement generation and are involved in broad areas such as imitation learning and movement recognition. For force control, we design an admittance controller intrinsically defined within the DMP structure, and subsequently, the Type II singularity evasion layer is added to the system. Both the admittance controller and the evader exploit the dynamic behavior of the DMP and its properties related to invariance and temporal coupling, and the whole system is deployed in a real PR meant for knee rehabilitation. The results show the capability of the system to perform safe rehabilitation exercises.
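
For readers unfamiliar with DMPs, the transformation system underlying this kind of controller is a goal-directed spring-damper shaped by a learned forcing term; the sketch below shows the standard discrete-time form (generic textbook DMP equations and gains, not the paper's admittance or singularity-evasion layers).

def dmp_step(y, z, g, f, dt, tau=1.0, alpha=25.0, beta=6.25):
    # Standard DMP transformation system: tau*dz = alpha*(beta*(g - y) - z) + f,
    # tau*dy = z. Slowing the clock via tau is the temporal-coupling property
    # that admittance-style modifications typically exploit.
    dz = (alpha * (beta * (g - y) - z) + f) / tau
    dy = z / tau
    return y + dy * dt, z + dz * dt

# Converge from y=0 toward the goal g=1 with a zero forcing term.
y, z = 0.0, 0.0
for _ in range(2000):
    y, z = dmp_step(y, z, g=1.0, f=0.0, dt=0.001)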

08:36-08:42, Paper MoAT2.2
A Handle Robot for Providing Bodily Support to Elderly Persons
Bolli, Roberto | MIT
Bonato, Paolo | Harvard Medical School
Asada, Harry | MIT
Keywords: Physically Assistive Devices, Human-Robot Collaboration, Domestic Robotics
Abstract: Age-related loss of mobility and an increased risk of falling remain major obstacles for older adults to live independently. Many elderly people lack the coordination and strength necessary to perform activities of daily living, such as getting out of bed or stepping into a bathtub. A traditional solution is to install grab bars around the home. For assisting in bathtub transitions, grab bars are fixed to a bathroom wall. However, they are often too far to reach and stably support the user; the installation locations of grab bars are constrained by the room layout and are often suboptimal. In this paper, we present a mobile robot that provides an older adult with a handlebar located anywhere in space - “Handle Anywhere”. The robot consists of an omnidirectional mobile base attached to a repositionable handlebar. We further develop a methodology to optimally place the handle to provide the maximum support for the elderly user while performing common postural changes. A cost function with a trade-off between mechanical advantage and manipulability of the user’s arm was optimized in terms of the location of the handlebar relative to the user. The methodology requires only a sagittal plane video of the elderly user performing the postural change, and thus is rapid, scalable, and uniquely customizable to each user. A proof-of-concept prototype was built, and the optimization algorithm for handle location was validated experimentally.

08:42-08:48, Paper MoAT2.3
A Hybrid FNS Generator for Human Trunk Posture Control with Incomplete Knowledge of Neuromusculoskeletal Dynamics
Bao, Xuefeng | Case Western Reserve University
Friederich, Aidan | Case Western Reserve University
Triolo, Ronald | Case Western Reserve University
Audu, Musa L. | Case Western Reserve University
Keywords: Rehabilitation Robotics, Modeling and Simulating Humans, Motion Control
Abstract: The trunk movements of an individual paralyzed by spinal cord injury (SCI) can be restored by Functional Neuromuscular Stimulation (FNS), a technique that applies low-level current to motor nerves to activate the muscles, generating torques and thus producing trunk motions. FNS can be modulated to control trunk movements. However, a stabilizing modulation policy (i.e., control law) is difficult to derive due to the complexity of the neuromusculoskeletal dynamics, which consist of skeletal dynamics (i.e., multi-joint rigid-body dynamics) and neuromuscular dynamics (i.e., highly nonlinear, nonautonomous, and input-redundant dynamics). Therefore, an FNS-based control method that can stabilize the trunk without knowing the accurate skeletal and neuromuscular dynamics is desired. This work proposes an FNS generator consisting of a robust nonlinear controller (RNC) that provides a stabilizing torque command and an artificial neural network (ANN)-based torque-to-activation (T-A) map that ensures the muscles apply the stabilizing torque to the skeleton. Due to the robustness and learning capability of this control framework, full knowledge of the trunk neuromusculoskeletal dynamics is not required. The proposed control framework has been tested in a simulation environment where an anatomically realistic 3D musculoskeletal model of the human trunk was manipulated to follow a time-varying reference moving in the anterior-posterior and medial-lateral directions. The results show that the trunk motion converges to a satisfactory trajectory while the ANN is being updated, suggesting the potential of this control framework for trunk tracking tasks in clinical applications.

08:48-08:54, Paper MoAT2.4
Insole-Type Walking Assist Device Capable of Inducing Inversion-Eversion of the Ankle Angle to the Neutral Position
Itami, Taku | Aoyama Gakuin University
Date, Kazuki | Aoyama Gakuin University
Ishii, Yuuta | Aoyama Gakuin University
Yoneyama, Jun | Aoyama Gakuin University
Aoki, Takaaki | Gifu University
Keywords: Prosthetics and Exoskeletons, Robotics and Automation in Life Sciences, Body Balancing
Abstract: In recent years, the aging of society has become a serious problem, especially in developed countries. Walking is an important element in extending healthy life expectancy in old age. In particular, inducing proper ankle joint alignment at heel contact is important during the gait cycle for smooth weight transfer and reduced load on the knees and hips. In this study, we focus on the behavior of the ankle joint at heel contact and propose an insole-type assist device that can induce inversion/eversion rotation of the ankle angle toward the neutral position. The heel part of the proposed device tilts from left to right in response to the rotation of a stepping motor, and an inertial sensor mounted inside controls the heel part so that it always maintains a horizontal position. The effectiveness of the proposed device is verified by evaluating the amount of lateral thrust of the knee joint of six healthy male subjects during a foot-stepping motion using a motion capture system. The results show that the amount of lateral thrust is significantly reduced when wearing the device with control enabled.

08:54-09:00, Paper MoAT2.5
Design for Hip Abduction Assistive Device Based on Relationship between Hip Joint Motion and Torque During Running
Lee, Myunghyun | Agency for Defense Development
Hong, Man Bok | Agency for Defense Development
Kim, Gwang Tae | Agency for Defense Development
Kim, Seonwoo | Agency for Defense Development
Keywords: Physically Assistive Devices, Human Performance Augmentation, Mechanism Design
Abstract: Numerous attempts have been made to reduce metabolic energy while running with the help of assistive devices. A majority of studies on assistive devices have focused on assisting torque in the sagittal plane. In the case of running, however, the abduction torque in the frontal plane at the hip joint is greater than the flexion/extension torque in the sagittal plane. During running, the abduction torque and the motion of the hip joint have a linear relationship but are opposite in direction, as in an elastic body. It is therefore expected that the hip abduction torque can be assisted with a simple passive method using an elastic body that reflects the movement characteristics of the hip joint. In this study, a system to assist hip abduction torque using a leaf spring was proposed and tested with a prototype. While running with the proposed assist system, the leaf spring aids the abduction torque during the stance phase, and no torque is generated during the swing phase thanks to a passive revolute joint. The joint angle changes with respect to rotation in the flexion/extension direction to prevent uncomfortable torque during the swing phase and to increase the duration of torque action during the stance phase. A preliminary test was conducted on one subject using the prototype of the hip abduction torque assistive device. The participant with the assistive device reduced metabolic energy by 5% compared to the case without abduction torque assist while running at 2.5 m/s. To increase the metabolic reduction, the device should be improved through system mass reduction and hip joint position optimization.

09:00-09:06, Paper MoAT2.6
Dynamic Hand Proprioception Via a Wearable Glove with Fabric Sensors
Behnke, Lily | Yale University
Sanchez-Botero, Lina | Yale University
Johnson, William | Yale University
Agrawala, Anjali | Yale University
Kramer-Bottiglio, Rebecca | Yale University
Keywords: Wearable Robotics, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Continuous enhancement in wearable technologies has led to several innovations in the healthcare, virtual reality, and robotics sectors. One form of wearable technology is wearable sensors for kinematic measurements of human motion. However, measuring the kinematics of human movement is a challenging problem as wearable sensors need to conform to complex curvatures and deform without limiting the user's natural range of motion. In fine motor activities, such challenges are further exacerbated by the dense packing of several joints, coupled joint motions, and relatively small deformations. This work presents the design, fabrication, and characterization of a thin, breathable sensing glove capable of reconstructing fine motor kinematics. The fabric glove features capacitive sensors made from layers of conductive and dielectric fabrics, culminating in a non-bulky and discrete glove design. This study demonstrates that the glove can reconstruct the joint angles of the wearer with a root mean square error of 7.2 degrees, indicating promising applicability to dynamic pose reconstruction for wearable technology and robot teleoperation.

09:06-09:12, Paper MoAT2.7
A Wearable Robotic Rehabilitation System for Neuro-Rehabilitation Aimed at Enhancing Mediolateral Balance
Yu, Zhenyuan | North Carolina State University
Nalam, Varun | North Carolina State University
Alili, Abbas | North Carolina State University
Huang, He (Helen) | North Carolina State University and University of North Carolina
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Physical Human-Robot Interaction
Abstract: There is increasing evidence of the role of compromised mediolateral balance in falls and of the need for rehabilitation specifically focused on the mediolateral direction for various populations with motor deficits. To address this need, we have developed a neurorehabilitation platform by integrating a wearable robotic hip abduction-adduction exoskeleton with a visual interface. The platform is expected to influence and rehabilitate the underlying visuomotor mechanisms by having users perform motion tasks based on visual feedback while the robot applies various controlled resistances governed by its admittance controller. A preliminary study was performed on three non-disabled individuals to analyze the performance of the system and observe any adaptation in hip joint kinematics and kinetics as a result of the visuomotor training under four different admittance conditions. All three subjects exhibited increased consistency of motion during training and interlimb coordination to achieve the motion tasks, demonstrating the utility of the system. Further analysis of the observed human-robot interaction torques and electromyography (EMG) signals, and their implications for neurorehabilitation aimed at populations suffering from chronic stroke, are discussed.

09:12-09:18, Paper MoAT2.8
Analysis of Lower Extremity Shape Characteristics in Various Walking Situations for the Development of Wearable Robot
Park, Joohyun | KAIST, KIST
Choi, Ho Seon | Yonsei University
In, HyunKi | Korea Institute of Science and Technology
Keywords: Datasets for Human Motion, Wearable Robotics, Physical Human-Robot Interaction
Abstract: A strap is a frequently utilized component for securing wearable robots to their users in order to facilitate force transmission between humans and the devices. For the wearable robot to function appropriately, the pressure between the strap and the skin should be maintained at a suitable level. Due to muscle contraction, the cross-sectional area of the human limb changes as the muscles move, which in turn changes the pressure applied by the strap. Therefore, to design a new strap that resolves this, it is necessary to understand the shape-change characteristics of the muscles where the strap is applied. In this paper, the changes in the circumference of the thigh and the calf during walking were measured and analyzed using multiple string-pot sensors. With a treadmill and string-pot sensors built from potentiometers and torsion springs, leg circumference changes were measured for different walking speeds and slopes, and gait cycles were segmented according to the signal from an FSR sensor inserted in the right shoe. The experimental results showed circumference changes of about 8.5 mm for the thigh and 3 mm for the calf, and consistent tendencies were found across walking conditions such as speed and slope. This confirms that the measurements can be used in algorithms for estimating gait cycles or walking conditions.

09:18-09:24, Paper MoAT2.9
Finding Biomechanically Safe Trajectories for Robot Manipulation of the Human Body in a Search and Rescue Scenario
Peiros, Lizzie | University of California, San Diego
Chiu, Zih-Yun | University of California, San Diego
Zhi, Yuheng | University of California, San Diego
Shinde, Nikhil | University of California, San Diego
Yip, Michael C. | University of California, San Diego
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Dynamics
Abstract: There has been increasing awareness of the difficulties in reaching and extracting people from mass casualty scenarios, such as those arising from natural disasters. While platforms have been designed to consider reaching casualties and even carrying them out of harm's way, the challenge of physically repositioning a casualty from its found configuration to one suitable for extraction has not been explicitly explored. Furthermore, this type of planning problem needs to incorporate biomechanical safety considerations for the casualty. Thus, we present the problem formulation for biomechanically safe trajectory generation for repositioning limbs of unconscious human casualties. We describe biomechanical safety in robotics terms, describe mechanical descriptions of the dynamics of the robot-human coupled system, and the planning and trajectory optimization process that considers this coupled and constrained system. We finally evaluate the work over several variations of the problem and provide a live example. This work provides a crucial part of search and rescue that can be used in conjunction with past and present works involving robots and vision systems designed for search and rescue.

09:24-09:30, Paper MoAT2.10
Mechanical Characterisation of Woven Pneumatic Active Textile
Marshall, Ruby | The University of Edinburgh
Souppez, Jean-Baptiste | Aston University
Khan, Mariya | Aston University
Viola, Ignazio Maria | University of Edinburgh
Nabae, Hiroyuki | Tokyo Institute of Technology
Suzumori, Koichi | Tokyo Institute of Technology
Stokes, Adam Andrew | University of Edinburgh
Giorgio-Serchi, Francesco | University of Edinburgh
Keywords: Wearable Robotics, Soft Robot Materials and Design, Hydraulic/Pneumatic Actuators
Abstract: Active textiles have shown promising applications in soft robotics owing to their tunable stiffness and design flexibility. Given the breadth of the design space for planar and spatial arrangements of these woven structures, a rigorous and generalizable characterisation of these systems is not yet available. In order to characterize the response of a stereotypical woven pattern to actuation, we undertake a parametric study of plain weave active fabrics and characterise their mechanical properties in accordance with the relevant ISO standards for varying muscle densities and both monotonically increasing/decreasing pressures. Tensile and flexural tests were undertaken on five plain weave samples made of a nylon 6 (polyamide) warp and EM20 McKibben S-muscle weft, for input pressures ranging from 0.00 MPa to 0.60 MPa, at three muscle densities, namely 100 m^-1, 74.26 m^-1 and 47.62 m^-1. Contrary to intuition, we find that a lower muscle density has a more prominent impact on the thickness, but a significantly lesser one on length, highlighting a critical dependency on the relative orientation among the loading, the passive textile and the muscle filaments. Hysteretic behaviour as large as 10% of the longitudinal contraction is observed on individual filaments and woven textiles, and its onset is identified in the shear between the rubber tube and the outer sleeve of the artificial muscle. Hysteresis is shown to be muscle density-dependent and responsible for a strongly asymmetrical response upon different pressure inputs. These findings provide new insights into the mechanical properties of active textiles with tunable stiffness, and may contribute to future developments in wearable technologies and biomedical devices.

09:30-09:36, Paper MoAT2.11
Adaptive Symmetry Reference Trajectory Generation in Shared Autonomy for Active Knee Orthosis
Liu, Rongkai | University of Science and Technology of China (USTC)
Ma, Tingting | Chinese Academy of Sciences
Yao, Ningguang | University of Science and Technology of China
Li, Hao | Chinese Academy of Sciences
Zhao, Xinyan | University of Science and Technology of China
Wang, Yu | University of Science and Technology of China
Pan, Hongqing | Hefei Institutes of Physical Science
Song, Quanjun | Chinese Academy of Sciences
Keywords: Human-Centered Robotics, Rehabilitation Robotics, Human-Robot Collaboration
Abstract: Gait symmetry training plays an essential role in the rehabilitation of hemiplegic patients, and robotics-based gait training has been widely accepted by patients and clinicians. Generating a reference trajectory for the affected side from the motion data of the unaffected side is an important way to achieve this. However, online generation of the gait reference trajectory requires the algorithm to provide the correct gait phase delay while reducing the impact of measurement noise from sensors and input uncertainty from users. Based on an active knee orthosis (AKO) prototype, this work presents an adaptive symmetric gait trajectory generation framework for the gait rehabilitation of hemiplegic patients. Using adaptive nonlinear frequency oscillators (ANFO) and movement primitives, we implement online gait pattern encoding and adaptive phase delay according to real-time user input. A shared autonomy (SA) module with online input validation and arbitration has been designed to prevent undesired movements from being transmitted to the actuator on the affected side. The experimental results demonstrate the feasibility of the framework. Overall, this work suggests that the proposed method has the potential to support gait symmetry rehabilitation in unstructured environments and provide a kinematic reference for torque-assist AKO.
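
The adaptive nonlinear frequency oscillator at the heart of such frameworks is commonly a Hopf oscillator with a frequency-adaptation term. Below is a generic textbook form, not necessarily the authors' exact equations; the gains are illustrative.

import numpy as np

def anfo_step(x, y, omega, teach, dt, gamma=8.0, mu=1.0, eps=0.9):
    # Adaptive Hopf oscillator: the teaching signal perturbs the phase so
    # that omega converges to the dominant frequency of the input.
    r = max(np.sqrt(x * x + y * y), 1e-9)
    dx = gamma * (mu - r * r) * x - omega * y + eps * teach
    dy = gamma * (mu - r * r) * y + omega * x
    domega = -eps * teach * y / r
    return x + dx * dt, y + dy * dt, omega + domega * dt

# Entrain to a 1.2 Hz gait signal: omega drifts toward 2*pi*1.2 ~ 7.5 rad/s.
x, y, omega = 1.0, 0.0, 5.0
for k in range(60000):
    x, y, omega = anfo_step(x, y, omega, np.sin(2 * np.pi * 1.2 * k * 1e-3), 1e-3)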

09:36-09:42, Paper MoAT2.12
Data-Driven Modeling for Gait Phase Recognition in a Wearable Exoskeleton Using Estimated Forces (I)
Park, Kyeong-Won | Republic of Korea Air Force Academy
Choi, Jungsu | Yeungnam University
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology
Keywords: Wearable Robots, AI-Based Methods, Human-Centered Robotics, Robust/Adaptive Control of Robotic Systems
Abstract: Accurate identification of gait phases is critical in effectively assessing the assistance provided by lower-limb exoskeletons. In this study, we propose a novel gait phase recognition system called ObsNet to analyze the gait of individuals with spinal cord injuries (SCI). To ensure the reliable use of exoskeletons, it is essential to maintain practicality and avoid exposing the system to unnecessary risks of fatigue, inaccuracy, or incompatibility with human-centered devices. Therefore, we propose a new approach to characterize exoskeletal-assisted gait by estimating forces on exoskeletal joints during walking. Although these estimated forces are potentially useful for detecting gait phases, their nonlinearities make it challenging for existing algorithms to generalize accurately. To address this challenge, we introduce a data-driven model that simultaneously captures both feature extraction and order dependencies, and enhance its performance through a threshold-based compensational method to filter out momentary errors. We evaluated the effectiveness of ObsNet through robotic walking experiments with two practical users with complete paraplegia. Our results indicate that ObsNet outperformed state-of-the-art methods that use joint information and other recurrent networks in identifying the gait phases of individuals with SCI (p < 0.05). We also observed reliable imitation of ground truth after compensation. Overall, our research highlights the potential of wearable technology to improve the daily lives of individuals with disabilities through accurate and stable state assessment.

MoAT3 Regular session, 140C
Collision Avoidance I

Chair: Panagou, Dimitra | University of Michigan, Ann Arbor
Co-Chair: Pierson, Alyssa | Boston University

08:30-08:36, Paper MoAT3.1
Dynamic Multi-Query Motion Planning with Differential Constraints and Moving Goals
Gentner, Michael | Technical University of Munich and BMW AG
Zillenbiller, Fabian | Technical University of Munich and BMW AG
Kraft, André | BMW AG, Germany
Steinbach, Eckehard | Technical University of Munich
Keywords: Collision Avoidance, Motion and Path Planning, Industrial Robots
Abstract: Planning robot motions in complex environments is a fundamental research challenge and central to the autonomy, efficiency, and ultimately adoption of robots. While the environment is often assumed to be static, real-world settings, such as assembly lines, contain complex-shaped, moving obstacles and changing target states. Therein, robots must perform safe and efficient motions to achieve their tasks. In repetitive environments and multi-goal settings, reusable roadmaps can substantially reduce the overall query time. Most dynamic roadmap-based planners operate in state-time space, which is computationally demanding. Interval-based methods store availabilities as node attributes and thereby circumvent the dimensionality increase. However, current approaches do not consider higher-order constraints, which can ultimately lead to collisions during execution. Furthermore, current approaches must replan when the goal changes. To this end, we propose a novel roadmap-based planner for systems with third-order differential constraints operating in dynamic environments with moving goals. We construct a roadmap with availabilities as node attributes. During the query phase, we use a Double-Integrator Minimum Time (DIMT) solver to recursively build feasible trajectories and accurately estimate arrival times. An exit node set, in combination with a moving-goal heuristic, is used to efficiently find the fastest path through the roadmap to the moving goal. We evaluate our method with a simulated UAV operating in dynamic 2D environments and show that it also transfers to a 6-DoF manipulator. We show higher success rates than other state-of-the-art methods both in collision avoidance and in reaching a moving goal.

08:36-08:42, Paper MoAT3.2
Reactive and Safe Co-Navigation with Haptic Guidance
Coffey, Mela | Boston University
Zhang, Dawei | Boston University
Tron, Roberto | Boston University
Pierson, Alyssa | Boston University
Keywords: Collision Avoidance, Telerobotics and Teleoperation, Human-Robot Collaboration
Abstract: We propose a co-navigation algorithm that enables a human and a robot to work together to navigate to a common goal. In this system, the human is responsible for making high-level steering decisions, and the robot, in turn, provides haptic feedback for collision avoidance and path suggestions while reacting to changes in the environment. Our algorithm uses optimized Rapidly-exploring Random Trees (RRT*) to generate paths to lead the user to the goal, via an attractive force feedback computed using a Control Lyapunov Function (CLF). We simultaneously ensure collision avoidance where necessary using a Control Barrier Function (CBF). We demonstrate our approach using simulations with a virtual pilot, and hardware experiments with a human pilot. Our results show that combining RRT* and CBFs is a promising tool for enabling collaborative human-robot navigation.
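
A minimal sketch of how a CLF attraction term and a CBF safety constraint can combine for single-integrator motion (a generic construction, not the paper's implementation; the gains, circular obstacle, and closed-form projection are assumptions):

import numpy as np

def guidance_force(x, waypoint, obstacle, r_safe, k_att=1.0, alpha=2.0):
    # Attractive force from the quadratic CLF V = 0.5 * ||x - waypoint||^2.
    u = -k_att * (x - waypoint)
    # CBF h = ||x - obstacle||^2 - r_safe^2; safety requires dh/dt >= -alpha*h,
    # i.e. the linear constraint a @ u >= b with a = 2(x - o), b = -alpha*h.
    d = x - obstacle
    h = d @ d - r_safe ** 2
    a, b = 2.0 * d, -alpha * h
    if a @ u < b:
        u = u + ((b - a @ u) / (a @ a)) * a  # minimum-norm QP correction
    return u

u = guidance_force(np.array([0.0, 0.0]), np.array([3.0, 1.0]),
                   np.array([1.0, 0.2]), r_safe=0.5)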

08:42-08:48, Paper MoAT3.3
An MCTS-DRL Based Obstacle and Occlusion Avoidance Methodology in Robotic Follow-Ahead Applications
Leisiazar, Sahar | Simon Fraser University
Park, Edward J. | Simon Fraser University
Lim, Angelica | Simon Fraser University
Chen, Mo | Simon Fraser University
Keywords: Robot Companions, Collision Avoidance, AI-Enabled Robotics
Abstract: We propose a novel methodology for robotic follow-ahead applications that addresses the critical challenge of obstacle and occlusion avoidance. Our approach effectively navigates the robot while avoiding collisions and occlusions caused by surrounding objects. To achieve this, we developed a high-level decision-making algorithm that generates short-term navigational goals for the mobile robot. Monte Carlo Tree Search is integrated with a deep reinforcement learning method to enhance the performance of the decision-making process and generate more reliable navigational goals. Through extensive experimentation and analysis, we demonstrate the effectiveness and superiority of our proposed approach compared to existing follow-ahead human-following robotic methods. Our code is available at https://github.com/saharLeisiazar/follow-ahead-ros.

08:48-08:54, Paper MoAT3.4
Proactive Model Predictive Control with Multi-Modal Human Motion Prediction in Cluttered Dynamic Environments
Heuer, Lukas | Örebro University, Robert Bosch GmbH
Palmieri, Luigi | Robert Bosch GmbH
Rudenko, Andrey | Robert Bosch GmbH
Mannucci, Anna | Robert Bosch GmbH Corporate Research
Magnusson, Martin | Örebro University
Arras, Kai Oliver | Bosch Research
Keywords: Collision Avoidance, Human-Aware Motion Planning, Motion and Path Planning
Abstract: For robots navigating in dynamic environments, exploiting and understanding uncertain human motion prediction is key to generate efficient, safe and legible actions. The robot may perform poorly and cause hindrances if it does not reason over possible, multi-modal future social interactions. With the goal of further enhancing autonomous navigation in cluttered environments, we propose a novel formulation for nonlinear model predictive control including multi-modal predictions of human motion. As a result, our approach leads to less conservative, smooth and intuitive human-aware navigation with reduced risk of collisions, and shows a good balance between task efficiency, collision avoidance and human comfort. To show its effectiveness, we compare our approach against the state of the art in crowded simulated environments, and with real-world human motion data from the THOR dataset. This comparison shows that we are able to improve task efficiency, keep a larger distance to humans and significantly reduce the collision time, when navigating in cluttered dynamic environments. Furthermore, the method is shown to work robustly with different state-of-the-art human motion predictors.
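
The key idea of including multi-modal predictions in the MPC objective can be sketched as an expectation over predicted human futures (illustrative only; the paper's stage cost and constraints are richer):

import numpy as np

def multimodal_collision_cost(robot_traj, human_modes, mode_probs, r_safe=0.6):
    # robot_traj: (T, 2); human_modes: list of (T, 2) predicted trajectories.
    # Penalize proximity against every mode, weighted by its probability,
    # rather than only against the most likely future.
    cost = 0.0
    for traj, p in zip(human_modes, mode_probs):
        d = np.linalg.norm(robot_traj - traj, axis=-1)
        cost += p * np.sum(np.maximum(0.0, r_safe - d) ** 2)
    return cost

rng = np.random.default_rng(2)
c = multimodal_collision_cost(rng.normal(size=(10, 2)),
                              [rng.normal(size=(10, 2)) for _ in range(3)],
                              [0.5, 0.3, 0.2])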

08:54-09:00, Paper MoAT3.5
A Novel Obstacle-Avoidance Solution with Non-Iterative Neural Controller for Joint-Constrained Redundant Manipulators
Li, Weibing | Sun Yat-Sen University
Yi, Zilian | Sun Yat-Sen University
Zou, Yanying | Sun Yat-Sen University
Wu, Haimei | Sun Yat-Sen University
Yang, Yang | Sun Yat-Sen University
Pan, Yongping | Sun Yat-Sen University
Keywords: Collision Avoidance, Optimization and Optimal Control, Redundant Robots
Abstract: Obstacle avoidance (OA) and joint-limit avoidance (JLA) are essential for redundant manipulators to ensure safe and reliable robotic operations. One solution to OA and JLA is to incorporate the involved constraints into a quadratic program (QP), by solving which OA and JLA can be achieved. There exist a few non-iterative solvers such as zeroing neural networks (ZNNs), which can solve each sampled QP problem using only one iteration, yet no such solution is suitable for OA and JLA due to the absence of some derivative information. To tackle these issues, this paper proposes a novel solution with a non-iterative neural controller, termed NCP-ZNN, for joint-constrained redundant manipulators. Unlike iterative methods, the neural controller proposed in this paper involves derivative information and possesses some positive features, including non-iterative computing and convergence over time. In this paper, the reestablished OA-JLA scheme is first introduced. Then, the design details of the neural controller are presented. After that, some comparative simulations based on a PA10 robot and an experiment based on a Franka Emika Panda robot are conducted, demonstrating that the proposed neural controller is more competent in OA and JLA.
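
The non-iterative character of zeroing neural networks comes from prescribing exponential decay of a time-varying error rather than iterating to convergence at each sample. A generic sketch for A(t) x(t) = b(t) (textbook ZNN with a linear activation; the paper's NCP-ZNN additionally handles the QP constraints for OA and JLA):

import numpy as np

def znn_step(x, A, dA, b, db, dt, gamma=50.0):
    # Define E = A x - b and impose dE/dt = -gamma * E; solving for dx gives
    # one update per sample, with the derivative information dA, db entering
    # explicitly (the ingredient iterative QP solvers typically drop).
    E = A @ x - b
    dx = np.linalg.solve(A, -dA @ x + db - gamma * E)
    return x + dx * dt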

09:00-09:06, Paper MoAT3.6
TTC4MCP: Monocular Collision Prediction Based on Self-Supervised TTC Estimation
Li, Changlin | Shanghai Jiao Tong University
Qian, Yeqiang | Shanghai Jiao Tong University
Sun, Cong | Shanghai Jiao Tong University
Yan, Weihao | Shanghai Jiao Tong University
Wang, Chunxiang | Shanghai Jiao Tong University
Yang, Ming | Shanghai Jiao Tong University
Keywords: Collision Avoidance, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Vision-based collision prediction for autonomous driving is a challenging task due to the dynamic movement of vehicles and diverse types of obstacles. Most existing methods rely on object detection algorithms, which only predict predefined collision targets, such as vehicles and pedestrians, and cannot anticipate emergencies caused by unknown obstacles. To address this limitation, we propose a novel approach using pixel-wise time-to-collision (TTC) estimation for monocular collision prediction (TTC4MCP). Our approach predicts TTC and optical flow from monocular images and identifies potential collision areas using feature clustering and motion analysis. To overcome the challenge of training TTC estimation models without ground truth data in new scenes, we propose a self-supervised TTC training method, enabling collision prediction in a wider range of scenarios. TTC4MCP is evaluated on multiple road conditions and demonstrates promising results in terms of accuracy and robustness.
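
As background for the pixel-wise TTC idea, the classical relation ties TTC to the divergence of the optical flow field: a surface approached head-on expands in the image at a rate proportional to 1/TTC (a standard approximation, not the paper's learned estimator; the factor 2 assumes expansion in both image axes):

import numpy as np

def ttc_from_flow(flow):
    # flow: (H, W, 2) optical flow in pixels/frame.
    du_dx = np.gradient(flow[..., 0], axis=1)
    dv_dy = np.gradient(flow[..., 1], axis=0)
    div = du_dx + dv_dy                    # flow divergence per pixel
    return 2.0 / np.clip(div, 1e-6, None)  # TTC in frames; valid where div > 0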

09:06-09:12, Paper MoAT3.7
DAMON: Dynamic Amorphous Obstacle Navigation Using Topological Manifold Learning and Variational Autoencoding
Dastider, Apan | University of Central Florida
Lin, Mingjie | University of Central Florida
Keywords: Collision Avoidance, Deep Learning Methods, Motion and Path Planning
Abstract: DAMON leverages manifold learning and variational autoencoding to achieve obstacle avoidance, allowing for motion planning through adaptive graph traversal in a pre-learned low-dimensional hierarchically-structured manifold graph that captures intricate motion dynamics between a robotic arm and its obstacles. This versatile and reusable approach is applicable to various collaboration scenarios. The primary advantage of DAMON is its ability to embed information in a low-dimensional graph, eliminating the need for the repeated computation required by current sampling-based methods. As a result, it offers faster and more efficient motion planning with significantly lower computational overhead and memory footprint. In summary, DAMON is a breakthrough methodology that addresses the challenge of dynamic obstacle avoidance in robotic systems and offers a promising solution for safe and efficient human-robot collaboration. Our approach has been experimentally validated on a 7-DoF robotic manipulator in both simulation and physical settings. DAMON enables the robot to learn and generate skills for avoiding previously-unseen obstacles while achieving predefined objectives. We also optimize DAMON's design parameters and performance using an analytical framework. Our approach outperforms mainstream methodologies, including RRT, RRT*, Dynamic RRT*, L2RRT, and MpNet, with 40% more trajectory smoothness and over 65% improved latency performance, on average.
|
|
09:12-09:18, Paper MoAT3.8 | Add to My Program |
Gatekeeper: Online Safety Verification and Control for Nonlinear Systems in Dynamic Environments |
|
Agrawal, Devansh | University of Michigan |
Chen, Ruichang | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: This paper presents the gatekeeper algorithm, a real-time and computationally-lightweight method to ensure that nonlinear systems can operate safely in dynamic environments despite limited perception. Gatekeeper integrates with existing path planners and feedback controllers by introducing an additional verification step that ensures that proposed trajectories can be executed safely, despite nonlinear dynamics subject to bounded disturbances, input constraints and partial knowledge of the environment. Our key contribution is that (A) we propose an algorithm to recursively construct committed trajectories, and (B) we prove that tracking the committed trajectory ensures the system is safe for all time into the future. The method is demonstrated on a complicated firefighting mission in a dynamic environment, and compared against state-of-the-art techniques for similar problems.
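A schematic sketch of the committed-trajectory recursion described in the abstract (all helper callables are hypothetical stand-ins; the paper defines these components precisely):

```python
def gatekeeper_step(committed, proposed, switch_times, backup, splice, is_safe):
    """Commit a newly proposed trajectory only if it is verifiably safe.

    backup(traj, t): backup maneuver starting from the state at time t;
    splice(a, b, t): follow a until t, then b; is_safe: verifier that
    accounts for disturbances, input bounds, and the known environment.
    """
    for t_s in switch_times:                 # candidate switch times
        candidate = splice(proposed, backup(proposed, t_s), t_s)
        if is_safe(candidate):
            return candidate                 # new committed trajectory
    return committed                         # fall back: previous one is safe
```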
|
|
09:18-09:24, Paper MoAT3.9 | Add to My Program |
Combinatorial Disjunctive Constraints for Obstacle Avoidance in Path Planning |
|
Garcia, Raul | Rice University |
Hicks, Illya V. | Rice University |
Huchette, Joey | Google Research |
Keywords: Collision Avoidance, Motion and Path Planning, Optimization and Optimal Control
Abstract: We present a new approach for modeling avoidance constraints in 2D environments, in which waypoints are assigned to obstacle-free polyhedral regions. Constraints of this form are often formulated as mixed-integer programming (MIP) problems employing big-M techniques; however, these are generally not the strongest formulations possible with respect to the MIP's convex relaxation (so-called ideal formulations), potentially resulting in a larger computational burden. We instead model obstacle avoidance as combinatorial disjunctive constraints and leverage the independent branching scheme to construct small, ideal formulations. As our approach requires a biclique cover for an associated graph, we exploit the structure of this class of graphs to develop a fast subroutine for obtaining biclique covers in polynomial time. We also contribute an open-source Julia library named ClutteredEnvPathOpt to facilitate computational experiments of MIP formulations for obstacle avoidance. Experiments have shown our formulation is more compact and remains competitive on a number of instances compared with standard big-M techniques, for which solvers possess highly optimized procedures.
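To make the big-M baseline concrete, here is a minimal encoding of one waypoint avoiding one axis-aligned box obstacle (the obstacle extents and big-M constant are illustrative; the paper replaces such formulations with ideal ones built from combinatorial disjunctive constraints and biclique covers):

```python
import cvxpy as cp

M = 1e3                                  # big-M constant (illustrative)
x, y = cp.Variable(), cp.Variable()
z = cp.Variable(4, boolean=True)         # z[i]=1 -> disjunct i is enforced
xmin, xmax, ymin, ymax = 2.0, 3.0, 2.0, 3.0   # box obstacle extents

constraints = [
    x <= xmin + M * (1 - z[0]),          # waypoint left of the box
    x >= xmax - M * (1 - z[1]),          # right of the box
    y <= ymin + M * (1 - z[2]),          # below the box
    y >= ymax - M * (1 - z[3]),          # above the box
    cp.sum(z) >= 1,                      # at least one disjunct must hold
]
# Solving requires a mixed-integer-capable solver, e.g.:
# cp.Problem(cp.Minimize(x + y), constraints).solve(solver=cp.GLPK_MI)
```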
|
|
09:24-09:30, Paper MoAT3.10 | Add to My Program |
Reachability-Aware Collision Avoidance for Tractor-Trailer System with Non-Linear MPC and Control Barrier Function |
|
Tang, Yucheng | University of Applied Sciences Karlsruhe |
Mamaev, Ilshat | Karlsruhe Institute of Technology |
Qin, Jing | Karlsruhe University of Applied Sciences |
Wurll, Christian | Karlsruhe University of Applied Sciences |
Hein, Björn | Karlsruhe University of Applied Sciences |
Keywords: Collision Avoidance, Optimization and Optimal Control, Nonholonomic Motion Planning
Abstract: This paper proposes a reachability-aware model predictive control with a discrete control barrier function for backward obstacle avoidance for a tractor-trailer system. The framework incorporates the state-variant reachable set obtained through sampling-based reachability analysis and symbolic regression into the objective function of model predictive control. By optimizing the intersection of the reachable set and iterative non-safe region generated by the control barrier function, the system demonstrates better performance in terms of safety with a constant decay rate, while enhancing the feasibility of the optimization problem. The proposed algorithm improves real-time performance due to a shorter horizon and outperforms the state-of-the-art algorithms in the simulation environment and on a real robot.
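As a sketch of the discrete control barrier function ingredient (symbols illustrative; the paper additionally couples this with a sampled reachable set in the MPC objective), the constraint typically imposed along the horizon is h(x_{k+1}) >= (1 - gamma) h(x_k) for a safety function h and constant decay rate gamma:

```python
def dcbf_residuals(h, f, xs, us, gamma):
    """Residuals that an MPC solver must keep >= 0 along the horizon.

    h: barrier function (h >= 0 means safe); f: discrete dynamics;
    xs, us: planned state/input sequences; gamma in (0, 1): decay rate.
    """
    res = []
    for x, u in zip(xs, us):
        x_next = f(x, u)
        # Safety may decay at most by the constant rate gamma per step.
        res.append(h(x_next) - (1.0 - gamma) * h(x))
    return res
```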
|
|
09:30-09:36, Paper MoAT3.11 | Add to My Program |
Continuous Implicit SDF Based Any-Shape Robot Trajectory Optimization |
|
Zhang, Tingrui | Zhejiang University |
Wang, Jingping | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Alan | Fan'gang |
Gao, Fei | Zhejiang University |
Keywords: Collision Avoidance, Whole-Body Motion Planning and Control, Motion and Path Planning
Abstract: Optimization-based trajectory generation methods are widely used in whole-body planning for robots. However, existing work either oversimplifies the robot’s geometry and environment representation, resulting in conservative trajectories, or suffers from a huge overhead in maintaining additional information such as the Signed Distance Field (SDF). To bridge the gap, we consider the robot as an implicit function, with its surface boundary represented by the zero-level set of its SDF. We further employ another implicit function to lazily compute the signed distance to the swept volume generated by the robot and its trajectory. The computation is efficient by exploiting continuity in space-time, and the implicit function guarantees precise and continuous collision evaluation even for nonconvex robots with complex surfaces. We also propose a trajectory optimization pipeline applicable to the implicit SDF. Simulation and real-world experiments validate the high performance of our approach for arbitrarily shaped robot trajectory optimization. The code will be released at https://github.com/ZJU-FAST-Lab/Implicit-SDF-Planner.
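A small sketch of the lazily evaluated swept-volume signed distance the abstract describes (interfaces are illustrative assumptions; the released code linked above is the authoritative implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def swept_sdf(p_world, sdf_body, pose_at, t0, t1):
    """Signed distance from a query point to the robot's swept volume.

    sdf_body(p): robot SDF in the body frame (zero level set = surface);
    pose_at(t) -> (R, t_vec): world pose along the trajectory.
    """
    def sdf_at_time(t):
        R, t_vec = pose_at(t)
        p_body = R.T @ (p_world - t_vec)   # pull query into body frame
        return sdf_body(p_body)
    # Continuous 1-D minimization over time exploits space-time continuity.
    return minimize_scalar(sdf_at_time, bounds=(t0, t1), method="bounded").fun
```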
|
|
09:36-09:42, Paper MoAT3.12 | Add to My Program |
Robo-Centric ESDF: A Fast and Accurate Whole-Body Collision Evaluation Tool for Any-Shape Robotic Planning |
|
Geng, Shuang | Zhejiang University |
Wang, Qianhao | Zhejiang University |
Xie, Lei | State Key Laboratory of Industrial Control Technology, Zhejiang University |
Xu, Chao | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: To let mobile robots travel flexibly through complicated environments, increasing attention has been paid to whole-body collision evaluation. Most existing works either opt for conservative corridor-based methods that impose strict requirements on corridor generation, or ESDF-based methods that suffer from high computational overhead. It remains a great challenge to achieve fast and accurate whole-body collision evaluation. In this paper, we propose a Robo-centric ESDF (RC-ESDF) that is pre-built in the robot body frame and can be seamlessly applied to any-shape mobile robots, even those with non-convex shapes. RC-ESDF enjoys lazy collision evaluation, which retains only the minimum information sufficient for whole-body safety constraints and significantly speeds up trajectory optimization. Based on the analytical gradients provided by RC-ESDF, we jointly optimize the position and rotation of the robot, taking whole-body safety, smoothness, and dynamical feasibility into account. Extensive simulation and real-world experiments verify the reliability and generalizability of our method.
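A conceptual sketch of the robo-centric lookup (interfaces illustrative): the ESDF is built once in the body frame, so online evaluation only transforms obstacle points into that frame:

```python
import numpy as np

def whole_body_collision_cost(obstacles_world, R, t, rc_esdf, d_safe):
    """R, t: current world pose of the robot body frame.

    rc_esdf(p_body) -> signed distance, pre-built offline in the body
    frame and never rebuilt online, regardless of the robot's motion.
    """
    cost = 0.0
    for p in obstacles_world:
        p_body = R.T @ (p - t)             # world -> body frame
        d = rc_esdf(p_body)
        if d < d_safe:                     # lazy: only nearby points matter
            cost += (d_safe - d) ** 2
    return cost
```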
|
|
09:42-09:48, Paper MoAT3.13 | Add to My Program |
Global Map Assisted Multi-Agent Collision Avoidance Via Deep Reinforcement Learning Around Complex Obstacles |
|
Du, Yuanyuan | The Chinese University of Hong Kong, Shenzhen |
Zhang, Jianan | Peking University |
Xu, Jie | The Chinese University of Hong Kong, Shenzhen |
Cheng, Xiang | Peking University |
Cui, Shuguang | The Chinese University of Hong Kong, Shenzhen |
Keywords: Collision Avoidance, Motion and Path Planning, Reinforcement Learning
Abstract: State-of-the-art multi-agent collision avoidance algorithms face limitations when applied to cluttered public environments, where obstacles may have a variety of shapes and structures. The issue arises because most of these algorithms are agent-level methods. They concentrate solely on preventing collisions between the agents, while the obstacles are handled merely out-of-policy. Obstacle-aware policies output an action considering both agents and obstacles. Current obstacle-aware algorithms, mainly based on Lidar sensor data, struggle to handle collision avoidance around complex obstacles. To resolve this issue, this paper investigates how to find a better way to travel around diverse obstacles. In particular, we present a global map assisted collision avoidance algorithm which, following the lead of a high-level goal guide and using an obstacle representation called distance map, considers other agents and obstacles simultaneously. Moreover, our model can be loaded into each agent individually, making it applicable to large maps or more agents. Simulation results indicate that our model outperforms the state-of-the-art algorithms, especially in scenarios with complex obstacles. We present a notion for incorporating global information in decentralized decision-making, along with a method for extending agent-level algorithms to adjust to cluttered environments in real-world scenarios.
|
|
MoAT4 Regular session, 140D |
Add to My Program |
Control Applications |
|
|
Chair: Stuart, Hannah | UC Berkeley |
Co-Chair: Poonawala, Hasan A. | University of Kentucky |
|
08:30-08:36, Paper MoAT4.1 | Add to My Program |
A Geometric Sufficient Condition for Contact Wrench Feasibility |
|
Li, Shenggao | University of Notre Dame |
Chen, Hua | Southern University of Science and Technology |
Zhang, Wei | Southern University of Science and Technology |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Body Balancing, Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: A fundamental problem in legged locomotion is to verify whether a desired trajectory satisfies all physical constraints, especially those for maintaining the contacts. Although foot tipping can be avoided via the Zero Moment Point (ZMP) condition, preventing foot sliding and twisting leads to the more complex Contact Wrench Cone (CWC) constraints. This paper proposes an efficient algorithm to certify the inclusion of a net contact wrench in the CWC on flat ground with uniform friction. In addition to checking the ZMP criteria, the proposed method also verifies whether the linear force and the yaw moment are feasible. The key step in the algorithm is a new exact geometric characterization of the yaw moment limits in the case when the support polygon is approximated by a single supporting line. We propose two approaches to select this approximating line, providing an accurate inner approximation of the ground truth yaw moment limits with only 18.80% (resp. 7.13%) error. The methods require only 1/150 (resp. 1/139) of the computation time compared to the exact CWC method based on conic programming. As a benchmark, approximating the CWC using square friction pyramids requires similar computation times as the exact CWC, but has > 19.35% error. Unlike the ZMP condition, our method provides a sufficient condition for contact wrench feasibility.
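For reference, a numpy sketch of the classical ZMP portion of such a check on flat ground (the paper's contribution is the additional linear-force and yaw-moment feasibility tests, which are not shown here):

```python
import numpy as np

def zmp_from_wrench(f, tau):
    """Net contact force f=(fx,fy,fz) and moment tau about the origin,
    on flat ground z=0; returns the zero moment point (px, py)."""
    return np.array([-tau[1] / f[2], tau[0] / f[2]])

def inside_support_polygon(p, poly):
    """poly: counter-clockwise polygon vertices; half-plane test per edge."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        if cross < 0.0:
            return False
    return True
```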
|
|
08:36-08:42, Paper MoAT4.2 | Add to My Program |
Aggregating Single-Wheeled Mobile Robots for Omnidirectional Movements |
|
Wang, Meng | Beijing Institute for General Artificial Intelligence |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Li, Hang | Beijing Institute for General Artificial Intelligence |
Li, Jiarui | Peking University |
Liang, Jixiang | Beihang University |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Keywords: Education Robotics, Art and Entertainment Robotics
Abstract: This paper presents a novel modular robot system that can self-reconfigure to achieve omnidirectional movements for collaborative object transportation. Each robotic module is equipped with a steerable omni-wheel for navigation and is shaped as a regular icositetragon with a permanent magnet installed on each corner for stable docking. After aggregating multiple modules and forming a structure that can cage a target object, we have developed an optimization-based method to compute the distribution of all wheels' heading directions, which enables efficient omnidirectional movements of the structure. By implementing a hierarchical controller on our prototyped system in both simulation and experiment, we validated the trajectory-tracking performance of an individual module and a team of six modules in multiple navigation and collaborative object transportation settings. The results demonstrate that the proposed system can maintain a stable caging formation and achieve smooth transportation, indicating the effectiveness of our hardware and locomotion designs.
|
|
08:42-08:48, Paper MoAT4.3 | Add to My Program |
An On-Wall-Rotating Strategy for Effective Upstream Motion of Untethered Millirobot: Principle, Design and Demonstration (I) |
|
Yang, Liu | City University of Hong Kong |
Zhang, Tieshan | City University of Hong Kong |
Huang, Han | City University of Hong Kong |
Ren, Hao | City University of Hongkong |
Shang, Wanfeng | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Shen, Yajing | The Hong Kong University of Science and Technology |
Keywords: on-wall-rotating, Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Micro/Nano Robots
Abstract: Untethered miniature robots that can access narrow and harsh environments in the body show great potential for future biomedical applications. Although many types of millirobots have been developed, swimming against fast blood flow remains a big challenge due to the robot's limited ability to hold its position and the large hydraulic resistance of blood. This work proposes an on-wall-rotating strategy and a streamlined millirobot to achieve effective upstream motion in the lumen. First, the principle of the on-wall-rotating strategy and the dynamic motion model of the millirobot are established. Then, a critical safety angle θs is theoretically and experimentally analyzed for the safe and stable control of the robot. After that, a series of experiments is conducted to verify the proposed driving strategy. The results suggest that the robot is able to move at a speed of 5 mm/s against a flow velocity of 138 mm/s, which is comparable to a blood flow of 2700 mm³/s and several times faster than other reported driving strategies. This work offers a new strategy for untethered magnetic robot construction and control in blood vessels, which would promote the application of millirobots in biomedical engineering.
|
|
08:48-08:54, Paper MoAT4.4 | Add to My Program |
Smooth Stride Length Change of Rat Robot with a Compliant Actuated Spine Based on CPG Controller |
|
Huang, Yuhong | Technische Universität München |
Bing, Zhenshan | Technical University of Munich |
Zhang, Zitao | Sun Yat-Sen University |
Huang, Kai | Sun Yat-Sen University |
Morin, Fabrice O. | Technische Universität München |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Robust/Adaptive Control, Motion Control, Biologically-Inspired Robots
Abstract: The aim of this research is to investigate the relationship between spinal flexion and quadruped locomotion in a rat robot equipped with a compliant spine, controlled by a central pattern generator (CPG). The study reveals that spinal flexion can enhance limb stride length, but it may also cause significant and unexpected motion disturbances during stride length variations. To address this issue, this paper proposes a CPG model driven by spinal flexion and a novel oscillator that incorporates a circular limit cycle and accounts for the anticipated stride length transition process. This approach effectively matches the torque change with the dynamics of stride length changes, leading to lower energy consumption. Extensive simulations are conducted to evaluate the efficacy of the proposed oscillator and compare it with the original kinetic model and other CPG models. The results demonstrate that the designed CPG model with the proposed oscillator yields smoother gait transitions during stride length variations and reduces energy consumption.
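As background for the oscillator design, a minimal Hopf-style oscillator with a circular limit cycle, the structure the proposed CPG oscillator builds on, can be integrated as follows (parameters illustrative):

```python
import numpy as np

def hopf_step(x, y, dt, mu=1.0, alpha=10.0, omega=2.0 * np.pi):
    """One Euler step of a Hopf oscillator with a circular limit cycle."""
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y   # radial convergence to sqrt(mu)
    dy = alpha * (mu - r2) * y + omega * x   # rotation at frequency omega
    return x + dt * dx, y + dt * dy

# From any nonzero state, the trajectory converges to a circle of radius
# sqrt(mu), giving a smooth periodic drive signal for each joint.
x, y = 0.1, 0.0
for _ in range(10_000):
    x, y = hopf_step(x, y, dt=1e-3)
```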
|
|
08:54-09:00, Paper MoAT4.5 | Add to My Program |
Learning Terrain-Adaptive Locomotion with Agile Behaviors by Imitating Animals |
|
Li, Tingguang | The Chinese University of Hong Kong |
Zhang, Yizheng | Tencent |
Zhang, Chong | Tencent |
Zhu, Qingxu | Tencent |
Sheng, Jiapeng | Shandong University |
Chi, Wanchao | Tencent |
Zhou, Cheng | Tencent |
Han, Lei | Tencent Robotics X |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, AI-Based Methods
Abstract: In this paper, we present a general learning framework for controlling a quadruped robot that can mimic the behavior of real animals and traverse challenging terrains. Our method consists of two steps: an imitation learning step to learn from motions of real animals, and a terrain adaptation step to enable generalization to unseen terrains. We capture motions from a Labrador on various terrains to facilitate terrain adaptive locomotion. Our experiments demonstrate that our policy can traverse various terrains and produce natural-looking behaviors. We deployed our method on the real quadruped robot Max via zero-shot simulation-to-reality transfer, achieving a speed of 1.1 m/s while climbing stairs.
|
|
09:00-09:06, Paper MoAT4.6 | Add to My Program |
A Stable Adaptive Extended Kalman Filter for Estimating Robot Manipulators Link Velocity and Acceleration |
|
Baradaran Birjandi, Seyed Ali | Technical University of Munich |
Khurana, Harshit | EPFL |
Billard, Aude | EPFL |
Haddadin, Sami | Technical University of Munich |
Keywords: Sensor Fusion, Kinematics
Abstract: One can estimate the velocity and acceleration of robot manipulators by utilizing nonlinear observers. This involves combining inertial measurement units (IMUs) with the motor encoders of the robot through a model-based sensor fusion technique. This approach is lightweight, versatile (suitable for a wide range of trajectories and applications), and straightforward to implement. In this paper, we propose to adapt the noise information online to further improve the estimation accuracy while the system is running. This automatically reduces the system's vulnerability to modeling imperfections and sensor changes. Moreover, viable strategies to maintain system stability are introduced. Finally, we thoroughly evaluate the overall framework on a seven-DoF robot manipulator whose links are equipped with IMUs.
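One common innovation-based adaptation rule of the kind alluded to is sketched below (this generic rule is an assumption for illustration; the paper's exact update and its stability safeguards differ in detail):

```python
import numpy as np

def adapt_measurement_noise(R, innovation, H, P, alpha=0.98):
    """Exponential-forgetting estimate of the measurement noise covariance.

    Uses E[v v^T] = H P H^T + R for the innovation v, so recent residuals
    re-estimate R online. A practical filter must additionally project the
    result back to the positive-definite cone to preserve stability.
    """
    R_hat = np.outer(innovation, innovation) - H @ P @ H.T
    return alpha * R + (1.0 - alpha) * R_hat
```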
|
|
09:06-09:12, Paper MoAT4.7 | Add to My Program |
Provably Correct Sensor-Driven Path-Following for Unicycles Using Monotonic Score Functions |
|
Clark, Benton | University of Kentucky |
Hariprasad, Varun | Paul Laurence Dunbar High School |
Poonawala, Hasan A. | University of Kentucky |
Keywords: Sensor-based Control, Autonomous Vehicle Navigation, Machine Learning for Robot Control
Abstract: This paper develops a provably stable sensor-driven controller for path-following applications of robots with unicycle kinematics, one specific class of which is the wheeled mobile robot (WMR). The sensor measurement is converted to a scalar value (the score) through some mapping (the score function); the latter may be designed or learned. The score is then mapped to forward and angular velocities using a simple rule with three parameters. The key contribution is that the correctness of this controller only relies on the score function satisfying monotonicity conditions with respect to the underlying state - local path coordinates - instead of achieving specific values at all states. The monotonicity conditions may be checked online by moving the WMR, without state estimation, or offline using a generative model of measurements such as in a simulator. Our approach provides both the practicality of a purely measurement-based control and the correctness of state-based guarantees. We demonstrate the effectiveness of this path-following approach on both a simulated and a physical WMR that use a learned score function derived from a binary classifier trained on real depth images.
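The abstract's three-parameter rule might look like the following sketch (parameter names and the exact mapping are illustrative assumptions; correctness in the paper relies only on the score's monotonicity in the lateral path coordinate):

```python
def unicycle_command(score, v_max=0.5, k_turn=1.5, s_dead=0.05):
    """Map a scalar score to forward and angular velocity commands."""
    s = 0.0 if abs(score) < s_dead else score   # dead-band near the path
    v = v_max * (1.0 - min(abs(s), 1.0))        # slow down when far off
    omega = -k_turn * s                         # steer against the deviation
    return v, omega
```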
|
|
09:12-09:18, Paper MoAT4.8 | Add to My Program |
Contact Reduction with Bounded Stiffness for Robust Sim-To-Real Transfer of Robot Assembly |
|
Nghia, Vuong | Nanyang Technological University |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Simulation and Animation, Reinforcement Learning, Machine Learning for Robot Control
Abstract: In sim-to-real Reinforcement Learning (RL), a policy is trained in a simulated environment and then deployed on the physical system. The main challenge of sim-to-real RL is to overcome the reality gap - the discrepancies between the real world and its simulated counterpart. Using generic geometric representations, such as convex decomposition, triangular meshes, or signed distance fields, can improve simulation fidelity and thus potentially narrow the reality gap. Common to these approaches is that many contact points are generated for geometrically-complex objects, which slows down simulation and may cause numerical instability. Contact reduction methods address these issues by limiting the number of contact points, but the validity of these methods for sim-to-real RL has not been confirmed. In this paper, we present a contact reduction method with bounded stiffness to improve the simulation accuracy. Our experiments show that the proposed method critically enables training an RL policy for a tight-clearance double pin insertion task and successfully deploying the policy on a rigid, position-controlled physical robot.
|
|
09:18-09:24, Paper MoAT4.9 | Add to My Program |
Trajectory Tracking Via Multiscale Continuous Attractor Networks |
|
Joseph, Therese | Queensland University of Technology |
Fischer, Tobias | Queensland University of Technology |
Milford, Michael J | Queensland University of Technology |
Keywords: Neurorobotics, Cognitive Modeling
Abstract: Animals and insects showcase remarkably robust and adept navigational abilities, up to literally circumnavigating the globe. Primary progress in robotics inspired by these natural systems has occurred in two areas: highly theoretical computational neuroscience models, and handcrafted systems like RatSLAM and NeuroSLAM. In this research, we present work bridging the gap between the two, in the form of Multiscale Continuous Attractor Networks (MCAN), that combine the multiscale parallel spatial neural networks of the previous theoretical models with the real-world robustness of the robot-targeted systems, to enable trajectory tracking over large velocity ranges. To overcome the limitations of the reliance of previous systems on hand-tuned parameters, we present a genetic algorithm-based approach for automated tuning of these networks, substantially improving their usability. To provide challenging navigational scale ranges, we open source a flexible city-scale navigation simulator that adapts to any street network, enabling high throughput experimentation. In extensive experiments using the city-scale navigation environment and KITTI, we show that the system is capable of stable dead reckoning over a wide range of velocities and environmental scales, where a single-scale approach fails.
|
|
09:24-09:30, Paper MoAT4.10 | Add to My Program |
Design and Control of a Ballbot Drivetrain with High Agility, Minimal Footprint, and High Payload |
|
Xiao, Chenzhang | University of Illinois at Urbana-Champaign |
Mansouri, Mahshid | University of Illinois at Urbana-Champaign |
Lam, David | University of Michigan - Ann Arbor |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Hsiao-Wecksler, Elizabeth T. | University of Illinois at Urbana-Champaign |
Keywords: Body Balancing, Wheeled Robots, Underactuated Robots
Abstract: This paper presents the design and control of a ballbot drivetrain that aims to achieve high agility, minimal footprint, and high payload capacity while maintaining dynamic stability. Two hardware platforms and analytical models were developed to test design and control methodologies. The full-scale ballbot prototype (MiaPURE) was constructed using off-the-shelf components and designed to have agility, footprint, and balance similar to that of a walking human. The planar inverted pendulum testbed (PIPTB) was developed as a reduced-order testbed for quick validation of system performance. We then proposed a simple yet robust cascaded LQR-PI controller to balance and maneuver the ballbot drivetrain with a heavy payload. This is crucial because the drivetrain is often subject to high stiction due to elastomeric components in the torque transmission system. This controller was first tested in the PIPTB to compare with traditional LQR and cascaded PI-PD controllers, and then implemented in the ballbot drivetrain. The MiaPURE drivetrain was able to carry a payload of 60 kg, achieve a maximum speed of 2.3 m/s, and come to a stop from a speed of 1.4 m/s in 2 seconds in a selected translation direction. Finally, we demonstrated the omnidirectional movement of the ballbot drivetrain in an indoor environment as a payload-carrying robot and a human-riding mobility device. Our experiments demonstrated the feasibility of using the ballbot drivetrain as a universal mobility platform with agile movements, minimal footprint, and high payload capacity using our proposed design and control methodologies.
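A rough sketch of one way such a cascaded LQR-PI balance controller can be structured (the state layout and gains are illustrative assumptions, not the paper's exact design):

```python
import numpy as np

class CascadedLqrPi:
    """Outer PI loop on velocity error shapes a lean-angle reference;
    an inner LQR balances the pendulum-like drivetrain about it."""

    def __init__(self, K_lqr, kp, ki, dt):
        self.K, self.kp, self.ki, self.dt = K_lqr, kp, ki, dt
        self.err_int = 0.0

    def control(self, x, v_des):
        # Assumed state layout: x = [lean, velocity, lean_rate, ...]
        err = v_des - x[1]
        self.err_int += err * self.dt             # integral action helps
        lean_ref = self.kp * err + self.ki * self.err_int  # fight stiction
        x_ref = np.zeros_like(x)
        x_ref[0] = lean_ref                       # inner loop tracks lean
        return -self.K @ (x - x_ref)              # LQR state feedback
```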
|
|
09:30-09:36, Paper MoAT4.11 | Add to My Program |
A Bayesian Reinforcement Learning Method for Periodic Robotic Control under Significant Uncertainty |
|
Jia, Yuanyuan | Ritsumeikan University |
Uriguen Eljuri, Pedro Miguel | Ritsumeikan University |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Dexterous Manipulation, Medical Robots and Systems, Reinforcement Learning
Abstract: This paper addresses the lack of research on periodic reinforcement learning for physical robot control by presenting a 3-phase periodic Bayesian reinforcement learning method for uncertain environments. Drawing on cognition theory, the proposed approach achieves effective convergence with fewer training episodes. The coach-based demonstration phase narrows the search space and establishes a foundation for a coarse-to-fine control strategy. The reconnaissance phase enhances adaptability by discovering a valuable global representation, and the operation phase produces accurate robotic control by applying the learned representation and periodically updating local information. Comparative analysis with state-of-the-art methods validates the efficacy of our approach on exemplar control tasks in simulation and a biomedical project involving a simulated cranial window task.
|
|
09:36-09:42, Paper MoAT4.12 | Add to My Program |
Residual Physics Learning and System Identification for Sim-To-Real Transfer of Policies on Buoyancy Assisted Legged Robots |
|
Sontakke, Nitish Rajnish | Georgia Institute of Technology |
Chae, Hosik | University of California at Los Angeles |
Lee, Sangjoon | University of California, Los Angeles |
Huang, Tianle | Georgia Institute of Technology |
Hong, Dennis | UCLA |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Model Learning for Control, Reinforcement Learning, Legged Robots
Abstract: The light and soft characteristics of Buoyancy Assisted Lightweight Legged Unit (BALLU) robots give them great potential to provide intrinsically safe interactions in environments involving humans, unlike many heavy and rigid robots. However, their unique and sensitive dynamics make it challenging to obtain robust control policies in the real world. In this work, we demonstrate robust sim-to-real transfer of control policies on the BALLU robots via system identification and our novel residual physics learning method, Environment Mimic (EnvMimic). First, we model the nonlinear dynamics of the actuators by collecting hardware data and optimizing the simulation parameters. Rather than relying on standard supervised learning formulations, we utilize deep reinforcement learning to train an external force policy to match real-world trajectories, which enables us to model residual physics with greater fidelity. We analyze the improved simulation fidelity by comparing the simulation trajectories against the real-world ones. We finally demonstrate that the improved simulator allows us to learn better walking and turning policies that can be successfully deployed on the hardware of BALLU.
|
|
09:42-09:48, Paper MoAT4.13 | Add to My Program |
DiffClothAI: Differentiable Cloth Simulation with Intersection-Free Frictional Contact and Differentiable Two-Way Coupling with Articulated Rigid Bodies |
|
Yu, Xinyuan | National University of Singapore |
Zhao, Siheng | Nanjing University |
Luo, Siyuan | Xi'an Jiaotong University |
Yang, Gang | National University of Singapore |
Shao, Lin | National University of Singapore |
Keywords: Simulation and Animation, Optimization and Optimal Control
Abstract: Differentiable simulations have recently proven useful for various robotic manipulation tasks, including cloth manipulation. In robotic cloth simulation, it is crucial to maintain intersection-free properties. We present DiffClothAI, a differentiable cloth simulation with intersection-free frictional contact and two-way coupling with articulated rigid bodies. DiffClothAI coherently integrates Projective Dynamics and Incremental Potential Contact and proposes an effective method to derive gradients in the cloth simulation. It also establishes the differentiable coupling mechanism between articulated rigid bodies and cloth. We conduct a comprehensive evaluation of DiffClothAI’s effectiveness and accuracy and perform a variety of experiments in downstream robotic manipulation tasks. Supplemental materials and videos are available on our project webpage.
|
|
09:48-09:54, Paper MoAT4.14 | Add to My Program |
Walk-Burrow-Tug: Legged Anchoring Analysis Using RFT-Based Granular Limit Surfaces |
|
Huh, Tae Myung | UC Berkeley |
Cao, Cyndia | University of California Berkeley |
Aderibigbe, Jadesola | University of California, Berkeley |
Moon, Deaho | Korea Institute of Science and Technology |
Stuart, Hannah | UC Berkeley |
Keywords: Contact Modeling, Legged Robots, Mobile Manipulation
Abstract: We develop a new resistive force theory based granular limit surface (RFT-GLS) method to predict and guide behaviors of forceful ground robots. As a case study, we harness a small mobile robotic system – MiniRQuad (296g) – to ‘walk-burrow-tug;’ it actively exploits ground anchoring by burrowing its legs to tug loads. RFT-GLS informs the selection of efficient strategies to transport sleds with varying masses. The granular limit surface (GLS), a wrench boundary that separates stationary and kinetic behavior, is computed using 3D resistive force theory (RFT) for a given body and set of motion twists. This limit surface is then used to predict the quasi-static trajectory of the robot when it fails to withstand an external load. We find that the RFT-GLS enables accurate force and motion predictions in laboratory tests. For control applications, a pre-composed state space map of the twist-wrench pairs enables computationally efficient simulations to improve robotic anchoring strategies.
|
|
MoAT5 Regular session, 140E |
Add to My Program |
Mechanism Design I |
|
|
Chair: Tadakuma, Kenjiro | Tohoku University |
Co-Chair: Sorokin, Maks | Georgia Institute of Technology |
|
08:30-08:36, Paper MoAT5.1 | Add to My Program |
Tube Mechanism with 3-Axis Rotary Joints Structure to Achieve Variable Stiffness Using Positive Pressure |
|
Onda, Issei | Tohoku University |
Watanabe, Masahiro | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Abe, Kazuki | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Mechanism Design, Hydraulic/Pneumatic Actuators, Flexible Robotics
Abstract: Studies on soft robotics have explored mechanisms for switching the stiffness of a robot structure. The hybrid soft-rigid approach, which combines soft materials and high-rigidity structures, is commonly used to achieve variable stiffness mechanisms. In particular, the positive-pressurization method has attracted significant attention in recent years as it can eliminate the constraints on driving pressure. Moreover, it can change the shape-holding force according to internal pressure. In this study, a variable stiffness mechanism, comprising 3-axis rotary ball joints and a single chamber, was devised that exploits the frictional force generated by positive pressure. The prototype can change joint angles arbitrarily when no pressure is applied and can hold joint angles when positive pressure is applied. Using a theoretical model of the torque required to hold the joint angle, we simulated the holding torque using finite element method (FEM) analysis and measured the holding torque in the pitch and roll directions when internal pressure was applied. Based on the comparison of the theoretical model, measurements, and FEM analysis, it was confirmed that the holding torque in the roll direction was approximately π/2 times larger than that in the pitch direction for each value of the internal pressure. Further, we evaluated the FEM, theoretical, and measured values of the holding torque by performing pairwise numerical comparisons. Our approach will aid the design of effective stiffening mechanisms for soft robotics applications.
|
|
08:36-08:42, Paper MoAT5.2 | Add to My Program |
Timor Python: A Toolbox for Industrial Modular Robotics |
|
Külz, Jonathan | Technical University of Munich |
Mayer, Matthias | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Cellular and Modular Robots, Methods and Tools for Robot System Design, Software Tools for Robot Programming
Abstract: Modular Reconfigurable Robots (MRRs) represent an exciting path forward for industrial robotics, opening up new possibilities for robot design. Compared to monolithic manipulators, they promise greater flexibility, improved maintainability, and cost-efficiency. However, there is no tool or standardized way to model and simulate assemblies of modules in the same way it has been done for robotic manipulators for decades. We introduce the Toolbox for Industrial Modular Robotics (Timor), a Python toolbox to bridge this gap and integrate modular robotics into existing simulation and optimization pipelines. Our open-source library offers model generation and task-based configuration optimization for MRRs. It can easily be integrated with existing simulation tools – not least by offering URDF export of arbitrary modular robot assemblies. Moreover, our experimental study demonstrates the effectiveness of Timor as a tool for designing modular robots optimized for specific use cases.
|
|
08:42-08:48, Paper MoAT5.3 | Add to My Program |
Ultra-Low Inertia 6-DOF Manipulator Arm for Touching the World |
|
Nishii, Kazutoshi | Toyota Motor Corporation |
Okumatsu, Yohishiro | Toyota Motor Corporation |
Hatano, Akira | Toyota Motor Corporation |
Keywords: Mechanism Design, Tendon/Wire Mechanism
Abstract: As robotic intelligence increases, so does the importance of agents that collect data from real-world environments. When learning in contact with the environment, one must consider how to minimize the impact on the environment and maintain reproducibility. To achieve this, the contact force with the environment must be reduced. One way to achieve this is to reduce the inertia of the arm. In this study, we present an arm we have developed with 6 degrees of freedom and low inertia. The inertia of our arm has been significantly reduced compared to previous research, and experiments have confirmed that it also has low joint friction torque and good contact sensitivity.
|
|
08:48-08:54, Paper MoAT5.4 | Add to My Program |
Determination of the Characteristics of Gears of Robot-Like Systems by Analytical Description of Their Structure |
|
Landler, Stefan | Technical University of Munich |
Molina Blanco, Raúl | Technical University of Munich |
Otto, Michael | Technical University of Munich, Chair of Machine Elements, Gear Research Centre (FZG) |
Vogel-Heuser, Birgit | Technical University Munich |
Zimmermann, Markus | Technical University of Munich |
Stahl, Karsten | Technical University of Munich |
Keywords: Methods and Tools for Robot System Design, Product Design, Development and Prototyping, Engineering for Robotic Systems
Abstract: The axes of robots and robot-like systems (RLS) usually include electric-motor-gearbox arrangements for optimal connection of the elements. The characteristics of the drive system, and thus of the robot, depend strongly on the gears. Different gearbox designs are available, which differ in stiffness, efficiency, and further properties. For an application-optimal design of RLS, uniform documentation and comparability of gearbox concepts are decisive factors. The application-optimal design is supported by an interdisciplinary approach between mechanical engineering and software design, guided by adequate product development methodology. The currently quite heterogeneous characterization of gearboxes for RLS is a relevant obstacle to the flexible and optimal design of RLS. The paper shows the analysis of the gear structure with unified symbols for specific machine elements and contact types. The introduced method gives insight into the mechanical structure of the gearboxes, so similarities between gear types can be revealed. This also enables the classification of new developments in the state of the art. Moreover, the developed method for analyzing the gear structure can be used to determine the characteristics of gears, such as backlash, efficiency, or stiffness. Specifically, the stiffness of gears can be synthesized from the force action of individual contacts and the individual phenomena that occur with them. The representation by individual phenomena also makes it possible to extend the calculation to include influencing parameters, such as temperature, that have not been sufficiently taken into account so far.
|
|
08:54-09:00, Paper MoAT5.5 | Add to My Program |
Tension Jamming for Deployable Structures |
|
Hasegawa, Daniel | Harvard University |
Aktas, Buse | ETH Zurich |
Howe, Robert D. | Harvard University |
Keywords: Mechanism Design, Compliant Joints and Mechanisms, Soft Robot Materials and Design
Abstract: Deployable structures provide adaptability and versatility for applications such as temporary architectures, space structures, and biomedical devices. Jamming is a mechanical phenomenon in which dramatic changes in stiffness can be achieved by increasing the frictional and kinematic coupling between constituents in a structure through an external pressure. This study applies jamming, which has primarily been used in medium-scale soft robotics applications, to large-scale deployable structures with components that are soft and compact during transport but rigid upon deployment. It proposes a new jamming structure with a novel built-in actuation mechanism that enables high performance at large scales: a composite beam made of rectangular segments along a cable that can be pre-tensioned and thus jammed. Two theoretical models are developed to provide insights into the mechanical behavior of the composite beams and predict their performance under loading. A scale model of a deployable bridge is built using the tension-based composite beams, and the bridge is deployed and assembled by air with a drone, demonstrating the versatility and viability of the proposed approach for robotics applications.
|
|
09:00-09:06, Paper MoAT5.6 | Add to My Program |
Task2Morph: Differentiable Task-Inspired Framework for Contact-Aware Robot Design |
|
Cai, Yishuai | National University of Defense Technology |
Yang, Shaowu | National University of Defense Technology |
Li, Minglong | National University of Defense Technology |
Chen, Xinglin | National University of Defense Technology |
Mao, Yunxin | National University of Defense Technology |
Yi, Xiaodong | National University of Defense Technology |
Yang, Wenjing | State Key Laboratory of High Performance Computing (HPCL), Schoo |
Keywords: Evolutionary Robotics, AI-Enabled Robotics
Abstract: Optimizing the morphologies and the controllers that adapt to various tasks is a critical issue in the field of robot design, also known as embodied intelligence. Previous works typically model it as a joint optimization problem and use search-based methods to find the optimal solution in the morphology space. However, they ignore the implicit knowledge of task-to-morphology mapping which can directly inspire robot design. For example, flipping heavier boxes tends to require more muscular robot arms. This paper proposes a novel and general differentiable task-inspired framework for contact-aware robot design called Task2Morph. We abstract task features highly related to task performance and use them to build a task-to-morphology mapping. Further, we embed the mapping into a differentiable robot design process, where the gradient information is leveraged for both the mapping learning and the whole optimization. The experiments are conducted on three scenarios, and the results validate that Task2Morph outperforms DiffHand, which lacks a task-inspired morphology module, in terms of efficiency and effectiveness.
|
|
09:06-09:12, Paper MoAT5.7 | Add to My Program |
Constraint Programming for Component-Level Robot Design |
|
Wilhelm, Andrew | Cornell University |
Napp, Nils | Cornell University |
Keywords: Methods and Tools for Robot System Design, Formal Methods in Robotics and Automation, Product Design, Development and Prototyping
Abstract: Effective design automation for building robots would make development faster and easier while also less prone to design errors. However, complex multi-domain constraints make creating such tools difficult. One persistent challenge in achieving this goal of design automation is the fundamental problem of component selection, an optimization problem where, given a general robot model, components must be selected from a possibly large set of catalogs to minimize design objectives while meeting target specifications. Different approaches to this problem have used Monotone Co-Design Problems (MCDPs) or linear and quadratic programming, but these require judicious system approximations that affect the accuracy of the solution. We take an alternative approach formulating the component selection problem as a combinatorial optimization problem, which does not require any system approximations, and using constraint programming (CP) to solve this problem with a depth-first branch-and-bound algorithm. As the efficacy of CP critically depends upon the orderings of variables and their domain values, we present two heuristics specific to the problem of component selection that significantly improve solve time compared to traditional constraint satisfaction programming heuristics. We also add redundant constraints to the optimization problem to further improve run time by evaluating certain global constraints before all relevant variables are assigned. We demonstrate that our CP approach can find optimal solutions from over 20 trillion candidate solutions in only seconds, up to 48 times faster than an MCDP approach solving the same problem. Finally, for three different robot designs we build the corresponding robots to physically validate that the selected components meet the target design specifications.
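To make the search scheme concrete, here is a toy depth-first branch-and-bound for component selection (a generic sketch under simplifying assumptions; the paper's CP solver adds the problem-specific ordering heuristics and redundant constraints described above):

```python
def branch_and_bound(catalogs, consistent, cost, partial=None, best=None):
    """catalogs: one candidate list per component slot;
    consistent(partial): early feasibility check (constraint propagation);
    cost(partial): lower-bounding objective. Returns the best dict."""
    partial = partial or []
    best = best if best is not None else {"cost": float("inf"), "sol": None}
    if cost(partial) >= best["cost"] or not consistent(partial):
        return best                          # prune by bound or infeasibility
    if len(partial) == len(catalogs):        # every slot assigned
        best["cost"], best["sol"] = cost(partial), list(partial)
        return best
    # Value ordering: try the extensions that look cheapest first.
    for comp in sorted(catalogs[len(partial)], key=lambda c: cost(partial + [c])):
        branch_and_bound(catalogs, consistent, cost, partial + [comp], best)
    return best
```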
|
|
09:12-09:18, Paper MoAT5.8 | Add to My Program |
Design and Implementation of a Two-Limbed 3T1R Haptic Device |
|
Kang, Long | Nanjing University of Science and Technology |
Yang, Yang | Nanjing University of Information Science and Technology |
Yi, Byung-Ju | Hanyang University |
Keywords: Mechanism Design, Haptics and Haptic Interfaces, Parallel Robots
Abstract: This paper presents a haptic device with a simple architecture of only two limbs that can provide translational motion in three degrees of freedom (DOF) and one-DOF rotational motion. Actuation redundancy eliminates all forward-kinematic singularities and improves the motion-force transmission property. Thanks to the special structure of the kinematic chains, all actuators are close to the base and full gravity compensation is achieved passively by using springs. Force producibility analysis shows that this haptic device is able to produce long-term continuous force feedback of 15–30 N in each direction. By developing a prototype of the haptic device and a virtual three-dimensional simulator, a preliminary performance evaluation of the haptic device was conducted. In addition, a torque distribution algorithm considering a relaxed form of actuator-torque saturation was experimentally evaluated, and a comparison with other algorithms reveals that this algorithm offers several advantages.
|
|
09:18-09:24, Paper MoAT5.9 | Add to My Program |
Combining Measurement Uncertainties with the Probabilistic Robustness for Safety Evaluation of Robot Systems |
|
Baek, Woo-Jeong | Karlsruhe Institute of Technology (KIT) |
Ledermann, Christoph | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
Keywords: Methods and Tools for Robot System Design, Robot Safety, Probability and Statistical Methods
Abstract: In this paper, we present a method to combine measurement uncertainties with probabilistic robustness into one system uncertainty measure. Providing a metric indicating the potential occurrence of dangerous situations is highly essential for safety-critical robot applications. Due to the difficulty of finding a quantifiable, unambiguous representation, however, such a metric has not been derived to date. In the case of sensory devices, measurement uncertainties are usually provided by manufacturer specifications. Apart from that, several contributions demonstrate that the accuracy of neural networks is verifiable via the robustness. However, state-of-the-art literature is mainly concerned with theoretical investigations, and scarce attention has been devoted to transferring the robustness to real-world applications. To fill this gap, we show how probabilistic robustness can be made useful for evaluating quantitative safety limits. Our key idea is to exploit the analogy between measurement uncertainties and probabilistic robustness: while measurement uncertainties reflect possible shifts due to technical limitations, the robustness refers to the tolerated amount of distortion in the input data for an unaltered output. Inspired by this analogy, we combine both measures to quantify the system uncertainty online. We validate our method in different settings under real-world conditions. Our findings exemplify that incorporating the novel uncertainty metric effectively reduces the rate of dangerous situations in Human-Robot Collaboration.
|
|
09:24-09:30, Paper MoAT5.10 | Add to My Program |
Computational Design of Closed-Chain Linkages: Respawn Algorithm for Generative Design |
|
Ivolga, Dmitriy | ITMO University |
Nasonov, Kirill | ITMO University |
Borisov, Ivan | ITMO University |
Kolyubin, Sergey | ITMO University |
Keywords: Mechanism Design, Legged Robots, Grippers and Other End-Effectors
Abstract: Designing robots is a multiphase process aimed at solving a multi-criteria optimization problem to find the best possible detailed design. Generative design (GD) aims to accelerate the design process compared to manual design, since GD allows exploring and exploiting the vast design space more efficiently. In the field of robotics, however, relevant research focuses mostly on the generation of fully-actuated open-chain kinematics, which is trivial from a mechanical engineering perspective. Within this paper, we address the problem of generative design of closed-chain linkage mechanisms. A GD algorithm has to be able to generate meaningful mechanisms that satisfy conditions of existence. We propose an optimization-driven algorithm for the generation of planar closed-chain linkages that follow a predefined trajectory. The algorithm creates an unlimited range of physically reproducible design alternatives that can be further tested in simulation. These tests could be done in order to find solutions that satisfy extra criteria, e.g., desired dynamic behavior or low energy consumption. The proposed algorithm is called "respawn" since it builds a new linkage after the ancestor has been tested in a virtual environment, in pursuit of the optimal solution. To show that the algorithm is general enough, we show a set of generated linkages that can be used for a wide class of robots.
|
|
09:30-09:36, Paper MoAT5.11 | Add to My Program |
On Designing a Learning Robot: Improving Morphology for Enhanced Task Performance and Learning |
|
Sorokin, Maks | Georgia Institute of Technology |
Fu, Chuyuan | X, the Moonshot Factory |
Tan, Jie | Google |
Liu, Karen | Stanford University |
Bai, Yunfei | Google X |
Lu, Wenlong | Everyday Robots, X the Moonshot Factory |
Ha, Sehoon | Georgia Institute of Technology |
Khansari, Mohi | Google X |
Keywords: Mechanism Design, Visual Learning, Evolutionary Robotics
Abstract: As robots become more prevalent, optimizing their design for better performance and efficiency is becoming increasingly important. However, current robot design practices overlook the impact of perception and design choices on a robot's learning capabilities. To address this gap, we propose a comprehensive methodology that accounts for the interplay between the robot's perception, hardware characteristics, and task requirements. Our approach optimizes the robot's morphology holistically, leading to improved learning and task execution proficiency. To achieve this, we introduce a Morphology-AGnostIc Controller (MAGIC), which helps with the rapid assessment of different robot designs. The MAGIC policy is efficiently trained through a novel PRIvileged Single-stage learning via latent alignMent (PRISM) framework, which also encourages behaviors that are typical of robot onboard observation. Our simulation-based results demonstrate that morphologies optimized holistically improve the robot performance by 15-20% on various manipulation tasks, and require 25x less data to match the performance of human-expert-designed morphologies. In summary, our work contributes to the growing trend of learning-based approaches in robotics and emphasizes the potential of designing robots that facilitate better learning.
|
|
09:36-09:42, Paper MoAT5.12 | Add to My Program |
Development of a Dynamic Quadruped with Tunable, Compliant Legs |
|
Chen, Fuchen | Arizona State University |
Tao, Weijia | Arizona State University |
Aukes, Daniel | Arizona State University |
Keywords: Mechanism Design, Compliant Joints and Mechanisms, Legged Robots
Abstract: To facilitate the study of how passive leg stiffness influences locomotion dynamics and performance, we have developed an affordable and accessible 400 g quadruped robot driven by tunable compliant laminate legs, whose series and parallel stiffness can be easily adjusted; fabrication only takes 2.5 hours for all four legs. The robot can trot at 0.52 m/s or 4.4 body lengths per second with a 3.2 cost of transport (COT). Through locomotion experiments in both the real world and simulation we demonstrate that legs with different stiffness have an obvious impact on the robot’s average speed, COT, and pronking height. When the robot is trotting at 4 Hz in the real world, changing the leg stiffness yields a maximum improvement of 37.1% in speed and 62.0% in COT, showing its great potential for future research on locomotion controller designs and leg stiffness optimizations.
|
|
09:42-09:48, Paper MoAT5.13 | Add to My Program |
A Passive Compliance Obstacle Crossing Robot for Power Line Inspection and Maintenance |
|
Chen, Minghao | Institute of Automation, Chinese Academy of Sciences |
Cao, Yinghua | Institute of Automation,Chinese Academy of Sciences |
Tian, Yunong | Institute of Automation, Chinese Academy of Sciences |
Li, En | Institute of Automation, Chinese Academy of Sciences |
Liang, Zize | Institute of Automation, Chinese Academy of Sciences |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Keywords: Mechanism Design, Industrial Robots, Engineering for Robotic Systems
Abstract: In overhead power line scenarios, manual methods are inefficient and unsafe. Meanwhile, the majority of cantilevered robots have poor efficiency when crossing obstacles. This paper proposes a novel power line inspection and maintenance robot to solve these problems. The robot employs a passive compliance obstacle-crossing principle, which allows it to rapidly cross obstacles through the cooperation of gas springs and climbing wheels. Under high payload, the robot can roll over obstacles in 5-15 seconds without any complex strategies. A variable configuration platform is also designed, which has a multiple-line mode and a single-line mode, making the robot suitable for different kinds of overhead power lines. The related adaptability analyses are presented. Manipulators are also installed to help the robot perform specific maintenance tasks. The results of lab experiments and field tests reveal that the robot can stably and rapidly cross obstacles, such as suspension clamps, vibration dampers, and spacers, and can perform three kinds of maintenance tasks on the line.
|
|
09:48-09:54, Paper MoAT5.14 | Add to My Program |
Open Robot Hardware: Progress, Benefits, Challenges, and Best Practices (I) |
|
Patel, Vatsal | Yale University |
Liarokapis, Minas | The University of Auckland |
Dollar, Aaron | Yale University |
Keywords: Methods and Tools for Robot System Design, Product Design, Development and Prototyping, Mechanism Design
Abstract: Open-source projects have seen widespread adoption and improved availability in robotics over recent years. The rapid pace of progress in robotics is in part fueled by open-source projects, allowing researchers to implement novel ideas and approaches quickly. Open-source hardware in particular lowers the barrier of entry to new technologies, and can further accelerate innovation in robotics. But it is also more difficult to propagate in comparison to software because it requires replicating physical components. We present a review on Open Robot Hardware (ORH), by first highlighting key benefits and challenges encountered by users and developers of ORH, and relaying some best practices that can be adopted in developing an ORH. Then, we survey over 60 major ORH works in the different domains within robotics. Lastly, we identify strategies exemplified by the surveyed works to further detail the development process and guide developers through the design, documentation, and dissemination stages of an ORH project.
|
|
MoAT6 Regular session, 140FG |
Add to My Program |
Modeling, Control, and Learning for Soft Robots I |
|
|
Chair: Gillespie, Brent | University of Michigan |
Co-Chair: Karydis, Konstantinos | University of California, Riverside |
|
08:30-08:36, Paper MoAT6.1 | Add to My Program |
Modelling of Tendon Driven Robot Based on Constraint Analysis and Pseudo-Rigid Body Model |
|
Troeung, Charles | Monash University |
Liu, Shaotong | Monash University |
Chen, Chao | Monash University |
Keywords: Modeling, Control, and Learning for Soft Robots, Tendon/Wire Mechanism, Soft Robot Applications
Abstract: Quasi-static models of tendon-driven continuum robots (TDCR) require consideration of both the kinematic and static conditions simultaneously. While the Pseudo-Rigid Body (PRB-3R) model has been demonstrated to be efficient, existing works ignore the mechanical effects of the tendons, such as elongation. In addition, the static equilibrium equations for the partially constrained tendons have been expressed in different forms within the literature. This leads to inconsistent simulation results that have not been validated by experimental data when external loads are applied. Furthermore, the inverse problem of solving for the required inputs for a prescribed end-effector pose has not been studied for the PRB-3R model. In this work, we introduce a new modelling approach based on constraint analysis (CA) of a multi-body system and Lagrange multipliers to systematically derive all the relevant governing equations required for a planar TDCR. This method can include tendon mechanics and efficiently solve the direct and inverse kinetostatic models with either forces or displacements as the actuation inputs. We validate the proposed CA method using numerical simulation of a benchmark model and experimental data.
|
|
08:36-08:42, Paper MoAT6.2 | Add to My Program |
An Improved Koopman-MPC Framework for Data-Driven Modeling and Control of Soft Actuators |
|
Wang, Jiajin | Southeast University |
Xu, Baoguo | Southeast University |
Lai, Jianwei | Southeast University |
Wang, Yifei | Southeast University |
Hu, Cong | Guilin University of Electronic Technology |
Li, Huijun | Southeast University |
Song, Aiguo | Southeast University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators
Abstract: The challenge of achieving precise control of soft actuators with strong nonlinearity is mainly due to the difficulty of deriving models suitable for model-based control techniques. Fortunately, the Koopman operator provides a data-driven method for constructing control-oriented models of nonlinear systems to achieve model predictive control (MPC). This is called the Koopman-MPC framework, which is theoretically effective for soft actuators. Nevertheless, in this framework, a critical challenge is to select correct basis functions for Koopman-based modeling. Furthermore, there is room for improvement in control performance. To overcome these problems, this letter presents an improved Koopman-MPC framework to efficiently implement model-based control techniques for soft actuators. Firstly, we propose a systematic method for selecting the basis functions, which extends the measurement coordinates with derivative and time-delay coordinates and uses the sparse identification of nonlinear dynamics (SINDy) algorithm. Secondly, an incremental model predictive control with dynamic constraints (IMPCDC) is developed based on the Koopman model. Finally, several comparative experiments are conducted to verify the utility of the improved Koopman-MPC framework for data-driven modeling and control of soft actuators.
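For readers unfamiliar with the lifting step this abstract builds on, the following minimal sketch fits a linear Koopman-style model on time-delay coordinates by least squares. The data, the delay depth, and the plain least-squares fit are illustrative stand-ins only; the paper selects its basis with SINDy and adds the IMPCDC controller on top.

    import numpy as np

    rng = np.random.default_rng(0)
    y = np.sin(0.1 * np.arange(300)) + 0.01 * rng.standard_normal(300)  # outputs
    u = 0.1 * rng.standard_normal(300)                                  # inputs

    d = 4  # delay depth: lifted state z_k stacks the last d measurements
    Z  = np.stack([y[k - d:k]         for k in range(d, len(y) - 1)])   # z_k
    Zp = np.stack([y[k - d + 1:k + 1] for k in range(d, len(y) - 1)])   # z_{k+1}
    U  = u[d:len(y) - 1, None]

    # Fit the linear Koopman surrogate z_{k+1} ~ A z_k + B u_k; the pair
    # (A, B) can then be handed to a standard linear MPC
    W, *_ = np.linalg.lstsq(np.hstack([Z, U]), Zp, rcond=None)
    A, B = W[:d].T, W[d:].T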
|
|
08:42-08:48, Paper MoAT6.3 | Add to My Program |
Soft Robot Shape Estimation: A Load-Agnostic Geometric Method |
|
Sorensen, Christian | Brigham Young University |
Killpack, Marc | Brigham Young University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Soft Robot Applications
Abstract: In this paper we present a novel kinematic representation of a soft continuum robot to enable full shape estimation using a purely geometric solution. The kinematic representation involves using length-varying piecewise constant curvature segments to describe the deformed shape of the robot. Based on this kinematic representation, we can use overlapping length sensors to estimate the shape of continuously deformable bodies without prior knowledge of the current loading conditions. We show an implementation that assumes one change in curvature along the length of a joint, using string potentiometers as arc length sensors, and an orientation measurement from the tip of the continuum joint. For 56 randomized joint configurations, we estimate the shape of a 250 mm long continuously deformable robot with less than 2.5 mm of average error. The average error is reported for each of 10 equally spaced points along the length, demonstrating the ability to accurately represent the full shape of the soft robot.
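As a point of reference for the kinematic representation described above, here is a minimal sketch of planar piecewise-constant-curvature kinematics. The segment lengths and curvatures are hypothetical; the paper additionally estimates these quantities from overlapping length sensors and a tip orientation measurement.

    import numpy as np

    def ccs_points(kappa, L, n=10):
        # Points along a planar constant-curvature segment (curvature kappa,
        # arc length L), starting at the origin and heading along +x
        s = np.linspace(0.0, L, n)
        if abs(kappa) < 1e-9:                       # straight-line limit
            return np.stack([s, np.zeros_like(s)], axis=1)
        return np.stack([np.sin(kappa * s) / kappa,
                         (1.0 - np.cos(kappa * s)) / kappa], axis=1)

    def append_segment(tip, heading, kappa, L):
        # Rigidly attach a new segment at the previous tip pose
        pts = ccs_points(kappa, L)
        c, s = np.cos(heading), np.sin(heading)
        R = np.array([[c, -s], [s, c]])
        return tip + pts @ R.T, heading + kappa * L

    # One change in curvature along the joint = two stacked segments
    seg1 = ccs_points(kappa=4.0, L=0.125)
    seg2, _ = append_segment(seg1[-1], 4.0 * 0.125, kappa=-2.0, L=0.125)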
|
|
08:48-08:54, Paper MoAT6.4 | Add to My Program |
Robust Generalized Proportional Integral Control for Trajectory Tracking of Soft Actuators in a Pediatric Wearable Assistive Device |
|
Mucchiani, Caio | University of California Riverside |
Liu, Zhichao | University of California, Riverside |
Sahin, Ipsita | University of California, Riverside |
Kokkoni, Elena | University of California, Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Wearable Robotics
Abstract: Soft robotics holds promise for the development of safe yet powered assistive wearable devices for infants. Key to this is the development of closed-loop controllers that can help regulate pneumatic pressure in the device's actuators in an effort to induce controlled motion at the user's limbs and be able to track different types of trajectories. This work develops a controller for soft pneumatic actuators aimed to power a pediatric soft wearable robotic device prototype for upper extremity motion assistance. The controller tracks desired trajectories for a system of soft pneumatic actuators supporting two-degree-of-freedom shoulder joint motion on an infant-sized engineered mannequin. The degrees of freedom assisted by the actuators are equivalent to shoulder motion (abduction/adduction and flexion/extension). Embedded inertial measurement unit sensors provide real-time joint feedback. Experimental data from performing reaching tasks using the engineered mannequin are obtained and compared against ground truth to evaluate the performance of the developed controller. Results reveal the proposed controller leads to accurate trajectory tracking performance across a variety of shoulder joint motions.
|
|
08:54-09:00, Paper MoAT6.5 | Add to My Program |
Data-Efficient Online Learning of Ball Placement in Robot Table Tennis |
|
Tobuschat, Philip | Max Planck Institute for Intelligent Systems, Tübingen |
Ma, Hao | Max Planck Institute for Intelligent Systems |
Büchler, Dieter | Max Planck Institute for Intelligent Systems Tübingen |
Schölkopf, Bernhard | Max Planck Institute for Intelligent Systems |
Muehlebach, Michael | ETH |
Keywords: Modeling, Control, and Learning for Soft Robots, Bioinspired Robot Learning, Machine Learning for Robot Control
Abstract: We present an implementation of an online optimization algorithm for hitting a predefined target when returning ping-pong balls with a table tennis robot. The online algorithm optimizes over so-called interception policies, which define the manner in which the robot arm intercepts the ball. In our case, these are composed of the state of the robot arm (position and velocity) at interception time. Gradient information is provided to the optimization algorithm via the mapping from the interception policy to the landing point of the ball on the table, which is approximated with a black-box and a grey-box approach. Our algorithm is applied to a robotic arm with four degrees of freedom that is driven by pneumatic artificial muscles. As a result, the robot arm is able to return the ball onto any predefined target on the table after about 2-5 iterations. We highlight the robustness of our approach by showing rapid convergence with both the black-box and the grey-box gradients. In addition, the small number of iterations required to reach close proximity to the target also underlines the sample efficiency. A demonstration video can be found here: https://youtu.be/VC3KJoCss0k.
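To make the black-box variant concrete, here is a minimal sketch of online policy updates from finite-difference gradients of the landing error. The linear landing map is a made-up stand-in for the real ball flight and robot; the paper's policies, dynamics, and grey-box gradients are more involved.

    import numpy as np

    def landing_error(policy, target):
        # Stand-in for a real rollout: interception policy -> landing error
        A = np.array([[1.0, 0.3], [-0.2, 0.8]])    # unknown to the learner
        return A @ policy - target

    def fd_gradient(f, x, eps=1e-3):
        # Black-box gradient of 0.5 * ||f(x)||^2 via central differences
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (np.sum(f(x + e) ** 2) - np.sum(f(x - e) ** 2)) / (4 * eps)
        return g

    target, policy = np.array([0.5, -0.2]), np.zeros(2)
    for _ in range(5):                              # a few online iterations
        policy -= 0.5 * fd_gradient(lambda p: landing_error(p, target), policy)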
|
|
09:00-09:06, Paper MoAT6.6 | Add to My Program |
Learning Reduced-Order Soft Robot Controller |
|
Liang, Chen | Zhejiang University |
Gao, Xifeng | Tencent America |
Wu, Kui | Tencent |
Pan, Zherong | Tencent America |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Optimization and Optimal Control
Abstract: Deformable robots are notoriously difficult to model or control due to their high-dimensional configuration spaces. Direct trajectory optimization suffers from the curse of dimensionality and incurs a high computational cost, while learning-based controller optimization methods are sensitive to hyper-parameter tuning. To overcome these limitations, we hypothesize that high-fidelity soft robots can be both simulated and controlled by restricting to low-dimensional spaces. Under this assumption, we propose a two-stage algorithm to identify such simulation- and control-spaces. Our method first identifies the so-called simulation-space that captures the salient deformation modes, to which the robot's governing equation is restricted. We then identify the control-space, to which control signals are restricted. We propose a multi-fidelity Riemannian Bayesian bilevel optimization to identify task-specific control spaces. We show that the dimension of the control-space can be less than 10 for a high-DOF soft robot to accomplish walking and swimming tasks, allowing low-dimensional MPC controllers to be applied to soft robots with tractable computational complexity.
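A common way to obtain such a low-dimensional subspace is a POD/PCA of deformation snapshots; the sketch below illustrates only that idea (random data stands in for simulator output, which in practice is low-rank). The paper's identification procedure and its control-space search via Riemannian Bayesian bilevel optimization go beyond this.

    import numpy as np

    rng = np.random.default_rng(0)
    snapshots = rng.standard_normal((500, 3000))  # frames x soft-body DOFs

    # Leading right-singular vectors of the centered snapshots span a
    # reduced space capturing the salient deformation modes
    Xc = snapshots - snapshots.mean(axis=0)
    _, svals, Vt = np.linalg.svd(Xc, full_matrices=False)
    energy = np.cumsum(svals ** 2) / np.sum(svals ** 2)
    r = int(np.searchsorted(energy, 0.99)) + 1    # keep 99% of the energy
    basis = Vt[:r]                                # (r, 3000) reduced basis

    # Restrict the full state: q ~ q_mean + z @ basis, with z in R^r
    z = Xc[0] @ basis.T
    q_reconstructed = snapshots.mean(axis=0) + z @ basis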
|
|
09:06-09:12, Paper MoAT6.7 | Add to My Program |
A Single-Parameter Model for Soft Bellows Actuators under Axial Deformation and Loading |
|
Treadway, Emma | Trinity University |
Brei, Melissa | University of Michigan |
Sedal, Audrey | McGill University |
Gillespie, Brent | University of Michigan |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Hydraulic/Pneumatic Actuators
Abstract: Soft fluidic actuators are becoming popular for their backdrivability, potential for high power density, and their support for power supply through flexible tubes. Control and design of such actuators requires serviceable models that describe how they relate fluid pressure and flow to mechanical force and motion. We present a simple 2-port model of a bellows actuator that accounts for the relationships among fluid and mechanical variables imposed by the kinematics of the deforming bellows structure and accounts for elastic energy stored in the actuator’s thermoplastic material structure. Elastic energy storage due to axial deformation is captured by revolving a differential strip whose linear elastic behavior is a nonlinear function of the actuator length. The model is evaluated through experiments in which either actuator length and pressure or force and pressure are imposed. The model has an error of 9.8% of the force range explored and yields insight into the effects of geometry changes. The resulting model can be used for model-based control or actuator design across the full operating range and can be exercised under either imposed force or imposed actuator length.
|
|
09:12-09:18, Paper MoAT6.8 | Add to My Program |
Task and Configuration Space Compliance of Continuum Robots Via Lie Group and Modal Shape Formulations |
|
Orekhov, Andrew | Carnegie Mellon University |
Johnston, Garrison | Vanderbilt University |
Simaan, Nabil | Vanderbilt University |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics, Flexible Robotics
Abstract: Continuum robots suffer large deflections due to internal and external forces. Accurate modeling of their passive compliance is necessary for accurate environmental interaction, especially in scenarios where direct force sensing is not practical. This paper focuses on deriving analytic formulations for the compliance of continuum robots that can be modeled as Kirchhoff rods. Compared to prior works, the approach presented herein is not subject to constant-curvature assumptions to derive the configuration space compliance, and we do not rely on computationally expensive finite difference approximations to obtain the task space compliance. Using modal approximations over curvature space and Lie group integration, we obtain closed-form expressions for the task and configuration space compliance matrices of continuum robots, thereby bridging the gap between constant-curvature analytic formulations of configuration space compliance and variable-curvature task space compliance. We first present an analytic expression for the compliance of a single Kirchhoff rod. We then extend this formulation for computing both the task space and configuration space compliance of a tendon-actuated continuum robot. We then use our formulation to study the tradeoffs between computation cost and modeling accuracy as well as the loss in accuracy from neglecting the Jacobian derivative term in the compliance model. Finally, we experimentally validate the model on a tendon-actuated continuum segment, demonstrating the model's ability to predict passive deflections with error below 11.5% of total arc length.
|
|
09:18-09:24, Paper MoAT6.9 | Add to My Program |
A Localization Framework for Boundary Constrained Soft Robots |
|
Tanaka, Koki | Illinois Institute of Technology |
Zhou, Qiyuan | Illinois Institute of Technology |
Srivastava, Ankit | Illinois Institute of Technology |
Spenko, Matthew | Illinois Institute of Technology |
Keywords: Modeling, Control, and Learning for Soft Robots, Localization, Soft Robot Applications
Abstract: Soft robots possess unique capabilities for adapting to the environment and interacting with it safely. However, their deformable nature also poses challenges for controlling their movement. In particular, the large deformations of a soft robot make it difficult to localize its individual body parts, which in turn impedes effective control. This paper introduces a novel localization framework designed for soft robots that are constrained by boundaries and benefit from unique hardware architecture. To this end, we propose a method that exploits the flexible boundaries of the robot to create an onboard sensor capable of measuring the relative distances between its sub-robots. This measurement data is incorporated into a linear Kalman filter for accurate localization. We evaluate the framework's performance in benchmark and dynamic cases and demonstrate its effectiveness in improving localization accuracy compared to an IMU-based approach. The results also show that the proposed method achieves sufficient localization accuracy for contact-based mapping, enabling the robot to sense the location of obstacles in the environment. Finally, we validate the proposed framework using a physical prototype of a boundary-constrained soft robot and demonstrate its ability to accurately estimate the robot's shape. This framework has the potential to enable soft robots to autonomously navigate and map unknown environments, which could be beneficial for a variety of exploration tasks.
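As a schematic of how such inter-robot measurements enter a linear Kalman filter, the sketch below assumes relative-position measurements between two sub-robots, an assumption that keeps the measurement model linear (distances alone would require a linearization step). All matrices are illustrative, not the paper's.

    import numpy as np

    # State: planar positions of two sub-robots, x = [p1x, p1y, p2x, p2y]
    F = np.eye(4)                          # random-walk process model (assumed)
    Q = 0.01 * np.eye(4)
    H = np.array([[-1.0, 0.0, 1.0, 0.0],   # measures p2 - p1
                  [0.0, -1.0, 0.0, 1.0]])
    R = 0.05 * np.eye(2)

    def kf_step(x, P, z):
        x, P = F @ x, F @ P @ F.T + Q      # predict
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ (z - H @ x)            # update with inter-robot measurement
        P = (np.eye(4) - K @ H) @ P
        return x, P

    x, P = kf_step(np.zeros(4), np.eye(4), np.array([0.9, 0.1]))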
|
|
09:24-09:30, Paper MoAT6.10 | Add to My Program |
EViper: A Scalable Platform for Untethered Modular Soft Robots |
|
Cheng, Hsin | Princeton University |
Zheng, Zhiwu | Princeton University |
Kumar, Prakhar | Princeton University |
Afridi, Wali | Ithaca Senior High School |
Kim, Ben | Princeton University |
Wagner, Sigurd | Princeton University |
Verma, Naveen | Princeton University |
Sturm, James | Princeton University |
Chen, Minjie | Princeton University |
Keywords: Modeling, Control, and Learning for Soft Robots
Abstract: Soft robots present unique capabilities, but have been limited by the lack of scalable technologies for construction and the complexity of algorithms for efficient control and motion. These depend on soft-body dynamics, high-dimensional actuation patterns, and external/onboard forces. This paper presents scalable methods and platforms to study the impact of weight distribution and actuation patterns on fully untethered modular soft robots. An extendable Vibrating Intelligent Piezo-Electric Robot (eViper), together with an open-source Simulation Framework for Electroactive Robotic Sheet (SFERS) implemented in PyBullet, was developed as a platform to analyze the complex weight-locomotion interaction. By integrating power electronics, sensors, actuators, and batteries onboard, the eViper platform enables rapid design iteration and evaluation of different weight distribution and control strategies for the actuator arrays. The design supports both physics-based modeling and data-driven modeling via onboard automatic data-acquisition capabilities. We show that SFERS can provide useful guidelines for optimizing the weight distribution and actuation patterns of the eViper, thereby achieving maximum speed or minimum cost of transport (COT).
|
|
09:30-09:36, Paper MoAT6.11 | Add to My Program |
Domain Randomization for Robust, Affordable and Effective Closed-Loop Control of Soft Robots |
|
Tiboni, Gabriele | Politecnico Di Torino |
Protopapa, Andrea | Politecnico Di Torino |
Tommasi, Tatiana | Politecnico Di Torino |
Averta, Giuseppe | Politecnico Di Torino |
Keywords: Modeling, Control, and Learning for Soft Robots, Reinforcement Learning
Abstract: Soft robots are gaining popularity thanks to their intrinsic safety in contact and their adaptability. However, the potentially infinite number of degrees of freedom makes their modeling a daunting task, and in many cases only an approximated description is available. This challenge makes reinforcement learning (RL) based approaches inefficient when deployed in realistic scenarios, due to the large domain gap between models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots with: i) robustness w.r.t. unknown dynamics parameters; ii) reduced training times by exploiting drastically simpler dynamic models for learning; iii) better environment exploration, which can lead to exploitation of environmental constraints for optimal performance. Moreover, we introduce a novel algorithmic extension of previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects. We provide an extensive evaluation in simulation on four different tasks and two soft robot designs, opening interesting perspectives for future research on Reinforcement Learning for closed-loop soft robot control.
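The core DR mechanic is simply re-sampling the simulator's dynamics parameters every training episode; a minimal sketch follows. The parameter names and ranges are invented for illustration, and the paper's adaptive extension infers these distributions rather than fixing them.

    import numpy as np

    rng = np.random.default_rng()

    def sample_dynamics():
        # Draw one randomized set of soft-body parameters per episode
        # (hypothetical ranges, not the paper's)
        return {
            "youngs_modulus": rng.uniform(5e4, 5e5),   # Pa
            "poisson_ratio":  rng.uniform(0.30, 0.49),
            "damping":        rng.uniform(0.5, 5.0),
            "friction":       rng.uniform(0.4, 1.2),
        }

    for episode in range(3):
        params = sample_dynamics()
        # env.reset(**params); run one RL episode under these dynamics
        print(params)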
|
|
09:36-09:42, Paper MoAT6.12 | Add to My Program |
Implementation of a Cosserat Rod-Based Configuration Tracking Controller on a Multi-Segment Soft Robotic Arm |
|
Doroudchi, Azadeh | Arizona State University |
Qiao, Zhi | ASU |
Zhang, Wenlong | Arizona State University |
Berman, Spring | Arizona State University |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion Control, Distributed Robot Systems
Abstract: Controlling soft continuum robotic arms is challenging due to their hyper-redundancy and dexterity. In this paper we experimentally demonstrate, for the first time, closed-loop control of the configuration space variables of a soft robotic arm, composed of independently controllable segments, using a Cosserat rod model of the robot and the distributed sensing and actuation capabilities of the segments. Our controller solves the inverse dynamic problem by simulating the Cosserat rod model in MATLAB using a computationally efficient numerical solution scheme, and it applies the computed control output to the actual robot in real time. The position and orientation of the tip of each segment are measured in real time, while the remaining unknown variables that are needed to solve the inverse dynamics are estimated simultaneously in the simulation. We implement the controller on a multi-segment silicone robotic arm with pneumatic actuation, using a motion capture system to measure the segments' positions and orientations. The controller is used to reshape the arm into configurations that are achieved through combinations of bending and extension deformations in 3D space. Although the possible deformations are limited for this robot platform, our study demonstrates the potential for implementing the control approach on a wide range of continuum robots in practice. The resulting tracking performance indicates the effectiveness of the controller and the accuracy of the simulated Cosserat rod model.
|
|
09:42-09:48, Paper MoAT6.13 | Add to My Program |
Closed Loop Static Control of Multi-Magnet Soft Continuum Robots |
|
Pittiglio, Giovanni | Harvard University |
Orekhov, Andrew | Carnegie Mellon University |
da Veiga, Tomas | University of Leeds |
Calò, Simone | University of Leeds |
Chandler, James Henry | University of Leeds |
Simaan, Nabil | Vanderbilt University |
Valdastri, Pietro | University of Leeds |
Keywords: Force Control, Medical Robots and Systems, Formal Methods in Robotics and Automation
Abstract: This paper discusses a novel static control approach applied to magnetic soft continuum robots (MSCRs). Our aim is to demonstrate the control of a multi-magnet soft continuum robot (SCR) in 3D. The proposed controller, based on a simplified yet accurate model of the robot, has a high update rate and is capable of real-time shape control. For the actuation of the MSCR, we employ the dual external permanent magnet (dEPM) platform and we sense the shape via fiber Bragg grating (FBG). The employed actuation system and sensing technique make the proposed approach directly applicable to the medical context. We demonstrate that the proposed controller, running at approximately 300 Hz, is capable of shape tracking with a mean error of 8.5% and a maximum error of 35.2%. We experimentally show that the static controller is 25.9% more accurate than a standard PID controller in shape tracking.
|
|
MoAT7 Regular session, 258/259 |
Add to My Program |
Cooperating Robots |
|
|
Chair: Krakow, Lucas | Texas A&M University |
Co-Chair: Dantam, Neil | Colorado School of Mines |
|
08:30-08:36, Paper MoAT7.1 | Add to My Program |
IF-Based Trajectory Planning and Cooperative Control for Transportation System of Cable Suspended Payload with Multi UAVs |
|
Zhang, Yu | Northeastern University, China |
Xu, Jie | Northeastern University, China |
Zhao, Cheng | Northeastern University, China |
Dong, Jiuxiang | Northeastern University, China |
Keywords: Distributed Robot Systems, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: In this paper, we tackle the control and trajectory planning problems for the cooperative transportation system of a cable-suspended payload with multiple Unmanned Aerial Vehicles (UAVs). Firstly, a payload controller is presented that considers the dynamic coupling between the UAV and the payload to accomplish active suppression of payload swing and tracking of complex payload trajectories. Secondly, different from the simplification of obstacles in most approaches, we propose three Insetting Formation (IF) algorithms that handle the complete obstacle shape to generate collision-free waypoints for the cooperative transportation system. An IF strategy is proposed that integrates the three IF algorithms to improve the success rate of obstacle avoidance and reduce algorithmic complexity when performing aggressive flight. Finally, we verify the robustness and high performance of the proposed algorithm through benchmark comparisons and real-world experiments. Moreover, our source code is released as an open-source ROS package.
|
|
08:36-08:42, Paper MoAT7.2 | Add to My Program |
Cooperative Dual-Arm Control for Heavy Object Manipulation Based on Hierarchical Quadratic Programming |
|
Dio, Maximilian | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Völz, Andreas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Graichen, Knut | Friedrich Alexander University Erlangen-Nürnberg |
Keywords: Cooperating Robots, Dual Arm Manipulation, Optimization and Optimal Control
Abstract: This paper presents a new control scheme for cooperative dual-arm robots manipulating heavy objects. The proposed method uses the full dynamical model of the kinematically coupled robot system and builds on a hierarchical quadratic programming (HQP) formulation to enforce dynamical inequality constraints such as joint torques or internal loads. This ensures optimal tracking of an object trajectory, while additional objectives with lower priority are optimized on the prior solution space. Therefore, the redundancy of the inherent load distribution problem between the two arms can be eliminated. With this approach, higher object loads can be manipulated compared to non-optimized methods. Simulations with a 14-DoF dual-arm robotic system demonstrate the effectiveness of the proposed control method. The real-time feasibility is guaranteed with an average computation time of less than 0.35 milliseconds at a control rate of 1 kilohertz.
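To illustrate the strict-priority idea behind an HQP, here is an equality-only sketch using nullspace projection. A real HQP additionally enforces inequality constraints such as torque bounds, and the task matrices below are toy stand-ins for the wrench-tracking and load-distribution objectives.

    import numpy as np

    def prioritized_ls(tasks, n):
        # Solve tasks (A_i, b_i) in strict priority order: each level only
        # optimizes in the nullspace left over by higher-priority levels
        x, N = np.zeros(n), np.eye(n)
        for A, b in tasks:
            if N.shape[1] == 0:
                break                           # no freedom left
            AN = A @ N
            y, *_ = np.linalg.lstsq(AN, b - A @ x, rcond=None)
            x = x + N @ y
            _, s, Vt = np.linalg.svd(AN)
            rank = int(np.sum(s > 1e-10))
            N = N @ Vt[rank:].T                 # shrink remaining nullspace
        return x

    # Priority 1: net force on the object; priority 2: zero internal load
    A1, b1 = np.array([[1.0, 1.0]]), np.array([2.0])
    A2, b2 = np.array([[1.0, -1.0]]), np.array([0.0])
    print(prioritized_ls([(A1, b1), (A2, b2)], n=2))  # -> [1. 1.]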
|
|
08:42-08:48, Paper MoAT7.3 | Add to My Program |
Multi-UAV Adaptive Path Planning Using Deep Reinforcement Learning |
|
Westheider, Jonas | University of Bonn |
Rückin, Julius | University of Bonn |
Popovic, Marija | University of Bonn |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Reinforcement Learning, Cooperating Robots
Abstract: Efficient aerial data collection is important in many remote sensing applications. In large-scale monitoring scenarios, deploying a team of unmanned aerial vehicles (UAVs) offers improved spatial coverage and robustness against individual failures. However, a key challenge is cooperative path planning for the UAVs to efficiently achieve a joint mission goal. We propose a novel multi-agent informative path planning approach based on deep reinforcement learning for adaptive terrain monitoring scenarios using UAV teams. We introduce new network feature representations to effectively learn path planning in a 3D workspace. By leveraging a counterfactual baseline, our approach explicitly addresses credit assignment to learn cooperative behaviour. Our experimental evaluation shows improved planning performance, i.e., regions of interest are mapped more quickly than with non-counterfactual variants. Results on synthetic and real-world data show that our approach has superior performance compared to state-of-the-art non-learning-based methods, while being transferable to varying team sizes and communication constraints.
|
|
08:48-08:54, Paper MoAT7.4 | Add to My Program |
Collective Intelligence for 2D Push Manipulations with Mobile Robots |
|
Kuroki, So | The University of Tokyo |
Matsushima, Tatsuya | The University of Tokyo |
Jumpei, Arima | Matsuo Institute |
Furuta, Hiroki | The University of Tokyo |
Matsuo, Yutaka | The University of Tokyo |
Gu, Shixiang Shane | OpenAI |
Tang, Yujin | Google |
Keywords: Cooperating Robots, Mobile Manipulation, Imitation Learning
Abstract: While natural systems often present collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative 2D push manipulations using mobile robots. Although conventional works demonstrate potential solutions for the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems do not possess the ability to adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a differentiable soft-body physics simulator into an attention-based neural network, our multi-robot push manipulation system achieves better performance than baselines. In addition, our system also generalizes to configurations not seen during training and is able to adapt toward task completion when external turbulence and environmental changes are applied.
|
|
08:54-09:00, Paper MoAT7.5 | Add to My Program |
Emergent Cooperative Behavior in Distributed Target Tracking with Unknown Occlusions |
|
Li, Tianqi | Texas A&M University |
Krakow, Lucas | Texas A&M University |
Gopalswamy, Swaminathan | Texas A&M University |
Keywords: Cooperating Robots, Reactive and Sensor-Based Planning, Behavior-Based Systems
Abstract: Tracking multiple moving objects of interest (OOI) with multiple robot systems (MRS) has been addressed by active sensing that maintains a shared belief of OOIs and plans the motion of robots to maximize the information quality. Mobility of robots enables the behavior of pursuing better visibility, which is constrained by sensor field of view (FoV) and occlusion objects. We first extend prior work to detect, maintain and share occlusion information explicitly, allowing us to generate occlusion-aware planning even if a priori semantic occlusion information is unavailable. The efficacy of active sensing approaches is often evaluated according to estimation error and information gain metrics. However, these metrics do not directly explain the level of cooperative behavior engendered by the active sensing algorithms. Next, we extract different emergent cooperative behaviors that stem from the same underlying algorithms but manifest differently under differing scenarios. In particular, we highlight and demonstrate three emergent behavior patterns in active sensing MRS: (i) Change of tracking responsibility between agents when tracking trajectories with divergent directions or due to a re-allocation of the resource among heterogeneous agents; (ii) Awareness of occlusions to a trajectory and temporal leave-and-return of the sensing agent; (iii) Sharing of local occlusion objects in MRS that subsequently improves the awareness of occlusion.
|
|
09:00-09:06, Paper MoAT7.6 | Add to My Program |
Multi-Objective Sparse Sensing with Ergodic Optimization |
|
Rao, Ananya | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Keywords: Motion and Path Planning, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: We consider a search problem where a robot has one or more types of sensors, each suited to detecting different types of targets or target information. Often, information in the form of a distribution of possible target locations, or locations of interest, may be available to guide the search. When multiple types of information exist, then a distribution for each type of information must also exist, thereby making the search problem that uses these distributions to guide the search a multi-objective one. In this paper, we consider a multi-objective search problem when the “cost” to use a sensor is limited. To this end, we leverage the ergodic metric, which drives agents to spend time in regions proportional to the expected amount of information there. We define the multi-objective sparse sensing ergodic (MO-SS-E) metric in order to optimize when and where each sensor measurement should be taken while planning trajectories that balance the multiple objectives. We observe that our approach maintains coverage performance even as the number of samples taken is considerably reduced. Further empirical results on different multi-agent problem setups demonstrate the applicability of our approach for both homogeneous and heterogeneous multi-agent teams.
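For reference, the ergodic metric compares trajectory-averaged and distribution-averaged Fourier coefficients; the sketch below computes it on the unit square. Basis normalization is omitted and all data are synthetic, so this only illustrates the metric itself, not MO-SS-E.

    import numpy as np

    L, K = 1.0, 5                                   # domain [0, L]^2, K modes/axis
    ks = np.array([(i, j) for i in range(K) for j in range(K)])
    Lam = (1.0 + np.sum(ks ** 2, axis=1)) ** -1.5   # weights, s = (d + 1) / 2

    def fourier_coeffs(points):
        # Empirical average of the cosine basis along points (N x 2)
        cx = np.cos(np.pi * np.outer(points[:, 0], ks[:, 0]) / L)
        cy = np.cos(np.pi * np.outer(points[:, 1], ks[:, 1]) / L)
        return (cx * cy).mean(axis=0)

    rng = np.random.default_rng(0)
    phi = fourier_coeffs(rng.uniform(0, L, (5000, 2)))  # info distribution
    c = fourier_coeffs(rng.uniform(0, L, (200, 2)))     # robot trajectory
    ergodicity = np.sum(Lam * (c - phi) ** 2)           # lower = more ergodic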
|
|
09:06-09:12, Paper MoAT7.7 | Add to My Program |
Team Coordination on Graphs with State-Dependent Edge Costs |
|
Limbu, Manshi | George Mason University |
Hu, Zechen | George Mason University |
Oughourli, Sara | George Mason University |
Wang, Xuan | George Mason University |
Xiao, Xuesu | George Mason University |
Shishika, Daigo | George Mason University |
Keywords: Planning, Scheduling and Coordination, Cooperating Robots, Multi-Robot Systems
Abstract: This paper studies a team coordination problem in a graph environment. Specifically, we incorporate a “support” action which an agent can take to reduce the cost for its teammate to traverse some high-cost edges. Due to this added feature, the graph traversal is no longer a standard multi-agent path planning problem. To solve this new problem, we propose a novel formulation that poses it as a planning problem in a joint state space: the joint state graph (JSG). Since the edges of the JSG implicitly incorporate the support actions taken by the agents, we are able to optimize the joint actions by solving a standard single-agent path planning problem on the JSG. One main drawback of this approach is the curse of dimensionality in both the number of agents and the size of the graph. To improve scalability in graph size, we further propose a hierarchical decomposition method that performs path planning in two levels. We provide both theoretical and empirical complexity analyses to demonstrate the efficiency of our two algorithms.
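To make the JSG construction concrete, the toy sketch below plans for two agents on a three-node line graph, where one risky edge becomes cheap when the teammate "supports" from an adjacent node. The graph, costs, and support rule are invented for illustration; they are not the paper's benchmark.

    import heapq
    from itertools import product

    NODES = [0, 1, 2]

    def edge_cost(s, t):
        # Joint transition: exactly one agent moves one step per transition
        moved = [(s[i], t[i]) for i in range(2) if s[i] != t[i]]
        if len(moved) != 1 or abs(moved[0][0] - moved[0][1]) != 1:
            return None
        other = t[1] if s[0] != t[0] else t[0]    # the stationary teammate
        if set(moved[0]) == {1, 2}:               # risky edge 1-2
            return 1.0 if other == 1 else 5.0     # cheap only when supported
        return 1.0

    def joint_dijkstra(start, goal):
        # Single-agent Dijkstra on the joint state graph
        dist, pq = {start: 0.0}, [(0.0, start)]
        while pq:
            d, s = heapq.heappop(pq)
            if s == goal:
                return d
            if d > dist.get(s, float("inf")):
                continue
            for t in product(NODES, NODES):
                c = edge_cost(s, t)
                if c is not None and d + c < dist.get(t, float("inf")):
                    dist[t] = d + c
                    heapq.heappush(pq, (d + c, t))
        return float("inf")

    print(joint_dijkstra((0, 0), (2, 2)))   # 8.0: one crossing is supported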
|
|
09:12-09:18, Paper MoAT7.8 | Add to My Program |
Incorporating Stochastic Human Driving States in Cooperative Driving between a Human-Driven Vehicle and an Autonomous Vehicle |
|
Hossain, Sanzida | Oklahoma State University |
Lu, Jiaxing | Oklahoma State University |
Bai, He | Oklahoma State University |
Sheng, Weihua | Oklahoma State University |
Keywords: Cooperating Robots, Intelligent Transportation Systems, Human Factors and Human-in-the-Loop
Abstract: Modeling a human-driven vehicle is a difficult subject since human drivers have a variety of stochastic behavioral components that influence their driving styles. We develop a cooperative driving framework to incorporate different human behavior aspects, including the attentiveness of a driver and the tendency of the driver following advising commands. To demonstrate the framework, we consider the merging coordination between a human-driven vehicle and an autonomous vehicle (AV) in a connected environment. We propose a stochastic model predictive controller (sMPC) to address the stochasticity in human driving behavior and design coordinated merging actions to optimize the AV input and influence human driving behavior through advising commands. Simulation and human-in-the-loop (HITL) experimental results show that our formulation is capable of accommodating a distracted driver and optimizing AV inputs based on human driving behavior recognition.
|
|
09:18-09:24, Paper MoAT7.9 | Add to My Program |
Epistemic Planning for Heterogeneous Robotic Systems |
|
Bramblett, Lauren | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Cooperating Robots, Path Planning for Multiple Mobile Robots or Agents, Task and Motion Planning
Abstract: In applications such as search and rescue or disaster relief, heterogeneous multi-robot systems (MRS) can provide significant advantages for complex objectives that require a suite of capabilities. However, within these application spaces, communication is often unreliable, causing inefficiencies or outright failures to arise in most MRS algorithms. Many researchers tackle this problem by requiring all robots to either maintain communication using proximity constraints or assume that all robots will execute a predetermined plan over long periods of disconnection. The latter method allows for higher levels of efficiency in an MRS, but failures and environmental uncertainties can have cascading effects across the system, especially when a mission objective is complex or time-sensitive. To solve this, we propose an epistemic planning framework that allows robots to reason about the system state, leverage heterogeneous system makeups, and optimize information dissemination to disconnected neighbors. Dynamic epistemic logic formalizes the propagation of belief states, and epistemic task allocation and gossip are accomplished via a mixed integer program using the belief states for utility predictions and planning. The proposed framework is validated using simulations and experiments with heterogeneous vehicles.
|
|
09:24-09:30, Paper MoAT7.10 | Add to My Program |
Reinforced Potential Field for Multi-Robot Motion Planning in Cluttered Environments |
|
Zhang, Dengyu | Sun Yat-Sen University |
Zhang, Xinyu | Sun Yat-Sen University |
Zhang, Zheng | Sun Yat-Sen University |
Zhu, Bo | Sun Yat-Sen University |
Zhang, Qingrui | Sun Yat-Sen University |
Keywords: Multi-Robot Systems, Motion and Path Planning, Collision Avoidance
Abstract: Motion planning is challenging for multiple robots in cluttered environments without communication, especially in view of real-time efficiency, motion safety, distributed computation, and trajectory optimality. In this paper, a reinforced potential field method is developed for distributed multi-robot motion planning, which is a synthesized design of reinforcement learning and artificial potential fields. An observation embedding with a self-attention mechanism is presented to model the robot-robot and robot-environment interactions. A soft wall-following rule is developed to improve trajectory smoothness. Our method belongs to reactive planning, but environment properties are implicitly encoded. The number of robots in our method can be scaled up arbitrarily. The performance improvement over vanilla APF and RL methods has been demonstrated via numerical simulations. Experiments are also performed using quadrotors to further illustrate the competence of our method.
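For context, a classic attractive/repulsive potential-field command looks as follows; the paper augments such a field with a learned (RL) term and the soft wall-following rule, both of which this sketch omits, and the gains below are arbitrary.

    import numpy as np

    def apf_velocity(p, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
        # Attractive pull toward the goal
        v = -k_att * (p - goal)
        for q in obstacles:
            d = np.linalg.norm(p - q)
            if 1e-9 < d < d0:                 # repulsion active inside range d0
                v += k_rep * (1.0 / d - 1.0 / d0) / d ** 2 * (p - q) / d
        return v

    p = np.array([0.0, 0.0])
    cmd = apf_velocity(p, goal=np.array([5.0, 0.0]),
                       obstacles=[np.array([1.0, 0.2])])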
|
|
09:30-09:36, Paper MoAT7.11 | Add to My Program |
Robot Team Data Collection with Anywhere Communication |
|
Schack, Matthew | Colorado School of Mines |
Rogers III, John G. | US Army Research Laboratory |
Han, Qi | Colorado School of Mines |
Dantam, Neil | Colorado School of Mines |
Keywords: Multi-Robot Systems, Cooperating Robots, Path Planning for Multiple Mobile Robots or Agents
Abstract: Using robots to collect data is an effective way to obtain information from the environment and communicate it to a static base station. Furthermore, robots have the capability to communicate with one another, potentially decreasing the time for data to reach the base station. We present a Mixed Integer Linear Program that reasons about discrete routing choices, continuous robot paths, and their effect on the latency of the data collection task. We analyze our formulation, discuss optimization challenges inherent to the data collection problem, and propose a factored formulation that finds optimal answers more efficiently. Our work is able to find paths that reduce latency by up to 101% compared to treating all robots independently in our tested scenarios.
|
|
09:36-09:42, Paper MoAT7.12 | Add to My Program |
Coordination of Multiple Mobile Manipulators for Ordered Sorting of Cluttered Objects |
|
Ahn, Jeeho | Korea University |
Lee, Sebin | Sogang University |
Nam, Changjoo | Sogang University |
Keywords: Cooperating Robots, Multi-Robot Systems, Manipulation Planning
Abstract: We present a coordination method for multiple mobile manipulators to sort objects in clutter. We consider the object rearrangement problem in which objects must be sorted into different groups in a particular order. In clutter, the order constraints cannot be easily satisfied since some objects occlude others, so the occluded objects are not directly accessible to the robots. Objects occluding others need to be moved more than once to make the occluded objects accessible. Such rearrangement problems fall into the class of nonmonotone rearrangement problems, which are computationally intractable. While nonmonotone problems with order constraints are harder still, involving multiple robots additionally requires computing a task allocation. In this work, we aim to develop a fast, albeit suboptimal, method for multi-robot coordination for ordered sorting in clutter. The proposed method finds a sequence of objects to be sorted using a search such that the order constraint in each group is satisfied. The search can solve nonmonotone instances that require temporary relocation of some objects to access the next object to be sorted. Once a complete sorting sequence is found, the objects in the sequence are assigned to multiple mobile manipulators using a greedy task allocation method. We develop four versions of the method with different search strategies. In the experiments, we show that our method can find a sorting sequence quickly (e.g., 4.6 sec with 20 objects sorted into five groups) even though the solved instances include hard nonmonotone ones. The extensive tests and simulation experiments demonstrate the method's ability to solve real-world sorting problems using multiple mobile manipulators.
|
|
09:42-09:48, Paper MoAT7.13 | Add to My Program |
MOTLEE: Distributed Mobile Multi-Object Tracking with Localization Error Elimination |
|
Peterson, Mason B. | Massachusetts Institute of Technology |
Lusk, Parker C. | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Distributed Robot Systems, Visual Tracking, Localization
Abstract: We present MOTLEE, a distributed mobile multi-object tracking algorithm that enables a team of robots to collaboratively track moving objects in the presence of localization error. Existing approaches to distributed tracking make limiting assumptions regarding the relative spatial relationship of sensors, including assuming a static sensor network or that perfect localization is available. Instead, we develop an algorithm based on the Kalman-Consensus filter for distributed tracking that properly leverages localization uncertainty in collaborative tracking. Further, our method allows the team to maintain an accurate understanding of dynamic objects in the environment by realigning robot frames and incorporating frame alignment uncertainty into our object tracking formulation. We evaluate our method in hardware on a team of three mobile ground robots tracking four people. Compared to previous works that do not account for localization error, we show that MOTLEE is resilient to localization uncertainties, enabling accurate tracking in distributed, dynamic settings with mobile tracking sensors.
|
|
MoAT8 Regular session, 141 |
Add to My Program |
Legged Robots I |
|
|
Chair: Behnke, Sven | University of Bonn |
Co-Chair: Semini, Claudio | Istituto Italiano Di Tecnologia |
|
08:30-08:36, Paper MoAT8.1 | Add to My Program |
Dynamic Object Tracking for Quadruped Manipulator with Spherical Image-Based Approach |
|
Zhang, Tianlin | Harbin Institute of Technology |
Guo, Sikai | Harbin Institute of Technology |
Xiong, Xiaogang | Harbin Institute of Technology, Shenzhen |
Li, Wanlei | Harbin Institute of Technology(ShenZhen) |
Qi, Zezheng | Harbin Institute of Technology, Shenzhen |
Lou, Yunjiang | Harbin Institute of Technology, Shenzhen |
Keywords: Legged Robots, Visual Servoing, Visual Tracking
Abstract: Accurately estimating and tracking the motion of surrounding dynamic objects is one of the important tasks for the autonomy of a quadruped manipulator. However, with only an onboard RGB camera, it remains challenging for a quadruped manipulator to track the motion of a dynamic object moving with unknown and changing velocities. To address this problem, this manuscript proposes a novel image-based visual servoing (IBVS) approach consisting of three elements: a spherical projection model, a robust super-twisting observer, and a model predictive controller (MPC). The spherical projection model decouples the visual error of the dynamic target into linear and angular components. Then, in the presence of visual error, the robustness of the observer is exploited to estimate the unknown and changing velocities of the dynamic target without depth estimation. Finally, the estimated velocity is fed into the MPC to generate joint torques for the quadruped manipulator to track the motion of the dynamic target. The proposed approach is validated through hardware experiments, and the results illustrate its effectiveness in improving the autonomy of the quadruped manipulator.
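The decoupling starts from projecting image measurements onto the unit sphere; a minimal sketch of that projection follows. The intrinsics are hypothetical values, not the camera calibration used in the paper.

    import numpy as np

    K = np.array([[615.0, 0.0, 320.0],     # hypothetical camera intrinsics
                  [0.0, 615.0, 240.0],
                  [0.0, 0.0, 1.0]])

    def spherical_feature(u, v):
        # Back-project a pixel and normalize: spherical IBVS uses this
        # bearing vector instead of image-plane coordinates, so no depth
        # is needed to define the feature
        ray = np.linalg.solve(K, np.array([u, v, 1.0]))
        return ray / np.linalg.norm(ray)

    s = spherical_feature(350.0, 260.0)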
|
|
08:36-08:42, Paper MoAT8.2 | Add to My Program |
Proprioception and Tail Control Enable Extreme Terrain Traversal by Quadruped Robots |
|
Yang, Yanhao | Oregon State University |
Norby, Joseph | Apptronik |
Yim, Justin K. | University of Illinois Urbana-Champaign |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Legged Robots, Biologically-Inspired Robots, Optimization and Optimal Control
Abstract: Legged robots leverage ground contacts and the reaction forces they provide to achieve agile locomotion. However, uncertainty coupled with contact discontinuities can lead to failure, especially in real-world environments with unexpected height variations such as rocky hills or curbs. To enable dynamic traversal of extreme terrain, this work introduces 1) a proprioception-based gait planner for estimating unknown hybrid events due to elevation changes and responding by modifying contact schedules and planned footholds online, and 2) a two-degree-of-freedom tail for improving contact-independent control and a corresponding decoupled control scheme for better versatility and efficiency. Simulation results show that the gait planner significantly improves stability under unforeseen terrain height changes compared to methods that assume fixed contact schedules and footholds. Further, tests have shown that the tail is particularly effective at maintaining stability when encountering a terrain change with an initial angular disturbance. The results show that these approaches work synergistically to stabilize locomotion with elevation changes up to 1.5 times the leg length and tilted initial states.
|
|
08:42-08:48, Paper MoAT8.3 | Add to My Program |
Run and Catch: Dynamic Object-Catching of Quadrupedal Robots |
|
You, Yangwei | Institute for Infocomm Research |
Liu, Tianlin | Peking University |
Liang, Xiaowei | Beijing Xiaomi Mobile Software Co., Ltd |
Xu, Zhe | Beijing Institute of Technology |
Zhou, Mingliang | Beijing Xiaomi Mobile Software Co., Ltd |
Li, Zhibin (Alex) | University College London |
Zhang, Shiwu | University of Science and Technology of China |
Keywords: Legged Robots, Whole-Body Motion Planning and Control, Climbing Robots
Abstract: Quadrupedal robots are acquiring increasingly many real-world capabilities but are still primarily limited to locomotion tasks. To expand their task-level abilities to object acquisition, i.e., run-to-catch analogous to dogs catching frisbees, this paper develops a stereo-vision-based control pipeline for legged robots that allows balls to be caught dynamically while the robot is in motion. To achieve high-frame-rate tracking, we designed a ball that actively emits homogeneous infrared (IR) light and located the flying ball via binocular vision positioning using the onboard RealSense D450 camera with an additional IR bandpass filter. The camera was mounted on top of a 2-DoF head to gain a full view of the target ball. A state estimation module was developed to fuse the vision positioning, camera motor readings, the localization result of a RealSense T265 mounted on the back, and the legged odometry output. With the use of a ballistic model, we achieved a robust estimation of both the ball and robot positions in an inertial frame. Additionally, we developed a closed-loop catching strategy and employed trajectory prediction so that tracking and run-to-catch are performed simultaneously, which is critical for such highly dynamic and precise tasks. The proposed approach was validated through both static tests and dynamic catch experiments conducted on the CyberDog robot with a high success rate.
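The trajectory-prediction step can be illustrated with a drag-free ballistic model that solves for the crossing of a chosen catch height. Drag-free flight and all numbers below are assumptions; the paper fuses vision, odometry, and its ballistic model inside a state estimator.

    import numpy as np

    G = np.array([0.0, 0.0, -9.81])

    def predict_catch_point(p0, v0, z_catch):
        # Solve p0_z + v0_z * t + 0.5 * g * t^2 = z_catch for the descending
        # crossing, then evaluate the ballistic position there
        a, b, c = 0.5 * G[2], v0[2], p0[2] - z_catch
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None                     # ball never reaches that height
        t = (-b - np.sqrt(disc)) / (2.0 * a)
        return p0 + v0 * t + 0.5 * G * t * t, t

    point, t = predict_catch_point(np.array([0.0, 0.0, 1.5]),
                                   np.array([2.0, 0.5, 3.0]), z_catch=0.6)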
|
|
08:48-08:54, Paper MoAT8.4 | Add to My Program |
A Composite Control Strategy for Quadruped Robot by Integrating Reinforcement Learning and Model-Based Control |
|
Lyu, Shangke | Nanyang Technological University |
Zhao, Han | Beijing University of Posts and Telecommunications |
Wang, Donglin | Westlake University |
Keywords: Legged Robots, Motion Control, Reinforcement Learning
Abstract: Locomotion in the wild requires the quadruped robot to have strong capabilities in adaptation and robustness. Deep reinforcement learning (DRL) exhibits huge potential for environmental adaptability, while its stability issues remain open. On the other hand, the quadruped robot's dynamic model contains a lot of useful information that is beneficial to robust control. Combining DRL with model-based control may capture both strengths and holds promise for better robustness. In this paper, DRL and the proposed model-based controller are firmly integrated in a novel manner such that the model-based controller is able to rectify the gait commands generated by DRL based on the system dynamic model, so as to enhance the robustness of the quadruped robot against external disturbances. Besides, a potential energy function is introduced to achieve compliant contact. The stability of the proposed method is ensured in terms of passivity analysis. Several physical experiments are carried out to verify the performance of the proposed method.
|
|
08:54-09:00, Paper MoAT8.5 | Add to My Program |
Load Awareness: Sensorless Body Payload Sensing and Localization for Heavy Quadruped Robot |
|
Liu, Shaoxun | Shanghai Jiao Tong University |
Zhou, Shiyu | Shanghai Jiao Tong University |
Pan, Zheng | Shanghai Jiao Tong University |
Niu, Zhihua | Shanghai Jiao Tong University |
Wang, Rongrong | Shanghai Jiao Tong University |
Keywords: Legged Robots, Contact Modeling, Dynamics
Abstract: Heavy quadruped robots have great potential for overcoming obstacles, showing great promise for transportation in complex environments. Ground reaction force (GRF) is a crucial state variable for quadrupedal control. Most GRF observers are implemented on lightweight quadrupeds, with little consideration of loads on the body, whether static or shifting. However, load information is vital for heavy-duty quadrupeds applied in transportation tasks. In this paper, we decompose the whole-body dynamics into the body dynamics combined with individual floating single-leg dynamics, and we observe the virtual coupling effects between the body and the legs. Based on the observed coupling force and centroidal dynamics (CD), the GRF of a stance leg is obtained without knowledge of body weight, movement, or load information. Furthermore, we use the body dynamics and the observed virtual force to obtain the body's unknown payload. By reconstructing the moment balance equation, we obtain the payload's position with respect to the body frame. Compared to conventional quadrupedal GRF observation methods, this framework achieves higher observation accuracy on heavy quadrupeds without load and body information. Additionally, it enables real-time calculation of load magnitude and position.
|
|
09:00-09:06, Paper MoAT8.6 | Add to My Program |
Evolutionary-Based Online Motion Planning Framework for Quadruped Robot Jumping |
|
Yue, Linzhu | The Chinese University of Hong Kong |
Song, Zhitao | The Chinese University of Hong Kong |
Zhang, Hongbo | The Chinese University of Hong Kong |
Zhang, Lingwei | Hong Kong Centre for Logistics Robotics |
Zeng, Xuanqi | Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Legged Robots, Whole-Body Motion Planning and Control, Motion and Path Planning
Abstract: Offline evolutionary-based methodologies have supplied a successful motion planning framework for quadrupedal jumping. However, the time-consuming computation caused by massive population evolution in offline evolutionary-based jumping frameworks significantly limits their popularity in the quadrupedal field. This paper presents a time-friendly online motion planning framework based on meta-heuristic Differential Evolution (DE), Latin hypercube sampling, and the Configuration space (DLC). The DLC framework establishes a multidimensional optimization problem leveraging centroidal dynamics to determine the ideal trajectory of the center of mass (CoM) and the ground reaction forces (GRFs). The configuration space is introduced into the evolutionary optimization in order to condense the search region. Latin hypercube sampling offers more uniform initial populations for DE under limited sampling points, which accelerates escape from local minima. This work also constructs a collection of pre-motion trajectories as a warm start; when the objective state is in the neighborhood of a pre-motion state, the solving time is drastically reduced. The proposed methodology is successfully validated via real-robot experiments on online jumping trajectory optimization with different jumping motions (e.g., ordinary jumping, flipping, and spinning).
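Latin hypercube initialization, the ingredient highlighted above, can be sketched in a few lines. The bounds and population size are invented; the paper couples this with differential evolution over CoM/GRF trajectory parameters restricted to the configuration space.

    import numpy as np

    def latin_hypercube(n, lo, hi, rng):
        # One sample per axis-aligned bin: stratified, hence more uniform
        # than i.i.d. uniform sampling for the same number of points
        d = len(lo)
        samples = np.empty((n, d))
        for j in range(d):
            bins = (rng.permutation(n) + rng.random(n)) / n
            samples[:, j] = lo[j] + bins * (hi[j] - lo[j])
        return samples

    rng = np.random.default_rng(0)
    lo = np.array([0.0, -1.0, 0.1])   # hypothetical bounds on decision vars
    hi = np.array([1.0, 1.0, 2.0])
    population = latin_hypercube(30, lo, hi, rng)  # initial DE population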
|
|
09:06-09:12, Paper MoAT8.7 | Add to My Program |
Multi-IMU Proprioceptive Odometry for Legged Robots |
|
Yang, Shuo | Carnegie Mellon University |
Zhang, Zixin | Carnegie Mellon University |
Bokser, Benjamin | Boston Dynamics AI Institute |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Legged Robots, Sensor Fusion, Contact Modeling
Abstract: This paper presents a novel, low-cost proprioceptive sensing solution for legged robots with point feet to achieve accurate low-drift long-term position and velocity estimation. In addition to conventional sensors, including one body Inertial Measurement Unit (IMU) and joint encoders, we attach an additional IMU to each calf link of the robot just above the foot. An extended Kalman filter is used to fuse data from all sensors to estimate the robot's body and foot positions in the world frame. Using the additional IMUs, the filter is able to reliably determine foot contact modes and detect foot slips without tactile or pressure-based foot contact sensors. This sensing solution is validated in various hardware experiments, which confirm that it can reduce position drift by nearly an order of magnitude compared to conventional approaches with only a very modest increase in hardware and computational costs.
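The extra foot IMUs make the contact cue almost direct; the heuristic below shows the underlying idea only. The thresholds are invented, and the paper fuses this evidence inside an extended Kalman filter rather than hard-thresholding.

    import numpy as np

    def foot_contact(acc, gyro, acc_tol=1.5, gyro_tol=1.0):
        # During stance the foot is nearly stationary: the IMU's specific
        # force is close to gravity and its angular rate is close to zero
        still = (abs(np.linalg.norm(acc) - 9.81) < acc_tol
                 and np.linalg.norm(gyro) < gyro_tol)
        return "stance" if still else "swing"

    print(foot_contact(np.array([0.1, 0.2, 9.75]), np.array([0.05, 0.0, 0.02])))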
|
|
09:12-09:18, Paper MoAT8.8 | Add to My Program |
Design and Motion Guidelines for Quadrupedal Locomotion of Maximum Speed or Efficiency with Serial and Parallel Legs |
|
Machairas, Konstantinos | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Legged Robots, Task and Motion Planning, Mechanism Design
Abstract: Analytical expressions are derived for actuator demands in quadrupedal locomotion of constant speed and height by using a reduction from a trot/pace 6-bar model to a single-legged model and employing two widely used two-segmented leg architectures, the serial and the parallel. A method is developed that outputs optimal gait characteristics and leg designs for a robot to move with maximum efficiency or speed. Also, generic guidelines are presented, which answer questions such as: which speed should be selected for maximum efficiency, or which is the optimal leg architecture (serial/parallel) and leg length for maximum efficiency or speed.
|
|
09:18-09:24, Paper MoAT8.9 | Add to My Program |
Towards Legged Locomotion on Steep Planetary Terrain |
|
Valsecchi, Giorgio | Robotic System Lab, ETH |
Weibel, Cedric | ETH Zuerich |
Kolvenbach, Hendrik | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Space Robotics and Automation, Reinforcement Learning
Abstract: Scientific exploration of planetary bodies is an activity well-suited to robots. Unfortunately, the regions that are richer in potential discoveries, such as impact craters, caves, and volcanic terraces, are hard to access with wheeled robots. Recent advances in legged locomotion have shown the potential of the technology to overcome difficult terrain such as slopes and slippery surfaces. In this work, we focus on locomotion on sandy slopes, comparing baseline state-of-the-art walking policies with a novel crawling-based gait for quadrupedal robots. We fine-tuned a state-of-the-art locomotion framework and introduced hardware modifications to the robot ANYmal that enable walking on its knees. Moreover, we integrated a novel stability metric, the stability margin, into the training process to increase robustness in such conditions. We benchmarked the locomotion policies in simulation and in real-world experiments on Martian soil simulant. Results show an improvement in locomotion performance and a more robust gait at higher slope angles.
|
|
09:24-09:30, Paper MoAT8.10 | Add to My Program |
Dynamic Hybrid Locomotion and Jumping for Wheeled-Legged Quadrupeds |
|
Hosseini, Mojtaba | University of Bonn |
Rodriguez, Diego | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Legged Robots, Wheeled Robots, Whole-Body Motion Planning and Control
Abstract: Hybrid wheeled-legged quadrupeds have the potential to navigate challenging terrain with agility and speed and over long distances. However, obstacles can impede their progress by requiring the robots to either slow down to step over obstacles or modify their path to circumvent the obstacles. We propose a motion optimization framework for quadruped robots that incorporates non-steerable wheels and dynamic jumps, enabling them to perform hybrid wheeled-legged locomotion while overcoming obstacles without slowing down. Our approach involves a model predictive controller that uses a time-varying rigid body dynamics model of the robot, including legs and wheels, to track dynamic motions such as jumping. We also introduce a method for driving with minimal leg swings to reduce energy consumption by sparing the effort involved in lifting the wheels. Our method was tested successfully on the wheeled Mini Cheetah and the Unitree AlienGo robots. Further videos and results are available at https://www.ais.uni-bonn.de/%7ehosseini/iros2023
|
|
09:30-09:36, Paper MoAT8.11 | Add to My Program |
Quadrupedal Footstep Planning Using Learned Motion Models of a Black-Box Controller |
|
Taouil, Ilyass | Istituto Italiano Di Tecnologia |
Turrisi, Giulio | Istituto Italiano Di Tecnologia |
Schleich, Daniel | University of Bonn |
Barasuol, Victor | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Behnke, Sven | University of Bonn |
Keywords: Legged Robots, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Legged robots are increasingly entering new domains and applications, including search and rescue, inspection, and logistics. However, for such systems to be valuable in real-world scenarios, they must be able to autonomously and robustly navigate irregular terrains. In many cases, commercially available robots do not provide such abilities and can perform only blind locomotion. Furthermore, their controllers cannot be easily modified by the end-user, requiring a new and time-consuming control synthesis. In this work, we present a local motion planning pipeline that extends the capabilities of a black-box walking controller that is only able to track high-level reference velocities. More precisely, we learn a set of motion models for such a controller that map high-level velocity commands to Center of Mass (CoM) and footstep motions. We then integrate these models with a variant of the A* algorithm to plan the CoM trajectory, footstep sequences, and corresponding high-level velocity commands based on visual information, allowing the quadruped to safely traverse irregular terrains on demand.
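Since the pipeline plugs learned motion models into a variant of A*, a minimal generic A* sketch may help fix ideas. This is a hedged illustration, not the authors' code: the `neighbors` and `heuristic` callables are hypothetical stand-ins, and in the paper the successors would come from the learned motion models mapping velocity commands to CoM and footstep motions.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Generic A* search (sketch only; see lead-in for assumptions)."""
    tie = count()  # tie-breaker so heap entries never compare states
    open_set = [(heuristic(start), next(tie), 0.0, start, None)]
    parents, g = {}, {start: 0.0}
    while open_set:
        _, _, cost, node, parent = heapq.heappop(open_set)
        if node in parents:
            continue  # already expanded with a cheaper cost
        parents[node] = parent
        if node == goal:  # reconstruct the CoM/footstep sequence
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr, step_cost in neighbors(node):  # learned motion models here
            new_g = cost + step_cost
            if new_g < g.get(nbr, float("inf")):
                g[nbr] = new_g
                heapq.heappush(
                    open_set,
                    (new_g + heuristic(nbr), next(tie), new_g, nbr, node))
    return None  # no feasible sequence found
```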
|
|
09:36-09:42, Paper MoAT8.12 | Add to My Program |
An Efficient Paradigm for Feasibility Guarantees in Legged Locomotion (I) |
|
Abdalla, Abdelrahman | Italian Institute of Technology |
Focchi, Michele | Università Di Trento |
Orsolino, Romeo | Arrival Ltd |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Legged Robots, Dynamics, Kinematics, Motion and Path Planning
Abstract: Developing feasible body trajectories for legged systems on arbitrary terrains is a challenging task. In this article, we present a paradigm that makes it possible to design feasible Center of Mass (CoM) and body trajectories in an efficient manner. In our previous work (Orsolino et al., 2020), we introduced the notion of the two-dimensional feasible region, where static balance and the satisfaction of joint-torque limits were guaranteed whenever the projection of the CoM lay inside the proposed admissible region. In this work, we propose a general formulation of the improved feasible region that efficiently guarantees dynamic balance alongside the satisfaction of both joint-torque and kinematic limits. To incorporate the feasibility of the kinematic limits, we introduce an algorithm that computes the reachable region of the CoM. Furthermore, we propose an efficient planning strategy that utilizes the improved feasible region to design feasible CoM and body orientation trajectories. Finally, we validate the capabilities of the improved feasible region and the effectiveness of the proposed planning strategy, using simulations and experiments on the 90 kg hydraulically actuated quadruped and the 21 kg Aliengo robots.
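For orientation, the static-balance core of a feasible region can be summarized with standard contact-wrench reasoning (a hedged sketch, not the paper's exact formulation): with contact points \(\mathbf{p}_i\), contact forces \(\mathbf{f}_i\), total mass \(m\), and gravity \(\mathbf{g}\), a CoM position \(\mathbf{c}\) is statically feasible if forces exist satisfying

```latex
\begin{aligned}
&\sum_i \mathbf{f}_i + m\,\mathbf{g} = \mathbf{0}, \qquad
\sum_i \mathbf{p}_i \times \mathbf{f}_i + \mathbf{c} \times m\,\mathbf{g} = \mathbf{0}, \\
&\mathbf{f}_i \in \mathcal{F}_i \;\;\text{(friction cones)}, \qquad
|\tau_j| \le \tau_j^{\max} \;\;\text{(joint-torque limits)}.
\end{aligned}
```

The feasible region is then the projection of this constraint set onto the horizontal CoM coordinates; the improved region of the paper additionally accounts for dynamic balance and kinematic (reachability) limits.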
|
|
MoAT9 Regular session, 142ABC |
Add to My Program |
Motion and Path Planning I |
|
|
Chair: Hovakimyan, Naira | University of Illinois at Urbana-Champaign |
Co-Chair: Bezzo, Nicola | University of Virginia |
|
08:30-08:36, Paper MoAT9.1 | Add to My Program |
Locomotion Planning of a Truss Robot on Irregular Terrain |
|
Bae, Jangho | University of Pennsylvania |
Park, Inha | Hanyang University |
Yim, Mark | University of Pennsylvania |
Seo, TaeWon | Hanyang University |
Keywords: Cellular and Modular Robots, Motion and Path Planning
Abstract: This paper proposes a new locomotion algorithm for truss robots on irregular terrain, in particular for the Variable Topology Truss (VTT) system. The previous Polygon-based Random Tree (PRT) search algorithm for support polygon generation is extended to irregular terrain while considering friction and internal force limitations. By characterizing the terrain, unreachable areas are excluded from the search to increase efficiency. A one-step rolling motion primitive is generated based on the kinematics, statics, and constraints of the VTT. The locomotion planning is completed by transforming and connecting multiple motion primitives with respect to the desired support polygons. The algorithm's performance is verified by conducting simulations in multiple types of environments.
|
|
08:36-08:42, Paper MoAT9.2 | Add to My Program |
A Model Predictive Path Integral Method for Fast, Proactive, and Uncertainty-Aware UAV Planning in Cluttered Environments |
|
Higgins, Jacob | University of Virginia |
Mohammad, Nicholas | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, Aerial Systems: Mechanics and Control
Abstract: Current motion planning approaches for autonomous mobile robots often assume that the low-level controller of the system is able to track the planned motion with very high accuracy. In practice, however, tracking error can be affected by many factors and could lead to potential collisions when the robot must traverse a cluttered environment. To address this problem, this paper proposes a novel receding-horizon motion planning approach based on Model Predictive Path Integral (MPPI) control theory -- a flexible sampling-based control technique that requires minimal assumptions on vehicle dynamics and cost functions. This flexibility is leveraged to propose a motion planning framework that also considers a data-informed risk function. Using the MPPI algorithm as a motion planner also reduces the number of samples required by the algorithm, relaxing the hardware requirements for implementation. The proposed approach is validated through trajectory generation for a quadrotor unmanned aerial vehicle (UAV), where fast motion increases trajectory tracking error and can lead to collisions with nearby obstacles. Simulations and hardware experiments demonstrate that the MPPI motion planner proactively adapts to the obstacles that the UAV must negotiate, slowing down when near obstacles and moving quickly when away from them, eliminating collisions entirely while still producing lively motion.
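For context, the core MPPI update that such a planner builds on can be sketched in a few lines. This is a hedged sketch of the generic algorithm, not the authors' implementation; `dynamics` and `cost` are hypothetical callables, and the paper's data-informed risk function would enter through `cost`.

```python
import numpy as np

def mppi_step(x0, U, dynamics, cost, n_samples=256, sigma=0.5, lam=1.0):
    """One receding-horizon MPPI update (generic sketch).

    x0:       current state
    U:        nominal control sequence, shape (horizon, m)
    dynamics: f(x, u) -> next state
    cost:     c(x, u) -> stage cost; a data-informed risk term
              would be added here
    """
    horizon, m = U.shape
    noise = sigma * np.random.randn(n_samples, horizon, m)
    S = np.zeros(n_samples)
    for k in range(n_samples):  # roll out each perturbed control sequence
        x = x0
        for t in range(horizon):
            u = U[t] + noise[k, t]
            x = dynamics(x, u)
            S[k] += cost(x, u)
    # Path-integral update: exponentially weight rollouts by cost.
    w = np.exp(-(S - S.min()) / lam)
    w /= w.sum()
    return U + np.einsum("k,ktm->tm", w, noise)
```

Each call perturbs the nominal control sequence, rolls out the dynamics, and reweights the perturbations by exponentiated cost, which is what lets the planner trade speed against tracking risk near obstacles.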
|
|
08:42-08:48, Paper MoAT9.3 | Add to My Program |
Energy-Efficient Team Orienteering Problem in the Presence of Time-Varying Ocean Currents |
|
Mansfield, Ariella | University of Pennsylvania |
G. Macharet, Douglas | Universidade Federal De Minas Gerais |
Hsieh, M. Ani | University of Pennsylvania |
Keywords: Task and Motion Planning, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: Autonomous Marine Vehicles (AMVs) have gained interest for scientific and commercial applications, including pipeline and algae bloom monitoring, contaminant tracking, and ocean debris removal. The Team Orienteering Problem (TOP) is relevant in this context as Multi-Robot Systems (MRSs) allow for better coverage of the area of interest, simultaneous data collection at different locations, and an increase in the overall robustness and efficiency of the mission. However, route planning for AMVs in dynamic ocean environments is challenging due to the coupling of environmental and vehicle dynamics. We propose a multi-objective formulation that accounts for the trade-offs between visiting multiple task locations and energy consumption by the vehicles subject to a time budget. Different from existing approaches, our method is able to leverage time-varying ocean currents to improve the energy efficiency of resulting routes. We validate our approach experimentally by superimposing ocean flow models with benchmark instances of the TOP.
|
|
08:48-08:54, Paper MoAT9.4 | Add to My Program |
Multi-Agent Multi-Objective Ergodic Search Using Branch and Bound |
|
Kesarimangalam Srinivasan, Akshaya | Carnegie Mellon University |
Gutow, Geordan | Carnegie Mellon University |
Ren, Zhongqiang | Carnegie Mellon University |
Abraham, Ian | Yale University |
Vundurthy, Bhaskar | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Keywords: Task and Motion Planning, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Search and rescue applications often need multiple agents to complete a set of conflicting tasks. This paper studies a Multi-Agent Multi-Objective Ergodic Search (MA-MO-ES) approach to this problem where each objective or task is to cover a domain subject to an information map. The goal is to allocate tasks to agents so that all maps are covered ergodically. The combinatorial nature of task allocation makes it computationally expensive to solve optimally using brute force. Apart from a large number of possible allocations, computing the cost of a task allocation is itself a planning problem. To mitigate the computational challenge, we present a branch and bound-based algorithm with pruning techniques that reduce the number of allocations to be searched to find an optimal allocation. We also present an approach to leverage the similarity between information maps to further reduce computation. Extensive testing on 150 randomly generated test cases shows an order of magnitude improvement in runtime compared to an exhaustive brute force approach.
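A minimal branch-and-bound skeleton for such a task-allocation search might look as follows (a hedged sketch: `lower_bound` and `full_cost` are hypothetical placeholders, and in the paper evaluating an allocation's cost is itself an ergodic planning problem, with pruning bounds aided by map similarity):

```python
import heapq
from itertools import count

def branch_and_bound(agents, tasks, lower_bound, full_cost):
    """Exact task allocation by branch and bound (generic sketch)."""
    best_cost, best_alloc = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(tie), 0, {})]  # (bound, tie, next task idx, partial allocation)
    while heap:
        bound, _, i, alloc = heapq.heappop(heap)
        if bound >= best_cost:
            continue  # prune: this branch cannot beat the incumbent
        if i == len(tasks):
            c = full_cost(alloc)  # full cost of a complete allocation
            if c < best_cost:
                best_cost, best_alloc = c, alloc
            continue
        for a in agents:  # branch on which agent covers task i
            child = {**alloc, tasks[i]: a}
            b = lower_bound(child)
            if b < best_cost:
                heapq.heappush(heap, (b, next(tie), i + 1, child))
    return best_alloc, best_cost
```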
|
|
08:54-09:00, Paper MoAT9.5 | Add to My Program |
Leveraging Single-Goal Predictions to Improve the Efficiency of Multi-Goal Motion Planning with Dynamics |
|
Lu, Yuanjie | George Mason University |
Plaku, Erion | George Mason University |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning
Abstract: Multi-goal motion planning requires a robot to plan collision-free and dynamically-feasible motions to reach multiple goals, often in unstructured, obstacle-rich environments. This is challenging due to the complex dependencies between navigation and high-level reasoning, requiring the robot to explore a vast space of feasible motions and goal sequences. Our approach combines machine learning and Traveling Salesman Problem (TSP) solvers with sampling-based motion planning. Machine learning predicts distances and directions between locations, considering obstacles and robot dynamics, which the TSP solver uses to compute promising tours. Sampling-based motion planning expands a motion tree to follow the tours along the predicted directions. We demonstrate the effectiveness of our approach through experiments with vehicle and snake-like robot models operating in unstructured environments with multiple goals.
|
|
09:00-09:06, Paper MoAT9.6 | Add to My Program |
DynGMP: Graph Neural Network-Based Motion Planning in Unpredictable Dynamic Environments |
|
Zhang, Wenjin | Rutgers University |
Zang, Xiao | Rutgers University |
Huang, Lingyi | Rutgers University |
Sui, Yang | Rutgers University |
Yu, Jingjin | Rutgers University |
Chen, Yingying | Rutgers University |
Yuan, Bo | Rutgers University |
Keywords: Motion and Path Planning, Planning, Scheduling and Coordination, Deep Learning Methods
Abstract: Neural networks have already demonstrated attractive performance for solving motion planning problems, especially in static and predictable environments. However, efficient neural planners that can adapt to unpredictable dynamic environments, a highly demanded scenario in many practical applications, are still under-explored. To fill this research gap and enrich the existing motion planning approaches, in this paper we propose DynGMP, a graph neural network (GNN)-based planner that provides high-performance planning solutions in unpredictable dynamic environments. By fully leveraging prior exploration experience and minimizing the replanning cost incurred by environmental changes, DynGMP achieves high planning performance and efficiency simultaneously. Empirical evaluations across different environments show that DynGMP can achieve close to 100% success rate with fast planning speed and short path cost. Compared with existing non-learning and learning-based counterparts, DynGMP shows significant planning performance improvements, e.g., at least 2.7×, 2.2×, 2.4× and 2× faster planning speed with low path distance in four environments, respectively.
|
|
09:06-09:12, Paper MoAT9.7 | Add to My Program |
Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning |
|
Zhang, Xiaohan | SUNY Binghamton |
Zhu, Yifeng | The University of Texas at Austin |
Ding, Yan | SUNY Binghamton |
Jiang, Yuqian | University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Stone, Peter | University of Texas at Austin |
Zhang, Shiqi | SUNY Binghamton |
Keywords: Task and Motion Planning, Mobile Manipulation, Service Robotics
Abstract: In existing task and motion planning (TAMP) research, it is a common assumption that experts manually specify the state space for task-level planning. A well-developed state space enables the desirable distribution of limited computational resources between task planning and motion planning. However, developing such task-level state spaces can be non-trivial in practice. In this paper, we consider a long horizon mobile manipulation domain including repeated navigation and manipulation. We propose Symbolic State Space Optimization (S3O) for computing a set of abstracted locations and their 2D geometric groundings for generating task-motion plans in such domains. Our approach has been extensively evaluated in simulation and demonstrated on a real mobile manipulator working on clearing up dining tables. Results show the superiority of the proposed method over TAMP baselines in task completion rate and execution time.
|
|
09:12-09:18, Paper MoAT9.8 | Add to My Program |
A Fast and Map-Free Model for Trajectory Prediction in Traffics |
|
Xiang, Junhong | Chongqing University |
Zhang, Jingmin | No. 208 Research Institute of China Ordnance Industries |
Nan, Zhixiong | Chongqing University |
Keywords: Motion and Path Planning, Autonomous Agents, Deep Learning Methods
Abstract: Existing trajectory prediction methods have two shortcomings: (i) nearly all models rely on high-definition (HD) maps, yet map information is not always available in real traffic scenes, and HD map-building is expensive and time-consuming; and (ii) existing models usually improve prediction accuracy at the expense of computing efficiency, yet efficiency is crucial for many real applications. To address both issues, this paper proposes an efficient trajectory prediction model that does not depend on traffic maps. The core idea of our model is to encode each agent's spatial-temporal information in the first stage and to explore multi-agent spatial-temporal interactions in the second stage. By comprehensively utilizing an attention mechanism, LSTM, a graph convolution network, and a temporal transformer in the two stages, our model is able to learn rich dynamic and interaction information of all agents. Our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset. In addition, our model exhibits a faster inference speed than the baseline methods.
|
|
09:18-09:24, Paper MoAT9.9 | Add to My Program |
Local Non-Cooperative Games with Principled Player Selection for Scalable Motion Planning |
|
Chahine, Makram | Massachusetts Institute of Technology |
Firoozi, Roya | Stanford University |
Xiao, Wei | MIT |
Schwager, Mac | Stanford University |
Rus, Daniela | MIT |
Keywords: Motion and Path Planning, Multi-Robot Systems, Aerial Systems: Applications
Abstract: Game-theoretic motion planners are a powerful tool for the control of interactive multi-agent robot systems. Indeed, contrary to predict-then-plan paradigms, game-theoretic planners do not ignore the interactive nature of the problem, and simultaneously predict the behaviour of other agents while considering changes in their own policy. This, however, comes at the expense of computational complexity, especially as the number of agents considered grows. In fact, planning with more than a handful of agents can quickly become intractable, disqualifying game-theoretic planners as candidates for large-scale planning. In this paper, we propose a planning algorithm enabling the use of game-theoretic planners in robot systems with a large number of agents. Our planner exploits the locality of information and thus deploys local games with a selected subset of agents in a receding-horizon fashion to plan collision-avoiding trajectories. We propose five different principled schemes for selecting game participants and compare their collision avoidance performance. We observe that the use of Control Barrier Functions for priority ranking is a potent solution to the player selection problem for motion planning.
|
|
09:24-09:30, Paper MoAT9.10 | Add to My Program |
Target Attribute Perception Based UAV Real-Time Task Planning in Dynamic Environments |
|
He, Jinhong | Huazhong University of Science and Technology |
Sun, Zheyu | Huazhong University of Science and Technology |
Ming, Delie | Huazhong University of Science and Technology |
Cai, Chao | Huazhong University of Science and Technology |
Cao, Ningbo | Huazhong University of Science and Technology |
Keywords: Motion and Path Planning, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In this paper, a comprehensive solution for enabling an unmanned aerial vehicle (UAV) to autonomously fly through complex and dynamic environments is proposed. Since moving objects each carry unique attribute information, we propose a method that utilizes deep learning for 3D dynamic environment perception while taking into account limitations in computing resources. For safer dynamic avoidance, we first model the dynamic target and integrate it into a static occupancy grid map, and then construct a gradient field based on its attribute information. To achieve autonomous UAV flight in dynamic environments, we design an adaptive planning method based on gradient optimisation, which achieves significant computational savings by autonomously adjusting the planning frequency and using manually constructed gradients instead of maintaining a signed distance field (SDF). We have integrated the above approach into a customised quadrotor system and thoroughly tested it in the real world, verifying its flexibility in handling multiple objects with variable-speed motion in complex environments.
|
|
09:30-09:36, Paper MoAT9.11 | Add to My Program |
Simultaneous Spatial and Temporal Assignment for Fast UAV Trajectory Optimization Using Bilevel Optimization |
|
Chen, Qianzhong | University of Illinois Urbana-Champaign |
Cheng, Sheng | University of Illinois Urbana-Champaign |
Hovakimyan, Naira | University of Illinois at Urbana-Champaign |
Keywords: Constrained Motion Planning, Aerial Systems: Applications, Optimization and Optimal Control
Abstract: In this paper, we propose a framework for fast trajectory planning for unmanned aerial vehicles (UAVs). Our framework is reformulated from an existing bilevel optimization, in which the lower-level problem solves for the optimal trajectory with a fixed time allocation, whereas the upper-level problem updates the time allocation using analytical gradients. The lower-level problem incorporates the safety-set constraints (in the form of inequality constraints) and is cast as a convex quadratic program (QP). Our formulation modifies the lower-level QP by excluding the inequality constraints for the safety sets, which significantly reduces the computation time. The safety-set constraints are moved to the upper-level problem, where the feasible waypoints are updated together with the time allocation using analytical gradients enabled by OptNet. We validate our approach in simulations, where our method's computation time scales linearly with the number of safety sets, in contrast to the state of the art, which scales exponentially.
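Schematically, the bilevel structure described here can be written as follows (a hedged sketch: \(T\) is the time allocation, \(w\) the waypoints, and the exact cost and constraint matrices of the paper are not reproduced):

```latex
\begin{aligned}
\min_{T,\,w}\quad & J\bigl(T, w, x^{\star}(T, w)\bigr)
\quad \text{s.t.} \quad g_{\text{safe}}(w) \le 0, \\
\text{where}\quad & x^{\star}(T, w) \;=\; \arg\min_{x}\;
\tfrac{1}{2}\, x^{\top} Q(T)\, x
\quad \text{s.t.} \quad A(T)\, x = b(w).
\end{aligned}
```

Because the lower level is an equality-constrained QP, its solution map is differentiable, so analytical gradients of \(J\) with respect to \(T\) and \(w\) can be obtained by differentiating through the QP (e.g., with OptNet), while the safety-set inequalities \(g_{\text{safe}}(w) \le 0\) are enforced only at the upper level.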
|
|
09:36-09:42, Paper MoAT9.12 | Add to My Program |
A Non-Prehensile Object Transportation Framework with Adaptive Tilting Based on Quadratic Programming |
|
Subburaman, Rajesh | University of Naples Federico II |
Selvaggio, Mario | Università Degli Studi Di Napoli Federico II |
Ruggiero, Fabio | Università Di Napoli Federico II |
Keywords: Dexterous Manipulation, Optimization and Optimal Control, Intelligent Transportation Systems
Abstract: This work proposes an operational space control framework for non-prehensile object transportation using a robot arm. The control actions for the manipulator are computed by solving a quadratic programming (QP) problem considering the object's and manipulator's kinematic and dynamic constraints. Given the desired transportation trajectory, the proposed controller generates control commands for the robot to achieve the desired motion whilst preventing object slippage. In particular, the controller minimizes the occurrence of object slippage by adaptively regulating the tray orientation. The proposed approach has been extensively evaluated numerically with a 7-degree-of-freedom manipulator, and it is also verified and validated with a real experimental setup.
|
|
09:42-09:48, Paper MoAT9.13 | Add to My Program |
Dynamic Optimization Fabrics for Motion Generation (I) |
|
Spahn, Max | TU Delft |
Wisse, Martijn | Delft University of Technology |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Mobile Manipulation, Nonholonomic Motion Planning, Motion Control of Manipulators, Geometric Control
Abstract: Optimization fabrics are a geometric approach to real-time local motion generation, where motions are designed by the composition of several differential equations that exhibit a desired motion behavior. We generalize this framework to dynamic scenarios and non-holonomic robots and prove that fundamental properties can be conserved. We show that convergence to desired trajectories and avoidance of moving obstacles can be guaranteed using simple construction rules of the components. Additionally, we present the first quantitative comparisons between optimization fabrics and model predictive control and show that optimization fabrics can generate similar trajectories with better scalability, and thus, much higher replanning frequency (up to 500 Hz with a 7 degrees of freedom robotic arm). Finally, we present empirical results on several robots, including a non-holonomic mobile manipulator with 10 degrees of freedom and avoidance of a moving human, supporting the theoretical findings.
|
|
MoAT10 Regular session, 250ABC |
Add to My Program |
Learning for Manipulation I |
|
|
Chair: Lou, Xibai | University of Minnesota Twin Cities |
Co-Chair: Garcia, Ricardo | Inria |
|
08:30-08:36, Paper MoAT10.1 | Add to My Program |
Foldsformer: Learning Sequential Multi-Step Cloth Manipulation with Space-Time Attention |
|
Mo, Kai | Tsinghua University, Shenzhen International Graduate School |
Xia, Chongkun | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Deng, Yuhong | Tsinghua University |
Gao, Xue-Hai | Tsinghua University |
Liang, Bin | Tsinghua University |
Keywords: Deep Learning in Grasping and Manipulation, Perception-Action Coupling
Abstract: Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation, requiring a robot to perceive the cloth state and plan a sequence of chained actions leading to the desired state. Most previous works address this problem in a goal-conditioned way, where a goal observation must be given for each specific task and cloth configuration, which is neither practical nor efficient. Thus, we present a novel multi-step cloth manipulation planning framework named Foldsformer. Foldsformer can complete similar tasks with only a general demonstration and utilizes a space-time attention mechanism to capture the instruction information behind this demonstration. We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks and show that it significantly outperforms state-of-the-art approaches in simulation. Foldsformer can complete multi-step cloth manipulation tasks even when configurations of the cloth (e.g., size and pose) vary from those in the general demonstrations. Furthermore, our approach can be transferred from simulation to the real world without additional training or domain randomization. Despite training on rectangular cloths, we also show that our approach can generalize to unseen cloth shapes (T-shirts and shorts). Videos are available at https://sites.google.com/view/foldsformer.
|
|
08:36-08:42, Paper MoAT10.2 | Add to My Program |
GraNet: A Multi-Level Graph Network for 6-DoF Grasp Pose Generation in Cluttered Scenes |
|
Wang, Haowen | Shanghai Jiao Tong University |
Niu, Wanhao | Shanghai Jiao Tong University |
Zhuang, Chungang | Shanghai Jiao Tong University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: 6-DoF object-agnostic grasping in unstructured environments is a critical yet challenging task in robotics. Most current works use non-optimized approaches to sample grasp locations and learn spatial features without regard to the grasping task. This paper proposes GraNet, a graph-based grasp pose generation framework that translates a point cloud scene into multi-level graphs and propagates features through graph neural networks. By building graphs at the scene level, object level, and grasp point level, GraNet enhances feature embedding at multiple scales while progressively converging to the ideal grasping locations by learning. Our pipeline can thus characterize the spatial distribution of grasps in cluttered scenes, leading to a higher rate of effective grasping. Furthermore, we enhance the representation ability of scalable graph networks with a structure-aware attention mechanism that exploits local relations in graphs. Our method achieves state-of-the-art performance on the large-scale GraspNet-1Billion benchmark, especially in grasping unseen objects (+11.62 AP). The real robot experiment shows a high success rate in grasping scattered objects, verifying the effectiveness of the proposed approach in unstructured environments.
|
|
08:42-08:48, Paper MoAT10.3 | Add to My Program |
Modular Neural Network Policies for Learning In-Flight Object Catching with a Robot Hand-Arm System |
|
Hu, Wenbin | University of Edinburgh |
Acero, Fernando | University of Edinburgh |
Triantafyllidis, Eleftherios | The University of Edinburgh |
Liu, Zhaocheng | The University of Edinburgh |
Li, Zhibin (Alex) | University College London |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Perception-Action Coupling
Abstract: We present a modular framework designed to enable a robot hand-arm system to learn how to catch flying objects, a task that requires fast, reactive, and accurately-timed robot motions. Our framework consists of five core modules: (i) an object state estimator that learns object trajectory prediction, (ii) a catching pose quality network that learns to score and rank object poses for catching, (iii) a reaching control policy trained to move the robot hand to pre-catch poses, (iv) a grasping control policy trained to perform soft catching motions for safe and robust grasping, and (v) a gating network trained to synthesize the actions given by the reaching and grasping policy. The former two modules are trained via supervised learning and the latter three use deep reinforcement learning in a simulated environment. We conduct extensive evaluations of our framework in simulation for each module and the integrated system, to demonstrate high success rates of in-flight catching and robustness to perturbations and sensory noise. Whilst only simple cylindrical and spherical objects are used for training, the integrated system shows successful generalization to a variety of household objects that are not used in training.
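The composition of the five modules can be pictured with a short control-step sketch (hedged: the module interfaces and the blending form of the gating network are illustrative assumptions, not the authors' architecture):

```python
def control_step(object_track, robot_state,
                 predictor, ranker, reach_pi, grasp_pi, gate):
    """One control step composing the five learned modules
    (generic sketch; see lead-in for assumptions)."""
    traj = predictor(object_track)               # (i) predict object trajectory
    catch_pose = ranker(traj)                    # (ii) score/rank catching poses
    a_reach = reach_pi(robot_state, catch_pose)  # (iii) pre-catch reaching action
    a_grasp = grasp_pi(robot_state, catch_pose)  # (iv) soft catching motion
    w = gate(robot_state, catch_pose)            # (v) blend both action streams
    return w * a_reach + (1.0 - w) * a_grasp
```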
|
|
08:48-08:54, Paper MoAT10.4 | Add to My Program |
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation |
|
Kim, Junghyun | Seoul National University |
Kang, Gi-Cheon | Seoul National University |
Kim, Jaein | Seoul National University |
Shin, Suyeon | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Multi-Modal Perception for HRI, Deep Learning Methods, Autonomous Agents
Abstract: Language-Guided Robotic Manipulation (LGRM) is a challenging task as it requires a robot to understand human instructions to manipulate everyday objects. Recent approaches in LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting to manipulation environments. This results in a performance drop due to a substantial domain gap between the pre-training and real-world data. A straightforward solution is to collect additional training data, but the cost of human annotation is exorbitant. In this paper, we propose Grounding Vision to Ceaselessly Created Instructions (GVCCI), a lifelong learning framework for LGRM, which continuously learns VG without human supervision. GVCCI iteratively generates synthetic instructions via object detection and trains the VG model with the generated data. We validate our framework in offline and online settings across diverse environments on different VG models. Experimental results show that accumulating synthetic data from GVCCI leads to a steady improvement in VG by up to 56.7% and improves the resulting LGRM by up to 29.4%. Furthermore, qualitative analysis shows that the unadapted VG model often fails to find correct objects due to a strong bias learned from the pre-training data. Finally, we introduce a novel VG dataset for LGRM, consisting of nearly 252k triplets of image-object-instruction from diverse manipulation environments.
|
|
08:54-09:00, Paper MoAT10.5 | Add to My Program |
Bag All You Need: Learning a Generalizable Bagging Strategy for Heterogeneous Objects |
|
Bahety, Arpit | Columbia University |
Jain, Shreeya | Columbia University |
Ha, Huy | Columbia University |
Hager, Nathalie | Columbia University |
Burchfiel, Benjamin | Toyota Research Institute |
Cousineau, Eric | Toyota Research Institute |
Feng, Siyuan | Toyota Research Institute |
Song, Shuran | Columbia University |
Keywords: Deep Learning in Grasping and Manipulation, Manipulation Planning, Service Robotics
Abstract: We introduce a practical robotics solution for the task of heterogeneous bagging, requiring the placement of multiple rigid and deformable objects into a deformable bag. This is a difficult task as it features complex interactions between multiple highly deformable objects under limited observability. To tackle these challenges, we propose a robotic system consisting of two learned policies: a rearrangement policy that learns to place multiple rigid objects and fold deformable objects in order to achieve desirable pre-bagging conditions, and a lifting policy to infer suitable grasp points for bi-manual bag lifting. We evaluate these learned policies on a real-world three-arm robot platform that achieves a 70% heterogeneous bagging success rate with novel objects. To facilitate future research and comparison, we also develop a novel heterogeneous bagging simulation benchmark that will be made publicly available.
|
|
09:00-09:06, Paper MoAT10.6 | Add to My Program |
Multi-Source Fusion for Voxel-Based 7-DoF Grasping Pose Estimation |
|
Qiu, Junning | Xi'an Jiaotong University |
Wang, Fei | Xi'an Jiaotong University |
Dang, Zheng | EPFL |
Keywords: Deep Learning in Grasping and Manipulation, Visual Learning, Deep Learning Methods
Abstract: In this work, we tackle the problem of 7-DoF grasping pose estimation (6-DoF plus the opening width of a parallel-jaw gripper) from point cloud data, which is a fundamental task in robotic manipulation. Most existing methods adopt 3D voxel CNNs as the backbone for their efficiency in handling unordered point cloud data. However, we found that these approaches overlook detailed information in the point clouds, resulting in decreased performance. Through our analysis, we identified quantization loss and boundary information loss within 3D convolutional layers as the primary causes of this issue. To address these challenges, we introduced two novel branches: one adds an extra positional encoding operation to preserve details and unique features for each point, and the other uses a 2D CNN operating on the range-based image, which better aggregates boundary information on a continuous 2D domain. To integrate these branches with the original branch, we introduced a novel multi-source fusion gating mechanism to aggregate features. Our approach achieved state-of-the-art performance on the GraspNet-1Billion benchmark and demonstrated high success rates in real robotic experiments across different scenes. Our work has the potential to improve the performance of robotic grasping systems and contribute to the field of robotics.
|
|
09:06-09:12, Paper MoAT10.7 | Add to My Program |
VL-Grasp: A 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes |
|
Lu, Yuhao | Tsinghua University |
Fan, Yixuan | Tsinghua University |
Deng, Beixing | Tsinghua University |
Liu, Fangfu | Tsinghua University |
Li, Yali | Tsinghua University |
Wang, Shengjin | Tsinghua University |
Keywords: Deep Learning in Grasping and Manipulation, Multi-Modal Perception for HRI, Data Sets for Robotic Vision
Abstract: Robotic grasping faces new challenges in human-robot interaction scenarios. We consider the task in which the robot grasps a target object designated by a human's language directives. The robot not only needs to locate the target based on vision-and-language information, but also needs to predict reasonable grasp pose candidates at various views and postures. In this work, we propose a novel interactive grasp policy, named Visual-Lingual-Grasp (VL-Grasp), to grasp the target specified by human language. First, we build a new challenging visual grounding dataset to provide functional training data for robotic interactive perception in indoor environments. Second, we propose a 6-DoF interactive grasp policy combining visual grounding and 6-DoF grasp pose detection to extend the universality of interactive grasping. Third, we design a grasp pose filter module to enhance the performance of the policy. Experiments demonstrate the effectiveness and extensibility of VL-Grasp in the real world. VL-Grasp achieves a success rate of 72.5% in different indoor scenes. The code and dataset are available at https://github.com/luyh20/VL-Grasp.
|
|
09:12-09:18, Paper MoAT10.8 | Add to My Program |
QDP: Learning to Sequentially Optimise Quasi-Static and Dynamic Manipulation Primitives for Robotic Cloth Manipulation |
|
Blanco-Mulero, David | Aalto University |
Alcan, Gokhan | Aalto University |
Abu-Dakka, Fares | Technische Universität München |
Kyrki, Ville | Aalto University |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Manipulation Planning
Abstract: Pre-defined manipulation primitives are widely used for cloth manipulation. However, cloth properties such as stiffness or density can highly impact the performance of these primitives. Although existing solutions have tackled the parameterisation of pick and place locations, the effect of factors such as the velocity or trajectory of quasi-static and dynamic manipulation primitives has been neglected. Choosing appropriate values for these parameters is crucial to cope with the range of materials present in household cloth objects. To address this challenge, we introduce the Quasi-Dynamic Parameterisable (QDP) method, which optimises parameters such as the motion velocity in addition to the pick and place positions of quasi-static and dynamic manipulation primitives. In this work, we leverage the framework of Sequential Reinforcement Learning to sequentially decouple the parameters that compose the primitives. To evaluate the effectiveness of the method, we focus on the task of cloth unfolding with a robotic arm in simulation and real-world experiments. Our results in simulation show that selecting optimal parameters for the primitives can improve performance by 20% compared to sub-optimal ones. Real-world results demonstrate the advantage of modifying the velocity and height of manipulation primitives for cloths with different mass, stiffness, shape, and size. Supplementary material, videos, and code can be found at https://sites.google.com/view/qdp-srl.
|
|
09:18-09:24, Paper MoAT10.9 | Add to My Program |
Robust Visual Sim-To-Real Transfer for Robotic Manipulation |
|
Garcia, Ricardo | Inria |
Strudel, Robin | INRIA Paris |
Chen, Shizhe | Inria |
Arlaud, Etienne | INRIA |
Laptev, Ivan | INRIA |
Schmid, Cordelia | Inria |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Transfer Learning
Abstract: Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR). While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks. In particular, we propose an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, variations of object colors and camera parameters. Notably, we demonstrate that DR parameters have similar impact on our off-line proxy task and on-line policies. We, hence, use off-line optimized DR parameters to train visuomotor policies in simulation and directly apply such policies to a real robot. Our approach achieves 93% success rate on average when tested on a diverse set of challenging manipulation tasks. Moreover, we evaluate the robustness of policies to visual variations in real scenes and show that our simulator-trained policies outperform policies learned using real but limited data. Code, simulation environment, real robot datasets and trained models are available at https://www.di.ens.fr/willow/research/robust_s2r/.
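To illustrate, per-episode domain randomization typically amounts to drawing a fresh parameter set before each rendered episode. The sketch below is a hedged illustration: the parameter names and ranges are assumptions, not the paper's optimized values (which are selected via the off-line cube-localization proxy task).

```python
import random

def sample_dr_params(rng=random):
    """Draw one set of domain-randomization parameters (generic sketch)."""
    return {
        "texture_id": rng.randrange(1000),           # random surface texture
        "light_intensity": rng.uniform(0.3, 1.5),    # lighting randomization
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "object_hue_shift": rng.uniform(-0.1, 0.1),  # object color variation
        "camera_fov_deg": rng.uniform(55.0, 65.0),   # camera intrinsics jitter
        "camera_pos_noise_m": rng.uniform(0.0, 0.02),
    }

# Each simulated training episode is rendered with a fresh draw:
params = sample_dr_params()
```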
|
|
09:24-09:30, Paper MoAT10.10 | Add to My Program |
Multi-Dimensional Deformable Object Manipulation Using Equivariant Models |
|
Fu, Tianyu | East China University of Science and Technology |
Tang, Yang | East China University of Science and Technology |
Wu, Tianyu | East China University of Science and Technology |
Xia, Xiaowu | East China University of Science and Technology |
Wang, Jianrui | East China University of Science and Technology |
Zhao, Chaoqiang | East China University of Science and Technology |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Imitation Learning
Abstract: Manipulating deformable objects, such as ropes (1D), fabrics (2D), and bags (3D), is a very challenging problem in robotic research, since a deformable object has a high-dimensional physical state and nonlinear dynamics. Compared with single-dimensional deformable objects, multi-dimensional object manipulation suffers from the difficulty of correctly recognizing the characteristics of the object and making accurate action decisions for deformable objects of various dimensions. Some methods have been proposed that use neural networks to rearrange deformable objects in all dimensions, but they are not accurate in predicting the motion of the robot, as they only consider equivariance for the picking action. To address this problem, we present a novel Transporter Network encoded and decoded with equivariance to generalize to different picking and placing positions. Additionally, we propose an equivariant goal-conditioned model to enable the robot to manipulate deformable objects into flexible configurations without artificially marked visual anchors for the target position. Finally, experiments in Deformable-Ravens and the real world demonstrate that our equivariant models are more sample-efficient than the traditional Transporter Network. The video is available at https://youtu.be/SH4aV2f0wt0.
|
|
09:30-09:36, Paper MoAT10.11 | Add to My Program |
Adversarial Object Rearrangement in Constrained Environments with Heterogeneous Graph Neural Networks |
|
Lou, Xibai | University of Minnesota Twin Cities |
Yu, Houjian | University of Minnesota, Twin Cities |
Worobel, Ross | University of Minnesota |
Yang, Yang | University of Minnesota |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception, Task and Motion Planning
Abstract: Adversarial object rearrangement in the real world (e.g., previously unseen or oversized items in kitchens and stores) could benefit from understanding task scenes, which inherently entail heterogeneous components such as current objects, goal objects, and environmental constraints. The semantic relationships among these components are distinct from each other and crucial for multi-skilled robots to perform efficiently in everyday scenarios. We propose a hierarchical robotic manipulation system that learns the underlying relationships and maximizes the collaborative power of its diverse skills (e.g., pick-place, push) for rearranging adversarial objects in constrained environments. The high-level coordinator employs a heterogeneous graph neural network (HetGNN), which reasons about the current objects, goal objects, and environmental constraints; the low-level 3D Convolutional Neural Network-based actors execute the action primitives. Our approach is trained entirely in simulation, and achieved an average success rate of 87.88% and a planning cost of 12.82 in real-world experiments, surpassing all baseline methods. Supplementary material is available at https://sites.google.com/umn.edu/versatile-rearrangement.
|
|
09:36-09:42, Paper MoAT10.12 | Add to My Program |
Probabilistic Slide-Support Manipulation Planning in Clutter |
|
Shusei, Nagato | Osaka University |
Motoda, Tomohiro | National Institute of Advanced Industrial Science and Technology |
Nishi, Takao | Osaka University |
Petit, Damien | Osaka University |
Kiyokawa, Takuya | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Deep Learning in Grasping and Manipulation, Bimanual Manipulation, Manipulation Planning
Abstract: To safely and efficiently extract an object from clutter, this paper presents a bimanual manipulation planner in which one hand of the robot is used to slide the target object out of the clutter while the other hand is used to support the surrounding objects to prevent the clutter from collapsing. Our method uses a neural network to predict the physical behavior of the clutter when the target object is moved. We generate the most efficient action based on Monte Carlo tree search. The grasping and sliding actions are planned to minimize the number of motion sequences needed to pick the target object. In addition, the object to be supported is chosen to minimize the position change of the surrounding objects. Experiments with a real bimanual robot confirmed that the robot could retrieve the target object while reducing the total number of motion sequences and improving safety.
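A compact view of the Monte Carlo tree search loop used for this kind of action selection is sketched below (hedged: `step` and `simulate` are hypothetical placeholders; in the paper the rollout evaluation would come from the neural network predicting how the clutter reacts when the target is slid):

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def mcts(root_state, actions, step, simulate, iters=500, c=1.4):
    """Minimal MCTS over slide/support actions (generic sketch)."""
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend fully expanded nodes by UCB1.
        while node.children and len(node.children) == len(actions):
            node = max(node.children,
                       key=lambda n: n.value / (n.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1)
                                       / (n.visits + 1e-9)))
        # 2. Expansion: try one untried action.
        if len(node.children) < len(actions):
            a = actions[len(node.children)]
            node.children.append(Node(step(node.state, a), parent=node))
            node = node.children[-1]
        # 3. Rollout: learned physics predictor scores the outcome.
        reward = simulate(node.state)
        # 4. Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits)  # most-visited action
```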
|
|
09:42-09:48, Paper MoAT10.13 | Add to My Program |
GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning |
|
Niu, Yaru | Carnegie Mellon University |
Jin, Shiyu | Baidu |
Zhang, Zeqing | The University of Hong Kong |
Zhu, Jiacheng | Carnegie Mellon University |
Zhao, Ding | Carnegie Mellon University |
Zhang, Liangjun | Baidu |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: In this work, we first formulate the problem of robotic water scooping using goal-conditioned reinforcement learning. This task is particularly challenging due to the complex dynamics of fluids and the need to achieve multi-modal goals: the policy is required to reach both position goals and water amount goals, which leads to a large, convoluted goal state space. To overcome these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create a curriculum throughout the learning process. As a result, our proposed method outperforms the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl-scooping and bucket-scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method can efficiently adapt to noisy real-robot water-scooping scenarios with diverse physical configurations and unseen settings, demonstrating superior efficacy and generalizability. The videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
|
|
MoAT11 Regular session, 251ABC |
Add to My Program |
Aerial Systems - Applications I |
|
|
Chair: Min, Byung-Cheol | Purdue University |
Co-Chair: Lee, Jongseok | German Aerospace Center |
|
08:30-08:36, Paper MoAT11.1 | Add to My Program |
Auto Filmer: Autonomous Aerial Videography under Human Interaction |
|
Zhang, Zhiwei | Zhejiang University |
Zhong, Yuhang | NanKai Unviersity |
Guo, Junlong | Zhejiang University |
Wang, Qianhao | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Human-Aware Motion Planning, Aerial Systems: Perception and Autonomy
Abstract: The advance of unmanned aerial vehicles (UAVs) has enabled customers and directors to film from the air. However, operating a drone to capture the desired footage of a moving object is hard to achieve. This letter proposes an autonomous aerial videography system that integrates customized shots and drone dynamics. We design a user-friendly interface for the operator to create the desired shot in real-time. The shot information is then transmitted to the kinodynamic path search process, in which a safe shooting path is evaluated. Later, feasible regions and safe flight corridors are constructed for safety and visibility. Finally, a joint optimization is carried out to generate the trajectories of the quadrotor and the gimbal to maintain the required image composition. Extensive simulation and real-world experiments validate the effectiveness of our method.
|
|
08:36-08:42, Paper MoAT11.2 | Add to My Program |
New Era in Cultural Heritage Preservation: Cooperative Aerial Autonomy for Fast Digitalization of Difficult-To-Access Interiors of Historical Monuments (I) |
|
Petráček, Pavel | Czech Technical University in Prague |
Krátký, Vít | Czech Technical University in Prague |
Baca, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Petrlik, Matej | Czech Technical University in Prague, Faculty of Electrical Engi |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Multi-Robot Systems
Abstract: Digital documentation of large interiors of historical buildings is an exhausting task, since most of the areas of interest are beyond typical human reach. We advocate the use of autonomous teams of multi-rotor Unmanned Aerial Vehicles (UAVs) to speed up the documentation process by several orders of magnitude, while allowing for a repeatable, accurate, and condition-independent solution capable of precise collision-free operation at great heights. The proposed multi-robot approach allows for performing tasks requiring dynamic scene illumination in large-scale real-world scenarios, a process previously applicable only in small-scale laboratory-like conditions. Extensive experimental analyses range from single-UAV imaging to specialized lighting techniques requiring accurate coordination of multiple UAVs. The system's robustness is demonstrated in more than two hundred autonomous flights in fifteen historical monuments requiring superior safety while lacking access to external localization. This unique experimental campaign, conducted in cooperation with restorers and conservators, brought numerous lessons transferable to other safety-critical robotic missions in documentation and inspection tasks.
|
|
08:42-08:48, Paper MoAT11.3 | Add to My Program |
Tight Collision Probability for UAV Motion Planning in Uncertain Environment |
|
Liu, Tianyu | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Gao, Fei | Zhejiang University |
Pan, Jia | University of Hong Kong |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Collision Avoidance
Abstract: Operating unmanned aerial vehicles (UAVs) in complex environments that feature dynamic obstacles and external disturbances poses significant challenges, primarily due to the inherent uncertainty in such scenarios. Additionally, inaccurate robot localization and modeling errors further exacerbate these challenges. Recent research on UAV motion planning, largely developed for static environments, is unable to cope with rapidly changing surroundings, resulting in trajectories that may not be feasible. Moreover, previous approaches that have addressed dynamic obstacles or external disturbances in isolation are insufficient to handle the complexities of such environments. This paper proposes a reliable motion planning framework for UAVs, integrating various uncertainties into a chance constraint that characterizes the uncertainty in a probabilistic manner. The chance constraint provides a probabilistic safety certificate by calculating the collision probability between the robot's Gaussian-distributed forward reachable set and the states of obstacles. To reduce the conservatism of the planned trajectory, we propose a tight upper bound on the collision probability and evaluate it both exactly and approximately. The approximated solution is used to generate motion primitives as a reference trajectory, while the exact solution is leveraged to iteratively optimize the trajectory for better results. Our method is thoroughly tested in simulation and real-world experiments, verifying its reliability and effectiveness in uncertain environments.
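As a point of reference, the standard Gaussian chance-constraint evaluation against a linearized obstacle face is sketched below (a hedged illustration of the general technique only; the paper's contribution is a tighter upper bound over the full forward reachable set, which is not reproduced here):

```python
import numpy as np
from math import erf, sqrt

def halfspace_violation_prob(mu, Sigma, a, b):
    """P(a^T x > b) for x ~ N(mu, Sigma), i.e. the probability that a
    Gaussian-distributed robot state crosses the obstacle half-space
    boundary a^T x = b (generic sketch; see lead-in for assumptions)."""
    mean = float(a @ mu)               # mean of the scalar a^T x
    std = sqrt(float(a @ Sigma @ a))   # its standard deviation
    z = (b - mean) / std
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return 1.0 - Phi

# Example: 2D position with isotropic uncertainty, obstacle face x >= 2.
p = halfspace_violation_prob(np.array([0.0, 0.0]),
                             0.25 * np.eye(2),
                             np.array([1.0, 0.0]), 2.0)
```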
|
|
08:48-08:54, Paper MoAT11.4 | Add to My Program |
Dodging Like a Bird: An Inverted Dive Maneuver Taking by Lifting-Wing Multicopters |
|
Gao, Wenhan | Beihang University |
Wang, Shuai | Beihang University |
Quan, Quan | Beihang University |
Keywords: Aerial Systems: Applications, Motion and Path Planning
Abstract: It is crucial for hybrid unmanned aerial vehicles, such as lifting-wing multicopters, to plan a continuous, smooth, and collision-free trajectory to avoid obstacles. Unlike quadcopters, which typically work in indoor environments, lifting-wing multicopters typically fly at a high altitude with a high cruising speed, requiring higher maneuverability in the vertical direction. Inspired by birds, lifting-wing multicopters can take an inverted flight maneuver to gain more maneuverability than the corresponding multicopter owing to the additional lifting wing. In this paper, a rotation-aware collision-free motion planning strategy is proposed that takes aerodynamics into consideration and allows lifting-wing multicopters to fly at large rotation angles, even in inverted postures. Specifically, a collision-free state sequence is found using rotation-aware primitives by solving a graph search problem. The sequence is then refined with B-spline into smooth trajectories to be tracked by the differential flatness-based controller for lifting-wing multicopters. We analyze the proposed motion planning algorithm in different scenarios and demonstrate the feasibility of the generated trajectories in simulation and real-world experiments.
|
|
08:54-09:00, Paper MoAT11.5 | Add to My Program |
Model-Based Planning and Control for Terrestrial-Aerial Bimodal Vehicles with Passive Wheels |
|
Zhang, Ruibin | Zhejiang University |
Lin, Junxiao | Zhejiang University |
Wu, Yuze | Zhejiang University |
Gao, Yuman | Zhejiang University |
Wang, Chi | Zhejiang University |
Xu, Chao | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Motion Control
Abstract: Terrestrial and aerial bimodal vehicles have gained widespread attention due to their cross-domain maneuverability. Nevertheless, their bimodal dynamics significantly increase the complexity of motion planning and control, thus hindering robust and efficient autonomous navigation in unknown environments. To resolve this issue, we develop a model-based planning and control framework for terrestrial aerial bimodal vehicles. This work begins by deriving a unified dynamic model and the corresponding differential flatness. Leveraging differential flatness, an optimization-based trajectory planner is proposed, which takes into account both solution quality and computational efficiency. Moreover, we design a tracking controller using nonlinear model predictive control based on the proposed unified dynamic model to achieve accurate trajectory tracking and smooth mode transition. We validate our framework through extensive benchmark comparisons and experiments, demonstrating its effectiveness in terms of planning quality and control performance.
|
|
09:00-09:06, Paper MoAT11.6 | Add to My Program |
Polynomial-Based Online Planning for Autonomous Drone Racing in Dynamic Environments |
|
Wang, Qianhao | Zhejiang University |
Wang, Dong | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Alan | Fan'gang |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Task and Motion Planning
Abstract: In recent years, there has been noteworthy progress in autonomous drone racing. However, the primary focus has been on minimizing execution times, while scant attention is given to the challenges of dynamic environments. The high-speed nature of racing scenarios, coupled with the potential for unforeseeable environmental alterations, imposes stringent requirements on online replanning and its timeliness. For racing in dynamic environments, we propose an online replanning framework with an efficient polynomial trajectory representation. We trade off between aggressive speed and flexible obstacle avoidance based on an optimization approach. Additionally, to ensure safety and precision when crossing intermediate racing waypoints, we formulate this demand as hard constraints during planning. For dynamic obstacles, parallel multi-topology trajectory planning is designed, based on engineering considerations, to prevent racing time loss due to local optima. The framework was integrated into a quadrotor system and demonstrated at the DJI Robomaster Intelligent UAV Championship, where it completed the racing track and placed first, finishing in less than half the time of the second-place entry.
|
|
09:06-09:12, Paper MoAT11.7 | Add to My Program |
Autonomous Power Line Inspection with Drones Via Perception-Aware MPC |
|
Xing, Jiaxu | ETH Zurich |
Cioffi, Giovanni | University of Zurich |
Hidalgo Carrio, Javier | University of Zurich and ETH Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: Drones have the potential to revolutionize power line inspection by increasing productivity, reducing inspection time, improving data quality, and eliminating the risks for human operators. Current state-of-the-art systems for power line inspection have two shortcomings: (i) control is decoupled from perception and needs accurate information about the location of the power lines and masts; (ii) obstacle avoidance is decoupled from the power line tracking, which results in poor tracking in the vicinity of the power masts, and, consequently, in decreased data quality for visual inspection. In this work, we propose a model predictive controller (MPC) that overcomes these limitations by tightly coupling perception and action. Our controller generates commands that maximize the visibility of the power lines while, at the same time, safely avoiding the power masts. For power line detection, we propose a lightweight learning-based detector that is trained only on synthetic data and is able to transfer zero-shot to real-world power line images. We validate our system in simulation and real-world experiments on a mock-up power line infrastructure. We release our code and datasets to the public.
|
|
09:12-09:18, Paper MoAT11.8 | Add to My Program |
A Perching and Tilting Aerial Robot for Precise and Versatile Power Tool Work on Vertical Walls |
|
Dautzenberg, Roman | ETH Zürich |
Küster, Timo | ETH Zürich |
Mathis, Timon | ETH Zürich |
Roth, Yann | ETH Zürich |
Steinauer, Curdin | ETH Zürich |
Käppeli, Gabriel | ETH Zürich |
Santen, Julian | ETH Zürich |
Arranhado, Alina | ETH Zürich |
Biffar, Friederike | ETH Zürich |
Kötter, Till | ETH Zürich |
Lanegger, Christian | ETH Zurich |
Allenspach, Mike | ETH Zürich |
Siegwart, Roland | ETH Zurich |
Bähnemann, Rik | ETH Zürich |
Keywords: Aerial Systems: Applications, Robotics and Automation in Construction, Actuation and Joint Mechanisms
Abstract: Drilling, grinding, and setting anchors on vertical walls are fundamental processes in everyday construction work. Performing these tasks manually is error-prone, potentially dangerous, and laborious at height. Today, heavy mobile ground robots can perform automatic power tool work. However, aerial vehicles could be deployed in untraversable environments and reach inaccessible places. Existing drone designs do not provide the large forces, payload, and high precision required for using power tools. This work presents the first aerial robot design to perform versatile manipulation tasks on vertical concrete walls with continuous forces of up to 150 N. The platform combines a quadrotor with active suction cups for perching on walls and a lightweight, tiltable linear tool table. This combination minimizes weight by using the propulsion system for flying, surface alignment, and feed during manipulation, and it allows precise positioning of the power tool. We evaluate our design in a concrete drilling application, a challenging construction process that requires high forces, accuracy, and precision. In 30 trials, our design accurately pinpoints a target position despite perching imprecision. Nine visually guided drilling experiments demonstrate a drilling precision of 6 mm without further automation. Aside from drilling, we also demonstrate the versatility of the design by setting an anchor into concrete.
|
|
09:18-09:24, Paper MoAT11.9 | Add to My Program |
Resource-Constrained Station-Keeping for Latex Balloons Using Reinforcement Learning |
|
Saunders, Jack | University of Bath |
Prenevost, Loïc | Lux Aerobot |
Şimşek, Özgür | University of Bath |
Hunter, Alan Joseph | University of Bath |
Li, Wenbin | University of Bath |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Reinforcement Learning
Abstract: High-altitude balloons have proved useful for ecological aerial surveys, atmospheric monitoring, and communication relays. However, due to weight and power constraints, there is a need to investigate alternate modes of propulsion to navigate in the stratosphere. Very recently, reinforcement learning has been proposed as a control scheme to maintain balloons in the region of a fixed location, facilitated through diverse opposing wind-fields at different altitudes. Although air-pump-based station keeping has been explored, there is no research on the control problem for venting and ballasting actuated balloons, which are commonly used as a low-cost alternative. We show how reinforcement learning can be used for this type of balloon. Specifically, we use the soft actor-critic algorithm, which is able to station-keep within 50 km for, on average, 25% of the flight, consistent with the state of the art. Furthermore, we show that the proposed controller effectively minimises the consumption of resources, thereby supporting long-duration flights. We frame the controller as a continuous-control reinforcement learning problem, which allows for a more diverse range of trajectories than current state-of-the-art work based on discrete action spaces. In addition, through continuous control, we can make use of larger ascent rates, which are not possible using air-pumps. The desired ascent rate is decoupled into a desired altitude and a time factor to provide a more transparent policy, compared to the low-level control commands used in previous works. Finally, by applying the equations of motion, we establish appropriate thresholds for venting and ballasting to prevent the agent from exploiting the environment; more specifically, we ensure actions are physically feasible by enforcing constraints on venting and ballasting.
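As a heavily hedged illustration of the described action decoupling, the sketch below shows how a (target altitude, time factor) action could map to a clipped, physically feasible ascent-rate command; all names and limits are hypothetical, not taken from the paper:

```python
import numpy as np

def ascent_rate_command(current_alt, target_alt, time_factor,
                        max_vent_rate=-2.0, max_ballast_rate=3.0):
    """Hypothetical decoupling of a policy action (target altitude in m,
    time factor in s) into a desired ascent rate in m/s, clipped to
    illustrative venting/ballasting limits so the command stays
    physically feasible."""
    desired_rate = (target_alt - current_alt) / max(time_factor, 1e-3)
    return float(np.clip(desired_rate, max_vent_rate, max_ballast_rate))
```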
|
|
09:24-09:30, Paper MoAT11.10 | Add to My Program |
A Light-Weight, Low-Cost, and Sustainable Planning System for UAVs Using a Local Map Origin Update Approach |
|
Lee, Dasol | Agency for Defense Development |
La, Jinche | Agency for Defense Development |
Joo, Sanghyun | Agency for Defense Development |
Keywords: Aerial Systems: Applications, Motion and Path Planning
Abstract: This paper proposes a sustainable planning system for small unmanned aerial vehicles (UAVs). The mapping module of the system uses a voxel array as its data structure and introduces a local map origin update feature. This approach has the clear advantage that the planning system can sustainably plan trajectories regardless of operating radius and flight distance, and it achieves constant O(1) time complexity, unlike other representation methods. We also propose an efficient configuration space (C-space) construction algorithm using incremental voxel inflation, and extend the state-of-the-art Euclidean signed distance field (ESDF) algorithm FIESTA by applying the local map origin update feature. The proposed planning system requires only a single depth camera as a sensor and can operate in real time on embedded computing platforms. We have verified the planning system through real-world flight tests in dense environments using a lightweight quadrotor platform under 300 mm in size, equipped only with low-cost components.
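The local map origin update lends itself to a compact illustration. Below is a minimal sketch under our own assumptions about the data structure (the paper's implementation details are not reproduced here): a fixed-size voxel array addressed with world-anchored modular indices, so that shifting the map origin with the vehicle costs O(1) in map size apart from clearing the voxel slabs that scroll out of the local window.

```python
import numpy as np

class RollingVoxelMap:
    """Hypothetical rolling local map: the array never moves or
    reallocates; the origin shift only clears the slabs that leave
    the window, whose slots are reused for newly entered space."""

    def __init__(self, size=128, resolution=0.1):
        self.size, self.res = size, resolution
        self.grid = np.zeros((size, size, size), dtype=np.int8)
        self.origin_cell = np.zeros(3, dtype=int)  # window's low corner

    def _index(self, point):
        # World-anchored wrap-around index: the same world cell always
        # maps to the same array slot, so data survives origin shifts.
        cell = np.floor(np.asarray(point) / self.res).astype(int)
        return tuple(cell % self.size)

    def mark_occupied(self, point):
        self.grid[self._index(point)] = 1

    def update_origin(self, new_origin):
        new_cell = np.floor(np.asarray(new_origin) / self.res).astype(int)
        for axis in range(3):
            s = int(new_cell[axis] - self.origin_cell[axis])
            base = int(self.origin_cell[axis])
            # Clear only the slabs leaving the window on this axis.
            for k in range(min(abs(s), self.size)):
                slab = (base + k) % self.size if s > 0 \
                    else (base - 1 - k) % self.size
                idx = [slice(None)] * 3
                idx[axis] = slab
                self.grid[tuple(idx)] = 0
        self.origin_cell = new_cell
```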
|
|
09:30-09:36, Paper MoAT11.11 | Add to My Program |
Bubble Explorer: Fast UAV Exploration in Large-Scale and Cluttered 3D-Environments Using Occlusion-Free Spheres |
|
Tang, Benxu | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zhu, Fangcheng | The University of Hong Kong |
He, Rui | The University of Hong Kong |
Liang, Siqi | Harbin Institute of Technology, Shenzhen |
Kong, Fanze | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Motion and Path Planning
Abstract: Autonomous exploration is a crucial aspect of robotics with numerous applications. Most existing methods greedily choose goals that maximize immediate reward. This strategy is computationally efficient but insufficient for overall exploration efficiency. In recent years, state-of-the-art methods have been proposed that generate a global coverage path and significantly improve overall exploration efficiency. However, global optimization produces high computational overhead, leading to low-frequency planner updates and inconsistent planning motion. In this work, we propose a novel method to support fast UAV exploration in large-scale and cluttered 3-D environments. We introduce a computationally low-cost viewpoint generation method using occlusion-free spheres. Additionally, we combine a greedy strategy with global optimization, which considers both computational and exploration efficiency. We benchmark our method against state-of-the-art methods to showcase its superiority in terms of exploration efficiency and computational time. We conduct various real-world experiments to demonstrate the excellent performance of our method in large-scale and cluttered environments.
|
|
09:36-09:42, Paper MoAT11.12 | Add to My Program |
UPPLIED: UAV Path Planning for Inspection through Demonstration |
|
Kannan, Shyam Sundar | Purdue University |
Venkatesh, L.N Vishnunandan | Purdue University |
Senthilkumaran, Revanth Krishna | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Aerial Systems: Applications
Abstract: In this paper, a new demonstration-based path-planning framework for the visual inspection of large structures using UAVs is proposed. We introduce UPPLIED: UAV Path PLanning for InspEction through Demonstration, which utilizes a demonstrated trajectory to generate a new trajectory to inspect other structures of the same kind. The demonstrated trajectory can inspect specific regions of the structure, and the new trajectory generated by UPPLIED inspects similar regions in the other structure. The proposed method generates inspection points from the demonstrated trajectory and uses standardization to translate those inspection points to the new structure. Finally, the position of these inspection points is optimized to refine their view. Numerous experiments were conducted with various structures, and the proposed framework was able to generate diverse inspection trajectories for different structures based on the demonstration. The generated trajectories match the demonstrated trajectory in geometry while inspecting the regions covered by the demonstration trajectory with minimal deviation. The experimental video of the work can be found at https://youtu.be/YqPx-cLkv04.
|
|
09:42-09:48, Paper MoAT11.13 | Add to My Program |
Learning Fluid Flow Visualizations from In-Flight Images with Tufts |
|
Lee, Jongseok | German Aerospace Center |
Olsman, Jurrien | German Aerospace Center (DLR) |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Aerial Systems: Applications, Computer Vision for Automation, Object Detection, Segmentation and Categorization
Abstract: For a better understanding of fluid flows around aerial systems, strips of wire or rope, widely known as tufts, are often used to visualize the local flow direction. This paper presents a computer vision system that automatically extracts the shape of tufts from images collected during real flights of a helicopter and an unmanned aerial vehicle (UAV). As images from these aerial systems present challenges to both model-based computer vision and end-to-end supervised deep learning techniques, we propose a semantic segmentation pipeline that consists of three uncertainty-based modules, namely: (a) active learning for object detection, (b) label propagation for object classification, and (c) weakly supervised instance segmentation. Overall, these probabilistic approaches facilitate the learning process without requiring any manual annotations of semantic segmentation masks. Empirically, we motivate our design choices through comparative assessments and provide real-world demonstrations of the proposed concept, for the first time to our knowledge. The project website can be found at https://sites.google.com/view/tuftrecognition.
|
|
09:48-09:54, Paper MoAT11.14 | Add to My Program |
Fully Autonomous Brick Pick-And-Place in Fields by Articulated Aerial Robot (I) |
|
Anzai, Tomoki | The University of Tokyo |
Zhao, Moju | The University of Tokyo |
Nishio, Takuzumi | The University of Tokyo |
Shi, Fan | ETH Zürich |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Aerial Systems: Applications, Field Robots, Grasping
Abstract: Picking and placing objects with an aerial robot in the field is an important and challenging task that can significantly benefit not only industry but also rescue operations. The general strategy relies on magnetic force to pick objects, which, however, lacks both generality and robustness. Therefore, we focus on an articulated structure to grasp bricks. Another issue in performing pick-and-place tasks in the field is autonomous recognition using onboard sensors. In this article, we present fully autonomous brick pick-and-place by an articulated aerial robot. First, an articulated robot model with an actively tiltable sensor is developed to guarantee robustness in both state estimation and object detection. Second, object detection methods are designed according to the distance between the robot and the target object. Third, a comprehensive motion strategy is developed to perform the autonomous object searching, picking, and placing sequence. In particular, a visual servoing method for robot position control is proposed within this motion strategy to improve robustness while approaching the target. Finally, we present the experimental results of autonomous brick pick-and-place in the field.
|
|
MoAT12 Regular session, 252AB |
Add to My Program |
Perception for Grasping and Manipulation I |
|
|
Chair: Ang Jr, Marcelo H | National University of Singapore |
Co-Chair: D'Avella, Salvatore | Scuola Superiore Sant'Anna |
|
08:30-08:36, Paper MoAT12.1 | Add to My Program |
I2c-Net: Using Instance-Level Neural Networks for Monocular Category-Level 6D Pose Estimation |
|
Remus, Alberto | Sant'Anna School of Advanced Studies |
D'Avella, Salvatore | Scuola Superiore Sant'Anna |
Di Felice, Francesco | Mechanical Intelligence Institute, Sant'Anna School of Advanced Studies |
Tripicchio, Paolo | Scuola Superiore Sant'Anna |
Avizzano, Carlo Alberto | Scuola Superiore Sant'Anna |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, RGB-D Perception
Abstract: Object detection and pose estimation are strict requirements for many robotic grasping and manipulation applications to endow robots with the ability to grasp objects with different properties in cluttered scenes and under various lighting conditions. This work proposes the framework i2c-net to extract the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images. The network is trained on a custom-made synthetic photo-realistic dataset, generated from a set of base CAD models, suitably deformed and enriched with real textures for domain randomization purposes. At inference time, the instance-level network is employed in combination with a 3D mesh reconstruction module, achieving category-level capabilities. Depth information is used in postprocessing as a correction. Tests conducted on real objects of the YCB-V and NOCS REAL datasets outline the high accuracy of the proposed approach.
|
|
08:36-08:42, Paper MoAT12.2 | Add to My Program |
Self-Supervised Instance Segmentation by Grasping |
|
Liu, YuXuan | Covariant.ai, UC Berkeley |
Chen, Xi | Embodied Intelligence, UC Berkeley |
Abbeel, Pieter | UC Berkeley |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: Instance segmentation is a fundamental skill for many robotic applications. We propose a self-supervised method that uses grasp interactions to collect segmentation supervision for an instance segmentation model. When a robot grasps an item, the mask of that grasped item can be inferred from the images of the scene before and after the grasp. Leveraging this insight, we learn a grasp segmentation model from a small dataset of labelled images to segment the grasped object from before and after grasp images. Such a model can segment grasped objects from thousands of grasp interactions without costly human annotation. Using the segmented grasped objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision. We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches. Combined with our "cut-and-paste" generation method, instance segmentation models trained with our method achieve better performance than a model trained with 10x the amount of labelled data. On a real robotic grasping system, our instance segmentation model reduces the rate of grasp errors by over 3x compared to an image subtraction baseline.
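For context, the image-subtraction baseline that the authors report a 5x error reduction over can be written in a few lines; the learned grasp segmentation model replaces exactly this step. A rough sketch with illustrative thresholds:

```python
import cv2
import numpy as np

def grasped_object_mask(before_bgr, after_bgr, thresh=30, kernel=5):
    """Image-subtraction baseline: pixels that changed between the
    pre-grasp and post-grasp images are assumed to belong to the
    removed (grasped) object. Thresholds are illustrative."""
    diff = cv2.absdiff(before_bgr, after_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    k = np.ones((kernel, kernel), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, k)   # drop speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, k)  # fill small holes
    return mask
```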
|
|
08:42-08:48, Paper MoAT12.3 | Add to My Program |
Fusing Visual Appearance and Geometry for Multi-Modality 6DoF Object Tracking |
|
Stoiber, Manuel | German Aerospace Center (DLR) |
Elsayed, Mariam | Technical University Munich |
Reichert, Anne Elisabeth | German Aerospace Center |
Steidle, Florian | German Aerospace Center |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Visual Tracking, Perception for Grasping and Manipulation, RGB-D Perception
Abstract: In many applications of advanced robotic manipulation, six degrees of freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses. The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance. In general, object surfaces contain local characteristics from text, graphics, and patterns, as well as global differences from distinct materials and colors. To incorporate this visual information, two modalities are developed. For local characteristics, keypoint features are used to minimize distances between points from keyframes and the current image. For global differences, a novel region approach is developed that considers multiple regions on the object surface. In addition, it allows the modeling of external geometries. Experiments on the YCB-Video and OPT datasets demonstrate that our approach ICG+ performs best on both datasets, outperforming both conventional and deep learning-based methods. At the same time, the algorithm is highly efficient and runs at more than 300 Hz. The source code of our tracker is publicly available.
|
|
08:48-08:54, Paper MoAT12.4 | Add to My Program |
Viewpoint Push Planning for Mapping of Unknown Confined Spaces |
|
Dengler, Nils | University of Bonn |
Pan, Sicong | University of Bonn |
Kalagaturu, Vamsi Krishna | Hochschule Bonn-Rhein-Sieg |
Menon, Rohit | University of Bonn |
Elnagdi, Murad | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Perception for Grasping and Manipulation
Abstract: Viewpoint planning is an important task in any application where objects or scenes need to be viewed from different angles to achieve sufficient coverage. The mapping of confined spaces such as shelves is an especially challenging task since objects occlude each other and the scene can only be observed from the front, posing limitations on the possible viewpoints. In this paper, we propose a deep reinforcement learning framework that generates promising views aiming at reducing the map entropy. Additionally, the pipeline extends standard viewpoint planning by predicting adequate minimally invasive push actions to uncover occluded objects and increase the visible space. Using a 2.5D occupancy height map as state representation that can be efficiently updated, our system decides whether to plan a new viewpoint or perform a push. To learn feasible pushes, we use a neural network to sample push candidates on the map based on training data provided by human experts. As simulated and real-world experimental results with a robotic arm show, our system is able to significantly increase the mapped space compared to different baselines, while the executed push actions highly benefit the viewpoint planner with only minor changes to the object configuration.
|
|
08:54-09:00, Paper MoAT12.5 | Add to My Program |
Depth-Based 6DoF Object Pose Estimation Using Swin Transformer |
|
Li, Zhujun | The City University of New York |
Stamos, Ioannis | City University of New York |
Keywords: Perception for Grasping and Manipulation, Deep Learning Methods, Object Detection, Segmentation and Categorization
Abstract: Accurately estimating the 6D pose of objects is crucial for many applications, such as robotic grasping, autonomous driving, and augmented reality. However, this task becomes more challenging in poor lighting conditions or when dealing with textureless objects. To address this issue, depth images are becoming an increasingly popular choice due to their invariance to a scene's appearance and their implicit incorporation of essential geometric characteristics. However, fully leveraging depth information to improve pose estimation performance remains a difficult and under-investigated problem. To tackle this challenge, we propose a novel framework called SwinDePose, which uses only geometric information from depth images to achieve accurate 6D pose estimation. SwinDePose first calculates the angles between each normal vector defined in a depth image and the three coordinate axes of the camera coordinate system. The resulting angles are then formed into an image, which is encoded using a Swin Transformer. Additionally, we apply RandLA-Net to learn representations from point clouds. The resulting image and point cloud embeddings are concatenated and fed into a semantic segmentation module and a 3D keypoint localization module. Finally, we estimate 6D poses using a least-squares fitting approach based on the target object's predicted semantic mask and 3D keypoints. In experiments on the LineMod and Occlusion LineMod datasets, SwinDePose outperforms existing state-of-the-art methods for 6D object pose estimation using depth images. We also provide competitive results on the YCB-Video dataset, even without post-processing. This demonstrates the effectiveness of our approach and highlights its potential for improving performance in real-world scenarios. Our code is at https://github.com/zhujunli1993/SwinDePose.
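The normal-angle encoding is straightforward to sketch. A minimal example, assuming unit surface normals have already been estimated from the depth image (the paper's exact preprocessing may differ):

```python
import numpy as np

def normal_angle_image(normals):
    """Per pixel, the angles between the unit surface normal
    (array of shape (H, W, 3)) and the three camera-frame axes,
    stacked into a 3-channel 'angle image'."""
    axes = np.eye(3)                       # x, y, z axes of camera frame
    cosines = np.clip(normals @ axes.T, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))  # (H, W, 3), angles in [0, 180]
```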
|
|
09:00-09:06, Paper MoAT12.6 | Add to My Program |
DR-Pose: A Two-Stage Deformation-And-Registration Pipeline for Category-Level 6D Object Pose Estimation |
|
Zhou, Lei | National University of Singapore |
Liu, Zhiyang | National University of Singapore |
Gan, Runze | National University of Singapore |
Wang, Haozhe | National University of Singapore |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Category-level object pose estimation involves estimating the 6D pose and the 3D metric size of objects from predetermined categories. While recent approaches take categorical shape prior information as reference to improve pose estimation accuracy, the single-stage network design and training manner lead to sub-optimal performance, since there are two distinct tasks in the pipeline. In this paper, the advantage of a two-stage pipeline over a single-stage design is discussed. To this end, we propose a two-stage deformation-and-registration pipeline called DR-Pose, which consists of a completion-aided deformation stage and a scaled registration stage. The first stage uses a point cloud completion method to generate the unseen parts of the target object, guiding subsequent deformation of the shape prior. In the second stage, a novel registration network is designed to extract pose-sensitive features and predict the representation of the object's partial point cloud in canonical space based on the deformation results from the first stage. DR-Pose produces superior results to the state-of-the-art shape prior-based methods on both CAMERA25 and REAL275 benchmarks. Codes are available at https://github.com/Zray26/DR-Pose.git.
|
|
09:06-09:12, Paper MoAT12.7 | Add to My Program |
Learning from Pixels with Expert Observations |
|
Hoang, Minh-Huy | University of Science, Ho Chi Minh City, Vietnam |
Dinh, Long | Hanoi University of Science & Technology |
Hai, Nguyen | Northeastern University |
Keywords: Reinforcement Learning, Learning from Demonstration, Deep Learning in Grasping and Manipulation
Abstract: In reinforcement learning (RL), sparse rewards can present a significant challenge. Fortunately, expert actions can be utilized to overcome this issue. However, acquiring explicit expert actions can be costly, and expert observations are often more readily available. This paper presents a new approach that uses expert observations for learning in robot manipulation tasks with sparse rewards from pixel observations. Specifically, our technique involves using expert observations as intermediate visual goals for a goal-conditioned RL agent, enabling it to complete a task by successively reaching a series of goals. We demonstrate the efficacy of our method in five challenging block construction tasks in simulation and show that when combined with two state-of-the-art agents, our approach can significantly improve their performance while requiring 4-20 times fewer expert actions during training. Moreover, our method is also superior to a hierarchical baseline.
|
|
09:12-09:18, Paper MoAT12.8 | Add to My Program |
RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control |
|
Xiang, Yanfei | Tsinghua University |
Wang, Xin | University at Buffalo |
Hu, Shu | Carnegie Mellon University |
Zhu, Bin Benjamin | Microsoft Research Asia |
Huang, Xiaomeng | Tsinghua University |
Wu, Xi | Chengdu University of Information Technology |
Lyu, Siwei | University at Buffalo |
Keywords: Reinforcement Learning, Performance Evaluation and Benchmarking
Abstract: Reinforcement learning is used to tackle complex tasks with high-dimensional sensory inputs. Over the past decade, a wide range of reinforcement learning algorithms have been developed, with recent progress benefiting from deep learning for raw sensory signal representation. This raises a natural question: how well do these algorithms perform across different robotic manipulation tasks? Benchmarks use objective performance metrics to offer a scientific way to compare algorithms. In this paper, we introduce RMBench, the first benchmark for robotic manipulation with high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that take observed pixels as inputs, and report their average performance and learning curves to demonstrate their performance and training stability. Our study concludes that none of the evaluated algorithms can handle all tasks well: Soft Actor-Critic outperforms most algorithms in terms of average reward and stability, and algorithms combined with data augmentation can potentially facilitate policy learning. Our code is publicly available at https://github.com/xiangyanfei212/RMBench-2022.git, including all benchmark tasks and studied algorithms.
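The evaluation protocol such a benchmark relies on is simple to state. A generic sketch assuming a classic Gym-style environment API (the benchmark's actual tooling will differ):

```python
import numpy as np

def evaluate(policy, env, episodes=10):
    """Roll out the current policy and report the mean and standard
    deviation of the episodic return; a learning curve is just this
    statistic logged over training steps."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                     # pixels in, action out
            obs, reward, done, _ = env.step(action)  # classic 4-tuple API
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```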
|
|
09:18-09:24, Paper MoAT12.9 | Add to My Program |
Shape Completion with Prediction of Uncertain Regions |
|
Humt, Matthias | German Aerospace Center (DLR), Technical University Munich (TUM) |
Winkelbauer, Dominik | DLR |
Hillenbrand, Ulrich | German Aerospace Center (DLR) |
Keywords: Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views. To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion. Moreover, we generate a dataset, derived from ShapeNet [1], of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods. Web: https://github.com/DLR-RM/shape-completion
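Of the two proposed strategies, the first (postprocessing occupancy scores) is easy to illustrate. A toy sketch with illustrative thresholds, not the paper's values: voxels that are neither confidently free nor confidently occupied are flagged as uncertain.

```python
import numpy as np

def uncertain_regions(occupancy, lo=0.2, hi=0.8):
    """Split predicted occupancy probabilities (any array shape) into
    confidently occupied, confidently free, and uncertain regions."""
    occupied = occupancy >= hi
    free = occupancy <= lo
    uncertain = ~occupied & ~free
    return occupied, free, uncertain
```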
|
|
09:24-09:30, Paper MoAT12.10 | Add to My Program |
Structure from Action: Learning Interactions for 3D Articulated Object Structure Discovery |
|
Nie, Neil | Columbia University |
Gadre, Samir Yitzhak | Columbia University |
Ehsani, Kiana | Allen Institute for Artificial Intelligence |
Song, Shuran | Columbia University |
Keywords: Object Detection, Segmentation and Categorization, Perception-Action Coupling, Deep Learning for Visual Perception
Abstract: We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions. Our key insight is that 3D interaction and perception should be considered in conjunction to construct 3D articulated CAD models, especially for categories not seen during training. By selecting informative interactions, SfA discovers parts and reveals occluded surfaces, like the inside of a closed drawer. By aggregating visual observations in 3D, SfA accurately segments multiple parts, reconstructs part geometry, and infers all joint parameters in a canonical coordinate frame. Our experiments demonstrate that a SfA model trained in simulation can generalize to many unseen object categories with diverse structures and to real-world objects. Empirically, SfA outperforms a pipeline of state-of-the-art components by 25.4 3D IoU percentage points on unseen categories, while matching already performant joint estimation baselines.
|
|
09:30-09:36, Paper MoAT12.11 | Add to My Program |
Object-Oriented Option Framework for Robotics Manipulation in Clutter |
|
Pang, Jing-Cheng | Nanjing University |
Young, Stalin | Nanjing University |
Xiong-Hui, Chen | National Key Laboratory for Novel Software Technology, Nanjing University |
Yang, Xinyu | Nanjing University |
Yang, Yu | National Key Laboratory for Novel Software Technology, Nanjing University |
Mas, Ma | CloudMinds Robotics |
Ziqi, Guo | CloudMinds Robotics |
Yang, Howard | CloudMinds |
Huang, Bill | CloudMinds Technologies Inc |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation
Abstract: Domestic service robots are becoming increasingly popular due to their ability to help people with household tasks. These robots often encounter the challenge of manipulating objects in cluttered environments (MoC), which is difficult due to the complexity of effective planning and control. Previous solutions involved designing specific action primitives and planning paradigms. However, pre-coded action primitives can limit the agility and task-solving scope of robots. In this paper, we propose a general approach for MoC called the Object-Oriented Option Framework (O3F), which uses the option framework (OF) to learn planning and control. The standard OF discovers options from scratch based on reinforcement learning, which can lead to collapsed options and hurt learning. To address this limitation, O3F introduces the concept of an object-oriented option space for OF, which focuses specifically on object movement and overcomes the challenges associated with collapsed options. Based on this, we train an object-oriented option planner to determine the option to execute and a universal object-oriented option executor to complete the option. Simulation experiments on the Ginger XR1 robot and robot arm show that O3F is generally applicable to various types of robots and manipulation tasks. Furthermore, O3F achieves success rates of 72.4% and 90% in grasping and object-collecting tasks, respectively, significantly outperforming baseline methods.
|
|
09:36-09:42, Paper MoAT12.12 | Add to My Program |
Non-Contact Tactile Perception for Hybrid-Active Gripper |
|
Pereira, Jonathas Henrique Mariano | IFSP - Institute Technology of Sao Paulo, Campus Registro |
Joventino, Carlos Fernando | IFSP - Institute Technology of Sao Paulo, Campus Registro |
Fabro, João Alberto | Federal University of Technology - Parana (UTFPR) |
de Oliveira, Andre Schneider | Federal University of Technology - Parana |
Keywords: Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation, Manipulation Planning
Abstract: This paper presents a novel approach to object recognition using a reconfigurable gripper with multiple time-of-flight (ToF) sensors attached to the fingers and palm, introducing the concept of non-contact tactile perception. This approach aims to promote a proprioceptive sense in the gripper workspace, allowing object prediction in manipulation tasks. The Hybrid-Active (H-A) gripper can adapt its topology to achieve different object reading points and generate a reliable object estimation. Non-contact tactile perception uses the ToF sensors and the gripper's reconfiguration degrees of freedom for 3D perception and surface estimation of the object to be picked up. The method is based on five ToF sensors in the palm that measure distance and, by commanding the manipulator, center the gripper on the object. The H-A gripper also has twelve sensors distributed over its three fingers: four sensors on each finger, two on the distal phalanx and two on the middle phalanx. The fingers have a rotational mobility of 180°, allowing all faces of the object to be sensed at different angles for three-dimensional reconstruction. The proposed approach was evaluated in four experiments that analyzed the influence of resolution, object complexity, finger tilt, and angular sampling over 13 objects of different complexities. The experimentation set allows the overall evaluation of non-contact tactile perception and the specification of its performance parameters.
|
|
MoAT13 Regular session, 260 Portside Ballroom |
Add to My Program |
Visual Learning |
|
|
Chair: Wang, Yu-Xiong | University of Illinois Urbana-Champaign |
Co-Chair: Watanabe, Tetsuyou | Kanazawa University |
|
08:30-08:36, Paper MoAT13.1 | Add to My Program |
ILabel: Revealing Objects in Neural Fields |
|
Zhi, Shuaifeng | National University of Defense Technology |
Sucar, Edgar | Imperial College London |
Mouton, Andre | Dyson Ltd |
Haughton, Iain | Dyson Ltd |
Laidlow, Tristan | Boston Dynamics |
Davison, Andrew J | Imperial College London |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Representation Learning
Abstract: A neural field trained with self-supervision to efficiently represent the geometry and colour of a 3D scene tends to automatically decompose it into coherent and accurate object-like regions, which can be revealed with sparse labelling interactions to produce a 3D semantic scene segmentation. Our real-time iLabel system takes input from a hand-held RGB-D camera, requires zero prior training data, and works in an `open set' manner, with semantic classes defined on the fly by the user. iLabel's underlying model is a simple multilayer perceptron (MLP), trained from scratch to learn a neural representation of a single 3D scene. The model is updated continually and visualised in real-time, allowing the user to focus interactions to achieve extremely efficient semantic segmentation. A room-scale scene can be accurately labelled into 10+ semantic categories with around 100 clicks, taking less than 5 minutes. Quantitative labelling accuracy scales powerfully with the number of clicks, and rapidly surpasses standard pre-trained semantic segmentation methods. We also demonstrate a hierarchical labelling variant of iLabel and a `hands-free' mode where the user only needs to supply label names for automatically-generated locations.
|
|
08:36-08:42, Paper MoAT13.2 | Add to My Program |
Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation |
|
Mi, Jinpeng | USST |
Chen, Zhiqian | University of Shanghai for Science and Technology |
Zhang, Jianwei | University of Hamburg |
Keywords: Visual Learning, Deep Learning for Visual Perception
Abstract: Weakly supervised referring expression grounding (WREG) is an attractive and challenging task for grounding target regions in images by understanding given referring expressions. WREG learns to ground target objects without manual annotations linking image regions and referring expressions during the model training phase. Different from the predominant grounding pattern of existing models, which locate target objects by reconstructing the region-expression correspondence, we investigate WREG from a novel perspective and enrich the prevailing pattern with self-knowledge distillation. Specifically, we propose a target-guided self-knowledge distillation approach that adopts the target prediction knowledge learned from previous training iterations as the teacher to guide the subsequent training procedure. In order to avoid the misleading caused by teacher knowledge with low prediction confidence, we present an uncertainty-aware knowledge refinement strategy to adaptively rectify the teacher knowledge by learning dynamic threshold values based on the model prediction uncertainty. To validate the proposed approach, we implement extensive experiments on three benchmark datasets, i.e., RefCOCO, RefCOCO+, and RefCOCOg. Our approach achieves new state-of-the-art results on several splits of the benchmark datasets, showcasing the advantage of the proposed framework for WREG. The implementation codes and trained models are available at: https://github.com/dami23/WREG_Self_KD.
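The gating idea can be sketched compactly. In the toy example below, teacher logits come from an earlier training iteration, and a fixed confidence threshold stands in for the paper's learned dynamic threshold (an intentional simplification):

```python
import torch
import torch.nn.functional as F

def gated_self_distillation_loss(student_logits, teacher_logits,
                                 tau=2.0, conf_threshold=0.7):
    """Self-distillation with an uncertainty gate: teacher predictions
    supervise the student only where the teacher is confident."""
    with torch.no_grad():
        teacher_prob = F.softmax(teacher_logits / tau, dim=-1)
        confident = teacher_prob.max(dim=-1).values > conf_threshold
    log_student = F.log_softmax(student_logits / tau, dim=-1)
    kl = F.kl_div(log_student, teacher_prob, reduction="none").sum(-1)
    return (kl * confident).sum() / confident.sum().clamp(min=1)
```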
|
|
08:42-08:48, Paper MoAT13.3 | Add to My Program |
EventTransAct: A Video Transformer-Based Framework for Event-Camera Based Action Recognition |
|
de Blegiers, Tristan | University of Central Florida |
Dave, Ishan Rajendrakumar | University of Central Florida |
Yousaf, Adeel | University of Central Florida |
Shah, Mubarak | University of Central Florida |
Keywords: Gesture, Posture and Facial Expressions, Visual Learning, Computer Vision for Automation
Abstract: Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing. Event cameras, with their ability to capture fast-moving objects at a high temporal resolution, offer new opportunities compared to standard action recognition in RGB videos. However, previous research on event camera action recognition has primarily focused on sensor-specific network architectures and image encoding, which may not be suitable for new sensors and limit the use of recent advancements in transformer-based architectures. In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which first acquires spatial embeddings per event-frame and then utilizes a temporal self-attention mechanism. This approach separates the spatial and temporal operations, making VTN more computationally efficient than other video transformers that process spatio-temporal volumes directly. In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss (L_EC) and event-specific augmentations. The proposed L_EC promotes learning fine-grained spatial cues in the spatial backbone of VTN by contrasting temporally misaligned frames. We evaluate our method on real-world action recognition on the N-EPIC Kitchens dataset and achieve state-of-the-art results on both protocols: testing in seen kitchens (74.9% accuracy) and testing in unseen kitchens (42.43% and 46.66% accuracy). Our approach also takes less computation time compared to competitive prior approaches. We also evaluate our method on the standard DVS Gesture recognition dataset, achieving a competitive accuracy of 97.9% compared to prior work that uses dedicated architectures and image encoding for the DVS dataset. These results demonstrate the potential of our framework EventTransAct for real-world applications of event-camera-based action recognition. Project Page: https://tristandb8.github.io/EventTransAct_webpage/
|
|
08:48-08:54, Paper MoAT13.4 | Add to My Program |
Virtual Ski Training System That Allows Beginners to Acquire Ski Skills Based on Physical and Visual Feedbacks |
|
Okada, Yushi | Waseda University |
Seo, Chanjin | Waseda University |
Miyakawa, Shunichi | Waseda University |
Taniguchi, Motofumi | Waseda University |
Kanosue, Kazuyuki | Waseda University |
Ogata, Hiroyuki | Seikei University |
Ohya, Jun | Waseda University |
Keywords: Virtual Reality and Interfaces, Visual Learning, Sensorimotor Learning
Abstract: This paper proposes a ski training system using VR (Virtual Reality) that enables beginners to acquire skiing skills without going to an actual ski ground. The proposed system obtains the speed of skiing based on the center of pressure (COP) of each player's foot. The first-person perspective of skiing at the obtained speed down a ski slope is fed back to the player as a VR image. Experiments were conducted to evaluate the effectiveness of the proposed system and the VR interface. Specifically, beginner skiers were categorized into three groups: "a group trained with the proposed VR system", "a group trained with a system that provides feedback of the skiing speed calculated from the COP by increasing or decreasing the gauge (a bar-shaped graph representing changes in numerical values), instead of VR", and "a group that does not train with the system". After training under each of these conditions, a sliding test was conducted on an actual ski slope to check the degree of skill acquisition. The results show that subjects trained with the proposed system acquired more skiing skills than subjects who did not use the system on actual ski slopes. Furthermore, there was no clear difference in the result of the sliding test between subjects trained by the VR interface and those trained by the gauge interface, but the VR interface yields better deceleration postures.
|
|
08:54-09:00, Paper MoAT13.5 | Add to My Program |
Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars |
|
Rochow, Andre | University of Bonn |
Schwarz, Max | University Bonn |
Behnke, Sven | University of Bonn |
Keywords: Gesture, Posture and Facial Expressions, Visual Learning, Human-Robot Collaboration
Abstract: Facial animation in virtual reality environments is essential for applications that necessitate clear visibility of the user’s face and the ability to convey emotional signals. In our scenario, we animate the face of an operator who controls a robotic Avatar system. The use of facial animation is particularly valuable when the perception of interacting with a specific individual, rather than just a robot, is intended. Purely keypoint-driven animation approaches struggle with the complexity of facial movements. We present a hybrid method that uses both keypoints and direct visual guidance from a mouth camera. Our method generalizes to unseen operators and requires only a quick enrolment step with capture of two short videos. Multiple source images are selected with the intention to cover different facial expressions. Given a mouth camera frame from the HMD, we dynamically construct the target keypoints and apply an attention mechanism to determine the importance of each source image. To resolve keypoint ambiguities and animate a broader range of mouth expressions, we propose to inject visual mouth camera information into the latent space. We enable training on large-scale speaking head datasets by simulating the mouth camera input with its perspective differences and facial deformations. Our method outperforms a baseline in quality, capability, and temporal consistency. In addition, we highlight how the facial animation contributed to our victory at the ANA Avatar XPRIZE Finals.
|
|
09:00-09:06, Paper MoAT13.6 | Add to My Program |
Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning |
|
Hatem, Ahmed | University of Manitoba |
Qian, Yiming | University of Manitoba |
Wang, Yang | Concordia University |
Keywords: Visual Learning, Deep Learning Methods, Transfer Learning
Abstract: Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaptation approach to enhance the generalization of point cloud upsampling models. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaptation. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.
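Meta-testing as described reduces to a short fine-tuning loop on each test instance. A minimal sketch, where loss_fn is a placeholder for the instance-level self-supervised task loss and all names are ours:

```python
import copy
import torch

def test_time_adapt(meta_model, sparse_cloud, loss_fn, steps=5, lr=1e-4):
    """Clone the meta-trained upsampler, fine-tune it with a few
    gradient steps on the single test instance, then predict with
    the adapted copy (one unique parameter set per instance)."""
    model = copy.deepcopy(meta_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model, sparse_cloud)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(sparse_cloud)  # final dense prediction
```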
|
|
09:06-09:12, Paper MoAT13.7 | Add to My Program |
Revisiting Event-Based Video Frame Interpolation |
|
Chen, Jiaben | University of California, San Diego |
Zhu, Yichen | Shanghaitech University |
Lian, Dongze | National University of Singapore |
Yang, Jiaqi | ShanghaiTech University |
Wang, Yifu | ShanghaiTech University |
Zhang, Renrui | Peking University |
Liu, Xinhang | HKUST |
Qian, Shenhan | Technical University of Munich |
Kneip, Laurent | ShanghaiTech University |
Gao, Shenghua | Shanghaitech University |
Keywords: Visual Learning, Sensor Fusion, Deep Learning for Visual Perception
Abstract: Dynamic vision sensors or event cameras provide rich complementary information for video frame interpolation. Existing state-of-the-art methods follow the paradigm of combining both synthesis-based and warping networks. However, few of those methods fully respect the intrinsic characteristics of events streams. Given that event cameras only encode intensity changes and polarity rather than color intensities, estimating optical flow from events is arguably more difficult than from RGB information. We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy. Moreover, in light of the quasi-continuous nature of the time signals provided by event cameras, we propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages rather than in a single, long stage. Extensive experiments on both synthetic and real-world datasets show that these modifications lead to more reliable and realistic intermediate frame results than previous video frame interpolation methods. Our findings underline that a careful consideration of event characteristics such as high temporal density and elevated noise benefits interpolation accuracy.
|
|
09:12-09:18, Paper MoAT13.8 | Add to My Program |
Revisiting Deformable Convolution for Depth Completion |
|
Sun, Xinglong | Stanford & UIUC |
Ponce, Jean | Ecole Normale Supérieure |
Wang, Yu-Xiong | University of Illinois Urbana-Champaign |
Keywords: RGB-D Perception, Visual Learning
Abstract: Depth completion, which aims to generate high-quality dense depth maps from sparse depth maps, has attracted increasing attention in recent years. Previous work usually employs RGB images as guidance, and introduces iterative spatial propagation to refine estimated coarse depth maps. However, most of the propagation refinement methods require several iterations and suffer from a fixed receptive field, which may contain irrelevant and useless information with very sparse input. In this paper, we address these two challenges simultaneously by revisiting the idea of deformable convolution. We propose an effective architecture that leverages deformable kernel convolution as a single-pass refinement module, and empirically demonstrate its superiority. To better understand the function of deformable convolution and exploit it for depth completion, we further systematically investigate a variety of representative strategies. Our study reveals that, different from prior work, deformable convolution needs to be applied on an estimated depth map with a relatively high density for better performance. We evaluate our model on the large-scale KITTI dataset and achieve state-of-the-art level performance in both accuracy and inference speed. Our code is available at https://github.com/AlexSunNik/ReDC.
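A single-pass refinement module in the spirit of the paper can be sketched with torchvision's deformable convolution; channel sizes and the residual head below are illustrative assumptions, not the published architecture:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformRefiner(nn.Module):
    """Offsets predicted from the features let the kernel sample where
    depth evidence actually is, instead of a fixed receptive field;
    the output is a residual correction to the coarse depth map."""

    def __init__(self, channels=32, k=3):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, k, padding=k // 2)
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, feats, coarse_depth):
        offsets = self.offset(feats)
        refined = torch.relu(self.deform(feats, offsets))
        return coarse_depth + self.head(refined)
```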
|
|
09:18-09:24, Paper MoAT13.9 | Add to My Program |
Long-Distance Gesture Recognition Using Dynamic Neural Networks |
|
Bhatnagar, Shubhang | University of Illinois at Urbana-Champaign |
Gopal, Sharath | Bosch |
Ahuja, Narendra | Univ. of Illinois |
Ren, Liu | Robert Bosch North America Research Technology Center |
Keywords: Gesture, Posture and Facial Expressions, Visual Learning, Recognition
Abstract: Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a drone. Methods made for short-distance recognition are unable to perform well on long-distance recognition due to gestures occupying only a small portion of the input data. Their performance is especially worse in resource constrained settings where they are not able to effectively focus their limited compute on the gesturing subject. We propose a novel, accurate and efficient method for the recognition of gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, thus making it more compute efficient compared to other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset where it outperforms previous state-of-the-art methods on recognition accuracy and compute efficiency.
|
|
09:24-09:30, Paper MoAT13.10 | Add to My Program |
Neural Implicit Vision-Language Feature Fields |
|
Blomqvist, Kenneth | ETH Zurich |
Milano, Francesco | ETH Zurich |
Chung, Jen Jen | The University of Queensland |
Ott, Lionel | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Keywords: Semantic Scene Understanding, Visual Learning, Representation Learning
Abstract: Recently, groundbreaking results have been presented on open-vocabulary semantic image segmentation. Such methods segment each pixel in an image into arbitrary categories provided at run-time in the form of text prompts, as opposed to a fixed set of classes defined at training time. In this work, we present a method for volumetric open-vocabulary semantic scene segmentation. Our method builds on the insight that we can fuse 2D image features from a vision-language model into a neural implicit representation. We show that the resulting feature field can be segmented into different classes by assigning points to the closest natural language text prompt. Using an implicit volumetric representation enables us to segment the scene both in 3D and 2D by rendering feature maps from any given viewpoint of the scene. We show that our method works on noisy real-world data and can run in real-time on live sensor data dynamically adjusting to text prompts. We also present quantitative comparisons on the diverse ScanNet dataset.
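The class-assignment step the abstract describes is a nearest-neighbour lookup in the shared vision-language embedding space. A minimal sketch, assuming per-point features and per-prompt text embeddings have already been computed with a CLIP-style encoder:

```python
import torch
import torch.nn.functional as F

def label_points(point_features, text_embeddings):
    """Assign each feature (N, D) the class of the closest text prompt
    (C, D) under cosine similarity; returns per-point class indices."""
    pf = F.normalize(point_features, dim=-1)
    tf = F.normalize(text_embeddings, dim=-1)
    similarity = pf @ tf.T        # (N, C) cosine similarities
    return similarity.argmax(dim=-1)
```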
|
|
09:30-09:36, Paper MoAT13.11 | Add to My Program |
Language Guided Robotic Grasping with Fine-Grained Instructions |
|
Sun, Qiang | Fudan University |
Lin, Haitao | Fudan University |
Fu, Ying | Beijing Institute of Technology |
Fu, Yanwei | Fudan University |
Xue, Xiangyang | Fudan University |
Keywords: Visual Learning, Semantic Scene Understanding, Grasping
Abstract: Given a single RGB image and attribute-rich language instructions, this paper investigates the novel problem of using Fine-grained instructions for Language-guided robotic Grasping (FLarG). This problem is made challenging by the need to learn fine-grained language descriptions to ground target objects. Recent advances have been made in visually grounding objects described by several coarse attributes. However, these methods perform poorly as they cannot align the multi-modal features well and do not make the best of recent powerful large pre-trained vision and language models, e.g., CLIP. To this end, this paper proposes a FLarG pipeline including stages of CLIP-guided object localization and 6-DoF category-level object pose estimation for grasping. Specifically, we first take the CLIP-based segmentation model CRIS as the backbone and propose an end-to-end DyCRIS model that uses a novel dynamic mask strategy to fuse the multi-level language and vision features well. Then, the well-trained instance segmentation backbone Mask R-CNN is adopted to further improve the predicted mask of our DyCRIS. Finally, the target object pose is inferred for robotic grasping by using a recent 6-DoF object pose estimation method. To validate our CLIP-enhanced pipeline, we also construct a validation dataset for our FLarG task, named RefNOCS. Extensive results on RefNOCS have shown the utility and effectiveness of our proposed method. The project homepage is available at https://sunqiang85.github.io/FLarG.
|
|
09:36-09:42, Paper MoAT13.12 | Add to My Program |
Whole Shape Estimation of Transparent Object from Its Contour Using Statistical Shape Model |
|
Okada, Kaihei | Kanazawa University |
Kobayashi, Riku | Kanazawa University |
Tsuji, Tokuo | Kanazawa University |
Hiramitsu, Tatsuhiro | Kanazawa University |
Seki, Hiroaki | Kanazawa University |
Nishimura, Toshihiro | Kanazawa University |
Suzuki, Yosuke | Kanazawa University |
Watanabe, Tetsuyou | Kanazawa University |
Keywords: Computer Vision for Automation, Computer Vision for Manufacturing
Abstract: This paper presents a method for estimating the 3D shape of transparent objects from an RGB-D image using a statistical shape model. Statistical shape models compress multiple shapes into a low-dimensional representation, so that shape variation can be expressed with few parameters. It is difficult to measure the depth of a transparent object with any sensor. Therefore, the statistical shape model is deformed to fit the contour extracted from the RGB image in order to estimate the shape of the object. The depth image is only used for detecting the plane on which the transparent objects are placed. Unlike other estimation methods, the proposed method estimates the whole shape of transparent objects. The estimation accuracy of the proposed method is compared with that of a machine-learning-based method. In addition, the estimated whole shape is compared with measured data from a 3D scanner.
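A statistical shape model in its simplest linear (PCA-style) form can be sketched in a few lines: shapes are expressed as a mean plus a weighted sum of variation modes, and the weights are fitted by least squares to observed contour points. Projection and pose estimation are omitted, and all names are ours:

```python
import numpy as np

def fit_shape_model(mean_shape, components, observed, indices):
    """Fit shape = mean_shape + params @ components to observations.
    mean_shape: (P,) flattened mean shape; components: (K, P) variation
    modes; observed: contour coordinates measured in the image;
    indices: which entries of the flattened shape they correspond to."""
    A = components[:, indices].T          # (len(indices), K)
    b = observed - mean_shape[indices]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean_shape + params @ components   # full reconstructed shape
```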
|
|
MoAT14 Regular session, 320 |
Add to My Program |
Localization I |
|
|
Chair: Malis, Ezio | Inria |
Co-Chair: Lopez, Brett | University of California, Los Angeles |
|
08:30-08:36, Paper MoAT14.1 | Add to My Program |
A Hierarchical Multi-Task Visual Relocalization System |
|
Yin, Jiahao | Beihang University |
Xiao, Huahui | BUAA |
Li, Wei | Beihang University |
Zhou, Xinyu | University of International Business and Economics |
Liu, Zhili | Yihang Intellitech Co., Ltd |
Li, Xue | Yihang Intellitech Co., Ltd |
Fan, Shengyin | Yihang Intellitech Co., Ltd |
Keywords: Localization, SLAM, Autonomous Vehicle Navigation
Abstract: Locating the 6DoF pose of a camera in a known scene graph is a fundamental problem of SLAM. Hierarchical relocalization methods, which retrieve images first and match feature points later, have been widely studied for their high accuracy. In this paper, building on hierarchical relocalization, we propose HAPOR (Hierarchical-features Aligned Projection Optimization for Relocalization), an end-to-end relocalization system that combines image retrieval and iterative pose optimization. Through an attention-mechanism branch, foreground dynamic objects and repeating textures are filtered out. We further design an image retrieval system (GTLGR) within HAPOR and generate an initial pose based on the co-visibility graph for subsequent iterative optimization. In addition, relying on GPS as ground truth for image retrieval training is quite inefficient; thus, we model the common visible area of two cameras' views in the 3D field, which significantly reduces the training time. Finally, we apply HAPOR to the ORB-SLAM2 system and obtain state-of-the-art relocalization results. A demo is available here: https://www.youtube.com/watch?v=rCLpWCxN31M
|
|
08:36-08:42, Paper MoAT14.2 | Add to My Program |
RI-LIO: Reflectivity Image Assisted Tightly-Coupled LiDAR-Inertial Odometry |
|
Zhang, Yanfeng | Institute of Automation, Chinese Academy of Sciences |
Tian, Yunong | Institute of Automation, Chinese Academy of Sciences |
Wang, Wanguo | State Grid Intelligence Technology Co., Ltd |
Yang, Guodong | Institute of Automation, Chinese Academy of Sciences |
Li, Zhishuo | Chinese Academy of Sciences |
Jing, Fengshui | Institute of Automation, CAS |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Keywords: Localization, SLAM, Mapping
Abstract: In this letter, we propose RI-LIO, a new reflectivity-image-assisted tightly-coupled LiDAR-inertial odometry (LIO) framework that introduces additional reflectivity texture information to efficiently reduce the drift of geometric-only methods. To achieve this, we construct an iterated extended Kalman filter framework by blending the point-to-plane geometric measurement and the reflectivity image measurement. Specifically, the geometric measurement is defined as the distance from a raw point of a new scan to its nearest-neighbor plane in the global incremental kd-tree map. The retrieved nearest-neighbor point is used to render a sparse reflectivity image, with motion-distortion information supplied by its corresponding raw point. Then, the reflectivity measurement is built to align the sparse reflectivity image with the dense reflectivity image of the current scan by directly minimizing the photometric errors. In addition, based on the mechanism of high-resolution LiDARs, a corrected spherical projection model is proposed to project spatial points into the image frame. Finally, extensive experiments are conducted using different mobile robots in structured, unstructured, and challenging open-field scenarios. The results demonstrate that the proposed method outperforms existing geometric-only methods in terms of robustness and accuracy, especially in rotation.
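As background for the projection step, below is a minimal sketch of the standard spherical (range-image) projection of LiDAR points; the paper's corrected model for high-resolution LiDARs refines this with sensor-specific corrections that are not reproduced here. The image size and vertical field of view are illustrative assumptions.

```python
import numpy as np

def spherical_projection(points, width=1024, height=64,
                         fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project LiDAR points (N, 3) to pixel coordinates of a range/reflectivity image."""
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.maximum(np.linalg.norm(points, axis=1), 1e-9)  # guard against r = 0
    yaw = np.arctan2(y, x)                                # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))          # elevation

    u = 0.5 * (1.0 - yaw / np.pi) * width                 # column index
    v = (1.0 - (pitch - fov_down) / fov) * height         # row index, top = fov_up
    u = np.clip(np.floor(u), 0, width - 1).astype(int)
    v = np.clip(np.floor(v), 0, height - 1).astype(int)
    return u, v, r
```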
|
|
08:42-08:48, Paper MoAT14.3 | Add to My Program |
Off the Radar: Uncertainty-Aware Radar Place Recognition with Introspective Querying and Map Maintenance |
|
Yuan, Jianhao | University of Oxford |
Newman, Paul | Oxford University |
Gadd, Matthew | University of Oxford |
Keywords: Localization, Mapping, Deep Learning Methods
Abstract: Localisation with Frequency-Modulated Continuous-Wave (FMCW) radar has gained increasing interest due to its inherent resistance to challenging environments. However, complex artefacts of the radar measurement process require appropriate uncertainty estimation to ensure the safe and reliable application of this promising sensor modality. In this work, we propose a multi-session map management system which constructs the “best” maps for further localisation based on learned variance properties in an embedding space. Using the same variance properties, we also propose a new way to introspectively reject localisation queries that are likely to be incorrect. For this, we apply robust noise-aware metric learning, which both leverages the short-timescale variability of radar data along a driven path (for data augmentation) and predicts the downstream uncertainty in metric-space-based place recognition. We demonstrate the effectiveness of our method over extensive cross-validated tests on the Oxford Radar RobotCar and MulRan datasets, where we outperform the current state-of-the-art in radar place recognition and other uncertainty-aware methods when using only single nearest-neighbour queries. We also show consistent performance increases when rejecting queries based on uncertainty over a difficult test environment, which we did not observe for a competing uncertainty-aware place recognition system.
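As a rough illustration of the introspective-rejection idea (not the authors' implementation): a query is only answered when its predicted embedding variance is below a threshold. The variance predictor and the threshold are assumed to come from the learned model and held-out validation data.

```python
import numpy as np

def localize_with_rejection(query_emb, query_var, map_embs, var_threshold):
    """Return the nearest map index, or None if the query looks unreliable.

    query_emb: (D,) embedding of the radar query
    query_var: scalar predicted variance of the query embedding
    map_embs:  (M, D) embeddings of the reference map
    """
    if query_var > var_threshold:
        return None  # introspective rejection: localisation likely incorrect
    dists = np.linalg.norm(map_embs - query_emb[None, :], axis=1)
    return int(np.argmin(dists))  # single nearest-neighbour query
```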
|
|
08:48-08:54, Paper MoAT14.4 | Add to My Program |
Global Localization in Unstructured Environments Using Semantic Object Maps Built from Various Viewpoints |
|
Ankenbauer, Jacqueline | Massachusetts Institute of Technology |
Lusk, Parker C. | Massachusetts Institute of Technology |
Thomas, Annika | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Localization, Mapping, SLAM
Abstract: We present a novel framework for global localization and guided relocalization of a vehicle in an unstructured environment. Compared to existing methods, our pipeline does not rely on cues from urban fixtures (e.g., lane markings, buildings), nor does it make assumptions that require the vehicle to be navigating on a road network. Instead, we achieve localization in both urban and non-urban environments by robustly associating and registering the vehicle’s local semantic object map with a compact semantic reference map, potentially built from other viewpoints, time periods, and/or modalities. Robustness to noise, outliers, and missing objects is achieved through our graph-based data association algorithm. Further, the guided relocalization capability of our pipeline mitigates drift inherent in odometry-based localization after the initial global localization. We evaluate our pipeline on two publicly available, real-world datasets to demonstrate its effectiveness at global localization in both non-urban and urban environments. The Katwijk Beach Planetary Rover dataset [1] is used to show our pipeline’s ability to perform accurate global localization in unstructured environments. Demonstrations on the KITTI dataset [2] achieve an average pose error of 3.8m across all 35 localization events on Sequence 00 when localizing in a reference map created from aerial images. Compared to existing works, our pipeline is more general because it can perform global localization in unstructured environments using maps built from different viewpoints.
|
|
08:54-09:00, Paper MoAT14.5 | Add to My Program |
Constructing Metric-Semantic Maps Using Floor Plan Priors for Long-Term Indoor Localization |
|
Zimmerman, Nicky | University of Bonn |
Sodano, Matteo | Photogrammetry and Robotics Lab, University of Bonn |
Marks, Elias Ariel | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Localization, Mapping
Abstract: Object-based maps are relevant for scene understanding since they integrate geometric and semantic information of the environment, allowing autonomous robots to robustly localize and interact with objects. In this paper, we address the task of constructing a metric-semantic map for the purpose of long-term object-based localization. We exploit 3D object detections from monocular RGB frames both for building the object-based map and for globally localizing in the constructed map. To tailor the approach to a target environment, we propose an efficient way of generating 3D annotations to finetune the 3D object detection model. We evaluate our map construction in an office building, and test our long-term localization approach on challenging sequences recorded in the same environment over nine months. The experiments suggest that our approach is suitable for constructing metric-semantic maps, and that our localization approach is robust to long-term changes. Both the mapping algorithm and the localization pipeline can run online on an onboard computer. We release an open-source C++/ROS implementation of our approach.
|
|
09:00-09:06, Paper MoAT14.6 | Add to My Program |
DisPlacing Objects: Improving Dynamic Vehicle Detection Via Visual Place Recognition under Adverse Conditions |
|
Hausler, Stephen | CSIRO |
Garg, Sourav | Queensland University of Technology |
Chakravarty, Punarjay | Planet |
Shrivastava, Shubham | Ford Greenfield Labs |
Vora, Ankit | Ford Motor Company |
Milford, Michael J | Queensland University of Technology |
Keywords: Autonomous Vehicle Navigation, Object Detection, Segmentation and Categorization, Localization
Abstract: Can knowing where you are assist in perceiving objects in your surroundings, especially under adverse weather and lighting conditions? In this work we investigate whether a prior map can be leveraged to aid the detection of dynamic objects in a scene without the need for a 3D map or pixel-level map-query correspondences. We contribute an algorithm which refines an initial set of candidate object detections and, using a prior map, produces a refined subset of highly accurate detections. We begin by using visual place recognition (VPR) to retrieve a prior map image for a given query image, then use a binary classification neural network that compares the query and prior map image regions to validate the query detection. Once trained on approximately 1000 query-map image pairs, our classification network improves the performance of vehicle detection when combined with an existing off-the-shelf vehicle detector. We demonstrate our approach using standard datasets across two cities (Oxford and Zurich) under different settings of train-test separation of map-query traverse pairs. We further emphasize the performance gains of our approach over alternative design choices and show that VPR suffices for the task, eliminating the need for precise ground-truth localization.
|
|
09:06-09:12, Paper MoAT14.7 | Add to My Program |
FM-Loc: Using Foundation Models for Improved Vision-Based Localization |
|
Mirjalili, Reihaneh | University of Technology Nuremberg |
Krawez, Michael | University of Technology Nuremberg |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Localization, SLAM, Vision-Based Navigation
Abstract: Visual place recognition is essential for vision-based robot localization and SLAM. Despite the tremendous progress made in recent years, place recognition in changing environments remains challenging. A promising approach to cope with appearance variations is to leverage high-level semantic features like objects or place categories. In this paper, we propose FM-Loc, a novel image-based localization approach based on foundation models that uses the Large Language Model GPT-3 in combination with the Visual-Language Model CLIP to construct a semantic image descriptor that is robust to severe changes in scene geometry and camera viewpoint. We deploy CLIP to detect objects in an image, GPT-3 to suggest potential room labels based on the detected objects, and CLIP again to propose the most likely location label. The object labels and the scene label constitute an image descriptor that we use to calculate a similarity score between the query and database images. We validate our approach on real-world data that exhibits significant changes in camera viewpoint and object placement between the database and query trajectories. The experimental results demonstrate that our method is applicable to a wide range of indoor scenarios without the need for training or fine-tuning.
|
|
09:12-09:18, Paper MoAT14.8 | Add to My Program |
Joint On-Manifold Gravity and Accelerometer Intrinsics Estimation for Inertially Aligned Mapping |
|
Nemiroff, Ryan | University of California, Los Angeles |
Chen, Kenny | University of California, Los Angeles |
Lopez, Brett | University of California, Los Angeles |
Keywords: Localization, Mapping, SLAM
Abstract: Aligning a robot's trajectory or map to the inertial frame is a critical capability that is often difficult to do accurately, even though inertial measurement units (IMUs) can observe absolute roll and pitch with respect to gravity. Accelerometer biases and scale factor errors from the IMU's initial calibration are often the major source of inaccuracies when aligning the robot's odometry frame with the inertial frame, especially for low-grade IMUs. Practically, one would simultaneously estimate the true gravity vector, accelerometer biases, and scale factor to improve measurement quality, but these quantities are not observable unless the IMU is sufficiently excited. While several methods estimate accelerometer bias and gravity, they do not explicitly address the observability issue nor do they estimate scale factor. We present a fixed-lag factor-graph-based estimator to address both of these issues. In addition to estimating accelerometer scale factor, our method mitigates limited observability by optimizing over a time window an order of magnitude larger than existing methods with significantly lower computational burden. The proposed method, which estimates accelerometer intrinsics and gravity separately from the other states, is enabled by a novel, velocity-agnostic measurement model for intrinsics and gravity, as well as a new method for gravity vector optimization on the sphere S^2. Accurate IMU state prediction, gravity alignment, and roll/pitch drift correction are experimentally demonstrated on public and self-collected datasets in diverse environments.
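As background for the on-manifold step, below is a minimal sketch of updating a unit gravity vector with a two-dimensional tangent-space step on the sphere. This is the standard exponential-map retraction on S^2, written as a generic illustration rather than the authors' exact parameterization.

```python
import numpy as np

def tangent_basis(g):
    """Orthonormal basis of the tangent plane at unit vector g."""
    a = np.array([1.0, 0.0, 0.0]) if abs(g[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(g, a)
    b1 /= np.linalg.norm(b1)
    b2 = np.cross(g, b1)  # already unit-length since g and b1 are orthonormal
    return b1, b2

def retract(g, delta):
    """Move unit vector g along tangent coordinates delta = (d1, d2) on S^2."""
    b1, b2 = tangent_basis(g)
    v = delta[0] * b1 + delta[1] * b2   # tangent vector at g
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return g
    # Exponential map on the sphere: rotate g toward v by angle theta.
    return np.cos(theta) * g + np.sin(theta) * (v / theta)
```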
|
|
09:18-09:24, Paper MoAT14.9 | Add to My Program |
I2P-Rec: Recognizing Images on Large-Scale Point Cloud Maps through Bird's Eye View Projections |
|
Zheng, Shuhang | Zhejiang University |
Li, Yixuan | Zhejiang University |
Yu, Zhu | Zhejiang University |
Yu, Beinan | Zhejiang University |
Cao, Siyuan | Zhejiang University |
Wang, Minhang | HAOMO.AI Technology Co., Ltd |
Xu, Jintao | HAOMO.AI Technology Co., Ltd |
Ai, Rui | HAOMO.AI Technology Co., Ltd |
Gu, Weihao | HAOMO.AI Technology Co., Ltd |
Luo, Lun | Zhejiang University |
Shen, Hui-liang | Zhejiang University |
Keywords: Localization, SLAM, Recognition
Abstract: Place recognition is an important technique for autonomous cars to achieve full autonomy since it can provide an initial guess to online localization algorithms. Although current methods based on images or point clouds have achieved satisfactory performance, localizing images on a large-scale point cloud map remains a fairly unexplored problem. This cross-modal matching task is challenging due to the difficulty of extracting consistent descriptors from images and point clouds. In this paper, we propose the I2P-Rec method to solve the problem by transforming the cross-modal data into the same modality. Specifically, we leverage the recent success of depth estimation networks to recover point clouds from images. We then project the point clouds into Bird's Eye View (BEV) images. Using the BEV image as an intermediate representation, we extract global features with a Convolutional Neural Network followed by a NetVLAD layer to perform matching. The experimental results on the KITTI dataset show that, with only a small set of training data, I2P-Rec achieves Top-1% recall rates over 80% and 90% when localizing monocular and stereo images on point cloud maps, respectively. We further evaluate I2P-Rec on a 1 km trajectory dataset collected by an autonomous logistics car and show that I2P-Rec can generalize well to previously unseen environments.
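A minimal sketch of the BEV rasterization step, assuming illustrative grid extents and resolution (the depth-estimation, CNN, and NetVLAD stages are not shown):

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), res=0.25):
    """Rasterize (N, 3) points into a 2D BEV occupancy image."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)

    # Keep only points inside the grid extents.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    rows = ((pts[:, 0] - x_range[0]) / res).astype(int)
    cols = ((pts[:, 1] - y_range[0]) / res).astype(int)
    bev[rows, cols] = 1.0  # binary occupancy; height or density are alternatives
    return bev
```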
|
|
09:24-09:30, Paper MoAT14.10 | Add to My Program |
CoPR: Towards Accurate Visual Localization with Continuous Place-Descriptor Regression (I) |
|
Zaffar, Mubariz | Delft University of Technology |
Nan, Liangliang | TU Delft |
Kooij, Julian Francisco Pieter | TU Delft |
Keywords: Localization, Mapping, SLAM, Visual Place Recognition
Abstract: Visual Place Recognition (VPR) is an image-based localization method that estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images. In this work, we look into two fundamental bottlenecks for its localization accuracy: reference map sparseness and viewpoint invariance. Firstly, the reference images for VPR are only available at sparse poses in a map, which enforces an upper bound on the maximum achievable localization accuracy through VPR. We therefore propose Continuous Place-descriptor Regression (CoPR) to densify the map and improve localization accuracy, and study various interpolation and extrapolation models to regress additional place descriptors from only the existing references. Secondly, we compare different feature encoders and show that CoPR presents value for all of them. We evaluate our models on three existing public datasets and report an average improvement of around 30% in VPR-based localization accuracy using CoPR, on top of the 15% increase from using a viewpoint-variant loss for the feature encoder. The complementary relation between CoPR and relative pose estimation is also discussed.
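A toy sketch of the map-densification idea: regressing a descriptor at an intermediate pose by linear interpolation between two neighbouring references. The paper studies several richer interpolation and extrapolation models; the renormalization below assumes retrieval by cosine similarity.

```python
import numpy as np

def interpolate_descriptor(desc_a, desc_b, pose_a, pose_b, pose_q):
    """Linearly interpolate a place descriptor for a query pose between two references.

    desc_a, desc_b: (D,) descriptors of the two reference images
    pose_a, pose_b: (2,) or (3,) reference positions; pose_q: query position
    """
    seg = pose_b - pose_a
    t = np.dot(pose_q - pose_a, seg) / np.dot(seg, seg)  # projection onto segment
    t = np.clip(t, 0.0, 1.0)
    d = (1.0 - t) * desc_a + t * desc_b
    return d / np.linalg.norm(d)  # renormalize for cosine-similarity retrieval
```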
|
|
09:30-09:36, Paper MoAT14.11 | Add to My Program |
Complete Closed-Form and Accurate Solution to Pose Estimation from 3D Correspondences |
|
Malis, Ezio | Inria |
Keywords: Localization, SLAM, Autonomous Vehicle Navigation
Abstract: Computing the pose from 3D data acquired in two different frames is of high importance for several robotic tasks like odometry, SLAM and place recognition. The pose is generally obtained by solving a least-squares problem given point-to-point, point-to-plane, or point-to-line correspondences. The non-linear least-squares problem can be solved by iterative optimization or, more efficiently, in closed form by using solvers of polynomial systems. In this paper, a complete and accurate closed-form solution for a weighted least-squares problem is proposed. Adding a weight for each correspondence allows increased robustness to outliers. Contrary to existing methods, the proposed approach is complete, since it is able to solve the problem in any non-degenerate case, and accurate, since it is guaranteed to find the global optimum of the weighted least-squares problem. Simulations and experiments on real data demonstrate the superior accuracy and robustness of the proposed algorithm compared to previous approaches.
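For context, the point-to-point special case of this weighted least-squares problem has a classical SVD-based closed form (weighted Kabsch/Horn), sketched below. The paper's contribution is a complete closed-form solver that also covers point-to-plane and point-to-line terms, which this sketch does not attempt.

```python
import numpy as np

def weighted_point_alignment(P, Q, w):
    """Find R, t minimizing sum_i w_i * ||R p_i + t - q_i||^2.

    P, Q: (N, 3) corresponding points in the two frames; w: (N,) positive weights.
    """
    w = w / w.sum()
    p_bar = w @ P                      # weighted centroids
    q_bar = w @ Q
    H = (P - p_bar).T @ np.diag(w) @ (Q - q_bar)  # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t
```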
|
|
09:36-09:42, Paper MoAT14.12 | Add to My Program |
Toward Consistent and Efficient Map-Based Visual-Inertial Localization: Theory Framework and Filter Design (I) |
|
Zhang, Zhuqing | Zhejiang University |
Song, Yang | University of Technology Sydney |
Huang, Shoudong | University of Technology, Sydney |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Localization, Sensor Fusion, SLAM, Consistent Filter
Abstract: This paper focuses on designing a consistent and efficient filter for visual-inertial localization given a pre-built map. First, we propose a new Lie group with its corresponding algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when the uncertainty of map information is not considered, the proposed invariant EKF naturally preserves the correct observability properties of the system. To account for the uncertainty of map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of map information can be taken into consideration to avoid over-confident estimation, while the computation cost only increases linearly with the number of map keyframes. In addition, we introduce an easily implemented observability-constrained technique, because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system when map uncertainty is considered. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.
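A compact sketch of a Schmidt-style measurement update, using the standard Schmidt-Kalman equations rather than the paper's invariant-EKF formulation; x denotes the actively estimated states and m the map states, whose uncertainty is considered but never corrected.

```python
import numpy as np

def schmidt_update(x, Pxx, Pxm, Pmm, r, Hx, Hm, Rm):
    """Schmidt-Kalman measurement update: map states m are considered, not corrected.

    r:      measurement residual
    Hx, Hm: measurement Jacobians w.r.t. active states x and map states m
    Rm:     measurement noise covariance
    """
    # Innovation covariance over the full (x, m) state.
    S = (Hx @ Pxx @ Hx.T + Hx @ Pxm @ Hm.T +
         Hm @ Pxm.T @ Hx.T + Hm @ Pmm @ Hm.T + Rm)
    # Gain for the active states only; the gain for m is forced to zero.
    Kx = (Pxx @ Hx.T + Pxm @ Hm.T) @ np.linalg.inv(S)

    x_new = x + Kx @ r
    Pxx_new = Pxx - Kx @ S @ Kx.T
    Pxm_new = Pxm - Kx @ (Hx @ Pxm + Hm @ Pmm)
    # Pmm is left untouched: map uncertainty is accounted for but never shrunk.
    return x_new, Pxx_new, Pxm_new, Pmm
```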
|
|
09:42-09:48, Paper MoAT14.13 | Add to My Program |
WiFi Similarity-Based Odometry (I) |
|
Ismail, Khairuldanial | Singapore University of Technology and Design |
Liu, Ran | Southwest University of Science and Technology |
Athukorala, Achala | Singapore University of Technology and Design |
Ng, Benny Kai Kiat | Singapore University of Technology and Design |
Yuen, Chau | Nanyang Technological University |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Localization
Abstract: Odometry is commonly used in localization applications, especially with wheeled platforms, since encoders are readily available. It is often used by itself or fused with other sensor data to obtain a better estimate. Its limitation, however, is its exclusivity to wheeled platforms, while similar odometry options are often desired on other systems. Given that WiFi is ubiquitous in most commercial and industrial areas, this paper proposes a method for obtaining odometry from WiFi scans for position estimation. The method is not constrained to wheeled robots, as is the case for wheel odometry, and does not rely on the traditional fingerprinting method. The proposed method involves training a neural network model to predict the distance moved based on features extracted from WiFi scans in the environment. These distances are then summed to obtain the trajectory. Experiments are conducted and the methods are evaluated based on Root Mean Square Error (RMSE). Experimental results show that the proposed method achieves an RMSE of at most 8.39m across the various test cases.
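A toy sketch of the pipeline: a regressor maps features of consecutive WiFi scans to the distance moved, and the per-step predictions are summed into a trajectory length. The specific features below (per-access-point RSSI differences) are an illustrative assumption, not the paper's feature set.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def scan_pair_features(rssi_prev, rssi_curr):
    """Features from two aligned RSSI vectors (one entry per access point)."""
    diff = rssi_curr - rssi_prev
    return np.array([np.mean(np.abs(diff)), np.std(diff),
                     np.max(np.abs(diff)), np.mean(rssi_curr)])

def train_wifi_odometry(scan_pairs, distances):
    """scan_pairs: list of (rssi_prev, rssi_curr); distances: displacement per pair."""
    X = np.array([scan_pair_features(a, b) for a, b in scan_pairs])
    model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000)
    model.fit(X, distances)
    return model

def trajectory_length(model, scan_pairs):
    """Cumulative distance travelled, from summed per-step predictions."""
    X = np.array([scan_pair_features(a, b) for a, b in scan_pairs])
    return np.cumsum(model.predict(X))
```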
|
|
MoAT15 Regular session, 321 |
Add to My Program |
Sensor Fusion for SLAM |
|
|
Chair: Huang, Guoquan | University of Delaware |
Co-Chair: Li, Lu | Carnegie Mellon University |
|
08:30-08:36, Paper MoAT15.1 | Add to My Program |
LIO-PPF: Fast LiDAR-Inertial Odometry Via Incremental Plane Pre-Fitting and Skeleton Tracking |
|
Chen, Xingyu | Peking University |
Wu, Peixi | Peking University |
Li, Ge | Peking University Shenzhen Graduate School |
Li, Thomas H. | Advanced Institute of Information Technology, Peking University; |
Keywords: SLAM, Mapping, Localization
Abstract: As a crucial infrastructure of intelligent mobile robots, LiDAR-inertial odometry (LIO) provides the basic capability of state estimation by tracking LiDAR scans. High-accuracy tracking generally involves a kNN search, which is used when minimizing the point-to-plane distance. The cost for this, however, is maintaining a large local map and performing a kNN plane fit for each point. In this work, we reduce both the time and space complexity of LIO by removing these unnecessary costs. Technically, we design a plane pre-fitting (PPF) pipeline to track the basic skeleton of the 3D scene. In PPF, planes are not fitted individually for each scan, let alone for each point, but are updated incrementally as the scene 'flows'. Unlike kNN, PPF is more robust to noisy and non-strict planes with our iterative Principal Component Analysis (iPCA) refinement. Moreover, a simple yet effective sandwich layer is introduced to eliminate false point-to-plane matches. Our method was extensively tested on a total of 22 sequences across 5 open datasets, and evaluated in 3 existing state-of-the-art LIO systems. In comparison, LIO-PPF needs only 36% of the original local map size to achieve up to 4x faster residual computation and 1.92x overall FPS, while maintaining the same level of accuracy. We fully open-source our implementation at https://github.com/xingyuuchen/LIO-PPF.
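As a rough illustration of incremental plane maintenance (a generic incremental-PCA plane fit, not the exact PPF pipeline): running sums let the plane be re-derived cheaply as the scene 'flows', without re-fitting from scratch.

```python
import numpy as np

class IncrementalPlane:
    """Plane fit from running sums: the normal is the smallest-eigenvalue direction."""
    def __init__(self):
        self.n = 0
        self.s = np.zeros(3)        # running sum of points
        self.ss = np.zeros((3, 3))  # running sum of outer products

    def add_points(self, pts):
        """Incorporate a (M, 3) batch of new points in O(M)."""
        self.n += len(pts)
        self.s += pts.sum(axis=0)
        self.ss += pts.T @ pts

    def plane(self):
        """Recover (normal, offset, flatness) from the accumulated statistics."""
        mean = self.s / self.n
        cov = self.ss / self.n - np.outer(mean, mean)
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normal = eigvecs[:, 0]                   # smallest-variance direction
        d = -normal @ mean
        flatness = eigvals[0] / max(eigvals[1], 1e-12)  # small ratio = good plane
        return normal, d, flatness
```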
|
|
08:36-08:42, Paper MoAT15.2 | Add to My Program |
EDI: ESKF-Based Disjoint Initialization for Visual-Inertial SLAM Systems |
|
Wang, Weihan | Stevens Institute of Technology |
Li, Jiani | Vanderbilt University |
Ming, Yuhang | Hangzhou Dianzi University |
Mordohai, Philippos | Stevens Institute of Technology |
Keywords: Visual-Inertial SLAM, SLAM, Localization
Abstract: Visual-inertial initialization can be classified into joint and disjoint approaches. Joint approaches tackle the visual and the inertial parameters together, aligning observations from feature-bearing points based on IMU integration and then using a closed-form solution with visual and acceleration observations to find the initial velocity and gravity. In contrast, disjoint approaches independently solve the Structure from Motion (SFM) problem and determine inertial parameters from up-to-scale camera poses obtained from pure monocular SLAM. However, previous disjoint methods have limitations, like assuming negligible acceleration bias impact or accurate rotation estimation by pure monocular SLAM. To address these issues, we propose EDI, a novel approach for fast, accurate, and robust visual-inertial initialization. Our method incorporates an Error-state Kalman Filter (ESKF) to estimate gyroscope bias and correct rotation estimates from monocular SLAM, overcoming dependence on pure monocular SLAM for rotation estimation. To estimate the scale factor without prior information, we offer a closed-form solution for initial velocity, scale, gravity, and acceleration bias estimation. To address the coupling of gravity and acceleration bias, we introduce weights in the linear least-squares equations, ensuring acceleration bias observability and handling outliers. Extensive evaluation on the EuRoC dataset shows that our method achieves an average scale error of 5.8% in less than 3 seconds, outperforming other state-of-the-art disjoint visual-inertial initialization approaches, even in challenging environments and with artificial noise corruption.
|
|
08:42-08:48, Paper MoAT15.3 | Add to My Program |
SELVO: A Semantic-Enhanced Lidar-Visual Odometry |
|
Jiang, Kun | UCAS |
Gao, Shuang | OPPO Research Institute |
Zhang, Xudong | OPPO Research Institute |
Li, Jijunnan | OPPO Research Institute |
Guo, Yandong | OPPO Research Institute |
Shijie, Liu | Hangzhou Institute for Advanced Study, UCAS |
Li, Chunlai | Shanghai Institute of Technical Physics (SITP) , Chinese Academy |
Wang, Jianyu | Shanghai Institute of Technical Physics of the Chinese Academy O |
Keywords: SLAM, Localization, Computer Vision for Automation
Abstract: In the face of complex external environments, information from a single sensor can no longer meet the accuracy requirements of low-drift SLAM. In this paper, we focus on a fusion scheme for cameras and lidar, and explore the gain that semantic information brings to a SLAM system. A Semantic-Enhanced Lidar-Visual Odometry (SELVO) is proposed to achieve pose estimation with high accuracy and robustness by applying semantics and utilizing initialization and sensor-fusion strategies. In the loop closure detection thread, we propose a novel place recognition method based on semantic information to maintain the global consistency of the map. In the back-end, we design a joint optimization framework including visual odometry, lidar odometry, and loop closure detection, and innovatively propose to recognize degraded scenes with semantic information. We have conducted a large number of experiments on the KITTI and KITTI-360 datasets, and the results show that our system achieves high accuracy and competitive performance in comparison with state-of-the-art methods.
|
|
08:48-08:54, Paper MoAT15.4 | Add to My Program |
LIWO: Lidar-Inertial-Wheel Odometry |
|
Yuan, Zikang | Huazhong University, Wuhan, 430073, China |
Lang, Fengtian | Huazhong University of Science and Technology |
Xu, Tianle | Huazhong University of Science and Technology |
Yang, Xin | Huazhong University of Science and Technology |
Keywords: SLAM, Localization
Abstract: LiDAR-inertial odometry (LIO), which fuses complementary information from a LiDAR and an Inertial Measurement Unit (IMU), is an attractive solution for state estimation. In LIO, both pose and velocity are regarded as state variables that need to be solved. However, the widely used Iterative Closest Point (ICP) algorithm can only provide a constraint on pose, while velocity can only be constrained by IMU pre-integration. As a result, the velocity estimates tend to be updated along with the pose results. In this paper, we propose LIWO, an accurate and robust LiDAR-inertial-wheel (LIW) odometry, which fuses measurements from LiDAR, IMU and wheel encoder in a bundle adjustment (BA) based optimization framework. The involvement of a wheel encoder provides velocity measurements as an important observation, which assists LIO in making a more accurate state prediction. In addition, constraining the velocity variable by the wheel encoder observation in the optimization further improves the accuracy of state estimation. Experimental results on two public datasets demonstrate that our system outperforms all state-of-the-art LIO systems in terms of absolute trajectory error (ATE), and that embedding a wheel encoder can greatly improve the performance of BA-based LIO.
|
|
08:54-09:00, Paper MoAT15.5 | Add to My Program |
VIW-Fusion: Extrinsic Calibration and Pose Estimation for Visual-IMU-Wheel Encoder System |
|
Qiao, Chunxiao | Northeastern University, College of Information Science and Engi |
Zhao, Shuying | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Wang, Yahui | UISEE (Beijing) Ltd |
Zhang, Dan | Uisee Technology (Beijing) Co., Ltd |
Keywords: Visual-Inertial SLAM, Localization, Sensor Fusion
Abstract: The data fusion of camera, IMU, and wheel encoder measurements has proved its effectiveness in localizing ground robots, and obtaining accurate sensor extrinsic parameters is its premise. We propose an extrinsic parameter calibration algorithm and a multi-sensor-based pose estimation algorithm for the camera-IMU-wheel encoder system. First, we propose a joint calibration algorithm for the extrinsic parameters of the camera-IMU-wheel encoder system, which improves the accuracy and robustness of the camera-wheel encoder calibration. We then extend visual-inertial odometry (VIO) to incorporate the measurements from the wheel encoder and weight the wheel encoder measurements according to angular velocity in global optimization to improve performance. We further propose a novel method for VIO initialization by integrating wheel encoder information, which significantly reduces the scale error in initialization. We conduct extrinsic parameter calibration experiments on a real self-driving car and validate the performance of our multi-sensor-based localization system on the KAIST dataset and a dataset collected by our self-driving vehicles, performing an exhaustive comparison with state-of-the-art algorithms. Our implementations are open source: https://github.com/chunxiaoqiao/VIW-Fusion.git.
|
|
09:00-09:06, Paper MoAT15.6 | Add to My Program |
LiDAR-Inertial SLAM with Efficiently Extracted Planes |
|
Chen, Chao | Zhejiang University |
Wu, Hangyu | Zhejiang University |
Ma, Yukai | Zhejiang University |
Lv, Jiajun | Zhejiang University |
Li, Laijian | Zhejiang University |
Liu, Yong | Zhejiang University |
Keywords: Mapping, Localization, SLAM
Abstract: This paper proposes a LiDAR-inertial SLAM with efficiently extracted planes, which couples planes into the odometry to improve accuracy and into the mapping for consistency. The proposed method consists of three parts: an efficient point-to-line-to-plane extraction algorithm, a LiDAR-inertial-plane tightly coupled odometry, and plane-aided mapping with global planes. Specifically, we leverage the ring field of the LiDAR point cloud to accelerate the region-growing-based plane extraction algorithm, and we propose a plane-distance-insensitive criterion for better plane association. We tightly couple the IMU pre-integration factor, LiDAR odometry factor, and plane factor in the odometry to obtain a more accurate initial pose for mapping. Furthermore, we propose a plane map management strategy based on spatial voxel hashing to improve the speed and accuracy of global map plane associations. Experimental results show that our plane extraction method is efficient, and that the proposed plane-aided LiDAR-inertial SLAM significantly improves accuracy and consistency compared to other state-of-the-art algorithms, with only a small increase in time consumption.
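A minimal sketch of the spatial voxel-hashing idea for the global plane map: planes are indexed by the voxel of a representative point so that association candidates are found in constant time. The voxel size and key scheme are illustrative assumptions.

```python
import numpy as np

class VoxelPlaneMap:
    """Hash planes by the voxel of their centroid for fast global association."""
    def __init__(self, voxel_size=2.0):
        self.voxel_size = voxel_size
        self.table = {}  # (i, j, k) -> list of plane records

    def _key(self, point):
        return tuple(np.floor(point / self.voxel_size).astype(int))

    def insert(self, centroid, normal, d):
        self.table.setdefault(self._key(centroid), []).append((centroid, normal, d))

    def candidates(self, query_point):
        """Planes stored in the query voxel and its 26 neighbours."""
        i, j, k = self._key(query_point)
        out = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    out.extend(self.table.get((i + di, j + dj, k + dk), []))
        return out
```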
|
|
09:06-09:12, Paper MoAT15.7 | Add to My Program |
Learning to Map Efficiently by Active Echolocation |
|
Hu, Xixi | UT Austin |
Purushwalkam, Senthil | Salesforce Research |
Harwath, David | UT Austin |
Grauman, Kristen | UT Austin and Facebook AI Research |
Keywords: Audio-Visual SLAM, SLAM
Abstract: Using visual SLAM to map new environments requires time-consuming visits to all regions for data collection. We propose an approach to estimate maps of areas beyond the visible regions using a cheap and readily available modality of data---sound. We introduce the idea of an active audio-visual mapping agent. Besides collecting visual data, the proposed agent emits sounds during navigation, captures the echoes, and uses them to accurately map unknown areas. We propose a reinforcement learning-based method that simultaneously trains models to 1) estimate a map from the visual data, 2) output navigation actions, 3) output the decision to emit a sound and 4) refine estimated maps using the captured audio. Our agent is trained and tested on 85 real-world homes from the Matterport3D dataset using the Habitat and SoundSpaces simulators for visual and audio data. Our method, unlike visual-data reliant approaches, yields more accurate maps with broader environmental coverage. In addition, compared to an agent that continually emits sounds, we observe that intelligently choosing when to emit sounds leads to accurate maps obtained with greater efficiency.
|
|
09:12-09:18, Paper MoAT15.8 | Add to My Program |
Visual-LiDAR-Inertial Odometry: A New Visual-Inertial SLAM Method Based on an iPhone 12 Pro |
|
Ye, Cang | Virginia Commonwealth University |
Jin, Lingqiu | Virginia Commonwealth University |
Keywords: Visual-Inertial SLAM, Range Sensing
Abstract: As today’s smartphone integrates various imaging sensors and Inertial Measurement Units (IMUs) and becomes computationally powerful, there is a growing interest in developing smartphone-based visual-inertial (VI) SLAM methods for robotics and computer vision applications. In this paper, we introduce a new SLAM method, called Visual-LiDAR-Inertial Odometry (VLIO), based on an iPhone 12 Pro. VLIO formulates device pose estimation as an optimization problem that minimizes a cost function based on the residuals of the inertial, visual, and depth measurements. We present the first work that 1) characterizes the iPhone’s LiDAR in depth measurement and identifies the models for the measurement error and standard deviation, and 2) characterizes pose change estimation with LiDAR data. The measurement models are then used to compute the depth-related and visual-feature-related residuals for the cost function. Also, VLIO tracks varying camera intrinsic parameters (CIP) in real-time and uses them in computing these residuals. Both approaches result in more accurate residual terms and thus more accurate pose estimation. The CIP tracking method eliminates the need for a sophisticated model-fitting process that includes camera calibration and pairing of the CIPs and IMU measurements with various phone orientations. Experimental results validate the efficacy of VLIO.
|
|
09:18-09:24, Paper MoAT15.9 | Add to My Program |
Optimization-Based VINS: Consistency, Marginalization, and FEJ |
|
Chen, Chuchu | University of Delaware |
Geneva, Patrick | University of Delaware |
Peng, Yuxiang | University of Delaware |
Lee, Woosik | University of Delaware |
Huang, Guoquan | University of Delaware |
Keywords: Visual-Inertial SLAM, Localization, SLAM
Abstract: In this work, we present a comprehensive analysis of the application of the First-Estimates Jacobian (FEJ) design methodology in nonlinear optimization-based Visual-Inertial Navigation Systems (VINS). The FEJ approach fixes system linearization points to preserve the proper observability properties of VINS and has been shown to significantly improve the estimation performance of state-of-the-art filtering-based methods. However, its direct application to optimization-based estimators presents challenges and pitfalls, which we address in this paper. Specifically, we carefully examine observability and its relation to inconsistency and FEJ; based on this, we explain how to properly apply and implement FEJ within four marginalization archetypes commonly used in nonlinear optimization-based frameworks. FEJ's effectiveness and applications to VINS are investigated and demonstrate significant performance improvements. Additionally, we offer a detailed discussion of the results and guidelines on how to properly implement FEJ in optimization-based estimators.
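A toy sketch of the FEJ principle itself: Jacobians are evaluated at a state's first estimate while residuals use the latest estimate. The `factor` interface below is hypothetical; the paper's contribution concerns how to do this correctly under four marginalization archetypes, which this sketch does not cover.

```python
class FEJState:
    """Keep the first estimate of a state for consistent Jacobian evaluation."""
    def __init__(self, value):
        self.value = value             # current estimate, refined each iteration
        self.fej_value = value.copy()  # frozen at the first estimate

    def update(self, new_value):
        self.value = new_value         # fej_value is deliberately never changed

def linearize(factor, state):
    """Hypothetical factor interface: residual at the latest estimate,
    Jacobian at the frozen first estimate (the FEJ rule)."""
    residual = factor.residual(state.value)
    jacobian = factor.jacobian(state.fej_value)
    return residual, jacobian
```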
|
|
09:24-09:30, Paper MoAT15.10 | Add to My Program |
Visual-Inertial-Laser-Lidar (VILL) SLAM: Real-Time Dense RGB-D Mapping for Pipe Environments |
|
Tian, Yu | Carnegie Mellon University |
Wang, Luyuan | Carnegie Mellon University |
Yan, Xinzhi | Carnegie Mellon University |
Ruan, Fujun | Carnegie Mellon University |
Ganapathy Subramanian, Jaya Aadityaa | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Li, Lu | Carnegie Mellon University |
Keywords: Visual-Inertial SLAM, RGB-D Perception, Sensor Fusion
Abstract: Robotic solutions for pipeline inspection promise enhancement of human labor by automating data acquisition for pipe condition assessments, which are vital for the early detection of pipe anomalies and the prevention of hazardous leakages and explosions. Through simultaneous localization and mapping (SLAM), colorized 3D reconstructions of the pipe's inner surface can be generated, providing a more comprehensive digital record of the pipes compared to conventional vision-only inspection. Designed for generic environments, most SLAM methods suffer limited accuracy and substantial accumulated drift in confined and featureless spaces such as pipelines, due to a lack of suitable sensor hardware and state estimation techniques. In this research, we present VILL-SLAM: a dense RGB-D SLAM algorithm that combines a monocular camera (V), an inertial sensor (I), a ring-shaped laser profiler (L), and a Lidar (L) into a compact sensor package optimized for in-pipe operations. By fusing complementary visual and depth information from the color camera, laser profiling, and Lidar measurement, our method overcomes the challenges of metric-scale mapping in conventional SLAM methods, despite its monocular configuration. To further improve localization accuracy, we utilize the pipe geometry to formulate two unique optimization factors that effectively constrain odometry drift. To validate our method, we conducted real-world experiments in physical pipes, comparing the performance of our approach against other state-of-the-art algorithms. The proposed SLAM framework achieved a 6.6-fold reduction in drift, with 0.84% mean odometry drift over 22 meters and a mean pointwise 3D scanning error of 0.88mm in 12-inch diameter pipes. This research represents a significant advancement in miniature in-pipe inspection, localization, and mapping sensing techniques. It has the potential to become a core enabling technology for the next generation of highly capable in-pipe robots, capable of reconstructing photo-realistic 3D pipe scans and providing disruptive pipe locating and georeferencing capabilities.
|
|
09:30-09:36, Paper MoAT15.11 | Add to My Program |
Know What You Don't Know: Consistency in Sliding Window Filtering with Unobservable States Applied to Visual-Inertial SLAM |
|
Lisus, Daniil | University of Toronto |
Cohen, Mitchell | McGill University |
Forbes, James Richard | McGill University |
Keywords: Visual-Inertial SLAM, Autonomous Vehicle Navigation, SLAM
Abstract: Estimation algorithms, such as the sliding window filter, produce an estimate and uncertainty of desired states. This task becomes challenging when the problem involves unobservable states. In these situations, it is critical for the algorithm to "know what it doesn't know", meaning that it must maintain the unobservable states as unobservable during algorithm deployment. This letter presents general requirements for maintaining consistency in sliding window filters involving unobservable states. The value of these requirements when designing a navigation solution is experimentally shown within the context of visual-inertial SLAM making use of IMU preintegration.
|
|
09:36-09:42, Paper MoAT15.12 | Add to My Program |
Versatile LiDAR-Inertial Odometry with SE(2) Constraints for Ground Vehicles |
|
Jiaying, Chen | Nanyang Technological University |
Wang, Han | Nanyang Technological University |
Hu, Minghui | Nanyang Technological University |
Suganthan, Ponnuthurai Nagaratnam | Nanyang Technological University |
Keywords: SLAM, Localization, Industrial Robots
Abstract: LiDAR SLAM has become one of the major localization systems for ground vehicles since LiDAR Odometry And Mapping (LOAM). Many extensions of LOAM leverage one specific constraint to improve performance, e.g., information from on-board sensors such as loop closures and inertial states, or prior conditions such as ground level and motion dynamics. In many robotic applications, several such conditions are partially known, so SLAM becomes a comprehensive problem involving numerous constraints, and a better result can be achieved by fusing them properly. In this paper, we propose a hybrid LiDAR-inertial SLAM framework that leverages both the on-board perception system and prior information such as motion dynamics to improve localization performance. In particular, we consider the case of ground vehicles, which are commonly used for autonomous driving and warehouse logistics. We present a computationally efficient LiDAR-inertial odometry method that directly parameterizes ground vehicle poses on SE(2). Out-of-SE(2) motion perturbations are not neglected but incorporated into an integrated noise term of a novel SE(2)-constraints model. For odometric measurement processing, we propose a versatile, tightly coupled LiDAR-inertial odometry that achieves better pose estimation than traditional LiDAR odometry.
|
|
09:42-09:48, Paper MoAT15.13 | Add to My Program |
ESVIO: Event-Based Stereo Visual Inertial Odometry |
|
Chen, Peiyu | The University of Hong Kong |
Guan, Weipeng | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Aerial Systems: Perception and Autonomy
Abstract: Event cameras that asynchronously output low-latency event streams provide great opportunities for state estimation under challenging situations. Although event-based visual odometry has been extensively studied in recent years, most of it is based on monocular setups, with little research on stereo event vision. In this paper, we present ESVIO, the first event-based stereo visual-inertial odometry, which leverages the complementary advantages of event streams, standard images and inertial measurements. Our proposed pipeline achieves spatial and temporal associations between consecutive stereo event streams, thereby obtaining robust state estimation. In addition, a motion compensation method is designed to emphasize scene edges by warping each event to a reference moment using the IMU and the ESVIO back-end. We validate that both ESIO (purely event-based) and ESVIO (event with image-aided) have superior performance compared with other image-based and event-based baseline methods on public and self-collected datasets. Furthermore, we use our pipeline to perform onboard quadrotor flights in low-light environments. A real-world large-scale experiment is also conducted to demonstrate long-term effectiveness. We highlight that this work is a real-time, accurate system aimed at robust state estimation under challenging environments.
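A minimal sketch of rotational motion compensation for events: each event is warped to a reference timestamp using the gyroscope rate, which sharpens scene edges. A small-angle rotation model is assumed, and translation and lens distortion are ignored for brevity.

```python
import numpy as np

def warp_events(events, t_ref, omega, K):
    """Warp events (x, y, t) to time t_ref under constant angular velocity omega.

    events: (N, 3) array of pixel coordinates and timestamps
    omega:  (3,) angular velocity from the gyroscope
    K:      3x3 camera intrinsics matrix
    """
    K_inv = np.linalg.inv(K)
    warped = np.empty((len(events), 2))
    for i, (x, y, t) in enumerate(events):
        dt = t_ref - t
        wx, wy, wz = omega * dt
        # Small-angle rotation: R ~ I + [omega * dt]_x
        R = np.array([[1.0, -wz,  wy],
                      [ wz, 1.0, -wx],
                      [-wy,  wx, 1.0]])
        ray = R @ (K_inv @ np.array([x, y, 1.0]))  # rotate the bearing ray
        warped[i] = (K @ (ray / ray[2]))[:2]       # reproject to the image plane
    return warped
```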
|
|
MoAT16 Regular session, 330A |
Add to My Program |
Autonomous Agents |
|
|
Chair: Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Co-Chair: Keren, Sarah | Technion - Israel Institute of Technology |
|
08:30-08:36, Paper MoAT16.1 | Add to My Program |
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds |
|
Younes, Abdelrahman | KIT |
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Autonomous Agents, Reactive and Sensor-Based Planning, Reinforcement Learning
Abstract: Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input for detecting and finding the goal, they focus on clean and static sound sources and struggle to generalize to unheard sounds. In this work, we propose the novel dynamic audio-visual navigation benchmark, which requires catching a moving sound source in an environment with noisy and distracting sounds, posing a range of new challenges. We introduce a reinforcement learning approach that learns a robust navigation policy for these complex settings. To achieve this, we propose an architecture that fuses audio-visual information in the spatial feature space to learn correlations of geometric information inherent in both local maps and audio signals. We demonstrate that our approach consistently outperforms the current state-of-the-art by a large margin across all tasks of moving sounds, unheard sounds, and noisy environments, in two challenging 3D-scanned real-world environments, namely Matterport3D and Replica. The benchmark is available at http://dav-nav.cs.uni-freiburg.de.
|
|
08:36-08:42, Paper MoAT16.2 | Add to My Program |
Joint Imitation Learning of Behavior Decision and Control for Autonomous Intersection Navigation |
|
Zhu, Zeyu | Key Labarotary of Machine Perception, Peking University |
Zhao, Huijing | Peking University |
Keywords: Autonomous Agents, Control Architectures and Programming
Abstract: Modern autonomous driving systems face substantial challenges when navigating dense intersections due to the high uncertainty introduced by other road users. Due to the complexity of the task, the autonomous vehicle needs to generate policies at multiple levels of abstraction. However, previous deep imitation learning methods focused on learning control policies while using simple rule-based behavior models. To bridge this gap and achieve human-like driving, we develop a hierarchy of high-level behavior decision and low-level control, where both policies are jointly learned from human demonstrations based on imitation learning. Over 60 hours of driving data were collected from 10 drivers at six intersections. The proposed method is extensively evaluated in challenging intersection scenarios. Empirical results demonstrate the method's superior performance over baselines in terms of task completion and control quality. We demonstrate the importance of learning human-like behavior decisions as well as jointly learning behavior and control policies. The capability of imitating different driving styles is also illustrated.
|
|
08:42-08:48, Paper MoAT16.3 | Add to My Program |
Improving the Performance of Backward Chained Behavior Trees That Use Reinforcement Learning |
|
Kartašev, Mart | KTH Royal Institute of Technology |
Salér, Justin | KTH |
Ogren, Petter | Royal Institute of Technology (KTH) |
Keywords: Behavior-Based Systems, Autonomous Agents, Control Architectures and Programming
Abstract: In this paper we show how to improve the performance of backward chained behavior trees (BTs) that include policies trained with reinforcement learning (RL). BTs represent a hierarchical and modular way of combining control policies into higher-level control policies. Backward chaining is a design principle for the construction of BTs that combines reactivity with goal-directed actions in a structured way. The backward chained structure has also enabled convergence proofs for BTs, identifying a set of local conditions to be satisfied for the convergence of all trajectories to a set of desired goal states. The key idea of this paper is to improve the performance of backward chained BTs by using the conditions identified in a theoretical convergence proof to configure the RL problems for individual controllers. Specifically, previous analysis identified so-called active constraint conditions (ACCs) that should not be violated in order to avoid having to return to work on previously achieved subgoals. We propose a way to set up the RL problems such that they not only achieve each immediate subgoal, but also avoid violating the identified ACCs. The resulting performance improvement depends on how often ACC violations occurred before the change, and how much effort, in terms of execution time, was needed to re-achieve them. The proposed approach is illustrated in a dynamic simulation environment.
|
|
08:48-08:54, Paper MoAT16.4 | Add to My Program |
Fast Decision Support for Air Traffic Management at Urban Air Mobility Vertiports Using Graph Learning |
|
KrisshnaKumar, Prajit | University at Buffalo |
Witter, Jhoel | University at Buffalo |
Paul, Steve | University at Buffalo |
Cho, Hanvit | State University of New York at Buffalo |
Dantu, Karthik | University of Buffalo |
Chowdhury, Souma | University at Buffalo, State University of New York |
Keywords: Intelligent Transportation Systems, Multi-Robot Systems, Reinforcement Learning
Abstract: Urban Air Mobility (UAM) promises a new dimension to decongested, safe, and fast travel in urban and suburban hubs. These UAM aircraft are conceived to operate from small airports called vertiports, each comprising multiple take-off/landing and battery-recharging spots. Since vertiports might be situated in dense urban areas and need to handle many aircraft landings and take-offs each hour, managing this schedule in real-time becomes challenging for a traditional air-traffic controller and instead calls for an automated solution. This paper provides a novel approach to this problem of Urban Air Mobility - Vertiport Schedule Management (UAM-VSM), which leverages graph reinforcement learning to generate decision-support policies. Here the designated physical spots within the vertiport's airspace and the vehicles being managed are represented as two separate graphs, with feature extraction performed through a graph convolutional network (GCN). Extracted features are passed onto perceptron layers to decide actions such as continue to hover or cruise, continue idling or take off, or land on an allocated vertiport spot. Performance is measured based on delays, safety (number of collisions) and battery consumption. Through realistic simulations in AirSim applied to scaled-down multi-rotor vehicles, our results demonstrate the suitability of using graph reinforcement learning to solve the UAM-VSM problem and its superiority over basic reinforcement learning (with graph embeddings) and random-choice baselines.
|
|
08:54-09:00, Paper MoAT16.5 | Add to My Program |
Scaling Vision-Based End-To-End Autonomous Driving with Multi-View Attention Learning |
|
Xiao, Yi | Computer Vision Center, Universitat Autònoma De Barcelona |
Codevilla, Felipe | Mila/ Independent Robotics |
Porres, Diego | Computer Vision Center, Universitat Autònoma De Barcelona |
Lopez, Antonio M. | Computer Vision Center, Universitat Autonoma De Barcelona |
Keywords: Autonomous Agents, Imitation Learning, Intelligent Transportation Systems
Abstract: In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some of the latest models achieve better performance than CILRS by using expensive sensor suites and/or large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by processing higher-resolution images, using a human-inspired horizontal field of view (HFOV) as an inductive bias, and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
|
|
09:00-09:06, Paper MoAT16.6 | Add to My Program |
Value of Assistance for Mobile Agents |
|
Amuzig, Adi | Technion - Israel Institute of Technology |
Dovrat, David | Technion |
Keren, Sarah | Technion - Israel Institute of Technology |
Keywords: Autonomous Agents, Probability and Statistical Methods, Localization
Abstract: Mobile robotic agents often suffer from localization uncertainty which grows with time and with the agents' movement. This can hinder their ability to accomplish their task. In some settings, it may be possible to perform assistive actions that reduce uncertainty about a robot’s location. For example, in a collaborative multi-robot system, a wheeled robot can request assistance from a drone that can fly to its estimated location and reveal its exact location on the map or accompany it to its intended location. Since assistance may be costly and limited, and may be requested by different members of a team, there is a need for principled ways to support the decision of which assistance to provide to an agent and when, as well as to decide which agent to help within a team. For this purpose, we propose Value of Assistance (VOA) to represent the expected cost reduction that assistance will yield at a given point of execution. We offer ways to compute VOA based on estimations of the robot's future uncertainty, modeled as a Gaussian process. We specify conditions under which our VOA measures are valid and empirically demonstrate the ability of our measures to predict the agent's average cost reduction when receiving assistance in both simulated and real-world robotic settings.
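A toy sketch of the VOA idea: expected cost with and without assistance, with localization uncertainty modelled as a variance that grows along the path and is reset by assistance. The linear growth and additive cost model below are illustrative assumptions, not the paper's Gaussian-process formulation.

```python
import numpy as np

def expected_cost(path_len, sigma0, growth, assist_at=None):
    """Expected traversal cost: base distance plus an uncertainty penalty.

    Variance grows linearly with distance travelled; assistance at step
    `assist_at` resets it (e.g., a drone reveals the robot's true pose).
    """
    steps = np.arange(path_len)
    var = sigma0 + growth * steps
    if assist_at is not None:
        var[assist_at:] = sigma0 + growth * (steps[assist_at:] - assist_at)
    return path_len + var.sum()  # penalty: cumulative localization variance

def value_of_assistance(path_len, sigma0, growth, assist_at):
    """VOA = expected cost without help minus expected cost with help."""
    return (expected_cost(path_len, sigma0, growth) -
            expected_cost(path_len, sigma0, growth, assist_at))
```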
|
|
09:06-09:12, Paper MoAT16.7 | Add to My Program |
Feature Explanation for Robust Trajectory Prediction |
|
Zhai, Xukai | Wuhan University of Technology |
Hu, Renze | Wuhan University of Technology |
Yin, Zhishuai | Wuhan Universuty of Technology |
Keywords: Autonomous Agents, AI-Based Methods, Deep Learning Methods
Abstract: Trajectory prediction of neighboring agents is a critical task for high-speed robotics such as autonomous vehicles. In order to obtain fine-grained and robust scene representations, existing works attempt to consider abundant information that is deemed relevant. The cost, however, is a heavy computational burden and, more importantly, the inevitable interference brought by redundant information. In this paper, we exploit explainable AI (XAI) techniques and propose a model in the "Encoder-Decoder" framework, named parallel explainable Transformer (PXT), to identify contributive features for robust trajectory prediction. A two-branch encoder is designed to disentangle the roadway information and agents' historical trajectories for better feature explanation. Two stages of feature explanation are incorporated into the encoder. In the first stage, an explainable Transformer (XT) comprising a Layer-wise Relevance Propagation (LRP)-based interpretation module is designed and implemented in both branches to score and filter the contextual and motion features. In the second stage, the ProbSparse attention mechanism is innovatively adopted to measure the level of interactivity with sparsity, so that the relationships among highly interactive agents are focused on. The results on the Argoverse benchmark show that our proposal achieves state-of-the-art (SOTA) performance without delicate and tedious network design, demonstrating the effectiveness of tracing and retaining contributive features in enhancing trajectory prediction.
|
|
09:12-09:18, Paper MoAT16.8 | Add to My Program |
Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation |
|
Liu, Zhen | Jilin University |
Gao, Hang | Jilin University |
Ma, Hao | Jilin University |
Cai, Shuo | Jilin University |
Hu, Yunfeng | Jilin University |
Qu, Ting | Jilin University |
Chen, Hong | Tongji University |
Gong, Xun | Jilin University |
Keywords: Autonomous Agents, Cognitive Modeling, Reinforcement Learning
Abstract: Autonomous vehicle (AV) evaluation has been the subject of increased interest in recent years both in industry and in academia. This paper focuses on the development of a novel framework for generating adversarial driving behavior of a background vehicle interfering with the AV to expose effective and rational risky events. Specifically, the adversarial behavior is learned by a reinforcement learning (RL) approach incorporating the cumulative prospect theory (CPT), which allows representation of human risk cognition. Then, an extended version of the deep deterministic policy gradient (DDPG) technique is proposed for training the adversarial policy while ensuring training stability, as the CPT action-value function is leveraged. A comparative case study regarding the cut-in scenario is conducted on a high-fidelity Hardware-in-the-Loop (HiL) platform, and the results demonstrate the effectiveness of the adversarial behavior in exposing weaknesses of the tested AV.
|
|
09:18-09:24, Paper MoAT16.9 | Add to My Program |
Predicting Center of Mass by Iterative Pushing for Object Transportation and Manipulation |
|
Hyland, Steven Michael | Worcester Polytechnic Institute |
Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Onal, Cagdas | WPI |
Keywords: Autonomous Agents, Wheeled Robots, Manipulation Planning
Abstract: Robotic manipulation tasks rely on a plethora of environmental and payload information. One critical piece of information for accurate manipulation is the center of mass (CoM) of the object, which is essential for estimating the dynamic response of the system and determining the payload placement. Traditionally, the CoM of a payload is provided prior to manipulation. In order to create a more robust and comprehensive system, this information should be collected by the robotic agent before or during the task run time. This paper presents a method for approximating the CoM of a planar object using a small-scale mobile robot to inform manipulation tasks. On average, our system is able to converge on a CoM estimate in under 30 seconds in simulation and 20 seconds in experiment, with a relative error of 4.95% and 5.46%, respectively.
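The abstract reports convergence times but not the update rule. As a purely hypothetical illustration of the underlying physics, a push whose line of action passes through the CoM produces pure translation, while an offset push produces rotation, so the sign of the induced rotation supports a bisection on the push point:

```python
import numpy as np

def push_response(push_point, com):
    """Sign of the rotation induced by pushing at `push_point` along an
    edge (toy planar model: torque sign = sign of the lever arm)."""
    return np.sign(push_point - com)

def estimate_com(com, lo=-0.5, hi=0.5, tol=1e-3):
    """Bisect on the push point until the object stops rotating; the
    zero-rotation push line passes through the CoM (hypothetical
    strategy, not the paper's controller)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if push_response(mid, com) > 0:
            hi = mid        # rotated one way: push point is past the CoM
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(estimate_com(com=0.12))   # converges to ~0.12
```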
|
|
09:24-09:30, Paper MoAT16.10 | Add to My Program |
The Impact of Overall Optimization on Warehouse Automation |
|
Yoshitake, Hiroshi | Hitachi America Ltd |
Abbeel, Pieter | UC Berkeley |
Keywords: Discrete Event Dynamic Automation Systems, Reinforcement Learning, Multi-Robot Systems
Abstract: In this study, we propose a novel approach for investigating the optimization performance achievable through flexible robot coordination in automated warehouses with multi-agent reinforcement learning (MARL)-based control. Automated systems using robots are expected to achieve more efficient operations than manual systems in terms of overall optimization performance. However, the impact of overall optimization on performance remains unclear in most automated systems due to a lack of suitable control methods. We therefore propose a centralized-training, decentralized-execution MARL framework as a practical overall-optimization control method. Within this framework, we also propose a single shared critic, trained with global states and rewards, that is applicable even when heterogeneous agents make decisions asynchronously. Our MARL framework was applied to task selection for material handling equipment in an automated order-picking simulation, and its performance was evaluated to determine how far overall optimization outperforms partial optimization, by comparing it with other MARL frameworks and rule-based control methods.
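The framework's distinguishing component is a single critic shared by heterogeneous, asynchronously acting agents and trained on global states and rewards. A minimal centralized-training sketch in PyTorch follows; all sizes and names are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SharedCritic(nn.Module):
    """One critic for all agents: it scores the global state, so value
    estimates stay consistent team-wide (centralized training)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, global_state):
        return self.net(global_state)

# Decentralized execution: each agent keeps its own actor but queries
# the same critic during training, even when decisions are asynchronous.
critic = SharedCritic(state_dim=32)
value = critic(torch.randn(1, 32))
```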
|
|
09:30-09:36, Paper MoAT16.11 | Add to My Program |
Kinematics-Only Differential Flatness Based Trajectory Tracking for Autonomous Racing |
|
Dighe, Yashom | University at Buffalo, State University of New York |
Kim, Youngjin | University at Buffalo |
Rajguru, Smit | State University of New York at Buffalo |
Turkar, Yash | University at Buffalo |
Singh, Tarunraj | University at Buffalo |
Dantu, Karthik | University at Buffalo |
Keywords: Autonomous Agents, Wheeled Robots, Kinematics
Abstract: In autonomous racing, accurately tracking the race line at the limits of handling is essential to guarantee competitiveness. In this study, we show the effectiveness of differential-flatness-based control for high-speed trajectory tracking with car-like robots. We compare the tracking performance and resource use of our kinematics-only flatness controller (KFC) against Nonlinear Model Predictive Control (NMPC) while running on embedded hardware, and show that on average KFC reduces computational resource usage by 50% while performing on par with NMPC. Our implementation of the proposed controller, the simulation environment, and detailed results are open-sourced at https://github.com/droneslab/
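For a kinematic bicycle model, the planar position is a flat output: speed, heading, and steering can be recovered algebraically from the reference trajectory's derivatives, which is what makes a kinematics-only tracking controller so cheap. A standard feedforward computation (not the authors' code):

```python
import numpy as np

def flat_feedforward(xd, yd, xdd, ydd, wheelbase=0.33):
    """Recover bicycle-model controls from flat-output derivatives.
    xd, yd: first derivatives of the reference path; xdd, ydd: second."""
    v = np.hypot(xd, yd)                    # forward speed
    theta = np.arctan2(yd, xd)              # heading angle
    kappa = (xd * ydd - yd * xdd) / v**3    # path curvature
    delta = np.arctan(wheelbase * kappa)    # steering angle
    return v, theta, delta

print(flat_feedforward(xd=2.0, yd=0.5, xdd=0.1, ydd=1.0))
```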
|
|
09:36-09:42, Paper MoAT16.12 | Add to My Program |
LEF: Late-To-Early Temporal Fusion for LiDAR 3D Object Detection |
|
He, Tong | Waymo LLC |
Sun, Pei | Waymo |
Leng, Zhaoqi | Waymo LLC |
Liu, Chenxi | Waymo |
Anguelov, Dragomir | Waymo |
Tan, Mingxing | Waymo Research |
Keywords: Autonomous Agents, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds. Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector. This feature fusion strategy enables the model to better capture the shapes and poses for challenging objects, compared with learning from raw points directly. Our method conducts late-to-early feature fusion in a recurrent manner. This is achieved by enforcing window-based attention blocks upon temporally calibrated and aligned sparse pillar tokens. Leveraging bird's eye view foreground pillar segmentation, we reduce the number of sparse history features that our model needs to fuse into its current frame by 10x. We also propose a stochastic-length FrameDrop training technique, which generalizes the model to variable frame lengths at inference for improved performance without retraining. We evaluate our method on the widely adopted Waymo Open Dataset and demonstrate improvement on 3D object detection against the baseline model, especially for the challenging category of large objects.
|
|
09:42-09:48, Paper MoAT16.13 | Add to My Program |
Learning Behavior Trees from Planning Experts Using Decision Tree and Logic Factorization |
|
Gugliermo, Simona | Örebro University, Scania |
Schaffernicht, Erik | Örebro University, AASS Research Center |
Koniaris, Christos | Scania |
Pecora, Federico | Amazon Robotics |
Keywords: Behavior-Based Systems, Learning from Demonstration, Intelligent Transportation Systems
Abstract: The increased popularity of Behavior Trees (BTs) in different fields of robotics requires efficient methods for learning BTs from data instead of tediously handcrafting them. Recent research in learning from demonstration reported encouraging results that this paper extends, improves, and generalizes to arbitrary planning domains. We propose BT-Factor as a new method for learning expert knowledge by representing it in a BT. Execution traces of previously manually designed plans are used to generate a BT, employing a combination of decision tree learning and logic factorization techniques originating from circuit design. We test BT-Factor in an industrially relevant simulation environment from a mining scenario and compare it against a state-of-the-art BT learning method. The results show that our method generates compact BTs that are easy to interpret and capable of accurately capturing the relations that are implicit in the training data.
|
|
MoAT17 Regular session, 330B |
Add to My Program |
Imitation Learning |
|
|
Chair: Igl, Maximilian | Waymo LLC |
Co-Chair: Cui, Yuchen | Stanford University |
|
08:30-08:36, Paper MoAT17.1 | Add to My Program |
Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks |
|
Ablett, Trevor | University of Toronto |
Chan, Bryan | University of Alberta |
Kelly, Jonathan | University of Toronto |
Keywords: Imitation Learning, Reinforcement Learning, Transfer Learning
Abstract: Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naïve approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behavioural cloning (BC), while also being more expert-sample-efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
|
|
08:36-08:42, Paper MoAT17.2 | Add to My Program |
Hierarchical Decision Transformer |
|
Correia, André | Universidade Da Beira Interior and NOVA LINCS |
Alexandre, Luís A. | Univ. Beira Interior and NOVA LINCS |
Keywords: Imitation Learning, Deep Learning Methods, Machine Learning for Robot Control
Abstract: Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents the hierarchical decision transformer (HDT). HDT is a hierarchical behavior cloning algorithm that improves the performance of transformer methods in imitation learning, improving their robustness to tasks with longer episodes and/or sparse rewards, without requiring the task knowledge or user interaction present in the current state-of-the-art. The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach. This sub-goal sequence replaces the returns-to-go of previous methods, improving performance, especially in tasks with longer episodes and scarcer rewards. We validate our method on multiple tasks from the OpenAI Gym, D4RL, and RoboMimic benchmarks. Our method outperforms the baselines in twenty-three out of thirty-one settings of varied horizons and reward frequencies without prior task knowledge, showing the advantages of the hierarchical approach for learning from demonstrations using a sequence model. We also evaluate the method on a reaching task on a physical robot.
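HDT's key change relative to a plain decision transformer is that the conditioning token is a predicted sub-goal rather than a return-to-go. The toy rollout below shows only that control flow; both models are trivial stand-ins, not the paper's transformers.

```python
import numpy as np

class HighLevel:
    """Stand-in sub-goal predictor (a transformer in the paper)."""
    def predict(self, obs, goal):
        return obs + 0.25 * (goal - obs)          # sub-goal part-way to goal

class LowLevel:
    """Stand-in sub-goal-conditioned policy (also a transformer in HDT)."""
    def predict(self, obs, subgoal):
        return np.clip(subgoal - obs, -0.1, 0.1)  # step toward the sub-goal

def hdt_rollout(obs, goal, high, low, steps=100):
    """Schematic HDT inference: the high level selects a sub-goal and the
    low level conditions on it instead of on a return-to-go token."""
    for _ in range(steps):
        subgoal = high.predict(obs, goal)
        obs = obs + low.predict(obs, subgoal)
    return obs

print(hdt_rollout(np.zeros(2), np.ones(2), HighLevel(), LowLevel()))
```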
|
|
08:42-08:48, Paper MoAT17.3 | Add to My Program |
ProDMPs: A Unified Perspective on Dynamic and Probabilistic Movement Primitives |
|
Li, Ge | Karlsruhe Institute of Technology (KIT) |
Jin, Zeqi | Karlsruhe Institute of Technology |
Volpp, Michael | Karlsruhe Institute of Technology |
Otto, Fabian | Bosch Center for AI, University of Tuebingen |
Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Neumann, Gerhard | Karlsruhe Institute of Technology |
Keywords: Imitation Learning, Machine Learning for Robot Control
Abstract: Movement Primitives (MPs) are a well-known concept to represent and generate modular trajectories. MPs can be broadly categorized into two types: (a) dynamics-based approaches that generate smooth trajectories from any initial state, e.g., Dynamic Movement Primitives (DMPs), and (b) probabilistic approaches that capture higher-order statistics of the motion, e.g., Probabilistic Movement Primitives (ProMPs). To date, however, there is no MP method that unifies both, i.e., that can generate smooth trajectories from an arbitrary initial state while capturing higher-order statistics. In this paper, we introduce a unified perspective of both approaches by solving the ODE underlying the DMPs. We convert the expensive online numerical integration of DMPs into position and velocity basis functions that can be used to represent trajectories or trajectory distributions similarly to ProMPs while maintaining all the properties of dynamical systems. Since we inherit the properties of both methodologies, we call our proposed model Probabilistic Dynamic Movement Primitives (ProDMPs). Additionally, we embed ProDMPs in a deep neural network architecture and propose a new cost function for efficient end-to-end learning of higher-order trajectory statistics. To this end, we leverage Bayesian Aggregation for non-linear iterative conditioning on sensory inputs. Our proposed model achieves smooth trajectory generation, goal-attractor convergence, correlation analysis, non-linear conditioning, and online re-planning in one framework. Our code can be found at https://github.com/BruceGeLi/ProDMP_RAL
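The practical payoff of solving the DMP ODE is that a trajectory becomes linear in the weights plus initial-condition terms, so a Gaussian over weights induces a Gaussian over trajectories, as in ProMPs. The sketch below shows that linear structure with generic RBF features; the paper's basis functions are derived from the ODE solution, not assumed as here.

```python
import numpy as np

def rbf_basis(t, n_basis=10, width=0.07):
    """Normalized RBF features over time (a stand-in for the paper's
    ODE-derived position basis functions)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * ((t[:, None] - centers) / width) ** 2)
    return phi / phi.sum(axis=1, keepdims=True)

t = np.linspace(0.0, 1.0, 100)
Phi = rbf_basis(t)                      # (T, n_basis)
y0, dy0 = 0.2, 0.0                      # initial state enters linearly
c1, c2 = 1.0 - t, t * (1.0 - t)         # illustrative init-condition terms

mu_w = 0.1 * np.random.randn(10)        # weight distribution: mean ...
Sigma_w = 0.01 * np.eye(10)             # ... and covariance

mean_traj = c1 * y0 + c2 * dy0 + Phi @ mu_w
cov_traj = Phi @ Sigma_w @ Phi.T        # trajectory distribution, ProMP-style
```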
|
|
08:48-08:54, Paper MoAT17.4 | Add to My Program |
Imitation-Guided Multimodal Policy Generation from Behaviourally Diverse Demonstrations |
|
Zhu, Shibei | Aalto University |
Kaushik, Rituraj | Aalto University, Finland |
Kaski, Samuel | Aalto University, University of Manchester |
Kyrki, Ville | Aalto University |
Keywords: Evolutionary Robotics, Imitation Learning, Reinforcement Learning
Abstract: Learning policies from multiple demonstrators is often difficult because different individuals perform the same task differently due to hidden factors such as preferences. In the context of policy learning, this leads to multimodal policies. Existing policy learning methods often converge to a single solution mode, failing to capture the diversity in the solution space. In this paper, we introduce an imitation-guided reinforcement learning framework to solve the multimodal policy learning problem from a limited number of state-only demonstrations. We then propose LfBD (Learning from Behaviourally Diverse Demonstration), an algorithm that builds a parameterized solution space to capture the variability in the behaviour space defined by the demonstrations. To this end, we define a projection function, based on the state density distributions of the demonstrations, that constructs this space. Our goal is not only to learn how to solve the task as the human demonstrator does, but also to extrapolate beyond the provided demonstrations. In addition, we show that with our method we can perform a post-hoc policy search in the built solution space to recover policies that satisfy specific constraints, or to find a policy that matches a given (state-only) behaviour.
|
|
08:54-09:00, Paper MoAT17.5 | Add to My Program |
Model-Based Adversarial Imitation Learning from Demonstrations and Human Reward |
|
Huang, Jie | Ocean University of China |
Hao, Jiangshan | Ocean University of China |
Juan, Rongshun | Tianjin University |
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Nakamura, Keisuke | Honda Research Institute Japan Co., Ltd |
Li, Guangliang | Ocean University of China |
Keywords: Imitation Learning, Human-Robot Collaboration, Reinforcement Learning
Abstract: Reinforcement learning (RL) can potentially be applied to real-world robot control in complex and uncertain environments. However, it is difficult or even impractical to design an efficient reward function for various tasks, especially in large and high-dimensional environments. Generative adversarial imitation learning (GAIL), a general model-free imitation learning method, allows robots to directly learn policies from expert trajectories in large and high-dimensional environments. However, GAIL is still sample-inefficient in terms of environmental interaction. In this paper, to solve this problem, we propose model-based adversarial imitation learning from demonstrations and human reward (MAILDH), a novel model-based interactive imitation framework combining the advantages of GAIL, interactive RL, and model-based RL. We tested our method in eight physics-based discrete and continuous control tasks for RL. Our results show that MAILDH can greatly improve sample efficiency and robustness compared to the original GAIL.
|
|
09:00-09:06, Paper MoAT17.6 | Add to My Program |
Interpretable Motion Planner for Urban Driving Via Hierarchical Imitation Learning |
|
Wang, Bikun | Horizon Robotics |
Wang, Zhipeng | Horizon Robotics |
Zhu, Chenhao | Horizon Robotics |
Zhang, Zhiqiang | Horizon Robotics |
Wang, Zhichen | Horizon Robotics |
Lin, Penghong | Horizon Robotics |
Liu, Jingchu | Horizon Robotics |
Zhang, Qian | Horizon Robotics |
Keywords: Imitation Learning, Computer Vision for Automation, Task and Motion Planning
Abstract: Learning-based approaches have achieved remarkable performance in the domain of autonomous driving. Leveraging the impressive ability of neural networks and large amounts of human driving data, complex patterns and rules of driving behavior can be encoded as a model to benefit the autonomous driving system. In addition, an increasing number of data-driven works have studied the decision-making and motion planning modules. However, the reliability and stability of neural networks remain a source of uncertainty. In this paper, we introduce a hierarchical planning architecture, comprising a high-level grid-based behavior planner and a low-level trajectory planner, that is highly interpretable and controllable. While the high-level planner is responsible for finding a consistent route, the low-level planner generates a feasible trajectory. We evaluate our method in both closed-loop simulation and real-world driving, and demonstrate that the neural-network planner has outstanding performance in complex urban autonomous driving scenarios.
|
|
09:06-09:12, Paper MoAT17.7 | Add to My Program |
Hierarchical Imitation Learning for Stochastic Environments |
|
Igl, Maximilian | Waymo LLC |
Shah, Punit | Waymo |
Mougin, Paul | Waymo |
Srinivasan, Sirish | ETH Zürich |
Gupta, Tarun | University of Oxford |
White, Brandyn | Waymo |
Shiarlis, Kyriacos | Waymo |
Whiteson, Shimon | Waymo |
Keywords: Imitation Learning, Representation Learning, Deep Learning Methods
Abstract: Many applications of imitation learning require the agent to generate the full distribution of observed behaviour in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition the policy on types such as goals or personas that give rise to the multi-modal behaviour. However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors. Because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors that are under the agent's control are encoded in the type. Encoding future information about external factors leads to inappropriate agent reactions during testing, when the future is unknown and types must be drawn randomly.
|
|
09:12-09:18, Paper MoAT17.8 | Add to My Program |
Efficient Deep Learning of Robust, Adaptive Policies Using Tube MPC-Guided Data Augmentation |
|
Zhao, Tong | Massachusetts Institute of Technology |
Tagliabue, Andrea | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Imitation Learning, Machine Learning for Robot Control, Robust/Adaptive Control
Abstract: The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve impressive performance at the cost of heavy online onboard computations. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient Imitation Learning (IL) algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy to track trajectories under challenging disturbances on a multirotor. Evaluations in simulation show that a high-quality adaptive policy can be obtained in about 1.3 hours. We additionally empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a 6.1 cm average position error under wind disturbances that correspond to about 50% of the weight of the robot, and that are 36% larger than the maximum wind seen during training.
|
|
09:18-09:24, Paper MoAT17.9 | Add to My Program |
Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations |
|
Hao, Yilun | Stanford University |
Wang, Ruinan | Stanford University |
Cao, Zhangjie | Stanford University |
Wang, Zihan | Stanford University |
Cui, Yuchen | Stanford University |
Sadigh, Dorsa | Stanford University |
Keywords: Imitation Learning, Learning from Demonstration
Abstract: Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but also can change data distribution across environments. State over-specification leads to issues such as the learned policy not generalizing outside of the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Videos and supplemental details are at: https://tinyurl.com/masked-il
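MIL's central object is a binary mask that gates whole modalities before the policy head, learned in an outer loop while the policy trains in an inner loop. A minimal gating sketch (the sigmoid relaxation, sizes, and names are assumptions; the paper's bi-level optimization is not reproduced):

```python
import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    """Gates per-modality feature vectors with a learnable mask before
    the action head, so over-specified modalities can be switched off."""
    def __init__(self, modality_dims, action_dim):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(len(modality_dims)))
        self.head = nn.Linear(sum(modality_dims), action_dim)

    def forward(self, modality_feats):            # list of (B, d_i) tensors
        gates = torch.sigmoid(self.mask_logits)   # relaxed binary mask
        gated = [g * f for g, f in zip(gates, modality_feats)]
        return self.head(torch.cat(gated, dim=-1))

policy = MaskedPolicy([16, 8, 4], action_dim=7)   # e.g. RGB, depth, F/T
act = policy([torch.randn(2, 16), torch.randn(2, 8), torch.randn(2, 4)])
```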
|
|
09:24-09:30, Paper MoAT17.10 | Add to My Program |
Does Unpredictability Influence Driving Behavior? |
|
Samavi, Sepehr | University of Toronto |
Shkurti, Florian | University of Toronto |
Schoellig, Angela P. | TU Munich |
Keywords: Human-Aware Motion Planning, Imitation Learning
Abstract: In this paper we investigate the effect of the unpredictability of surrounding cars on an ego-car performing a driving maneuver. We use Maximum Entropy Inverse Reinforcement Learning to model reward functions for an ego-car conducting a lane change in a highway setting. We define a new feature based on the unpredictability of surrounding cars and use it in the reward function. We learn two reward functions from human data: a baseline and one that incorporates our defined unpredictability feature, then compare their performance with a quantitative and qualitative evaluation. Our evaluation demonstrates that incorporating the unpredictability feature leads to a better fit of human-generated test data. These results encourage further investigation of the effect of unpredictability on driving behavior.
|
|
09:30-09:36, Paper MoAT17.11 | Add to My Program |
From Temporal-Evolving to Spatial-Fixing: A Keypoints-Based Learning Paradigm for Visual Robotic Manipulation |
|
Riou, Kevin | Nantes University |
Dong, Kaiwen | China University of Mining and Technology, Xuzhou, 221116, China |
Subrin, Kévin | Université De Nantes / LS2N |
Sun, Yanjing | School of Information and Control Engineering, China University |
Le Callet, Patrick | Nantes University |
Keywords: Imitation Learning, Representation Learning, Sensorimotor Learning
Abstract: Current learning pipelines for robotic manipulation infer movement primitives sequentially along a temporally evolving axis, which can result in an accumulation of prediction errors and subsequently cause the visual observations to fall out of the training distribution. This paper proposes a novel hierarchical behavior cloning approach that dissociates the standard behaviour cloning (BC) pipeline into two stages. The intuition behind this approach is to eliminate accumulated errors using a fixed spatial representation. In the first stage, a high-level planner translates the initial observation of the scene into task-specific spatial waypoints. Then, a low-level robotic path planner takes over the task of guiding the robot by executing a set of pre-defined elementary movements or actions, known as primitives, with the goal of reaching the previously predicted waypoints. Our hierarchical keypoints-based paradigm aims to simplify the existing temporal-evolving approach: it directly spatializes the whole sequence of primitives as a set of 8D waypoints from the very first observation. Plentiful experiments demonstrate that our paradigm achieves comparable results to Reinforcement Learning (RL) and outperforms existing offline BC approaches, with only a single-shot inference from the initial observation. Code and models are available at: https://github.com/KevinRiou22/spatial-fixing-il
|
|
09:36-09:42, Paper MoAT17.12 | Add to My Program |
Disturbance Injection under Partial Automation: Robust Imitation Learning for Long-Horizon Tasks |
|
Tahara, Hirotaka | Nara Institute of Science and Technology |
Sasaki, Hikaru | Nara Institute of Science and Technology |
Oh, Hanbit | Nara Institute of Science and Technology |
Anarossi, Edgar | Nara Institute of Science and Technology |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Imitation Learning, Learning from Demonstration
Abstract: Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-switching operations can be replicated by imitation learning with high sample efficiency. To this end, this paper proposes Disturbance Injection under Partial Automation (DIPA) as a novel imitation learning framework. In DIPA, mode and actions (in the manual mode) are assumed to be observables in each state and are used to learn both action and mode-switching policies. The above learning is robustified by injecting disturbances into the operator's actions to optimize the disturbance's level for minimizing the covariate shift under PA. We experimentally validated the effectiveness of our method for long-horizon tasks in two simulations and a real robot environment and confirmed that our method outperformed the previous methods and reduced the demonstration burden.
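The injection step itself is simple: perturb the demonstrated manual-mode actions so the operator also demonstrates corrections, then pick the noise level that minimizes an estimate of covariate shift. A toy rendering (the shift criterion here is a placeholder, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_disturbance(expert_action, sigma):
    """Perturb a demonstrated manual-mode action so the expert also
    demonstrates recoveries from slightly off-nominal states."""
    return expert_action + rng.normal(0.0, sigma, expert_action.shape)

def pick_sigma(candidate_sigmas, shift_estimate):
    """Choose the injection level minimizing a covariate-shift estimate
    (placeholder callable; the paper optimizes this under PA)."""
    return min(candidate_sigmas, key=shift_estimate)

noisy = inject_disturbance(np.array([0.3, -0.1]), sigma=0.05)
best = pick_sigma([0.01, 0.05, 0.1], lambda s: (s - 0.05) ** 2)  # toy criterion
```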
|
|
09:42-09:48, Paper MoAT17.13 | Add to My Program |
Training Robots without Robots: Deep Imitation Learning for Master-To-Robot Policy Transfer |
|
Kim, Heecheol | The University of Tokyo |
Ohmura, Yoshiyuki | The University of Tokyo |
Nagakubo, Akihiko | National Institute of Advanced Industrial Science and Technology |
Kuniyoshi, Yasuo | The University of Tokyo |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Dual Arm Manipulation
Abstract: Deep imitation learning is promising for robot manipulation because it only requires demonstration samples. In this study, deep imitation learning is applied to tasks that require force feedback. However, existing demonstration methods have deficiencies: bilateral teleoperation requires a complex control scheme and is expensive, and kinesthetic teaching suffers from visual distractions caused by human intervention. This research proposes a new master-to-robot (M2R) policy transfer system that does not require a robot in order to teach force feedback-based manipulation tasks to robots. The human directly demonstrates a task using a controller that resembles the kinematic parameters of the robot arm and uses the same end-effector with force/torque (F/T) sensors to measure the force feedback. Using this controller, the operator can feel force feedback without a bilateral system. The proposed method can overcome domain gaps between the master and the robot using gaze-based imitation learning and a simple calibration method. Furthermore, a Transformer is applied to infer the policy from F/T sensory input. The proposed system was evaluated on a bottle-cap-opening task that requires force feedback.
|
|
09:48-09:54, Paper MoAT17.14 | Add to My Program |
Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators |
|
Sedlar, Jiri | Czech Technical University |
Stepanova, Karla | Czech Technical University |
Skoviera, Radoslav | Czech Institute of Informatics, Robotics, and Cybernetics; Czech |
Behrens, Jan Kristof | Czech Technical University in Prague, CIIRC |
Tuna, Matúš | Comenius University in Bratislava |
Sejnova, Gabriela | Czech Technical University in Prague |
Sivic, Josef | Czech Technical University |
Babuska, Robert | Delft University of Technology |
Keywords: Imitation Learning, Object Detection, Segmentation and Categorization, Computer Vision for Manufacturing
Abstract: This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite the significant progress of 6D pose estimation methods, their performance is usually limited for heavily occluded objects, which is a common case in imitation learning, where the object is typically partially occluded by the manipulating hand. Currently, there is a lack of datasets that would enable the development of robust 6D pose estimation methods for these conditions. To overcome this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of nine different tools and twelve manipulation tasks with two camera viewpoints, four human subjects, and left/right hand. Each image is accompanied by an accurate ground truth measurement of the 6D object pose obtained by the HTC Vive motion tracking device. The use of the dataset is demonstrated by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups. The dataset and code are publicly available at http://imitrob.ciirc.cvut.cz/imitrobdataset.php.
|
|
MoAT18 Regular session, 331ABC |
Add to My Program |
Calibration and Identification |
|
|
Chair: Leutenegger, Stefan | Technical University of Munich |
Co-Chair: Lee, Dongjun | Seoul National University |
|
08:30-08:36, Paper MoAT18.1 | Add to My Program |
Accurate and Interactive Visual-Inertial Sensor Calibration with Next-Best-View and Next-Best-Trajectory Suggestion |
|
Choi, Christopher | Imperial College London |
Xu, Binbin | University of Toronto |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Calibration and Identification, Visual-Inertial SLAM, SLAM
Abstract: Visual-Inertial (VI) sensors are popular in robotics, self-driving vehicles, and augmented and virtual reality applications. In order to use them for any computer vision or state-estimation task, a good calibration is essential. However, collecting informative calibration data in order to render the calibration parameters observable is not trivial for a non-expert. In this work, we introduce a novel VI calibration pipeline that guides a non-expert with the use of a graphical user interface and information theory in collecting informative calibration data with Next-Best-View and Next-Best-Trajectory suggestions to calibrate the intrinsics, extrinsics, and temporal misalignment of a VI sensor. We show through experiments that our method is faster, more accurate, and more consistent than state-of-the-art alternatives. Specifically, we show how calibrations with our proposed method achieve higher accuracy estimation results when used by state-of-the-art VI Odometry as well as VI-SLAM approaches.
|
|
08:36-08:42, Paper MoAT18.2 | Add to My Program |
A ROS-Based Kinematic Calibration Tool for Serial Robots |
|
Pascal, Caroline | ENSTA Paris |
Doaré, Olivier | UME ENSTA Paris |
Chapoutot, Alexandre | ENSTA Paris |
Keywords: Calibration and Identification, Software-Hardware Integration for Robot Systems, Kinematics
Abstract: The use of serial robots for industrial and research purposes is often limited by flawed positioning accuracy, caused by differences between the robot's nominal model and the real one. This issue can be solved by means of kinematic calibration, which is usually a tedious and intricate task. In this paper, we propose a complete kinematic calibration procedure relying on established geometric modeling, measurement design, and parameter identification methods, as well as multiple integration tools, to provide high adaptability and simplified handling. The overall process is bundled into a ROS-based, modular, and user-friendly package, whose main objective is to offer a smooth and fully integrated framework for the kinematic calibration of serial robots. Our solution was successfully tested using a motion tracking device, and increased the overall positioning accuracy of two different serial robots by 75% in a matter of hours.
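Kinematic calibration of this kind typically iterates a linearized least-squares identification: measured minus predicted end-effector positions are regressed onto an identification Jacobian to correct the parameters. A generic Gauss-Newton sketch on a toy two-link arm follows; this is not the package's API.

```python
import numpy as np

def calibrate(params, measured, configs, fk, jac, iters=10):
    """Generic linearized least-squares identification.
    fk(q, p): predicted end-effector position; jac(q, p): d fk / d p."""
    for _ in range(iters):
        r = np.concatenate([x - fk(q, params) for q, x in zip(configs, measured)])
        J = np.vstack([jac(q, params) for q in configs])
        params = params + np.linalg.lstsq(J, r, rcond=None)[0]
    return params

# Toy 2-link planar arm: identify the two link lengths from positions.
def fk(q, p):
    return np.array([p[0] * np.cos(q[0]) + p[1] * np.cos(q[0] + q[1]),
                     p[0] * np.sin(q[0]) + p[1] * np.sin(q[0] + q[1])])

def jac(q, p):
    return np.array([[np.cos(q[0]), np.cos(q[0] + q[1])],
                     [np.sin(q[0]), np.sin(q[0] + q[1])]])

true_p = np.array([0.30, 0.25])
configs = [np.array([a, b]) for a in (0.1, 0.7, 1.3) for b in (0.2, 0.9)]
measured = [fk(q, true_p) for q in configs]
print(calibrate(np.array([0.28, 0.27]), measured, configs, fk, jac))
```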
|
|
08:42-08:48, Paper MoAT18.3 | Add to My Program |
FUSE-D: Framework for UAV System-Parameter Estimation with Disturbance Detection |
|
Böhm, Christoph | University Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Calibration and Identification, Force and Tactile Sensing, Autonomous Vehicle Navigation
Abstract: Modern unmanned aerial vehicles (UAVs) with sophisticated mechanics call for extended online system identification to aid model-based control in task execution. In addition, UAVs in adverse environmental conditions require a more detailed understanding of environmental disturbances. The combination of online system identification, sensor-suite self-calibration, and external disturbance analysis needed to tackle these issues holistically is currently an open problem. Our proposed FUSE-D approach combines these elements based on a system model at the rotor-speed level and a single global pose sensor (e.g., a tracking system like OptiTrack). Besides sensor intrinsics and extrinsics, the framework allows estimating the UAV's rotor geometry, mass, moments of inertia, and the rotors' aerodynamic properties, as well as an external force and where it acts on the UAV. The general formulation allows us to extend the approach to an N-rotor (multi-rotor) UAV and to classify the type of external disturbance. We perform a detailed non-linear observability analysis for the 43 + 7N states and carry out a statistically relevant embedded hardware-in-the-loop performance analysis in the realistic simulation environment Gazebo with RotorS.
|
|
08:48-08:54, Paper MoAT18.4 | Add to My Program |
Multiplanar Self-Calibration for Mobile Cobot 3D Object Manipulation Using 2D Detectors and Depth Estimation |
|
Dang, Tuan | University of Texas at Arlington |
Nguyen, Khang | University of Texas at Arlington |
Huber, Manfred | University of Texas at Arlington |
Keywords: AI-Enabled Robotics, Human-Robot Collaboration, Software Architecture for Robotic and Automation
Abstract: Calibration is the first and foremost step in dealing with the sensor displacement errors that can appear during extended operation and off-time periods; it is required to enable precise robot object manipulation. In this paper, we present a novel multiplanar self-calibration between the camera system and the robot's end-effector for 3D object manipulation. Our approach first takes the robot end-effector as ground truth to calibrate the camera's position and orientation: while the robot arm moves the object through multiple planes in 3D space, a state-of-the-art 2D vision detector identifies the object's center in image coordinates. The transformation between world coordinates and image coordinates is then computed using 2D pixels from the detector and 3D points known from robot kinematics. Next, an integrated stereo-vision system estimates the distance between the camera and the object, resulting in 3D object localization. We test our proposed method on the Baxter robot with two 7-DOF arms and a 2D detector that can run in real time on an onboard GPU. After self-calibrating, our robot can localize objects in 3D using an RGB camera and depth image. The source code is available at https://github.com/tuantdang/calib_cobot.
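The core step pairs 2D detections with 3D points known from robot kinematics and solves for the camera pose, which is a standard PnP problem. A sketch using OpenCV, with fabricated toy correspondences standing in for the detector output and the multi-plane data collection:

```python
import numpy as np
import cv2

# 3D object-center positions from robot kinematics (robot frame) paired
# with 2D detections from the vision detector (pixels); toy values.
pts3d = np.array([[0.4, 0.0, 0.1], [0.4, 0.2, 0.1],
                  [0.5, 0.0, 0.3], [0.5, 0.2, 0.3],
                  [0.6, 0.1, 0.5], [0.6, -0.1, 0.5]], dtype=np.float64)
pts2d = np.array([[320, 240], [400, 238], [318, 180],
                  [402, 182], [360, 120], [280, 122]], dtype=np.float64)

K = np.array([[600, 0, 320], [0, 600, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)                      # assume an undistorted camera

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, dist)
R, _ = cv2.Rodrigues(rvec)              # camera pose w.r.t. the robot frame
```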
|
|
08:54-09:00, Paper MoAT18.5 | Add to My Program |
Labelling Lightweight Robot Energy Consumption: A Mechatronics-Based Benchmarking Metric Set |
|
Heredia, Juan | University of Southern Denmark |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Abdolshah, Saeed | Technical University of Munich |
Schlette, Christian | University of Southern Denmark (SDU) |
Haddadin, Sami | Technical University of Munich |
Kjærgaard, Mikkel | University of Southern Denmark |
Keywords: Performance Evaluation and Benchmarking, Energy and Environment-Aware Automation, Actuation and Joint Mechanisms
Abstract: Compliance with global guidelines for sustainable and responsible production in modern industry requires a comparative analysis of consumer devices' energy consumption (EC). This also holds true for the newly established generation of lightweight industrial robots (LIRs). To identify potential strategies for energy optimization, standardized benchmarking procedures are required. However, to the best of the authors' knowledge, there is currently no standardized method for benchmarking the EC of manipulators. In response to this need, we have developed a comprehensive benchmarking framework to evaluate the EC of various LIR designs, delving into the theoretical power consumption under both static and dynamic conditions. Our analysis has led to the proposal of seven metrics: three static and four dynamic. The static metrics (controller consumption, joint electronics consumption, and mechanical brake consumption) evaluate the robot's maintenance EC. Three of the dynamic metrics gauge the system's energy efficiency during motion, with or without payload; the fourth extends the metric set with a cost-of-transport map for manipulators. For each of the metrics, we suggest a standardized measurement procedure based on state-of-the-art norms and literature. The metric set and experimental procedures are demonstrated using five manipulators (UR3e, UR5e, FR3, M0609, Gen3). Among the results, we see interesting trends for the future optimization of electronic components and their architecture, e.g., reducing a robot's EC by decentralizing computation via low-consumption onboard controllers for basic tasks and external servers for complex ones.
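The cost-of-transport idea borrowed here from locomotion gives a compact dynamic metric: power drawn per unit payload weight per unit speed. A direct rendering of the usual definition (mapping it onto the paper's exact metric is an assumption):

```python
def cost_of_transport(power_w, payload_kg, speed_m_s, g=9.81):
    """Dimensionless CoT = P / (m * g * v); lower is more efficient."""
    return power_w / (payload_kg * g * speed_m_s)

# Example: 150 W drawn while moving a 5 kg payload at 0.5 m/s
print(cost_of_transport(150.0, 5.0, 0.5))   # ~6.1
```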
|
|
09:00-09:06, Paper MoAT18.6 | Add to My Program |
The Role of Absolute Positioning Error in Hand-Eye Calibration and Robotic Guidance Systems: An Analysis |
|
Chalus, Michal | University of West Bohemia |
Vanicek, Ondrej | University of West Bohemia |
Liska, Jindrich | University of West Bohemia |
Keywords: Calibration and Identification, Computer Vision for Manufacturing, Industrial Robots
Abstract: Robotic manipulators deal with serious issues due to their absolute positioning error. This error is usually compensated for by an operator in classical robot programming using the teach-and-play method. However, it has a significant effect on the accuracy of robotic guidance systems (RGSs) that automatically generate a process-tool trajectory based on measured sensor data. In this paper, we first describe the various components of an RGS that affect its overall accuracy. We then introduce a proposed model for the calibration process (MCP) that can be used to analyze the effect of absolute positioning errors on the accuracy of hand-eye calibration, six-point calibration of a process tool, and the mutual transformation between these tools. Simulations were used to evaluate the proposed MCP model. The results of this analysis are crucial for the practical use of RGSs.
|
|
09:06-09:12, Paper MoAT18.7 | Add to My Program |
Robotic Kinematic Calibration with Only Position Data and Consideration of Non-Geometric Errors Using POE-Based Model and Gaussian Mixture Models |
|
Luo, Xiao | The Chinese University of Hong Kong |
Xian, Yitian | The Chinese University of Hong Kong |
Lei, Man Cheong | The Chinese University of Hong Kong |
Li, Jian | The Chinese University of Hong Kong |
Xie, Ke | The Chinese University of Hong Kong |
Zou, Limin | The Chinese University of Hong Kong |
Li, Zheng | The Chinese University of Hong Kong |
Keywords: Calibration and Identification, Kinematics, Probability and Statistical Methods
Abstract: Kinematic calibration is crucial to improve the positioning accuracy of serial robots. This paper proposes a novel algorithm for robotic kinematic calibration based on an augmented product of exponentials (POE)-based kinematic model using Gaussian mixture models (GMMs) with only position data. In this algorithm, non-geometric errors that cannot be fitted by varying the parameters of the traditional robot model are also considered and compensated. The approach involves a three-stage calibration process that identifies the kinematic model parameters and trains the GMMs. Finally, the algorithm is applied to two serial robots for simulation and experimental validation. The effectiveness of the proposed algorithm is verified by both sets of results, and a significant improvement in error reduction, from 26% to 96%, is observed in comparison with other existing approaches.
|
|
09:12-09:18, Paper MoAT18.8 | Add to My Program |
MOISST: Multimodal Optimization of Implicit Scene for SpatioTemporal Calibration |
|
Herau, Quentin | Huawei, University of Burgundy |
Piasco, Nathan | Huawei Technologies France |
Bennehar, Moussab | LIRMM - UMR 5506 |
Roldao, Luis | Huawei |
Tsishkou, Dzmitry | Huawei Technologies |
Migniot, Cyrille | U Bourgogne |
Vasseur, Pascal | Université De Picardie Jules Verne |
Demonceaux, Cédric | Université De Bourgogne |
Keywords: Sensor Fusion, Calibration and Identification, Computer Vision for Transportation
Abstract: With the recent advances in autonomous driving and the decreasing cost of LiDARs, the use of multimodal sensor systems is on the rise. However, in order to make use of the information provided by a variety of complementary sensors, it is necessary to accurately calibrate them. We take advantage of recent advances in computer graphics and implicit volumetric scene representation to tackle the problem of multi-sensor spatial and temporal calibration. Thanks to a new formulation of the Neural Radiance Field (NeRF) optimization, we are able to jointly optimize calibration parameters along with scene representation based on radiometric and geometric measurements. Our method enables accurate and robust calibration from data captured in uncontrolled and unstructured urban environments, making our solution more scalable than existing calibration solutions. We demonstrate the accuracy and robustness of our method in urban scenes typically encountered in autonomous driving scenarios.
|
|
09:18-09:24, Paper MoAT18.9 | Add to My Program |
Automatic Spatial Radar Camera Calibration Via Geometric Constraints with Doppler-Optical Flow Fusion |
|
Ge, Jintian | Nanyang Technological University |
Yanxin, Zhou | Nanyang Technological University |
Lou, Baichuan | Nanyang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Calibration and Identification, Sensor Fusion, Computer Vision for Automation
Abstract: Many intelligent robots use a combination of radar and camera sensors to capture environmental information. Robust and accurate perception relies heavily on the result of multi-sensor calibration. Most current spatial calibration methods require a calibration board or a special marker as the target. In this paper, we provide a novel calibration method for an RGBD camera and a millimeter-wave radar that automatically estimates the extrinsic parameters. Our proposed method comprises two stages: rough extrinsic parameters are first estimated using object contours as geometric constraints; the optimum is then reached by optimizing over the difference between the velocities obtained from the camera and the radar. The method only needs an object moving past the sensors and does not require a calibration board. We validate our method through simulation and real-world experiments. We construct a simulation environment in CARLA to verify the performance of our proposed method across different angles. Furthermore, different levels of zero-mean Gaussian noise are added to evaluate the stability of our method. In addition, real-world experiments with different hardware setups are conducted to verify the feasibility of our method under real-world conditions.
|
|
09:24-09:30, Paper MoAT18.10 | Add to My Program |
Extrinsic Calibration of Camera to LIDAR Using a Differentiable Checkerboard Model |
|
Fu, Lanke Frank Tarimo | University of Oxford |
Chebrolu, Nived | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Calibration and Identification
Abstract: Multi-modal sensing often involves determining correspondences between each domain’s signals, which in turn depends on the accurate extrinsic calibration of the sensors. Challengingly, the camera-LIDAR sensor modalities are quite dissimilar and the narrow field of view of most commercial LIDARs means that they observe only a partial view of the camera frustum. We present a framework for extrinsic calibration of a camera and a LIDAR using only a simple off-the-shelf checkerboard. It is designed to operate even when the LIDAR observes a significantly truncated portion of the checkerboard. Current state-of-the-art methods often require bespoke manufactured markers or full observation of the entire checkerboard in both camera and LIDAR data which is prohibitive. By contrast, our novel algorithm directly aligns the LIDAR intensity pattern to the camera-detected checkerboard pattern using our differentiable formulation. The key step for achieving accurate extrinsics estimation is the use of the spatial derivatives provided by the differentiable checkerboard pattern, and jointly optimizing over all views. In our experiments, we achieve calibration accuracy in the order of 2-4 mm and demonstrate a 30% error reduction compared to state-of-the-art approaches. We are able to achieve this improvement while using only partial LIDAR views of the checkerboard which allows for a simpler data capture process. We also demonstrate the generalizability of our approach to different combinations of LIDARs and cameras with varying sparsity patterns and noise levels.
|
|
09:30-09:36, Paper MoAT18.11 | Add to My Program |
Graph-Based Visual-Kinematic Fusion and Monte Carlo Initialization for Fast-Deployable Cable-Driven Robots |
|
Khorrambakht, Rooholla | New York University |
Damirchi, Hamed | University of Adelaide |
Dindarloo, Mohammad Reza | K. N. Toosi University of Technology |
Saki, Aria | K. N. Toosi University of Technology |
Khalilpour, S. Ahmad | K. N. Toosi University of Technology |
Taghirad, Hamid | K. N. Toosi University of Technology |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Parallel Robots, Calibration and Identification, Sensor Fusion
Abstract: Ease of calibration and high-accuracy task-space state-estimation purely based on onboard sensors is a key requirement for enabling easily deployable cable robots in real-world applications. In this work, we incorporate the onboard camera and kinematic sensors to drive a statistical fusion framework that presents a unified localization and calibration system which requires no initial values for the kinematic parameters. This is achieved by formulating a Monte-Carlo algorithm that initializes a factor-graph representation of the calibration and localization problem. With this, we are able to jointly identify both the kinematic parameters and the visual odometry scale alongside their corresponding uncertainties. We demonstrate the practical applicability of the framework using our state-estimation dataset recorded with the ARAS-CAM suspended cable driven parallel robot, and published as part of this manuscript.
|
|
09:36-09:42, Paper MoAT18.12 | Add to My Program |
P2O-Calib: Camera-LiDAR Calibration Using Point-Pair Spatial Occlusion Relationship |
|
Wang, Su | Robert Bosch |
Zhang, Shini | Nanyang Technological University, Singapore |
Qiu, Xuchong | Bosch |
Keywords: Calibration and Identification, Sensor Fusion, Deep Learning Methods
Abstract: The accurate and robust calibration result of sensors is considered as an important building block to the follow-up research in the autonomous driving and robotics domain. The current works involving extrinsic calibration between 3D LiDARs and monocular cameras mainly focus on target-based and target-less methods. The target-based methods are often utilized offline because of restrictions, such as additional target design and target placement limits. The current target-less methods suffer from feature indeterminacy and feature mismatching in various environments. To alleviate these limitations, we propose a novel target-less calibration approach that is based on the 2D-3D edge point extraction using the occlusion relationship in 3D space. Based on the extracted 2D-3D point pairs, we further propose an occlusion-guided point-matching method that improves the calibration accuracy and reduces computation costs. To validate the effectiveness of our approach, we evaluate the method performance qualitatively and quantitatively on real images from the KITTI dataset. The results demonstrate that our method outperforms the existing target-less methods and achieves low error and high robustness that can contribute to the practical applications relying on high-quality Camera-LiDAR calibration.
|
|
09:42-09:48, Paper MoAT18.13 | Add to My Program |
Wrench Estimation of Modular Manipulator with External Actuation and Joint Locking |
|
Kim, Yonghyeok | Seoul National University |
Lee, Hasun | Seoul National University |
Lee, Jeongseob | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Aerial Systems: Mechanics and Control, Distributed Robot Systems, Force Control
Abstract: This paper proposes an external wrench estimation method for modular manipulator, where each link module is driven with external actuation (e.g., rotors, thrusters) and inter-module joints can be locked to increase end-effector stiffness or workforce of the manipulator. For such systems, the commonly-used momentum-based observer (MBO) is not suitable due to the presence of unknown joint locking (JL) torque and also the degeneracy of Jacobian transpose relation with the system degree-of-freedom (DOF) becoming less than six with the joint locking. To overcome this, we propose two novel external wrench estimation algorithms: distributed algorithm based on recursive Newton-Euler dynamics and centralized algorithm based on D'Alembert's principle, both using an F/T (force/torque) sensor at the base. Experiments are conducted to demonstrate the effectiveness of the proposed algorithms.
|
|
09:48-09:54, Paper MoAT18.14 | Add to My Program |
Observability-Aware Online Multi-Lidar Extrinsic Calibration |
|
Das, Sandipan | KTH |
af Klinteberg, Ludvig | Scania |
Fallon, Maurice | University of Oxford |
Chatterjee, Saikat | KTH Royal Institute of Technology |
Keywords: Calibration and Identification, Intelligent Transportation Systems, Localization
Abstract: Accurate and robust extrinsic calibration is necessary for deploying autonomous systems that need multiple sensors for perception. In this paper, we present a robust system for real-time extrinsic calibration of multiple lidars in the vehicle base frame without the need for any fiducial markers or features. We base our approach on matching absolute GNSS and estimated lidar poses in real time. Comparing rotation components allows us to achieve a more robust solution than the traditional least-squares approach, which compares translation components only. Additionally, instead of comparing all corresponding poses, we select the poses carrying maximum mutual information according to our novel observability criteria. This allows us to identify a subset of the poses helpful for real-time calibration. We also provide stopping criteria to ensure calibration completion. To validate our approach, extensive tests were carried out on data collected using Scania test vehicles (7 sequences, for a total of ~6.5 km). The results presented in this paper show that our approach is able to accurately determine the extrinsic calibration for various combinations of sensor setups.
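Matching lidar-odometry rotations against GNSS rotations reduces, per sensor, to finding the fixed rotation that best aligns paired direction observations, for which the closed-form SVD (Kabsch/Wahba) solution is standard. A sketch of that building block (the observability-based pose selection is not shown):

```python
import numpy as np

def align_rotations(dirs_a, dirs_b):
    """Closed-form rotation R minimizing ||R @ a_i - b_i|| over paired
    unit vectors (Kabsch / Wahba problem). Rows are correspondences."""
    H = dirs_a.T @ dirs_b
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T

a = np.eye(3)                                   # toy lidar-frame directions
true_R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
b = (true_R @ a.T).T                            # same directions, vehicle frame
print(np.allclose(align_rotations(a, b), true_R))   # True
```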
|
|
MoAT19 Regular session, 360 Ambassador Ballroom |
Add to My Program |
Deep Learning Methods I |
|
|
Chair: Arnold, Solvi | Shinshu University |
Co-Chair: Ben Amor, Heni | Arizona State University |
|
08:30-08:36, Paper MoAT19.1 | Add to My Program |
Recognising Affordances in Predicted Futures to Plan with Consideration of Non-Canonical Affordance Effects |
|
Arnold, Solvi | Shinshu University |
Kuroishi, Mami | EPSON AVASYS |
Karashima, Rin | EPSON AVASYS |
Adachi, Tadashi | EPSON AVASYS |
Yamazaki, Kimitoshi | Shinshu University |
Keywords: Deep Learning Methods, Task and Motion Planning, Neurorobotics
Abstract: We propose a novel system for action sequence planning based on a combination of affordance recognition and a neural forward model predicting the effects of affordance execution. By performing affordance recognition on predicted futures, we avoid reliance on explicit affordance effect definitions for multi-step planning. Because the system learns affordance effects from experience data, the system can foresee not just the canonical effects of an affordance, but also situation-specific side-effects. This allows the system to avoid planning failures due to such non-canonical effects, and makes it possible to exploit non-canonical effects for realising a given goal. We evaluate the system in simulation, on a set of test tasks that require consideration of canonical and non-canonical affordance effects.
|
|
08:36-08:42, Paper MoAT19.2 | Add to My Program |
AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness |
|
Yang, Yizhuo | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Cao, Muqing | Nanyang Technological University |
Yang, Jianfei | Nanyang Technological University |
Xie, Lihua | Nanyang Technological University |
Keywords: Deep Learning Methods, Sensor Fusion, Human Detection and Tracking
Abstract: In this study, we introduce AV-PedAware, a self-supervised audio-visual fusion system designed to improve dynamic pedestrian awareness for robotics applications. Pedestrian awareness is a critical requirement in many robotics applications. However, traditional approaches that rely on cameras and LIDARs to cover multiple views can be expensive and susceptible to issues such as changes in illumination, occlusion, and weather conditions. Our proposed solution replicates human perception for 3D pedestrian detection using low-cost audio and visual fusion. This study represents the first attempt to employ audio-visual fusion to monitor footstep sounds for the purpose of predicting the movements of pedestrians in the vicinity. The system is trained through self-supervised learning based on LIDAR-generated labels, making it a cost-effective alternative to LIDAR-based pedestrian awareness. AV-PedAware achieves comparable results to LIDAR-based systems at a fraction of the cost. By utilizing an attention mechanism, it can handle dynamic lighting and occlusions, overcoming the limitations of traditional LIDAR and camera-based systems. To evaluate our approach's effectiveness, we collected a new multimodal pedestrian detection dataset and conducted experiments that demonstrate the system's ability to provide reliable 3D detection results using only audio and visual data, even in extreme visual conditions. We will make our collected dataset and source code available online for the community to encourage further development in the field of robotics perception systems.
|
|
08:42-08:48, Paper MoAT19.3 | Add to My Program |
A Multitask and Kernel Approach for Learning to Push Objects with a Target-Parameterized Deep Q-Network |
|
Ewerton, Marco | Idiap Research Institute |
Villamizar, Michael | IDIAP |
Jankowski, Julius | Idiap Research Institute and EPFL |
Calinon, Sylvain | Idiap Research Institute |
Odobez, Jean-Marc | IDIAP |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: Pushing is an essential motor skill involved in several manipulation tasks, and has been an important research topic in robotics. Recent works have shown that Deep Q-Networks (DQNs) can learn pushing policies (when, where, and how to push) to solve manipulation tasks, potentially in synergy with other skills (e.g. grasping). Nevertheless, DQNs often assume a fixed setting and task, which may limit their deployment in practice. Furthermore, they suffer from sparse-gradient backpropagation when the action space is very large, a problem exacerbated by the fact that they are trained to predict state-action values based on a single reward function aggregating several facets of the task, rendering model training challenging. To address these issues, we propose a multi-head target-parameterized DQN to learn robotic manipulation tasks, in particular pushing policies, and make the following contributions: i) we show that learning to predict different reward and task aspects can be beneficial compared to predicting a single value function where reward factors are not disentangled; ii) we study several alternatives to generalize a policy by encoding the target parameters either into the network layers or visually in the input; iii) we propose a kernelized version of the loss function, allowing us to obtain better, faster, and more stable training performance. Extensive experiments in simulation validate our design choices, and we show that our architecture learned on simulated data can achieve high performance in a real-robot setup involving a Franka Emika robot arm and unseen objects.
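The factored value prediction in contribution i) can be pictured with a small sketch: a shared trunk feeds one Q-map head per reward facet, and the heads are combined only when an action must be selected. Head count, layer sizes, and the sum aggregation are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiHeadPushDQN(nn.Module):
    """Per-pixel push-action values, factored into separate reward heads."""
    def __init__(self, in_ch=4, n_heads=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # one Q-map per reward facet instead of a single aggregated value
        self.heads = nn.ModuleList([nn.Conv2d(64, 1, 1) for _ in range(n_heads)])

    def forward(self, obs):
        z = self.trunk(obs)
        q_factors = torch.cat([h(z) for h in self.heads], dim=1)  # (B, n_heads, H, W)
        return q_factors, q_factors.sum(dim=1)  # factored and combined Q-maps
```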
|
|
08:48-08:54, Paper MoAT19.4 | Add to My Program |
DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching |
|
Huang, Ranran | Meituan |
Cai, Jiancheng | Meituan |
Li, Chao | Beijing University of Posts and Telecommunications |
Wu, Zhuoyuan | Meituan |
Liu, Xinmin | Meituan |
Chai, Zhenhua | Meituan |
Keywords: Deep Learning Methods, Visual Learning, Deep Learning for Visual Perception
Abstract: The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation-invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF), which imposes rotations on the convolution kernels to improve the inherent rotation robustness of CNNs. Since RKF can be collapsed by subsequent re-parameterization, no extra computational cost is introduced in the inference stage. Moreover, we present Multi-oriented Feature Aggregation (MOFA), which aggregates features extracted from multiple rotated versions of the input image and can provide auxiliary knowledge for the training of RKF by leveraging a distillation strategy. We refer to the distilled RKF model as DRKF. Besides the evaluation on a rotation-augmented version of the public HPatches dataset, we also contribute a new dataset named DiverseBEV, collected during drone flights, which consists of bird's-eye-view images with large viewpoint changes and camera rotations. Extensive experiments show that our method can outperform other state-of-the-art techniques when exposed to large rotation variations.
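Because convolution is linear in its kernel, a bank of rotated copies of one kernel can be re-parameterized into a single kernel after training, which is why RKF adds no inference cost. The sketch below restricts rotations to 90-degree steps via torch.rot90; the paper's exact rotation set and fusion scheme are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotatedKernelFusion(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.05)

    def fused_kernel(self):
        # the sum of the four rotated branches collapses into one kernel
        return sum(torch.rot90(self.weight, r, dims=(2, 3)) for r in range(4))

    def forward(self, x):
        return F.conv2d(x, self.fused_kernel(), padding=1)
```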
|
|
08:54-09:00, Paper MoAT19.5 | Add to My Program |
Efficient Q-Learning Over Visit Frequency Maps for Multi-Agent Exploration of Unknown Environments |
|
Chen, Xuyang | Cognitive Robot Autonomy and Learning Lab |
Iyer, Ashvin | Purdue University |
Wang, Zixing | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning Methods, Reinforcement Learning, Multi-Robot Systems
Abstract: The robot exploration task has been widely studied, with applications spanning from novel environment mapping to item delivery. For some time-critical tasks, such as search and rescue after a catastrophe, the agent is required to explore as efficiently as possible. Recently, the Visit Frequency Map (VFM) representation achieved great success in such scenarios by discouraging repetitive visits with a frequency-based penalty. However, its relatively large size and single-agent setting hinder its further development. In this context, we propose the Integrated Visit Frequency Map, which encodes the same information as the VFM in a more compact form, and a visit-frequency-based multi-agent information exchange and control scheme that can accommodate both representations. Tests in diverse settings indicate that our proposed methods achieve a level of performance comparable to the VFM with lower bandwidth requirements and generalize well to different multi-agent setups, including real-world environments.
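The core of a visit-frequency representation can be stated in a few lines: a counter per cell, with a reward penalty that grows on each revisit. A minimal sketch, with the penalty scale chosen arbitrarily:

```python
import numpy as np

class VisitFrequencyMap:
    def __init__(self, shape, penalty=0.1):
        self.counts = np.zeros(shape, dtype=np.int32)
        self.penalty = penalty

    def step(self, cell):
        """Register a visit and return the reward-shaping term."""
        self.counts[cell] += 1
        # the first visit is free; every revisit costs a growing penalty
        return -self.penalty * (self.counts[cell] - 1)

vfm = VisitFrequencyMap((64, 64))
print(vfm.step((3, 5)), vfm.step((3, 5)))  # prints -0.0 -0.1
```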
|
|
09:00-09:06, Paper MoAT19.6 | Add to My Program |
Real-Time Trajectory-Based Social Group Detection |
|
Jahangard, Simindokht | Monash University |
Hayat, Munawar | Monash University |
Rezatofighi, Hamid | Monash University |
Keywords: Deep Learning Methods, Human and Humanoid Motion Analysis and Synthesis, Human Detection and Tracking
Abstract: Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDB-Act dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.
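A sketch of the graph construction step described above: each track is embedded with an LSTM, and edges connect tracks whose latest positions are close. The 2 m threshold and the use of only the final LSTM state are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_social_graph(trajectories, encoder, dist_thresh=2.0):
    """trajectories: (N, T, 2) tracks; returns node features and an edge index."""
    _, (h, _) = encoder(trajectories)      # h: (num_layers, N, hidden)
    nodes = h[-1]                          # one embedding per track
    last_pos = trajectories[:, -1, :]      # (N, 2) latest positions
    d = torch.cdist(last_pos, last_pos)    # pairwise Euclidean distances
    src, dst = torch.nonzero(d < dist_thresh, as_tuple=True)
    keep = src != dst                      # drop self-loops
    return nodes, torch.stack([src[keep], dst[keep]])

encoder = nn.LSTM(input_size=2, hidden_size=32, batch_first=True)
nodes, edges = build_social_graph(torch.randn(6, 12, 2), encoder)
```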
|
|
09:06-09:12, Paper MoAT19.7 | Add to My Program |
Point2Point: A Framework for Efficient Deep Learning on Hilbert Sorted Point Clouds with Applications in Spatio-Temporal Occupancy Prediction |
|
Pandhare, Athrva Atul | University of Pennsylvania |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, Mapping
Abstract: The irregularity and permutation invariance of point cloud data pose challenges for effective learning. Conventional methods for addressing this issue involve converting raw point clouds to intermediate representations such as 3D voxel grids or range images. While such intermediate representations solve the problem of permutation invariance, they can result in significant loss of information. Approaches that do learn on raw point clouds either have trouble resolving neighborhood relationships between points or are overly complicated in their formulation. In this paper, we propose a novel approach that represents point clouds as a locality-preserving 1D ordering induced by the Hilbert space-filling curve. We also introduce Point2Point, a neural architecture that can effectively learn on Hilbert-sorted point clouds. We show that Point2Point achieves competitive performance on point cloud segmentation and generation tasks. Finally, we show the performance of Point2Point on spatio-temporal occupancy prediction from point clouds.
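The key data transformation is cheap to illustrate: quantize each point, compute its index along a Hilbert curve, and argsort. The classic 2D bit-manipulation construction is shown below for brevity; the paper operates on 3D clouds.

```python
import numpy as np

def hilbert_index(n, x, y):
    """Index of cell (x, y) on a Hilbert curve over an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_sort(points, side=1024):
    """Sort 2D points by their index along the space-filling curve."""
    p = points - points.min(axis=0)
    p = (p / (p.max() + 1e-9) * (side - 1)).astype(int)
    keys = [hilbert_index(side, x, y) for x, y in p]
    return points[np.argsort(keys)]
```

Locality preservation is the point of the exercise: points adjacent in the sorted order tend to be adjacent in space, so 1D operations over the ordering can resolve spatial neighborhoods.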
|
|
09:12-09:18, Paper MoAT19.8 | Add to My Program |
Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models |
|
Mueller Carvalho, Joao Andre | Technische Universität Darmstadt |
Le, An Thai | Technische Universität Darmstadt |
Baierl, Mark | Technical University of Darmstadt |
Koert, Dorothea | Technische Universitaet Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Deep Learning Methods, Learning from Experience
Abstract: Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for new planning problems is highly desirable. Prior works propose several ways of utilizing such a prior to bootstrap the motion planning problem, either by sampling the prior for initializations or by using the prior distribution in a maximum-a-posteriori formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We can then sample directly from the posterior trajectory distribution conditioned on task goals by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has recently been shown to effectively encode data multimodality in high-dimensional settings, which makes it particularly well-suited for large trajectory datasets. To demonstrate our method's efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-DoF robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors for encoding high-dimensional trajectory distributions of robot motions.
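To make the "sample from the posterior via inverse denoising" step concrete, here is a generic DDPM-style reverse loop with gradient guidance from a task cost; the schedule, the guidance scale, and the callables denoiser and cost_grad are all illustrative assumptions rather than the paper's exact formulation.

```python
import torch

@torch.no_grad()
def sample_trajectory(denoiser, cost_grad, T=50, n_support=64, d=2, scale=0.1):
    """Draw one trajectory (n_support waypoints in R^d) from a diffusion prior."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    abars = torch.cumprod(alphas, dim=0)
    tau = torch.randn(n_support, d)              # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(tau, t)                   # predicted noise
        mean = (tau - betas[t] / (1 - abars[t]).sqrt() * eps) / alphas[t].sqrt()
        mean = mean - scale * cost_grad(tau)     # steer toward low task cost
        tau = mean + betas[t].sqrt() * torch.randn_like(tau) if t > 0 else mean
    return tau
```

Here cost_grad is assumed to return the cost gradient directly (e.g. an analytic obstacle-distance gradient), which keeps the whole loop inside torch.no_grad().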
|
|
09:18-09:24, Paper MoAT19.9 | Add to My Program |
Active Task Randomization: Learning Robust Skills Via Unsupervised Generation of Diverse and Feasible Tasks |
|
Fang, Kuan | University of California, Berkeley |
Migimatsu, Toki | Stanford University |
Mandlekar, Ajay Uday | NVIDIA |
Fei-Fei, Li | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Deep Learning Methods, Deep Learning in Grasping and Manipulation, Representation Learning
Abstract: Solving real-world manipulation tasks requires robots to be equipped with a repertoire of skills that can be applied to diverse scenarios. While learning-based methods can enable robots to acquire skills from interaction data, their success relies on collecting training data that covers the diverse range of tasks that the robot may encounter during the test time. However, creating diverse and feasible training tasks often requires extensive domain knowledge and non-trivial manual labor. We introduce Active Task Randomization (ATR), an approach that learns robust skills through the unsupervised generation of training tasks. ATR selects suitable training tasks—which consist of an environment configuration and manipulation goal—by actively balancing their diversity and feasibility. In doing so, ATR effectively creates a curriculum that gradually increases task diversity while maintaining a moderate level of feasibility, which leads to more complex tasks as the skills become more capable. ATR predicts task diversity and feasibility with a compact task representation that is learned concurrently with the skills. The selected tasks are then procedurally generated in simulation with a graph-based parameterization. We demonstrate that the learned skills can be composed by a task planner to solve unseen sequential manipulation problems based on visual inputs. Compared to baseline methods, ATR can achieve superior success rates in single-step and sequential manipulation tasks.
|
|
09:24-09:30, Paper MoAT19.10 | Add to My Program |
Robust Self-Supervised Extrinsic Self-Calibration |
|
Kanai, Takayuki | Toyota Research Institute |
Vasiljevic, Igor | Toyota Research Institute |
Guizilini, Vitor | Toyota Research Institute |
Gaidon, Adrien | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Keywords: Deep Learning Methods, Calibration and Identification
Abstract: Autonomous vehicles and robots need to operate over a wide variety of scenarios in order to complete tasks efficiently and safely. Multi-camera self-supervised monocular depth estimation from videos is a promising way to reason about the environment, as it generates metrically scaled geometric predictions from visual data without requiring additional sensors. However, most works assume well-calibrated extrinsics to fully leverage this multi-camera setup, even though accurate and efficient calibration is still a challenging problem. In this work, we introduce a novel method for extrinsic calibration that builds upon the principles of self-supervised monocular depth and ego-motion learning. Our proposed curriculum learning strategy uses monocular depth and pose estimators with velocity supervision to estimate extrinsics, and then jointly learns extrinsic calibration along with depth and pose for a set of overlapping cameras rigidly attached to a moving vehicle. Experiments on a benchmark multi-camera dataset (DDAD) demonstrate that our method enables self-calibration in various scenes robustly and efficiently compared to a traditional vision-based pose estimation pipeline. Furthermore, we demonstrate the benefits of extrinsics self-calibration as a way to improve depth prediction via joint optimization. Project page: https://sites.google.com/tri.global/tri-sesc
|
|
09:30-09:36, Paper MoAT19.11 | Add to My Program |
Do More with Less: Single-Model, Multi-Goal Architectures for Resource-Constrained Robots |
|
Wang, Zili | Boston University |
Threatt, Drew | Boston University |
Andersson, Sean | Boston University |
Tron, Roberto | Boston University |
Keywords: Deep Learning Methods, Autonomous Agents
Abstract: Deep learning methods are widely used in robotic applications. By learning from prior experience, the robot can abstract knowledge of the environment, and use this knowledge to accomplish different goals, such as object search, frontier exploration, or scene understanding, with a smaller amount of resources than might be needed without that knowledge. Most existing methods typically require a significant amount of sensing, which in turn has significant costs in terms of power consumption for acquisition and processing, and typically focus on models that are tuned for each specific goal, leading to the need to train, store and run each one separately. These issues are particularly important in a resource-constrained setting, such as with small-scale robots or during long-duration missions. We propose a single, multi-task deep learning architecture that takes advantage of the structure of the partial environment to predict different abstractions of the environment (thus reducing the need for rich sensing), and to leverage these predictions to simultaneously achieve different high-level goals (thus sharing computation between goals). As an example application of the proposed architecture, we consider the specific example of a robot equipped with a 2-D laser scanner and an object detector, tasked with searching for an object (such as an exit) in a residential building while constructing a topological map that can be used for future missions. The prior knowledge of the environment is encoded using a U-Net deep network architecture. In this context, our work leads to an object search algorithm that is complete, and that outperforms a more traditional frontier-based approach. The topological map we produce uses scene trees to qualitatively represent the environment as a graph at a fraction of the cost of existing SLAM-based solutions. Our results demonstrate that it is possible to extract multi-task semantic information that is useful for navigation and mapping directly from bare-bone, non-semantic measurements.
|
|
09:36-09:42, Paper MoAT19.12 | Add to My Program |
Enhancing State Estimation in Robots: A Data-Driven Approach with Differentiable Ensemble Kalman Filters |
|
Liu, Xiao | Arizona State University |
Clark, Geoffrey | ASU |
Campbell, Joseph | Carnegie Mellon University |
Zhou, Yifan | Arizona State University |
Ben Amor, Heni | Arizona State University |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation
Abstract: This paper introduces a novel state estimation framework for robots using differentiable ensemble Kalman filters (DEnKF). DEnKF is a reformulation of the traditional ensemble Kalman filter that employs stochastic neural networks to model the process noise implicitly. Our work is an extension of previous research on differentiable filters, which has provided a strong foundation for our modular and end-to-end differentiable framework. This framework enables each component of the system to function independently, leading to improved flexibility and versatility in implementation. Through a series of experiments, we demonstrate the flexibility of this model across a diverse set of real-world tracking tasks, including visual odometry and robot manipulation. Moreover, we show that our model effectively handles noisy observations, is robust in the absence of observations, and outperforms state-of-the-art differentiable filters in terms of error metrics. Specifically, we observe a significant improvement of at least 59% in translational error when using DEnKF with noisy observations. Our results underscore the potential of DEnKF in advancing state estimation for robotics. Code for DEnKF is available at https://github.com/ir-lab/DEnKF
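For orientation, the measurement update of a standard stochastic ensemble Kalman filter is shown below in NumPy; in DEnKF the forward model and parts of this update are implemented as (differentiable) neural networks, so the sketch only fixes the classical skeleton, not the paper's learned components.

```python
import numpy as np

def enkf_update(ensemble, obs, h, obs_cov, rng=None):
    """ensemble: (N, dim_x) state samples; h: state -> observation callable."""
    rng = rng or np.random.default_rng()
    N = ensemble.shape[0]
    Y = np.array([h(x) for x in ensemble])     # predicted observations (N, dim_y)
    Xc = ensemble - ensemble.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Pxy = Xc.T @ Yc / (N - 1)                  # state-observation covariance
    Pyy = Yc.T @ Yc / (N - 1) + obs_cov        # innovation covariance
    K = Pxy @ np.linalg.inv(Pyy)               # Kalman gain
    perturbed = obs + rng.multivariate_normal(np.zeros(len(obs)), obs_cov, size=N)
    return ensemble + (perturbed - Y) @ K.T    # updated ensemble
```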
|
|
09:42-09:48, Paper MoAT19.13 | Add to My Program |
Self-Supervised Category-Level 6D Object Pose Estimation with Optical Flow Consistency |
|
Zaccaria, Michela | E80Group S.p.A., University of Parma |
Manhardt, Fabian | Google |
Di, Yan | Technical University of Munich |
Tombari, Federico | Technische Universität München |
Aleotti, Jacopo | University of Parma |
Giorgini, Mikhail | University of Parma, Elettric 80 S.p.A |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, RGB-D Perception
Abstract: Category-level 6D object pose estimation aims at determining the pose of an object of a given category. Most current state-of-the-art methods require a significant amount of real training data to supervise their models. Moreover, annotating the 6D pose is very time-consuming and error-prone, and it does not scale well to a large number of object classes. Therefore, a handful of methods have recently been proposed to use unlabelled data to establish weak supervision. In this letter, we propose a self-supervised method that leverages the 2D optical flow as a proxy for supervising the 6D pose. To this end, we estimate the 2D optical flow between consecutive frames based on the pose estimates. Then, we harness an off-the-shelf optical flow method to enable weak supervision using a 2D-3D optical-flow-based consistency loss. Experiments show that our approach for self-supervised learning yields state-of-the-art performance on the NOCS benchmark and reaches results comparable with some fully-supervised approaches.
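The consistency idea reduces to comparing two flows: the flow implied by the estimated poses of consecutive frames versus the flow measured by an off-the-shelf network. A simplified sketch, assuming known intrinsics and flow already sampled at the projected model points:

```python
import torch

def pose_flow_consistency(P, T1, T2, K, flow_measured):
    """P: (N, 3) model points; T1, T2: (4, 4) object-to-camera poses; K: (3, 3)."""
    def project(T):
        Pc = P @ T[:3, :3].T + T[:3, 3]    # transform into the camera frame
        uv = Pc @ K.T
        return uv[:, :2] / uv[:, 2:3]      # perspective division to pixels
    induced = project(T2) - project(T1)    # flow implied by the two pose estimates
    return torch.nn.functional.l1_loss(induced, flow_measured)
```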
|
|
MoAIP Interactive session, Hall E |
|
Poster M1 |
|
|
|
Subsession MoAIP-01, Hall E | |
Clone of 'Semantic Scene Understanding' Regular session, 14 papers |
|
Subsession MoAIP-02, Hall E | |
Clone of 'Wearable and Assistive Devices' Regular session, 12 papers |
|
Subsession MoAIP-03, Hall E | |
Clone of 'Collision Avoidance I' Regular session, 13 papers |
|
Subsession MoAIP-04, Hall E | |
Clone of 'Control Applications' Regular session, 14 papers |
|
Subsession MoAIP-05, Hall E | |
Clone of 'Mechanism Design I' Regular session, 14 papers |
|
Subsession MoAIP-06, Hall E | |
Clone of 'Modeling, Control, and Learning for Soft Robots I' Regular session, 13 papers |
|
Subsession MoAIP-07, Hall E | |
Clone of 'Cooperating Robots' Regular session, 13 papers |
|
Subsession MoAIP-08, Hall E | |
Clone of 'Legged Robots I' Regular session, 12 papers |
|
Subsession MoAIP-09, Hall E | |
Clone of 'Motion and Path Planning I' Regular session, 13 papers |
|
Subsession MoAIP-10, Hall E | |
Clone of 'Learning for Manipulation I' Regular session, 13 papers |
|
Subsession MoAIP-11, Hall E | |
Clone of 'Aerial Systems - Applications I' Regular session, 14 papers |
|
Subsession MoAIP-12, Hall E | |
Clone of 'Perception for Grasping and Manipulation I' Regular session, 12 papers |
|
Subsession MoAIP-13, Hall E | |
Clone of 'Visual Learning' Regular session, 12 papers |
|
Subsession MoAIP-14, Hall E | |
Clone of 'Localization I' Regular session, 13 papers |
|
Subsession MoAIP-15, Hall E | |
Clone of 'Sensor Fusion for SLAM' Regular session, 13 papers |
|
Subsession MoAIP-16, Hall E | |
Clone of 'Autonomous Agents' Regular session, 13 papers |
|
Subsession MoAIP-17, Hall E | |
Clone of 'Imitation Learning' Regular session, 14 papers |
|
Subsession MoAIP-18, Hall E | |
Clone of 'Calibration and Identification' Regular session, 14 papers |
|
Subsession MoAIP-19, Hall E | |
Clone of 'Deep Learning Methods I' Regular session, 13 papers |
|
10:00-11:30, Subsession MoAIP-20, Hall E | |
Late Breaking Posters I Late breaking, 33 papers |
|
MoAIP-01 Regular session, Hall E |
Add to My Program |
Clone of 'Semantic Scene Understanding' |
|
|
|
10:00-11:30, Paper MoAIP-01.1 | Add to My Program |
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data |
|
Zeller, Matthias | CARIAD SE |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Stachniss, Cyrill | University of Bonn |
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Scene understanding is crucial for autonomous robots in dynamic environments for making future state predictions, avoiding collisions, and path planning. Camera and LiDAR perception made tremendous progress in recent years, but face limitations under adverse weather conditions. To leverage the full potential of multi-modal sensor suites, radar sensors are essential for safety critical tasks and are already installed in most new vehicles today. In this paper, we address the problem of semantic segmentation of moving objects in radar point clouds to enhance the perception of the environment with another sensor modality. Instead of aggregating multiple scans to densify the point clouds, we propose a novel approach based on the self-attention mechanism to accurately perform sparse, single-scan segmentation. Our approach, called Gaussian Radar Transformer, includes the newly introduced Gaussian transformer layer, which replaces the softmax normalization by a Gaussian function to decouple the contribution of individual points. To tackle the challenge of the transformer to capture long-range dependencies, we propose our attentive up- and downsampling modules to enlarge the receptive field and capture strong spatial relations. We compare our approach to other state-of-the-art methods on the RadarScenes data set and show superior segmentation quality in diverse environments, even without exploiting temporal information.
|
|
10:00-11:30, Paper MoAIP-01.2 | Add to My Program |
Mask-Based Panoptic LiDAR Segmentation for Autonomous Driving |
|
Marcuzzi, Rodrigo | University of Bonn |
Nunes, Lucas | University of Bonn |
Wiesmann, Louis | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Autonomous vehicles need to understand their surroundings geometrically and semantically to plan and act appropriately in the real world. Panoptic segmentation of LiDAR scans provides a description of the surroundings by unifying semantic and instance segmentation. It is usually solved in a bottom-up manner, consisting of two steps: first predicting the semantic class for each 3D point, then using this information to filter out ”stuff” points and cluster the ”thing” points to obtain instance segmentation. The clustering is a post-processing step that often needs hyperparameter tuning, which usually does not adapt to instances of different sizes or different datasets. To this end, we propose MaskPLS, an approach to perform panoptic segmentation of LiDAR scans in an end-to-end manner by predicting a set of non-overlapping binary masks and semantic classes, fully avoiding the clustering step. As a result, each mask represents a single instance belonging to a ”thing” class or a complete ”stuff” class. Experiments on SemanticKITTI show that the end-to-end learnable mask generation leads to superior performance compared to state-of-the-art heuristic approaches.
|
|
10:00-11:30, Paper MoAIP-01.3 | Add to My Program |
SCENE: Reasoning about Traffic Scenes Using Heterogeneous Graph Neural Networks |
|
Schmidt, Julian | Mercedes-Benz AG, Ulm University |
Monninger, Thomas | Mercedes-Benz AG, University of Stuttgart |
Rupprecht, Jan | Mercedes-Benz AG |
Raba, David | Mercedes Benz AG |
Jordan, Julian | Mercedes-Benz AG |
Frank, Daniel | University of Stuttgart |
Staab, Steffen | University of Stuttgart |
Dietmayer, Klaus | University of Ulm |
Keywords: Semantic Scene Understanding, AI-Based Methods, Behavior-Based Systems
Abstract: Understanding traffic scenes requires considering heterogeneous information about dynamic agents and the static infrastructure. In this work, we propose SCENE, a methodology to encode diverse traffic scenes in heterogeneous graphs and to reason about these graphs using a heterogeneous Graph Neural Network encoder and task-specific decoders. The heterogeneous graphs, whose structures are defined by an ontology, consist of different nodes with type-specific node features and different relations with type-specific edge features. In order to exploit all the information given by these graphs, we propose to use cascaded layers of graph convolution. The result is an encoding of the scene. Task-specific decoders can be applied to predict desired attributes of the scene. Extensive evaluation on two diverse binary node classification tasks shows the main strength of this methodology: despite being generic, it even manages to outperform task-specific baselines. The further application of our methodology to the task of node classification in various knowledge graphs shows its transferability to other domains.
|
|
10:00-11:30, Paper MoAIP-01.4 | Add to My Program |
Prototypical Contrastive Transfer Learning for Multimodal Language Understanding |
|
Otsuki, Seitaro | Keio University |
Ishikawa, Shintaro | Keio University |
Sugiura, Komei | Keio University |
Keywords: Transfer Learning, Semantic Scene Understanding, Multi-Modal Perception for HRI
Abstract: Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework. In this study, we propose a novel transfer learning approach for multimodal language understanding called Prototypical Contrastive Transfer Learning (PCTL), which uses a new contrastive loss called Dual ProtoNCE. We introduce PCTL to the task of identifying target objects in domestic environments according to free-form natural language instructions. To validate PCTL, we built new real-world and simulation datasets. Our experiment demonstrated that PCTL outperformed existing methods. Specifically, PCTL achieved an accuracy of 78.1%, whereas simple fine-tuning achieved an accuracy of 73.4%.
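The "ProtoNCE" family of losses pulls each embedding toward its class prototype and away from the others; a generic single-term sketch is below (the paper's Dual ProtoNCE combines two such terms, which is not reproduced here).

```python
import torch
import torch.nn.functional as F

def proto_nce(features, prototypes, labels, tau=0.1):
    """features: (B, D); prototypes: (C, D); labels: (B,) prototype indices."""
    f = F.normalize(features, dim=1)
    p = F.normalize(prototypes, dim=1)
    logits = f @ p.T / tau                  # cosine similarity to every prototype
    return F.cross_entropy(logits, labels)  # positive = own class prototype
```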
|
|
10:00-11:30, Paper MoAIP-01.5 | Add to My Program |
Re-Thinking Classification Confidence with Model Quality Quantification |
|
Pan, Yancheng | Peking University |
Zhao, Huijing | Peking University |
Keywords: Semantic Scene Understanding, Autonomous Agents
Abstract: Deep neural networks used for real-world classification tasks require high reliability and robustness. However, the softmax output of the network's last layer is often overconfident. We propose a novel confidence estimation method that considers model quality for deep classification models. Two metrics, MQ-Repres and MQ-Discri, are developed accordingly to evaluate model quality, along with a new confidence estimate, MQ-Conf, for online inference. We demonstrate the capability of the proposed method on 3D semantic segmentation tasks using three different deep networks. Through confusion analysis and feature visualization, we show the rationality and reliability of the model quality quantification method.
|
|
10:00-11:30, Paper MoAIP-01.6 | Add to My Program |
Self-Supervised Drivable Area Segmentation Using LiDAR’s Depth Information for Autonomous Driving |
|
Ma, Fulong | The Hong Kong University of Science and Technology |
Liu, Yang | The Hong Kong University of Science and Technology |
Wang, Sheng | Hong Kong University of Science and Technology |
Jin, Wu | UESTC |
Qi, Weiqing | HKUST |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Semantic Scene Understanding, Perception for Grasping and Manipulation, Mapping
Abstract: Drivable area segmentation is an essential component of the visual perception system for autonomous driving vehicles. Recent efforts in deep neural networks have significantly improved semantic segmentation performance for autonomous driving. However, most DNN-based methods need a large amount of data to train the models, and collecting large-scale datasets with manually labeled ground truth is costly, tedious, and time-consuming, and requires the availability of experts, often making DNN-based methods difficult to implement in real-world applications. Hence, in this paper, we introduce a novel module named automatic data labeler (ADL), which leverages a deterministic LiDAR-based method for ground plane segmentation and road boundary detection to create large datasets suitable for training DNNs. Furthermore, since the data generated by our ADL module is not as accurate as manually annotated data, we introduce uncertainty estimation to compensate for the gap between the human labeler and our ADL. Finally, we train semantic segmentation neural networks using our automatically generated labels on the KITTI and KITTI-CARLA datasets. The experimental results demonstrate that our proposed ADL method not only achieves impressive performance compared to manual labeling but also exhibits more robust and accurate results than both traditional methods and state-of-the-art self-supervised methods.
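A deterministic ground-plane labeler can be as simple as a RANSAC plane fit, with inliers becoming "drivable" pseudo-labels. This is a minimal sketch of that idea only; the paper's full pipeline additionally detects road boundaries and estimates label uncertainty.

```python
import numpy as np

def ransac_ground_plane(pts, iters=200, tol=0.15, seed=0):
    """pts: (N, 3) LiDAR points; returns a boolean per-point ground mask."""
    rng = np.random.default_rng(seed)
    best_mask, best_count = None, -1
    for _ in range(iters):
        p = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        if np.linalg.norm(n) < 1e-9:
            continue                       # degenerate (collinear) sample
        n /= np.linalg.norm(n)
        dist = np.abs((pts - p[0]) @ n)    # point-to-plane distances
        mask = dist < tol
        if mask.sum() > best_count:
            best_mask, best_count = mask, mask.sum()
    return best_mask
```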
|
|
10:00-11:30, Paper MoAIP-01.7 | Add to My Program |
Vehicle Motion Forecasting Using Prior Information and Semantic-Assisted Occupancy Grid Maps |
|
Asghar, Rabbia | INRIA / Univ. Grenoble Alpes |
Diaz-Zapata, Manuel | Inria Grenoble |
Rummelhard, Lukas | INRIA |
Spalanzani, Anne | INRIA / Univ. Grenoble Alpes |
Laugier, Christian | INRIA |
Keywords: Semantic Scene Understanding, Deep Learning Methods, Autonomous Vehicle Navigation
Abstract: Motion prediction is a challenging task for autonomous vehicles due to uncertainty in the sensor data, the non-deterministic nature of the future, and the complex behavior of agents. In this paper, we tackle this problem by representing the scene as dynamic occupancy grid maps (DOGMs), associating semantic labels with the occupied cells and incorporating map information. We propose a novel framework that combines deep-learning-based spatio-temporal and probabilistic approaches to predict multimodal vehicle behaviors. Contrary to conventional OGM prediction methods, our work is evaluated against ground truth annotations. We experiment and validate our results on the real-world nuScenes dataset and show that our model exhibits a superior ability to predict both static and dynamic vehicles compared to OGM predictions. Furthermore, we perform an ablation study and assess the role of the semantic labels and the map in the architecture.
|
|
10:00-11:30, Paper MoAIP-01.8 | Add to My Program |
Enhance Local Feature Consistency with Structure Similarity Loss for 3D Semantic Segmentation |
|
Lin, Cheng-Wei | Department of Computer Science, National Yang Ming Chiao Tung Un |
Syu, Fang-Yu | Department of Computer Science, National Yang Ming Chiao Tung Un |
Pan, Yi-Ju | National Yang Ming Chiao Tung University |
Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Recently, many research studies have been carried out on using deep learning methods for 3D point cloud understanding. However, results on 3D point cloud semantic segmentation still lag behind those of 2D research. One important reason is that 3D data has higher dimensionality but lacks large datasets, which means that deep learning models are difficult to optimize and easy to overfit. To overcome this, an essential strategy is to provide more priors to the learning of deep models. In this paper, we focus on semantic segmentation for point clouds in the real world. To provide priors to the model, we propose a novel loss function, called Linearity and Planarity, to enhance local feature consistency in regions with similar local structure. Experiments show that the proposed method improves baseline performance on both indoor and outdoor datasets, e.g., S3DIS and Semantic3D.
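Linearity and planarity are standard local-structure descriptors computed from the sorted eigenvalues of a neighborhood's covariance matrix; the loss proposed in the paper builds on these quantities. A minimal sketch of the descriptors themselves:

```python
import numpy as np

def linearity_planarity(neighbors):
    """neighbors: (k, 3) points around a query point, k >= 3."""
    C = np.cov(neighbors.T)                       # 3x3 local covariance
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(C))[::-1]
    linearity = (l1 - l2) / (l1 + 1e-9)           # high along edges and wires
    planarity = (l2 - l3) / (l1 + 1e-9)           # high on walls and ground
    return linearity, planarity
```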
|
|
10:00-11:30, Paper MoAIP-01.9 | Add to My Program |
Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices |
|
Son, Hojun | University of Michigan |
Weiland, James | University of Michigan |
Keywords: Semantic Scene Understanding, Embedded Systems for Robotic and Automation, Deep Learning for Visual Perception
Abstract: Semantic scene understanding is beneficial for mobile robots. Semantic information obtained through onboard cameras can improve robots’ navigation performance. However, obtaining semantic information on small mobile robots with constrained power and computation resources is challenging. We propose a new lightweight convolutional neural network comparable to previous semantic segmentation algorithms for mobile applications. Our network achieved 73.06% on the Cityscapes validation set and 71.8% on the Cityscapes test set. Our model runs at 116 FPS at 1024x2048 resolution, 172 FPS at 1024x1024, and 175 FPS at 720x960 on an NVIDIA GTX 1080. We analyze model size, defined as the sum of the number of floating-point operations and the number of parameters. A smaller model size enables tiny mobile robot systems, which must run multiple tasks simultaneously, to work efficiently. Our model has the smallest model size among the real-time semantic segmentation convolutional neural networks ranked on the Cityscapes real-time benchmark and other high-performing, lightweight convolutional neural networks. On the CamVid test set, our model achieved a mIoU of 73.29% with Cityscapes pre-training, outperforming other lightweight convolutional neural networks. For mobile applicability, we measured frames per second on different low-compute devices: our model runs at 35 FPS on a Jetson Xavier AGX, 21 FPS on a Jetson Xavier NX, and 14 FPS on an ASUS ROG gaming phone. A resolution of 1024x2048 is used for the Jetson devices, and 512x512 for the measurement on the phone. Our network did not use extra datasets such as ImageNet, Coarse Cityscapes, or Mapillary, and we did not use TensorRT to achieve fast inference speed. Compared to other real-time and lightweight CNNs, our model achieves significantly higher efficiency while balancing accuracy, inference speed, and model size.
|
|
10:00-11:30, Paper MoAIP-01.10 | Add to My Program |
LiDAR-SGMOS: Semantics-Guided Moving Object Segmentation with 3D LiDAR |
|
Gu, Shuo | Nanjing University of Science and Technology |
Yao, Suling | Nanjing University of Science and Technology |
Yang, Jian | Nanjing University of Science & Technology |
Xu, Chengzhong | University of Macau |
Kong, Hui | University of Macau |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: Most existing moving object segmentation (MOS) methods regard MOS as an independent task. In this paper, we associate the MOS task with semantic segmentation and propose a semantics-guided network for moving object segmentation (LiDAR-SGMOS). We first transform the range image and semantic features of the past scan into the range view of the current scan based on the relative pose between scans. The residual image is obtained by calculating the normalized absolute difference between the current and transformed range images. Then, we apply a Meta-Kernel-based cross-scan fusion (CSF) module to adaptively fuse the range image and semantic features of the current scan with the residual image and the transformed features. Finally, the fused features, with rich motion and semantic information, are processed to obtain reliable MOS results. We also introduce a residual image augmentation method to further improve MOS performance. Our method outperforms most LiDAR-MOS methods on the SemanticKITTI MOS dataset with only two sequential LiDAR scans as inputs.
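The residual image itself is a one-liner once the previous scan has been warped into the current view; a sketch assuming invalid pixels (no return) are encoded as non-positive range values:

```python
import numpy as np

def residual_image(range_cur, range_prev_warped, eps=1e-6):
    """Normalized absolute range difference; large values hint at motion."""
    valid = (range_cur > 0) & (range_prev_warped > 0)
    res = np.zeros_like(range_cur)
    res[valid] = (np.abs(range_cur[valid] - range_prev_warped[valid])
                  / (range_cur[valid] + eps))
    return res
```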
|
|
10:00-11:30, Paper MoAIP-01.11 | Add to My Program |
Robust Fusion for Bayesian Semantic Mapping |
|
Morilla-Cabello, David | Universidad De Zaragoza |
Mur Labadia, Lorenzo | University of Zaragoza |
Martinez-Cantin, Ruben | University of Zaragoza |
Montijano, Eduardo | Universidad De Zaragoza |
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: The integration of semantic information in a map allows robots to understand their environment better and make high-level decisions. In the last few years, neural networks have shown enormous progress in their perception capabilities. However, when fusing multiple observations from a neural network in a semantic map, its inherent overconfidence with unknown data gives too much weight to outliers and decreases the robustness of the fusion. To mitigate this issue, we propose a novel robust fusion method for combining multiple Bayesian semantic predictions. Our method uses the uncertainty estimation provided by a Bayesian neural network to calibrate the way in which the measurements are fused. This is done by regularizing the observations to mitigate the problem of overconfident outlier predictions and by using the epistemic uncertainty to weigh their influence in the fusion, resulting in a different formulation of the probability distributions. We validate our robust fusion strategy by performing experiments on photo-realistic simulated environments and real scenes. In both cases, we use a network trained on different data to expose the model to varying data distributions. The results show that considering the model's uncertainty and regularizing the probability distributions of the observations results in better semantic segmentation performance and more robustness to outliers, compared with other methods.
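One way to picture the proposed calibration is a log-linear fusion in which each observation is first softened toward the uniform distribution and then weighted by its inverse epistemic uncertainty. The sketch below uses that specific weighting as an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def fuse_semantic(probs, epistemic, floor=0.05):
    """probs: (N, C) per-observation class distributions; epistemic: (N,)."""
    C = probs.shape[1]
    reg = (1 - floor) * probs + floor / C        # soften overconfident outliers
    w = 1.0 / (epistemic + 1e-6)                 # uncertain views count less
    log_post = (w[:, None] * np.log(reg)).sum(axis=0)
    post = np.exp(log_post - log_post.max())     # numerically stable normalization
    return post / post.sum()
```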
|
|
10:00-11:30, Paper MoAIP-01.12 | Add to My Program |
ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes |
|
Ramachandruni, Kartik | Georgia Institute of Technology |
Zuo, Max | Georgia Institute of Technology |
Chernova, Sonia | Georgia Institute of Technology |
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Object rearrangement is the problem of enabling a robot to identify the correct object placement in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve some desired goal state. Logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot in how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is not necessary in partially-arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement without any explicit user goal specification. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that utilizes contextual cues from a partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data are available at https://github.com/kartikvrama/consor.
|
|
10:00-11:30, Paper MoAIP-01.13 | Add to My Program |
IDA: Informed Domain Adaptive Semantic Segmentation |
|
Chen, Zheng | Indiana University Bloomington |
Ding, Zhengming | Tulane University |
Gregory, Jason M. | US Army Research Laboratory |
Liu, Lantao | Indiana University |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Mixup-based data augmentation has been validated to be a critical stage in the self-training framework for unsupervised domain adaptive semantic segmentation (UDA-SS), which aims to transfer knowledge from a well-annotated (source) domain to an unlabeled (target) domain. Existing self-training methods usually adopt the popular region-based mixup techniques with a random sampling strategy, which unfortunately ignores the dynamic evolution of different semantics across various domains as training proceeds. To improve the UDA-SS performance, we propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes the data based on class-level segmentation performance, which aims to emphasize small-region semantics during mixup. In our IDA model, the class-level performance is tracked by an expected confidence score (ECS). We then use a dynamic schedule to determine the mixing ratio for data in different domains. Extensive experimental results reveal that our proposed method is able to outperform the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to Cityscapes.
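The class-level sampling step can be sketched in a few lines: classes with a low expected confidence score (ECS) are preferentially chosen for mixup. The softmax weighting below is an illustrative choice, not the paper's exact schedule.

```python
import numpy as np

def sample_mix_classes(ecs, n_mix, temperature=1.0, seed=0):
    """ecs: per-class expected confidence scores; returns class ids to mix in."""
    rng = np.random.default_rng(seed)
    logits = -np.asarray(ecs) / temperature   # low confidence -> high weight
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(ecs), size=n_mix, replace=False, p=p)
```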
|
|
10:00-11:30, Paper MoAIP-01.14 | Add to My Program |
Self-Supervised Learning for Panoptic Segmentation of Multiple Fruit Flower Species |
|
Siddique, Abubakar | Marquette University |
Tabb, Amy | USDA-ARS-AFRS |
Medeiros, Henry | University of Florida |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Incremental Learning
Abstract: Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to enhance the sensitivity of segmentation models to different flower species using automatically generated pseudo-labels. We employ a data augmentation and refinement approach to improve the accuracy of the model predictions. The augmented semantic predictions are then converted to panoptic pseudo-labels to iteratively train a multi-task model. The self-supervised model predictions can be refined with existing post-processing approaches to further improve their accuracy. An evaluation on a multi-species fruit tree flower dataset demonstrates that our method outperforms state-of-the-art models without computationally expensive post-processing steps, providing a new baseline for flower detection applications.
|
|
MoAIP-02 Regular session, Hall E |
Add to My Program |
Clone of 'Wearable and Assistive Devices' |
|
|
|
10:00-11:30, Paper MoAIP-02.1 | Add to My Program |
Combined Admittance Control with Type II Singularity Evasion for Parallel Robots Using Dynamic Movement Primitives (I) |
|
Escarabajal, Rafael J. | Universidad Politécnica De Valencia |
Pulloquinga, José Luis | Universidad Politécnica De Valencia |
Valera, Angel | Universidad Politécnica De Valencia |
Mata, Vicente | Universidad Politécnica De Valencia |
Valles, Marina | Universitat Politècnica De València |
Castillo-García, Fernando J. | Universidad De Castilla-La Mancha |
Keywords: Rehabilitation Robotics, Parallel Robots, Compliance and Impedance Control, Dynamic Movement Primitives
Abstract: This paper addresses a new way of generating compliant trajectories for control using movement primitives to allow physical human-robot interaction where parallel robots (PRs) are involved. PRs are suitable for tasks requiring precision and performance because of their robust behavior. However, two fundamental issues must be resolved to ensure safe operation: i) the force exerted on the human must be controlled and limited, and ii) Type II singularities should be avoided to keep complete control of the robot. We offer a unified solution under the Dynamic Movement Primitives (DMP) framework to tackle both tasks simultaneously. DMPs are used to get an abstract representation for movement generation and are involved in broad areas such as imitation learning and movement recognition. For force control, we design an admittance controller intrinsically defined within the DMP structure, and subsequently, the Type II singularity evasion layer is added to the system. Both the admittance controller and the evader exploit the dynamic behavior of the DMP and its properties related to invariance and temporal coupling, and the whole system is deployed in a real PR meant for knee rehabilitation. The results show the capability of the system to perform safe rehabilitation exercises.
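For readers unfamiliar with DMPs, the sketch below integrates a single-DoF DMP: a spring-damper transformation system driven by a phase-gated forcing term. The gains are common textbook values, the admittance and singularity-evasion layers of the paper are omitted, and forcing stands in for a learned function of the phase.

```python
import numpy as np

def rollout_dmp(y0, g, forcing, tau=1.0, dt=0.01, T=1.0,
                alpha=25.0, beta=6.25, alpha_x=3.0):
    y, dy, x = y0, 0.0, 1.0                      # state and phase variable
    traj = []
    for _ in range(int(T / dt)):
        f = forcing(x) * (g - y0) * x            # scaled, phase-gated forcing
        ddy = (alpha * (beta * (g - y) - tau * dy) + f) / tau**2
        dy += ddy * dt
        y += dy * dt
        x += (-alpha_x * x / tau) * dt           # canonical (phase) system
        traj.append(y)
    return np.array(traj)

traj = rollout_dmp(0.0, 1.0, forcing=lambda x: 0.0)  # no forcing: converges to g
```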
|
|
10:00-11:30, Paper MoAIP-02.2 | Add to My Program |
A Handle Robot for Providing Bodily Support to Elderly Persons |
|
Bolli, Roberto | MIT |
Bonato, Paolo | Harvard Medical School |
Asada, Harry | MIT |
Keywords: Physically Assistive Devices, Human-Robot Collaboration, Domestic Robotics
Abstract: Age-related loss of mobility and an increased risk of falling remain major obstacles for older adults to live independently. Many elderly people lack the coordination and strength necessary to perform activities of daily living, such as getting out of bed or stepping into a bathtub. A traditional solution is to install grab bars around the home. For assisting in bathtub transitions, grab bars are fixed to a bathroom wall. However, they are often too far to reach and stably support the user; the installation locations of grab bars are constrained by the room layout and are often suboptimal. In this paper, we present a mobile robot that provides an older adult with a handlebar located anywhere in space - “Handle Anywhere”. The robot consists of an omnidirectional mobile base attached to a repositionable handlebar. We further develop a methodology to optimally place the handle to provide the maximum support for the elderly user while performing common postural changes. A cost function with a trade-off between mechanical advantage and manipulability of the user’s arm was optimized in terms of the location of the handlebar relative to the user. The methodology requires only a sagittal plane video of the elderly user performing the postural change, and thus is rapid, scalable, and uniquely customizable to each user. A proof-of-concept prototype was built, and the optimization algorithm for handle location was validated experimentally.
|
|
10:00-11:30, Paper MoAIP-02.3 | Add to My Program |
A Hybrid FNS Generator for Human Trunk Posture Control with Incomplete Knowledge of Neuromusculoskeletal Dynamics |
|
Bao, Xuefeng | Case Western Reserve University |
Friederich, Aidan | Case Western Reserve University |
Triolo, Ronald | Case Western Reserve University |
Audu, Musa. L. | Case Western Reserve University |
Keywords: Rehabilitation Robotics, Modeling and Simulating Humans, Motion Control
Abstract: The trunk movements of an individual paralyzed by spinal cord injury (SCI) can be restored by Functional Neuromuscular Stimulation (FNS), a technique that applies low-level current to motor nerves to activate muscles, generating joint torques and thus producing trunk motions. FNS can be modulated to control trunk movements. However, a stabilizing modulation policy (i.e., control law) is difficult to derive due to the complexity of the neuromusculoskeletal dynamics, which consist of skeletal dynamics (i.e., multi-joint rigid-body dynamics) and neuromuscular dynamics (i.e., highly nonlinear, nonautonomous, and input-redundant dynamics). Therefore, an FNS-based control method that can stabilize the trunk without knowing the accurate skeletal and neuromuscular dynamics is desired. This work proposes an FNS generator consisting of a robust nonlinear controller (RNC) that provides a stabilizing torque command and an artificial neural network (ANN)-based torque-to-activation (T-A) map that ensures the muscles apply the stabilizing torque to the skeleton. Due to the robustness and learning capability of this control framework, full knowledge of the trunk neuromusculoskeletal dynamics is not required. The proposed control framework has been tested in a simulation environment where an anatomically realistic 3D musculoskeletal model of the human trunk was manipulated to follow a time-varying reference that moves in the anterior-posterior and medial-lateral directions. The results show that the trunk motion converges to a satisfactory trajectory while the ANN is being updated, suggesting the potential of this control framework for trunk tracking tasks in clinical applications.
|
|
10:00-11:30, Paper MoAIP-02.4 | Add to My Program |
Insole-Type Walking Assist Device Capable of Inducing Inversion-Eversion of the Ankle Angle to the Neutral Position |
|
Itami, Taku | Aoyama Gakuin University |
Date, Kazuki | Aoyama Gakuin University |
Ishii, Yuuta | Aoyama Gakuin University |
Yoneyama, Jun | Aoyama Gakuin University |
Aoki, Takaaki | Gifu University |
Keywords: Prosthetics and Exoskeletons, Robotics and Automation in Life Sciences, Body Balancing
Abstract: In recent years, the aging of society has become a serious problem, especially in developed countries. Walking is an important element in extending healthy life expectancy in old age. In particular, inducing proper ankle joint alignment at heel contact is important during the gait cycle from the perspective of smooth weight transfer and reduced burden on the knees and hip. In this study, we focus on the behavior of the ankle joint at heel contact and propose an insole-type assist device that can induce inversion/eversion rotation of the ankle toward the neutral position. The heel part of the proposed device tilts from left to right in response to the rotation of a stepping motor, and an inertial sensor mounted inside controls the heel part so that it always maintains a horizontal position. The effectiveness of the proposed device is verified by evaluating the amount of lateral thrust of the knee joint of six healthy male subjects during a foot-stepping motion using a motion capture system. The results showed that the amount of lateral thrust is significantly reduced when wearing the device with control enabled.
|
|
10:00-11:30, Paper MoAIP-02.5 | Add to My Program |
Design for Hip Abduction Assistive Device Based on Relationship between Hip Joint Motion and Torque During Running |
|
Lee, Myunghyun | Agency for Defense Development |
Hong, Man Bok | Agency for Defense Development |
Kim, Gwang Tae | Agency for Defense Development |
Kim, Seonwoo | Agency for Defense Development |
Keywords: Physically Assistive Devices, Human Performance Augmentation, Mechanism Design
Abstract: Numerous attempts have been made to reduce the metabolic energy of running with the help of assistive devices. The majority of studies on assistive devices have focused on assisting torque in the sagittal plane. In the case of running, however, the abduction torque in the frontal plane at the hip joint is greater than the flexion/extension torque in the sagittal plane. During running, as in an elastic body, the abduction torque and the motion of the hip joint are linearly related but opposite in direction. It is therefore expected that hip abduction torque can be assisted with a simple passive method using an elastic body that reflects the movement characteristics of the hip joint. In this study, a system to assist hip abduction torque using a leaf spring was proposed and tested with a prototype. While running with the proposed assist system, the leaf spring aids the abduction torque in the stance phase, and no torque is generated in the swing phase due to a passive revolute joint. The joint angle changes with the rotation in the flexion/extension direction to prevent uncomfortable torque during the swing phase and to increase the duration of torque assistance during the stance phase. A preliminary test was conducted on one subject using the prototype of the hip abduction torque assistive device. The participant wearing the device reduced metabolic energy by 5% while running at 2.5 m/s compared to the case without abduction torque assist. To increase the metabolic reduction further, the device should be improved through system mass reduction and hip joint position optimization.
|
|
10:00-11:30, Paper MoAIP-02.6 | Add to My Program |
Dynamic Hand Proprioception Via a Wearable Glove with Fabric Sensors |
|
Behnke, Lily | Yale University |
Sanchez-Botero, Lina | Yale University |
Johnson, William | Yale University |
Agrawala, Anjali | Yale University |
Kramer-Bottiglio, Rebecca | Yale University |
Keywords: Wearable Robotics, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Continuous enhancement in wearable technologies has led to several innovations in the healthcare, virtual reality, and robotics sectors. One form of wearable technology is wearable sensors for kinematic measurements of human motion. However, measuring the kinematics of human movement is a challenging problem as wearable sensors need to conform to complex curvatures and deform without limiting the user's natural range of motion. In fine motor activities, such challenges are further exacerbated by the dense packing of several joints, coupled joint motions, and relatively small deformations. This work presents the design, fabrication, and characterization of a thin, breathable sensing glove capable of reconstructing fine motor kinematics. The fabric glove features capacitive sensors made from layers of conductive and dielectric fabrics, culminating in a non-bulky and discrete glove design. This study demonstrates that the glove can reconstruct the joint angles of the wearer with a root mean square error of 7.2 degrees, indicating promising applicability to dynamic pose reconstruction for wearable technology and robot teleoperation.
|
|
10:00-11:30, Paper MoAIP-02.7 | Add to My Program |
A Wearable Robotic Rehabilitation System for Neuro-Rehabilitation Aimed at Enhancing Mediolateral Balance |
|
Yu, Zhenyuan | North Carolina State University |
Nalam, Varun | North Carolina State University |
Alili, Abbas | NC State University |
Huang, He (Helen) | North Carolina State University and University of North Carolina |
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Physical Human-Robot Interaction
Abstract: There is increasing evidence of the role of compromised mediolateral balance in falls and of the need for rehabilitation specifically focused on the mediolateral direction for various populations with motor deficits. To address this need, we have developed a neurorehabilitation platform by integrating a wearable robotic hip abduction-adduction exoskeleton with a visual interface. The platform is expected to influence and rehabilitate the underlying visuomotor mechanisms by having users perform motion tasks based on visual feedback while the robot applies various controlled resistances governed by the admittance controller implemented in the robot. A preliminary study was performed on three non-disabled individuals to analyze the performance of the system and observe any adaptation in hip joint kinematics and kinetics as a result of the visuomotor training under four different admittance conditions. All three subjects exhibited increased consistency of motion during training and interlimb coordination to achieve the motion tasks, demonstrating the utility of the system. Further analysis of the observed human-robot interaction torques and electromyography (EMG) signals, and their implications for neurorehabilitation aimed at populations suffering from chronic stroke, are discussed.
|
|
10:00-11:30, Paper MoAIP-02.8 | Add to My Program |
Analysis of Lower Extremity Shape Characteristics in Various Walking Situations for the Development of Wearable Robot |
|
Park, Joohyun | KAIST, KIST |
Choi, Ho Seon | Yonsei University |
In, HyunKi | Korea Institute of Science and Technology |
Keywords: Datasets for Human Motion, Wearable Robotics, Physical Human-Robot Interaction
Abstract: A strap is a frequently utilized component for securing wearable robots to their users in order to facilitate force transmission between humans and the devices. For the proper function of a wearable robot, the pressure between the strap and the skin should be maintained at an appropriate level. Due to muscle contraction, the cross-sectional area of a human limb changes with movement, and this change alters the pressure applied by the strap. Therefore, to design a new strap that resolves this, it is necessary to understand the shape change characteristics of the muscle where the strap is applied. In this paper, the changes in the circumference of the thigh and the calf during walking were measured and analyzed with multiple string-pot sensors. Using a treadmill and string-pot sensors built from potentiometers and torsion springs, leg circumference changes were measured for different walking speeds and slopes. Gait cycles were segmented according to the signal from an FSR sensor inserted in the right shoe. The experimental results showed circumference changes of about 8.5 mm and 3 mm for the thigh and the calf, respectively, and consistent tendencies were found across walking conditions such as speed and slope. These results confirm that circumference measurements can be used in algorithms for estimating gait cycles or walking conditions.
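As a sketch of the gait-cycle segmentation step, the following hypothetical code detects heel strikes as upward threshold crossings of the FSR signal and slices the circumference signal accordingly; the threshold and sampling rate are assumed, not taken from the paper:

```python
import numpy as np

FS = 100.0       # sampling rate [Hz], assumed
THRESH = 0.5     # normalized FSR contact threshold, assumed


def heel_strike_indices(fsr: np.ndarray) -> np.ndarray:
    """Indices where the FSR crosses the threshold upward (heel strike)."""
    above = fsr > THRESH
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


def split_by_gait_cycle(circumference: np.ndarray, fsr: np.ndarray):
    """Return one circumference segment per detected gait cycle."""
    strikes = heel_strike_indices(fsr)
    return [circumference[a:b] for a, b in zip(strikes[:-1], strikes[1:])]
```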
|
|
10:00-11:30, Paper MoAIP-02.9 | Add to My Program |
Finding Biomechanically Safe Trajectories for Robot Manipulation of the Human Body in a Search and Rescue Scenario |
|
Peiros, Lizzie | University of California, San Diego |
Chiu, Zih-Yun | University of California, San Diego |
Zhi, Yuheng | University of California, San Diego |
Shinde, Nikhil | University of California San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Dynamics
Abstract: There has been increasing awareness of the difficulties in reaching and extracting people from mass casualty scenarios, such as those arising from natural disasters. While platforms have been designed to reach casualties and even carry them out of harm's way, the challenge of physically repositioning a casualty from its found configuration to one suitable for extraction has not been explicitly explored. Furthermore, this type of planning problem needs to incorporate biomechanical safety considerations for the casualty. Thus, we present the problem formulation for biomechanically safe trajectory generation for repositioning limbs of unconscious human casualties. We express biomechanical safety in robotics terms, provide mechanical descriptions of the dynamics of the robot-human coupled system, and describe the planning and trajectory optimization process that considers this coupled and constrained system. We finally evaluate the work over several variations of the problem and provide a live example. This work provides a crucial component of search and rescue that can be used in conjunction with past and present work on robots and vision systems designed for such scenarios.
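To give a flavor of trajectory optimization under biomechanical constraints, here is a deliberately simplified single-joint sketch (smoothness objective, assumed range-of-motion bounds); the paper itself handles the full robot-human coupled dynamics:

```python
import numpy as np
from scipy.optimize import minimize

T = 20
q0, qT = 0.0, 1.2        # initial / target limb joint angle [rad]
Q_MAX = 1.3              # hypothetical safe range of motion [rad]


def smoothness(q):
    q = np.concatenate(([q0], q, [qT]))
    return np.sum(np.diff(q, n=2) ** 2)   # penalize joint acceleration


res = minimize(
    smoothness,
    x0=np.linspace(q0, qT, T),
    bounds=[(-Q_MAX, Q_MAX)] * T,         # biomechanical joint limits
)
traj = np.concatenate(([q0], res.x, [qT]))
```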
|
|
10:00-11:30, Paper MoAIP-02.10 | Add to My Program |
Mechanical Characterisation of Woven Pneumatic Active Textile |
|
Marshall, Ruby | The University of Edinburgh |
Souppez, Jean-Baptiste | Aston University |
Khan, Mariya | Aston University |
Viola, Ignazio Maria | University of Edinburgh |
Nabae, Hiroyuki | Tokyo Institute of Technology |
Suzumori, Koichi | Tokyo Institute of Technology |
Stokes, Adam Andrew | University of Edinburgh |
Giorgio-Serchi, Francesco | University of Edinburgh |
Keywords: Wearable Robotics, Soft Robot Materials and Design, Hydraulic/Pneumatic Actuators
Abstract: Active textiles have shown promising applications in soft robotics owing to their tunable stiffness and design flexibility. Given the breadth of the design space for planar and spatial arrangements of these woven structures, a rigorous and generalizable characterisation of these systems is not yet available. In order to characterize the response of a stereotypical woven pattern to actuation, we undertake a parametric study of plain weave active fabrics and characterise their mechanical properties in accordance with the relevant ISO standards for varying muscle densities and both monotonically increasing/decreasing pressures. Tensile and flexural tests were undertaken on five plain weave samples made of a nylon 6 (polyamide) warp and EM20 McKibben S-muscle weft, for input pressures ranging from 0.00 MPa to 0.60 MPa, at three muscle densities, namely 100 m^-1, 74.26 m^-1 and 47.62 m^-1. Contrary to intuition, we find that a lower muscle density has a more prominent impact on the thickness, but a significantly lesser one on length, highlighting a critical dependency on the relative orientation among the loading, the passive textile and the muscle filaments. Hysteretic behaviour as large as 10% of the longitudinal contraction is observed on individual filaments and woven textiles, and its onset is identified in the shear between the rubber tube and the outer sleeve of the artificial muscle. Hysteresis is shown to be muscle density-dependent and responsible for a strongly asymmetrical response upon different pressure inputs. These findings provide new insights into the mechanical properties of active textiles with tunable stiffness, and may contribute to future developments in wearable technologies and biomedical devices.
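One common way to quantify the hysteresis reported above is to compare loading and unloading contraction curves at matched pressures; the synthetic sketch below is illustrative only, not the paper's test protocol:

```python
import numpy as np

pressure = np.linspace(0.0, 0.60, 61)                     # [MPa]
load = 0.20 * pressure / 0.60                             # contraction, loading
unload = np.clip(load + 0.02 * np.sin(np.pi * pressure / 0.60), 0, None)

# Hysteresis as the largest loading/unloading gap, relative to full stroke
gap = np.abs(unload - load)
hysteresis_pct = 100.0 * gap.max() / load.max()
print(f"hysteresis: {hysteresis_pct:.1f}% of longitudinal contraction")
```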
|
|
10:00-11:30, Paper MoAIP-02.11 | Add to My Program |
Adaptive Symmetry Reference Trajectory Generation in Shared Autonomy for Active Knee Orthosis |
|
Liu, Rongkai | University of Science and Technology of China(USTC) |
Ma, Tingting | Chinese Academy of Sciences |
Yao, Ningguang | University of Science and Technology of China |
Li, Hao | Chinese Academy of Sciences |
Zhao, Xinyan | University of Science and Technology of China |
Wang, Yu | University of Science and Technology of China |
Pan, Hongqing | Hefei Institutes of Physical Science |
Song, Quanjun | Chinese Academy of Science |
Keywords: Human-Centered Robotics, Rehabilitation Robotics, Human-Robot Collaboration
Abstract: Gait symmetry training plays an essential role in the rehabilitation of hemiplegic patients, and robotics-based gait training has been widely accepted by patients and clinicians. Generating a reference trajectory for the affected side from the motion data of the unaffected side is an important way to achieve this. However, generating a gait reference trajectory online requires the algorithm to provide the correct gait phase delay while reducing the impact of measurement noise from sensors and input uncertainty from users. Based on an active knee orthosis (AKO) prototype, this work presents an adaptive symmetric gait trajectory generation framework for the gait rehabilitation of hemiplegic patients. Using adaptive nonlinear frequency oscillators (ANFO) and movement primitives, we implement online gait pattern encoding and adaptive phase delay according to the real-time user input. A shared autonomy (SA) module with online input validation and arbitration has been designed to prevent undesired movements from being transmitted to the actuator on the affected side. The experimental results demonstrate the feasibility of the framework. Overall, this work suggests that the proposed method has the potential to support gait symmetry rehabilitation in an unstructured environment and provide a kinematic reference for torque-assist AKO.
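A minimal sketch of an adaptive frequency oscillator, the basic mechanism behind ANFO-style online gait encoding, locking onto a hypothetical 1.2 Hz input; the gain and signals are illustrative, and the paper's framework adds movement primitives and shared-autonomy arbitration on top:

```python
import numpy as np

dt, nu = 0.001, 5.0
t = np.arange(0.0, 20.0, dt)
F = np.sin(2 * np.pi * 1.2 * t)      # unaffected-side periodic signal

phi, omega = 0.0, 2 * np.pi * 0.8    # initial phase and frequency guess
for k in range(len(t)):
    e = F[k] * np.sin(phi)           # phase-error signal
    phi += (omega - nu * e) * dt     # phase synchronizes to the input
    omega += (-nu * e) * dt          # frequency adapts toward 1.2 Hz

print(f"learned frequency: {omega / (2 * np.pi):.2f} Hz")
```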
|
|
10:00-11:30, Paper MoAIP-02.12 | Add to My Program |
Data-Driven Modeling for Gait Phase Recognition in a Wearable Exoskeleton Using Estimated Forces (I) |
|
Park, Kyeong-Won | Republic of Korea Air Force Academy |
Choi, Jungsu | Yeungnam University |
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Robots, AI-Based Methods, Human-Centered Robotics, Robust/Adaptive Control of Robotic Systems
Abstract: Accurate identification of gait phases is critical in effectively assessing the assistance provided by lower-limb exoskeletons. In this study, we propose a novel gait phase recognition system called ObsNet to analyze the gait of individuals with spinal cord injuries (SCI). To ensure the reliable use of exoskeletons, it is essential to maintain practicality and avoid exposing the system to unnecessary risks of fatigue, inaccuracy, or incompatibility with human-centered devices. Therefore, we propose a new approach to characterize exoskeletal-assisted gait by estimating forces on exoskeletal joints during walking. Although these estimated forces are potentially useful for detecting gait phases, their nonlinearities make it challenging for existing algorithms to generalize accurately. To address this challenge, we introduce a data-driven model that performs feature extraction while capturing order dependencies, and enhance its performance with a threshold-based compensation method that filters out momentary errors. We evaluated the effectiveness of ObsNet through robotic walking experiments with two users with complete paraplegia. Our results indicate that ObsNet outperformed state-of-the-art methods that use joint information and other recurrent networks in identifying the gait phases of individuals with SCI (p < 0.05). We also observed reliable imitation of the ground truth after compensation. Overall, our research highlights the potential of wearable technology to improve the daily lives of individuals with disabilities through accurate and stable state assessment.
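As an illustration of threshold-based compensation of momentary errors, the hypothetical filter below accepts a new phase only after it persists for several consecutive frames; the paper's exact rule may differ:

```python
MIN_FRAMES = 5  # assumed persistence threshold


class PhaseFilter:
    def __init__(self):
        self.stable_phase = None
        self.candidate = None
        self.count = 0

    def update(self, predicted_phase):
        if predicted_phase == self.stable_phase:
            self.candidate, self.count = None, 0
        elif predicted_phase == self.candidate:
            self.count += 1
            if self.count >= MIN_FRAMES:            # persistent -> accept
                self.stable_phase = predicted_phase
                self.candidate, self.count = None, 0
        else:                                       # new candidate phase
            self.candidate, self.count = predicted_phase, 1
        return self.stable_phase
```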
|
|
MoAIP-03 Regular session, Hall E |
Add to My Program |
Collision Avoidance I |
|
|
|
10:00-11:30, Paper MoAIP-03.1 | Add to My Program |
Dynamic Multi-Query Motion Planning with Differential Constraints and Moving Goals |
|
Gentner, Michael | Technical University of Munich and BMW AG |
Zillenbiller, Fabian | Technical University of Munich and BMW AG |
Kraft, André | BMW AG, Germany |
Steinbach, Eckehard | Technical University of Munich |
Keywords: Collision Avoidance, Motion and Path Planning, Industrial Robots
Abstract: Planning robot motions in complex environments is a fundamental research challenge and central to the autonomy, efficiency, and ultimately adoption of robots. While the environment is often assumed to be static, real-world settings, such as assembly lines, contain complex-shaped moving obstacles and changing target states. In such settings, robots must perform safe and efficient motions to achieve their tasks. In repetitive environments and multi-goal settings, reusable roadmaps can substantially reduce the overall query time. Most dynamic roadmap-based planners operate in state-time space, which is computationally demanding. Interval-based methods store availabilities as node attributes and thereby circumvent the dimensionality increase. However, current approaches do not consider higher-order constraints, which can ultimately lead to collisions during execution. Furthermore, current approaches must replan when the goal changes. To address these limitations, we propose a novel roadmap-based planner for systems with third-order differential constraints operating in dynamic environments with moving goals. We construct a roadmap with availabilities as node attributes. During the query phase, we use a Double-Integrator Minimum Time (DIMT) solver to recursively build feasible trajectories and accurately estimate arrival times. An exit node set in combination with a moving goal heuristic is used to efficiently find the fastest path through the roadmap to the moving goal. We evaluate our method with a simulated UAV operating in dynamic 2D environments and show that it also transfers to a 6-DoF manipulator. We show higher success rates than other state-of-the-art methods both in collision avoidance and reaching a moving goal.
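For intuition, the rest-to-rest special case of a Double-Integrator Minimum Time (DIMT) solution has a simple closed form (bang-bang acceleration with an optional cruise segment at the velocity limit); a full DIMT solver additionally handles arbitrary boundary velocities and multi-axis synchronization:

```python
import math


def dimt_rest_to_rest(distance: float, v_max: float, a_max: float) -> float:
    """Minimum time to move `distance` starting and ending at rest."""
    d = abs(distance)
    d_crit = v_max ** 2 / a_max        # distance at which v_max is reached
    if d <= d_crit:                    # triangular profile: accel, decel
        return 2.0 * math.sqrt(d / a_max)
    # trapezoidal profile: accel to v_max, cruise, decel
    return d / v_max + v_max / a_max


print(dimt_rest_to_rest(6.0, v_max=2.0, a_max=1.0))  # 5.0 s (trapezoidal)
```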
|
|
10:00-11:30, Paper MoAIP-03.2 | Add to My Program |
Reactive and Safe Co-Navigation with Haptic Guidance |
|
Coffey, Mela | Boston University |
Zhang, Dawei | Boston University |
Tron, Roberto | Boston University |
Pierson, Alyssa | Boston University |
Keywords: Collision Avoidance, Telerobotics and Teleoperation, Human-Robot Collaboration
Abstract: We propose a co-navigation algorithm that enables a human and a robot to work together to navigate to a common goal. In this system, the human is responsible for making high-level steering decisions, and the robot, in turn, provides haptic feedback for collision avoidance and path suggestions while reacting to changes in the environment. Our algorithm uses optimized Rapidly-exploring Random Trees (RRT*) to generate paths that lead the user to the goal, via attractive force feedback computed using a Control Lyapunov Function (CLF). We simultaneously ensure collision avoidance where necessary using a Control Barrier Function (CBF). We demonstrate our approach using simulations with a virtual pilot, and hardware experiments with a human pilot. Our results show that combining RRT* and CBFs is a promising tool for enabling collaborative human-robot navigation.
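A minimal sketch of combining a CLF-style attractive force with a CBF safety condition via closed-form projection (the single-constraint QP has an explicit solution); the geometry and gains are illustrative assumptions, not the paper's haptic mapping:

```python
import numpy as np


def safe_force(x, waypoint, obs_center, obs_radius, k_attr=1.0, alpha=1.0):
    u = -k_attr * (x - waypoint)            # CLF: attraction to next waypoint
    # CBF h(x) = ||x - obs||^2 - r^2 >= 0; require grad_h . u + alpha*h >= 0
    d = x - obs_center
    h = d @ d - obs_radius ** 2
    grad_h = 2.0 * d
    viol = grad_h @ u + alpha * h
    if viol < 0.0:                          # project onto the safe half-space
        u = u - (viol / (grad_h @ grad_h)) * grad_h
    return u


x = np.array([1.0, 0.5])
u = safe_force(x, waypoint=np.array([3.0, 0.0]),
               obs_center=np.array([2.0, 0.4]), obs_radius=0.5)
```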
|
|
10:00-11:30, Paper MoAIP-03.3 | Add to My Program |
An MCTS-DRL Based Obstacle and Occlusion Avoidance Methodology in Robotic Follow-Ahead Applications |
|
Leisiazar, Sahar | Simon Fraser University |
Park, Edward J. | Simon Fraser University |
Lim, Angelica | Simon Fraser University |
Chen, Mo | Simon Fraser University |
Keywords: Robot Companions, Collision Avoidance, AI-Enabled Robotics
Abstract: We propose a novel methodology for robotic follow-ahead applications that addresses the critical challenge of obstacle and occlusion avoidance. Our approach effectively navigates the robot while avoiding collisions and occlusions caused by surrounding objects. To achieve this, we developed a high-level decision-making algorithm that generates short-term navigational goals for the mobile robot. Monte Carlo Tree Search is integrated with a Deep Reinforcement Learning method to enhance the performance of the decision-making process and generate more reliable navigational goals. Through extensive experimentation and analysis, we demonstrate the effectiveness and superiority of our proposed approach in comparison to existing follow-ahead human-following robotic methods. Our code is available at https://github.com/saharLeisiazar/follow-ahead-ros.
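A compact, generic sketch of MCTS with a learned value function evaluating leaves, the basic mechanism for combining tree search with DRL; the state, actions, transition model, and value estimate below are illustrative stand-ins for the paper's follow-ahead formulation:

```python
import math
import random

ACTIONS = ["forward", "left", "right"]


def step(state, action):        # placeholder transition model
    x, y = state
    return {"forward": (x + 1, y), "left": (x, y + 1), "right": (x, y - 1)}[action]


def value(state):               # stand-in for the DRL value network
    gx, gy = 5, 0               # hypothetical short-term goal ahead of the human
    return -math.hypot(state[0] - gx, state[1] - gy)


class Node:
    def __init__(self, state):
        self.state, self.children, self.n, self.q = state, {}, 0, 0.0


def uct_search(root_state, n_sim=200, c=1.4):
    root = Node(root_state)
    for _ in range(n_sim):
        node, path = root, [root]
        while len(node.children) == len(ACTIONS):          # selection (UCB)
            node = max(node.children.values(),
                       key=lambda ch: ch.q / (ch.n + 1e-9)
                       + c * math.sqrt(math.log(node.n + 1) / (ch.n + 1e-9)))
            path.append(node)
        untried = [a for a in ACTIONS if a not in node.children]
        if untried:                                        # expansion
            a = random.choice(untried)
            child = Node(step(node.state, a))
            node.children[a] = child
            path.append(child)
        v = value(path[-1].state)                          # value-net leaf eval
        for nd in path:                                    # backpropagation
            nd.n += 1
            nd.q += v
    # most visited root action becomes the next navigational goal
    return max(root.children.items(), key=lambda kv: kv[1].n)[0]


print(uct_search((0, 0)))
```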
|
|
10:00-11:30, Paper MoAIP-03.4 | Add to My Program |
Proactive Model Predictive Control with Multi-Modal Human Motion Prediction in Cluttered Dynamic Environments |
|
Heuer, Lukas | Örebro University, Robert Bosch GmbH |
Palmieri, Luigi | Robert Bosch GmbH |
Rudenko, Andrey | Robert Bosch GmbH |
Mannucci, Anna | Robert Bosch GmbH Corporate Research |
Magnusson, Martin | Örebro University |
Arras, Kai Oliver | Bosch Research |
Keywords: Collision Avoidance, Human-Aware Motion Planning, Motion and Path Planning
Abstract: For robots navigating in dynamic environments, exploiting and understanding uncertain human motion prediction is key to generating efficient, safe and legible actions. The robot may perform poorly and cause hindrances if it does not reason over possible, multi-modal future social interactions. With the goal of further enhancing autonomous navigation in cluttered environments, we propose a novel formulation for nonlinear model predictive control that includes multi-modal predictions of human motion. As a result, our approach leads to less conservative, smooth and intuitive human-aware navigation with reduced risk of collisions, and shows a good balance between task efficiency, collision avoidance and human comfort. To show its effectiveness, we compare our approach against the state of the art in crowded simulated environments, and with real-world human motion data from the THOR dataset. This comparison shows that we are able to improve task efficiency, keep a larger distance to humans, and significantly reduce the collision time when navigating in cluttered dynamic environments. Furthermore, the method is shown to work robustly with different state-of-the-art human motion predictors.
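To illustrate how multi-modal predictions can enter an MPC objective, the following hypothetical term weights proximity penalties by mode probabilities; the paper formulates this inside a full nonlinear MPC, so this is only the flavor of the cost:

```python
import numpy as np


def multimodal_proximity_cost(robot_traj, human_modes, probs, d_safe=0.8):
    """robot_traj: (T,2); human_modes: (K,T,2); probs: (K,) mode weights."""
    cost = 0.0
    for p_k, mode in zip(probs, human_modes):
        d = np.linalg.norm(robot_traj - mode, axis=1)   # distance per timestep
        cost += p_k * np.sum(np.maximum(0.0, d_safe - d) ** 2)
    return cost
```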
|
|
10:00-11:30, Paper MoAIP-03.5 | Add to My Program |
A Novel Obstacle-Avoidance Solution with Non-Iterative Neural Controller for Joint-Constrained Redundant Manipulators |
|
Li, Weibing | Sun Yat-Sen University |
Yi, Zilian | Sun Yat-Sen University |
Zou, Yanying | Sun Yat-Sen University |
Wu, Haimei | Sun Yat-Sen University |
Yang, Yang | Sun Yat-Sen University |
Pan, Yongping | Sun Yat-Sen University |
Keywords: Collision Avoidance, Optimization and Optimal Control, Redundant Robots
Abstract: Obstacle avoidance (OA) and joint-limit avoidance (JLA) are essential for redundant manipulators to ensure safe and reliable robotic operations. One solution to OA and JLA is to incorporate the involved constraints into a quadratic program (QP), by solving which OA and JLA can be achieved. A few non-iterative solvers exist, such as zeroing neural networks (ZNNs), which can solve each sampled QP using only one iteration, yet none is suitable for OA and JLA due to the absence of some required derivative information. To tackle these issues, this paper proposes a novel solution with a non-iterative neural controller, termed NCP-ZNN, for joint-constrained redundant manipulators. Unlike iterative methods, the proposed neural controller incorporates derivative information and possesses several positive features, including non-iterative computing and convergence over time. In this paper, the reestablished OA-JLA scheme is first introduced. Then, the design details of the neural controller are presented. After that, comparative simulations based on a PA10 robot and an experiment based on a Franka Emika Panda robot are conducted, demonstrating that the proposed neural controller handles OA and JLA more effectively.
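A textbook sketch of the zeroing neural network (ZNN) principle on a time-varying linear system, showing the role of the derivative information emphasized above: with error e = A(t)x - b(t), imposing e_dot = -gamma*e yields a continuous update that tracks the solution without inner iterations. The NCP-ZNN controller applies this idea to the OA/JLA quadratic program; the system below is only a toy example:

```python
import numpy as np

gamma, dt = 100.0, 1e-3


def A(t):  return np.array([[2 + np.sin(t), 0.0], [0.0, 2 + np.cos(t)]])
def Ad(t): return np.array([[np.cos(t), 0.0], [0.0, -np.sin(t)]])
def b(t):  return np.array([np.sin(2 * t), np.cos(2 * t)])
def bd(t): return np.array([2 * np.cos(2 * t), -2 * np.sin(2 * t)])


x = np.linalg.solve(A(0.0), b(0.0))          # start at the exact solution
for k in range(5000):
    t = k * dt
    e = A(t) @ x - b(t)
    # From A*x_dot + Ad*x - bd = -gamma*e:
    x_dot = np.linalg.solve(A(t), bd(t) - Ad(t) @ x - gamma * e)
    x += x_dot * dt

print(np.linalg.norm(A(5.0) @ x - b(5.0)))   # residual stays near zero
```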
|
|
10:00-11:30, Paper MoAIP-03.6 | Add to My Program |
TTC4MCP: Monocular Collision Prediction Based on Self-Supervised TTC Estimation |
|
Li, Changlin | Shanghai Jiao Tong University |
Qian, Yeqiang | Shanghai Jiao Tong University |
Sun, Cong | Shanghai Jiao Tong University |
Yan, Weihao | Shanghai Jiao Tong University |
Wang, Chunxiang | Shanghai Jiaotong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Collision Avoidance, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Vision-based collision prediction for autonomous driving is a challenging task due to the dynamic movement of vehicles and diverse types of obstacles. Most existing methods rely on object detection algorithms, which only predict predefined collision targets, such as vehicles and pedestrians, and cannot anticipate emergencies caused by unknown obstacles. To address this limitation, we propose a novel approach using pixel-wise time-to-collision (TTC) estimation for monocular collision prediction (TTC4MCP). Our approach predicts TTC and optical flow from monocular images and identifies potential collision areas using feature clustering and motion analysis. To overcome the challenge of training TTC estimation models without ground-truth data in new scenes, we propose a self-supervised TTC training method, enabling collision prediction in a wider range of scenarios. TTC4MCP is evaluated under multiple road conditions and demonstrates promising accuracy and robustness.
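The classical geometry behind pixel-wise TTC offers a useful reference point: for pure forward translation toward a fronto-parallel surface, the optical flow expands radially (u = x/TTC, v = y/TTC), so the flow divergence equals 2/TTC. The sketch below implements that relation; it is not the paper's learned, self-supervised estimator:

```python
import numpy as np


def ttc_from_flow(u: np.ndarray, v: np.ndarray, eps: float = 1e-6):
    """Dense TTC map (in frames) from a flow field sampled on a pixel grid."""
    du_dx = np.gradient(u, axis=1)
    dv_dy = np.gradient(v, axis=0)
    div = du_dx + dv_dy
    return 2.0 / np.maximum(div, eps)   # tiny/negative div -> far or receding


# Synthetic check: radial expansion with TTC = 40 frames
ys, xs = np.mgrid[-50:50, -50:50].astype(float)
ttc = ttc_from_flow(xs / 40.0, ys / 40.0)
print(ttc.mean())   # ~40
```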
|
|
10:00-11:30, Paper MoAIP-03.7 | Add to My Program |
DAMON: Dynamic Amorphous Obstacle Navigation Using Topological Manifold Learning and Variational Autoencoding |
|
Dastider, Apan | University of Central Florida |
Lin, Mingjie | University of Central Florida
Keywords: Collision Avoidance, Deep Learning Methods, Motion and Path Planning
Abstract: DAMON leverages manifold learning and variational autoencoding to achieve obstacle avoidance, allowing for motion planning through adaptive graph traversal in a pre-learned low-dimensional hierarchically-structured manifold graph that captures intricate motion dynamics between a robotic arm and its obstacles. This versatile and reusable approach is applicable to various collaboration scenarios. The primary advantage of DAMON is its ability to embed information in a low-dimensional graph, eliminating the need for repeated computation required by current sampling-based methods. As a result, it offers faster and more efficient motion planning with significantly lower computational overhead and memory footprint. In summary, DAMON is a breakthrough methodology that addresses the challenge of dynamic obstacle avoidance in robotic systems and offers a promising solution for safe and efficient human-robot collaboration. Our approach has been experimentally validated on a 7-DoF robotic manipulator in both simulation and physical settings. DAMON enables the robot to learn and generate skills for avoiding previously-unseen obstacles while achieving predefined objectives. We also optimize DAMON’s design parameters and performance using an analytical framework. Our approach outperforms mainstream methodologies, including RRT, RRT*, Dynamic RRT*, L2RRT, and MpNet, with 40% more trajectory smoothness and over 65% improved latency performance, on average.
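As a sketch of the graph-reuse idea, the query phase reduces to shortest-path traversal over a pre-learned graph rather than repeated sampling-based planning; the toy graph below stands in for the learned low-dimensional manifold graph, whose nodes would be latent robot-obstacle configurations from the VAE:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("start", "a", 1.0), ("a", "b", 0.5), ("b", "goal", 1.2),
    ("start", "c", 2.0), ("c", "goal", 0.4),
])

# Query phase: adaptive traversal is just a weighted shortest path here;
# obstacle motion would reweight or prune edges rather than rebuild the graph.
path = nx.shortest_path(G, "start", "goal", weight="weight")
print(path)   # ['start', 'c', 'goal']
```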
|
|
10:00-11:30, Paper MoAIP-03.8 | Add to My Program |
Gatekeeper: Online Safety Verification and Control for Nonlinear Systems in Dynamic Environments |
|
Agrawal, Devansh | University of Michigan |
Chen, Ruichang | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: This paper presents the gatekeeper algorithm, a real-time and computationally lightweight method to ensure that nonlinear systems can operate safely in dynamic environments despite limited perception. Gatekeeper integrates with existing path planners and feedback controllers by introducing an additional verification step that ensures that proposed trajectories can be executed safely, despite nonlinear dynamics subject to bounded disturbances, input constraints and partial knowledge of the environment. Our key contribution is that (A) we propose an algorithm to recursively construct committed trajectories, and (B) we prove that tracking the committed trajectory ensures the system is safe for all time into the future. The method is demonstrated on a complicated firefighting mission in a dynamic environment and compared against state-of-the-art techniques for similar problems.
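A schematic of the gatekeeper recursion, with placeholder checks standing in for the paper's formal verification conditions: a candidate trajectory replaces the committed one only if it can be verified safe against the currently perceived environment, otherwise the previously committed (and previously verified) trajectory is kept:

```python
def gatekeeper_step(committed, candidate, is_state_safe, ends_in_safe_set):
    """Return the trajectory to track at this replanning step.

    `committed` and `candidate` are state sequences; the two predicates are
    placeholders for the paper's formal conditions (e.g., robust constraint
    satisfaction and terminating in a verified invariant set).
    """
    verified = all(is_state_safe(x) for x in candidate) and \
        ends_in_safe_set(candidate[-1])
    return candidate if verified else committed
```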
|
|
10:00-11:30, Paper MoAIP-03.9 | Add to My Program |
Combinatorial Disjunctive Constraints for Obstacle Avoidance in Path Planning |
|
Garcia, Raul | Rice University |
Hicks, Illya V. | Rice University |
Huchette, Joey | Google Research |
Keywords: Collision Avoidance, Motion and Path Planning, Optimization and Optimal Control
Abstract: We present a new approach for modeling avoidance constraints in 2D environments, in which waypoints are assigned to obstacle-free polyhedral regions. Constraints of this form are often formulated as mixed-integer programming (MIP) problems employing big-M techniques; however, these are generally not the strongest formulations possible with respect to the MIP's convex relaxation (so-called ideal formulations), potentially resulting in a larger computational burden. We instead model obstacle avoidance as combinatorial disjunctive constraints and leverage the independent branching scheme to construct small, ideal formulations. As our approach requires a biclique cover for an associated graph, we exploit the structure of this class of graphs to develop a fast subroutine for obtaining biclique covers in polynomial time. We also contribute an open-source Julia library named ClutteredEnvPathOpt to facilitate computational experiments of MIP formulations for obstacle avoidance. Experiments show that our formulation is more compact and remains competitive on a number of instances compared with standard big-M techniques, for which solvers possess highly optimized procedures.
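For reference, the standard big-M baseline that the paper improves upon can be written in a few lines: a waypoint must lie outside an axis-aligned box, encoded as a disjunction over the four outer half-planes with binary selectors. The bounds, objective, and single-waypoint setting are illustrative (Python/PuLP with the bundled CBC solver, rather than the paper's Julia library):

```python
import pulp

M = 100.0
xmin, xmax, ymin, ymax = 2.0, 4.0, 2.0, 4.0       # box obstacle

prob = pulp.LpProblem("avoid_box", pulp.LpMaximize)
x = pulp.LpVariable("x", 0.0, 3.5)
y = pulp.LpVariable("y", 0.0, 3.5)
z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(4)]

prob += x + y                                     # pulls the point toward the box
prob += x <= xmin + M * (1 - z[0])                # left of the box if z0 = 1
prob += x >= xmax - M * (1 - z[1])                # right of the box if z1 = 1
prob += y <= ymin + M * (1 - z[2])                # below the box if z2 = 1
prob += y >= ymax - M * (1 - z[3])                # above the box if z3 = 1
prob += pulp.lpSum(z) >= 1                        # disjunction: one side holds

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(x.value(), y.value())                       # lands on the box boundary
```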
|
|
10:00-11:30, Paper MoAIP-03.10 | Add to My Program |
Reachability-Aware Collision Avoidance for Tractor-Trailer System with Non-Linear MPC and Control Barrier Function |
|
Tang, Yucheng | University of Applied Sciences Karlsruhe |
Mamaev, Ilshat | Karlsruhe Institute of Technology |
Qin, Jing | Karlsruhe University of Applied Sciences |
Wurll, Christian | Karlsruhe University of Applied Sciences |
Hein, Björn | Karlsruhe University of Applied Sciences |
Keywords: Collision Avoidance, Optimization and Optimal Control, Nonholonomic Motion Planning
Abstract: This paper proposes reachability-aware model predictive control with a discrete control barrier function for backward obstacle avoidance with a tractor-trailer system. The framework incorporates the state-variant reachable set, obtained through sampling-based reachability analysis and symbolic regression, into the objective function of the model predictive control. By optimizing the intersection of the reachable set and the non-safe region iteratively generated by the control barrier function, the system demonstrates better safety performance with a constant decay rate while enhancing the feasibility of the optimization problem. The proposed algorithm improves real-time performance due to a shorter horizon and outperforms state-of-the-art algorithms both in a simulation environment and on a real robot.
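The discrete CBF condition with a constant decay rate can be stated compactly: along the predicted horizon, the safety value h must satisfy h(x_{k+1}) >= (1 - gamma) * h(x_k). Shown below as a standalone feasibility check with an assumed rate, whereas in the paper it enters the NMPC as a constraint alongside the reachable-set term:

```python
GAMMA = 0.2  # assumed decay rate, 0 < GAMMA <= 1


def satisfies_dcbf(h_values):
    """h_values: h(x_k) evaluated along a predicted state trajectory."""
    return all(h_next >= (1.0 - GAMMA) * h_k
               for h_k, h_next in zip(h_values, h_values[1:]))
```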
|
|
10:00-11:30, Paper MoAIP-03.11 | Add to My Program |
Continuous Implicit SDF Based Any-Shape Robot Trajectory Optimization |
|
Zhang, Tingrui | Zhejiang University |
Wang, Jingping | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Alan | Fan'gang |
Gao, Fei | Zhejiang University |
Keywords: Collision Avoidance, Whole-Body Motion Planning and Control, Motion and Path Planning
Abstract: Optimization-based trajectory generation methods are widely used in whole-body planning for robots. However, existing work either oversimplifies the robot’s geometry and environment representation, resulting in a conservative trajectory, or suffers from a huge overhead in maintaining additional information such as a Signed Distance Field (SDF). To bridge the gap, we consider the robot as an implicit function, with its surface boundary represented by the zero-level set of its SDF. We further employ another implicit function to lazily compute the signed distance to the swept volume generated by the robot and its trajectory. The computation is efficient by exploiting continuity in space-time, and the implicit function guarantees precise and continuous collision evaluation even for nonconvex robots with complex surfaces. We also propose a trajectory optimization pipeline applicable to the implicit SDF. Simulation and real-world experiments validate the high performance of our approach for arbitrarily shaped robot trajectory optimization. The code will be released at https://github.com/ZJU-FAST-Lab/Implicit-SDF-Planner.
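A minimal sketch of the swept-volume idea: the signed distance from a query point to the swept volume is the minimum over time of the robot's SDF evaluated in the body frame. A moving unit disc stands in for the robot here; the paper's method exploits space-time continuity to perform this minimization lazily and efficiently for arbitrary shapes:

```python
import numpy as np
from scipy.optimize import minimize_scalar


def robot_sdf(p_body: np.ndarray) -> float:
    return np.linalg.norm(p_body) - 1.0             # unit disc, placeholder


def pose(t: float) -> np.ndarray:
    return np.array([2.0 * t, 0.5 * np.sin(t)])     # translation along a path


def swept_sdf(p: np.ndarray, t0=0.0, t1=5.0) -> float:
    f = lambda t: robot_sdf(p - pose(t))
    # coarse scan to bracket the best time, then local refinement
    ts = np.linspace(t0, t1, 64)
    tc = ts[int(np.argmin([f(t) for t in ts]))]
    lo, hi = max(t0, tc - 0.2), min(t1, tc + 0.2)
    res = minimize_scalar(f, bounds=(lo, hi), method="bounded")
    return res.fun


print(swept_sdf(np.array([4.0, 2.0])))  # distance from point to swept volume
```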
|
|
10:00-11:30, Paper MoAIP-03.12 | Add to My Program |
Robo-Centric ESDF: A Fast and Accurate Whole-Body Collision Evaluation Tool for Any-Shape Robotic Planning |
| |