|
TuAT1
Room T1
Computer Vision for Automation |
Regular session |
Chair: Gammell, Jonathan | University of Oxford |
|
10:00-10:15, Paper TuAT1.1
>Proactive Estimation of Occlusions and Scene Coverage for Planning Next Best Views in an Unstructured Representation |
> Video Attachment
|
|
Border, Rowan | University of Oxford |
Gammell, Jonathan | University of Oxford |
Keywords: Computer Vision for Automation, Visual Servoing, Computer Vision for Other Robotic Applications
Abstract: The process of planning views to observe a scene is known as the Next Best View (NBV) problem. Approaches often aim to obtain high-quality scene observations while reducing the number of views, travel distance and computational cost. Considering occlusions and scene coverage can significantly reduce the number of views and travel distance required to obtain an observation. Structured representations (e.g., a voxel grid or surface mesh) typically use raycasting to evaluate the visibility of represented structures but this is often computationally expensive. Unstructured representations (e.g., point density) avoid the computational overhead of maintaining and raycasting a structure imposed on the scene but as a result do not proactively predict the success of future measurements. This paper presents proactive solutions for handling occlusions and considering scene coverage with an unstructured representation. Their performance is evaluated by extending the density-based Surface Edge Explorer (SEE). Experiments show that these techniques allow an unstructured representation to observe scenes with fewer views and shorter distances while retaining high observation quality and low computational cost.
|
|
10:15-10:30, Paper TuAT1.2
>Indirect Object-To-Robot Pose Estimation from an External Monocular RGB Camera |
> Video Attachment
|
|
Tremblay, Jonathan | Nvidia |
Tyree, Stephen | NVIDIA |
Mosier, Terry | NVIDIA |
Birchfield, Stan | NVIDIA Corporation |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: We present a robotic grasping system that uses a single external monocular RGB camera as input. The object-to-robot pose is computed indirectly by combining the output of two neural networks: one that estimates the object-to-camera pose, and another that estimates the robot-to-camera pose. Both networks are trained entirely on synthetic data, relying on domain randomization to bridge the sim-to-real gap. Because the latter network performs online camera calibration, the camera can be moved freely during execution without affecting the quality of the grasp. Experimental results analyze the effect of camera placement, image resolution, and pose refinement in the context of grasping several household objects. We also present results on a new set of 28 textured household toy grocery objects, which have been selected to be accessible to other researchers. To aid reproducibility of the research, we offer 3D scanned textured models, along with pre-trained weights for pose estimation.
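As a hedged illustration of the indirect pose composition this abstract describes (our sketch; the transform-frame conventions and function names are assumptions, not taken from the paper), the two network outputs combine as follows:

```python
import numpy as np

def object_to_robot_pose(T_cam_obj: np.ndarray, T_cam_robot: np.ndarray) -> np.ndarray:
    """Compose the object-to-robot pose from two camera-frame estimates.

    T_cam_obj:   4x4 pose of the object in the camera frame
                 (output of the object-to-camera network).
    T_cam_robot: 4x4 pose of the robot base in the camera frame
                 (output of the robot-to-camera network).
    Returns the 4x4 pose of the object in the robot base frame.
    """
    # T_robot_obj = inv(T_cam_robot) @ T_cam_obj
    return np.linalg.inv(T_cam_robot) @ T_cam_obj
```

Because T_cam_robot is re-estimated online, moving the camera only changes both inputs consistently, which is why the grasp quality is unaffected.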
|
|
10:30-10:45, Paper TuAT1.3
>Peg-In-Hole Using 3D Workpiece Reconstruction and CNN-Based Hole Detection |
> Video Attachment
|
|
Nigro, Michelangelo | University of Basilicata |
Sileo, Monica | University of Basilicata |
Pierri, Francesco | Università Della Basilicata |
Genovese, Katia | University of Basilicata |
Bloisi, Domenico | University of Basilicata |
Caccavale, Fabrizio | Università Degli Studi Della Basilicata |
Keywords: Compliant Assembly, Computer Vision for Manufacturing, Compliance and Impedance Control
Abstract: This paper presents a method to cope with autonomous assembly tasks in the presence of uncertainties. To this aim, a Peg-in-Hole operation is considered, where the target workpiece position is unknown and the peg-hole clearance is small. Deep learning based hole detection and 3D surface reconstruction techniques are combined for accurate workpiece localization. In detail, the hole is detected by using a convolutional neural network (CNN), while the target workpiece surface is reconstructed via 3D-Digital Image Correlation (3D-DIC). Peg insertion is performed via admittance control that confers the suitable compliance to the peg. Experiments on a collaborative manipulator confirm that the proposed approach can be promising for achieving a better degree of autonomy for a class of robotic tasks in partially structured environments.
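The admittance control mentioned above can be illustrated with a generic one-axis mass-damper-stiffness law; this is a textbook sketch under assumed gains, not the authors' controller:

```python
def admittance_step(x, v, f_ext, x_ref, M=1.0, D=50.0, K=200.0, dt=0.001):
    """One Euler step of M*a + D*v + K*(x - x_ref) = f_ext.

    The external force f_ext (e.g., peg-hole contact force) drives a
    compliant displacement of the commanded peg position x around the
    reference x_ref, giving the peg the suitable compliance.
    """
    a = (f_ext - D * v - K * (x - x_ref)) / M
    v = v + a * dt
    x = x + v * dt
    return x, v  # new commanded position and velocity
```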
|
|
10:45-11:00, Paper TuAT1.4
>Automated Folding of a Deformable Thin Object through Robot Manipulators |
> Video Attachment
|
|
Cui, Zhenxi | The Hong Kong Polytechnic University |
Huang, Kaicheng | The Hong Kong Polytechnic University |
Lu, Bo | The Chinese University of Hong Kong |
Chu, Henry | The Hong Kong Polytechnic University |
Keywords: Computer Vision for Automation, Visual Servoing, Service Robots
Abstract: This paper presents a model-free approach to automate the folding of a deformable object with robot manipulators, where the object's surface is labeled with markers to facilitate vision-based control and alignment. Since performing the task involves nonconvex or nonlinear terms, linearization is first performed to approximate the problem. By using the Levenberg-Marquardt algorithm, the task of folding a deformable thin object can be reformulated as a convex optimization problem. The mapping between the motions of markers on the image and the joint inputs of the robot manipulator is evaluated through a Jacobian matrix. To account for the uncertainty in this matrix due to the deformable object, a two-stage evaluation scheme, consisting of an approximate-rigidity rule and a Broyden-update rule, is employed. The performance and robustness of the proposed approach were examined through simulation using the Bullet simulator; the video of the simulation can be retrieved from the attachment. The results confirm that the thin object can be precisely folded based on different markers labeled on the surface.
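The Broyden-update rule mentioned in the abstract is, in its generic form, a rank-one correction for refining an online Jacobian estimate from observed motion; a minimal sketch (our illustration with hypothetical variable names, not the authors' two-stage scheme):

```python
import numpy as np

def broyden_update(J, dq, dy):
    """Rank-one Broyden update of an estimated Jacobian J.

    J:  current estimate mapping joint increments dq to marker motions dy.
    dq: joint-space increment from the last control step.
    dy: observed marker displacement on the image for that increment.
    """
    dq = dq.reshape(-1, 1)
    dy = dy.reshape(-1, 1)
    denom = float(dq.T @ dq)
    if denom < 1e-12:  # skip degenerate (near-zero) motions
        return J
    return J + ((dy - J @ dq) @ dq.T) / denom
```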
|
|
11:00-11:15, Paper TuAT1.5
>Uncertainty Aware Texture Classification and Mapping Using Soft Tactile Sensors |
|
Amini, Alexander | Massachusetts Institute of Technology |
Lipton, Jeffrey | University of Washington |
Rus, Daniela | MIT |
Keywords: Computer Vision for Automation
Abstract: Spatial mapping of surface roughness is a critical enabling technology for automating adaptive sanding operations. We leverage GelSight sensors to convert the problem of surface roughness measurement into a vision classification problem. By combining GelSight sensors with Optitrack positioning systems we attempt to develop an accurate spatial mapping of surface roughness that can compare to human touch, the current state of the art for large scale manufacturing. To perform the classification, we propose the use of Bayesian neural networks in conjunction with uncertainty-aware prediction. We compare the sensor and network with a human baseline for both absolute and relative texture classification. To establish a baseline, we collected performance data from humans on their ability to classify materials into 60, 120, and 180 grit sanded pine boards. Our results showed that the probabilistic network performs at the level of human touch for absolute and relative classifications. Using the Bayesian approach enables establishing a confidence bound on our prediction. We were able to integrate the sensor with Optitrack to provide a spatial map of sanding grit applied to pine boards. From this result, we can conclude that GelSight with Bayesian neural networks can learn accurate representations for sanding, and could be a significant enabling technology for closed loop robotic sanding operations.
|
|
11:15-11:30, Paper TuAT1.6
>Estimating Motion Codes from Demonstration Videos |
|
Alibayev, Maxat | University of South Florida |
Paulius, David Andres | University of South Florida |
Sun, Yu | University of South Florida |
Keywords: Computer Vision for Automation, Imitation Learning, Visual Learning
Abstract: A motion taxonomy can encode manipulations as a binary-encoded representation, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory type. The key advantage of using motion codes for embedding is that motions can be more appropriately defined with robotic-relevant features, and their distances can be more reasonably measured using these motion features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that knowledge from these videos can be properly represented and used for robots. Our evaluations show that motion codes can be extracted from demonstrations of action in the EPIC-KITCHENS dataset.
|
|
11:30-11:45, Paper TuAT1.7
>HDR Reconstruction Based on the Polarization Camera |
|
Wu, Xuesong | National University of Defense Technology
Zhang, Hong | University of Alberta |
Hu, Xiaoping | National University of Defense Technology |
Shakeri, Moein | University of Alberta |
Chen, Fan | National University of Defense Technology |
Ting, Juiwen | University of Alberta |
Keywords: Computer Vision for Other Robotic Applications, Computational Geometry, Computer Vision for Automation
Abstract: The recent development of on-chip micropolarizer technology has made it possible to acquire, with the same ease of operation as a conventional camera, spatially aligned and temporally synchronized polarization images in four orientations simultaneously. This development has created new opportunities for interesting applications, including in robotics. In this paper, we investigate the use of this sensor technology in high-dynamic-range (HDR) imaging. Specifically, observing that natural light can be attenuated differently by varying the direction of the polarization filter, we treat the multiple images captured by the polarization camera as a set captured at different exposure times, useful for the reconstruction of an HDR image. In our approach, we first study the radiometric model of the polarization camera and relate the polarizer direction and the degree and angle of polarization of light to the exposure time of a pixel in the polarization images. Subsequently, by applying the standard radiometric calibration procedure of a camera, we recover the camera response function. With multiple polarization images at known pixel-specific exposure times, we can then estimate the irradiance maps from the images and generate an HDR image. Two datasets are created to evaluate our approach, and experimental results show that the dynamic range achieved by our approach can be increased by an amount dependent on light polarization. We also use two robotics experiments, on feature matching and visual odometry, to demonstrate the potential benefit of this increased dynamic range.
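A simplified model of the per-pixel attenuation underlying this equivalent-exposure idea (Malus-style partial polarization; a sketch only, omitting the paper's full radiometric calibration):

```python
import numpy as np

def equivalent_exposure_ratio(phi, rho, theta):
    """Fraction of the total irradiance passed by a linear polarizer.

    phi:   polarizer orientation in radians (the camera captures
           0, 45, 90 and 135 degrees simultaneously).
    rho:   degree of linear polarization in [0, 1].
    theta: angle of polarization in radians.
    A pixel behind the polarizer behaves as if exposed for this fraction
    of the nominal exposure time, which is what enables HDR-style merging.
    """
    return 0.5 * (1.0 + rho * np.cos(2.0 * (phi - theta)))
```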
|
|
TuAT2
Room T2
Manufacturing and Logistics |
Regular session |
Chair: Arpenti, Pierluigi | CREATE Consortium |
Co-Chair: Rocco, Paolo | Politecnico Di Milano |
|
10:00-10:15, Paper TuAT2.1
>Zero-Tuning Grinding Process Methodology of Cyber-Physical Robot System |
|
Yang, Hsuan-Yu | National Taiwan University |
Shih, Chih-Hsuan | National Taiwan University |
Lo, Yuan Chieh | Industrial Technology Research Institute |
Lian, Feng-Li | National Taiwan University |
Keywords: Industrial Robots, Intelligent and Flexible Manufacturing, AI-Based Methods
Abstract: Industrial robots play an important role in labor-intensive and high-risk jobs; for example, they are commonly used in grinding. However, automated robotic grinding is a complex process because it still relies on skilled engineers to adaptively adjust several key parameters, and achieving good grinding quality can take considerable time and effort. Hence, this paper proposes a new cyber-physical robot system framework with a zero-tuning methodology that automatically optimizes the process parameters of robotic grinding according to the desired quality. To overcome the gap between the real world and simulation that leads to uncertainty, a proper system calibration helps to register real-environment positions precisely, and a cloud database is constructed to simultaneously record the relevant data during the grinding process. The proposed zero-tuning methodology combines a neural network (NN) model with a genetic algorithm (GA) and is designed to generate the combination of corresponding parameters that best meets the desired quality. Experimental results show that the average error of the output result is 8.93%. Compared with a CNC machine, our solution shows greater potential for application in the plumbing industry.
|
|
10:15-10:30, Paper TuAT2.2
>An External Stabilization Unit for High-Precision Applications of Robot Manipulators |
> Video Attachment
|
|
Berninger, Tobias Franz Christian | TU Munich |
Slimak, Tomas | TU Munich |
Weber, Tobias | Boeing Research & Technology |
Rixen, Daniel | Technische Universität München |
Keywords: Automation at Micro-Nano Scales, Actuation and Joint Mechanisms, Industrial Robots
Abstract: Because of their large workspace, robot manipulators have the potential to be used for high-precision non-contact manufacturing processes, such as laser cutting or welding, on large complex workpieces. However, most industrial manipulators are not able to meet the necessary accuracy requirements, mainly because their flexible structures are subject to point-to-point positioning errors as well as vibration errors on a smaller scale. The vibration issues are especially hard to deal with. Many published solutions propose modifying the robot's own control system to deal with these problems. However, most modern control techniques require high-fidelity models of the underlying system dynamics, which are quite difficult to obtain for robot manipulators. In this work, we propose an external stabilization unit with an additional set of actuators/sensors to stabilize the process tool, similar to Optical Image Stabilization systems. We show that, because of collocated control, a model of the robot's own dynamic behavior is not needed to achieve high tracking accuracy. We also provide test results of a prototype stabilizing a dummy tool in two degrees of freedom on a UR10 robot, which reduced its tracking error by two orders of magnitude, to below 20 micrometers.
|
|
10:30-10:45, Paper TuAT2.3
>CUHK-AHU Dataset: Promoting Practical Self-Driving Applications in the Complex Airport Logistics, Hill and Urban Environments |
|
Chen, Wen | The Chinese University of Hong Kong |
Liu, Zhe | University of Cambridge |
Zhao, Hongchao | The Chinese University of Hong Kong |
Zhou, Shunbo | The Chinese University of Hong Kong |
Li, Haoang | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Industrial Robots, SLAM, Mapping
Abstract: This paper presents a novel dataset targeting three types of challenging environments for autonomous driving: the industrial logistics environment, the undulating hill environment and the mixed complex urban environment. To the best of the authors' knowledge, no similar dataset has been published among existing public datasets, especially for the logistics environment collected in the functioning Hong Kong Air Cargo Terminal (HACT). Structural changes appear suddenly in the airport logistics environment due to the frequent movement of goods in and out. In the structureless and noisy hill environment, non-flat-plane movement is common. The mixed complex urban environment includes highly dynamic residential blocks, sloped roads and highways in a single collection. The presented dataset includes LiDAR, image, IMU and GPS data collected by repeatedly driving along several paths to capture the structural changes, the illumination changes and the different degrees of undulation of the roads. Baseline trajectories estimated by simultaneous localization and mapping (SLAM) are provided.
|
|
10:45-11:00, Paper TuAT2.4
>A Flexible Robotic Depalletizing System for Supermarket Logistics |
> Video Attachment
|
|
Caccavale, Riccardo | Università Di Napoli "Federico II" |
Arpenti, Pierluigi | CREATE Consortium |
Paduano, Gianmarco | Scuola Politecnica E Delle Scienze Di Base Federico II Di Napoli |
Fontanelli, Giuseppe Andrea | University of Naples Federico II |
Lippiello, Vincenzo | University of Naples FEDERICO II |
Villani, Luigi | Univ. Napoli Federico II |
Siciliano, Bruno | Univ. Napoli Federico II |
Keywords: Logistics, Intelligent and Flexible Manufacturing, AI-Based Methods
Abstract: Depalletizing robotic systems are commonly deployed to automate and speed up parts of logistic processes. Despite this, the necessity of adapting preexisting logistic processes to the automatic systems often impairs the application of such robotic solutions to small businesses such as supermarkets. In this work we propose a robotic depalletizing system designed to be easily integrated into supermarket logistic processes. The system has to schedule, monitor and adapt the depalletizing process considering both online perceptual information given by non-invasive sensors and constraints provided by the high-level management system or by a supervising user. We describe the overall system, discussing two case studies in the context of a supermarket logistic process, and show how the proposed system can manage multiple depalletizing strategies and multiple logistic requests.
|
|
11:00-11:15, Paper TuAT2.5
>A Reconfigurable Gripper for Robotic Autonomous Depalletizing in Supermarket Logistics |
> Video Attachment
|
|
Fontanelli, Giuseppe Andrea | University of Naples Federico II |
Paduano, Gianmarco | Scuola Politecnica E Delle Scienze Di Base Federico II Di Napoli |
Caccavale, Riccardo | Università Di Napoli "Federico II" |
Arpenti, Pierluigi | CREATE Consortium |
Lippiello, Vincenzo | University of Naples FEDERICO II |
Villani, Luigi | Univ. Napoli Federico II |
Siciliano, Bruno | Univ. Napoli Federico II |
Keywords: Logistics, Grippers and Other End-Effectors, Mechanism Design
Abstract: Automatic depalletizing is becoming a widely applied practice in some warehouses to automate and speed up logistics. On the other hand, the necessity of adapting preexisting logistic lines to a custom automatic system can limit the application of robotic solutions in smaller facilities such as supermarkets. In this work, we tackle this issue by proposing a flexible and adaptive gripper for robotic depalletizing. The gripper has been designed to be mounted on the end-tip of an industrial robotic arm. A novel patent-pending mechanism allows grasping boxes and products from both the top and the side, enabling the depalletizing of boxes with complex shapes as well. Moreover, the gripper is reconfigurable, with five actuated degrees of freedom that are automatically controlled using the embedded sensors to adapt grasping to different shapes and weights.
|
|
11:15-11:30, Paper TuAT2.6
>Combining Speed and Separation Monitoring with Power and Force Limiting for Safe Collaborative Robotics Applications |
> Video Attachment
|
|
Lucci, Niccolò | Politecnico Di Milano |
Lacevic, Bakir | University of Sarajevo |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Robot Safety, Physical Human-Robot Interaction, Industrial Robots
Abstract: Enabling humans and robots to safely work close to each other deserves careful consideration. With the publication of ISO directives on this matter, two different strategies, namely the Speed and Separation Monitoring and the Power and Force Limiting, have been proposed. This paper proposes a method to efficiently combine the two aforementioned safety strategies for collaborative robotics operations. By exploiting the combination of the two, it is then possible to achieve higher levels of productivity, still preserving safety of the human operators. In a nutshell, the state of motion of each point of the robot is monitored so that at every time instant the robot is able to modulate its speed to eventually come into contact with a body region of the human, consistently with the corresponding biomechanical limit. Validation experiments have been conducted to quantify the benefits of this newly developed strategy with respect to the state-of-the-art.
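A toy version of how an SSM-style separation bound and a PFL-style contact-speed limit can be combined into a single speed bound (our simplification for illustration; the paper monitors the state of motion of every point of the robot and uses the ISO body-region limits):

```python
import math

def max_safe_speed(separation, v_contact, decel):
    """Upper bound on robot speed toward the human.

    separation: current human-robot distance along the motion direction [m].
    v_contact:  biomechanical speed limit for the exposed body region [m/s]
                (Power and Force Limiting side).
    decel:      guaranteed robot deceleration [m/s^2]
                (Speed and Separation Monitoring side).
    Derived from v^2 - v_contact^2 <= 2 * decel * separation, i.e. the robot
    may move as fast as still allows it to brake down to the permitted
    contact speed within the remaining separation.
    """
    return math.sqrt(v_contact**2 + 2.0 * decel * max(separation, 0.0))
```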
|
|
TuAT3
Room T3
Scheduling |
Regular session |
Chair: Barton, Kira | University of Michigan at Ann Arbor |
Co-Chair: Chakraborty, Nilanjan | Stony Brook University |
|
10:00-10:15, Paper TuAT3.1
>Distributed Near-Optimal Multi-Robots Coordination in Heterogeneous Task Allocation |
|
Li, Qinyuan | Swinburne University of Technology |
Li, Minyi | RMIT |
Vo, Bao Quoc | Swinburne University of Technology |
Kowalczyk, Ryszard | Swinburne University of Technology |
Keywords: Mechanism Design, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: This paper explores the heterogeneous task allocation problem in multi-robot systems. A game-theoretic formulation of the problem is proposed to align the goal of individual robots with the system objective. The concept of Nash equilibrium is applied to define a desired solution for the task allocation problem in which each robot can allocate itself to an appropriate task group. We also introduce a market-based distributed mechanism, called DisNE, that allows the robots to exchange messages with tasks and move between task groups, eventually reaching an equilibrium solution. We carry out comprehensive empirical studies to demonstrate that DisNE achieves near-optimal system utility in significantly shorter computation times when compared with state-of-the-art mechanisms.
|
|
10:15-10:30, Paper TuAT3.2
>Heterogeneous Vehicle Routing and Teaming with Gaussian Distributed Energy Uncertainty |
|
Fu, Bo | University of Michigan
Smith, William | US Army TARDEC |
Rizzo, Denise M. | U.S. Army TARDEC |
Castanier, Matthew P. | US Army GVSC |
Barton, Kira | University of Michigan at Ann Arbor |
Keywords: Multi-Robot Systems, Cooperating Robots, Planning, Scheduling and Coordination
Abstract: For robot swarms operating on complex missions in an uncertain environment, it is important that the decision-making algorithm considers both heterogeneity and uncertainty. This paper presents a stochastic programming framework for the vehicle routing problem with stochastic travel energy costs and heterogeneous vehicles and tasks. We represent the heterogeneity as linear constraints, estimate the uncertain energy cost through Gaussian process regression, formulate this stochasticity as chance constraints or stochastic recourse costs, and then solve the stochastic programs using branch and cut algorithms to minimize the expected energy cost. The performance and practicality are demonstrated through extensive computational experiments and a practical test case.
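For Gaussian energy costs, the chance constraints mentioned above admit a standard deterministic equivalent; a minimal sketch (illustrative only, with an assumed confidence level):

```python
from math import sqrt
from statistics import NormalDist

def chance_constraint_ok(mu, var, budget, eps=0.05):
    """Check P(cost <= budget) >= 1 - eps for cost ~ N(mu, var).

    mu, var: mean and variance of the route's energy cost, e.g. from
             Gaussian process regression.
    Standard deterministic equivalent: mu + z_{1-eps} * sigma <= budget,
    with z the standard-normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - eps)
    return mu + z * sqrt(var) <= budget
```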
|
|
10:30-10:45, Paper TuAT3.3
>Long-Run Multi-Robot Planning under Uncertain Action Durations for Persistent Tasks |
|
Azevedo, Carlos | Instituto Superior Técnico - Institute for Systems and Robotics |
Lacerda, Bruno | University of Oxford |
Hawes, Nick | University of Oxford |
Lima, Pedro U. | Instituto Superior Técnico - Institute for Systems and Robotics |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Task Planning
Abstract: This paper presents an approach for multi-robot long-term planning under uncertainty over the duration of actions. The proposed methodology takes advantage of generalized stochastic Petri nets with rewards (GSPNR) to model multi-robot problems. A GSPNR allows for unified modeling of action selection, uncertainty on the duration of action execution, and for goal specification through the use of transition rewards and rewards per time unit. Our approach relies on the interpretation of the GSPNR model as an equivalent embedded Markov reward automaton (MRA). We then build on a state-of-the-art method to compute the long-run average reward over MRAs, extending it to enable the extraction of the optimal policy. We provide an empirical evaluation of the proposed approach on a simulated multi-robot monitoring problem, evaluating its performance and scalability. The results show that the synthesized policy outperforms a policy obtained from an infinite horizon discounted reward formulation as well as a carefully hand-crafted policy.
|
|
10:45-11:00, Paper TuAT3.4
>Algorithm for Multi-Robot Chance-Constrained Generalized Assignment Problem with Stochastic Resource Consumption |
|
Yang, Fan | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
Keywords: Multi-Robot Systems, Optimization and Optimal Control, Planning, Scheduling and Coordination
Abstract: We present a novel algorithm for the multi-robot generalized assignment problem (GAP) with stochastic resource consumption. In this problem, each robot has a resource (e.g., battery life) constraint and it consumes a certain amount of resource to perform a task. In practice, the resource consumed for performing a task can be uncertain. Therefore, we assume that the resource consumption is a random variable with known mean and variance. The objective is to find an assignment of the robots to tasks that maximizes the team payoff. Each task is assigned to at most one robot and the resource constraint for each robot has to be satisfied with very high probability. We formulate the problem as a chance-constrained combinatorial optimization problem and call it the chance-constrained generalized assignment problem (CC-GAP). This problem is an extension of the deterministic generalized assignment problem, which is an NP-hard problem. We design an iterative algorithm for solving CC-GAP in which each robot maximizes its own objective by solving a chance-constrained knapsack problem in an iterative manner. The approximation ratio of our algorithm is (1+alpha), assuming that the deterministic knapsack problem is solved by an alpha-approximation algorithm. We present simulation results to demonstrate that our algorithm is scalable with the number of robots and tasks.
|
|
11:00-11:15, Paper TuAT3.5
>The Pluggable Distributed Resource Allocator (PDRA): A Middleware for Distributed Computing in Mobile Robotic Networks |
> Video Attachment
|
|
Rossi, Federico | Jet Propulsion Laboratory - California Institute of Technology |
Vaquero, Tiago | JPL, Caltech |
Sanchez Net, Marc | Jet Propulsion Laboratory - California Institute of Technology |
Saboia Da Silva, Maira | University at Buffalo |
Vander Hook, Joshua | NASA Jet Propulsion Laboratory |
Keywords: Multi-Robot Systems, Planning, Scheduling and Coordination, Software, Middleware and Programming Environments
Abstract: We present the Pluggable Distributed Resource Allocator (PDRA), a middleware for distributed computing in heterogeneous mobile robotic networks. PDRA enables autonomous robotic agents to share computational resources for computationally expensive tasks such as localization and path planning. It sits between an existing single-agent planner/executor and existing computational resources (e.g. ROS packages), intercepts the executor’s requests and, if needed, transparently routes them to other robots for execution. PDRA is pluggable: it can be integrated in an existing single-robot autonomy stack with minimal modifications. Task allocation decisions are performed by a mixed-integer programming algorithm, solved in a shared-world fashion, that models CPU resources, latency requirements, and multi-hop, periodic, bandwidth-limited network communications; the algorithm can minimize overall energy usage or maximize the reward for completing optional tasks. Simulation results show that PDRA can reduce energy and CPU usage by over 50% in representative multi-robot scenarios compared to a naive scheduler; runs on embedded platforms; and performs well in delay- and disruption-tolerant networks (DTNs). PDRA is available to the community under an open-source license.
|
|
11:15-11:30, Paper TuAT3.6
>Learning Scheduling Policies for Multi-Robot Coordination with Graph Attention Networks |
|
Wang, Zheyuan | Georgia Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Planning, Scheduling and Coordination, Imitation Learning, Multi-Robot Systems
Abstract: Increasing interest in integrating advanced robotics within manufacturing has spurred a renewed concentration in developing real-time scheduling solutions to coordinate human-robot collaboration in this environment. Traditionally, the problem of scheduling agents to complete tasks with temporal and spatial constraints has been approached either with exact algorithms, which are computationally intractable for large-scale, dynamic coordination, or approximate methods that require domain experts to craft heuristics for each application. We seek to overcome the limitations of these conventional methods by developing a novel graph attention network-based scheduler to automatically learn features of scheduling problems towards generating high-quality solutions. To learn effective policies for combinatorial optimization problems, we combine imitation learning, which makes use of expert demonstration on small problems, with graph neural networks, in a non-parametric framework, to allow for fast, near-optimal scheduling of robot teams with various sizes, while generalizing to large, unseen problems. Experimental results showed that our network-based policy was able to find high-quality solutions for ~90% of the testing problems involving scheduling 2-5 robots and up to 100 tasks, which significantly outperforms prior state-of-the-art, approximate methods. Those results were achieved with affordable computation cost and up to 100x less computation time compared to exact solvers.
|
|
TuAT4
Room T4
Robot Computation: Hardware, Software, Datasets |
Regular session |
Chair: Fallon, Maurice | University of Oxford |
Co-Chair: Scaramuzza, Davide | University of Zurich |
|
10:00-10:15, Paper TuAT4.1
>The Newer College Dataset: Handheld LiDAR, Inertial and Vision with Ground Truth
> Video Attachment
|
|
Ramezani, Milad | University of Oxford |
Wang, Yiduo | University of Oxford |
Camurri, Marco | University of Oxford |
Wisth, David | University of Oxford |
Mattamala, Matías | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Big Data in Robotics and Automation, Localization, Mapping
Abstract: In this paper, we present a large dataset with a variety of mobile mapping sensors collected using a handheld device carried at typical walking speeds for nearly 2.2 km around New College, Oxford, as well as a series of supplementary datasets with much more aggressive motion and lighting contrast. The datasets include data from two commercially available devices: a stereoscopic-inertial camera and a multibeam 3D LiDAR, which also provides inertial measurements. Additionally, we used a tripod-mounted survey-grade LiDAR scanner to capture a detailed millimeter-accurate 3D map of the test location (containing ∼290 million points). Using the map, we generated a 6 Degrees of Freedom (DoF) ground truth pose for each LiDAR scan (with approximately 3 cm accuracy) to enable better benchmarking of LiDAR and vision localisation, mapping and reconstruction systems. This ground truth is the particular novel contribution of this dataset and we believe that it will enable systematic evaluation that many similar datasets have lacked. The large dataset combines built environments, open spaces and vegetated areas so as to test localisation and mapping systems such as vision-based navigation, visual and LiDAR SLAM, 3D LiDAR reconstruction and appearance-based place recognition, while the supplementary datasets contain very dynamic motions to introduce more challenges for visual-inertial odometry systems. The datasets are available at: ori.ox.ac.uk/datasets/newer-college-dataset
|
|
10:15-10:30, Paper TuAT4.2
>Faster Than FAST: GPU-Accelerated Frontend for High-Speed VIO |
|
Nagy, Balazs | University of Zürich |
Foehn, Philipp | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Perception and Autonomy, SLAM, Visual Tracking
Abstract: The recent introduction of powerful embedded graphics processing units (GPUs) has allowed for unforeseen improvements in real-time computer vision applications. It has enabled algorithms to run onboard, well above the standard video rates, yielding not only higher information processing capability, but also reduced latency. This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms in the field of visual-inertial odometry (VIO). While most steps of a VIO pipeline work on visual features, they rely on image data for detection and tracking, both of which are well suited for parallelization. In particular, non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency. Our work first revisits the problem of non-maxima suppression for feature detection specifically on GPUs, and proposes a solution that selects local response maxima, imposes spatial feature distribution, and extracts features simultaneously. Our second contribution introduces an enhanced FAST feature detector that applies the aforementioned non-maxima suppression method. Finally, we compare our method to other state-of-the-art CPU and GPU implementations, where we always outperform all of them in feature tracking and detection, resulting in over 1000fps throughput on an embedded Jetson TX2 platform. Additionally, we demonstrate our work integrated into a VIO pipeline achieving a metric state estimation at ~200fps.
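The spatially distributed non-maxima suppression can be pictured with a simple grid-cell scheme; the following CPU-side sketch conveys only the concept, not the fused GPU kernel from the paper:

```python
import numpy as np

def grid_nms(scores, cell=32):
    """Keep at most one response maximum per cell of a regular grid.

    scores: 2D array of per-pixel corner responses (e.g., FAST scores).
    Returns a list of (row, col) keypoints that are spatially spread
    by construction, combining suppression and selection in one pass.
    """
    keypoints = []
    h, w = scores.shape
    for r0 in range(0, h, cell):
        for c0 in range(0, w, cell):
            patch = scores[r0:r0 + cell, c0:c0 + cell]
            idx = np.unravel_index(np.argmax(patch), patch.shape)
            if patch[idx] > 0:  # ignore cells with no response
                keypoints.append((r0 + idx[0], c0 + idx[1]))
    return keypoints
```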
|
|
10:30-10:45, Paper TuAT4.3
>GPU Parallelization of Policy Iteration RRT# |
|
Lawson, R. Connor | Georgia Institute of Technology |
Wills, Linda | Georgia Institute of Technology |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Motion and Path Planning
Abstract: Sampling-based planning has become a de facto standard for complex robots given its superior ability to rapidly explore high-dimensional configuration spaces. Most existing optimal sampling-based planning algorithms are sequential in nature and cannot take advantage of wide parallelism available on modern computer hardware. Further, tight synchronization of exploration and exploitation phases in these algorithms limits sample throughput and planner performance. Policy Iteration RRT# (PI-RRT#) exposes fine-grained parallelism during the exploitation phase, but this parallelism has not yet been evaluated using a concrete implementation. We first present a novel GPU implementation of PI-RRT#’s exploitation phase and discuss data structure considerations to maximize parallel performance. Our implementation achieves 3–4× speedup over a serial PI-RRT# implementation for a 77.9% decrease in overall planning time on average. As a second contribution, we introduce the Batched-Extension RRT# algorithm, which loosens the synchronization present in PI-RRT# to realize independent 12.97× and 12.54× speedups under serial and parallel exploitation, respectively.
|
|
10:45-11:00, Paper TuAT4.4
>ROS-Lite: ROS Framework for NoC-Based Embedded Many-Core Platform |
> Video Attachment
|
|
Azumi, Takuya | Saitama University |
Maruyama, Yuya | Osaka University |
Kato, Shinpei | Nagoya University |
Keywords: Software, Middleware and Programming Environments, Localization, Motion and Path Planning
Abstract: This paper proposes ROS-lite, a robot operating system (ROS) development framework for embedded many-core platforms based on network-on-chip (NoC) technology. Many-core platforms support the high processing capacity and low power consumption requirement of embedded systems. In this study, a self-driving software platform module is parallelized to run on many-core processors to demonstrate the practicality of embedded many-core platforms. The experimental results show that the proposed framework and the parallelized applications have met the deadline for low-speed self-driving systems.
|
|
TuAT5
Room T5
Sim-To-Real |
Regular session |
Chair: Bock, Juergen | KUKA Deutschland GmbH |
Co-Chair: Batra, Dhruv | Facebook AI Research / Georgia Tech |
|
10:00-10:15, Paper TuAT5.1
>Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization |
> Video Attachment
|
|
Kaspar, Manuel | KUKA Deutschland GmbH
Muñoz Osorio, Juan David | Leibniz University, KUKA Germany GmbH |
Bock, Juergen | KUKA Deutschland GmbH |
Keywords: Reinforcement Learning, Transfer Learning, Compliance and Impedance Control
Abstract: In this work we show how to use the Operational Space Control (OSC) framework under joint and Cartesian constraints for reinforcement learning in Cartesian space. Our method is therefore able to learn fast and with adjustable degrees of freedom, and we are able to transfer policies without additional dynamics randomization on a KUKA LBR iiwa peg-in-hole task. Before learning in simulation starts, we perform a system identification to align the simulation environment as closely as possible with the dynamics of the real robot. Adding constraints to the OSC controller allows us to learn safely on the real robot, or to learn a flexible, goal-conditioned policy that can be easily transferred from simulation to the real robot.
|
|
10:15-10:30, Paper TuAT5.2
>Learning the Sense of Touch in Simulation: A Sim-To-Real Strategy for Vision-Based Tactile Sensing |
> Video Attachment
|
|
Sferrazza, Carmelo | ETH Zurich |
Bi, Thomas | ETH Zurich |
D'Andrea, Raffaello | ETHZ |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: Data-driven approaches to tactile sensing aim to overcome the complexity of accurately modeling contact with soft materials. However, their widespread adoption is impaired by concerns about data efficiency and the capability to generalize when applied to various tasks. This paper focuses on both these aspects with regard to a vision-based tactile sensor, which aims to reconstruct the distribution of the three-dimensional contact forces applied on its soft surface. Accurate models for the soft materials and the camera projection, derived via state-of-the-art techniques in the respective domains, are employed to generate a dataset in simulation. A strategy is proposed to train a tailored deep neural network entirely from the simulation data. The resulting learning architecture is directly transferable across multiple tactile sensors without further training and yields accurate predictions on real data, while showing promising generalization capabilities to unseen contact conditions.
|
|
10:30-10:45, Paper TuAT5.3
>Reinforced Grounded Action Transformation for Sim-To-Real Transfer |
> Video Attachment
|
|
Karnan, Haresh | The University of Texas at Austin |
Desai, Siddharth | The University of Texas at Austin |
Warnell, Garrett | U.S. Army Research Laboratory |
Hanna, Josiah | The University of Texas at Austin |
Stone, Peter | University of Texas at Austin |
Keywords: Reinforcement Learning, Transfer Learning, Neural and Fuzzy Control
Abstract: Robots can learn to do complex tasks in simulation, but often, learned behaviors fail to transfer well to the real world due to simulator imperfections (the “reality gap”). Some existing solutions to this sim-to-real problem, such as Grounded Action Transformation (GAT), use a small amount of real-world experience to minimize the reality gap by “grounding” the simulator. While very effective in certain scenarios, GAT is not robust on problems that use complex function approximation techniques to model a policy. In this paper, we introduce Reinforced Grounded Action Transformation(RGAT), a new sim-to-real technique that uses Reinforcement Learning (RL) not only to update the target policy in simulation, but also to perform the grounding step itself. This novel formulation allows for end-to-end training during the grounding step, which, compared to GAT, produces a better grounded simulator. Moreover, we show experimentally in several MuJoCo domains that our approach leads to successful transfer for policies modeled using neural networks.
|
|
10:45-11:00, Paper TuAT5.4
>Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning |
> Video Attachment
|
|
Gao, Haichuan | Tsinghua University |
Yang, Zhile | Tsinghua University |
Su, Xin | Tsinghua University |
Tan, Tian | Stanford University |
Chen, Feng | Tsinghua University |
Keywords: Reinforcement Learning, Transfer Learning, Big Data in Robotics and Automation
Abstract: In sim-to-real transfer of Reinforcement Learning (RL) policies for robot tasks, Domain Randomization (DR) is a widely used technique for improving adaptability. However, in DR there is a conflict between adaptability and training stability, and heavy DR tends to result in instability or even failure in training. To relieve this conflict, we propose a new algorithm named Domain Decomposition (DD) that decomposes the randomized domain according to environments and trains a separate RL policy for each part. This decomposition stabilizes the training of each RL policy, and as we prove theoretically, the adaptability of the overall policy can be preserved. Our simulation results verify that DD really improves stability in training while preserving ideal adaptability. Further, we complete a complex real-world vision-based patrolling task using DD, which demonstrates DD’s practicality. A video is attached as supplementary material.
|
|
11:00-11:15, Paper TuAT5.5
>Sim-To-Real with Domain Randomization for Tumbling Robot Control |
> Video Attachment
|
|
Schwartzwald, Amalia | CSE, UMN |
Papanikolopoulos, Nikos | University of Minnesota |
Keywords: Model Learning for Control
Abstract: Tumbling locomotion allows for small robots to traverse comparatively rough terrain, however, their motion is complex and difficult to control. Existing tumbling robot control methods involve manual control or the assumption of flat terrain. Reinforcement learning allows for the exploration and exploitation of diverse environments. By utilizing reinforcement learning with domain randomization, a robust control policy can be learned in simulation then transferred to the real world. In this paper, we demonstrate autonomous setpoint navigation with a tumbling robot prototype on flat and non-flat terrain. The flexibility of this system improves the viability of nontraditional robots for navigational tasks.
|
|
11:15-11:30, Paper TuAT5.6
>Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?
|
Kadian, Abhishek | Facebook AI Research |
Truong, Joanne | The Georgia Institute of Technology |
Gokaslan, Aaron | Brown University |
Clegg, Alexander | Georgia Institute of Technology |
Wijmans, Erik | Georgia Tech |
Lee, Stefan | Oregon State University |
Savva, Manolis | Simon Fraser University |
Chernova, Sonia | Georgia Institute of Technology |
Batra, Dhruv | Facebook AI Research / Georgia Tech |
Keywords: Visual-Based Navigation, Simulation and Animation
Abstract: Does progress in simulation translate to progress on robots? If one method outperforms another in simulation, how likely is that trend to hold in reality on a robot? We examine this question for embodied PointGoal navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on simulated agents and robots, transferring simulation-trained agents to a LoCoBot platform with a one-line code change. Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica, and run parallel tests of 9 different models in reality and simulation. We present a new metric called the Sim-vs-Real Correlation Coefficient (SRCC) to quantify predictivity. We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), suggesting that performance differences in this simulator-based challenge do not persist after physical deployment. This gap is largely due to AI agents learning to exploit simulator imperfections, abusing collision dynamics to 'slide' along walls, leading to shortcuts through otherwise non-navigable space. Naturally, such exploits do not work in the real world. Our experiments show that it is possible to tune simulation parameters to improve sim2real predictivity (e.g., improving SRCC for success from 0.18 to 0.844), increasing confidence that in-simulation comparisons will translate to deployed systems in reality.
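A coefficient like SRCC boils down to correlating paired sim and real metrics; a sketch of such a computation (our illustration, assuming a Pearson-style correlation over per-model success rates; the paper's exact definition may differ):

```python
import numpy as np

def srcc(sim_metric, real_metric):
    """Pearson correlation between per-model metrics in sim and reality.

    sim_metric, real_metric: equal-length arrays, one entry per model
    (e.g., PointGoal success rate for each of the 9 models).
    Values near 1 mean simulation rankings transfer to the robot.
    """
    sim = np.asarray(sim_metric, dtype=float)
    real = np.asarray(real_metric, dtype=float)
    return float(np.corrcoef(sim, real)[0, 1])
```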
|
|
TuAT7
Room T7
Localization |
Regular session |
Chair: Kim, Ayoung | Korea Advanced Institute of Science Technology |
Co-Chair: Liu, Ming | Hong Kong University of Science and Technology |
|
10:00-10:15, Paper TuAT7.1
>Pedestrian Motion Tracking by Using Inertial Sensors on the Smartphone |
|
Wang, Yingying | The Chinese University of Hong Kong |
Cheng, Hu | The Chinese University of Hong Kong |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Localization, Human and Humanoid Motion Analysis and Synthesis
Abstract: Inertial Measurement Units (IMUs) have long held promise for stable and reliable motion estimation, especially in indoor environments where GPS signal strength is limited. In this paper, we propose a novel method for position and orientation estimation of a moving object using only a sequence of IMU signals collected from a phone. Our main observation is that human motion is monotonous and periodic. We adopt the Extended Kalman Filter and use a learning-based method to dynamically update the measurement noise of the filter. Our pedestrian motion tracking system is intended to accurately estimate planar position, velocity and heading direction without restricting the phone's daily use. The method is not only tested on self-collected signals, but also provides accurate position and velocity estimations on the public RIDI dataset: the absolute translation error is 1.28 m for a 59-second sequence.
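A generic sketch of an EKF measurement update whose noise covariance comes from a learned predictor, as the abstract describes (standard filter notation; the predictor and all variable names here are placeholders, not the paper's implementation):

```python
import numpy as np

def ekf_update(x, P, z, h, H, predict_R):
    """EKF measurement update with a learned, state-dependent noise model.

    x, P:      prior state estimate and covariance.
    z:         measurement (e.g., a velocity pseudo-measurement from IMU data).
    h, H:      measurement function and its Jacobian evaluated at x.
    predict_R: callable returning the measurement covariance for this step
               (the role played by the learned model in the paper).
    """
    R = predict_R(x, z)                      # dynamically updated noise
    y = z - h(x)                             # innovation
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```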
|
|
10:15-10:30, Paper TuAT7.2
>A Bayesian Approach for Gas Source Localization in Large Indoor Environments |
> Video Attachment
|
|
Prabowo, Yaqub | Institut Teknologi Bandung |
Ranasinghe, Ravindra | University of Technology Sydney |
Dissanayake, Gamini | University of Technology Sydney |
Riyanto, Bambang | Institut Teknologi Bandung |
Yuliarto, Brian | Institut Teknologi Bandung |
Keywords: Localization, Robotics in Hazardous Fields
Abstract: The main contribution of this paper is a probabilistic estimator that assists a mobile robot to locate a gas source in an indoor environment. The scenario is that a robot equipped with a gas sensor enters a building after the gas is released due to a leak or explosion. The problem is discretized by dividing the environment into a set of regions and time into a set of time intervals. Likelihood functions describing the probability of obtaining a certain gas concentration measurement at a given location at a given time interval are assembled using data generated with GADEN, a three-dimensional gas dispersion simulator [1]. Given a measurement of the gas concentration is available, Bayes's rule is used to compute the joint probability density describing the location of the gas source and the time at which it started spreading. To illustrate the estimation process, a relatively simple motion planner that directs the robot towards the most likely gas source location using a cost function based on the marginal probability of the gas source location is used. The motion plan is periodically revised to reflect the latest posterior probability density. Simulation experiments in a large air-conditioned building with turbulence and wind are presented to demonstrate the effectiveness of the proposed technique.
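The discretized estimator above reduces to a Bayes update over (region, start-time) hypotheses; a minimal sketch, with the GADEN-derived likelihood functions abstracted into a lookup array:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """One Bayes update of the source-location/start-time posterior.

    prior:      array over (region, start-interval) hypotheses, sums to 1.
    likelihood: P(measured gas concentration | hypothesis), same shape,
                looked up from simulator-generated likelihood functions.
    """
    posterior = prior * likelihood
    s = posterior.sum()
    if s == 0:          # measurement inconsistent with all hypotheses
        return prior    # keep the prior rather than divide by zero
    return posterior / s
```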
|
|
10:30-10:45, Paper TuAT7.3
>Towards Real-Time Non-Gaussian SLAM for Underdetermined Navigation |
|
Fourie, Dehann | Massachusetts Institute of Technology and Woods Hole Oceanographic Institution
Rypkema, Nicholas Rahardiyan | Massachusetts Institute of Technology |
Claassens, Samuel David | Semisorted Technologies |
Vaz Teixeira, Pedro | Massachusetts Institute of Technology |
Leonard, John | MIT |
Fischell, Erin Marie | Woods Hole Oceanographic Institution |
Keywords: SLAM, Range Sensing, Marine Robotics
Abstract: This paper presents a method for processing sparse, non-Gaussian multimodal data in a simultaneous localization and mapping (SLAM) framework using factor graphs. Our approach demonstrates the feasibility of using a sum-product inference strategy to recover functional belief marginals from highly non-Gaussian situations, relaxing the prolific unimodal Gaussian assumption. The method is more focused than conventional multi-hypothesis approaches, but still captures dominant modes via multi-modality. The proposed algorithm exists in a trade space that spans the anticipated uncertainty of measurement data, task-specific performance, sensor quality, and computational cost. This work leverages several major algorithm design constructs, including clique recycling, to put an upper bound on the allowable computational expense -- a major challenge in non-parametric methods. To better demonstrate robustness, experimental results show the feasibility of the method on at least two of four major sources of non-Gaussian behavior: i) the first introduces a canonical range-only problem which is always underdetermined although composed exclusively from Gaussian measurements; ii) a real-world AUV dataset, demonstrating how ambiguous acoustic correlator measurements are directly incorporated into a non-Gaussian SLAM solution, while using dead reckon tethering to overcome short term computational requirements.
|
|
10:45-11:00, Paper TuAT7.4
>An Augmented Reality Spatial Referencing System for Mobile Robots |
|
Chacko, Sonia | NYU Tandon School of Engineering |
Granado, Armando | New York University Tandon School of Engineering |
Rajkumar, Ashwin | New York University Tandon School of Engineering |
Kapila, Vikram | NYU Tandon School of Engineering |
Keywords: Virtual Reality and Interfaces, Task Planning, Service Robotics
Abstract: The deployment of a mobile service robot in domestic settings is a challenging task due to the dynamic and unstructured nature of such environments. Successful operation of the robot requires continuous human supervision to update its spatial knowledge about the dynamic environment. Thus, it is essential to develop a human-robot interaction (HRI) strategy that is suitable for novice end users to effortlessly provide task-specific spatial information to the robot. Although several approaches have been developed for this purpose, most of them are not feasible or convenient for use in domestic environments. In response, we have developed an augmented reality (AR) spatial referencing system (SRS), which allows a non-expert user to tag any specific locations on a physical surface to allocate tasks to be performed by the robot at those locations. Specifically, in the AR-SRS, the user provides a spatial reference by creating an AR virtual object with a semantic label. The real-world location of the user-created virtual object is estimated and stored as spatial data along with the user-specified semantic label. We present three different approaches to establish the correspondence between the user-created virtual object locations and the real-world coordinates on an a priori static map of the service area available to the robot. The performance of each approach is evaluated and reported. We also present use-case scenarios to demonstrate potential applications of the AR-SRS for mobile service robots.
|
|
11:00-11:15, Paper TuAT7.5
>GMMLoc: Structure Consistent Visual Localization with Gaussian Mixture Models |
> Video Attachment
|
|
Huang, Huaiyang | The Hong Kong University of Science and Technology |
Ye, Haoyang | The Hong Kong University of Science and Technology |
Sun, Yuxiang | Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology |
Keywords: Automation Technologies for Smart Cities, Localization, Visual-Based Navigation
Abstract: Incorporating prior structure information into the visual state estimation could generally improve the localization performance. In this letter, we aim to address the paradox between accuracy and efficiency in coupling visual factors with structure constraints. To this end, we present a cross-modality method that tracks a camera in a prior map modelled by the Gaussian Mixture Model (GMM). With the pose estimated by the front-end initially, the local visual observations and map components are associated efficiently, and the visual structure from the triangulation is refined simultaneously. By introducing the hybrid structure factors into the joint optimization, the camera poses are bundle-adjusted with the local visual structure. By evaluating our complete system, namely GMMLoc, on the public dataset, we show how our system can provide a centimeter-level localization accuracy with only trivial computational overhead. In addition, the comparative studies with the state-of-the-art vision-dominant state estimators demonstrate the competitive performance of our method.
|
|
11:15-11:30, Paper TuAT7.6
>HDMI-Loc: Exploiting High Definition Map Image for Precise Localization Via Bitwise Particle Filter |
> Video Attachment
|
|
Jeong, Jinyong | KAIST |
Cho, Younggun | KAIST |
Kim, Ayoung | Korea Advanced Institute of Science Technology |
Keywords: Localization, Autonomous Vehicle Navigation, Visual-Based Navigation
Abstract: In this paper, we propose a method for accurately estimating the 6-Degree-Of-Freedom (DOF) pose in an urban environment when a High Definition (HD) map is available. An HD map expresses 3D geometric data with semantic information in a compressed format and thus is more memory-efficient than point cloud maps. The small size of HD maps can be a significant advantage for autonomous vehicles in terms of map storage and updates within a large urban area. Unfortunately, existing approaches failed to sufficiently exploit HD maps by only estimating partial pose. In this study, we present full 6-DOF localization against an HD map using an onboard stereo camera with semantic information from roads. We introduce an 8-bit representation for road information, which allows for effective bitwise operations when matching between query data and the HD map. For the pose estimation, we leverage a particle filter followed by a full 6-DOF pose optimization. Our experimental results show a median error of approximately 0.3 m in the lateral and longitudinal directions for a drive of approximately 11 km. These results can be used by autonomous vehicles to correct the global position without using Global Positioning System (GPS) data in highly complex urban environments. The median operation speed is approximately 60 ms, supporting 10 Hz operation.
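One plausible reading of the bitwise matching enabled by the 8-bit road encoding (our guess at the mechanics for illustration, not the paper's exact scoring function):

```python
def bitwise_match_score(query_cells, map_cells):
    """Score the agreement of two sequences of 8-bit road-label cells.

    Each cell packs semantic road information into one byte; matching
    bits are counted with AND plus popcount, so scoring one particle
    costs only a handful of integer operations per cell.
    """
    score = 0
    for q, m in zip(query_cells, map_cells):
        score += bin((q & m) & 0xFF).count("1")  # popcount of shared bits
    return score
```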
|
|
11:30-11:45, Paper TuAT7.7
>Visual SLAM with Drift-Free Rotation Estimation in Manhattan World |
|
Liu, Jiacheng | Tsinghua University |
Meng, Ziyang | Tsinghua University |
Keywords: Localization, SLAM, Visual-Based Navigation
Abstract: This paper presents an efficient and accurate simultaneous localization and mapping (SLAM) system for man-made environments. The Manhattan world assumption is imposed, from which the global orientation is obtained. Drift-free rotational motion estimation is derived from the structural regularities using line features. In particular, a two-stage vanishing point (VP) estimation method is developed, which consists of a short-term tracking module to track the clustered line features and a long-term searching module to generate abundant sets of VP candidates and retrieve the optimal one. A least-squares problem is constructed and solved to provide refined VPs with the clusters of structural line features every frame. We make full use of the absolute orientation estimate throughout the SLAM process: it increases the localization accuracy in the front end, and it lets us formulate a linear batch camera pose refinement problem with known rotations to improve real-time performance in the back end. Experiments on both synthesized and real-world scenes show high precision in real-time camera pose estimation and high speed in pose graph optimization compared with existing state-of-the-art methods.
|
|
TuAT8 |
Room T8 |
Localization: Other Modalities I |
Regular session |
Chair: Martinez, Julieta | Uber |
Co-Chair: Wei, Bo | Northumbria University |
|
10:00-10:15, Paper TuAT8.1 | |
>Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars |
> Video Attachment
|
|
Martinez, Julieta | Uber |
Doubov, Sasha | University of Waterloo |
Fan, Jack | Uber ATG |
Bârsan, Ioan Andrei | Uber ATG / University of Toronto |
Wang, Shenlong | University of Toronto |
Mattyus, Gellert | Uber ATG |
Urtasun, Raquel | University of Toronto |
Keywords: Big Data in Robotics and Automation, Localization, Multi-Modal Perception
Abstract: We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles (SDVs). Towards this goal, we introduce Pit30M, a new image and LiDAR dataset with over 30 million frames, which is 10 to 100 times larger than those used in previous work. Pit30M is captured under diverse conditions (i.e., season, weather, time of the day, traffic), and provides accurate localization ground truth. We also automatically annotate our dataset with historical weather and astronomical data, as well as with image and LiDAR semantic segmentation (as a proxy measure for occlusion). We benchmark multiple existing methods for image and LiDAR retrieval and, in the process, introduce a simple, yet effective convolutional network-based LiDAR retrieval method that is competitive with the state-of-the-art. Our work provides, for the first time, a benchmark for sub-metre retrieval-based localization at city scale.
|
|
10:15-10:30, Paper TuAT8.2 | |
>SolarSLAM: Battery-Free Loop Closure for Indoor Localisation |
|
Wei, Bo | Northumbria University |
Xu, Weitao | City University of Hong Kong |
Luo, Chengwen | Shenzhen University |
Zoppi, Guillaume | Northumbria University |
Ma, Dong | University of Cambridge |
Wang, Sen | Edinburgh Centre for Robotics, Heriot-Watt University |
Keywords: Localization, SLAM, Sensor Networks
Abstract: In this paper, we propose SolarSLAM, a battery-free loop closure method for indoor localisation. Inertial Measurement Unit (IMU) based indoor localisation methods have been widely used due to their ubiquity in mobile devices, such as mobile phones, smartwatches and wearable bands. However, they suffer from unavoidable long-term drift. To mitigate the localisation error, many loop closure solutions have been proposed using sophisticated sensors, such as cameras, lasers, etc. Despite achieving high-precision localisation performance, these sensors consume a huge amount of energy. Different from those solutions, the proposed SolarSLAM takes advantage of an energy-harvesting solar cell as a sensor and achieves effective battery-free loop closure. The proposed method uses key-point dynamic time warping to detect loops and robust simultaneous localisation and mapping (SLAM) as the optimiser to remove falsely recognised loop closures. Extensive evaluations in real environments demonstrate the advantageous photocurrent characteristics for indoor localisation and the good localisation accuracy of the proposed method.
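For readers unfamiliar with the loop-detection primitive mentioned here, the sketch below implements plain dynamic time warping between two photocurrent sequences; the key-point variant in the paper is more elaborate, so treat this as a hedged illustration of the underlying distance only.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A revisited place yields a similar photocurrent profile even at a different
# walking pace; an unrelated place does not.
current = np.sin(np.linspace(0, 3, 60))
revisit = np.sin(np.linspace(0, 3, 75))
other = np.cos(np.linspace(0, 3, 60)) + 1.0
assert dtw_distance(current, revisit) < dtw_distance(current, other)
```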
|
|
10:30-10:45, Paper TuAT8.3 | |
>Robot-To-Robot Relative Pose Estimation Based on Semidefinite Relaxation Optimization |
|
Li, Ming | Chinese University of Hong Kong, Shenzhen |
Liang, Guanqi | The Chinese University of Hong Kong, Shenzhen |
Luo, Haobo | The Chinese University of Hong Kong, Shenzhen |
Qian, Huihuan | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Lam, Tin Lun | The Chinese University of Hong Kong, Shenzhen |
Keywords: Localization, Multi-Robot Systems
Abstract: In this paper, the 2D robot-to-robot relative pose (position and orientation) estimation problem based on egomotion and noisy distance measurements is considered. We address this problem using an optimization-based method. In particular, we start from a state-of-the-art method named square distances weighted least square (SD-WLS) and reformulate it as a non-convex quadratically constrained quadratic programming (QCQP) problem. To handle its non-convex nature, a semidefinite programming (SDP) relaxation optimization method is proposed, and we prove that the relaxation is theoretically tight when the measurements are free from noise or corrupted only by small noise. Further, to obtain the optimal solution of the relative pose estimation problem in the sense of maximum likelihood estimation (MLE), a theoretically optimal WLS method is developed to refine the estimate from the SDP optimization. Extensive simulations and real-world data processing results are presented to validate the performance of the proposed algorithm and compare its accuracy to existing approaches.
|
|
10:45-11:00, Paper TuAT8.4 | |
>A Model-Based Approach to Acoustic Reflector Localization with a Robotic Platform |
> Video Attachment
|
|
Saqib, Usama | Aalborg University |
Jensen, Jesper Rindom | Aalborg University |
Keywords: Localization, Robot Audition, Mapping
Abstract: Constructing a spatial map of an indoor environment, e.g., a typical office environment with glass surfaces, is a difficult and challenging task. Current state-of-the-art approaches, e.g., camera- and laser-based ones, are unsuitable for detecting transparent surfaces. Hence, the spatial maps generated with these approaches are often inaccurate. In this paper, a method that utilizes echolocation with sound in the audible frequency range is proposed to robustly localize the position of an acoustic reflector, e.g., walls, glass surfaces, etc., which could be used to construct a spatial map of an indoor environment as the robot moves. The proposed method estimates the acoustic reflector's position using only a single microphone and a loudspeaker, which are present on many socially assistive robot platforms such as the NAO robot. The experimental results show that the proposed method can robustly detect an acoustic reflector up to a distance of 1.5 m in more than 60% of the trials and works efficiently even under low SNRs. To test the proposed method, a proof-of-concept robotic platform was built to construct a spatial map of an indoor environment.
|
|
11:00-11:15, Paper TuAT8.5 | |
>TP-TIO: A Robust Thermal-Inertial Odometry with Deep ThermalPoint |
> Video Attachment
|
|
Zhao, Shibo | Carnegie Mellon University |
Wang, Peng | Faculty of Robot Science and Engineering, Northeastern University |
Zhang, Hengrui | Carnegie Mellon University |
Fang, Zheng | Northeastern University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Localization, Sensor Fusion, SLAM
Abstract: To achieve robust motion estimation in GPS-denied and visually degraded environments such as dust, fog, and smoke, thermal odometry has attracted attention in the robotics community. However, most thermal odometry methods are purely based on classical feature extractors applied to re-scaled thermal images, making it difficult to establish robust correspondences in successive frames due to sudden photometric changes and large thermal noise. To overcome the limitations of feature-based thermal odometry, we propose ThermalPoint, a lightweight feature detection network specifically tailored for producing keypoints on thermal images, providing notable anti-noise improvements compared to other state-of-the-art methods. Also, we combine ThermalPoint with a novel radiometric feature tracking method, which directly makes use of full radiometric data and establishes reliable correspondences between sequential frames. Finally, taking advantage of an optimization-based visual-inertial framework, a deep feature-based thermal-inertial odometry (TP-TIO) estimation framework is proposed and evaluated thoroughly, from thermal feature tracking to pose estimation, in various visually degraded environments. Experiments show that our method outperforms state-of-the-art visual and laser odometry methods in smoke-filled environments and achieves competitive accuracy in normal environments.
|
|
11:15-11:30, Paper TuAT8.6 | |
>Versatile 3D Multi-Sensor Fusion for Lightweight 2D Localization |
|
Geneva, Patrick | University of Delaware |
Merrill, Nathaniel | University of Delaware |
Yang, Yulin | University of Delaware |
Chen, Chuchu | University of Delaware |
Lee, Woosik | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: Localization, Sensor Fusion, Calibration and Identification
Abstract: Aiming for a lightweight and robust localization solution for low-cost, low-power autonomous robot platforms, such as educational or industrial ground vehicles, under challenging conditions (e.g., poor sensor calibration, low lighting and dynamic objects), we propose a two-stage localization system which incorporates both offline prior map building and online multi-modal localization. In particular, we develop an occupancy grid mapping system with probabilistic odometry fusion, accurate scan-to-submap covariance modeling, and accelerated loop-closure detection, which is further aided by 2D line features that exploit the environmental structural constraints. We then develop a versatile EKF-based online localization system which optimally (up to linearization) fuses multi-modal information provided by the pre-built occupancy grid map, IMU, odometry, and 2D LiDAR measurements with low computational requirements. Importantly, the spatiotemporal calibration between these sensors is also estimated online to account for poor initial calibration and make the system more "plug-and-play", which improves both the accuracy and flexibility of the proposed multi-sensor fusion framework. In our experiments, our mapping system is shown to be more accurate than the state-of-the-art Google Cartographer. Then, extensive Monte-Carlo simulations are performed to verify the accuracy, consistency and efficiency of the proposed map-based localization system with full spatiotemporal calibration. We also validate the complete system (prior map building and online localization) with building-scale real-world datasets.
|
|
TuAT9 |
Room T9 |
Localization: Other Modalities II |
Regular session |
Chair: Westerlund, Tomi | University of Turku |
Co-Chair: Pang, Shuo | Embry-Riddle Aeronautical University |
|
10:00-10:15, Paper TuAT9.1 | |
>UWB-Based System for UAV Localization in GNSS-Denied Environments: Characterization and Dataset |
|
Peña Queralta, Jorge | University of Turku |
Martinez Almansa, Carmen | University of Turku |
Schiano, Fabrizio | Ecole Polytechnique Federale De Lausanne, EPFL |
Floreano, Dario | Ecole Polytechnique Federal, Lausanne |
Westerlund, Tomi | University of Turku |
Keywords: Localization, Aerial Systems: Perception and Autonomy, Search and Rescue Robots
Abstract: Small unmanned aerial vehicles (UAVs) have penetrated multiple domains over the past years. In GNSS-denied or indoor environments, aerial robots require a robust and stable localization system, often with external feedback, in order to fly safely. Motion capture systems are typically utilized indoors when accurate localization is needed. However, these systems are expensive and most require a fixed setup. In this paper, we study and characterize an ultra-wideband (UWB) system for navigation and localization of aerial robots indoors based on Decawave's DWM1001 UWB node. The system is portable, inexpensive and can be fully battery-powered. We show the viability of this system for autonomous flight of UAVs, and provide open-source methods and data that enable its widespread application even with movable anchor systems. We characterize the accuracy based on the position of the UAV with respect to the anchors, its altitude and speed, and the distribution of the anchors in space. Finally, we analyze the accuracy of the self-calibration of the anchors' positions.
|
|
10:15-10:30, Paper TuAT9.2 | |
>Ultra-Wideband Aided UAV Positioning Using Incremental Smoothing with Ranges and Multilateration |
> Video Attachment
|
|
Kang, Jungwon | York University |
Park, Kunwoo | York University |
Arjmandi, Zahra | York University |
Sohn, Gunho | York University |
Shahbazi, Mozhdeh | Centre De Géomatique Du Québec |
Menard, Patrick | CGQ |
Keywords: Localization, Aerial Systems: Applications, Sensor Fusion
Abstract: In this paper, we present a novel smoothing approach for ultra-wideband (UWB) aided unmanned aerial vehicle (UAV) positioning. Existing works based on smoothing or filtering estimate the 3D position of the UAV by updating the solution for each single low-dimensional (1D) UWB range measurement. However, a single low-dimensional range measurement merely acts as a weak constraint on the solution space for UAV position estimation, and thus can often lead to incorrect estimates in unfavorable conditions. Inspired by the idea that the multilateration outcome can be utilized as a measurement providing a strong constraint, we utilize two types of UWB-based measurements: (i) each single 1D range as a high-rate measurement with a weak constraint, and (ii) the multilateration outcome as a low-rate measurement with a strong constraint. We propose an incremental smoothing-based method that seamlessly integrates these two types of UWB-based measurements and inertial measurements into a unified factor graph framework. Through experiments under a variety of scenarios, we demonstrate the effectiveness of the proposed method.
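The "strong constraint" the authors build on, a multilateration fix from several simultaneous ranges, can be computed with a small nonlinear least-squares solve. The following hedged sketch (anchor layout and noise levels are invented for illustration) uses SciPy rather than the paper's factor graph machinery.

```python
import numpy as np
from scipy.optimize import least_squares

def multilaterate(anchors, ranges, x0=None):
    """Estimate a 3D position from ranges to known anchor positions."""
    def residuals(p):
        return np.linalg.norm(anchors - p, axis=1) - ranges
    if x0 is None:
        x0 = anchors.mean(axis=0)            # crude but serviceable initial guess
    return least_squares(residuals, x0).x

anchors = np.array([[0., 0., 0.], [5., 0., 0.], [0., 5., 0.], [0., 0., 3.]])
truth = np.array([2.0, 3.0, 1.5])
ranges = np.linalg.norm(anchors - truth, axis=1)
ranges += np.random.default_rng(1).normal(0.0, 0.03, len(ranges))  # UWB noise
print(multilaterate(anchors, ranges))        # close to [2.0, 3.0, 1.5]
```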
|
|
10:30-10:45, Paper TuAT9.3 | |
>BRM Localization: UAV Localization in GNSS-Denied Environments Based on Matching of Numerical Map and UAV Images |
|
Choi, Junho | KAIST |
Myung, Hyun | KAIST (Korea Adv. Inst. Sci. & Tech.) |
Keywords: Localization, Visual-Based Navigation, Autonomous Vehicle Navigation
Abstract: Localization is one of the most important technologies needed to use Unmanned Aerial Vehicles (UAVs) in actual fields. Currently, most UAVs use GNSS to estimate their position. Recently, there have been attacks that target the weaknesses of UAVs that use GNSS, such as interrupting the GNSS signal to crash a UAV or sending fake GNSS signals to hijack it. To avoid this kind of situation, this paper proposes an algorithm that deals with the localization problem of a UAV in GNSS-denied environments. We propose a localization method, named BRM (Building Ratio Map based) localization, that matches an existing numerical map with UAV images. The building area is extracted from the UAV images. The ratio of the corresponding image frame occupied by buildings is calculated and matched against the building information on the numerical map. Position estimation starts over an area of several square kilometers, so it can be performed without knowing the exact initial coordinates. Only freely available maps are used for the training data set and for ground-truth matching. Finally, we use real UAV images, IMU data, and GNSS data from UAV flights to show that the proposed method achieves better performance than conventional methods.
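To make the matching idea concrete: each map cell can precompute the building ratio a UAV would observe there, and a query frame's segmented building ratio is compared against that table. The sketch below is a hedged toy version; the cell size, tolerance, and placeholder map are assumptions, and the real method is considerably richer than single-value matching.

```python
import numpy as np

def building_ratio(mask: np.ndarray) -> float:
    """Fraction of the frame segmented as 'building' (boolean mask)."""
    return float(mask.mean())

def candidate_cells(query_ratio, ratio_map, tol=0.02):
    """Map cells whose precomputed building ratio matches the query ratio."""
    return np.argwhere(np.abs(ratio_map - query_ratio) < tol)

ratio_map = np.random.default_rng(2).random((100, 100))  # placeholder map data
seg = np.zeros((240, 320), dtype=bool)
seg[:120, :] = True                          # toy segmentation: top half building
print(building_ratio(seg))                   # 0.5
print(len(candidate_cells(0.5, ratio_map)))  # number of matching map cells
```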
|
|
10:45-11:00, Paper TuAT9.4 | |
>Inertial Velocity Estimation for Indoor Navigation through Magnetic Gradient-Based EKF and LSTM Learning Model |
|
Zmitri, Makia | CNRS/GIPSA-Lab |
Fourati, Hassen | GIPSA-Lab / University of Grenoble |
Prieur, Christophe | CNRS |
Keywords: Localization, Sensor Fusion, AI-Based Methods
Abstract: This paper presents a novel method to improve the inertial velocity estimation of a mobile body, for indoor navigation, using solely raw data from a triad of inertial sensors (accelerometer and gyroscope), as well as an array of magnetometers in a determined arrangement. The key idea of the method is the use of deep neural networks to dynamically tune the measurement covariance matrix of an Extended Kalman Filter (EKF). To do so, a Long Short-Term Memory (LSTM) model is derived to determine a pseudo-measurement of the inertial velocity of the target under investigation. This measurement is afterwards used to dynamically adapt the measurement noise parameters of a magnetic field gradient-based EKF. As shown in the literature, there is a strong relation between inertial velocity and the magnetic field gradient, which is highlighted with the approach proposed in this paper. Its performance is tested on the OpenShoe dataset, and the obtained results compete with the INS/ZUPT approach, which, unlike the proposed solution, can only be applied to foot-mounted applications and is not adequate for all walking paces.
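The mechanism of letting a learned model tune the filter is easy to isolate: the LSTM's role reduces to rescaling the measurement noise covariance R before a standard EKF update. Here is a hedged sketch with the network replaced by a stub; the scaling rule and dimensions are illustrative assumptions.

```python
import numpy as np

def ekf_update(x, P, z, H, R):
    """Standard (extended) Kalman filter measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

def adaptive_R(base_R, confidence):
    """Inflate the measurement covariance when the learned model is unsure.
    In the paper this adaptation comes from an LSTM; here it is a stub."""
    return base_R / max(confidence, 1e-3)

x, P = np.zeros(3), np.eye(3)                # inertial velocity state
H, base_R = np.eye(3), 0.01 * np.eye(3)
z = np.array([0.10, -0.05, 0.02])            # pseudo-measurement of velocity
x, P = ekf_update(x, P, z, H, adaptive_R(base_R, confidence=0.8))
print(x)
```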
|
|
11:00-11:15, Paper TuAT9.5 | |
>An Implementation of the Adaptive Neuro-Fuzzy Inference System (ANFIS) for Odor Source Localization |
|
Wang, Lingxiao | Embry-Riddle Aeronautical University |
Pang, Shuo | Embry-Riddle Aeronautical University |
Keywords: Neural and Fuzzy Control, AI-Based Methods, Autonomous Vehicle Navigation
Abstract: In this paper, we investigate the viability of implementing machine learning (ML) algorithms to solve the odor source localization (OSL) problem. The primary objective is to obtain an ML model that guides and navigates a mobile robot to find an odor source without explicitly programmed search algorithms. To achieve this goal, the model of an adaptive neuro-fuzzy inference system (ANFIS) is employed to generate the olfactory-based navigation strategy. To train the ANFIS model, multiple training data sets are acquired by applying two traditional olfactory-based navigation methods, namely the moth-inspired and Bayesian-inference methods, in hundreds of simulated OSL tests with different environments. After training with the hybrid-learning algorithm, the ANFIS model is validated in multiple OSL tests with varying search conditions. Experiment results show that the ANFIS model can imitate other olfactory-based navigation methods and correctly locate the odor source. Moreover, when trained with the fused training data set, the ANFIS model outperforms the two traditional navigation methods in terms of average search time.
|
|
TuAT10 |
Room T10 |
Visual Localization I |
Regular session |
Chair: Huang, Guoquan (Paul) | University of Delaware |
Co-Chair: Stiller, Christoph | Karlsruhe Institute of Technology |
|
10:00-10:15, Paper TuAT10.1 | |
>Visual-Inertial-Wheel Odometry with Online Calibration |
> Video Attachment
|
|
Lee, Woosik | University of Delaware |
Eckenhoff, Kevin | University of Delaware |
Yang, Yulin | University of Delaware |
Geneva, Patrick | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: Localization, Calibration and Identification, Wheeled Robots
Abstract: In this paper, we introduce a novel visual-inertial-wheel odometry (VIWO) system for ground vehicles, which efficiently fuses multi-modal visual, inertial and 2D wheel odometry measurements in a sliding-window filtering fashion. As multi-sensor fusion requires both intrinsic and extrinsic (spatiotemporal) calibration parameters which may vary over time during terrain navigation, we propose to perform VIWO along with online sensor calibration of the wheel encoders' intrinsic and extrinsic parameters. To this end, we analytically derive the 2D wheel odometry measurement model from the raw wheel encoder readings and optimally fuse this 2D relative motion information with 3D visual-inertial measurements. Additionally, an observability analysis is performed for the linearized VIWO system, which identifies five commonly-seen degenerate motions for the wheel calibration parameters. The proposed system has been validated extensively in both Monte-Carlo simulations and real-world experiments in large-scale urban driving scenarios.
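As background to the wheel odometry model mentioned in the abstract, a minimal differential-drive version is sketched below; the intrinsics (ticks per revolution, wheel radius, track width) are exactly the kind of parameters the paper calibrates online. This is a generic textbook model under a constant-velocity assumption, not the authors' derivation.

```python
import numpy as np

def wheel_odometry(ticks_l, ticks_r, ticks_per_rev, wheel_radius, track_width):
    """2D relative motion (dx, dy, dtheta) from incremental encoder ticks,
    assuming a differential-drive base moving along a circular arc."""
    d_l = 2 * np.pi * wheel_radius * ticks_l / ticks_per_rev
    d_r = 2 * np.pi * wheel_radius * ticks_r / ticks_per_rev
    d = 0.5 * (d_l + d_r)                    # distance travelled by the base
    dtheta = (d_r - d_l) / track_width       # heading change
    if abs(dtheta) < 1e-9:                   # straight-line limit
        return d, 0.0, 0.0
    r = d / dtheta                           # arc radius
    return r * np.sin(dtheta), r * (1.0 - np.cos(dtheta)), dtheta

print(wheel_odometry(100, 110, ticks_per_rev=2048,
                     wheel_radius=0.05, track_width=0.30))
```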
|
|
10:15-10:30, Paper TuAT10.2 | |
>Active Perception for Outdoor Localisation with an Omnidirectional Camera |
> Video Attachment
|
|
Jayasuriya, Maleen | University of Technology Sydney |
Ranasinghe, Ravindra | University of Technology Sydney |
Dissanayake, Gamini | University of Technology Sydney |
Keywords: Localization, Omnidirectional Vision, Autonomous Vehicle Navigation
Abstract: This paper presents a novel localisation framework based on an omnidirectional camera, targeted at outdoor urban environments. Bearing-only information on persistent and easily observable high-level semantic landmarks (such as lamp posts, street signs and trees) is perceived using a Convolutional Neural Network (CNN). The framework utilises an information-theoretic strategy to decide the best viewpoint to serve as input to the CNN, instead of the full 360 degree coverage offered by an omnidirectional camera, in order to leverage the advantage of a higher field of view without compromising performance. Environmental landmark observations are supplemented with observations of ground surface boundaries corresponding to high-level features such as manhole covers, pavement edges and lane markings extracted from a second CNN. Localisation is carried out in an Extended Kalman Filter (EKF) framework using a sparse 2D map of the environmental landmarks and a Vector Distance Transform (VDT) based representation of the ground surface boundaries. This is in contrast to traditional vision-only localisation systems that have to carry out Visual Odometry (VO) or Simultaneous Localisation and Mapping (SLAM), since low-level features (such as SIFT, SURF, ORB) do not persist over long time frames due to radical appearance changes (illumination, occlusions, etc.) and dynamic objects. As the proposed framework relies on high-level persistent semantic features of the environment, it offers an opportunity to carry out localisation on a prebuilt map, which is significantly more resource-efficient and robust. Experiments using a Personal Mobility Device (PMD) driven in a representative urban environment are presented to demonstrate and evaluate the effectiveness of the proposed localiser against relevant state-of-the-art techniques.
|
|
10:30-10:45, Paper TuAT10.3 | |
>Ground Texture Based Localization: Do We Need to Detect Keypoints? |
|
Schmid, Jan Fabian | Robert Bosch GmbH; Goethe University Frankfurt |
Simon, Stephan F. | Robert Bosch GmbH |
Mester, Rudolf | NTNU Trondheim |
Keywords: Localization, Mapping, SLAM
Abstract: Localization using ground texture images recorded with a downward-facing camera is a promising approach to achieve reliable high-accuracy vehicle positioning. A common way to accomplish the task is to focus on prominent features of the ground texture such as stones and cracks. Our results indicate that, with an approximately known camera pose, it is sufficient to use arbitrary ground regions, i.e., extracting features at random positions, without significant loss in localization performance. Additionally, we propose a real-time capable CPU-only localization method based on this idea, and suggest possible improvements for further research.
|
|
10:45-11:00, Paper TuAT10.4 | |
>Vision Global Localization with Semantic Segmentation and Interest Feature Points |
|
Li, Kai | Alibaba Group |
Zhang, Xudong | OPPO |
Li, Kun | Alibaba Group |
Zhang, Shuo | Alibaba Group |
Keywords: Localization, Computer Vision for Other Robotic Applications, Visual Tracking
Abstract: In this work, we present a vision-only global localization architecture for autonomous vehicle applications that achieves centimeter-level accuracy and high robustness in various scenarios. We first apply pixel-wise segmentation to a front-view mono camera and extract semantic features, e.g., pole-like objects, lane markings, and curbs, which are robust to lighting conditions, viewing angles and seasonal changes. For scenes without enough semantic information, we extract interest feature points on static background, such as the ground surface and buildings, assisted by our semantic segmentation. We create the visual global map with a semantic feature map layer extracted from a LiDAR point-cloud semantic map and a point feature map layer built with fixed-pose structure from motion. A lumped Levenberg-Marquardt optimization solver is then applied to minimize the cost from the two types of observations. We further evaluate the accuracy and robustness of our method with road tests on Alibaba's autonomous delivery vehicles in multiple scenarios, as well as on a KAIST urban dataset.
|
|
11:00-11:15, Paper TuAT10.5 | |
>Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences |
> Video Attachment
|
|
Yu, Huai | Carnegie Mellon University; Wuhan University |
Zhen, Weikun | Carnegie Mellon University |
Yang, Wen | Wuhan University |
Zhang, Ji | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Localization, Sensor Fusion
Abstract: Lightweight camera localization in existing maps is essential for vision-based navigation. Currently, visual and visual-inertial odometry (VO&VIO) techniques are well developed for state estimation but suffer from inevitable accumulated drift and pose jumps upon loop closure. To overcome these problems, we propose an efficient monocular camera localization method in prior LiDAR maps using direct 2D-3D line correspondences. To handle the appearance differences and modality gaps between LiDAR point clouds and images, geometric 3D lines are extracted offline from LiDAR maps while robust 2D lines are extracted online from video sequences. With the pose prediction from VIO, we can efficiently obtain coarse 2D-3D line correspondences. Then the camera poses and 2D-3D correspondences are iteratively optimized by minimizing the projection error of the correspondences and rejecting outliers. Experimental results on the EurocMav dataset and our collected dataset demonstrate that the proposed method can efficiently estimate camera poses without accumulated drift or pose jumps in structured environments.
|
|
11:15-11:30, Paper TuAT10.6 | |
>Monocular Localization in HD Maps by Combining Semantic Segmentation and Distance Transform |
> Video Attachment
|
|
Pauls, Jan-Hendrik | Karlsruhe Institute of Technology (KIT) |
Petek, Kürsat | Karlsruher Institut Für Technologie (KIT) |
Poggenhans, Fabian | FZI Research Center for Information Technology |
Stiller, Christoph | Karlsruhe Institute of Technology |
Keywords: Localization, SLAM, Intelligent Transportation Systems
Abstract: Easy, yet robust long-term localization is still an open topic in research. Existing approaches require either dense maps, expensive sensors, specialized map features or proprietary detectors. We propose using semantic segmentation on a monocular camera to localize directly in an HD map as used for automated driving. This combines lightweight, yet powerful HD maps with the simplicity of monocular vision and the flexibility of neural networks. The major challenges arising from this combination are data association and robustness against misdetections. Association is solved efficiently by applying a distance transform to binary per-class images. This provides not only a fast lookup table with a smooth gradient, as needed for pose-graph optimization, but also dynamic association by default. A sliding-window pose graph optimization combines single-image detections with vehicle odometry, smoothing results and helping overcome even misclassifications in consecutive frames. Evaluation against a highly accurate 6D visual localization shows that our approach can achieve accuracy levels as required for automated driving, making it one of the most lightweight and flexible methods to do so.
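The association trick is simple enough to demonstrate directly: a distance transform of the per-class detection image turns "distance to the nearest detected pixel" into an O(1) lookup with a smooth cost surface. The sketch below uses SciPy's Euclidean distance transform; the image sizes and the lane-marking toy data are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary per-class image: True where a 'lane marking' pixel was detected.
detections = np.zeros((120, 160), dtype=bool)
detections[60, 20:140] = True

# Distance transform of the complement: each pixel stores its distance to the
# nearest detection, i.e. a smooth data-association cost surface.
dist = distance_transform_edt(~detections)

# Projecting an HD-map element into the image and scoring it is a lookup:
row, col = 55, 80                            # projected pixel of a map lane point
print(dist[row, col])                        # 5.0 pixels to the nearest detection
```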
|
|
TuAT11 |
Room T11 |
Visual Localization II |
Regular session |
Chair: Forbes, James Richard | McGill University |
Co-Chair: Stachniss, Cyrill | University of Bonn |
|
10:00-10:15, Paper TuAT11.1 | |
>Learning an Overlap-Based Observation Model for 3D LiDAR Localization |
|
Chen, Xieyuanli | University of Bonn |
Läbe, Thomas | University of Bonn |
Nardi, Lorenzo | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Localization, SLAM
Abstract: Localization is a crucial capability for mobile robots and autonomous cars. In this paper, we address learning an observation model for Monte-Carlo localization using 3D LiDAR data. We propose a novel, neural network-based observation model that computes the expected overlap of two 3D LiDAR scans. The model predicts the overlap and yaw angle offset between the current sensor reading and virtual frames generated from a pre-built map. We integrate this observation model into a Monte-Carlo localization framework and test it on urban datasets collected with a car in different seasons. The experiments presented in this paper illustrate that our method can reliably localize a vehicle in typical urban environments. We furthermore provide comparisons to a beam-endpoint and a histogram-based method, indicating a superior global localization performance of our method with fewer particles.
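For orientation, the learned model slots into Monte-Carlo localization as the per-particle likelihood. The hedged sketch below shows that wiring with the network and the map renderer replaced by stubs; everything named here is a placeholder, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(3)

def predicted_overlap(scan, virtual_frame):
    """Stub for the learned network: it would return the predicted overlap in
    [0, 1] between the current scan and a map-rendered virtual frame."""
    return rng.random()

def mcl_observation_update(particles, weights, scan, render):
    """Weight each particle by the predicted overlap at its pose, renormalize."""
    for i, pose in enumerate(particles):
        weights[i] *= predicted_overlap(scan, render(pose))
    return weights / weights.sum()

particles = [np.array([x, 0.0, 0.0]) for x in range(5)]   # (x, y, yaw) poses
weights = np.ones(5) / 5
weights = mcl_observation_update(particles, weights, scan=None,
                                 render=lambda pose: None)
print(weights)
```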
|
|
10:15-10:30, Paper TuAT11.2 | |
>Global Localization Over 2D Floor Plans with Free-Space Density Based on Depth Information |
|
Maffei, Renan | Federal University of Rio Grande Do Sul |
Pittol, Diego | Federal University of Rio Grande Do Sul |
Mantelli, Mathias Fassini | Federal University of Rio Grande Do Sul |
Prestes, Edson | UFRGS |
Kolberg, Mariana | UFRGS |
Keywords: Localization, RGB-D Perception
Abstract: Many applications with mobile robots require self-localization in indoor maps. While such maps can be previously generated by SLAM strategies, there are various localization approaches that use 2D floor plans as reference input. In this paper, we present a localization strategy using floor plan as map, which is based on spatial density information computed from dense depth data of RGB-D cameras. We propose an interval-based model, called Interval Free-Space Density, that bounds the uncertainty of observations and minimizes the effects of movable objects in the environment. Our model was applied in a Monte Carlo Localization strategy and compared with traditional observation models. The results of experiments showed the robustness of the proposed method in single-camera and multi-camera experiments in home environments.
|
|
10:30-10:45, Paper TuAT11.3 | |
>A Point Cloud Registration Pipeline Using Gaussian Process Regression for Bathymetric SLAM |
|
Hitchcox, Thomas | McGill University |
Forbes, James Richard | McGill University |
Keywords: SLAM, Marine Robotics, Visual-Based Navigation
Abstract: Point cloud registration is a means of achieving loop closure correction within a simultaneous localization and mapping (SLAM) algorithm. Data association is a critical component of point cloud registration, and can be very challenging in feature-depleted environments such as the seabed. This paper presents a point cloud registration pipeline for performing loop closure correction in feature-depleted subsea environments using data collected from an optical scanner. The pipeline uses Gaussian process regression to extract keypoint sets, and a weighted network alignment algorithm to propose point correspondences. A variant of the iterative closest point (ICP) registration algorithm is used to perform fine alignment, with point correspondences informed by the mappings determined following the network alignment step. The developed registration pipeline is successfully deployed on a challenging section of field data containing topography that cannot be resolved using conventional imaging sonar.
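To illustrate the first pipeline stage, the hedged sketch below fits a Gaussian process depth surface z = f(x, y) to noisy bathymetric points with scikit-learn; the kernel choice, hyperparameters, and synthetic seabed are assumptions, and keypoint extraction from the regressed surface is only hinted at in a comment.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
xy = rng.uniform(0.0, 10.0, size=(200, 2))          # scan footprint (x, y)
z = np.sin(xy[:, 0]) * np.cos(0.5 * xy[:, 1]) + rng.normal(0, 0.05, 200)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)
                               + WhiteKernel(noise_level=0.05**2))
gpr.fit(xy, z)

# Keypoints could then be proposed where the regressed surface is distinctive,
# e.g. at local extrema or high-curvature points of the predicted depth.
z_pred, z_std = gpr.predict(np.array([[5.0, 5.0]]), return_std=True)
print(z_pred, z_std)
```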
|
|
10:45-11:00, Paper TuAT11.4 | |
>A Robust Multi-Stereo Visual-Inertial Odometry Pipeline |
|
Jaekel, Joshua | Carnegie Mellon University |
Mangelson, Joshua | Brigham Young University |
Scherer, Sebastian | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: Localization, SLAM, Visual-Based Navigation
Abstract: In this paper we present a novel multi-stereo visual-inertial odometry (VIO) framework which aims to improve the robustness of a robot's state estimate during aggressive motion and in visually challenging environments. Our system uses a fixed-lag smoother which jointly optimizes for poses and landmarks across all stereo pairs. We propose a 1-point RANdom SAmple Consensus (RANSAC) algorithm which is able to perform outlier rejection across features from all stereo pairs. To handle the problem of noisy extrinsics, we account for uncertainty in the calibration of each stereo pair and model it in both our front-end and back-end. The result is a VIO system which is able to maintain an accurate state estimate under conditions that have typically proven to be challenging for traditional state-of-the-art VIO systems. We demonstrate the benefits of our proposed multi-stereo algorithm by evaluating it with both simulated and real world data. We show that our proposed algorithm is able to maintain a state estimate in scenarios where traditional VIO algorithms fail.
|
|
11:00-11:15, Paper TuAT11.5 | |
>Globally Optimal Consensus Maximization for Robust Visual Inertial Localization in Point and Line Map |
|
Jiao, Yanmei | Zhejiang University |
Wang, Yue | Zhejiang University |
Fu, Bo | Zhejiang University, the State Key Laboratory of Industrial Cont |
Tan, Qimeng | Beijing Institute of Spacecraft System Engineering |
Chen, Lei | Beijing Institute of Spacecraft System Engineering |
Wang, Minhang | Huawei |
Huang, Shoudong | University of Technology, Sydney |
Xiong, Rong | Zhejiang University |
Keywords: Localization, Sensor Fusion
Abstract: Map-based visual inertial localization is a crucial step to reduce the drift in state estimation of mobile robots. The underlying problem for localization is to estimate the pose from a set of 3D-2D feature correspondences, of which the main challenge is the presence of outliers, especially in changing environments. In this paper, we propose a robust solution based on efficient global optimization of the consensus maximization problem, which is insensitive to a high percentage of outliers. We first introduce translation invariant measurements (TIMs) for both points and lines to decouple the consensus maximization problem into rotation and translation subproblems, allowing for a two-stage solver with reduced solution dimensions. Then we show that (i) the rotation can be calculated by minimizing TIMs using only 1-dimensional branch-and-bound (BnB), and (ii) the translation can be found by running a 1-dimensional search three times with prioritized progressive voting. Compared with the popular randomized solver, our solver achieves deterministic global convergence without depending on an initial value, while compared with existing BnB-based methods, ours is exponentially faster. Finally, by evaluating the performance on both simulation and real-world datasets, our approach gives an accurate pose even when there are 90% outliers (only 2 inliers).
|
|
11:15-11:30, Paper TuAT11.6 | |
>The Invariant Rauch-Tung-Striebel Smoother |
|
van der Laan, Niels | Delft University of Technology |
Cohen, Mitchell | McGill University |
Arsenault, Jonathan | McGill University |
Forbes, James Richard | McGill University |
Keywords: Localization, Autonomous Vehicle Navigation, Sensor Fusion
Abstract: This paper presents an invariant Rauch-Tung-Striebel (IRTS) smoother applicable to systems with states that are an element of a matrix Lie group. In particular, the extended Rauch-Tung-Striebel (RTS) smoother is adapted to work within a matrix Lie group framework. The main advantage of the invariant RTS (IRTS) smoother is that the linearization of the process and measurement models is independent of the state estimate resulting in state-estimate-independent Jacobians when certain technical requirements are met. A sample problem is considered that involves estimation of the three dimensional pose of a rigid body on SE(3), along with sensor biases. The multiplicative RTS (MRTS) smoother is also reviewed and is used as a direct comparison to the proposed IRTS smoother using experimental data. Both smoothing methods are also compared to invariant and multiplicative versions of the Gauss-Newton approach to solving the batch state estimation problem.
|
|
TuAT12 |
Room T12 |
Visual Localization III |
Regular session |
Chair: Barfoot, Timothy | University of Toronto |
Co-Chair: Oishi, Shuji | National Institute of Advanced Industrial Science and Technology (AIST) |
|
10:00-10:15, Paper TuAT12.1 | |
>C*: Cross-Modal Simultaneous Tracking and Rendering for 6-DoF Monocular Camera Localization Beyond Modalities |
> Video Attachment
|
|
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Kawamata, Yasunori | Toyohashi University of Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Instisute of Advanced Industrial Science and Technology |
Miura, Jun | Toyohashi University of Technology |
Keywords: Localization, Visual Tracking, Multi-Modal Perception
Abstract: We present a monocular camera localization technique for a three-dimensional prior map. Visual localization has been attracting considerable attention as a lightweight and widely available localization technique for any mobile platform; however, it still suffers from appearance changes and a high computational cost. With a view to achieving robust and real-time visual localization, we first reduce the localization problem to alternating local tracking and occasional keyframe rendering, following a simultaneous tracking and rendering algorithm. At the same time, by using an information-theoretic metric known as the normalized information distance in the local tracking, we develop a 6-DoF localization method robust to intensity variations between modalities and varying sensor properties. We quantitatively evaluated the accuracy and robustness of our method using both synthetic and real datasets and achieved reliable and practical localization even in the case of extreme appearance changes.
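The normalized information distance at the heart of the tracker is computable from a joint intensity histogram, which also shows why it tolerates cross-modality intensity remappings. The sketch below is a straightforward textbook implementation; the bin count and test images are assumptions.

```python
import numpy as np

def nid(a, b, bins=32):
    """Normalized information distance (H(A,B) - I(A;B)) / H(A,B):
    0 for perfectly dependent images, ~1 for unrelated ones."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    h_ab = -np.sum(p[nz] * np.log(p[nz]))
    h_a = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
    h_b = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    return (h_ab - (h_a + h_b - h_ab)) / h_ab

rng = np.random.default_rng(5)
img = rng.random((64, 64))
print(nid(img, img))                    # ~0: identical content
print(nid(img, 1.0 - img))              # ~0: inverted intensities still match
print(nid(img, rng.random((64, 64))))   # ~1: unrelated content
```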
|
|
10:15-10:30, Paper TuAT12.2 | |
>Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation |
|
Brossard, Martin | Mines ParisTech |
Bonnabel, Silvere | Mines ParisTech |
Barrau, Axel | Safran |
Keywords: Localization, Calibration and Identification
Abstract: This paper proposes a learning method for denoising gyroscopes of Inertial Measurement Units (IMUs) using ground truth data, and for estimating in real time the orientation (attitude) of a robot in dead reckoning. The obtained algorithm outperforms the state-of-the-art on the (unseen) test sequences. This performance is achieved thanks to a well-chosen model, a proper loss function for orientation increments, and through the identification of key points when training with high-frequency inertial data. Our approach builds upon a neural network based on dilated convolutions, without requiring any recurrent neural network. We demonstrate how efficient our strategy is for 3D attitude estimation on the EuRoC and TUM-VI datasets. Interestingly, we observe that our dead reckoning algorithm manages to beat top-ranked visual-inertial odometry systems in terms of attitude estimation although it does not use vision sensors. We believe this paper offers new perspectives for visual-inertial localization and constitutes a step toward more efficient learning methods involving IMUs. Our open-source implementation is available at https://github.com/mbrossar/denoise-imu-gyro.
|
|
10:30-10:45, Paper TuAT12.3 | |
>Variational Inference with Parameter Learning Applied to Vehicle Trajectory Estimation |
|
Wong, Jeremy Nathan | University of Toronto |
Yoon, David Juny | University of Toronto |
Schoellig, Angela P. | University of Toronto |
Barfoot, Timothy | University of Toronto |
Keywords: Localization, SLAM, Sensor Fusion
Abstract: We present parameter learning in a Gaussian variational inference setting using only noisy measurements (i.e., no groundtruth). This is demonstrated in the context of vehicle trajectory estimation, although the method we propose is general. The paper extends the Exactly Sparse Gaussian Variational Inference (ESGVI) framework, which has previously been used for large-scale nonlinear batch state estimation. Our contribution is to additionally learn parameters of our system models (which may be difficult to choose in practice) within the ESGVI framework. In this paper, we learn the covariances for the motion and sensor models used within vehicle trajectory estimation. Specifically, we learn the parameters of a white-noise-on-acceleration motion model and the parameters of an Inverse-Wishart prior over measurement covariances for our sensor model. We demonstrate our technique using a 36 km dataset in which a car uses lidar to localize against a high-definition map; we learn the parameters on a training section of the data and then show that we achieve high-quality state estimates on a test section, even in the presence of outliers. Lastly, we show that our framework can be used to solve pose graph optimization even with many false loop closures.
|
|
10:45-11:00, Paper TuAT12.4 | |
>Time-Relative RTK-GNSS: GNSS Loop Closure in Pose Graph Optimization |
|
Suzuki, Taro | Chiba Institute of Technology |
Keywords: Localization, Sensor Fusion, SLAM
Abstract: A pose-graph-based optimization technique is widely used to estimate robot poses using various sensor measurements from devices such as laser scanners and cameras. The global navigation satellite system (GNSS) has recently been used to estimate the absolute 3D position of outdoor mobile robots. However, since the accuracy of GNSS single-point positioning is only a few meters, the GNSS is not used for the loop closure of a pose graph. The main purpose of this study is to generate a loop closure of a pose graph using a time-relative real-time kinematic GNSS (TR-RTK-GNSS) technique. The proposed TR-RTK-GNSS technique uses time-differential carrier phase positioning, which is based on carrier-phase-based differential GNSS with a single GNSS receiver. Unlike a conventional RTK-GNSS, we can directly compute the robot's relative position using only a stand-alone GNSS receiver. The initial pose graph is generated from the accumulated velocity computed from GNSS Doppler measurements. To reduce the accumulated error of velocity, we use the TR-RTK-GNSS technique for the loop closure in the graph-based optimization framework. The kinematic positioning tests were performed using an unmanned aerial vehicle to confirm the effectiveness of the proposed technique. From the tests, we can estimate the vehicle's trajectory with approximately 3 cm accuracy using only a stand-alone GNSS receiver.
|
|
11:00-11:15, Paper TuAT12.5 | |
>Improving Visual SLAM in Car-Navigated Urban Environments with Appearance Maps |
|
Jaenal, Alberto | University of Malaga |
Zuñiga-Noël, David | University of Malaga |
Gomez-Ojeda, Ruben | University of Málaga |
Gonzalez-Jimenez, Javier | University of Malaga |
Keywords: Localization, Recognition, Visual-Based Navigation
Abstract: This paper describes a method that corrects errors of a VSLAM-estimated trajectory for cars driving in GPS-denied environments, by applying constraints from public databases of geo-tagged images (Google Street View, Mapillary, etc.). The method, dubbed Appearance-based Geo-Alignment for Simultaneous Localisation and Mapping (AGA-SLAM), encodes the available image database as an appearance map, which represents the space with a compact holistic descriptor for each image plus its associated geo-tag. The VSLAM trajectory is corrected on-line by incorporating constraints from the recognized places along the trajectory into a position-based optimization framework. The paper presents a seamless formulation to combine local and absolute metric observations with associations from Visual Place Recognition. The robustness of the holistic image descriptor to changes due to weather or illumination variations ensures a long-term consistent method to improve car localization. The proposed method has been extensively evaluated on more than 70 sequences from 4 different datasets, demonstrating its effectiveness and robustness to appearance challenges.
|
|
11:15-11:30, Paper TuAT12.6 | |
>ROVINS: Robust Omnidirectional Visual Inertial Navigation System |
> Video Attachment
|
|
Seok, Hochang | Hanyang University |
Lim, Jongwoo | Hanyang University |
Keywords: SLAM, Visual-Based Navigation, Omnidirectional Vision
Abstract: Visual odometry is an essential component in robot navigation and autonomous driving. However, visual sensors are vulnerable to fast motion and sudden illumination changes. To compensate for these weaknesses, inertial measurement units (IMUs) can be used to maintain the short-term motion estimate when visual sensing is unstable, and to enhance the quality of the estimated motion with inertial information. Recently, ROVO (omnidirectional visual odometry) demonstrated superior performance and stability due to unceasing feature observation in the omnidirectional setup. However, it still has the shortcomings of visual odometry. In this paper, we propose an omnidirectional visual-inertial odometry system which seamlessly integrates inertial information into the omnidirectional visual odometry algorithm. First, soft relative pose constraints from inertial measurements are added to the pose optimization formulation, which enables blind motion estimation when all visual features are lost. Second, by initializing the visual features in tracking using predictions from the estimated velocity, feature tracking becomes more robust to visual disturbances. The experimental results show that the proposed visual-inertial algorithm outperforms the vision-only algorithm by significant margins.
|
|
TuAT13 |
Room T13 |
Mapping |
Regular session |
Chair: Olson, Edwin | University of Michigan |
Co-Chair: Zheng, Nanning | Xi'an Jiaotong University |
|
10:00-10:15, Paper TuAT13.1 | |
>CoBigICP: Robust and Precise Point Set Registration Using Correntropy Metrics and Bidirectional Correspondence |
> Video Attachment
|
|
Yin, Pengyu | Xi'an Jiaotong University |
Wang, Di | Xi'an Jiaotong University |
Du, Shaoyi | Xi'an Jiaotong University |
Ying, Shihui | School of Science, ShanghaiUniversity |
Gao, Yue | Tsinghua University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Probability and Statistical Methods, Mapping, Localization
Abstract: In this paper, we propose a novel probabilistic variant of the iterative closest point (ICP) algorithm dubbed CoBigICP. The method leverages both local geometrical information and global noise characteristics. Locally, the 3D structure of both the target and source clouds is incorporated into the objective function through bidirectional correspondence. Globally, the correntropy error metric is introduced as the noise model to resist outliers. Importantly, a close resemblance between the normal-distributions transform (NDT) and correntropy is revealed. To ease the minimization step, an on-manifold parameterization of the special Euclidean group is proposed. Extensive experiments validate that CoBigICP outperforms several well-known and state-of-the-art methods.
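The correntropy metric deserves a concrete look: it amounts to Gaussian-kernel weights on residuals, so gross outliers contribute almost nothing to an alignment step. Below is a hedged one-iteration sketch combining those weights with a weighted closed-form rigid alignment; the kernel bandwidth and synthetic clouds are assumptions, and CoBigICP's bidirectional correspondences are omitted.

```python
import numpy as np

def correntropy_weights(residuals, sigma=0.5):
    """Gaussian-kernel weights: ~1 for inliers, ->0 for gross outliers."""
    return np.exp(-np.sum(residuals**2, axis=1) / (2.0 * sigma**2))

def weighted_alignment(src, dst, w):
    """Closed-form weighted rigid alignment (Kabsch) for matched 3D points."""
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst
    H = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

rng = np.random.default_rng(6)
src = rng.random((100, 3))
dst = src + np.array([0.3, -0.1, 0.2])       # pure translation, R = I
dst[:5] += 5.0                               # five gross outliers
w = correntropy_weights(dst - src)           # residuals under the identity guess
R, t = weighted_alignment(src, dst, w)
print(np.round(t, 3))                        # ~[0.3, -0.1, 0.2] despite outliers
```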
|
|
10:15-10:30, Paper TuAT13.2 | |
>The Masked Mapper: Masked Metric Mapping |
|
Haggenmiller, Acshi | University of Michigan |
Kabacinski, Cameron | University of Michigan |
Krogius, Maximilian | University of Michigan |
Olson, Edwin | University of Michigan |
Keywords: SLAM, Mapping, Localization
Abstract: In this paper, we propose a flexible mapping scheme that uses a masking function (mask) to focus the attention of a pose graph SLAM (Simultaneous Localization and Mapping) system. The masking function takes the robot's observations and returns true if the robot is in an important location. State-of-the-art methods in SLAM generate dense metric lidar maps, creating precise maps at a high computational cost by storing lidar scans for each pose node and continually attempting to close loops. In many cases, trying to always make loop closures is unnecessary for localization and even risky because of perceptual aliasing and false positives. By masking out these less useful positions, our method can create more accurate maps despite performing far fewer scan matches. We evaluate our system with three simple mask functions on a 2.5 km trajectory with significant angular drift. We compare the number of scan matches performed under each mask as well as the accuracy of the loop closures.
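Since the mask is just a predicate on observations, a minimal example makes the interface clear. The sketch below gates pose-node creation on scan richness; the threshold and the density criterion are invented for illustration, not one of the paper's three masks.

```python
import numpy as np

def feature_density_mask(scan_points: np.ndarray, min_points: int = 200) -> bool:
    """Toy mask: mark a location 'important' (worth storing a scan and
    attempting loop closures) only if the scan is feature-rich enough."""
    return len(scan_points) >= min_points

kept = []
for scan in [np.zeros((500, 2)), np.zeros((50, 2)), np.zeros((300, 2))]:
    if feature_density_mask(scan):           # only masked-in poses keep scans
        kept.append(scan)
print(len(kept))                             # 2 of the 3 scans retained
```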
|
|
10:30-10:45, Paper TuAT13.3 | |
>Allocating Limited Sensing Resources to Accurately Map Dynamic Environments |
> Video Attachment
|
|
Mitchell, Derek | Carnegie Mellon University |
Michael, Nathan | Carnegie Mellon University |
Keywords: Energy and Environment-Aware Automation, Environment Monitoring and Management, Mapping
Abstract: This work addresses the problem of learning a model of a dynamic environment using many independent Hidden Markov Models (HMMs) with a limited number of observations available per iteration. Many techniques exist to model dynamic environments, but do not consider how to deploy robots to build this model. Additionally, there are many techniques for exploring environments that do not consider how to prioritize regions when resources, in terms of robots to deploy and deployment durations, are limited. Here, we consider an environment model consisting of a series of HMMs that evolve over time independently and can be directly observed. At each iteration, we must determine which HMMs to observe in order to maximize the gain in model accuracy. We present a utility measure that balances a Pearson's chi-squared goodness-of-fit of the dynamics model with Mutual Information (MI) to ensure that observations are allocated to maximize the convergence rate of all HMMs, resulting in a faster convergence to higher steady-state model confidence and accuracy than either chi-squared or MI alone.
|
|
10:45-11:00, Paper TuAT13.4 | |
>Adaptive Kernel Inference for Dense and Sharp Occupancy Grids |
> Video Attachment
|
|
Kwon, Youngsun | KAIST |
Moon, Bochang | Gwangju Institute of Science and Technology |
Yoon, Sung-eui | KAIST |
Keywords: Mapping, SLAM
Abstract: In this paper, we present a new approach, AKIMap, that uses an adaptive kernel inference for dense and sharp occupancy grid representations. Our approach is based on the multivariate kernel estimation, and we propose a simple, two-stage based method that selects an adaptive bandwidth matrix for an efficient and accurate occupancy estimation. To utilize correlations of occupancy observations given sparse and non-uniform distributions of point samples, we propose to use the covariance matrix as an initial bandwidth matrix, and then optimize the bandwidth matrix by adjusting its scale in an efficient, data-driven way for on-the-fly mapping. We demonstrate that the proposed technique estimates occupancy states more accurately than state-of-the-art methods given equal-data or equal-time settings, thanks to our adaptive inference. Furthermore, we show the practical benefits of the proposed work in on-the-fly mapping and observe that our adaptive approach shows the dense as well as sharp occupancy representations in a real environment.
|
|
11:00-11:15, Paper TuAT13.5 | |
>Object-Based Pose Graph for Dynamic Indoor Environments |
> Video Attachment
|
|
Gomez, Clara | University Carlos III of Madrid |
Hernandez Silva, Alejandra Carolina | University Carlos III of Madrid |
Derner, Erik | Czech Technical University in Prague |
Barber, Ramon | Universidad Carlos III of Madrid |
Babuska, Robert | Delft University of Technology |
Keywords: Mapping, Dynamics, Service Robotics
Abstract: Relying on static representations of the environment limits the use of mapping methods in most real-world tasks. Real-world environments are dynamic and undergo changes that need to be handled through map adaptation. In this work, an object-based pose graph is proposed to solve the problem of mapping in indoor dynamic environments with mobile robots. In contrast to state-of-the-art methods, where binary classifications between movable and static objects are used, we propose a new method to capture the probability of different objects over time. Object probability represents how likely it is to find a specific object in its previous location, and it gives a quantification of how movable specific objects are. In addition, grouping object probabilities according to object class allows us to evaluate the movability of different object classes. We validate our object-based pose graph in real-world dynamic environments. Results in mapping and map adaptation with a real robot show efficient map maintenance through several mapping sessions, and results in object classification according to movability show an improvement compared to binary classification.
|
|
11:15-11:30, Paper TuAT13.6 | |
>UFOMap: An Efficient Probabilistic 3D Mapping Framework That Embraces the Unknown |
|
Duberg, Daniel | KTH - Royal Institute of Technology |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Mapping, RGB-D Perception, Motion and Path Planning
Abstract: 3D models are an essential part of many robotic applications. In applications where the environment is unknown a priori, or where only a part of the environment is known, it is important that the 3D model can handle the unknown space efficiently. Path planning, exploration, and reconstruction all fall into this category. In this paper we present an extension to OctoMap which we call UFOMap. UFOMap uses an explicit representation of all three states in the map, i.e., unknown, free, and occupied. This gives, surprisingly, a more memory-efficient representation. We provide methods that allow for significantly faster insertions into the octree. Furthermore, UFOMap supports fast queries based on occupancy state using so-called indicators, and based on location by exploiting the octree structure and bounding volumes. This enables real-time colored octree mapping at high resolution (below 1 cm). UFOMap is contributed as a C++ library that can be used standalone but is also integrated into ROS.
|
|
TuAT14 |
Room T14 |
Mapping for Navigation |
Regular session |
Chair: Gawel, Abel Roman | ETH Zurich |
Co-Chair: Bertrand, Sylvain | Institute for Human and Machine Cognition |
|
10:00-10:15, Paper TuAT14.1 | |
>Detecting Usable Planar Regions for Legged Robot Locomotion |
|
Bertrand, Sylvain | Institute for Human and Machine Cognition |
Lee, Inho | IHMC |
Mishra, Bhavyansh | Institute of Human and Machine Cognition, University of West Flo |
Calvert, Duncan | IHMC |
Pratt, Jerry | Inst. for Human and Machine Cognition |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Keywords: Mapping, Legged Robots, Visual-Based Navigation
Abstract: Awareness of the environment is essential for mobile robots. Perception for legged robots requires high levels of reliability and accuracy in order to walk stably in the types of complex, cluttered environments we are interested in. In this paper, we present a usable environmental perception algorithm designed to detect steppable areas and obstacles for the autonomous generation of desired footholds for legged robots. To produce an efficient representation of the environment, the proposed perception algorithm is designed to cluster point cloud data into planar regions composed of convex polygons. We describe in this paper the end-to-end pipeline from data collection to generation of the regions, where we first compose an octree in order to create a more efficient data representation. We then group the leaves of the tree into planar regions using a nearest-neighbor search; each region is composed of the concave hull of its points, which is decomposed into convex polygons. We present a variety of environments, and illustrate the usability of this approach with the Atlas humanoid robot walking over rough terrain. We also discuss various challenges we faced and insights we gained in the development of this approach.
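A minimal building block for such a pipeline is the plane fit used to decide whether a clustered neighborhood is flat enough to be steppable. The PCA-based sketch below is a generic least-squares fit under that assumption, not IHMC's implementation; the flatness score and synthetic patch are illustrative.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane via PCA: returns (normal, centroid, flatness),
    where flatness ~ standard deviation of points along the normal."""
    centroid = points.mean(axis=0)
    _, s, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]                          # direction of least variance
    return normal, centroid, s[-1] / np.sqrt(len(points))

rng = np.random.default_rng(7)
patch = np.column_stack([rng.uniform(0, 1, 500),
                         rng.uniform(0, 1, 500),
                         rng.normal(0, 0.002, 500)])   # nearly flat, z ~ 0
n, c, flatness = fit_plane(patch)
print(np.round(np.abs(n), 3), round(flatness, 4))      # normal ~ [0 0 1], ~0.002
```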
|
|
10:15-10:30, Paper TuAT14.2 | |
>Accurate Mapping and Planning for Autonomous Racing |
> Video Attachment
|
|
Andresen, Leiv | ETH Zurich, Autonomous Systems Lab |
Brandemuehl, Adrian | ETH Zurich, Autonomous Systems Lab |
Hönger, Alex | ETH Zurich, Autonomous Systems Lab |
Kuan, Benson | ETH Zurich |
Vödisch, Niclas | ETH Zurich, Autonomous Systems Lab |
Blum, Hermann | ETH Zurich |
Reijgwart, Victor | ETH Zurich |
Bernreiter, Lukas | ETH Zurich, Autonomous Systems Lab |
Schaupp, Lukas | ETH Zurich |
Chung, Jen Jen | Eidgenössische Technische Hochschule Zürich |
Bürki, Mathias | Autonomous Systems Lab, ETH Zuerich |
Oswald, Martin R. | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Gawel, Abel Roman | ETH Zurich |
Keywords: Mapping, Motion and Path Planning, Sensor Fusion
Abstract: This paper presents the perception, mapping, and planning pipeline implemented on an autonomous race car. It was developed by the 2019 AMZ driverless team for the Formula Student Germany (FSG) 2019 driverless competition, where it won 1st place overall. The presented solution combines early fusion of camera and LiDAR data, a layered mapping approach, and a planning approach that uses Bayesian filtering to achieve high-speed driving on unknown race tracks while creating accurate maps. We benchmark the method against our team’s previous solution, which won FSG 2018, and show improved accuracy when driving at the same speeds. Furthermore, the new pipeline makes it possible to reliably raise the maximum driving speed in unknown environments from 3 m/s to 12 m/s while still mapping with an acceptable RMSE of 0.29 m.
|
|
10:30-10:45, Paper TuAT14.3 | |
>Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach |
|
Chawla, Hemang | Navinfo Europe |
Jukola, Matti | Navinfo EU |
Brouns, Terence | NavInfo Europe |
Arani, Elahe | Navinfo Europe |
Zonooz, Bahram | Navinfo Europe |
Keywords: Mapping, SLAM, Deep Learning for Visual Perception
Abstract: The ability to efficiently utilize crowd-sourced visual data carries immense potential for the domains of large scale dynamic mapping and autonomous driving. However, state-of-the-art methods for crowdsourced 3D mapping assume prior knowledge of camera intrinsics. In this work we propose a framework that estimates the 3D positions of semantically meaningful landmarks such as traffic signs without assuming known camera intrinsics, using only monocular color camera and GPS. We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning, and show that combining their strengths is important for increasing the map coverage. To facilitate research on this task, we construct and make available a KITTI based 3D traffic sign ground truth positioning dataset. Using our proposed framework, we achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m respectively, on this dataset.
|
|
10:45-11:00, Paper TuAT14.4 | |
>Efficient Multiresolution Scrolling Grid for Stereo Vision-Based MAV Obstacle Avoidance |
|
Dexheimer, Eric | Carnegie Mellon University |
Mangelson, Joshua | Brigham Young University |
Scherer, Sebastian | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: Mapping, Aerial Systems: Perception and Autonomy, Collision Avoidance
Abstract: Fast, aerial navigation in cluttered environments requires a suitable map representation for path planning. In this paper, we propose the use of an efficient, structured multiresolution representation that expands the sensor range of dense local grids for memory-constrained platforms. While similar data structures have been proposed, we avoid processing redundant occupancy information and use the organization of the grid to improve efficiency. By layering 3D circular buffers that double in resolution at each level, obstacles near the robot are represented at finer resolutions while coarse spatial information is maintained at greater distances. We also introduce a novel method for efficiently calculating the Euclidean distance transform on the multiresolution grid by leveraging its structure. Lastly, we utilize our proposed framework to demonstrate improved stereo camera-based MAV obstacle avoidance with an optimization-based planner in simulation.
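The indexing trick behind a scrolling ("rolling") circular buffer can be sketched in a few lines of Python (names and parameters are illustrative, not the paper's code): world coordinates map to buffer cells with a modulo, so re-centering the grid on the robot moves no memory, and each layer doubles the cell size to cover more range with the same cell count.

import numpy as np

class ScrollingLayer:
    def __init__(self, cells=64, resolution=0.1):
        self.n = cells
        self.res = resolution
        self.data = np.zeros((cells, cells, cells), dtype=np.float32)

    def index(self, xyz):
        """World point -> buffer index; the modulo wraps the buffer.
        A real implementation also clears cells that wrap out of range
        as the robot moves, to avoid reading stale occupancy."""
        cell = np.floor(np.asarray(xyz) / self.res).astype(int)
        return tuple(cell % self.n)

layers = [ScrollingLayer(64, 0.1 * 2**k) for k in range(4)]   # 0.1 m .. 0.8 m cells
p = (3.27, -1.4, 0.9)
for layer in layers:
    layer.data[layer.index(p)] = 1.0   # mark the point at every resolution level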
|
|
11:00-11:15, Paper TuAT14.5 | |
>DenseFusion: Large-Scale Online Dense Pointcloud and DSM Mapping for UAVs |
> Video Attachment
|
|
Chen, Lin | Northwestern Polytechnical University |
Zhao, Yong | Northwestern Polytechnic University |
Xu, Shibiao | Institute of Automation, Chinese Academy of Sciences |
Bu, Shuhui | Northwestern Polytechnical University |
Han, Pengcheng | Northwestern Polytechnical University |
Wan, Gang | Information Engineering University |
Keywords: Mapping, SLAM, Localization
Abstract: With the rapidly developing unmanned aerial vehicles, the requirements of generating maps efficiently and quickly are increasing. To realize online mapping, we develop a real-time dense mapping framework named DenseFusion which can incrementally generates dense geo-referenced 3D point cloud, digital orthophoto map (DOM) and digital surface model (DSM) from sequential aerial images with optional GPS information. The proposed method works in real-time on standard CPUs even for processing high resolution images. Based on the advanced monocular SLAM, our system first estimates appropriate camera poses and extracts effective keyframes, and next constructs virtual stereo-pair from consecutive frame to generate pruned dense 3D point clouds; then a novel real-time DSM fusion method is proposed which can incrementally process dense point cloud. Finally, a high efficiency visualization system is developed to adopt dynamic levels of detail method, which makes it render dense point cloud and DSM smoothly. The performance of the proposed method is evaluated through qualitative and quantitative experiments. The results indicate that compared to traditional structure from motion based approaches, the presented framework is able to output both large-scale high-quality DOM and DSM in real-time with low computational cost.
|
|
TuAT15 |
Room T15 |
Search and Mapping |
Regular session |
Chair: Ayanian, Nora | University of Southern California |
|
10:00-10:15, Paper TuAT15.1 | |
>Sampling-Based Search for a Semi-Cooperative Target |
|
Vandermeulen, Isaac | iRobot Corporation
Gross, Roderich | The University of Sheffield |
Kolling, Andreas | Amazon |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: Searching for a lost teammate is an important task for multirobot systems. We present a variant of rapidly-exploring random trees (RRT) for generating search paths based on a probabilistic belief of the target teammate’s position. The belief is updated using a hidden Markov model built from knowledge of the target’s planned or historic behavior. For any candidate search path, this belief is used to compute a discounted reward, which is a weighted sum of the connection probability at each time step. The RRT search algorithm uses randomly sampled locations to generate candidate vertices and adds candidate vertices to a planning tree based on bounds on the discounted reward. Candidate vertices lie along the shortest path from an existing vertex to the sampled location, biasing the search based on the topology of the environment. This method produces high-quality search paths which are not constrained to a grid and can be computed quickly enough to be used in real time. Compared with two other strategies, it found the target significantly faster in the most difficult 60% of situations and performed similarly in the easier 40%.
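A minimal Python sketch of the discounted reward (the connection model and all names are illustrative assumptions, not the paper's code): a weighted sum, over time steps, of the probability that the searcher connects with the target under the current belief.

def discounted_reward(path, belief, connect_prob, gamma=0.95):
    """path: searcher vertex per time step; belief[t]: dict v -> P(target at v at t);
    connect_prob(u, v): probability of connecting when searcher is at u, target at v."""
    return sum((gamma ** t) * sum(b * connect_prob(u, v) for v, b in belief[t].items())
               for t, u in enumerate(path))

belief = [{0: 0.5, 1: 0.5}, {1: 0.7, 2: 0.3}]      # HMM-predicted target belief
connect = lambda u, v: 0.9 if u == v else 0.0      # same-vertex detection only
print(discounted_reward([1, 1], belief, connect))  # 0.45 + 0.95 * 0.63 = 1.0485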
|
|
10:15-10:30, Paper TuAT15.2 | |
>Mixed-Integer Linear Programming Models for Multi-Robot Non-Adversarial Search |
|
Arruda Asfora, Beatriz | Cornell University |
Banfi, Jacopo | Cornell University |
Campbell, Mark | Cornell University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Search and Rescue Robots
Abstract: In this letter, we consider the Multi-Robot Efficient Search Path Planning (MESPP) problem, where a team of robots is deployed in a graph-represented environment to capture a moving target within a given deadline. We prove this problem to be NP-hard, and present the first set of Mixed-Integer Linear Programming (MILP) models to tackle the MESPP problem. Our models are the first to simultaneously encompass multiple searchers, arbitrary capture ranges, and false negatives. While state-of-the-art algorithms for MESPP are based on simple path enumeration, the adoption of MILP as a planning paradigm allows us to leverage the powerful techniques of modern solvers, yielding better computational performance and, as a consequence, longer planning horizons. The models are designed for computing optimal solutions offline, but can be easily adapted to a distributed online approach. Our simulations show that it is possible to achieve a 98% decrease in computational time relative to the previous state of the art. We also show that the distributed approach performs nearly as well as the centralized one, within 6% in the settings studied in this letter, with the advantage of requiring significantly less time – an important consideration in practical search missions.
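As a flavor of the MILP paradigm, here is a drastically simplified toy model in Python (PuLP with its bundled CBC solver assumed installed; this omits the paper's target motion, capture ranges, and false negatives): one searcher moves on a graph for T steps while maximizing the summed reward of the vertices it visits.

import pulp

V = [0, 1, 2, 3]
E = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}  # successors, incl. waiting
T = 4
reward = {(v, t): (1.0 if v == 3 else 0.1) for v in V for t in range(T)}

prob = pulp.LpProblem("toy_search", pulp.LpMaximize)
x = {(v, t): pulp.LpVariable(f"x_{v}_{t}", cat="Binary") for v in V for t in range(T)}

prob += pulp.lpSum(reward[v, t] * x[v, t] for v in V for t in range(T))
prob += x[0, 0] == 1                                   # start vertex
for t in range(T):
    prob += pulp.lpSum(x[v, t] for v in V) == 1        # one location per step
for t in range(T - 1):
    for v in V:
        # only reachable at t+1 from a vertex whose successor set contains v
        prob += x[v, t + 1] <= pulp.lpSum(x[u, t] for u in V if v in E[u])

prob.solve()
print([v for t in range(T) for v in V if x[v, t].value() == 1])  # e.g., [0, 1, 2, 3]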
|
|
10:30-10:45, Paper TuAT15.3 | |
>Decentralised Self-Organising Maps for Multi-Robot Information Gathering |
|
Best, Graeme | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Keywords: Multi-Robot Systems, Planning, Scheduling and Coordination, Environment Monitoring and Management
Abstract: This paper presents a new coordination algorithm for decentralised multi-robot information gathering. We consider planning for an online variant of the multi-agent orienteering problem with neighbourhoods. This formulation closely aligns with a number of important tasks in robotics, including inspection, surveillance, and reconnaissance. We propose a decentralised variant of the self-organising map (SOM) learning procedure, named Dec-SOM, which efficiently plans sequences of waypoints for a team of robots. Decentralisation is achieved by performing a distributed allocation scheme jointly with a series of SOM adaptations. We also offer an efficient heuristic to select when to perform negotiations, which reduces communication resource usage. Simulation results in two settings, including an infrastructure inspection scenario with a real-world dataset of oil rigs, demonstrate that Dec-SOM outperforms baseline methods and other SOM variants, is competitive with centralised SOM, and is a viable solution for decentralised information gathering.
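The SOM adaptation step at the heart of such planners is compact enough to sketch (parameters and names are illustrative, not Dec-SOM itself): the node closest to a sampled goal (the "winner") and its ring neighbours are pulled toward the goal, with strength decaying along the ring.

import numpy as np

def som_adapt(nodes, goal, lr=0.6, radius=2):
    """nodes: (N, 2) ordered waypoints forming a route; goal: (2,) target."""
    d = np.linalg.norm(nodes - goal, axis=1)
    w = int(np.argmin(d))                              # winner index
    for offset in range(-radius, radius + 1):
        i = (w + offset) % len(nodes)                  # ring topology
        strength = lr * np.exp(-(offset ** 2) / (2.0 * (radius / 2.0) ** 2))
        nodes[i] += strength * (goal - nodes[i])       # pull toward the goal
    return nodes

route = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(som_adapt(route, np.array([1.5, 1.0])))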
|
|
10:45-11:00, Paper TuAT15.4 | |
>Asynchronous Adaptive Sampling and Reduced-Order Modeling of Dynamic Processes by Robot Teams Via Intermittently Connected Networks |
> Video Attachment
|
|
Rovina, Hannes Kaspar | Swiss Federal Institute of Technology Lausanne, EPFL |
Salam, Tahiya | University of Pennsylvania
Kantaros, Yiannis | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Keywords: Distributed Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Environment Monitoring and Management
Abstract: This work presents an asynchronous multi-robot adaptive sampling strategy through the synthesis of an intermittently connected mobile robot communication network. The objective is to enable a team of robots to adaptively sample and model a nonlinear dynamic spatiotemporal process. By employing an intermittently connected communication network, the team is not required to maintain an all-time-connected network, enabling it to cover larger areas, especially when the team size is small. The approach first determines the next meeting locations for data exchange, and as the robots move towards these predetermined locations, they take measurements along the way. The data is then shared with other team members at the designated meeting locations, and a reduced-order model (ROM) of the process is obtained in a distributed fashion. The ROM is used to estimate field values in areas without sensor measurements, which informs the path planning algorithm when determining a new meeting location for the team. The main contribution of this work is an intermittent communication framework for asynchronous adaptive sampling of dynamic spatiotemporal processes. We demonstrate the framework in simulation and compare different reduced-order models under full (all-time) and intermittent connectivity.
|
|
11:00-11:15, Paper TuAT15.5 | |
>Inter-Robot Range Measurements in Pose Graph Optimization |
> Video Attachment
|
|
Boroson, Elizabeth | University of Southern California |
Hewitt, Robert | Jet Propulsion Laboratory |
Ayanian, Nora | University of Southern California |
de la Croix, Jean-Pierre | Jet Propulsion Laboratory, California Institute of Technology |
Keywords: SLAM, Multi-Robot Systems, Field Robots
Abstract: For multiple robots performing exploration in a previously unmapped environment, such as planetary exploration, maintaining accurate localization and building a consistent map are vital. If the robots do not have a map to localize against and do not explore the same area, they may not be able to find visual loop closures to constrain their relative poses, making traditional SLAM impossible. This paper presents a method for using UWB ranging sensors in multi-robot SLAM, which allows the robots to localize and build a map together even without visual loop closures. The ranging measurements are added to the pose graph as edges and used in optimization to estimate the robots’ relative poses. This method builds a map using all robots’ observations that is consistent and usable. It performs similarly to visual loop closures when they are available, and provides a good map when they are not, which other methods cannot do. The method is demonstrated on PUFFER robots, developed for autonomous planetary exploration, in an unstructured environment.
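How a range measurement enters the pose graph can be sketched as a single extra cost term (Python with numpy; a real system would hand this residual to a solver such as GTSAM or Ceres, and names here are illustrative): the residual compares the measured UWB range with the distance between the two estimated robot positions.

import numpy as np

def range_residual(p_i, p_j, r_meas, sigma=0.1):
    """p_i, p_j: estimated 3D positions of the two robots at measurement time."""
    r_pred = np.linalg.norm(p_i - p_j)
    return (r_pred - r_meas) / sigma     # whitened scalar residual for the edge

print(range_residual(np.array([0.0, 0, 0]), np.array([3.0, 4, 0]), 5.2))  # -2.0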
|
|
11:15-11:30, Paper TuAT15.6 | |
>An Approach to Reduce Communication for Multi-Agent Mapping Applications |
|
Kepler, Michael | Virginia Polytechnic Institute and State University |
Stilwell, Daniel | Virginia Tech |
Keywords: Multi-Robot Systems, Mapping, Distributed Robot Systems
Abstract: In the context of a multi-agent system that uses a Gaussian process to estimate a spatial field of interest, we propose an approach that enables an agent to reduce the amount of data it shares with other agents. The main idea of the strategy is to rigorously assign a novelty metric to each measurement as it is collected, and only measurements that are sufficiently novel are communicated. We consider the ideal scenario where an agent can instantly share novel measurements, and we also consider the more practical scenario in which communication suffers from low bandwidth and is range-limited. In this scenario, an agent can only broadcast an informative subset of the novel measurements when it encounters other agents. We explore three different informativeness criteria for subset selection, namely entropy, mutual information, and a new criterion that reflects the value of a measurement. We apply our approach to three real-world datasets relevant to robotic mapping. The empirical findings show that an agent can reduce the number of communicated measurements by two orders of magnitude and that the new criterion for subset selection yields superior predictive performance relative to entropy and mutual information.
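One simple realization of a novelty test (the letter's actual metric may differ; scikit-learn assumed, threshold invented) is to communicate a measurement only if the current GP is still uncertain at its location:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0], [1.0], [2.0]])      # locations already incorporated
y = np.array([0.1, 0.4, 0.2])            # field values there
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              optimizer=None).fit(X, y)   # keep the kernel fixed

def is_novel(x, threshold=0.05):
    _, std = gp.predict(np.array([[x]]), return_std=True)
    return std[0] > threshold            # still uncertain here -> communicate it

print(is_novel(1.1), is_novel(5.0))      # near data: False; far from data: True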
|
|
TuAT16 |
Room T16 |
Sensor Fusion for Localization and Mapping |
Regular session |
Chair: Weiss, Stephan | Universität Klagenfurt |
Co-Chair: Min, Byung-Cheol | Purdue University |
|
10:00-10:15, Paper TuAT16.1 | |
>Pi-Map: A Decision-Based Sensor Fusion with Global Optimization for Indoor Mapping |
> Video Attachment
|
|
Yang, Zhiliu | Clarkson University |
Yu, Bo | PerceptIn |
Hu, Wei | PerceptIn Inc |
Tang, Jie | South China University of Technology |
Liu, Shaoshan | PerceptIn |
Liu, Chen | Clarkson University |
Keywords: Mapping, Sensor Fusion
Abstract: In this paper, we propose Pi-Map, an affordable, reliable, and scalable indoor mapping system for autonomous robot navigation. First, we assign the sensors to localization and mapping roles according to their precision: only LiDAR range data is used for global pose estimation with loop closure, while both LiDAR and sonar are used for map registration in a Bayesian-filter fashion. Then, a tightly-coupled decision-based sensor fusion is performed through trajectory revisiting and ray casting. A trajectory fitting mechanism is also introduced to handle the node density mismatch between different sensors. The whole system uses only economical off-the-shelf sensors for map construction. Our experimental results quantitatively demonstrate the effectiveness of the proposed method, which is able to produce high-quality maps in both small-scale and large-scale real-world environments.
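The standard machinery for Bayesian-filter map registration with two range sensors is a per-cell log-odds update, each sensor with its own inverse sensor model. A minimal sketch in Python (the weights are invented for illustration; real systems also ray-cast free space along each beam):

import numpy as np

LOGODDS_HIT = {"lidar": 1.2, "sonar": 0.4}    # trust LiDAR hits more
LOGODDS_MISS = {"lidar": -0.8, "sonar": -0.2}

grid = np.zeros((100, 100))                   # log-odds, 0 means p(occ) = 0.5

def update_cell(grid, ij, sensor, hit):
    grid[ij] += LOGODDS_HIT[sensor] if hit else LOGODDS_MISS[sensor]

def occupancy(grid, ij):
    return 1.0 - 1.0 / (1.0 + np.exp(grid[ij]))

update_cell(grid, (10, 12), "lidar", hit=True)
update_cell(grid, (10, 12), "sonar", hit=True)
print(round(occupancy(grid, (10, 12)), 3))    # fused occupancy, approx. 0.832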
|
|
10:15-10:30, Paper TuAT16.2 | |
>MOZARD: Multi-Modal Localization for Autonomous Vehicles in Urban Outdoor Environments |
> Video Attachment
|
|
Schaupp, Lukas | ETH Zurich |
Pfreundschuh, Patrick | ETH Zurich |
Bürki, Mathias | Autonomous Systems Lab, ETH Zuerich |
Cadena Lerma, Cesar | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Nieto, Juan | ETH Zürich |
Keywords: Sensor Fusion, Mapping, Localization
Abstract: Visually poor scenarios are one of the main sources of failure for visual localization systems in outdoor environments. To address this challenge, we present MOZARD, a multi-modal localization system for urban outdoor environments using vision and LiDAR. By extending our preexisting keypoint-based visual multi-session local localization approach with the use of semantic data, an improved localization recall can be achieved across vastly different appearance conditions. In particular, we focus on the use of curbstone information because of its broad distribution and reliability within urban environments. We present thorough experimental evaluations over several kilometers of driving in challenging urban outdoor environments, analyze the recall and accuracy of our localization system, and demonstrate in a case study possible failure cases of each subsystem. We demonstrate that MOZARD is able to bridge scenarios where our previous work VIZARD fails, hence yielding an increased recall performance, while a similar localization accuracy of 0.2 m is achieved.
|
|
10:30-10:45, Paper TuAT16.3 | |
>Consistent Covariance Pre-Integration for Invariant Filters with Delayed Measurements |
|
Allak, Eren | Universität Klagenfurt |
Fornasier, Alessandro | University of Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Sensor Fusion, Localization, Autonomous Vehicle Navigation
Abstract: Sensor fusion systems merging (multiple) delayed sensor signals through a statistical approach are challenging setups, particularly for resource-constrained platforms. For statistical consistency, one would be required to keep an appropriate history, apply the correcting signal at the given time stamp in the past, and re-apply all information received until the present time. This re-calculation becomes impractical (the bottleneck being the re-propagation of the covariance matrices for estimator consistency) for platforms with multiple sensors/states and low compute power. This work presents a novel approach for consistent covariance pre-integration, allowing delayed sensor signals to be incorporated in a statistically consistent fashion with very low complexity. We leverage recent insights on Invariant Extended Kalman Filters (IEKF) and their log-linear, state-independent error propagation, together with insights from scattering theory, to mimic the re-calculation process as a medium through which we can propagate waves (covariance information in this case) in single operation steps. We support our findings in simulation and with real data.
|
|
10:45-11:00, Paper TuAT16.4 | |
>Synchronization of Microphones Based on Rank Minimization of Warped Spectrum for Asynchronous Distributed Recording |
|
Itoyama, Katsutoshi | Tokyo Institute of Technology |
Nakadai, Kazuhiro | Honda Research Inst. Japan Co., Ltd |
Keywords: Robot Audition, Sensor Fusion, Sensor Networks
Abstract: This paper describes a new method for synchronizing microphones based on spectral warping in an asynchronous microphone array. An audio signal observed by an asynchronous microphone array involves two factors: the time lag caused by the sampling-rate mismatch and offset between microphones, and the modulation caused by differences in the spatial transfer function between the sound source and each microphone. A spectrum warping matrix representing a resampling effect in the frequency domain is formulated, and an observation model of the audio (spectrum) mixture in an asynchronous microphone array is constructed. The proposed synchronization method uses an iterative optimization algorithm based on gradient descent of a new objective function. The function is formulated as the logarithmic determinant of a spectrum correlation matrix, derived from a relaxation of a rank minimization problem. Experimental results showed that the proposed method effectively estimates the modulated sampling rate and outperforms an existing synchronization method.
|
|
11:00-11:15, Paper TuAT16.5 | |
>Self-Supervised Neural Audio-Visual Sound Source Localization Via Probabilistic Spatial Modeling |
> Video Attachment
|
|
Masuyama, Yoshiki | Waseda University |
Bando, Yoshiaki | Kyoto University |
Yatabe, Kohei | Waseda University |
Sasaki, Yoko | National Inst. of Advanced Industrial Science and Technology |
Onishi, Masaki | National Inst. of AIST |
Oikawa, Yasuhiro | Waseda University |
Keywords: Robot Audition, Multi-Modal Perception, Sensor Fusion
Abstract: Detecting sound source objects within visual observation is important for autonomous robots to comprehend their surrounding environments. Since sounding objects in our living environments are highly varied in appearance, labeling all of them is impossible in practice. This calls for self-supervised learning, which does not require manual labeling. Most conventional self-supervised learning methods use monaural audio signals and images, and cannot distinguish sound source objects with similar appearances due to the poor spatial information in the audio signals. To solve this problem, this paper presents a self-supervised training method using 360-deg images and multichannel audio signals. By incorporating the spatial information in multichannel audio signals, our method trains deep neural networks (DNNs) to distinguish multiple sound source objects. Our system for localizing sound source objects in the image is composed of audio and visual DNNs. The visual DNN is trained to localize sound source candidates within an input image. The audio DNN verifies whether each candidate actually produces sound. These DNNs are jointly trained in a self-supervised manner based on a probabilistic spatial audio model. Experimental results with simulated data showed that the DNNs trained by our method localized multiple speakers. We also demonstrate that the visual DNN detected objects including talking visitors and specific exhibits in real data recorded in a science museum.
|
|
11:15-11:30, Paper TuAT16.6 | |
>Material Mapping in Unknown Environments Using Tapping Sound |
> Video Attachment
|
|
Kannan, Shyam Sundar | Purdue University |
Jo, Wonse | Purdue University |
Parasuraman, Ramviyas | University of Georgia |
Min, Byung-Cheol | Purdue University |
Keywords: Mapping, Multi-Modal Perception, Motion and Path Planning
Abstract: In this paper, we propose an autonomous exploration and tapping-mechanism-based material mapping system for a mobile robot in unknown environments. The goal of the proposed system is to integrate simultaneous localization and mapping (SLAM) modules and sound-based material classification to enable a mobile robot to explore an unknown environment autonomously and, at the same time, identify the various objects and materials in the environment. This creates a material map that localizes the various materials in the environment, which has potential applications for search and rescue scenarios. A tapping mechanism and tapping-audio signal processing based on machine learning techniques are exploited for the robot to identify objects and materials. We demonstrate the proposed system through experiments using a mobile robot platform equipped with a Velodyne LiDAR, a linear solenoid, and microphones in an exploration-like scenario with various materials. Experiment results demonstrate that the proposed system can create useful material maps in unknown environments.
|
|
TuAT17 |
Room T17 |
Cooperative SLAM
Regular session |
Chair: Heckman, Christoffer | University of Colorado at Boulder |
Co-Chair: Kim, Jinwhan | KAIST |
|
10:00-10:15, Paper TuAT17.1 | |
>Dense Decentralized Multi-Robot SLAM Based on Locally Consistent TSDF Submaps |
> Video Attachment
|
|
Dubois, Rodolphe | ONERA |
Eudes, Alexandre | ONERA |
Moras, Julien | ONERA |
Fremont, Vincent | Ecole Centrale De Nantes, CNRS, LS2N, UMR 6004 |
Keywords: SLAM, Multi-Robot Systems
Abstract: This article introduces a decentralized multi-robot algorithm for Simultaneous Localization And Mapping (SLAM) inspired by the work of Duhautbout et al. (2019). The method makes each robot jointly build and exchange i) a collection of 3D dense, locally consistent submaps, based on a Truncated Signed Distance Field (TSDF) representation of the environment, and ii) a pose-graph representation which encodes the relative pose constraints between the TSDF submaps and the trajectory keyframes, derived from the odometry, inter-robot observations and loop closures. Such loop closures are spotted by aligning and fusing the TSDF submaps. The performance of this method has been evaluated on the EuRoC dataset (Burri et al., 2016).
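The per-voxel TSDF update used when integrating a depth measurement into a submap is a running weighted average of truncated signed distances. A minimal Python sketch (truncation distance and weights are illustrative):

import numpy as np

TRUNC = 0.3   # truncation distance in meters

def integrate(tsdf, weight, sdf_meas, w_meas=1.0):
    """tsdf, weight: current voxel state; sdf_meas: signed distance along the ray."""
    sdf_meas = np.clip(sdf_meas, -TRUNC, TRUNC)
    new_w = weight + w_meas
    new_tsdf = (tsdf * weight + sdf_meas * w_meas) / new_w
    return new_tsdf, new_w

v, w = 0.0, 0.0
for sdf in [0.25, 0.10, 0.12]:           # three observations of the same voxel
    v, w = integrate(v, w, sdf)
print(round(v, 3), w)                    # fused distance and accumulated weight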
|
|
10:15-10:30, Paper TuAT17.2 | |
>A Decentralized Framework for Simultaneous Calibration, Localization and Mapping with Multiple LiDARs |
> Video Attachment
|
|
Lin, Jiarong | The University of Hong Kong |
Liu, Xiyuan | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Sensor Fusion, Calibration and Identification
Abstract: LiDAR is playing an increasingly essential role in autonomous driving vehicles for object detection, self-localization and mapping. A single LiDAR frequently suffers from hardware failure (e.g., temporary loss of connection) due to the harsh vehicle environment (e.g., temperature, vibration, etc.), or from performance degradation due to the lack of sufficient geometric features, especially for solid-state LiDARs with a small field of view (FoV). To improve the system robustness and performance in self-localization and mapping, we develop a decentralized framework for simultaneous calibration, localization and mapping with multiple LiDARs. Our proposed framework is based on an extended Kalman filter (EKF), but is specially formulated for decentralized implementation. Such an implementation could potentially distribute the intensive computation among smaller computing devices or resources dedicated to each LiDAR and remove the single-point-of-failure problem. This decentralized formulation is implemented on an unmanned ground vehicle (UGV) carrying 5 low-cost LiDARs and moving at 1.36 m/s in urban environments. Experiment results show that the proposed method can successfully and simultaneously estimate the vehicle state (i.e., pose and velocity) and all LiDAR extrinsic parameters. The localization accuracy is up to 0.2% on the two datasets we collected. To share our findings, contribute to the community, and enable readers to verify our work, we will release all our source code and hardware design blueprints on GitHub.
|
|
10:30-10:45, Paper TuAT17.3 | |
>Better Together: Online Probabilistic Clique Change Detection in 3D Landmark-Based Maps |
|
Bateman, Samuel | University of Colorado - Boulder |
Harlow, Kyle | University of Colorado Boulder |
Heckman, Christoffer | University of Colorado at Boulder |
Keywords: SLAM, Probability and Statistical Methods, Mapping
Abstract: Many modern simultaneous localization and mapping (SLAM) techniques rely on sparse landmark-based maps due to their real-time performance. However, these techniques frequently assert that these landmarks are fixed in position over time, known as the "static-world assumption." This is rarely, if ever, the case in most real-world environments. Even worse, over long deployments, robots are bound to observe traditionally static landmarks change, e.g., when an autonomous vehicle encounters a construction zone. This work addresses this challenge, accounting for changes in complex three-dimensional environments through a probabilistic filter that operates on the features that give rise to landmarks. To accomplish this, landmarks are clustered into cliques and a filter is developed to estimate their persistence jointly among observations of the landmarks in a clique. This filter uses estimated spatial-temporal priors of geometric objects, allowing dynamic and semi-static objects to be removed from a formerly static map. The proposed algorithm is validated in a 3D simulated environment.
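A simple Bayesian reading of joint persistence estimation for a clique (a sketch only; the detection probabilities are invented and the paper's filter additionally uses spatial-temporal priors): the probability that the clique still exists is updated from whether its member landmarks were re-observed while in view.

def clique_persistence(p, observations, p_detect=0.8, p_false=0.05):
    """observations: booleans, one per member landmark that was in view."""
    for seen in observations:
        l_exists = p_detect if seen else 1 - p_detect   # likelihood if clique persists
        l_gone = p_false if seen else 1 - p_false       # likelihood if it vanished
        p = l_exists * p / (l_exists * p + l_gone * (1 - p))
    return p

p = 0.9
print(round(clique_persistence(p, [False, False, True]), 3))  # two misses, one hit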
|
|
10:45-11:00, Paper TuAT17.4 | |
>Robust Loop Closure Method for Multi-Robot Map Fusion by Integration of Consistency and Data Similarity |
|
Do, Haggi | KAIST |
Hong, Seonghun | Keimyung University |
Kim, Jinwhan | KAIST |
Keywords: SLAM, Multi-Robot Systems
Abstract: For efficient collaboration of a multi-robot system during missions, it is essential for the system to create a global map and localize the robots in it. However, the relative poses among robots may be unknown, preventing the system from generating the reference map. In such cases, the necessary information must be inferred through inter-robot loop closures, which are mainly perception-derived measurements obtained when robots observe the same place. However, as perception-derived measurements rely on the similarity of sensor data, different places could be wrongly identified as the same location if they exhibit similar appearances. This phenomenon, called perceptual aliasing, produces inaccurate loop closures that can severely distort the global map. This study presents a robust inter-robot loop closure selection method for map fusion that utilizes the degrees of both consistency and data similarity of the loop closures for accurate measurement determination. We define the combination of this information as the measurement pair score and employ it as weights in the objective function of a combinatorial optimization problem, which can be solved as a maximum edge-weight clique problem from graph theory. The algorithm is tested on an experimental dataset for performance evaluation, and the result is discussed in comparison to a state-of-the-art method.
|
|
11:00-11:15, Paper TuAT17.5 | |
>Real-Time Multi-SLAM System for Agent Localization and 3D Mapping in Dynamic Scenarios |
> Video Attachment
|
|
Ireta Muñoz, Fernando Israel | INRIA |
Roussel, David | IBISC, UEVE, Université Paris Saclay |
Alliez, Pierre | INRIA Sophia-Antipolis |
Bonardi, Fabien | Université De Rouen |
Bouchafa, Samia | Univ d'Evry Val d'Essonne/Université Paris Saclay |
Didier, Jean-Yves | Université D'Evry |
Kachurka, Viachaslau | Universite Paris Saclay, Univ Evry |
Rault, Bastien | Innodura TB |
Hadj-Abdelkader, Hicham | IBISC |
Robin, Maxime | Innodura TB |
Keywords: Sensor Fusion, SLAM, Agent-Based Systems
Abstract: This paper introduces a wearable SLAM system that performs indoor and outdoor SLAM in real time. The related project is part of the MALIN challenge, which aims at creating a system to track emergency response agents in complex scenarios (such as dark environments, smoke-filled rooms, repetitive patterns, building floor transitions and doorway crossing problems), where GPS technology is insufficient or inoperative. The proposed system fuses different SLAM technologies to compensate for the lack of robustness of each, while estimating the pose individually. LiDAR and visual SLAM are fused with an inertial sensor in such a way that the system is able to maintain GPS coordinates that are sent via radio to a ground station for real-time tracking. More specifically, LiDAR and monocular vision technologies are tested in dynamic scenarios, where the main advantages of each have been evaluated and compared. Finally, 3D reconstruction up to three levels of detail is performed.
|
|
11:15-11:30, Paper TuAT17.6 | |
>Asynchronous and Parallel Distributed Pose Graph Optimization |
> Video Attachment
|
|
Tian, Yulun | Massachusetts Institute of Technology |
Koppel, Alec | University of Pennsylvania |
Bedi, Amrit Singh | US Army Research Lab |
How, Jonathan Patrick | Massachusetts Institute of Technology |
Keywords: SLAM, Distributed Robot Systems, Multi-Robot Systems
Abstract: We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), the first asynchronous algorithm for distributed pose graph optimization (PGO) in multi-robot simultaneous localization and mapping. By enabling robots to optimize their local trajectory estimates without synchronization, ASAPP offers resiliency against communication delays and alleviates the need to wait for stragglers in the network. Furthermore, ASAPP can be applied to the rank-restricted relaxations of PGO, a crucial class of non-convex Riemannian optimization problems that underlies recent breakthroughs on globally optimal PGO. Under bounded delay, we establish the global first-order convergence of ASAPP using a sufficiently small stepsize. The derived stepsize depends on the worst-case delay and inherent problem sparsity, and furthermore matches known results for synchronous algorithms when there is no delay. Numerical evaluations on simulated and real-world datasets demonstrate favorable performance compared to the state-of-the-art synchronous approach, and show ASAPP’s resilience against a wide range of delays in practice.
|
|
TuAT18 |
Room T18 |
Visual SLAM I |
Regular session |
Chair: Pradalier, Cedric | GeorgiaTech Lorraine |
Co-Chair: Scherer, Sebastian | Carnegie Mellon University |
|
10:00-10:15, Paper TuAT18.1 | |
>TartanAir: A Dataset to Push the Limits of Visual SLAM |
> Video Attachment
|
|
Wang, Wenshan | Carnegie Mellon University |
Zhu, Delong | The Chinese University of Hong Kong |
Wang, Xiangwei | Tongji University |
Hu, Yaoyu | Carnegie Mellon University |
Qiu, Yuheng | Carnegie Mellon University |
Wang, Chen | Carnegie Mellon University |
Hu, Yafei | Carnegie Mellon University |
Kapoor, Ashish | Microsoft
Scherer, Sebastian | Carnegie Mellon University |
Keywords: SLAM, Visual Learning, Localization
Abstract: We present a challenging dataset, TartanAir, for robot navigation tasks and more. The data is collected in photo-realistic simulation environments with moving objects, changing light and various weather conditions. By collecting data in simulation, we are able to obtain multi-modal sensor data and precise ground truth labels, such as the stereo RGB image, depth image, segmentation, optical flow, camera poses, and LiDAR point cloud. We set up large numbers of environments with various styles and scenes, covering challenging viewpoints and diverse motion patterns that are difficult to achieve with physical data collection platforms. In order to enable data collection at such a large scale, we develop an automatic pipeline, including mapping, trajectory sampling, data processing, and data verification. We evaluate the impact of various factors on visual SLAM algorithms using our data. The results of state-of-the-art algorithms reveal that the visual SLAM problem is far from solved. Methods that show good performance on established datasets such as KITTI do not perform well in more difficult scenarios. Although we use simulation, our goal is to push the limits of visual SLAM algorithms in the real world by providing a challenging benchmark for testing new methods, while also providing large, diverse training data for learning-based methods. Our dataset is available at http://theairlab.org/tartanair-dataset.
|
|
10:15-10:30, Paper TuAT18.2 | |
>From Points to Planes - Adding Planar Constraints to Monocular SLAM Factor Graphs |
> Video Attachment
|
|
Arndt, Charlotte | Robert Bosch GmbH, Corporate Sector Research and Advance Engineering
Sabzevari, Reza | Robert Bosch GmbH, Corporate Sector Research and Advance Engineering
Civera, Javier | Universidad De Zaragoza |
Keywords: SLAM, Mapping
Abstract: Planar structures are common in man-made environments. Their addition to monocular SLAM algorithms is relevant for achieving more complete and higher-level scene representations. In addition, the extra constraints they introduce can reduce estimation errors in certain situations. In this paper, we present a novel formulation to incorporate plane landmarks and planar constraints into feature-based monocular SLAM. Specifically, we enforce in-plane points to lie exactly in the plane they belong to, propagating this information to the rest of the states. Our formulation, unlike the state of the art, allows us to incorporate general planes, independently of whether depth information or CNN segmentation is available (although we could also use them). We evaluate our method on several sequences of public datasets, showing accurate plane estimates and pose accuracy on par with state-of-the-art point-only monocular SLAM.
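The in-plane constraint can be sketched as one extra residual per point-plane assignment (illustrative cost only, not the paper's code; a factor-graph solver would minimize it jointly with the reprojection terms): the residual is the point's distance from its assigned plane, driving the optimizer to keep it exactly in-plane while the plane itself is also estimated.

import numpy as np

def plane_residual(p, n, d, sigma=0.01):
    """p: 3D point; plane: {x | n.x + d = 0}, with |n| = 1."""
    return (np.dot(n, p) + d) / sigma    # whitened point-to-plane distance

n = np.array([0.0, 0.0, 1.0]); d = -1.0  # plane z = 1
print(plane_residual(np.array([0.3, 0.2, 1.02]), n, d))  # a 2-sigma violation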
|
|
10:30-10:45, Paper TuAT18.3 | |
>Robust Monocular Edge Visual Odometry through Coarse-To-Fine Data Association |
|
Wu, Xiaolong | Georgia Institute of Technology |
Vela, Patricio | Georgia Institute of Technology |
Pradalier, Cedric | GeorgiaTech Lorraine |
Keywords: SLAM, Localization, Mapping
Abstract: This work describes a monocular visual odometry framework which exploits the best attributes of edge features for illumination-robust camera tracking, while at the same time ameliorating the performance degradation of edge mapping. In the front-end, an ICP-based edge registration provides robust motion estimation and coarse data association under lighting changes. In the back-end, a novel edge-guided data association pipeline searches for the best photometrically matched points along geometrically possible edges through template matching, so that the matches can be further refined in later bundle adjustment. The core of our proposed data association strategy lies in a point-to-edge geometric uncertainty analysis, which analytically derives (1) a probabilistic search-length formula that significantly reduces the search space and (2) a geometric confidence metric for detecting mapping degradation based on the predicted depth uncertainty. Moreover, a match-confidence-based patch size adaptation strategy is integrated into our pipeline to reduce matching ambiguity. We present extensive analysis and evaluation of our proposed system on synthetic and real-world benchmark datasets under the influence of illumination changes and large camera motions, where our proposed system outperforms current state-of-the-art algorithms.
|
|
10:45-11:00, Paper TuAT18.4 | |
>SaD-SLAM: A Visual SLAM Based on Semantic and Depth Information |
|
Yuan, Xun | University of Science and Technology of China |
Chen, Song | University of Science and Technology of China |
Keywords: SLAM
Abstract: Simultaneous Localization and Mapping (SLAM) is considered significant for intelligent mobile robot autonomous pathfinding. Over the past years, many successful SLAM systems have been developed and work satisfactorily in static environments. However, in dynamic scenes containing moving objects, the camera pose estimation error can become unacceptable, or the systems may even lose track of their location. In this paper, we present SaD-SLAM, a visual SLAM system that, building on ORB-SLAM2, achieves excellent performance in dynamic environments. With the help of semantic and depth information, we find feature points that belong to movable objects and detect whether those feature points are currently static. To make the system perform accurately and robustly in dynamic scenes, we use both feature points extracted from static objects and static feature points derived from movable objects to fine-tune the camera pose estimation. We evaluate our algorithm on the TUM RGB-D datasets. The results demonstrate that the absolute trajectory accuracy of SaD-SLAM can be improved significantly compared with the original ORB-SLAM2. We also compare our algorithm with DynaSLAM and DS-SLAM, which are designed for dynamic scenes.
|
|
11:00-11:15, Paper TuAT18.5 | |
>Exploit Semantic and Public Prior Information in MonoSLAM |
|
Ye, Chenxi | University College London |
Wang, Yiduo | University of Oxford |
Lu, Ziwen | University College London |
Gilitschenski, Igor | Massachusetts Institute of Technology |
Parsley, Martin Peter | University College London |
Julier, Simon | University College London |
Keywords: SLAM, Semantic Scene Understanding, Visual-Based Navigation
Abstract: In this paper, we propose a method to use semantic information to improve the use of map priors in a sparse, feature-based MonoSLAM system. To incorporate the priors, the features in the prior and SLAM maps must be associated with one another. Most existing systems build a map using SLAM and then align it with the prior map. However, this approach assumes that the local map is accurate, and that the majority of the features within it can be constrained by the prior. We use the intuition that many prior maps are created to provide semantic information. Therefore, valid associations only exist if the features in the SLAM map arise from the same kind of semantic object as the prior map. Using this intuition, we extend ORB-SLAM2 with an open-source pre-trained semantic segmentation network (DeepLabV3+) to incorporate prior information from Open Street Map building footprint data. We show that the amount of drift, before loop closing, is significantly smaller than that of the original ORB-SLAM2. Furthermore, we show that when ORB-SLAM2 is used as a prior-aided visual odometry system, the tracking accuracy is equal to or better than that of the full ORB-SLAM2 system, without the need for global mapping or loop closure.
|
|
11:15-11:30, Paper TuAT18.6 | |
>Dual-SLAM: A Framework for Robust Single Camera Navigation |
> Video Attachment
|
|
Huang, Huajian | The Hong Kong University of Science and Technology |
Lin, Wen-Yan | Singapore Management University |
Liu, Siying | Institute for Infocomm Research, Singapore |
Zhang, Dong | Sun Yat-Sen University |
Yeung, Sai-Kit | Hong Kong University of Science and Technology |
Keywords: SLAM, Autonomous Vehicle Navigation
Abstract: SLAM (Simultaneous Localization And Mapping) seeks to provide a moving agent with real-time self-localization. To achieve real-time speed, SLAM incrementally propagates position estimates. This makes SLAM fast but also makes it vulnerable to local pose estimation failures. As local pose estimation is ill-conditioned, local pose estimation failures happen regularly, making the overall SLAM system brittle. This paper attempts to correct this problem. We note that while local pose estimation is ill-conditioned, pose estimation over longer sequences is well-conditioned. Thus, local pose estimation errors eventually manifest themselves as mapping inconsistencies. When this occurs, we save the current map and activate two new SLAM threads. One processes incoming frames to create a new map and the other, a recovery thread, backtracks to link the new and old maps together. This creates a Dual-SLAM framework that maintains real-time performance while being robust to local pose estimation failures. Evaluation on benchmark datasets shows that Dual-SLAM can reduce failures by a dramatic 88%.
|
|
TuAT19 |
Room T19 |
Visual SLAM II |
Regular session |
Chair: Tombari, Federico | Technische Universität München |
Co-Chair: Kerr, Dermot | University of Ulster |
|
10:00-10:15, Paper TuAT19.1 | |
>Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints |
> Video Attachment
|
|
Jau, You-Yi | University of California San Diego |
Zhu, Rui | University of California San Diego |
Su, Hao | UCSD |
Chandraker, Manmohan | University of California, San Diego |
Keywords: SLAM, Deep Learning for Visual Perception, Localization
Abstract: Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been the dominant choice for over a decade. Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods. In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective. We show both quantitatively and qualitatively that pose estimation performance on par with the classic pipeline may be achieved. Moreover, we show that through end-to-end training the key components of the pipeline can be significantly improved, which leads to better generalizability to unseen datasets compared with existing learning-based methods.
|
|
10:15-10:30, Paper TuAT19.2 | |
>DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features |
> Video Attachment
|
|
Li, Dongjiang | Beijing Jiaotong University |
Shi, Xuesong | Intel |
Long, Qiwei | Beijingjiaotong University |
Liu, Shenghui | Intel Corporation |
Yang, Wei | Beijing Jiaotong University, School of Electronic and Information Engineering
Wang, Fangshi | Beijing Jiaotong University |
Wei, Qi | Tsinghua University |
Qiao, Fei | Tsinghua University |
Keywords: SLAM, Localization
Abstract: A robust and efficient Simultaneous Localization and Mapping (SLAM) system is essential for robot autonomy. For visual SLAM algorithms, though the theoretical framework has been well established for most aspects, feature extraction and association is still empirically designed in most cases, and can be vulnerable in complex environments. This paper shows that feature extraction with deep convolutional neural networks (CNNs) can be seamlessly incorporated into a modern SLAM framework. The proposed SLAM system utilizes a state-of-the-art CNN to detect keypoints in each image frame, and to produce not only keypoint descriptors but also a global descriptor of the whole image. These local and global features are then used by different SLAM modules, resulting in much more robustness against environmental changes and viewpoint changes compared with hand-crafted features. We also train a visual vocabulary of local features with a Bag of Words (BoW) method. Based on the local features, global features, and the vocabulary, a highly reliable loop closure detection method is built. Experimental results show that all the proposed modules significantly outperform the baseline, and the full system achieves much lower trajectory errors and much higher correct rates on all evaluated data. Furthermore, by optimizing the CNN with the Intel OpenVINO toolkit and utilizing the Fast BoW library, the system benefits greatly from the SIMD (single-instruction-multiple-data) capabilities of modern CPUs. The full system can run in real-time without any GPU or other accelerators. The code is public at https://github.com/ivipsourcecode/dxslam.
|
|
10:30-10:45, Paper TuAT19.3 | |
>EAO-SLAM: Monocular Semi-Dense Object SLAM Based on Ensemble Data Association |
> Video Attachment
|
|
Wu, Yanmin | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Zhu, Delong | The Chinese University of Hong Kong |
Feng, Yonghui | Northeastern University |
Coleman, Sonya | University of Ulster |
Kerr, Dermot | University of Ulster |
Keywords: SLAM, Computer Vision for Automation, Perception for Grasping and Manipulation
Abstract: Object-level data association and pose estimation play a fundamental role in semantic SLAM, and remain unsolved due to the lack of robust and accurate algorithms. In this work, we propose an ensemble data association strategy that integrates parametric and nonparametric statistical tests. By exploiting the nature of different statistics, our method can effectively aggregate the information of different measurements, and thus significantly improve the robustness and accuracy of data association. We then present an accurate object pose estimation framework, in which an outlier-robust centroid and scale estimation algorithm and an object pose initialization algorithm are developed to help improve the optimality of the pose estimation results. Furthermore, we build a SLAM system that can generate semi-dense or lightweight object-oriented maps with a monocular camera. Extensive experiments are conducted on three publicly available datasets and a real scenario. The results show that our approach significantly outperforms state-of-the-art techniques in accuracy and robustness. The source code is available at https://github.com/yanmin-wu/EAO-SLAM.
|
|
10:45-11:00, Paper TuAT19.4 | |
>Dynamic Object Tracking and Masking for Visual SLAM |
> Video Attachment
|
|
Vincent, Jonathan | Université De Sherbrooke |
Labbé, Mathieu | Université De Sherbrooke |
Lauzon, Jean-Samuel | Université De Sherbrooke |
Grondin, Francois | Massachusetts Institute of Technology |
Comtois-Rivet, Pier-Marc | Institut Du Vehicule Innovant |
Michaud, Francois | Universite De Sherbrooke |
Keywords: SLAM, Mapping
Abstract: In dynamic environments, the performance of visual SLAM techniques can be impaired by visual features taken from moving objects. One solution is to identify those objects so that their visual features can be removed for localization and mapping. This paper presents a simple and fast pipeline that uses deep neural networks, extended Kalman filters and visual SLAM to improve both localization and mapping in dynamic environments (around 14 fps on a GTX 1080). Results on the dynamic sequences from the TUM dataset using RTAB-Map as visual SLAM suggest that the approach achieves localization performance similar to other state-of-the-art methods, while also providing the positions of the tracked dynamic objects, a 3D map free of those dynamic objects, and better loop closure detection, with the whole pipeline able to run on a robot moving at moderate speed.
|
|
11:00-11:15, Paper TuAT19.5 | |
>Structure-SLAM: Low-Drift Monocular SLAM in Indoor Environments |
|
Li, Yanyan | Technical University of Munich |
Brasch, Nikolas | Technical University of Munich |
Wang, Yida | Technical University of Munich |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: SLAM, Visual Tracking
Abstract: In this paper a low-drift monocular SLAM method is proposed targeting indoor scenarios, where monocular SLAM often fails due to the lack of textured surfaces. Our approach decouples rotation and translation estimation of the tracking process to reduce the long-term drift in indoor environments. In order to take full advantage of the available geometric information in the scene, surface normals are predicted by a convolutional neural network from each input RGB image in real-time. First, a drift-free rotation is estimated based on lines and surface normals using spherical mean-shift clustering, leveraging the weak Manhattan World assumption. Then translation is computed from point and line features. Finally, the estimated poses are refined with a map-to-frame optimization strategy. The proposed method outperforms the state of the art on common SLAM benchmarks such as ICL-NUIM and TUM RGB-D.
|
|
11:15-11:30, Paper TuAT19.6 | |
>Comparing Visual Odometry Systems in Actively Deforming Simulated Colon Environments |
> Video Attachment
|
|
Fulton, Mitchell | University of Colorado at Boulder |
Prendergast, Joseph Micah | University of Colorado at Boulder |
DiTommaso, Emily Rose | University of Colorado Boulder |
Rentschler, Mark | University of Colorado at Boulder |
Keywords: SLAM, Localization, Computer Vision for Medical Robotics
Abstract: This paper presents a new open-source dataset with ground truth position in a simulated colon environment to promote development of real-time feedback systems for physicians performing colonoscopies. Four systems (DSO, LSD-SLAM, SfMLearner, ORB-SLAM2) are tested on this dataset and their failures are analyzed. A data collection platform was fabricated and used to capture the dataset in a colonoscopy training simulator affixed to a flat surface. The noise induced in the ground truth positional data by the metal in the data collection platform was then characterized and corrected. The Absolute Trajectory Error (ATE) RMSE and Relative Error (RE) metrics were computed for each of the sequences in the dataset for each of the Simultaneous Localization And Mapping (SLAM) systems. While these systems all performed well in idealized conditions, the more realistic conditions in the harder sequences caused them to produce poor results or fail completely. These failures would be a hindrance to physicians in a real-world scenario, so future systems made for this environment must be more robust to the difficulties found in the colon, even at the expense of trajectory accuracy. The authors believe that this is the first open-source dataset with ground truth data depicting a simulated in vivo environment with active deformation, and that this is a first step toward achieving useful SLAM within the colon. The dataset is available at www.colorado.edu/lab/amtl/datasets.
|
|
TuAT20 |
Room T20 |
Visual SLAM III |
Regular session |
Chair: Ila, Viorela | The University of Sydney |
Co-Chair: Indelman, Vadim | Technion - Israel Institute of Technology |
|
10:00-10:15, Paper TuAT20.1 | |
>Speed and Memory Efficient Dense RGB-D SLAM in Dynamic Scenes |
> Video Attachment
|
|
Canovas, Bruce | GIPSA-Lab |
Rombaut, Michele | Universite Grenoble Alpes |
Pellerin, Denis | Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-Lab |
Negre, Amaury | Cnrs Gipsa Lab |
Keywords: SLAM, Mapping, RGB-D Perception
Abstract: Real-time dense 3D localization and mapping systems are required to enable robotic platforms to interact in and with their environments. Several solutions have used surfel representations to model the world. While they produce impressive results, they require heavy and costly hardware to operate properly. Many of them are also limited to static environments and small inter-frame motions. Whereas most state-of-the-art approaches focus on the accuracy of the reconstruction, we assume that many robotics applications do not require a high level of resolution in the reconstructed surface and can benefit from a less accurate but less expensive map, so as to gain in run-time and memory efficiency. In this paper, we propose a fast RGB-D SLAM system built around a rough and lightweight 3D representation for dense, compact mapping in dynamic indoor environments, targeting mainstream computing platforms. A simple and fast formulation to detect and filter out dynamic elements is also presented. We show the robustness of our system, its low memory requirements and the good performance it enables.
|
|
10:15-10:30, Paper TuAT20.2 | |
>DUI-VIO: Depth Uncertainty Incorporated Visual Inertial Odometry Based on an RGB-D Camera
> Video Attachment
|
|
Zhang, He | Virginia Commonwealth University |
Ye, Cang | Virginia Commonwealth University |
Keywords: SLAM, Service Robotics, RGB-D Perception
Abstract: This paper presents a new RGB-D-camera-based visual-inertial odometry (VIO) method, termed DUI-VIO, for estimating the motion state of the camera. First, a Gaussian mixture model (GMM) is employed to model the uncertainty of the depth data for each pixel in the camera’s color image. Second, these uncertainties are incorporated into the VIO’s initialization and optimization processes to make the state estimate more accurate. For the initialization process, we propose a hybrid perspective-n-point (PnP) method to compute the pose change between two camera frames and use the result to triangulate the depth for an initial set of visual features whose depth values are unavailable from the camera. Hybrid-PnP first uses a 2D-2D PnP algorithm to compute rotation, so that more visual features can be used to obtain a more accurate rotation estimate. It then uses a 3D-2D scheme to compute translation, taking into account the uncertainties of the depth data, resulting in a more accurate translation estimate. The more accurate pose change estimated by Hybrid-PnP helps improve the initialization result and thus the VIO performance in state estimation. In addition, Hybrid-PnP makes it possible to compute the pose change using a small number of features with known depth, which improves the reliability of the initialization process. Finally, DUI-VIO incorporates the uncertainties of the inverse depth measurements into the nonlinear optimization process, leading to a reduced state estimation error. Experimental results validate that the proposed DUI-VIO method outperforms state-of-the-art VIO methods in terms of accuracy and reliability.
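A minimal sketch of GMM-based depth uncertainty modeling (scikit-learn assumed; patch construction is simulated here, and the component counts and values are illustrative): depth samples from a small patch around a pixel are fit with a mixture, and the dominant component supplies a mean depth and a variance that can weight the optimization.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
patch = np.concatenate([rng.normal(2.0, 0.02, 40),    # surface at ~2 m
                        rng.normal(3.5, 0.05, 10)])   # background bleed-through

gmm = GaussianMixture(n_components=2, random_state=0).fit(patch.reshape(-1, 1))
k = int(np.argmax(gmm.weights_))                      # dominant component
depth, var = gmm.means_[k, 0], gmm.covariances_[k, 0, 0]
print(round(depth, 3), round(var, 5))                 # usable depth + uncertainty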
|
|
10:30-10:45, Paper TuAT20.3 | |
>Probabilistic Qualitative Localization and Mapping |
|
Mor, Roee | Technion - Israel Institute of Technology |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Mapping, SLAM
Abstract: Simultaneous localization and mapping (SLAM) is essential in numerous robotics applications, such as autonomous navigation. Traditional SLAM approaches infer the metric state of the robot along with a metric map of the environment. While existing algorithms exhibit good results, they are still sensitive to measurement noise, sensor quality and data association, and remain computationally expensive. Alternatively, we note that some navigation and mapping missions can be achieved using only qualitative geometric information, an approach known as qualitative spatial reasoning (QSR). In this work, we contribute a novel probabilistic qualitative localization and mapping approach, which extends the state of the art by also inferring the qualitative state of the camera poses (localization), as well as incorporating probabilistic connections between views (in time and in space). Our method is particularly appealing in scenarios with a small number of salient landmarks and sparse landmark tracks. We evaluate our approach in simulation and on a real-world dataset, and show its superior performance and low complexity compared to the state of the art.
|
|
10:45-11:00, Paper TuAT20.4 | |
>Robust Ego and Object 6-DoF Motion Estimation and Tracking |
> Video Attachment
|
|
Zhang, Jun | Australian National University |
Henein, Mina | Australian National University |
Mahony, Robert | Australian National University |
Ila, Viorela | The University of Sydney |
Keywords: SLAM, RGB-D Perception, Visual Tracking
Abstract: The problem of tracking self-motion as well as the motion of objects in the scene using information from a camera is known as multi-body visual odometry and is a challenging task. This paper proposes a robust solution to achieve accurate estimation and consistent trackability for dynamic multi-body visual odometry. A compact and effective framework is proposed, leveraging recent advances in semantic instance-level segmentation and accurate optical flow estimation. A novel formulation that jointly optimizes SE(3) motion and optical flow is introduced, improving the quality of the tracked points and the motion estimation accuracy. The proposed approach is evaluated on the Virtual KITTI dataset and tested on the real KITTI dataset, demonstrating its applicability to autonomous driving applications. For the benefit of the community, we make the source code public.
|
|
11:00-11:15, Paper TuAT20.5 | |
>SeqSphereVLAD: Sequence Matching Enhanced Orientation-Invariant Place Recognition |
> Video Attachment
|
|
Yin, Peng | Carnegie Mellon University |
Wang, Fuying | Tsinghua University |
Egorov, Anton | Skolkovo Institute of Science and Technology |
Hou, Jiafan | The Chinese University of Hong Kong, Shenzhen |
Zhang, Ji | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Keywords: SLAM, Mapping, Recognition
Abstract: Human beings and animals are capable of recognizing places from a previous journey when viewing them under different environmental conditions (e.g., illumination and weather). This paper seeks to provide robots with a human-like place recognition ability using a new point cloud feature learning method. This is a challenging problem due to the difficulty of extracting invariant local descriptors from the same place under varying orientations and dynamic obstacles. In this paper, we propose a novel lightweight 3D place recognition method, SeqSphereVLAD, which is capable of recognizing places from a previous trajectory regardless of viewpoint and temporary observation differences. The major contributions of our method lie in two modules: (1) the spherical convolution feature extraction module, which produces orientation-invariant local place descriptors, and (2) the coarse-to-fine sequence matching module, which ensures both accurate loop-closure detection and real-time performance. Despite its apparent simplicity, our proposed approach outperforms the state of the art for place recognition on datasets that combine orientation and context differences. Compared with prior art, our method achieves above 95% average recall for the best match with only 18% of the inference time of PointNet-based place recognition methods.
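The coarse-to-fine sequence matching can be pictured with a short Python sketch (a schematic of the general technique, not the paper's code; the sequence length, stride, and cosine-similarity scoring are assumptions):

import numpy as np

def seq_match(query, database, seq_len=5, stride=4):
    """query: (seq_len, d) and database: (n, d) L2-normalized descriptors;
    returns the database index whose following sequence best matches."""
    sim = query @ database.T                      # cosine similarity matrix
    idx = np.arange(seq_len)
    def seq_score(start):
        return sim[idx, start + idx].mean()       # score along the diagonal
    last = database.shape[0] - seq_len
    coarse = np.arange(0, last + 1, stride)       # coarse pass over strides
    best = coarse[np.argmax([seq_score(s) for s in coarse])]
    fine = range(max(0, best - stride), min(last, best + stride) + 1)
    return max(fine, key=seq_score)               # fine refinement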
|
|
11:15-11:30, Paper TuAT20.6 | |
>Online Visual Place Recognition Via Saliency Re-Identification |
> Video Attachment
|
|
Wang, Han | Nanyang Technological University |
Wang, Chen | Carnegie Mellon University |
Xie, Lihua | Nanyang Technological University |
Keywords: SLAM, Computer Vision for Other Robotic Applications, Recognition
Abstract: As an essential component of visual simultaneous localization and mapping (SLAM), place recognition is crucial for robot navigation and autonomous driving. Existing methods often formulate visual place recognition as feature matching, which is computationally expensive for many robotic applications with limited computing power, e.g., autonomous driving. Inspired by the fact that human beings always recognize a place by remembering salient regions or objects that are more attractive or interesting than others, we formulate visual place recognition as saliency re-identification, which is natural and straightforward. In order to reduce computational cost, we propose to perform both saliency detection and re-identification in frequency domain, in which all operations become element-wise. The experiments show that our proposed method achieves competitive accuracy and much higher speed than the state-of-the-art feature-based methods. The proposed method is open-sourced.
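One standard way to realize frequency-domain saliency with purely element-wise operations is the spectral-residual method, sketched below in Python (an illustration of the general idea; the paper's exact saliency and re-identification formulation may differ):

import numpy as np
from scipy.ndimage import uniform_filter

def spectral_residual_saliency(gray):
    """gray: 2-D float image; returns a saliency map scaled to [0, 1]."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-9)
    residual = log_amp - uniform_filter(log_amp, size=3)  # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return sal / (sal.max() + 1e-9)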
|
|
TuAT21 |
Room T21 |
SLAM |
Regular session |
Chair: Steckel, Jan | University of Antwerp |
Co-Chair: Mangelson, Joshua | Brigham Young University |
|
10:00-10:15, Paper TuAT21.1 | |
>ARAS: Ambiguity-Aware Robust Active SLAM Based on Multi-Hypothesis State and Map Estimations |
> Video Attachment
|
|
Hsiao, Ming | Carnegie Mellon University |
Mangelson, Joshua | Brigham Young University |
Suresh, Sudharshan | Carnegie Mellon University |
Debrunner, Chris | Lockheed Martin |
Kaess, Michael | Carnegie Mellon University |
Keywords: SLAM, Mapping, Motion and Path Planning
Abstract: In this paper, we introduce an ambiguity-aware robust active SLAM (ARAS) framework that makes use of multi-hypothesis state and map estimations to achieve better robustness. Ambiguous measurements can result in multiple probable solutions in a multi-hypothesis SLAM (MH-SLAM) system if they are temporarily unsolvable (due to insufficient information). Our ARAS aims at taking all these probable estimations into account explicitly for decision making and planning, which, to the best of our knowledge, has not yet been covered by any previous active SLAM approach (most of which consider a single hypothesis at a time). This novel ARAS framework 1) adopts local contours for efficient multi-hypothesis exploration, 2) incorporates an active loop closing module that revisits mapped areas to acquire information for hypothesis pruning to maintain the overall computational efficiency, and 3) demonstrates how to use the output target pose for path planning under multi-hypothesis estimations. Through extensive simulations and a real-world experiment, we demonstrate that the proposed ARAS algorithm can actively map general indoor environments more robustly than a similar single-hypothesis approach in the presence of ambiguities.
|
|
10:15-10:30, Paper TuAT21.2 | |
>On-Plate Localization and Mapping for an Inspection Robot Using Ultrasonic Guided Waves: A Proof of Concept |
> Video Attachment
|
|
Pradalier, Cedric | GeorgiaTech Lorraine |
Ouabi, Othmane-Latif | UMI 2958 GT-CNRS |
Pomarede, Pascal | GeorgiaTech Lorraine |
Steckel, Jan | University of Antwerp |
Keywords: SLAM, Industrial Robots, Probability and Statistical Methods
Abstract: This paper presents a proof of concept for a localization and mapping system for magnetic crawlers performing inspection tasks on structures made of large metal plates. By relying on ultrasonic guided waves reflected from the plate edges, we demonstrate that it is possible to recover the plate geometry and the robot trajectory to a precision comparable to the signal wavelength. The approach is tested using real acoustic signals acquired on test metal plates using lawn-mower paths and random walks. In contrast to related works, this paper focuses on the practical details of the localization and mapping algorithm.
|
|
10:30-10:45, Paper TuAT21.3 | |
>Plug-And-Play SLAM: A Unified SLAM Architecture for Modularity and Ease of Use |
> Video Attachment
|
|
Colosi, Mirco | Sapienza, University of Rome |
Aloise, Irvin | Sapienza University of Rome |
Guadagnino, Tiziano | Sapienza University of Rome |
Schlegel, Dominik | Sapienza - University of Rome |
Della Corte, Bartolomeo | Sapienza University of Rome |
Arras, Kai Oliver | Bosch Research |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: SLAM, Mapping
Abstract: Simultaneous Localization and Mapping (SLAM) is considered a mature research field with numerous applications and publicly available open-source systems. Despite this maturity, existing SLAM systems often rely on ad-hoc implementations or are tailored to predefined sensor setups. In this work, we tackle these issues, proposing a novel unified SLAM architecture specifically designed to standardize the SLAM problem and to address heterogeneous sensor configurations. Thanks to its modularity and design patterns, the presented framework is easy to extend, maximizes code reuse and improves computational efficiency. We show in our experiments with a variety of typical sensor configurations that these advantages come without compromising state-of-the-art SLAM performance. The result demonstrates the architecture’s relevance for facilitating further research in (multi-sensor) SLAM and its transfer into practical applications.
|
|
10:45-11:00, Paper TuAT21.4 | |
>Majorization Minimization Methods for Distributed Pose Graph Optimization with Convergence Guarantees |
|
Fan, Taosha | Northwestern University |
Murphey, Todd | Northwestern University |
Keywords: SLAM, Mapping, Optimization and Optimal Control
Abstract: In this paper, we consider the problem of distributed pose graph optimization (PGO), which has extensive applications in multi-robot simultaneous localization and mapping (SLAM). We propose majorization minimization methods for distributed PGO and show that our methods are guaranteed to converge to first-order critical points under mild conditions. Furthermore, since our methods rely on a proximal operator of distributed PGO, the convergence rate can be significantly accelerated with Nesterov's method, and more importantly, the acceleration induces no compromise of the convergence guarantees. In addition, we present accelerated majorization minimization methods for the distributed chordal initialization that enjoy quadratic convergence and can be used to compute an initial guess for distributed PGO. The efficacy of this work is validated through applications on a number of 2D and 3D SLAM datasets and comparisons with existing state-of-the-art methods, which indicate that our methods converge faster and result in better solutions to distributed PGO.
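The generic pattern behind the method, majorization minimization with Nesterov acceleration, can be sketched as follows (schematic Python under our own assumptions; argmin_surrogate and cost are hypothetical placeholders for the paper's PGO surrogate minimizer and objective):

def mm_nesterov(x0, argmin_surrogate, cost, iters=100):
    """Majorization minimization with Nesterov extrapolation and a safeguard."""
    x_prev, y, s = x0, x0, 1.0
    for _ in range(iters):
        x = argmin_surrogate(y)     # minimize a surrogate majorizing cost at y
        s_next = (1.0 + (1.0 + 4.0 * s * s) ** 0.5) / 2.0
        y = x + ((s - 1.0) / s_next) * (x - x_prev)  # Nesterov extrapolation
        if cost(y) > cost(x):       # safeguard retains the convergence guarantee
            y = x
        x_prev, s = x, s_next
    return x_prev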
|
|
11:00-11:15, Paper TuAT21.5 | |
>Variational Filtering with Copula Models for SLAM |
|
Martin, John D. | Stevens Institute of Technology |
Doherty, Kevin | Massachusetts Institute of Technology |
Cyr, Caralyn | Stevens Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Leonard, John | MIT |
Keywords: SLAM, Localization
Abstract: The ability to infer map variables and estimate pose is crucial to the operation of autonomous mobile robots. In most cases the shared dependency between these variables is modeled through a multivariate Gaussian distribution, but there are many situations where that assumption is unrealistic. Our paper shows how it is possible to relax this assumption and perform simultaneous localization and mapping (SLAM) with a larger class of distributions, whose multivariate dependency is represented with a copula model. We integrate the distribution model with copulas into a Sequential Monte Carlo estimator and show how unknown model parameters can be learned through gradient-based optimization. We demonstrate our approach is effective in settings where Gaussian assumptions are clearly violated, such as environments with uncertain data association and nonlinear transition models.
|
|
11:15-11:30, Paper TuAT21.6 | |
>Cluster-Based Penalty Scaling for Robust Pose Graph Optimization |
|
Wu, Fang | Ecole Polytechnique De Montreal |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Keywords: SLAM, Mapping
Abstract: Robust pose graph optimization is essential for reliable pose estimation in Simultaneous Localization and Mapping (SLAM) systems. Due to the nature of loop closures, even one spurious measurement can mislead the SLAM estimator and severely distort the mapping results. Existing methods to avoid this problem mostly focus on ensuring local measurement consistency by evaluating measurements independently, often requiring parameters that are difficult to tune. This paper proposes a cluster-based penalty scaling (CPS) method that ensures both local and global consistency by first evaluating edge quality locally and then integrating this information into the optimization formulation.
|
|
11:30-11:45, Paper TuAT21.7 | |
>A Theory of Fermat Paths for 3D Imaging Sonar Reconstruction |
|
Westman, Eric | Carnegie Mellon University |
Gkioulekas, Ioannis | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: Marine Robotics, Mapping, Field Robots
Abstract: In this work, we present a novel method for reconstructing particular 3-D surface points using an imaging sonar sensor. We derive the two-dimensional Fermat flow equation, which may be applied to the planes defined by each discrete azimuth angle in the sonar image. We show that the Fermat flow equation applies to boundary points and surface points which correspond to specular reflections within the 2-D plane defined by their azimuth angle measurement. The Fermat flow equation can be used to resolve the 2-D location of these surface points within the plane, and therefore also their full 3-D location. This is achieved by translating the sensor to estimate the spatial gradient of the range measurement. This method does not rely on the precise image intensity values or the reflectivity of the imaged surface to solve for the surface point locations. We demonstrate the effectiveness of our proposed method by reconstructing 3-D object points on both simulated and real-world datasets.
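Our reading of the core geometric relation, in a small Python sketch (an illustration, not the authors' code): for a sensor at in-plane position x measuring Fermat-path range r(x) to a surface point p, the gradient satisfies grad r = (x - p)/r, so p = x - r * grad r, with the gradient estimated by small sensor translations:

import numpy as np

def recover_point(x, r, r_dx, r_dy, eps):
    """x: (2,) sensor position in the azimuth plane; r: range at x;
    r_dx, r_dy: ranges after translating the sensor by eps along each axis."""
    grad = np.array([(r_dx - r) / eps, (r_dy - r) / eps])
    grad /= np.linalg.norm(grad)    # grad r has unit norm in theory
    return x - r * grad             # recovered 2-D point within the plane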
|
|
TuAT22 |
Room T22 |
Sensor Fusion for SLAM |
Regular session |
Chair: Atanasov, Nikolay | University of California, San Diego |
Co-Chair: Nakamura, Yoshihiko | University of Tokyo |
|
10:00-10:15, Paper TuAT22.1 | |
>Tightly-Coupled Fusion of Global Positional Measurements in Optimization-Based Visual-Inertial Odometry |
|
Cioffi, Giovanni | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: SLAM, Sensor Fusion
Abstract: Motivated by the goal of achieving robust, drift-free pose estimation in long-term autonomous navigation, in this work we propose a methodology to fuse global positional information with visual and inertial measurements in a tightly-coupled nonlinear-optimization-based estimator. Differently from previous works, which are loosely-coupled, the use of a tightly-coupled approach allows exploiting the correlations among all the measurements. A sliding window of the most recent system states is estimated by minimizing a cost function that includes visual reprojection errors, relative inertial errors, and global positional residuals. We use IMU preintegration to formulate the inertial residuals and leverage the outcome of this algorithm to efficiently compute the global position residuals. The experimental results show that the proposed method achieves accurate and globally consistent estimates, with a negligible increase in the optimization's computational cost. Our method consistently outperforms the loosely-coupled fusion approach. The mean position error is reduced by up to 50% with respect to the loosely-coupled approach in Unmanned Aerial Vehicle (UAV) flights with a travelled distance of about 1 km, where the global position information is given by noisy GPS measurements. To the best of our knowledge, this is the first work in which global positional measurements are tightly fused in an optimization-based visual-inertial odometry (VIO) algorithm, leveraging the IMU preintegration method to define the global positional factors.
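Schematically, the tightly-coupled sliding-window objective combines three residual families in one cost, as in this hypothetical Python sketch (the function and attribute names are placeholders, and the actual inertial terms use IMU preintegration):

def window_cost(states, vis_res, imu_res, gps, w_vis=1.0, w_imu=1.0, w_gps=1.0):
    """Sum of squared residuals over a sliding window of states."""
    cost = 0.0
    for r in vis_res(states):            # visual reprojection errors
        cost += w_vis * float(r @ r)
    for r in imu_res(states):            # relative inertial errors
        cost += w_imu * float(r @ r)
    for k, p_meas in gps:                # global positional residuals
        r = states[k].position - p_meas
        cost += w_gps * float(r @ r)
    return cost

Because all three terms share the same states, the optimizer exploits correlations among the measurements, which is what distinguishes this from loosely-coupled fusion.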
|
|
10:15-10:30, Paper TuAT22.2 | |
>GR-SLAM: Vision-Based Sensor Fusion SLAM for Ground Robots on Complex Terrain |
> Video Attachment
|
|
Su, Yun | Shenyang Institute of Automation |
Wang, Ting | Robotics Lab., Shenyang Institute of Automation, CAS |
Yao, Chen | Shenyang Institute of Automation, Chinese Academy of Sciences |
Shao, Shiliang | SIA |
Wang, Zhidong | Chiba Institute of Technology |
Keywords: SLAM, Sensor Fusion, Visual-Based Navigation
Abstract: In recent years, many excellent camera-based SLAM methods, especially camera-IMU fusion (VIO) methods, have emerged, greatly improving the accuracy and robustness of SLAM. However, we find through experiments that most existing VIO methods perform well on drones or drone datasets but cannot continuously provide accurate and robust localization results for ground robots on complex terrain. Some researchers have proposed methods for ground robots, but most have limited applicability due to the assumption of planar motion. Therefore, this paper proposes GR-SLAM for the localization of ground robots on complex terrain, which fuses camera, IMU, and encoder data in a tightly coupled scheme to provide accurate and robust state estimation for robots. First, an odometer increment model is proposed, which fuses encoder and IMU data to calculate the robot pose increment on the manifold and computes frame constraints through the pre-integrated increment. Then we propose an evaluation algorithm for multi-sensor measurements, which can detect abnormal data and adjust their optimization weights. Finally, we implement a complete sliding-window factor graph optimization framework that tightly couples camera, IMU, and encoder data to perform state estimation. Extensive experiments are conducted on a real ground robot, and the results show that GR-SLAM can provide accurate and robust state estimation for ground robots.
|
|
10:30-10:45, Paper TuAT22.3 | |
>OrcVIO: Object Residual Constrained Visual-Inertial Odometry |
> Video Attachment
|
|
Shan, Mo | University of California San Diego |
Feng, Qiaojun | University of California, San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: SLAM, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: Introducing object-level semantic information into a simultaneous localization and mapping (SLAM) system is critical. It not only improves performance but also enables tasks specified in terms of meaningful objects. This work presents OrcVIO, a visual-inertial odometry system tightly coupled with tracking and optimization over structured object models. OrcVIO differentiates through semantic feature and bounding-box reprojection errors to perform batch optimization over the pose and shape of objects. The estimated object states aid in real-time incremental optimization over the IMU-camera states. The ability of OrcVIO to perform accurate trajectory estimation and large-scale object-level mapping is evaluated using real data.
|
|
10:45-11:00, Paper TuAT22.4 | |
>LIC-Fusion 2.0: LiDAR-Inertial-Camera Odometry with Sliding-Window Plane-Feature Tracking |
> Video Attachment
|
|
Zuo, Xingxing | Zhejiang University |
Yang, Yulin | University of Delaware |
Geneva, Patrick | University of Delaware |
Lv, Jiajun | Zhejiang University |
Liu, Yong | Zhejiang University |
Huang, Guoquan (Paul) | University of Delaware |
Pollefeys, Marc | ETH Zurich |
Keywords: Sensor Fusion, Localization, SLAM
Abstract: Multi-sensor fusion of multi-modal measurements from commodity inertial, visual, and LiDAR sensors to provide robust and accurate 6DOF pose estimation holds great potential in robotics and beyond. In this paper, building upon our prior work (i.e., LIC-Fusion), we develop a sliding-window-filter-based LiDAR-Inertial-Camera odometry with online spatiotemporal calibration (i.e., LIC-Fusion 2.0), which introduces a novel sliding-window plane-feature tracking approach for efficiently processing 3D LiDAR point clouds. In particular, after motion compensation of the LiDAR points by leveraging IMU data, low-curvature planar points are extracted and tracked across the sliding window. A novel outlier rejection criterion is proposed in the plane-feature tracking for high-quality data association. Only the tracked planar points belonging to the same plane are used for plane initialization, which makes the plane extraction efficient and robust. Moreover, we perform an observability analysis for the IMU-LiDAR subsystem under consideration and report the degenerate cases for spatiotemporal calibration using plane features. While the estimation consistency and the identified degenerate motions are validated in Monte-Carlo simulations, various real-world experiments are also conducted to show that the proposed LIC-Fusion 2.0 outperforms its predecessor and other state-of-the-art methods.
|
|
11:00-11:15, Paper TuAT22.5 | |
>Leveraging Planar Regularities for Point Line Visual-Inertial Odometry |
> Video Attachment
|
|
Li, Xin | Peking University |
He, Yijia | Institute of Automation, Chinese Academy of Sciences |
Lin, Jinlong | Peking University |
Liu, Xiao | Megvii Technology Inc |
Keywords: SLAM, Mapping, Sensor Fusion
Abstract: With a monocular Visual-Inertial Odometry (VIO) system, a 3D point cloud and the camera motion can be estimated simultaneously. Because pure sparse 3D points provide a structureless representation of the environment, generating a 3D mesh from sparse points can further model the environment topology and produce dense mapping. To improve the accuracy of 3D mesh generation and localization, we propose a tightly-coupled monocular VIO system, PLP-VIO, which exploits point features and line features as well as plane regularities. The co-planarity constraints are used to leverage additional structure information for more accurate estimation of 3D points and spatial lines in the state estimator. To detect planes and 3D meshes robustly, we combine line features with point features in the detection method. The effectiveness of the proposed method is verified on both synthetic data and public datasets and is compared with other state-of-the-art algorithms.
|
|
11:15-11:30, Paper TuAT22.6 | |
>SplitFusion: Simultaneous Tracking and Mapping for Non-Rigid Scenes |
> Video Attachment
|
|
Li, Yang | The University of Tokyo |
Zhang, Tianwei | The University of Tokyo |
Nakamura, Yoshihiko | University of Tokyo |
Harada, Tatsuya | The University of Tokyo |
Keywords: Mapping, SLAM, Localization
Abstract: We present SplitFusion, a novel dense RGB-D SLAM framework that simultaneously performs tracking and volumetric reconstruction for both rigid and non-rigid components of the scene. SplitFusion first adopts a deep-learning-based semantic instance segmentation technique to split the scene into rigid and non-rigid geometric surfaces. The split surfaces are independently tracked via rigid or non-rigid ICP and reconstructed through incremental depth map volumetric fusion. Experimental results show that the proposed approach can provide not only accurate environment maps but also well-reconstructed non-rigid targets, e.g., moving humans.
|
|
TuAT23 |
Room T23 |
Range SLAM |
Regular session |
Chair: Wang, Sen | Edinburgh Centre for Robotics, Heriot-Watt University |
Co-Chair: Tan, U-Xuan | Singapore University of Technology and Design |
|
10:00-10:15, Paper TuAT23.1 | |
>LIO-SAM: Tightly-Coupled Lidar Inertial Odometry Via Smoothing and Mapping |
|
Shan, Tixiao | Massachusetts Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Meyers, Drew | MIT |
Wang, Wei | Massachusetts Institute of Technology |
Ratti, Carlo | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Sensor Fusion, Range Sensing, Mapping
Abstract: We propose a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. LIO-SAM formulates lidar-inertial odometry atop a factor graph, allowing a multitude of relative and absolute measurements, including loop closures, to be incorporated from different sources as factors into the system. The estimated motion from inertial measurement unit (IMU) pre-integration de-skews point clouds and produces an initial guess for lidar odometry optimization. The obtained lidar odometry solution is used to estimate the bias of the IMU. To ensure high performance in real-time, we marginalize old lidar scans for pose optimization, rather than matching lidar scans to a global map. Scan-matching at a local scale instead of a global scale significantly improves the real-time performance of the system, as does the selective introduction of keyframes, and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes." The proposed method is extensively evaluated on datasets gathered from three platforms over various scales and environments.
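As a flavor of the factor-graph formulation, here is a minimal pose-graph sketch in Python using GTSAM (LIO-SAM's released implementation builds on GTSAM, but this toy example and its noise values are our own; API names follow recent GTSAM Python releases and may vary by version):

import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.05] * 3))
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), noise))        # anchor the first pose
odom = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))     # toy lidar odometry
graph.add(gtsam.BetweenFactorPose3(0, 1, odom, noise))            # odometry factor
graph.add(gtsam.BetweenFactorPose3(1, 0, odom.inverse(), noise))  # loop-closure factor

initial = gtsam.Values()
initial.insert(0, gtsam.Pose3())
initial.insert(1, odom)
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()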
|
|
10:15-10:30, Paper TuAT23.2 | |
>LiTAMIN: LiDAR Based Tracking and MappINg by Stabilized ICP for Geometry Approximation with Normal Distributions |
> Video Attachment
|
|
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Keywords: SLAM, Mapping, Localization
Abstract: This paper proposes a 3D LiDAR SLAM method that improves the accuracy, robustness, and computational efficiency of iterative closest point (ICP) by employing locally approximated geometry represented with clusters of normal distributions. In comparison with previous normal-distribution-based ICP methods, such as NDT and GICP, our ICP method is simply stabilized by normalizing the cost function with the Frobenius norm and regularizing the covariance matrices. The previous methods are stabilized with principal component analysis (PCA), whose computational cost is higher than that of our method. Moreover, our SLAM method can reduce the effect of wrong loop closure constraints. Experimental results show that our SLAM method has advantages over the open-source state-of-the-art methods LOAM, LeGO-LOAM, and hdl_graph_slam.
|
|
10:30-10:45, Paper TuAT23.3 | |
>GOSMatch: Graph-Of-Semantics Matching for Detecting Loop Closures in 3D LiDAR Data |
|
Zhu, Yachen | Sun Yat-Sen University |
Ma, Yanyang | Sun Yat-Sen University |
Chen, Long | Sun Yat-Sen University |
Liu, Cong | Sun Yat-Sen University |
Ye, Maosheng | Wuhan University |
Li, Lingxi | Indiana University-Purdue University Indianapolis |
Keywords: SLAM, Localization
Abstract: Detecting loop closures in 3D Light Detection and Ranging (LiDAR) data is a challenging task since point-level methods always suffer from instability. This paper presents a semantic-level approach named GOSMatch to perform reliable place recognition. Our method leverages novel descriptors, which are generated from the spatial relationship between semantics, to perform frame description and data association. We also propose a coarse-to-fine strategy to efficiently search for loop closures. Besides, GOSMatch can give an accurate 6-DOF initial pose estimation once a loop closure is confirmed. Extensive experiments have been conducted on the KITTI odometry dataset and the results show that GOSMatch can achieve robust loop closure detection performance and outperform existing methods.
|
|
10:45-11:00, Paper TuAT23.4 | |
>Seed: A Segmentation-Based Egocentric 3D Point Cloud Descriptor for Loop Closure Detection |
|
Fan, Yunfeng | Singapore University of Technology and Design |
He, Yichang | SUTD |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: SLAM, Mapping
Abstract: Place recognition is essential for SLAM systems since it is critical for loop closure and can help to correct accumulated drift, resulting in a globally consistent map. Unlike visual SLAM, which can use diverse feature detection methods to describe the scene, there are limited works on representing a place using a single LiDAR scan. In this paper, we propose a segmentation-based egocentric descriptor termed Seed that uses a single LiDAR scan to describe the scene. Through the segmentation approach, we first obtain different segmented objects, which reduces the effects of noise and resolution, making the descriptor more robust. Then, the topological information of the segmented objects is encoded into the descriptor. Unlike other reported approaches, the proposed method is rotation invariant and insensitive to translation variation. The feasibility of the proposed method is evaluated on the KITTI dataset, and the results show that it outperforms the state-of-the-art methods in terms of accuracy.
|
|
11:00-11:15, Paper TuAT23.5 | |
>RadarSLAM: Radar Based Large-Scale SLAM in All Weathers |
> Video Attachment
|
|
Hong, Ziyang | Heriot-Watt University |
Petillot, Yvan R. | Heriot-Watt University |
Wang, Sen | Edinburgh Centre for Robotics, Heriot-Watt University |
Keywords: SLAM, Localization, Mapping
Abstract: Numerous Simultaneous Localization and Mapping (SLAM) algorithms have been presented in the last decade using different sensor modalities. However, robust SLAM in extreme weather conditions is still an open research problem. In this paper, RadarSLAM, a full radar-based graph SLAM system, is proposed for reliable localization and mapping in large-scale environments. It is composed of pose tracking, local mapping, loop closure detection, and pose graph optimization, enhanced by novel feature matching and probabilistic point cloud generation on radar images. Extensive experiments are conducted on a public radar dataset and several self-collected radar sequences, demonstrating state-of-the-art reliability and localization accuracy in various adverse weather conditions, such as dark night, dense fog, and heavy snowfall.
|
|
11:15-11:30, Paper TuAT23.6 | |
>GP-SLAM+: Real-Time 3D Lidar SLAM Based on Improved Regionalized Gaussian Process Map Reconstruction |
> Video Attachment
|
|
Ruan, Jianyuan | Zhejiang University |
Li, Bo | Zhejiang University |
Wang, Yinqiang | Zhejiang University |
Fang, Zhou | Zhejiang University |
Keywords: SLAM, Mapping, Localization
Abstract: This paper presents a 3D lidar SLAM system based on improved regionalized Gaussian process (GP) map reconstruction that provides both low-drift state estimation and mapping in real time for robotics applications. We utilize spatial GP regression to model the environment. This tool enables us to recover surfaces, including those in sparsely scanned areas, and to obtain uniform samples with uncertainty. These properties facilitate robust data association and map updating in our scan-to-map registration scheme, especially when working with sparse range data. Compared with previous GP-SLAM, this work overcomes the prohibitive computational complexity of GP and redesigns the registration strategy to meet the accuracy requirements of 3D scenarios. For large-scale tasks, a two-thread framework is employed to further suppress drift. Aerial and ground-based experiments demonstrate that our method allows robust odometry and precise mapping in real time. It also outperforms state-of-the-art lidar SLAM systems in our tests with lightweight sensors.
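As background, plain spatial GP regression already yields the two properties the paper exploits, surface recovery in sparsely scanned areas and uniform samples with uncertainty, as in this generic Python sketch (illustrative only; the paper's regionalized formulation is what makes the approach tractable in real time):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

xy = np.random.rand(200, 2)                        # sparse scan locations
z = np.sin(3 * xy[:, 0]) * np.cos(3 * xy[:, 1])    # stand-in elevation values
gp = GaussianProcessRegressor(RBF(0.2) + WhiteKernel(1e-3)).fit(xy, z)

grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), -1).reshape(-1, 2)
mean, std = gp.predict(grid, return_std=True)      # uniform samples + uncertainty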
|
|
TuBT1 |
Room T1 |
Imitation Learning I |
Regular session |
Chair: Stone, Peter | University of Texas at Austin |
Co-Chair: Taniguchi, Tadahiro | Ritsumeikan University |
|
11:45-12:00, Paper TuBT1.1 | |
>Domain-Adversarial and -Conditional State Space Model for Imitation Learning |
> Video Attachment
|
|
Okumura, Ryo | Panasonic Corporation |
Okada, Masashi | Panasonic Corporation |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Imitation Learning, Representation Learning, Model Learning for Control
Abstract: State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning. Without such states, imitation learning is hampered by domain-dependent information that is useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and -conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via imitation learning for continuous control of sparse-reward tasks in simulators and compared it with the performance of an existing SRL method. The agents trained with DAC-SSM achieved performance comparable to the experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for imitation learning under large domain shifts and that they can be obtained using DAC-SSM.
|
|
12:00-12:15, Paper TuBT1.2 | |
>Planning on the Fast Lane: Learning to Interact Using Attention Mechanisms in Path Integral Inverse Reinforcement Learning |
> Video Attachment
|
|
Rosbach, Sascha | Volkswagen AG |
Li, Xing | Volkswagen AG |
Grossjohann, Simon | Volkswagen AG |
Homoceanu, Silviu | Volkswagen AG |
Roth, Stefan | TU Darmstadt |
Keywords: Learning from Demonstration, Motion and Path Planning, Imitation Learning
Abstract: General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local situation-dependent reward functions using features of a set of sampled driving policies. Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. However, the interaction with dynamic objects requires an extended planning horizon, which depends on sequential context modeling. In this work, we are concerned with the sequential reward prediction over an extended time horizon. We present a neural network architecture that uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. Apart from this, we propose a temporal attention mechanism to identify context switches and allow for stable adaptation of rewards. We evaluate our results on complex simulated driving situations, including other moving vehicles. Our evaluation shows that our policy attention mechanism learns to focus on collision-free policies in the configuration space. Furthermore, the temporal attention mechanism learns persistent interaction with other vehicles over an extended planning horizon.
|
|
12:15-12:30, Paper TuBT1.3 | |
>A Geometric Perspective on Visual Imitation Learning |
> Video Attachment
|
|
Jin, Jun | University of Alberta |
Petrich, Laura | University of Alberta |
Dehghan, Masood | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Visual Learning, Imitation Learning, Visual Servoing
Abstract: We consider the problem of visual imitation learning without human kinesthetic teaching or teleoperation, nor access to an interactive reinforcement learning training environment. We present a geometric perspective to this problem where geometric feature correspondences are learned from one training video and used to execute tasks via visual servoing. Specifically, we propose VGS-IL (Visual Geometric Skill Imitation Learning), an end-to-end geometry-parameterized task concept inference method, to infer globally consistent geometric feature association rules from human demonstration video frames. We show that, instead of learning actions from image pixels, learning a geometry-parameterized task concept provides an explainable and invariant representation across demonstrator to imitator under various environmental settings. Moreover, such a task concept representation provides a direct link with geometric vision based controllers (e.g. visual servoing), allowing for efficient mapping of high-level task concepts to low-level robot actions.
|
|
12:30-12:45, Paper TuBT1.4 | |
>RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration |
> Video Attachment
|
|
Pavse, Brahma | University of Texas at Austin |
Torabi, Faraz | University of Texas at Austin |
Hanna, Josiah | The University of Texas at Austin |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Keywords: Imitation Learning, Reinforcement Learning
Abstract: Augmenting reinforcement learning with imitation learning is often hailed as a method by which to improve upon learning from scratch. However, most existing methods for integrating these two techniques are subject to several strong assumptions---chief among them that information about demonstrator actions is available. In this paper, we investigate the extent to which this assumption is necessary by introducing and evaluating reinforced inverse dynamics modeling (RIDM), a novel paradigm for combining imitation from observation (IfO) and reinforcement learning with no dependence on demonstrator action information. Moreover, RIDM requires only a single demonstration trajectory and is able to operate directly on raw (unaugmented) state features. We find experimentally that RIDM performs favorably compared to a baseline approach for several tasks in simulation as well as for tasks on a real UR5 robot arm. Experiment videos can be found at https://sites.google.com/view/ridm-reinforced-inverse-dynami.
|
|
12:45-13:00, Paper TuBT1.5 | |
>Learn by Observation: Imitation Learning for Drone Patrolling from Videos of a Human Navigator |
|
Fan, Yue | Johns Hopkins University |
Chu, Shilei | Shandong Univ |
Zhang, Wei | Shandong University |
Song, Ran | Shandong University |
Li, Yibin | Shandong University |
Keywords: Imitation Learning, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: We present an imitation learning method for autonomous drone patrolling based only on raw videos. Different from previous methods, we propose to let the drone learn patrolling in the air by observing and imitating how a human navigator does it on the ground. The observation process enables the automatic collection and annotation of data using inter-frame geometric consistency, resulting in less manual effort and high accuracy. A newly designed neural network is then trained on the annotated data to predict appropriate directions and translations for the drone to patrol in a lane-keeping manner, as humans do. Our method allows the drone to fly at a high altitude with a broad view and low risk. It can also detect all accessible directions at crossroads and further integrate available user instructions with autonomous patrolling control commands. Extensive experiments are conducted to demonstrate the accuracy of the proposed imitation learning process as well as the reliability of the holistic system for autonomous drone navigation. The code, datasets, and video demonstrations are available at https://vsislab.github.io/uavpatrol.
|
|
13:00-13:15, Paper TuBT1.6 | |
>Imitation Learning Based on Bilateral Control for Human–Robot Cooperation |
> Video Attachment
|
|
Sasagawa, Ayumu | Saitama University |
Fujimoto, Kazuki | Saitama University |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Imitation Learning, Cognitive Human-Robot Interaction, Manipulation Planning
Abstract: Robots are required to respond autonomously to changing situations. Imitation learning is a promising candidate for achieving generalization performance, and extensive results have been demonstrated in object manipulation. However, cooperative work between humans and robots is still a challenging issue because robots must control dynamic interactions among themselves, humans, and objects. Furthermore, it is difficult to follow subtle perturbations that may occur among coworkers. In this study, we find that cooperative work can be accomplished by imitation learning using bilateral control. Because bilateral control can extract response values and command values independently, the human skills needed to control dynamic interactions can be captured. Then, the task of serving food is considered. The experimental results clearly demonstrate the importance of force control, and the dynamic interactions can be controlled by the inferred action force.
|
|
TuBT2 |
Room T2 |
Imitation Learning II |
Regular session |
Chair: Urain De Jesus, Julen | TU Darmstadt |
Co-Chair: Kolathaya, Shishir | Indian Institute of Science |
|
11:45-12:00, Paper TuBT2.1 | |
>Multi-Instance Aware Localization for End-To-End Imitation Learning |
> Video Attachment
|
|
Gubbi Venkatesh, Sagar | Indian Institute of Science |
Upadrashta, Raviteja | Indian Institute of Science |
Kolathaya, Shishir | Indian Institute of Science |
Amrutur, Bharadwaj | Indian Institute of Science |
Keywords: Imitation Learning, Localization, Learning from Demonstration
Abstract: Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training are limited. We show that end-to-end policy networks can be trained in a sample efficient manner by (a) appending the feature map output of the vision layers with an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more instances of objects than seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.
|
|
12:00-12:15, Paper TuBT2.2 | |
>ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows |
|
Urain De Jesus, Julen | TU Darmstadt |
Ginesi, Michele | University of Verona |
Tateo, Davide | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Learning from Demonstration, Novel Deep Learning Methods, Motion Control
Abstract: We introduce ImitationFlow, a novel deep generative model that allows learning complex, globally stable, stochastic, nonlinear dynamics. Our approach extends the Normalizing Flows framework to learn stable Stochastic Differential Equations. We prove Lyapunov stability for a class of Stochastic Differential Equations and propose a learning algorithm to learn them from a set of demonstrated trajectories. Our model extends the set of stable dynamical systems that can be represented by state-of-the-art approaches, eliminates the Gaussian assumption on the demonstrations, and outperforms previous algorithms in terms of representation accuracy. We show the effectiveness of our method on both standard datasets and a real robot experiment.
|
|
12:15-12:30, Paper TuBT2.3 | |
>Standard Deep Generative Models for Density Estimation in Configuration Spaces: A Study of Benefits, Limits and Challenges |
|
Gieselmann, Robert | KTH Royal Institute of Technology |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Imitation Learning, Motion and Path Planning, Probability and Statistical Methods
Abstract: Deep Generative Models such as Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE) have found multiple applications in Robotics, with recent works suggesting the potential use of these methods as a generic solution for the estimation of sampling distributions for motion planning in parameterized sets of environments. In this work we provide a first empirical study of challenges, benefits and drawbacks of utilizing vanilla GANs and VAEs for the approximation of probability distributions arising from sampling-based motion planner path solutions. We present an evaluation on a sequence of simulated 2D configuration spaces of increasing complexity and a 4D planar robot arm scenario and find that vanilla GANs and VAEs both outperform classical statistical estimation by an n-dimensional histogram in our chosen scenarios. We furthermore highlight differences in convergence and noisiness between the trained models and propose and study a benchmark sequence of planar C-space environments parameterized by opened or closed doors. In this setting, we find that the chosen geometrical embedding of the parameters of the family of considered C-spaces is a key performance contributor that relies heavily on human intuition about C-space structure at present. We discuss some of the challenges of parameter selection and convergence for applying this approach with an out-of-the box GAN and VAE model.
|
|
12:30-12:45, Paper TuBT2.4 | |
>Progressive Automation of Periodic Tasks on Planar Surfaces of Unknown Pose with Hybrid Force/position Control |
> Video Attachment
|
|
Dimeas, Fotios | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Keywords: Learning from Demonstration
Abstract: This paper presents a teaching-by-demonstration method for contact tasks with periodic movement on planar surfaces of unknown pose. To learn the motion on the plane, we utilize frequency oscillators with periodic movement primitives, and we propose modified adaptation rules along with a method for extracting the task's fundamental frequency that automatically discards near-zero frequency components. Additionally, we utilize an online estimate of the normal vector to the plane, so that the robot is able to quickly adapt to rotated hinged surfaces such as a window or a door. Using the framework of progressive automation for compliance adaptation, the robot transitions seamlessly and bi-directionally between hand guidance and autonomous operation within a few repetitions of the task. As the level of automation increases, a hybrid force/position controller is progressively engaged for the autonomous operation of the robot. Our methodology is verified experimentally on surfaces of different orientations, with the robot being able to adapt to surface orientation perturbations.
|
|
TuBT3 |
Room T3 |
Model Learning I |
Regular session |
Chair: Kelly, Jonathan | University of Toronto |
Co-Chair: Mouret, Jean-Baptiste | Inria |
|
11:45-12:00, Paper TuBT3.1 | |
>Learning Hybrid Object Kinematics for Efficient Hierarchical Planning under Uncertainty |
> Video Attachment
|
|
Jain, Ajinkya | University of Texas at Austin |
Niekum, Scott | University of Texas at Austin |
Keywords: Model Learning for Control, Learning from Demonstration, Manipulation Planning
Abstract: Sudden changes in the dynamics of robotic tasks, such as contact with an object or the latching of a door, are often viewed as inconvenient discontinuities that make manipulation difficult. However, when these transitions are well-understood, they can be leveraged to reduce uncertainty or aid manipulation---for example, wiggling a screw to determine if it is fully inserted or not. Current model-free reinforcement learning approaches require large amounts of data to learn to leverage such dynamics, scale poorly as problem complexity grows, and do not transfer well to significantly different problems. By contrast, hierarchical POMDP planning-based methods scale well via plan decomposition, work well on novel problems, and directly consider uncertainty, but often rely on precise hand-specified models and task decompositions. To combine the advantages of these opposing paradigms, we propose a new method, MICAH, which given unsegmented data of an object's motion under applied actions, (1) detects changepoints in the object motion model using action-conditional inference, (2) estimates the individual local motion models with their parameters, and (3) converts them into a hybrid automaton that is compatible with hierarchical POMDP planning. We show that model learning under MICAH is more accurate and robust to noise than prior approaches. Further, we combine MICAH with a hierarchical POMDP planner to demonstrate that the learned models are rich enough to be used for performing manipulation tasks under uncertainty that require the objects to be used in novel ways not encountered during training.
|
|
12:00-12:15, Paper TuBT3.2 | |
>Learning State-Dependent Losses for Inverse Dynamics Learning |
|
Morse, Kristen | Facebook AI Research |
Das, Neha | Facebook |
Lin, Yixin | Facebook AI Research |
Wang, Austin S. | Carnegie Mellon University |
Rai, Akshara | Facebook AI Research |
Meier, Franziska | Facebook |
Keywords: Model Learning for Control, Novel Deep Learning Methods, Transfer Learning
Abstract: Being able to quickly adapt to changes in dynamics is paramount in model-based control for object manipulation tasks. To enable fast adaptation of the inverse dynamics model's parameters, data efficiency is crucial. Given observed data, a key element in how an optimizer updates model parameters is the loss function. In this work, we propose to apply meta-learning to learn structured, state-dependent loss functions during a meta-training phase. We then replace standard losses with our learned losses during online adaptation tasks. We evaluate our proposed approach on inverse dynamics learning tasks, both in simulation and on real hardware data. In both settings, the structured and state-dependent learned losses improve online adaptation speed when compared to standard, state-independent loss functions.
|
|
12:15-12:30, Paper TuBT3.3 | |
>Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors |
> Video Attachment
|
|
Kaushik, Rituraj | INRIA - Nancy Grand Est, France |
Anne, Timothée | ENS Rennes |
Mouret, Jean-Baptiste | Inria |
Keywords: Model Learning for Control, Reinforcement Learning
Abstract: Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) by finding an initial set of parameters for the dynamical model such that the model can be trained to match the actual dynamics of the system with only a few data points. However, in the real world, a robot might encounter any situation, from motor failures to finding itself on rocky terrain, where its dynamics can differ significantly from one situation to another. In this paper, we first show that when the meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several initial starting points (i.e., initial parameters) for training the model, allowing robots to select the most suitable starting point to adapt the model to the current situation with only a few gradient steps. We compare FAMLE to MBRL, MBRL with a model meta-trained with MAML, and the model-free policy search algorithm PPO on various simulated and real robotic tasks, and show that FAMLE allows robots to adapt to novel damage in significantly fewer time steps than the baselines.
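The selection step can be pictured with a small Python sketch (schematic, under our own assumptions; loss and grad are hypothetical placeholders for the dynamics-model objective and its gradient, and a single adaptation step stands in for the paper's few gradient steps):

def select_prior(priors, data, loss, grad, lr=1e-2):
    """Pick the meta-trained starting point that adapts best to the few
    real observations after a single gradient step."""
    best, best_loss = None, float("inf")
    for theta in priors:                             # meta-trained initializations
        theta_adapted = theta - lr * grad(theta, data)
        l = loss(theta_adapted, data)
        if l < best_loss:
            best, best_loss = theta_adapted, l
    return best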
|
|
12:30-12:45, Paper TuBT3.4 | |
>Heteroscedastic Uncertainty for Robust Generative Latent Dynamics |
|
Limoyo, Oliver | University of Toronto |
Chan, Bryan | University of Toronto |
Maric, Filip | University of Toronto Institute for Aerospace Studies |
Wagstaff, Brandon | University of Toronto |
Mahmood, Ashique Rupam | Kindred Inc |
Kelly, Jonathan | University of Toronto |
Keywords: Representation Learning, Model Learning for Control, Reinforcement Learning
Abstract: Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics, where the high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems where learned representations must be robust to a variety of perceptual confounds and noise sources not seen during training. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that is amenable for long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic or input-specific uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that our model produces significantly more accurate predictions and exhibits improved control performance, compared to a model that assumes homoscedastic uncertainty only, in the presence of varying degrees of input degradation.
|
|
12:45-13:00, Paper TuBT3.5 | |
>Multi-Robot Active Sensing and Environmental Model Learning with Distributed Gaussian Process |
> Video Attachment
|
|
Jang, Dohyun | Seoul National University |
Yoo, Jaehyun | Hankyong National University |
Son, Clark Youngdong | Seoul National University |
Kim, Dabin | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Networked Robots
Abstract: This paper deals with the problem of multiple robots working together to explore and gather at the global maximum of the unknown field. Given noisy sensor measurements obtained at the location of robots with no prior knowledge about the environmental map, Gaussian process regression can be an efficient solution to construct a map that represents spatial information with confidence intervals. However, because the conventional Gaussian process algorithm operates in a centralized manner, it is difficult to process information coming from multiple distributed sensors in real-time. In this work, we propose a multi-robot exploration algorithm that deals with the following challenges: 1) distributed environmental map construction using networked sensing platforms; 2) online learning using successive measurements suitable for a multi-robot team; 3) multi-agent coordination to discover the highest peak of an unknown environmental field with collision avoidance. We demonstrate the effectiveness of our algorithm via simulation and a topographic survey experiment with multiple UAVs.
|
|
13:00-13:15, Paper TuBT3.6 | |
>Gaussians on Riemannian Manifolds: Applications for Robot Learning and Adaptive Control (I) |
|
Calinon, Sylvain | Idiap Research Institute |
|
|
TuBT4 |
Room T4 |
Model Learning II |
Regular session |
Chair: Posner, Ingmar | Oxford University |
Co-Chair: Boularias, Abdeslam | Rutgers University |
|
11:45-12:00, Paper TuBT4.1 | |
>Self-Adapting Recurrent Models for Object Pushing from Learning in Simulation |
> Video Attachment
|
|
Cong, Lin | University of Hamburg |
Görner, Michael | University of Hamburg |
Ruppel, Philipp | University of Hamburg |
Liang, Hongzhuo | University of Hamburg |
Hendrich, Norman | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Model Learning for Control, Reinforcement Learning, AI-Based Methods
Abstract: Planar pushing remains a challenging research topic, where building the dynamic model of the interaction is the core issue. Even an accurate analytical dynamic model is inherently unstable because physics parameters such as inertia and friction can only be approximated. Data-driven models usually rely on large amounts of training data, but data collection is time consuming when working with real robots. In this paper, we collect all training data in a physics simulator and build an LSTM-based model to fit the pushing dynamics. Domain randomization is applied to capture the pushing trajectories of a generalized class of objects. When executed on the real robot, the trained recurrent model adapts to the tracked object's real dynamics within a few steps. We propose the algorithm Recurrent Model Predictive Path Integral (RMPPI) as a variation of the traditional MPPI approach, employing state-dependent recurrent models. As a comparison, we also train a Deep Deterministic Policy Gradient (DDPG) network as a model-free baseline, which is also used as the action generator in the data collection phase. During policy training, Hindsight Experience Replay is used to improve exploration efficiency. Pushing experiments on our UR5 platform demonstrate the model's adaptability and the effectiveness of the proposed framework.
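For reference, the path-integral update that RMPPI builds on can be sketched in a few lines of Python (the generic MPPI step, not the paper's recurrent variant; rollout_cost stands in for rollouts through the learned LSTM dynamics model):

import numpy as np

def mppi_update(u, rollout_cost, n_samples=64, sigma=0.1, lam=1.0):
    """u: (T, m) nominal control sequence; rollout_cost maps a control
    sequence to a scalar cost via the learned dynamics model."""
    eps = sigma * np.random.randn(n_samples, *u.shape)  # sampled perturbations
    costs = np.array([rollout_cost(u + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)            # path-integral weights
    w /= w.sum()
    return u + np.tensordot(w, eps, axes=1)             # weighted control update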
|
|
12:00-12:15, Paper TuBT4.2 | |
>A Probabilistic Model for Planar Sliding of Objects with Unknown Material Properties: Identification and Robust Planning |
> Video Attachment
|
|
Song, Changkyu | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Keywords: Model Learning for Control, Manipulation Planning, Probability and Statistical Methods
Abstract: This paper introduces a new technique for learning probabilistic models of mass and friction distributions of unknown objects, and performing robust sliding actions by using the learned models. The proposed method is executed in two consecutive phases. In the exploration phase, a table-top object is poked by a robot from different angles. The observed motions of the object are compared against simulated motions with various hypothesized mass and friction models. The simulation-to-reality gap is then differentiated with respect to the unknown mass and friction parameters, and the analytically computed gradient is used to optimize those parameters. Since it is difficult to disentangle the mass from the friction coefficients in low-data and quasi-motion regimes, our approach retains a set of locally optimal pairs of mass and friction models. A probability distribution on the models is computed based on the relative accuracy of each pair of models. In the exploitation phase, a probabilistic planner is used to select a goal configuration and waypoints that are stable with high confidence. The proposed technique is evaluated on real objects and using a real manipulator. The results show that this technique can not only accurately identify the mass and friction coefficients of non-uniform heterogeneous objects, but can also be used to successfully slide an unknown object to the edge of a table and pick it up from there, without any human assistance or feedback.
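A hedged sketch of the identification phase follows: fit mass and friction so that simulated poke outcomes match the observed ones. The paper computes the gradient analytically through its simulator; central finite differences stand in here, and `simulate_poke` is a hypothetical placeholder.

```python
# Illustrative sketch under stated assumptions, not the authors' method.
import numpy as np

def gap(params, pokes, observed, simulate_poke):
    """Simulation-to-reality gap: squared error between simulated and
    observed object motions over all recorded pokes."""
    sim = np.array([simulate_poke(params, p) for p in pokes])
    return float(((sim - observed) ** 2).sum())

def identify(params0, pokes, observed, simulate_poke,
             lr=0.05, iters=200, h=1e-4):
    params = np.array(params0, dtype=float)   # e.g. [mass, friction]
    for _ in range(iters):
        grad = np.zeros_like(params)
        for i in range(len(params)):          # central finite differences
            e = np.zeros_like(params)
            e[i] = h
            grad[i] = (gap(params + e, pokes, observed, simulate_poke)
                       - gap(params - e, pokes, observed, simulate_poke)) / (2 * h)
        params -= lr * grad                   # descend the gap
    return params
```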
|
|
12:15-12:30, Paper TuBT4.3 | |
>Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction |
> Video Attachment
|
|
Nematollahi, Iman | University of Freiburg |
Mees, Oier | Albert-Ludwigs-Universität |
Hermann, Lukas | University of Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Model Learning for Control, Representation Learning, Novel Deep Learning Methods
Abstract: A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces. In order to scale learning through interaction to many objects and scenes, robots should be able to improve their own performance from real-world experience without requiring human supervision. To this end, we propose a novel approach for modeling the dynamics of a robot’s interactions directly from unlabeled 3D point clouds and images. Unlike previous approaches, our method does not require ground-truth data associations provided by a tracker or any pre-trained perception network. To learn from unlabeled real-world interaction data, we enforce consistency of estimated 3D clouds, actions and 2D images with observed ones. Our joint forward and inverse network learns to segment a scene into salient object parts and predicts their 3D motion under the effect of applied actions. Moreover, our object-centric model outputs action-conditioned 3D scene flow, object masks and 2D optical flow as emergent properties. Our extensive evaluation both in simulation and with real-world data demonstrates that our formulation leads to effective, interpretable models that can be used for visuomotor control and planning. Videos, code and dataset are available at http://hind4sight.cs.uni-freiburg.de
|
|
12:30-12:45, Paper TuBT4.4 | |
>Multi-Sparse Gaussian Process: Learning Based Semi-Parametric Control |
> Video Attachment
|
|
Khan, Mouhyemen | Georgia Institute of Technology |
Patel, Akash | Georgia Institute of Technology |
Chatterjee, Abhijit | Georgia Institute of Technology |
Keywords: Model Learning for Control, Aerial Systems: Mechanics and Control
Abstract: A key challenge with controlling complex dynamical systems is to accurately model them. However, this requirement is very hard to satisfy in practice. Data-driven approaches such as Gaussian processes (GPs) have proved quite effective by employing regression-based methods to capture the unmodeled dynamical effects. However, GPs scale cubically with the number of data points n, and it is often a challenge to perform real-time regression. In this paper, we propose a semi-parametric framework exploiting sparsity for learning-based control. We combine the parametric model of the system with multiple sparse GP models to capture any unmodeled dynamics. Multi-Sparse Gaussian Process (MSGP) uses multiple sparse models with unique hyperparameters for each one, thereby preserving the richness and uniqueness of each sparse model. For a query point, a weighted sparse posterior prediction is performed based on N neighboring sparse models. Hence, the prediction complexity is significantly reduced from O(n^3) to O(Npu^2), where p and u are the numbers of data points and pseudo-inputs, respectively, of each sparse model. We validate MSGP’s learning performance for a quadrotor using a geometric controller in simulation. Comparison with GP, sparse GP, and local GP shows that MSGP has higher prediction accuracy than sparse and local GP, with significantly lower time complexity than all three. We also validate MSGP on a real quadrotor setup for unmodeled mass, inertia, and disturbances. The experiment video can be seen at: https://youtu.be/zUk1ISux6ao.
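The prediction step described above can be sketched as follows; the `models[i].predict` interface, the distance-based weights, and the `centers` array are assumptions for illustration, not the authors' API.

```python
# Sketch of an MSGP-style weighted sparse posterior prediction.
import numpy as np

def msgp_predict(x_query, models, centers, n_neighbors=3):
    """Combine the posteriors of the N nearest local sparse models.
    Assumes models[i].predict(x) -> (mean, variance) and that centers[i]
    marks the region each sparse model was trained on."""
    dist = np.linalg.norm(centers - x_query, axis=1)
    idx = np.argsort(dist)[:n_neighbors]     # N neighboring sparse models
    w = 1.0 / (dist[idx] + 1e-8)             # closer models weigh more
    w /= w.sum()
    preds = [models[i].predict(x_query) for i in idx]
    mean = sum(wi * m for wi, (m, _) in zip(w, preds))
    var = sum(wi * v for wi, (_, v) in zip(w, preds))
    return mean, var
```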
|
|
12:45-13:00, Paper TuBT4.5 | |
>Decentralized Deep Reinforcement Learning for a Distributed and Adaptive Locomotion Controller of a Hexapod Robot |
> Video Attachment
|
|
Schilling, Malte | Bielefeld University |
Konen, Kai | Neuroinformatics Group, Bielefeld University |
Ohl, Frank | Leibniz Institute for Neurobiology |
Korthals, Timo | Bielefeld University |
Keywords: Multi-legged Robots, Parallel Robots, Reinforcement Learning
Abstract: Locomotion is a prime example of adaptive behavior in animals, and biological control principles have inspired control architectures for legged robots. While machine learning has been successfully applied to many tasks in recent years, Deep Reinforcement Learning approaches still appear to struggle when applied to real-world robots in continuous control tasks, and in particular do not yet appear to offer robust solutions that can handle uncertainties well. Therefore, there is a new interest in incorporating biological principles into such learning architectures. While inducing a hierarchical organization as found in motor control has already shown some success, we here propose a decentralized organization as found in insect motor control for the coordination of different legs. A decentralized and distributed architecture is introduced on a simulated hexapod robot and the details of the controller are learned through Deep Reinforcement Learning. We first show that such a concurrent local structure is able to learn good walking behavior. Second, we show that this simpler organization is learned faster than holistic approaches.
|
|
13:00-13:15, Paper TuBT4.6 | |
>First Steps: Latent-Space Control with Semantic Constraints for Quadruped Locomotion |
> Video Attachment
|
|
Mitchell, Alexander Luis | University of Oxford |
Engelcke, Martin | University of Oxford |
Parker Jones, Oiwi | University of Oxford |
Surovik, David | University of Oxford |
Gangapurwala, Siddhant | University of Oxford |
Melon, Oliwier Aleksander | University of Oxford |
Havoutis, Ioannis | University of Oxford |
Posner, Ingmar | Oxford University |
Keywords: Model Learning for Control
Abstract: Traditional approaches to quadruped control frequently employ simplified, hand-derived models. This significantly reduces the capability of the robot since its effective kinematic range is curtailed. In addition, kinodynamic constraints are often non-differentiable and difficult to implement in an optimisation approach. In this work, these challenges are addressed by framing quadruped control as optimisation in a structured latent space. A deep generative model captures a statistical representation of feasible joint configurations, whilst complex dynamic and terminal constraints are expressed via high-level, semantic indicators and represented by learned classifiers operating upon the latent space. As a consequence, complex constraints are rendered differentiable and evaluated an order of magnitude faster than analytical approaches. We validate the feasibility of locomotion trajectories optimised using our approach both in simulation and on a real-world ANYmal quadruped. Our results demonstrate that this approach is capable of generating smooth and realisable trajectories. To the best of our knowledge, this is the first time latent space control has been successfully applied to a complex, real robot platform.
|
|
TuBT5 |
Room T5 |
Transfer Learning |
Regular session |
Chair: Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
Co-Chair: Johns, Edward | Imperial College London |
|
11:45-12:00, Paper TuBT5.1 | |
>Stir to Pour: Efficient Calibration of Liquid Properties for Pouring Actions |
> Video Attachment
|
|
Lopez-Guevara, Tatiana | University of Edinburgh |
Pucci, Rita | University of Udine |
Taylor, Nicholas K. | Heriot-Watt University |
Gutmann, Michael U. | University of Edinburgh |
Ramamoorthy, Subramanian | The University of Edinburgh |
Subr, Kartic | The University of Edinburgh |
Keywords: Calibration and Identification, Cognitive Control Architectures, Transfer Learning
Abstract: Humans use simple probing actions to develop intuition about the physical behavior of common objects. Such intuition is particularly useful for adaptive estimation of favorable manipulation strategies of those objects in novel contexts. For example, observing the effect of tilt on a transparent bottle containing an unknown liquid provides clues on how the liquid might be poured. It is desirable to equip general-purpose robotic systems with this capability because it is inevitable that they will encounter novel objects and scenarios. In this paper, we teach a robot to use a simple, specified probing strategy --stirring with a stick-- to reduce spillage when pouring unknown liquids. In the probing step, we continuously observe the effects of a real robot stirring a liquid, while simultaneously tuning the parameters of a model (simulator) until the two outputs are in agreement. We obtain optimal simulation parameters, characterizing the unknown liquid, via a Bayesian Optimizer that minimizes the discrepancy between real and simulated outcomes. Then, we optimize the pouring policy conditioned on the optimal simulation parameters determined via stirring. We show that using stirring as a probing strategy results in reduced spillage for three qualitatively different liquids when executed on a UR10 robot, compared to probing via pouring. Finally, we provide quantitative insights into the reason for stirring being a suitable calibration task for pouring --a step towards automatic discovery of probing strategies.
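For a concrete picture of the calibration phase, the following is a minimal Bayesian-optimization loop with a GP surrogate and expected improvement over a single simulator parameter; the kernel, the grid search, and the stand-in `discrepancy` function are assumptions, not the authors' setup.

```python
# Hedged sketch: tune one simulator parameter (e.g. a viscosity-like value)
# to minimize the real-vs-simulated stirring discrepancy.
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def bo_calibrate(discrepancy, bounds=(0.0, 1.0), n_init=3, n_iter=15,
                 rng=np.random.default_rng(0)):
    X = rng.uniform(*bounds, n_init)
    y = np.array([discrepancy(x) for x in X])
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        K = rbf(X, X) + 1e-6 * np.eye(len(X))
        Ki = np.linalg.inv(K)
        ks = rbf(X, grid)
        mu = ks.T @ Ki @ y                              # posterior mean
        var = np.clip(1.0 - np.sum(ks * (Ki @ ks), 0), 1e-9, None)
        sd = np.sqrt(var)
        imp = y.min() - mu                              # we are minimizing
        z = imp / sd
        ei = imp * norm.cdf(z) + sd * norm.pdf(z)       # expected improvement
        x_next = grid[np.argmax(ei)]
        X = np.append(X, x_next)
        y = np.append(y, discrepancy(x_next))
    return X[np.argmin(y)]

best = bo_calibrate(lambda v: (v - 0.37) ** 2)          # stand-in discrepancy
```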
|
|
12:00-12:15, Paper TuBT5.2 | |
>Haptic Knowledge Transfer between Heterogeneous Robots Using Kernel Manifold Alignment |
|
Tatiya, Gyan | Tufts University |
Shukla, Yash | Worcester Polytechnic Institute |
Edegware, Michael | Tufts University |
Sinapov, Jivko | Tufts University |
Keywords: Transfer Learning, Haptics and Haptic Interfaces, Multi-Robot Systems
Abstract: Humans learn about object properties using multiple modes of perception. Recent advances show that robots can use non-visual sensory modalities (i.e., haptic and tactile sensory data) coupled with exploratory behaviors (i.e., grasping, lifting, pushing, dropping, etc.) for learning objects' properties such as shape, weight, material and affordances. However, non-visual sensory representations cannot be easily transferred from one robot to another, as different robots have different bodies and sensors. Therefore, each robot needs to learn its task-specific sensory models from scratch. To address this challenge, we propose a framework for knowledge transfer using kernel manifold alignment (KEMA) that enables source robots to transfer haptic knowledge about objects to a target robot. The idea behind our approach is to learn a common latent space from multiple robots' feature spaces produced by respective sensory data while interacting with objects. To test the method, we used a dataset in which 3 simulated robots interacted with 25 objects and showed that our framework speeds up haptic object recognition and allows novel object recognition.
|
|
12:15-12:30, Paper TuBT5.3 | |
>Robo-Gym – an Open Source Toolkit for Distributed Deep Reinforcement Learning on Real and Simulated Robots |
> Video Attachment
|
|
Lucchi, Matteo | Joanneum Research |
Zindler, Friedemann | Joanneum Research |
Mühlbacher-Karrer, Stephan | JOANNEUM RESEARCH Forschungsgesellschaft mbH - ROBOTICS |
Pichler, Horst | Joanneum Research Robotics |
Keywords: Reinforcement Learning, Transfer Learning, Software, Middleware and Programming Environments
Abstract: Applying Deep Reinforcement Learning (DRL) to complex tasks in the field of robotics has proven to be very successful in recent years. However, most publications focus either on applying it to a task in simulation or to a task in a real-world setup. Although there are great examples of combining the two worlds with the help of transfer learning, it often requires a lot of additional work and fine-tuning to make the setup work effectively. In order to increase the use of DRL with real robots and reduce the gap between simulation and real-world robotics, we propose an open-source toolkit: robo-gym. We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot. We showcase the capabilities and the effectiveness of the framework with two real-world applications featuring industrial robots: a mobile robot and a robot arm. The distributed capabilities of the framework enable several advantages like using distributed algorithms, separating the workload of simulation and training on different physical machines, as well as enabling the future opportunity to train in simulation and the real world at the same time. Finally, we offer an overview and comparison of robo-gym with other frequently used state-of-the-art DRL frameworks.
|
|
12:30-12:45, Paper TuBT5.4 | |
>Crossing the Gap: A Deep Dive into Zero-Shot Sim-To-Real Transfer for Dynamics |
> Video Attachment
|
|
Valassakis, Eugene | Imperial College London |
Ding, Zihan | Imperial College London |
Johns, Edward | Imperial College London |
Keywords: Transfer Learning, Simulation and Animation, Reinforcement Learning
Abstract: Zero-shot sim-to-real transfer of tasks with complex dynamics is a highly challenging and unsolved problem. A number of solutions have been proposed in recent years, but we have found that many works do not present a thorough evaluation in the real world, or underplay the significant engineering effort and task-specific fine-tuning that is required to achieve the published results. In this paper, we dive deeper into the sim-to-real transfer challenge, investigate why this is such a difficult problem, and present objective evaluations of a number of transfer methods across a range of real-world tasks. Surprisingly, we found that a method which simply injects random forces into the simulation performs just as well as more complex methods, such as those which randomise the simulator's dynamics parameters, or adapt a policy online using recurrent network architectures.
|
|
12:45-13:00, Paper TuBT5.5 | |
>Tensor Action Spaces for Multi-Agent Robot Transfer Learning |
> Video Attachment
|
|
Schwab, Devin | Carnegie Mellon University |
Zhu, Yifeng | University of Texas, Austin |
Veloso, Manuela | Carnegie Mellon University |
Keywords: Transfer Learning, Reinforcement Learning, Multi-Robot Systems
Abstract: We explore using reinforcement learning on single and multi-agent systems such that, after learning is finished, we can apply a policy zero-shot to new environment sizes as well as different numbers of agents and entities. Building off previous work, we show how to map back and forth between the state and action space of a standard Markov Decision Process (MDP) and multi-dimensional tensors such that zero-shot transfer in these cases is possible. Like in previous work, we use a special network architecture designed to work well with the tensor representation, known as the Fully Convolutional Q-Network (FCQN). Our simulation results show that this tensor state and action space combined with the FCQN architecture can learn faster than traditional representations in our environments. We also show that the performance of a transferred policy is comparable to that of a policy trained from scratch with the modified environment sizes and numbers of agents and entities. We also show that the zero-shot transfer performance across team sizes and environment sizes remains comparable to that of training specific policies from scratch in the transferred environments. Finally, we demonstrate that our simulation-trained policies can be applied to real robots and real sensor data with comparable performance to our simulation results. Using such policies, we can run variable-sized teams of robots in a variable-sized operating environment with no changes to the policy and no additional learning necessary.
|
|
13:00-13:15, Paper TuBT5.6 | |
>TrueAdapt: Learning Smooth Online Trajectory Adaptation with Bounded Jerk, Acceleration and Velocity in Joint Space |
> Video Attachment
|
|
Kiemel, Jonas | Karlsruhe Institute of Technology |
Weitemeyer, Robin | Karlsruhe Institute of Technology |
Meißner, Pascal | University of Aberdeen |
Kroeger, Torsten | Karlsruher Institut Für Technologie (KIT) |
Keywords: Reactive and Sensor-Based Planning, Transfer Learning, Motion and Path Planning
Abstract: We present TrueAdapt, a model-free method to learn online adaptations of robot trajectories based on their effects on the environment. Given sensory feedback and future waypoints of the original trajectory, a neural network is trained to predict joint accelerations at regular intervals. The adapted trajectory is generated by linear interpolation of the predicted accelerations, leading to continuously differentiable joint velocities and positions. Bounded jerks, accelerations and velocities are guaranteed by calculating the range of valid accelerations at each decision step and clipping the network’s output accordingly. A deviation penalty during the training process causes the adapted trajectory to follow the original one. Smooth movements are encouraged by penalizing high accelerations and jerks. We evaluate our approach by training a simulated KUKA iiwa robot to balance a ball on a plate while moving and demonstrate that the balancing policy can be directly transferred to a real robot.
|
|
TuBT6 |
Room T6 |
Learning from Demonstration |
Regular session |
Chair: Lee, Dongheui | Technical University of Munich |
Co-Chair: Calinon, Sylvain | Idiap Research Institute |
|
11:45-12:00, Paper TuBT6.1 | |
>Active Improvement of Control Policies with Bayesian Gaussian Mixture Model |
|
Girgin, Hakan | EPFL, Idiap Research Institute |
Pignat, Emmanuel | Idiap Research Institute, Martigny, Switzerland |
Jaquier, Noémie | Idiap Research Institute |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Learning from Demonstration, Model Learning for Control, Imitation Learning
Abstract: Learning from demonstration (LfD) is an intuitive framework allowing non-expert users to easily (re-)program robots. However, the quality and quantity of demonstrations have a great influence on the generalization performance of LfD approaches. In this paper, we introduce a novel active learning framework in order to improve the generalization capabilities of control policies. The proposed approach is based on the epistemic uncertainties of Bayesian Gaussian mixture models (BGMMs). We determine the new query point location by optimizing a closed-form information-density cost based on the quadratic Rényi entropy. Furthermore, to better represent uncertain regions and to avoid the local optima problem, we propose to approximate the active learning cost with a Gaussian mixture model (GMM). We demonstrate our active learning framework in the context of a reaching task in a cluttered environment with an illustrative toy example and a real experiment with a Panda robot.
|
|
12:00-12:15, Paper TuBT6.2 | |
>Collaborative Programming of Conditional Robot Tasks |
> Video Attachment
|
|
Willibald, Christoph | German Aerospace Center (DLR) |
Eiband, Thomas | German Aerospace Center (DLR) |
Lee, Dongheui | Technical University of Munich |
Keywords: Learning from Demonstration, Imitation Learning, Human-Centered Robotics
Abstract: Conventional robot programming methods are not suited for non-experts to intuitively teach robots new tasks. For this reason, the potential of collaborative robots for production cannot yet be fully exploited. In this work, we propose an active learning framework, in which the robot and the user collaborate to incrementally program a complex task. Starting with a basic model, the robot’s task knowledge can be extended over time if new situations require additional skills. An online anomaly detection algorithm automatically identifies new situations during task execution by monitoring the deviation between measured and commanded sensor values. The robot then triggers a teaching phase, in which the user decides to either refine an existing skill or demonstrate a new skill. The different skills of a task are encoded in separate probabilistic models and structured in a high-level graph, guaranteeing robust execution and successful transitions between skills. In the experiments, our approach is compared to two state-of-the-art Programming by Demonstration frameworks on a real system. The results show increased intuitiveness and task performance, allowing shop-floor workers to program industrial tasks with our framework.
|
|
12:15-12:30, Paper TuBT6.3 | |
>Learning Constraint-Based Planning Models from Demonstrations |
|
Loula, João | MIT |
Allen, Kelsey | Massachusetts Institute of Technology |
Silver, Tom | MIT |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Keywords: Hybrid Logical/Dynamical Planning and Verification, Learning from Demonstration, Deep Learning in Grasping and Manipulation
Abstract: How can we learn representations for planning that are both efficient and flexible? Hybrid models are a good candidate, having been very successful in long-horizon planning tasks—however, they've proved challenging for learning, relying mostly on hand-coded representations. We present a framework for learning constraint-based task and motion planning models using gradient descent. Our model observes expert demonstrations of a task and decomposes them into modes---segments which specify a set of constraints on a trajectory optimization problem. We show that our model learns these modes from few demonstrations, that modes can be used to plan flexibly in different environments and to achieve different types of goals, and that the model can recombine these modes in novel ways.
|
|
12:30-12:45, Paper TuBT6.4 | |
>Learning Object Manipulation with Dexterous Hand-Arm Systems from Human Demonstration |
> Video Attachment
|
|
Ruppel, Philipp | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Learning from Demonstration, Dexterous Manipulation, Deep Learning in Grasping and Manipulation
Abstract: We present a novel learning and control framework that combines artificial neural networks with online trajectory optimization to learn dexterous manipulation skills from human demonstration and to transfer the learned behaviors to real robots. Humans can perform the demonstrations with their own hands and with real objects. An instrumented glove is used to record motions and tactile data. Our system learns neural control policies that generalize to modified object poses directly from limited amounts of demonstration data. Outputs from the neural policy network are combined at runtime with kinematic and dynamic safety and feasibility constraints as well as a learned regularizer to obtain commands for a real robot through online trajectory optimization. We test our approach on multiple tasks and robots.
|
|
12:45-13:00, Paper TuBT6.5 | |
>MixGAIL: Autonomous Driving Using Demonstrations with Mixed Qualities |
> Video Attachment
|
|
Lee, Gunmin | Seoul National University |
Kim, Dohyeong | Seoul National University |
Oh, Wooseok | Seoul National University |
Lee, Kyungjae | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Imitation Learning, Autonomous Vehicle Navigation, Collision Avoidance
Abstract: In this paper, we consider autonomous driving of a vehicle using imitation learning. Generative adversarial imitation learning (GAIL) is a widely used algorithm for imitation learning. This algorithm leverages positive demonstrations to imitate the behavior of an expert. In this work, we propose a novel method, called mixed generative adversarial imitation learning (MixGAIL), which incorporates both expert demonstrations and negative demonstrations, such as vehicle collisions. To this end, the proposed method utilizes an occupancy measure and a constraint function. The occupancy measure is used to follow expert demonstrations and provides positive feedback. On the other hand, the constraint function is used for negative demonstrations to assert negative feedback. Experimental results show that the proposed algorithm converges faster than the other baseline methods. Also, hardware experiments using a real-world RC car show outstanding performance and faster convergence compared with existing methods.
|
|
13:00-13:15, Paper TuBT6.6 | |
>Driving through Ghosts: Behavioral Cloning with False Positives |
> Video Attachment
|
|
Bühler, Andreas | ETH Zürich |
Gaidon, Adrien | Toyota Research Institute |
Cramariuc, Andrei | ETHZ |
Ambrus, Rares | Toyota Research Institute |
Rosman, Guy | Massachusetts Institute of Technology |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Learning from Demonstration, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Safe autonomous driving requires robust detection of other traffic participants. However, robust does not mean perfect, and safe systems typically minimize missed detections at the expense of a higher false positive rate. This results in conservative and yet potentially dangerous behavior such as avoiding imaginary obstacles. In the context of behavioral cloning, perceptual errors at training time can lead to learning difficulties or wrong policies, as expert demonstrations might be inconsistent with the perceived world state. In this work, we propose a behavioral cloning approach that can safely leverage imperfect perception without being conservative. Our core contribution is a novel representation of perceptual uncertainty for learning to plan. We propose a new probabilistic bird's-eye-view semantic grid to encode the noisy output of object perception systems. We then leverage expert demonstrations to learn an imitative driving policy using this probabilistic representation. Using the CARLA simulator, we show that our approach can safely overcome critical false positives that would otherwise lead to catastrophic failures or conservative behavior.
|
|
TuBT7 |
Room T7 |
Policy Learning |
Regular session |
Chair: Gonzalez, Joseph E. | UC Berkeley |
Co-Chair: Ramos, Fabio | University of Sydney, NVIDIA |
|
11:45-12:00, Paper TuBT7.1 | |
>Proximal Deterministic Policy Gradient |
|
Maggipinto, Marco | University of Padova |
Susto, Gian Antonio | University of Padova |
Chaudhari, Pratik | University of Pennsylvania |
Keywords: Reinforcement Learning
Abstract: This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the optimization variable and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algorithms to provide an improved action value estimate through bootstrapping with a limited increase in computational resources. Further, we demonstrate significant performance improvement over state-of-the-art algorithms on standard continuous-control RL benchmarks.
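As a reference for the two ideas named above, the following is a hedged sketch in my own notation, not necessarily the paper's: the proximal-point view of the target-network update, and a clipped double-Q bootstrap target.

```latex
% Sketch with assumed notation: \bar\theta are target parameters, \ell the
% value loss, \lambda the proximal step size, and \phi_1', \phi_2' the two
% target value functions mentioned in the abstract.
\begin{align}
  \bar{\theta}_{k+1} &= \arg\min_{\theta}\;
      \mathbb{E}\left[\ell(\theta)\right]
      + \frac{1}{2\lambda}\left\lVert \theta - \bar{\theta}_k \right\rVert^2 ,\\
  y &= r + \gamma \min\!\left(Q_{\phi_1'}(s', a'),\; Q_{\phi_2'}(s', a')\right).
\end{align}
```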
|
|
12:00-12:15, Paper TuBT7.2 | |
>Online BayesSim for Combined Simulator Parameter Inference and Policy Improvement |
> Video Attachment
|
|
Possas, Rafael | University of Sydney |
Barcelos, Lucas | University of Sydney |
Oliveira, Rafael | University of Sydney |
Fox, Dieter | University of Washington |
Ramos, Fabio | University of Sydney, NVIDIA |
Keywords: Model Learning for Control, Reinforcement Learning, Optimization and Optimal Control
Abstract: Recent advancements in Bayesian likelihood-free inference enable a probabilistic treatment for the problem of estimating simulation parameters and their uncertainty given sequences of observations. Domain randomization can be performed much more effectively when a posterior distribution provides the correct uncertainty over parameters in a simulated environment. In this paper, we study the integration of simulation parameter inference with both model-free reinforcement learning and model-based control in a novel sequential algorithm that alternates between learning a better estimation of parameters and improving the controller. This approach exploits the interdependence between the two problems to generate computational efficiencies and improved reliability when a black-box simulator is available. Experimental results suggest that both control strategies have better performance when compared to traditional domain randomization methods.
|
|
12:15-12:30, Paper TuBT7.3 | |
>An Online Training Method for Augmenting MPC with Deep Reinforcement Learning |
> Video Attachment
|
|
Bellegarda, Guillaume | University of California, Santa Barbara |
Byl, Katie | UCSB |
Keywords: Reinforcement Learning, Nonholonomic Motion Planning, AI-Based Methods
Abstract: Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any modeling or intuition about the system, at the cost of high sample complexity and the inability to prove any metrics about the learned policies. Trajectory optimization (TO) on the other hand allows for stability and robustness analyses on generated motions and trajectories, but is only as good as the often over-simplified derived model, and may have prohibitively expensive computation times for real-time control, for example in contact rich environments. This paper seeks to combine the benefits from these two areas while mitigating their drawbacks by (1) decreasing RL sample complexity by using existing knowledge of the problem with real-time optimal control, and (2) allowing online policy deployment at any point in the training process by using the TO (MPC) as a baseline or worst-case scenario action, while continuously improving the combined learned-optimized policy with deep RL. This method is evaluated on tasks of successively navigating a car model to a series of goal destinations over slippery terrains as fast as possible, in which drifting will allow the system to more quickly change directions while maintaining high speeds.
|
|
12:30-12:45, Paper TuBT7.4 | |
>Stochastic Neural Control Using Raw Pointcloud Data and Building Information Models |
> Video Attachment
|
|
Ferguson, Max | Stanford University |
Law, Kincho H. | Stanford University |
Keywords: Autonomous Agents, Reinforcement Learning, Path Planning for Multiple Mobile Robots or Agents
Abstract: Recently, there has been a lot of excitement surrounding the use of reinforcement learning for robot control and navigation. However, many of these algorithms encounter difficulty navigating long or complex trajectories. This paper presents a new mobile robot control system called Stochastic Neural Control (SNC) that uses a stochastic policy gradient algorithm for local control and a modified probabilistic roadmap planner for global motion planning. In SNC, each mobile robot control decision is conditioned on observations from the robot sensors as well as pointcloud data, allowing the robot to safely operate within geometrically complex environments. SNC is tested on a number of challenging navigation tasks and learns advanced policies for navigation, collision-avoidance and fall-prevention. Three variants of the SNC system are evaluated against a conventional motion planning baseline. SNC outperforms the baseline and four other similar RL navigation systems in many of the trials. Finally, we present a strategy for transferring SNC from a simulated environment to a real robot. We empirically show that the SNC system exhibits good policies for mobile robot navigation when controlling a real mobile robot.
|
|
12:45-13:00, Paper TuBT7.5 | |
>RILaaS: Robot Inference and Learning As a Service |
|
Tanwani, Ajay Kumar | UC Berkeley |
Anand, Raghav | UC Berkeley |
Gonzalez, Joseph E. | UC Berkeley |
Goldberg, Ken | UC Berkeley |
Keywords: Networked Robots, Behavior-Based Systems, Distributed Robot Systems
Abstract: Programming robots is complicated due to the lack of 'plug-and-play' modules for skill acquisition. Virtualizing deployment of deep learning models can facilitate large-scale use/re-use of off-the-shelf functional behaviors. Deploying deep learning models on robots entails real-time, accurate and reliable inference service under varying query load. This paper introduces a novel Robot-Inference-and-Learning-as-a-Service (RILaaS) platform for low-latency and secure inference serving of deep models on robots. Unique features of RILaaS include: 1) low-latency and reliable serving with gRPC under dynamic loads by distributing queries over multiple servers on Edge and Cloud, 2) SSH-based authentication coupled with SSL/TLS-based encryption for security and privacy of the data, and 3) a front-end REST API for sharing, monitoring and visualizing performance metrics of the available models. We report experiments to evaluate the RILaaS platform under varying loads of batch size, number of robots, and various model placement hosts on Cloud, Edge, and Fog for providing benchmark applications of object recognition and grasp planning as a service. We address the complexity of load balancing with a Q-learning algorithm that optimizes simulated profiles of networked robots; it outperforms several baselines including round robin, least connections, and least model time, with 68.30% and 14.04% decreases in round-trip latency across models compared to the worst and the next-best baselines, respectively. Details and updates are available at: https://sites.google.com/view/rilaas
|
|
13:00-13:15, Paper TuBT7.6 | |
>Actor-Critic Reinforcement Learning for Control with Stability Guarantee |
> Video Attachment
|
|
Han, Minghao | Harbin Institute of Technology |
Zhang, Lixian | Harbin Institute of Technology |
Wang, Jun | University College London |
Pan, Wei | Delft University of Technology |
Keywords: Reinforcement Learning, Motion Control
Abstract: Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, stability is not guaranteed in model-free RL by solely using data. From a control-theoretic perspective, stability is the most important property for any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this paper, we propose an actor-critic RL framework for control which can guarantee closed-loop stability by employing the classic Lyapunov method from control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled by a Markov decision process. Then we show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known 3-dimensional robot control tasks and a synthetic biology gene network tracking task in three different popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies can enable the systems to recover to the equilibrium or way-points when interfered with, to a certain extent, by uncertainties such as system parametric variations and external disturbances.
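An illustrative form of the Lyapunov-style condition is sketched below; the constants and exact formulation are assumptions consistent with the abstract, not copied from the paper.

```latex
% Sketch: a learned Lyapunov candidate L must decrease in expectation along
% the closed loop under policy \pi, and this data-based condition is used as
% the critic when training the actor. Notation is mine.
\begin{equation}
  \mathbb{E}_{s' \sim P(\cdot \mid s,\, \pi(s))}\left[L(s')\right] - L(s)
  \;\le\; -\alpha\, c\big(s, \pi(s)\big), \qquad \alpha > 0 .
\end{equation}
```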
|
|
TuBT8 |
Room T8 |
Reinforcement Learning Algorithms |
Regular session |
Chair: Torras, Carme | Csic - Upc |
Co-Chair: Guan, Yisheng | Guangdong University of Technology |
|
11:45-12:00, Paper TuBT8.1 | |
>TTR-Based Reward for Reinforcement Learning with Implicit Model Priors |
|
Lyu, Xubo | Simon Fraser University |
Chen, Mo | Simon Fraser University |
Keywords: Reinforcement Learning, Optimization and Optimal Control
Abstract: Model-free reinforcement learning (RL) is a powerful approach for learning control policies directly from high-dimensional states and observations. However, it tends to be data-inefficient, which is especially costly in robotic learning tasks. On the other hand, optimal control does not require data if the system model is known, but cannot scale to models with high-dimensional states and observations. To exploit the benefits of both model-free RL and optimal control, we propose time-to-reach-based (TTR-based) reward shaping, an optimal control-inspired technique to alleviate data inefficiency while retaining advantages of model-free RL. This is achieved by summarizing key system model information using a TTR function to greatly speed up the RL process, as shown in our simulation results. The TTR function is defined as the minimum time required to move from any state to the goal under assumed system dynamics constraints. Since the TTR function is computationally intractable for systems with high-dimensional states, we compute it for approximate, lower-dimensional system models that still capture key dynamic behaviors. Our approach can be flexibly and easily incorporated into any model-free RL algorithm without altering the original algorithm structure, and is compatible with any other techniques that may facilitate the RL process. We evaluate our approach on two representative robotic learning tasks and three well-known model-free RL algorithms, and show significant improvements in data efficiency and performance.
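One simple way to realize such shaping is sketched below, under assumptions: a gym-style environment API, a precomputed `ttr` lookup, and potential-based shaping with the negative TTR as the potential; this is not necessarily the paper's exact formulation.

```python
class TTRShapedEnv:
    """Potential-based shaping with the negative TTR value as potential:
    moving toward states with a smaller time-to-reach is rewarded."""

    def __init__(self, env, ttr, scale=1.0):
        self.env, self.ttr, self.scale = env, ttr, scale
        self.s = None

    def reset(self):
        self.s = self.env.reset()
        return self.s

    def step(self, action):
        s_next, reward, done, info = self.env.step(action)
        # Shaping term: reward the decrease in time-to-reach.
        reward += self.scale * (self.ttr(self.s) - self.ttr(s_next))
        self.s = s_next
        return s_next, reward, done, info
```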
|
|
12:00-12:15, Paper TuBT8.2 | |
>Learning Hierarchical Acquisition Functions for Bayesian Optimization |
|
Rottmann, Nils | University of Luebeck |
Kunavar, Tjasa | Jozef Stefan Institute |
Babic, Jan | Jozef Stefan Institute |
Peters, Jan | Technische Universität Darmstadt |
Rueckert, Elmar | University of Luebeck |
Keywords: Reinforcement Learning, Humanoid Robot Systems, Human and Humanoid Motion Analysis and Synthesis
Abstract: Learning control policies in robotic tasks requires a large number of interactions due to small learning rates, bounds on the updates or unknown constraints. In contrast, humans can infer protective and safe solutions after a single failure or unexpected observation. In order to reach similar performance, we developed a hierarchical Bayesian optimization algorithm that replicates the cognitive inference and memorization process for avoiding failures in motor control tasks. A Gaussian Process implements the modeling and the sampling of the acquisition function. This enables rapid learning with large learning rates while a mental replay phase ensures that policy regions that led to failures are inhibited during the sampling process. The features of the hierarchical Bayesian optimization method are evaluated in a simulated and physiological humanoid postural balancing task. The method outperforms standard optimization techniques, such as Bayesian Optimization, in the number of interactions to solve the task, in the computational demands and in the frequency of observed failures. Further, we show that our method performs similarly to humans for learning the postural balancing task by comparing our simulation results with real human data.
|
|
12:15-12:30, Paper TuBT8.3 | |
>Reinforcement Learning in Latent Action Sequence Space |
|
Kim, Heecheol | The University of Tokyo |
Yamada, Masanori | NTT |
Miyoshi, Kosuke | Narrative Nights Inc |
Iwata, Tomoharu | NTT |
Yamakawa, Hiroshi | The Whole Brain Architecture Initiative |
Keywords: Reinforcement Learning, Transfer Learning, Learning from Demonstration
Abstract: One problem in real-world applications of reinforcement learning is the high dimensionality of the action search spaces, which comes from the combination of actions over time. To reduce the dimensionality of action sequence search spaces, macro actions have been studied, which are sequences of primitive actions to solve tasks. However, previous studies relied on humans to define macro actions or assumed macro actions to be repetitions of the same primitive actions. We propose encoded action sequence reinforcement learning (EASRL), a reinforcement learning method that learns flexible sequences of actions in a latent space for a high-dimensional action sequence search space. With EASRL, encoder and decoder networks are trained with demonstration data by using variational autoencoders for mapping macro actions into the latent space. Then, we learn a policy network in the latent space, which is a distribution over encoded macro actions given a state. By learning in the latent space, we can reduce the dimensionality of the action sequence search space and handle various patterns of action sequences. We experimentally demonstrate that the proposed method outperforms other reinforcement learning methods on tasks that require an extensive amount of search.
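The encoder/decoder stage described above rests on the standard variational autoencoder objective, sketched below in my own notation as a reference; the macro action a_{1:T} is compressed into a latent z, over which the policy is later learned.

```latex
% Standard VAE evidence lower bound assumed by the abstract's encoder/decoder
% stage; \phi and \psi are encoder and decoder parameters, p(z) the prior.
\begin{equation}
  \mathcal{L}(\phi, \psi) =
    \mathbb{E}_{q_\phi(z \mid a_{1:T})}\left[\log p_\psi(a_{1:T} \mid z)\right]
    - D_{\mathrm{KL}}\!\left(q_\phi(z \mid a_{1:T}) \,\middle\|\, p(z)\right).
\end{equation}
```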
|
|
12:30-12:45, Paper TuBT8.4 | |
>Deep Adversarial Reinforcement Learning for Object Disentangling |
|
Laux, Melvin | Technische Universität Darmstadt |
Arenz, Oleg | TU Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Pajarinen, Joni | Tampere University |
Keywords: Reinforcement Learning, Robust/Adaptive Control of Robotic Systems, Transfer Learning
Abstract: Deep learning in combination with improved training techniques and high computational power has led to recent advances in the field of reinforcement learning (RL) and to successful robotic RL applications such as in-hand manipulation. However, most robotic RL relies on a well-known initial state distribution. In real-world tasks, however, this information is often not available. For example, when disentangling waste objects, the actual position of the robot w.r.t. the objects may not match the positions the RL policy was trained for. To solve this problem, we present a novel adversarial reinforcement learning (ARL) framework. The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states. We train the protagonist and the adversary jointly to allow them to adapt to the changing policy of their opponent. We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task. Experiments with a KUKA LBR+ 7-DOF robot arm show that our approach outperforms the baseline method in disentangling when starting from different initial states than provided during training.
|
|
12:45-13:00, Paper TuBT8.5 | |
>Contextual Policy Search for Micro-Data Robot Motion Learning through Covariate Gaussian Process Latent Variable Models |
> Video Attachment
|
|
Delgado-Guerrero, Juan Antonio | IRI |
Colomé, Adrià | Institut De Robòtica I Informàtica Industrial (CSIC-UPC), Q28180 |
Torras, Carme | Csic - Upc |
Keywords: Learning from Demonstration, Reinforcement Learning, Robust/Adaptive Control of Robotic Systems
Abstract: In the next few years, the amount and variety of context-aware robotic manipulator applications are expected to increase significantly, especially in household environments. In such spaces, thanks to programming by demonstration, non-expert people will be able to teach robots how to perform specific tasks, for which the adaptation to the environment is imperative, for the sake of effectiveness and users' safety. These robot motion learning procedures allow the encoding of such tasks by means of parameterized trajectory generators, usually a Movement Primitive (MP) conditioned on contextual variables. However, naively sampled solutions from these MPs are generally suboptimal/inefficient, according to a given reward function. Hence, Policy Search (PS) algorithms leverage the information of the experienced rewards to improve the robot performance over executions, even for new context configurations. Given the complexity of the aforementioned tasks, PS methods face the challenge of exploring in high-dimensional parameter search spaces. In this work, a solution combining Bayesian Optimization, a data-efficient PS algorithm, with covariate Gaussian Process Latent Variable Models, a recent Dimensionality Reduction technique, is presented. It enables reducing dimensionality and exploiting prior demonstrations to converge in a few iterations, while also being compliant with context requirements. Thus, contextual variables are considered in the latent search space, from which a surrogate model for the reward function is built. Then, samples are generated in a low-dimensional latent space, and mapped to a context-dependent trajectory. This allows us to drastically reduce the search space with the covariate GPLVM, e.g. from 105 to 2 parameters, plus a few contextual features. Experimentation in two different scenarios proves the data-efficiency and the power of dimensionality reduction of our approach.
|
|
13:00-13:15, Paper TuBT8.6 | |
>Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning |
> Video Attachment
|
|
Lin, Yijiong | Guangdong University of Technology |
Huang, Jiancong | Guangdong University of Technology |
Zimmer, Matthieu | Shanghai Jiao Tong University |
Guan, Yisheng | Guangdong University of Technology |
Rojas, Juan | Chinese University of Hong Kong |
Weng, Paul | Shanghai Jiao Tong University |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, AI-Based Methods
Abstract: Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from real trajectories define transformations that leave the space of feasible RL trajectories invariant and can be used to generate new feasible trajectories, which could be used for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay, which we present with two techniques. First, Kaleidoscope Experience Replay exploits reflectional symmetries. Second, Goal-augmented Experience Replay takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. In particular, we attain an 8x speed-up in multi-goal tasks. Invariant transforms on RL trajectories are a promising methodology to speed up learning in DRL.
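A hedged sketch of the reflection idea behind Kaleidoscope Experience Replay follows; which state, action, and goal coordinates flip is task-specific, and the indices below are illustrative assumptions for a planar Fetch-like setup, not the authors' code.

```python
# Sketch: mirror stored transitions about a task symmetry plane so that one
# real trajectory yields additional feasible replay data.
import numpy as np

def reflect_transition(s, a, s_next, goal,
                       state_flip=(1,), action_flip=(1,)):
    """Negate the listed coordinate indices to mirror one transition."""
    s2, a2, sn2, g2 = (np.array(x, dtype=float) for x in (s, a, s_next, goal))
    for i in state_flip:
        s2[i], sn2[i], g2[i] = -s2[i], -sn2[i], -g2[i]
    for i in action_flip:
        a2[i] = -a2[i]
    return s2, a2, sn2, g2

def augmented(batch):
    """Yield each replay transition plus its mirrored copy."""
    for s, a, s_next, goal in batch:
        yield s, a, s_next, goal
        yield reflect_transition(s, a, s_next, goal)
```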
|
|
TuBT9 |
Room T9 |
Reinforcement Learning Applications |
Regular session |
Chair: Büscher, Daniel | Albert-Ludwigs-Universität Freiburg |
Co-Chair: Fantini, Michael | Rice University |
|
11:45-12:00, Paper TuBT9.1 | |
>Efficiency and Equity Are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning |
> Video Attachment
|
|
Yan, Shengchao | University of Freiburg |
Zhang, Jingwei | Albert Ludwigs University of Freiburg |
Büscher, Daniel | Albert-Ludwigs-Universität Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Novel Deep Learning Methods, Reinforcement Learning, AI-Based Methods
Abstract: Traffic signal controllers play an essential role in today’s traffic system. However, the majority of them are currently not sufficiently flexible or adaptive to generate optimal traffic schedules. In this paper, we present an approach that uses deep reinforcement learning to learn signal controller policies aimed at optimizing traffic flow. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to find the bound for the proposed equity factor, and introduce an adaptive discounting approach that greatly stabilizes learning and helps to maintain a high flexibility of green light duration. The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) on a wide range of traffic situations.
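As an illustration of a reward that "simultaneously considers efficiency and equity", here is one plausible reading, sketched under assumptions: per-lane waiting times as input, the standard deviation as the equity term, and kappa as the equity factor whose bound the paper derives. The paper's exact formulation may differ.

```python
# Hypothetical reward combining efficiency and equity, not the paper's exact one.
import numpy as np

def signal_reward(waiting_times, kappa=0.1):
    """waiting_times: per-lane (or per-vehicle) accumulated waiting times."""
    efficiency = -float(np.sum(waiting_times))   # shorter total waits
    equity = -float(np.std(waiting_times))       # similar waits across lanes
    return efficiency + kappa * equity
```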
|
|
12:00-12:15, Paper TuBT9.2 | |
>Ultrasound-Guided Robotic Navigation with Deep Reinforcement Learning |
|
Hase, Hannes | Technical University of Munich |
Azampour, Mohammad Farid | Technical University of Munich |
Tirindelli, Maria | Computer Aided Medical Procedures, Technical University of Munich |
Paschali, Magdalini | Technical University of Munich |
Simson, Walter | Technical University Munich |
Fatemizadeh, Emad | Sharif University of Technology |
Navab, Nassir | TU Munich |
Keywords: Reinforcement Learning, Medical Robots and Systems, Autonomous Agents
Abstract: In this paper, we introduce the first reinforcement learning (RL)-based robotic navigation method which utilizes ultrasound (US) images as an input. Our approach combines state-of-the-art RL techniques, specifically deep Q-networks (DQN) with memory buffers and a binary classifier for deciding when to terminate the task. Our method is trained and evaluated on an in-house collected dataset of 34 volunteers and, when compared to pure RL and supervised learning (SL) techniques, performs substantially better, which highlights the suitability of RL navigation for US-guided procedures. When testing our proposed model, we obtained an 82.91% chance of navigating correctly to the sacrum from 165 different starting positions in 5 different unseen simulated environments.
|
|
12:15-12:30, Paper TuBT9.3 | |
>Deep R-Learning for Continual Area Sweeping |
|
Shah, Rishi | The University of Texas at Austin |
Jiang, Yuqian | University of Texas at Austin |
Hart, Justin | University of Texas at Austin |
Stone, Peter | University of Texas at Austin |
Keywords: Reinforcement Learning, AI-Based Methods, Service Robots
Abstract: Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency. To address the scenario in which some points need to be visited more frequently than others, this problem has been extended to non-uniform coverage planning. This paper considers the variant of non-uniform coverage in which the robot does not know the distribution of relevant events beforehand and must nevertheless learn to maximize the rate of detecting events of interest. This continual area sweeping problem has been previously formalized in a way that makes strong assumptions about the environment, and to date only a greedy approach has been proposed. We generalize the continual area sweeping formulation to include fewer environmental constraints, and propose a novel approach based on reinforcement learning in a Semi-Markov Decision Process. This approach is evaluated in an abstract simulation and in a high fidelity Gazebo simulation. These evaluations show significant improvement upon the existing approach in general settings, which is especially relevant in the growing area of service robotics. We also present a video demonstration on a real service robot.
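For reference, a tabular sketch of an R-learning update adapted to a semi-MDP is shown below; rho estimates the average reward rate, tau is the sojourn time of the executed action, and the rate-update rule is my own simplification, not the paper's deep variant.

```python
def r_learning_update(Q, rho, s, a, r, tau, s_next, alpha=0.1, beta=0.01):
    """One tabular update. Q is a dict of dicts of action values, rho the
    average-reward-rate estimate, tau the sojourn time of action a."""
    target = r - rho * tau + max(Q[s_next].values())
    td = target - Q[s][a]
    Q[s][a] += alpha * td
    # Update the rate estimate only on (near-)greedy actions, as in classic
    # R-learning; dividing by tau makes this a per-time-unit error.
    if Q[s][a] >= max(Q[s].values()) - 1e-9:
        rho += beta * td / tau
    return rho
```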
|
|
12:30-12:45, Paper TuBT9.4 | |
>Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards |
> Video Attachment
|
|
Schoettler, Gerrit | Siemens Corporation |
Nair, Ashvin | UC Berkeley |
Luo, Jianlan | UC Berkeley |
Bahl, Shikhar | UC Berkeley |
Aparicio Ojea, Juan | Siemens |
Solowjow, Eugen | Siemens Corporation |
Levine, Sergey | UC Berkeley |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Industrial Robots
Abstract: Connector insertion and many other tasks commonly found in modern manufacturing settings involve complex contact dynamics and friction. Since it is difficult to capture related physical effects with first-order modeling, traditional control methods often result in brittle and inaccurate controllers, which have to be manually tuned. Reinforcement learning (RL) methods have been demonstrated to be capable of learning controllers in such environments from autonomous interaction with the environment, but running RL algorithms in the real world poses sample efficiency and safety challenges. Moreover, in practical real-world settings, we cannot assume access to perfect state information or dense reward signals. In this paper, we consider a variety of difficult industrial insertion tasks with visual inputs and different natural reward specifications, namely sparse rewards and goal images. We show that methods that combine RL with prior information, such as classical controllers or demonstrations, can solve these tasks from a reasonable amount of real-world interaction.
|
|
12:45-13:00, Paper TuBT9.5 | |
>Robotic Table Tennis with Model-Free Reinforcement Learning |
> Video Attachment
|
|
Gao, Wenbo | Columbia University |
Graesser, Laura | Google |
Choromanski, Krzysztof | Google Brain Robotics |
Song, Xingyou | Google Brain |
Lazic, Nevena | Deepmind |
Sanketi, Pannag | Google |
Sindhwani, Vikas | Google Brain, NYC |
Jaitly, Navdeep | Google Research |
Keywords: Reinforcement Learning, Novel Deep Learning Methods, Humanoid Robot Systems
Abstract: We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand strokes, whilst achieving an 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.
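The evolutionary search mentioned above refers to the family of methods sketched below (an OpenAI-ES-style gradient estimate); `evaluate`, the population size, and the step sizes are illustrative assumptions.

```python
# Minimal ES step, a sketch of the method family rather than the paper's code.
import numpy as np

def es_step(theta, evaluate, pop=32, sigma=0.05, lr=0.01, rng=None):
    """Estimate a search gradient from perturbed-policy returns and take
    one ascent step. `evaluate(params) -> episode return` is assumed."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal((pop, theta.size))
    returns = np.array([evaluate(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = (adv[:, None] * eps).mean(axis=0) / sigma   # score-function estimate
    return theta + lr * grad
```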
|
|
13:00-13:15, Paper TuBT9.6 | |
>Optimizing a Continuum Manipulator's Search Policy through Model-Free Reinforcement Learning |
|
Frazelle, Chase | Clemson University |
Rogers, Jonathan | NASA Johnson Space Center |
Karamouzas, Ioannis | Clemson University |
Walker, Ian | Clemson University |
Keywords: Flexible Robots, Reinforcement Learning, Modeling, Control, and Learning for Soft Robots
Abstract: Continuum robots have long held great potential for applications in the inspection of remote, hard-to-reach environments. In future environments such as the Deep Space Gateway, remote deployment of robotic solutions will require a high level of autonomy due to communication delays and the unavailability of human crews. In this work, we explore the application of policy optimization methods through Actor-Critic gradient descent in order to optimize a continuum manipulator’s search method for an unknown object. We show that we can deploy a continuum robot without prior knowledge of a goal object location and converge to a policy that finds the goal and can be reused in future deployments. We also show that the method can be quickly extended to multiple degrees of freedom and that we can restrict the policy with virtual and physical obstacles. These two scenarios are highlighted using a simulation environment with 15 and 135 unique states, respectively.
|
|
TuBT10 |
Room T10 |
Reinforcement Learning |
Regular session |
Chair: Driggs-Campbell, Katherine | University of Illinois at Urbana-Champaign |
Co-Chair: Niekum, Scott | University of Texas at Austin |
|
11:45-12:00, Paper TuBT10.1 | |
>Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning |
> Video Attachment
|
|
Chuck, Caleb | University of Texas at Austin |
Chockchowwat, Supawit | The University of Texas at Austin |
Niekum, Scott | University of Texas at Austin |
Keywords: AI-Based Methods, Model Learning for Control, Visual Learning
Abstract: Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation. However, standard DRL methods often suffer from poor sample efficiency, partially because they aim to be entirely problem-agnostic. In this work, we introduce a novel approach to exploration and hierarchical skill learning that derives its sample efficiency from intuitive assumptions it makes about the behavior of objects both in the physical world and simulations which mimic physics. Specifically, we propose the Hypothesis Proposal and Evaluation (HyPE) algorithm, which discovers objects from raw pixel data, generates hypotheses about the controllability of observed changes in object state, and learns a hierarchy of skills to test these hypotheses. We demonstrate that HyPE can dramatically improve the sample efficiency of policy learning in two different domains: a simulated robotic block-pushing domain, and a popular benchmark task: Breakout. In these domains, HyPE learns high-scoring policies an order of magnitude faster than several state-of-the-art reinforcement learning methods.
|
|
12:00-12:15, Paper TuBT10.2 | |
>Robot Sound Interpretation: Combining Sight and Sound in Learning-Based Control |
> Video Attachment
|
|
Chang, Peixin | University of Illinois at Urbana Champaign |
Liu, Shuijing | University of Illinois at Urbana Champaign |
Chen, Haonan | Zhejiang University-University of Illinois at Urbana-Champaign I |
Driggs-Campbell, Katherine | University of Illinois at Urbana-Champaign |
Keywords: Cognitive Control Architectures, Robot Audition, AI-Based Methods
Abstract: We explore the interpretation of sound for robot decision making, inspired by human speech comprehension. While previous methods separate the sound processing unit from the robot controller, we propose an end-to-end deep neural network which directly interprets sound commands for visual-based decision making. The network is trained using reinforcement learning with auxiliary losses on the sight and sound networks. We demonstrate our approach on two robots, a TurtleBot3 and a Kuka-IIWA arm, which hear a command word, identify the associated target object, and perform precise control to reach the target. For both robots, we empirically show the effectiveness of our network in generalizing to sound types and robotic tasks. We successfully transfer the policy learned in simulation to a real-world TurtleBot3.
|
|
12:15-12:30, Paper TuBT10.3 | |
>"Good Robot!": Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer |
> Video Attachment
|
|
Hundt, Andrew | Johns Hopkins University |
Killeen, Benjamin | Johns Hopkins University |
Greene, Nicholas | Johns Hopkins University |
Wu, Hongtao | Johns Hopkins University |
Kwon, Heeyeon | Johns Hopkins University |
Paxton, Chris | NVIDIA Research |
Hager, Gregory | Johns Hopkins University |
Keywords: Deep Learning in Grasping and Manipulation, Computer Vision for Other Robotic Applications, Reinforcement Learning
Abstract: Current Reinforcement Learning (RL) algorithms struggle with long-horizon tasks where time can be wasted exploring dead ends and task progress may be easily reversed. We develop the SPOT framework, which explores within action safety zones, learns about unsafe regions without exploring them, and prioritizes experiences that reverse earlier progress to learn with remarkable efficiency. The SPOT framework successfully completes simulated trials of a variety of tasks, improving a baseline trial success rate from 13% to 100% when stacking 4 cubes, from 13% to 99% when creating rows of 4 cubes, and from 84% to 95% when clearing toys arranged in adversarial patterns. Efficiency with respect to actions per trial typically improves by 30% or more, while training takes just 1-20k actions, depending on the task. Furthermore, we demonstrate direct sim to real transfer. We are able to create real stacks in 100% of trials with 61% efficiency and real rows in 100% of trials with 59% efficiency by directly loading the simulation-trained model on the real robot with no additional real-world fine-tuning. To our knowledge, this is the first instance of reinforcement learning with successful sim to real transfer applied to long-term multi-step tasks such as block-stacking and row-making with consideration of progress reversal. Code is available at https://github.com/jhu-lcsr/good_robot.
|
|
12:30-12:45, Paper TuBT10.4 | |
>Deep Reinforcement Learning for Tactile Robotics: Learning to Type on a Braille Keyboard |
> Video Attachment
|
|
Church, Alex | University of Bristol |
Lloyd, John | University of Bristol |
Hadsell, Raia | DeepMind |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Reinforcement Learning, Biomimetics
Abstract: Artificial touch would seem well-suited for Reinforcement Learning (RL), since both paradigms rely on interaction with an environment. Here we propose a new environment and set of tasks to encourage development of tactile reinforcement learning: learning to type on a braille keyboard. Four tasks are proposed, progressing in difficulty from arrow to alphabet keys and from discrete to continuous actions. A simulated counterpart is also constructed by sampling tactile data from the physical environment. Using state-of-the-art deep RL algorithms, we show that all of these tasks can be successfully learnt in simulation, and 3 out of 4 tasks can be learned on the real robot. A lack of sample efficiency currently makes the continuous alphabet task impractical on the robot. To the best of our knowledge, this work presents the first demonstration of successfully training deep RL agents in the real world using observations that exclusively consist of tactile images. To aid future research utilising this environment, the code for this project has been released along with designs of the braille keycaps for 3D printing and a guide for recreating the experiments.
|
|
12:45-13:00, Paper TuBT10.5 | |
>Encoding Formulas As Deep Networks: Reinforcement Learning for Zero-Shot Execution of LTL Formulas |
|
Kuo, Yen-Ling | MIT |
Katz, Boris | MIT |
Barbu, Andrei | MIT |
Keywords: AI-Based Methods, Reinforcement Learning
Abstract: We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters that is accepted. In a Minecraft-like environment, the agent finds a sequence of actions that conform to the formula. While prior work could learn to execute one formula reliably given examples of that formula, we demonstrate how to encode all formulas reliably. This could form the basis of new multi-task agents that discover sub-tasks and execute them without any additional training, as well as agents that follow more complex linguistic commands. The structures required for this generalization are specific to LTL formulas, which opens up an interesting theoretical question: what structures are required in neural networks for zero-shot generalization to different logics?
|
|
TuBT11 |
Room T11 |
Representation Learning |
Regular session |
Chair: Jenkins, Odest Chadwicke | University of Michigan |
Co-Chair: Sharf, Inna | McGill University |
|
11:45-12:00, Paper TuBT11.1 | |
>PlaNet of the Bayesians: Reconsidering and Improving Deep Planning Network by Incorporating Bayesian Inference |
> Video Attachment
|
|
Okada, Masashi | Panasonic Corporation |
Kosaka, Norio | Panasonic Corporation |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Representation Learning, Reinforcement Learning, Probability and Statistical Methods
Abstract: In the present paper, we propose an extension of the Deep Planning Network (PlaNet), also referred to as PlaNet of the Bayesians (PlaNet-Bayes). There has been growing demand for model predictive control (MPC) in partially observable environments in which complete information is unavailable because of, for example, the lack of expensive sensors. PlaNet is a promising solution for realizing such latent MPC, as it is used to train state-space models via model-based reinforcement learning (MBRL) and to conduct planning in the latent space. However, recent state-of-the-art strategies from the MBRL literature, such as incorporating uncertainty into training and planning, have not been considered, which significantly limits training performance. The proposed extension makes PlaNet uncertainty-aware on the basis of Bayesian inference, in which both model and action uncertainty are incorporated. Uncertainty in latent models is represented using a neural network ensemble to approximately infer model posteriors. An ensemble of optimal action candidates is also employed to capture multimodal uncertainty in the optimality. The concept of the action ensemble relies on a general variational inference MPC (VI-MPC) framework and its instance, probabilistic action ensemble with trajectory sampling (PaETS). In this paper, we extend VI-MPC and PaETS, originally introduced in prior work, to address partially observable cases. We experimentally compare performance on continuous control tasks and conclude that our method consistently improves asymptotic performance compared with PlaNet.
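To make the ensemble idea concrete: several latent dynamics models are rolled out, and their disagreement is added as a penalty to the planning cost. The linear toy models, the disagreement weight, and the random-shooting planner below are all placeholder assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# An "ensemble" of linear latent dynamics models z' = A_i z + B_i a.
ensemble = [(rng.normal(scale=0.1, size=(4, 4)) + np.eye(4),
             rng.normal(scale=0.1, size=(4, 2))) for _ in range(5)]

def rollout_cost(z0, actions, goal):
    # Average distance-to-goal over ensemble members, plus a penalty on
    # their disagreement (a simple stand-in for uncertainty-aware MPC).
    costs, finals = [], []
    for A, B in ensemble:
        z = z0
        for a in actions:
            z = A @ z + B @ a
        finals.append(z)
        costs.append(np.linalg.norm(z - goal))
    disagreement = np.var(np.stack(finals), axis=0).sum()
    return np.mean(costs) + 10.0 * disagreement

# Pick the best of a few random action sequences (crude planner stand-in).
z0, goal = np.zeros(4), np.ones(4)
candidates = rng.normal(size=(64, 3, 2))
best = min(candidates, key=lambda seq: rollout_cost(z0, seq, goal))
print(rollout_cost(z0, best, goal))
```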
|
|
12:00-12:15, Paper TuBT11.2 | |
>Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation |
> Video Attachment
|
|
Lippi, Martina | Università Degli Studi Di Salerno |
Poklukar, Petra | KTH Royal Institute of Technology |
Welle, Michael C. | KTH Royal Institute of Technology |
Varava, Anastasiia | KTH, the Royal Institute of Technology |
Yin, Hang | KTH |
Marino, Alessandro | University of Cassino and Southern Lazio |
Kragic, Danica | KTH |
Keywords: Representation Learning, Perception-Action Coupling, Novel Deep Learning Methods
Abstract: We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, such as the manipulation of deformable objects. Planning is performed in a low-dimensional latent state space that embeds images. We define and implement a Latent Space Roadmap (LSR), which is a graph-based structure that globally captures the latent system dynamics. Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them. We show the effectiveness of the method on a simulated box stacking task as well as a T-shirt folding task performed with a real robot.
|
|
12:15-12:30, Paper TuBT11.3 | |
>Learning the Latent Space of Robot Dynamics for Cutting Interaction Inference |
> Video Attachment
|
|
Rezaei-Shoshtari, Sahand | McGill University |
Meger, David Paul | McGill University |
Sharf, Inna | McGill University |
Keywords: Representation Learning, Model Learning for Control, Robotics in Agriculture and Forestry
Abstract: Utilization of the latent space to capture a lower-dimensional representation of a complex dynamics model is explored in this work. The targeted application is a robotic manipulator executing a complex environment interaction task, in particular, cutting a wooden object. We train two flavours of Variational Autoencoders---standard and Vector-Quantised---to learn the latent space, which is then used to infer certain properties of the cutting operation, such as whether the robot is cutting or not, as well as the material and geometry of the object being cut. The two VAE models are evaluated with reconstruction, prediction, and combined reconstruction/prediction decoders. The results demonstrate the expressiveness of the latent space for robotic interaction inference and competitive prediction performance compared with recurrent neural networks.
|
|
12:30-12:45, Paper TuBT11.4 | |
>SwingBot: Learning Physical Features from In-Hand Tactile Exploration for Dynamic Swing-Up Manipulation |
> Video Attachment
|
|
Wang, Chen | Shanghai Jiao Tong University |
Wang, Shaoxiong | MIT |
Romero, Branden | Massachusetts Institute of Technology |
Veiga, Filipe Fernandes | MIT |
Adelson, Edward | MIT |
Keywords: Representation Learning, Force and Tactile Sensing, In-Hand Manipulation
Abstract: Several robot manipulation tasks are extremely sensitive to variations of the physical properties of the manipulated objects. One such task is manipulating objects by using gravity or arm accelerations, increasing the importance of mass, center of mass, and friction information. We present SwingBot, a robot that is able to learn the physical features of a held object through tactile exploration. Two exploration actions (tilting and shaking) provide the tactile information used to create a physical feature embedding space. With this embedding, SwingBot is able to predict the swing angle achieved by a robot performing dynamic swing-up manipulations on a previously unseen object. Using these predictions, it is able to search for the optimal control parameters for a desired swing-up angle. We show that with the learned physical features our end-to-end self-supervised learning pipeline is able to substantially improve the accuracy of swinging up unseen objects. We also show that objects with similar dynamics are closer to each other on the embedding space and that the embedding can be disentangled into values of specific physical properties.
|
|
12:45-13:00, Paper TuBT11.5 | |
>Representation and Experience-Based Learning of Explainable Models for Robot Action Execution |
|
Mitrevski, Alex | Hochschule Bonn-Rhein-Sieg |
Plöger, Paul G. | Hochschule Bonn Rhein Sieg |
Lakemeyer, Gerhard | Computer Science Department, RWTH Aachen University |
Keywords: Representation Learning, Probability and Statistical Methods, Cognitive Control Architectures
Abstract: For robots acting in human-centered environments, the ability to improve based on experience is essential for reliable and adaptive operation; however, particularly in the context of robot failure analysis, experience-based improvement is practically useful only if robots are also able to reason about and explain the decisions they make during execution. In this paper, we describe and analyse a representation of execution-specific knowledge that combines (i) a relational model in the form of qualitative attributes that describe the conditions under which actions can be executed successfully and (ii) a continuous model in the form of a Gaussian process that can be used for generating parameters for action execution, but also for evaluating the expected execution success given a particular action parameterisation. The proposed representation is based on prior, modelled knowledge about actions and is combined with a learning process that is supervised by a teacher. We analyse the benefits of this representation in the context of two actions - grasping handles and pulling an object on a table - and the experiments demonstrate that the joint relational-continuous model allows a robot to improve its execution based on experience, while reducing the severity of failures experienced during execution.
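The continuous half of such a representation can be pictured as Gaussian-process regression from action parameters to expected execution success. Below is a textbook GP posterior in numpy with toy grasp data; the kernel, length scale, and labels are illustrative assumptions, not the paper's model.

```python
import numpy as np

def rbf(X1, X2, ls=0.5):
    # Squared-exponential kernel between two sets of points.
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls**2)

def gp_predict(X, y, Xq, noise=1e-3):
    # Standard GP regression: posterior mean and variance at queries Xq.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy data: grasp parameters (2D) labelled with observed success (0/1).
X = np.array([[0.1, 0.2], [0.4, 0.1], [0.8, 0.9], [0.5, 0.5]])
y = np.array([1.0, 1.0, 0.0, 1.0])
Xq = np.array([[0.3, 0.2], [0.9, 0.9]])
mean, var = gp_predict(X, y, Xq)
print(mean, var)  # high mean, low variance -> confident expected success
```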
|
|
13:00-13:15, Paper TuBT11.6 | |
>TSBP: Tangent Space Belief Propagation for Manifold Learning |
|
Cohn, Thomas | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Desingh, Karthik | University of Michigan |
Zeng, Zhen | University of Michigan |
Keywords: Representation Learning
Abstract: We present Tangent Space Belief Propagation (TSBP) as a method for graph denoising to improve the robustness of manifold learning algorithms. Dimension reduction by manifold learning relies heavily on the accurate selection of nearest neighbors, which has proven an open problem for sparse and noisy datasets. TSBP performs loopy nonparametric belief propagation to accurately infer the tangent spaces of the underlying manifold at each data point. Edges of the neighborhood graph that deviate from the tangent spaces are then removed. The resulting denoised graph can then be embedded into a lower-dimensional space using methods from existing manifold learning algorithms, such as ISOMAP. Artificially generated manifold data, as well as simulated sensor data from a mobile robot, demonstrate the efficacy of our method in comparison to existing manifold learning algorithms.
|
|
13:15-13:30, Paper TuBT11.7 | |
>Improving Unimodal Object Recognition with Multimodal Contrastive Learning |
|
Meyer, Johannes | University of Freiburg |
Eitel, Andreas | University of Freiburg |
Brox, Thomas | University of Freiburg |
Burgard, Wolfram | Toyota Research Institute |
Keywords: Representation Learning, Visual Learning, RGB-D Perception
Abstract: Robots perceive their environment using various sensor modalities, e.g., vision, depth, sound or touch. Each modality provides complementary information for perception. However, while it can be assumed that all modalities are available for training, when deploying the robot in real-world scenarios the sensor setup often varies. In order to gain flexibility with respect to the deployed sensor setup we propose a new multimodal approach within the framework of contrastive learning. In particular, we consider the case of learning from RGB-D images while testing with one modality available, i.e., exclusively RGB or depth. We leverage contrastive learning to capture high-level information between different modalities in a compact feature embedding. We extensively evaluate our multimodal contrastive learning method on the Falling Things dataset and learn representations that outperform prior methods for RGB-D object recognition on the NYU-D dataset.
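A common instantiation of cross-modal contrastive learning is an InfoNCE-style loss in which the RGB and depth embeddings of the same scene form the positive pair and all other pairs in the batch act as negatives. The numpy sketch below illustrates that loss on random features; the batch size, temperature, and embedding dimension are arbitrary, and this is not necessarily the paper's exact objective.

```python
import numpy as np

def info_nce(rgb_emb, depth_emb, temperature=0.1):
    # Cross-modal contrastive loss: the RGB and depth embeddings of the
    # same scene are positives; all other pairs in the batch are negatives.
    rgb = rgb_emb / np.linalg.norm(rgb_emb, axis=1, keepdims=True)
    dep = depth_emb / np.linalg.norm(depth_emb, axis=1, keepdims=True)
    logits = rgb @ dep.T / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # matched pairs on diagonal

rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 32))                    # batch of RGB features
depth = rgb + 0.1 * rng.normal(size=(8, 32))      # correlated depth features
print(info_nce(rgb, depth))                       # low loss when modalities align
```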
|
|
TuBT12 |
Room T12 |
Collision Avoidance I |
Regular session |
Chair: Gnanasekera, Manaram | University of New South Wales |
|
11:45-12:00, Paper TuBT12.1 | |
>Roadmap Subsampling for Changing Environments |
|
Murray, Sean | Duke University |
Konidaris, George | Brown University |
Sorin, Daniel | Duke University |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: Precomputed roadmaps can enable effective multi-query motion planning: a roadmap can be built for a robot as if no obstacles were present, and then after edges invalidated by obstacles observed at query time are deleted, path search through the remaining roadmap returns a collision-free plan. However, large roadmaps are memory intensive to store, and can be too slow for practical use. We present an algorithm for compressing a large roadmap so that the collision detection phase fits into a computational budget, while retaining a high probability of finding high-quality paths. Our algorithm adapts work from graph theory and data mining by treating roadmaps as unreliable networks, where the probability of edge failure models the probability of a query-time obstacle causing a collision. We experimentally evaluate the quality of the resulting roadmaps in a suite of four motion planning benchmarks.
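Treating a roadmap as an unreliable network suggests scoring each edge by its survival probability. The greedy, budget-constrained filter below is a deliberately simple stand-in for the paper's subsampling criterion; the edge format and failure probabilities are invented for illustration.

```python
def subsample_roadmap(edges, p_fail, budget):
    # edges: list of (u, v, length); p_fail[i]: estimated probability that
    # a query-time obstacle invalidates edge i. Greedily keep the `budget`
    # most reliable edges per unit length -- a simple stand-in for the
    # paper's unreliable-network criterion.
    ranked = sorted(range(len(edges)),
                    key=lambda i: (1.0 - p_fail[i]) / edges[i][2],
                    reverse=True)
    return [edges[i] for i in ranked[:budget]]

edges = [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 2.0), (2, 3, 1.0)]
p_fail = [0.05, 0.40, 0.10, 0.02]
print(subsample_roadmap(edges, p_fail, budget=3))
```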
|
|
12:00-12:15, Paper TuBT12.2 | |
>Robot Navigation in Crowded Environments Using Deep Reinforcement Learning |
> Video Attachment
|
|
Liu, Lucia | ETH Zurich |
Dugas, Daniel | ETH Zurich |
Cesari, Gianluca | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Dubé, Renaud | ETH Zürich |
Keywords: Collision Avoidance, Motion and Path Planning, Reinforcement Learning
Abstract: Mobile robots operating in public environments require the ability to navigate among humans and other obstacles in a socially compliant and safe manner. This work presents a combined imitation learning and deep reinforcement learning approach for motion planning in such crowded and cluttered environments. By separately processing information related to static and dynamic objects, we enable our network to learn motion patterns that are tailored to real-world environments. Our model is also designed to handle the common case in which the robot's sensor suite offers only a limited field of view. Our model outperforms current state-of-the-art approaches, as shown in simulated environments containing human-like agents and static obstacles. Additionally, we demonstrate the real-time performance and applicability of our model by successfully navigating a robotic platform through real-world environments.
|
|
12:15-12:30, Paper TuBT12.3 | |
>Configuration Space Decomposition for Learning-Based Collision Checking in High-DOF Robots |
|
Han, Yiheng | Tsinghua University |
Zhao, Wang | Tsinghua University |
Pan, Jia | University of Hong Kong |
Liu, Yong-Jin | Tsinghua University |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: Motion planning for robots of high degrees-of-freedom (DOFs) is an important problem in robotics, with sampling-based methods in configuration space C as one popular solution. Recently, machine learning methods have been introduced into sampling-based motion planning methods, which train a classifier to distinguish collision-free subspace from in-collision subspace in C. In this paper, we propose a novel configuration space decomposition method and show two useful properties resulting from this decomposition. Using these two properties, we build a composite classifier that works compatibly with previous machine learning methods by using them as the elementary classifiers. Experimental results are presented, showing that our composite classifier outperforms state-of-the-art single-classifier methods by a large margin. A real application of motion planning in a multi-robot plant-phenotyping system using three UR5 robotic arms is also presented.
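The composite-classifier idea can be sketched as a conjunction of elementary classifiers, one per decomposed region of the configuration space: a configuration is predicted collision-free only if all of them agree. The toy linear classifiers below are placeholders, not learned models from the paper.

```python
import numpy as np

class CompositeChecker:
    # One elementary classifier per decomposed region of the C-space;
    # a configuration is predicted collision-free only if every
    # elementary classifier agrees (illustrative, not the paper's model).
    def __init__(self, classifiers):
        self.classifiers = classifiers

    def is_free(self, q):
        return all(clf(q) for clf in self.classifiers)

# Toy elementary classifiers: linear separators in a 2-DOF C-space.
clf_a = lambda q: q[0] + q[1] < 1.5      # e.g., arm link vs. obstacle A
clf_b = lambda q: q[0] - q[1] > -1.0     # e.g., arm link vs. obstacle B
checker = CompositeChecker([clf_a, clf_b])
print(checker.is_free(np.array([0.2, 0.3])))  # True -> collision-free
print(checker.is_free(np.array([1.0, 0.9])))  # False -> in collision
```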
|
|
12:30-12:45, Paper TuBT12.4 | |
>A Time Optimal Reactive Collision Avoidance Method for UAVs Based on a Modified Collision Cone Approach |
> Video Attachment
|
|
Gnanasekera, Manaram | University of New South Wales |
Katupitiya, Jayantha | The University of New South Wales |
Keywords: Collision Avoidance, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Unmanned Aerial Vehicles (UAVs) are an emerging technology that has eased human lifestyles in many ways, and as their adoption grows, future skies risk becoming congested. In such a situation, time-optimal collision avoidance is vital for travelling in the shortest possible time while avoiding collisions. This paper proposes a novel method for time-optimal collision avoidance for UAVs. The proposed algorithm is constructed as a three-stage approach based on the Collision Cone method with slight modifications. A sliding mode controller is used as the control law for navigation. Mathematical proofs are included to verify the time optimality of the proposed method. The efficiency and applicability of the work are confirmed by both simulation and experimental results. An automated Matrice 600 Pro hexacopter has been used for the experiments.
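The underlying collision-cone test checks whether the relative velocity lies inside the cone of directions subtended by a combined safety radius around the obstacle. A standard 2D version is sketched below; the paper's three-stage modification and sliding mode controller are not reproduced.

```python
import numpy as np

def in_collision_cone(p_own, v_own, p_obs, v_obs, r_comb):
    # Classical collision-cone test: a collision course exists when the
    # relative velocity points inside the cone of directions subtended by
    # the combined safety radius r_comb around the obstacle.
    r = p_obs - p_own                 # line of sight
    v_rel = v_own - v_obs             # relative velocity
    d = np.linalg.norm(r)
    if d <= r_comb:
        return True                   # already within the safety radius
    half_angle = np.arcsin(r_comb / d)
    closing = r @ v_rel
    if closing <= 0:
        return False                  # moving apart
    angle = np.arccos(closing / (d * np.linalg.norm(v_rel) + 1e-12))
    return angle < half_angle

p_uav, v_uav = np.array([0.0, 0.0]), np.array([1.0, 0.0])
p_obs, v_obs = np.array([10.0, 0.5]), np.array([0.0, 0.0])
print(in_collision_cone(p_uav, v_uav, p_obs, v_obs, r_comb=1.0))  # True
```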
|
|
12:45-13:00, Paper TuBT12.5 | |
>Computationally Efficient Obstacle Avoidance Trajectory Planner for UAVs Based on Heuristic Angular Search Method |
> Video Attachment
|
|
Chen, Han | The Hong Kong Polytechnic University |
Lu, Peng | The Hong Kong Polytechnic University |
Keywords: Collision Avoidance, Motion and Path Planning
Abstract: For accomplishing a variety of missions in challenging environments, the capability to navigate with full autonomy while avoiding unexpected obstacles is the most crucial requirement for UAVs in real applications. In this paper, we propose such a computationally efficient obstacle avoidance trajectory planner that can be used in unknown cluttered environments. Because of the narrow field of view of a single depth camera on a UAV, information about surrounding obstacles is quite limited, and the globally shortest path is therefore difficult to achieve. We consequently focus on the time cost of the trajectory planner and on safety rather than on other factors. This planner is mainly composed of a point cloud processor, a waypoint publisher with a Heuristic Angular Search (HAS) method, and a motion planner with minimum acceleration optimization. Furthermore, we propose several techniques to enhance safety by making the possibility of finding a feasible trajectory as large as possible. The proposed approach is implemented to run onboard in real time and is tested extensively in simulation; the average control output computation time per iteration is less than 18 ms.
|
|
13:00-13:15, Paper TuBT12.6 | |
>Closing the Loop: Real-Time Perception and Control for Robust Collision Avoidance with Occluded Obstacles |
> Video Attachment
|
|
Tulbure, Andreea Roxana | ETH |
Khatib, Oussama | Stanford University |
Keywords: Collision Avoidance, Whole-Body Motion Planning and Control, Perception-Action Coupling
Abstract: Robots have been successfully used in well-structured and deterministic environments, but they are still unable to function in unstructured environments, mainly because reliable real-time systems that integrate perception and control are still missing. In this paper, we close the loop between perception and control for real-time obstacle avoidance by introducing a new robust perception algorithm and a new collision avoidance strategy, which combines local artificial potential fields with global elastic planning to maintain convergence towards the goal. We evaluate our new approach in real-world experiments using a Franka Panda robot and show that it is able to robustly avoid dynamic or even partially occluded obstacles while performing position or path following tasks.
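The local artificial-potential-field component can be sketched as an attractive pull toward the goal plus a repulsive push from obstacles within an influence radius. The gains and distances below are illustrative assumptions; the global elastic-planning layer is omitted.

```python
import numpy as np

def apf_velocity(q, goal, obstacles, k_att=1.0, k_rep=0.01, rho0=0.6):
    # Attractive pull toward the goal plus repulsive push away from each
    # obstacle within influence distance rho0 (classical potential field;
    # gains are illustrative, not tuned values from the paper).
    v = k_att * (goal - q)
    for obs in obstacles:
        diff = q - obs
        rho = np.linalg.norm(diff)
        if 1e-9 < rho < rho0:
            v += k_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (diff / rho)
    return v

q = np.array([0.0, 0.0, 0.3])                # end-effector position
goal = np.array([0.5, 0.0, 0.3])
obstacles = [np.array([0.25, 0.05, 0.3])]    # detected obstacle centroid
print(apf_velocity(q, goal, obstacles))      # commanded velocity veers away
```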
|
|
TuBT13 |
Room T13 |
Collision Avoidance II |
Regular session |
Chair: Shames, Iman | The University of Melbourne |
|
11:45-12:00, Paper TuBT13.1 | |
>A Modified Hybrid Reciprocal Velocity Obstacles Approach for Multi-Robot Motion Planning without Communication |
> Video Attachment
|
|
Sainte Catherine, Maxime | CEA |
Lucet, Eric | CEA Tech |
Keywords: Motion and Path Planning, Collision Avoidance, Wheeled Robots
Abstract: Ensuring safe online motion planning despite a large number of moving agents is the problem addressed in this paper. Collision avoidance is achieved without communication between the agents and without a global localization system. The proposed solution is a modification of the Hybrid Reciprocal Velocity Obstacles (HRVO) approach combined with a tracking error estimation, in order to adapt the Velocity Obstacle paradigm to agents with kinodynamic constraints and unreliable velocity estimates. This solution, evaluated in simulation and in a real test scenario with three dynamic unicycle-type robots, shows an improvement over HRVO.
|
|
12:00-12:15, Paper TuBT13.2 | |
>Safe and Effective Picking Paths in Clutter Given Discrete Distributions of Object Poses |
> Video Attachment
|
|
Wang, Rui | Rutgers University |
Mitash, Chaitanya | Rutgers University |
Lu, Shiyang | University of Michigan, Ann Arbor |
Boehm, Daniel | Rutgers University |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Keywords: Motion and Path Planning, Collision Avoidance, Manipulation Planning
Abstract: Picking an item in the presence of other objects can be challenging as it involves occlusions and partial views. Given object models, one approach is to perform object pose estimation and use the most likely candidate pose per object to pick the target without collisions. This approach, however, ignores the uncertainty of the perception process both regarding the target's and the surrounding objects' poses. This work first proposes a perception process for 6D pose estimation, which returns a discrete distribution of object poses in a scene. Then, an open-loop planning pipeline is proposed to return safe and effective solutions for moving a robotic arm to pick, which (a) minimizes the probability of collision with the obstructing objects; and (b) maximizes the probability of reaching the target item. The planning framework models the challenge as a stochastic variant of the Minimum Constraint Removal (MCR) problem. The effectiveness of the methodology is verified given both simulated and real data in different scenarios. The experiments demonstrate the importance of considering the uncertainty of the perception process in terms of safe execution. The results also show that the methodology is more effective than conservative MCR approaches, which avoid all possible object poses regardless of the reported uncertainty.
|
|
12:15-12:30, Paper TuBT13.3 | |
>Collision Avoidance Based on Robust Lexicographic Task Assignment |
|
Wood, Tony A. | University of Melbourne |
Khoo, Mitchell | The University of Melbourne |
Michael, Elad | The University of Melbourne |
Manzie, Chris | University of Melbourne |
Shames, Iman | The University of Melbourne |
Keywords: Collision Avoidance, Path Planning for Multiple Mobile Robots or Agents, Task Planning
Abstract: Traditional task assignment approaches for multi-agent motion control do not take the possibility of collisions into account. This can lead to challenging requirements for path planning. We derive an assignment method that not only minimises the largest distance between an agent and its assigned destination but also provides local constraints for guaranteed collision avoidance. To this end, we introduce a sequential bottleneck optimisation problem and define a notion of robustness of an optimising assignment to changes of individual assignment costs. Conditioned on a sufficient level of robustness in relation to the size of the agents, we construct time-varying position bounds for every individual agent. These local constraints are a direct byproduct of the assignment procedure and only depend on the initial agent positions, the destinations that are to be visited, and a timing parameter. We prove that no agent that is assigned to move to one of the target locations collides with any other agent if all agents satisfy their local position constraints. We demonstrate the method in an illustrative case study.
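Minimising the largest agent-to-destination distance is the classic bottleneck assignment problem, solvable by binary-searching a cost threshold and testing for a perfect bipartite matching. The sketch below uses Kuhn's augmenting-path matching; it shows the assignment step only, not the paper's robustness notion or position bounds.

```python
import numpy as np

def has_perfect_matching(allowed):
    # Kuhn's augmenting-path algorithm on the boolean "allowed" matrix.
    n = len(allowed)
    match = [-1] * n  # match[j] = agent currently assigned to destination j

    def try_agent(i, seen):
        for j in range(n):
            if allowed[i][j] and not seen[j]:
                seen[j] = True
                if match[j] == -1 or try_agent(match[j], seen):
                    match[j] = i
                    return True
        return False

    return all(try_agent(i, [False] * n) for i in range(n))

def bottleneck_assignment(cost):
    # Smallest threshold t such that agents can be matched to destinations
    # using only edges with cost <= t (minimises the largest distance).
    thresholds = np.unique(cost)
    lo, hi = 0, len(thresholds) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(cost <= thresholds[mid]):
            hi = mid
        else:
            lo = mid + 1
    return thresholds[lo]

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.5, 5.0],
                 [3.0, 2.0, 2.0]])
print(bottleneck_assignment(cost))  # 2.0: no agent travels farther than this
```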
|
|
12:30-12:45, Paper TuBT13.4 | |
>Risk-Averse MPC Via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance |
> Video Attachment
|
|
Schperberg, Alexander | University of California Los Angeles |
Chen, Kenny | University of California, Los Angeles |
Tsuei, Stephanie | University of California, Los Angeles |
Jewett, Michael | University of California, Los Angeles |
Hooks, Joshua | UCLA |
Soatto, Stefano | University of California, Los Angeles |
Mehta, Ankur | UCLA |
Hong, Dennis | UCLA |
Keywords: Motion and Path Planning, Collision Avoidance, Visual-Based Navigation
Abstract: In this paper, we propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties for safer navigation through cluttered environments. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates through each step of our MPC’s finite time horizon. The RNN model is trained on a dataset that comprises of robot and landmark poses generated from camera images and inertial measurement unit (IMU) readings via a state-of-the-art visual-inertial odometry framework. To detect and extract object locations for avoidance, we use a custom-trained convolutional neural network model in conjunction with a feature extractor to retrieve 3D centroid and radii boundaries of nearby obstacles. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms, demonstrating autonomous behaviors that can plan fast and collision-free paths towards a goal point.
|
|
12:45-13:00, Paper TuBT13.5 | |
>A Data-Driven Framework for Proactive Intention-Aware Motion Planning of a Robot in a Human Environment |
> Video Attachment
|
|
Peddi, Rahul | University of Virginia |
Di Franco, Carmelo | University of Virginia |
Gao, Shijie | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Collision Avoidance, Motion and Path Planning, Social Human-Robot Interaction
Abstract: For safe and efficient human-robot interaction, a robot needs to predict and understand the intentions of humans who share the same space. Mobile robots are traditionally built to be reactive, moving in unnatural ways without following social protocol and hence forcing people to behave very differently from human-human interaction rules; this can be overcome if robots are instead proactive. In this paper, we build an intention-aware proactive motion planning strategy for mobile robots that coexist with multiple humans. We propose a framework that uses Hidden Markov Model (HMM) theory with a history of observations to: i) predict future states and estimate the likelihood that humans will cross the path of a robot, and ii) concurrently learn, update, and improve the predictive model with new observations at run-time. Stochastic reachability analysis is proposed to identify multiple possibilities of future states, and a control scheme that leverages temporal virtual physics inspired by spring-mass systems is proposed to enable safe proactive motion planning. The proposed approach is validated with simulations and experiments involving an unmanned ground vehicle (UGV) performing go-to-goal operations in the presence of multiple humans, demonstrating improved performance and effectiveness of online learning when compared to reactive obstacle avoidance approaches.
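The HMM machinery can be pictured as forward filtering over observed motion cues followed by propagation through the transition matrix to predict whether a human will cross the robot's path. The two-state model and all probabilities below are invented for illustration, not learned parameters from the paper.

```python
import numpy as np

# Toy HMM over pedestrian intentions: states = (crossing, passing).
A = np.array([[0.8, 0.2],    # transition probabilities between intentions
              [0.3, 0.7]])
B = np.array([[0.7, 0.3],    # observation likelihoods:
              [0.2, 0.8]])   # rows = state, cols = observed motion cue

def filter_and_predict(obs_seq, horizon, prior=np.array([0.5, 0.5])):
    # Forward filtering over the observation history, then propagation
    # through the transition model to predict the future intention.
    belief = prior
    for o in obs_seq:
        belief = belief * B[:, o]          # correct with the observation
        belief = belief / belief.sum()
        belief = belief @ A                # predict one step ahead
    for _ in range(horizon - 1):
        belief = belief @ A                # roll further into the future
    return belief

# Observations: 0 = heading toward robot path, 1 = heading away.
print(filter_and_predict([0, 0, 1], horizon=3))  # [P(crossing), P(passing)]
```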
|
|
13:00-13:15, Paper TuBT13.6 | |
>Frozone: Freezing-Free, Pedestrian-Friendly Navigation in Human Crowds |
> Video Attachment
|
|
Sathyamoorthy, Adarsh Jagan | University of Maryland |
Patel, Utsav | University of Maryland |
Guan, Tianrui | University of Maryland |
Manocha, Dinesh | University of Maryland |
Keywords: Collision Avoidance, Motion and Path Planning, Computational Geometry
Abstract: We present Frozone, a novel algorithm to deal with the Freezing Robot Problem (FRP) that arises when a robot navigates through dense scenarios and crowds. Our method senses and explicitly predicts the trajectories of pedestrians and constructs a Potential Freezing Zone (PFZ); a spatial zone where the robot could freeze or be obtrusive to humans. Our formulation computes a deviation velocity to avoid the PFZ, which also accounts for social constraints. Furthermore, Frozone is designed for robots equipped with sensors with a limited sensing range and field of view. We ensure that the robot's deviation is bounded, thus avoiding sudden angular motion which could lead to the loss of perception data of the surrounding obstacles. We have combined Frozone with a Deep Reinforcement Learning-based (DRL) collision avoidance method and use our hybrid approach to handle crowds of varying densities. Our overall approach results in smooth and collision-free navigation in dense environments. We have evaluated our method's performance in simulation and on real differential drive robots in challenging indoor scenarios. We highlight the benefits of our approach over prior methods in terms of success rates (up to 50% increase), pedestrian-friendliness (100% increase) and the rate of freezing (>80% decrease) in challenging scenarios.
|
|
TuBT14 |
Room T14 |
Perception for Navigation |
Regular session |
Chair: Waslander, Steven Lake | University of Toronto |
Co-Chair: Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
|
11:45-12:00, Paper TuBT14.1 | |
>Dynamic Attention-Based Visual Odometry |
> Video Attachment
|
|
Kuo, Xin-Yu | National Tsing Hua University |
Liu, Chien | National Tsing Hua University |
Lin, Kai-Chen | National Tsing Hua University |
Luo, Evan | National Tsing Hua University |
Chen, Yu-Wen | National Tsing Hua University |
Lee, Chun-Yi | National Tsing Hua University |
Keywords: Localization, Visual Learning
Abstract: This paper proposes a dynamic attention-based visual odometry framework (DAVO), a learning-based VO method, for estimating the ego-motion of a monocular camera. DAVO dynamically adjusts the attention weights on different semantic categories for different motion scenarios based on optical flow maps. These weighted semantic categories can then be used to generate attention maps that highlight the relative importance of different semantic regions in input frames for pose estimation. In order to examine the proposed DAVO, we perform a number of experiments on the KITTI Visual Odometry and SLAM benchmark suite to quantitatively and qualitatively inspect the impacts of the dynamically adjusted weights on the accuracy of the evaluated trajectories. Moreover, we design a set of ablation analyses to justify each of our design choices, and validate the effectiveness as well as the advantages of DAVO. Our experiments on the KITTI dataset show that the proposed DAVO framework does provide satisfactory performance in ego-motion estimation, and is able to deliver competitive performance when compared to contemporary VO methods.
|
|
12:00-12:15, Paper TuBT14.2 | |
>Richer Aggregated Features for Optical Flow Estimation with Edge-Aware Refinement |
|
Wang, Xianshun | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
Song, Jiafei | SIMIT |
Liu, Yanqing | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Science |
Keywords: Computer Vision for Other Robotic Applications, Deep Learning for Visual Perception
Abstract: Recent CNN-based optical flow approaches have a separated structure of feature extraction and flow estimation. The core task of optical flow is finding corresponding points, and rich representation is the key part of such matching problems. However, prior work usually pays more attention to the design of the flow decoder than to feature extraction. In this paper, we present a novel optical flow estimation network to enrich the feature representation of each pyramid level, with a hierarchical dilated architecture and a bottom-up aggregation scheme. In addition, inspired by edge-guided classical methods, we bring the edge-aware idea into our approach and propose an edge-aware refinement (EAR) subnetwork to handle motion boundaries. Using the same decoding structure as PWC-Net, our network outperforms it by a large margin and leads all its derivatives on both KITTI-2012 and KITTI-2015. Further performance analysis proves the effectiveness of the proposed ideas.
|
|
12:15-12:30, Paper TuBT14.3 | |
>LiDAR Iris for Loop-Closure Detection |
|
Wang, Ying | Nanjing University of Science and Technology |
Sun, Zezhou | Nanjing University of Science and Technology |
Xu, Cheng-Zhong | University of Macau |
Sarma, Sanjay E. | MIT |
Yang, Jian | Nanjing University of Science & Technology |
Kong, Hui | Nanjing University of Science and Technology |
Keywords: Computer Vision for Other Robotic Applications, Localization
Abstract: In this paper, a global descriptor for a LiDAR point cloud, called LiDAR Iris, is proposed for fast and accurate loop-closure detection. A binary signature image can be obtained for each point cloud after several LoG-Gabor filtering and thresholding operations on the LiDAR-Iris image representation. Given two point clouds, their similarities can be calculated as the Hamming distance of two corresponding binary signature images extracted from the two point clouds, respectively. Our LiDAR-Iris method can achieve a pose-invariant loop-closure detection at a descriptor level with the Fourier transform of the LiDAR-Iris representation if assuming a 3D (x,y,yaw) pose space, although our method can generally be applied to a 6D pose space by re-aligning point clouds with an additional IMU sensor. Experimental results on five road-scene sequences demonstrate its excellent performance in loop-closure detection.
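Given two binary signature images, matching reduces to a Hamming distance; the sketch below recovers yaw invariance by minimising over circular column shifts, a brute-force stand-in for the Fourier-transform alignment the paper uses. The image size is an arbitrary assumption.

```python
import numpy as np

def iris_distance(sig_a, sig_b):
    # Normalised Hamming distance between two binary signature images,
    # minimised over circular column shifts as a simple stand-in for the
    # Fourier-based yaw alignment described in the paper.
    best = np.inf
    for shift in range(sig_b.shape[1]):
        d = np.count_nonzero(sig_a != np.roll(sig_b, shift, axis=1))
        best = min(best, d)
    return best / sig_a.size

rng = np.random.default_rng(0)
sig = rng.integers(0, 2, size=(80, 360), dtype=np.uint8)
rotated = np.roll(sig, 45, axis=1)          # same place, different yaw
other = rng.integers(0, 2, size=(80, 360), dtype=np.uint8)
print(iris_distance(sig, rotated))  # ~0.0 -> loop closure detected
print(iris_distance(sig, other))    # ~0.5 -> different place
```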
|
|
12:30-12:45, Paper TuBT14.4 | |
>Confidence Guided Stereo 3D Object Detection with Split Depth Estimation |
> Video Attachment
|
|
Li, Chengyao | University of Toronto |
Ku, Jason | University of Toronto |
Waslander, Steven Lake | University of Toronto |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Accurate and reliable 3D object detection is vital to safe autonomous driving. Despite recent developments, the performance gap between stereo-based methods and LiDAR-based methods is still considerable. Accurate depth estimation is crucial to the performance of stereo-based 3D object detection methods, particularly for those pixels associated with objects in the foreground. Moreover, stereo-based methods suffer from high variance in the depth estimation accuracy, which is often not considered in the object detection pipeline. To tackle these two issues, we propose CG-Stereo, a confidence-guided stereo 3D object detection pipeline that uses separate decoders for foreground and background pixels during depth estimation, and leverages the confidence estimation from the depth estimation network as a soft attention mechanism in the 3D object detector. Our approach outperforms all state-of-the-art stereo-based 3D detectors on the KITTI benchmark.
|
|
12:45-13:00, Paper TuBT14.5 | |
>End-To-End Contextual Perception and Prediction with Interaction Transformer |
|
Li, Lingyun | Uber Advanced Technologies Group |
Yang, Bin | University of Toronto |
Liang, Ming | Uber |
Ren, Mengye | University of Toronto, Uber ATG |
Zeng, Wenyuan | University of Toronto, Uber |
Segal, Sean | Uber ATG, University of Toronto |
Urtasun, Raquel | University of Toronto |
Keywords: Computer Vision for Transportation, Novel Deep Learning Methods, Collision Avoidance
Abstract: In this paper, we tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving. Towards this goal, we design a novel approach that explicitly takes into account the interactions between actors. To capture the spatial-temporal dependency between actors, we propose a recurrent neural network with a novel Transformer architecture, which we call the Interaction Transformer. Importantly, our model can be trained end-to-end, and runs in real-time. We validate our approach on two challenging real-world datasets: ATG4D and nuScenes. We show that our approach can outperform the state-of-the-art results on both datasets. In particular, we significantly improve the social compliance between the estimated future trajectories, resulting in far fewer collisions between the predicted actors.
|
|
13:00-13:15, Paper TuBT14.6 | |
>Inferring Spatial Uncertainty in Object Detection |
> Video Attachment
|
|
Wang, Zining | University of California, Berkeley |
Feng, Di | Technical University of Munich |
Zhou, Yiyang | University of California, Berkeley |
Rosenbaum, Lars | Robert Bosch GmbH |
Timm, Fabian | Robert Bosch GmbH |
Dietmayer, Klaus | University of Ulm |
Tomizuka, Masayoshi | University of California |
Zhan, Wei | Univeristy of California, Berkeley |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: The availability of real-world datasets is the prerequisite for developing object detection methods for autonomous driving. While ambiguity exists in object labels due to the error-prone annotation process or sensor observation noise, current object detection datasets only provide deterministic annotations without considering their uncertainty. This precludes an in-depth evaluation among different object detection methods, especially for those that explicitly model predictive probability. In this work, we propose a generative model to estimate bounding box label uncertainties from LiDAR point clouds, and define a new representation of the probabilistic bounding box through spatial distribution. Comprehensive experiments show that the proposed model represents uncertainties commonly seen in driving scenarios. Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty. Experiments on the KITTI and the Waymo Open Datasets show that JIoU is superior to IoU when evaluating probabilistic object detectors.
|
|
TuBT15 |
Room T15 |
Vision-Based Navigation I |
Regular session |
Chair: Fermuller, Cornelia | University of Maryland |
Co-Chair: Tran, Quang | AIOZ |
|
11:45-12:00, Paper TuBT15.1 | |
>One-Shot Informed Robotic Visual Search in the Wild |
> Video Attachment
|
|
Koreitem, Karim | McGill University |
Shkurti, Florian | University of Toronto |
Manderson, Travis | McGill University |
Chang, Wei-Di | McGill University |
Gamboa Higuera, Juan Camilo | McGill University |
Dudek, Gregory | McGill University |
Keywords: Visual-Based Navigation, Field Robots, Representation Learning
Abstract: We consider the task of underwater robot navigation for the purpose of collecting scientifically relevant video data for environmental monitoring. The majority of field robots that currently perform monitoring tasks in unstructured natural environments navigate via path-tracking a pre-specified sequence of waypoints. Although this navigation method is often necessary, it is limiting because the robot does not have a model of what the scientist deems to be relevant visual observations. Thus, the robot can neither visually search for particular types of objects, nor focus its attention on parts of the scene that might be more relevant than the pre-specified waypoints and viewpoints. In this paper we propose a method that enables informed visual navigation via a learned visual similarity operator that guides the robot’s visual search towards parts of the scene that look like an exemplar image, which is given by the user as a high-level specification for data collection. We propose and evaluate a weakly supervised video representation learning method that outperforms ImageNet embeddings for similarity tasks in the underwater domain. We also demonstrate the deployment of this similarity operator during informed visual navigation in collaborative environmental monitoring scenarios, in large-scale field trials, where the robot and a human scientist collaboratively search for relevant visual content. Code: https://github.com/rvl-lab-utoronto/visual_search_in_the_wild
|
|
12:00-12:15, Paper TuBT15.2 | |
>Perception-Aware Path Planning for UAVs Using Semantic Segmentation |
> Video Attachment
|
|
Bartolomei, Luca | ETH Zurich |
Teixeira, Lucas | ETH Zurich |
Chli, Margarita | ETH Zurich |
Keywords: Visual-Based Navigation, Aerial Systems: Perception and Autonomy, Autonomous Vehicle Navigation
Abstract: In this work, we present a perception-aware path-planning pipeline for Unmanned Aerial Vehicles (UAVs) for navigation in challenging environments. The objective is to reach a given destination safely and accurately by relying on monocular camera-based state estimators, such as Keyframe-based Visual-Inertial Odometry (VIO) systems. Motivated by the recent advances in semantic segmentation using deep learning, our path-planning architecture takes into consideration the semantic classes of parts of the scene that are perceptually more informative than others. This work proposes a planning strategy capable of avoiding both texture-less regions and problematic areas, such as lakes and oceans, that may cause large drift or failures in the robot's pose estimation, by using the semantic information to compute the next best action with respect to perception quality. We design a hierarchical planner, composed of an A* path-search step followed by B-Spline trajectory optimization. While the A* steers the UAV towards informative areas, the optimizer keeps the most promising landmarks in the camera's field of view. We extensively evaluate our approach in a set of photo-realistic simulations, showing a remarkable improvement with respect to the state-of-the-art in active perception.
|
|
12:15-12:30, Paper TuBT15.3 | |
>Learning Your Way without Map or Compass: Panoramic Target Driven Visual Navigation |
> Video Attachment
|
|
Watkins-Valls, David | Columbia University |
Xu, Jingxi | Columbia University |
Waytowich, Nicholas | University of North Florida |
Allen, Peter | Columbia University |
Keywords: Visual-Based Navigation, Big Data in Robotics and Automation, Imitation Learning
Abstract: We present a robot navigation system that uses an imitation learning framework to successfully navigate in complex environments. Our framework takes a pre-built 3D scan of a real environment and trains an agent from pre-generated expert trajectories to navigate to any position given a panoramic view of the goal and the current visual input without relying on map, compass, odometry, or relative position of the target at runtime. Our end-to-end trained agent uses RGB and depth (RGBD) information and can handle large environments (up to 1031 m^2) across multiple rooms (up to 40) and generalizes to unseen targets. We show that when compared to several baselines our method (1) requires fewer training examples and less training time, (2) reaches the goal location with higher accuracy, and (3) produces better solutions with shorter paths for long-range navigation tasks.
|
|
12:30-12:45, Paper TuBT15.4 | |
>Autonomous Navigation in Complex Environments with Deep Multimodal Fusion Network |
> Video Attachment
|
|
Nguyen, Anh | Imperial College London |
Nguyen, Ngoc | AIOZ Pte Ltd |
Tran, Xuan Kim | Company |
Tjiputra, Erman | AIOZ |
Tran, Quang | AIOZ |
Keywords: Visual-Based Navigation, Novel Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Autonomous navigation in complex environments is a crucial task in time-sensitive scenarios such as disaster response or search and rescue. However, complex environments pose significant challenges for autonomous platforms to navigate due to their challenging properties: constrained narrow passages, unstable pathways with debris and obstacles, or irregular geological structures and poor lighting conditions. In this work, we propose a multimodal fusion approach to address the problem of autonomous navigation in complex environments such as collapsed cities or natural caves. We first simulate the complex environments in a physics-based simulation engine and collect a large-scale dataset for training. We then propose a Navigation Multimodal Fusion Network (NMFNet) which has three branches to effectively handle three visual modalities: laser, RGB images, and point cloud data. The extensive experimental results show that our NMFNet outperforms recent state of the art by a fair margin while achieving real-time performance. We further show that the use of multiple modalities is essential for autonomous navigation in complex environments. Finally, we successfully deploy our network to both simulated and real mobile robots.
|
|
12:45-13:00, Paper TuBT15.5 | |
>Unsupervised Learning of Dense Optical Flow, Depth and Egomotion with Event-Based Sensors |
|
Ye, Chengxi | University of Maryland |
Mitrokhin, Anton | University of Maryland, College Park |
Yorke, James | University of Maryland, College Park |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Autonomous Vehicle Navigation, Visual-Based Navigation, Deep Learning for Visual Perception
Abstract: We present an unsupervised learning pipeline for dense depth, optical flow and egomotion estimation for autonomous driving applications, using the event-based output of the Dynamic Vision Sensor (DVS) as input. The backbone of our pipeline is a bioinspired encoder-decoder neural network architecture - ECN. To train the pipeline, we introduce a covariance normalization technique which resembles the lateral inhibition mechanism found in animal neural systems. Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only, and is able to transfer from day to night scenes without any additional training. The network works in self-supervised mode and has just 150k parameters. We evaluate our pipeline on the MVSEC self driving dataset and present results for depth, optical flow, and egomotion estimation. Thanks to the efficient design, we are able to achieve inference rates of 300 FPS on a single Nvidia 1080Ti GPU. Our experiments demonstrate significant improvements upon works that used deep learning on event data, as well as the ability to perform well during both day and night.
|
|
13:00-13:15, Paper TuBT15.6 | |
>HouseExpo: A Large-Scale 2D Indoor Layout Dataset for Learning-Based Algorithms on Mobile Robots |
> Video Attachment
|
|
Li, Tingguang | The Chinese University of Hong Kong |
Ho, Danny | The Chinese University of Hong Kong |
Li, Chenming | The Chinese University of Hong Kong |
Zhu, Delong | The Chinese University of Hong Kong |
Wang, Chaoqun | The Chinese University of HongKong |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: AI-Based Methods, Big Data in Robotics and Automation, Visual-Based Navigation
Abstract: As one of the most promising areas, mobile robots have drawn much attention in recent years. Current work in this field is often evaluated in a few manually designed scenarios, due to the lack of a common experimental platform. Meanwhile, with the recent development of deep learning techniques, some researchers attempt to apply learning-based methods to mobile robot tasks, which requires a substantial amount of data. To satisfy the underlying demand, in this paper we build HouseExpo, a large-scale indoor layout dataset containing 35,126 2D floor plans with 252,550 rooms in total. Together with the dataset, we develop Pseudo-SLAM, a lightweight and efficient simulation platform to accelerate the data generation procedure, thereby speeding up the training process. In our experiments, we build models to tackle obstacle avoidance and autonomous exploration from a learning perspective in simulation as well as real-world experiments to verify the effectiveness of our simulator and dataset. All the data and code are available online, and we hope HouseExpo and Pseudo-SLAM can meet the need for data and benefit the whole community.
|
|
TuBT16 |
Room T16 |
Vision-Based Navigation II |
Regular session |
Chair: Gammell, Jonathan | University of Oxford |
Co-Chair: Yu, Changbin (Brad) | The Australian National University |
|
11:45-12:00, Paper TuBT16.1 | |
>Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning |
> Video Attachment
|
|
Yan, Liqi | Fudan University |
Liu, Dongfang | Purdue University |
Song, Yaoxian | Fudan University |
Yu, Changbin (Brad) | The Australian National University |
Keywords: Visual-Based Navigation, Reinforcement Learning, Motion and Path Planning
Abstract: Vision and voice are two vital keys for agents' interaction and learning. In this paper, we present a novel indoor navigation model called Memory Vision-Voice Indoor Navigation (MVV-IN), which receives voice commands and analyzes multimodal information from visual observation in order to enhance robots' environment understanding. We make use of single RGB images taken by a first-person-view monocular camera. We also apply a self-attention mechanism to keep the agent focused on key areas. Memory is important for the agent to avoid unnecessarily repeating certain tasks and to adapt adequately to new scenes; therefore, we make use of meta-learning. We have experimented with various functional features extracted from visual observation. Comparative experiments prove that our methods outperform state-of-the-art baselines.
|
|
12:00-12:15, Paper TuBT16.2 | |
>Occlusion-Robust MVO: Multimotion Estimation through Occlusion Via Motion Closure |
> Video Attachment
|
|
Judd, Kevin Michael | University of Oxford |
Gammell, Jonathan | University of Oxford |
Keywords: Visual-Based Navigation, Visual Tracking, Autonomous Vehicle Navigation
Abstract: Visual motion estimation is an integral and well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation, which is especially challenging in highly dynamic environments. Such environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Previous work in object tracking focuses on maintaining the integrity of object tracks but usually relies on specific appearance-based descriptors or constrained motion models. These approaches are very effective in specific applications but do not generalize to the full multimotion estimation problem. This paper presents a pipeline for estimating multiple motions, including the camera egomotion, in the presence of occlusions. This approach uses an expressive motion prior to estimate the SE(3) trajectory of every motion in the scene, even during temporary occlusions, and identifies the reappearance of motions through motion closure. The performance of this occlusion-robust multimotion visual odometry (MVO) pipeline is evaluated on real-world data and the Oxford Multimotion Dataset.
|
|
12:15-12:30, Paper TuBT16.3 | |
>IDOL: A Framework for IMU-DVS Odometry Using Lines |
> Video Attachment
|
|
Le Gentil, Cedric | University of Technology Sydney |
Tschopp, Florian | ETH Zurich |
Alzugaray, Ignacio | ETH Zürich |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Siegwart, Roland | ETH Zurich |
Nieto, Juan | ETH Zürich |
Keywords: Visual-Based Navigation, SLAM, Sensor Fusion
Abstract: In this paper, we introduce IDOL, an optimization-based framework for IMU-DVS Odometry using Lines. Event cameras, also called Dynamic Vision Sensors (DVSs), generate highly asynchronous streams of events triggered upon illumination changes for each individual pixel. This novel paradigm presents advantages in low illumination conditions and high-speed motions. Nonetheless, this unconventional sensing modality brings new challenges to scene reconstruction and motion estimation. The proposed method leverages a continuous-time representation of the inertial readings to associate each event with temporally accurate inertial data. The method's front-end extracts event clusters that belong to line segments in the environment, whereas the back-end estimates the system's trajectory alongside the lines' 3D position by minimizing point-to-line distances between individual events and the lines' projection in the image space. A novel attraction/repulsion mechanism is presented to accurately estimate the lines' extremities, avoiding their explicit detection in the event data. The proposed method is benchmarked against a state-of-the-art frame-based visual-inertial odometry framework using public datasets. The results show that IDOL performs at the same order of magnitude on most datasets and even shows better orientation estimates. These findings can have a great impact on new algorithms for DVSs.
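The back-end's point-to-line residual can be sketched as follows (a minimal illustration of the geometry only, not the authors' implementation; the attraction/repulsion mechanism for line extremities is omitted):

    import numpy as np

    def point_to_line_distance(event_px, line_a, line_b):
        """Perpendicular distance from a 2D event location to the image-space
        line through line_a and line_b, the projected endpoints of a 3D line.
        The back-end sums one such residual per associated event and minimizes
        over the trajectory and the lines' 3D parameters."""
        d = line_b - line_a
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
        return abs(n @ (event_px - line_a))

    print(point_to_line_distance(np.array([3.0, 4.0]),
                                 np.array([0.0, 0.0]),
                                 np.array([10.0, 0.0])))  # -> 4.0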
|
|
12:30-12:45, Paper TuBT16.4 | |
>Point Cloud Based Reinforcement Learning for Sim-To-Real and Partial Observability in Visual Navigation |
> Video Attachment
|
|
Lobos-Tsunekawa, Kenzo | Universidad De Chile |
Harada, Tatsuya | The University of Tokyo |
Keywords: Visual-Based Navigation, Reinforcement Learning, AI-Based Methods
Abstract: Reinforcement Learning (RL), among other learning-based methods, represents a powerful tool for solving complex robotic tasks (e.g., actuation, manipulation, navigation), with the need for real-world training data as one of its most important limitations. The use of simulators is one way to address this issue, yet knowledge acquired in simulation does not transfer directly to the real world, which is known as the sim-to-real transfer problem. While previous works focus on the nature of the images used as observations (e.g., textures and lighting), which has proven useful for sim-to-sim transfer, they neglect other properties of said observations, such as precise geometrical meaning, failing at robot-to-robot, and thus sim-to-real, transfer. We propose a method that learns on an observation space constructed from point clouds and environment randomization, generalizing across robots and simulators to achieve sim-to-real transfer, while also addressing partial observability. We demonstrate the benefits of our methodology on the point-goal navigation task, in which our method proves highly robust to unseen scenarios produced by robot-to-robot transfer, outperforms image-based baselines in robot-randomized experiments, and performs well in sim-to-sim conditions. Finally, we perform several experiments to validate the sim-to-real transfer to a physical domestic robot platform, confirming the out-of-the-box performance of our system.
|
|
12:45-13:00, Paper TuBT16.5 | |
>Autonomous Robot Navigation Based on Multi-Camera Perception |
|
Zhu, Kunyan | Shandong University |
Chen, Wei | Shandong University |
Zhang, Wei | Shandong University |
Song, Ran | Shandong University |
Li, Yibin | Shandong University |
Keywords: Visual-Based Navigation, Collision Avoidance, Motion and Path Planning
Abstract: In this paper, we propose an autonomous method for robot navigation based on a multi-camera setup that takes advantage of a wide field of view. A new multi-task network is designed for handling the visual information supplied by the left, central and right cameras to find the passable area, detect the intersection and infer the steering. Based on the outputs of the network, three navigation indicators are generated and then combined with the high-level control commands extracted by the proposed MapNet, which are finally fed into the driving controller. The indicators are also used by the controller to adjust the driving velocity, helping the robot smoothly bypass obstacles. Experiments in real-world environments demonstrate that our method performs well in both local obstacle avoidance and global goal-directed navigation tasks.
|
|
TuBT17 |
Room T17 |
Vision-Based Navigation III |
Regular session |
Chair: Kim, H. Jin | Seoul National University |
Co-Chair: Tombari, Federico | Technische Universität München |
|
11:45-12:00, Paper TuBT17.1 | |
>Model Quality Aware RANSAC: A Robust Camera Motion Estimator |
|
Yeh, Shu-Hao | Texas A&M University |
Lu, Yan | Google |
Song, Dezhen | Texas A&M University |
Keywords: Visual-Based Navigation, SLAM, Computer Vision for Other Robotic Applications
Abstract: Robust estimation of camera motion under the presence of outlier noise is a fundamental problem in robotics and computer vision. Despite existing efforts that focus on detecting motion and scene degeneracies, the best existing approach that builds on Random Sample Consensus (RANSAC) still has a non-negligible failure rate. Since a single failure can lead to the failure of the entire visual simultaneous localization and mapping pipeline, it is important to further improve the robust estimation algorithm. We propose a new robust camera motion estimator (RCME) by incorporating two main changes: a model-sample consistency test at the model instantiation step and an inlier set quality test that verifies model-inlier consistency using differential entropy. We have implemented our RCME algorithm and tested it on many public datasets. The results have shown a consistent reduction in failure rate when compared to the RANSAC-based Gold Standard approach and two recent variations of RANSAC methods.
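A RANSAC skeleton showing where the two proposed gates fit (the bodies of both tests are hypothetical stand-ins, not the paper's actual criteria):

    import numpy as np

    def model_sample_consistency(model, sample, residuals, thresh):
        """Gate 1 (hypothetical form): the instantiated model must at least
        explain its own minimal sample before being scored on all data."""
        return bool(np.all(residuals(model, sample) < thresh))

    def inlier_quality_ok(model, inliers, residuals, max_entropy=-1.0):
        """Gate 2 (hypothetical form): accept the inlier set only if the
        differential entropy of its residuals is low, i.e. the residuals are
        tightly concentrated around the model (tune max_entropy per scale)."""
        sigma2 = max(float(np.var(residuals(model, inliers))), 1e-12)
        return 0.5 * np.log(2.0 * np.pi * np.e * sigma2) < max_entropy

    def rcme_style_ransac(data, fit, residuals, thresh, n_min=5, iters=500):
        best_model, best_count = None, 0
        for _ in range(iters):
            sample = data[np.random.choice(len(data), n_min, replace=False)]
            model = fit(sample)
            if not model_sample_consistency(model, sample, residuals, thresh):
                continue
            inliers = data[residuals(model, data) < thresh]
            if len(inliers) > best_count and inlier_quality_ok(model, inliers, residuals):
                best_model, best_count = model, len(inliers)
        return best_model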
|
|
12:00-12:15, Paper TuBT17.2 | |
>A Fast and Robust Place Recognition Approach for Stereo Visual Odometry Using LiDAR Descriptors |
> Video Attachment
|
|
Mo, Jiawei | University of Minnesota, Twin Cities |
Sattar, Junaed | University of Minnesota |
Keywords: Visual-Based Navigation, Autonomous Vehicle Navigation, SLAM
Abstract: Place recognition is a core component of Simultaneous Localization and Mapping (SLAM) algorithms. Particularly in visual SLAM systems, previously-visited places are recognized by measuring the appearance similarity between images representing these locations. However, such approaches are sensitive to visual appearance change and also can be computationally expensive. In this paper, we propose an alternative approach adapting LiDAR descriptors for 3D points obtained from stereo-visual odometry for place recognition. 3D points are potentially more reliable than 2D visual cues (e.g., 2D features) against environmental changes (e.g., variable illumination) and this may benefit visual SLAM systems in long-term deployment scenarios. Stereo-visual odometry generates 3D points with an absolute scale, which enables us to use LiDAR descriptors for place recognition with high computational efficiency. Through extensive evaluations on standard benchmark datasets, we demonstrate the accuracy, efficiency, and robustness of using 3D points for place recognition over 2D methods.
|
|
12:15-12:30, Paper TuBT17.3 | |
>KLIEP-Based Density Ratio Estimation for Semantically Consistent Synthetic to Real Images Adaptation in Urban Traffic Scenes |
|
Savkin, Artem | TUM |
Tombari, Federico | Technische Universität München |
Keywords: Simulation and Animation, Computer Vision for Transportation, Autonomous Vehicle Navigation
Abstract: Synthetic data has been applied in many deep learning based computer vision tasks. The limited performance of algorithms trained solely on synthetic data has been addressed with domain adaptation techniques, such as those based on the generative adversarial framework. We demonstrate how adversarial training alone can introduce semantic inconsistencies in translated images. To tackle this issue, we propose a density pre-matching strategy using a KLIEP-based density ratio estimation procedure. Finally, we show that the aforementioned strategy improves the quality of the translated images produced by the underlying method and their usability for the semantic segmentation task in the context of autonomous driving.
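KLIEP itself is a standard procedure; a compact sketch of the basic form (the kernel-center choice, step size, and projection scheme are simplifications of the full algorithm):

    import numpy as np

    def gaussian_kernel(X, C, sigma):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def kliep(x_nu, x_de, sigma=1.0, n_iter=200, lr=0.1):
        """Estimate the density ratio w(x) = p_nu(x) / p_de(x) as a kernel
        mixture. x_nu, x_de: (n, d) samples from the numerator (target) and
        denominator (source) densities respectively."""
        centers = x_nu[: min(50, len(x_nu))]
        K_nu = gaussian_kernel(x_nu, centers, sigma)
        K_de = gaussian_kernel(x_de, centers, sigma)
        b = K_de.mean(axis=0)                 # constraint: mean_de w(x) = 1
        alpha = np.ones(len(centers))
        alpha /= alpha @ b
        for _ in range(n_iter):
            w_nu = K_nu @ alpha
            alpha += lr * (K_nu / w_nu[:, None]).mean(axis=0)  # ascend mean log w
            alpha = np.maximum(alpha, 0.0)
            alpha /= alpha @ b                # re-project onto the constraint
        return lambda x: gaussian_kernel(x, centers, sigma) @ alpha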
|
|
12:30-12:45, Paper TuBT17.4 | |
>Graduated Assignment Graph Matching for Realtime Matching of Image Wireframes |
|
Menke, Joseph | University of California, Berkeley |
Yang, Allen | University of California, Berkeley |
Keywords: Visual Tracking, Mapping, Semantic Scene Understanding
Abstract: We present an algorithm for the realtime matching of wireframe extractions in pairs of images. Here we treat extracted wireframes as graphs and propose a simplified Graduated Assignment algorithm to use with this problem. Using this algorithm we achieve a 30% accuracy improvement over the baseline method. We show that, for this problem, the simplified Graduated Assignment algorithm can achieve realtime performance without a significant drop in accuracy as compared to the standard Graduated Assignment algorithm. We further demonstrate a method of utilizing this simplified Graduated Assignment algorithm for achieving a similar realtime improvement in the matching quality of standard features without wireframe detection.
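A sketch of the simplified Graduated Assignment (softassign) loop over a fixed node-affinity matrix; the full algorithm recomputes the affinities from edge compatibilities at each step, which is omitted here for brevity:

    import numpy as np

    def graduated_assignment(Q, beta0=0.5, beta_max=10.0, rate=1.075, norm_iters=30):
        """Anneal a soft assignment toward a permutation-like matrix.
        Q: (n1, n2) node-to-node affinities between the two wireframe graphs."""
        beta = beta0
        M = np.ones_like(Q)
        while beta < beta_max:
            M = np.exp(beta * Q)
            for _ in range(norm_iters):       # Sinkhorn row/column normalization
                M /= M.sum(axis=1, keepdims=True)
                M /= M.sum(axis=0, keepdims=True)
            beta *= rate                      # sharpen the assignment gradually
        return M  # soft matches; harden with e.g. the Hungarian method

    Q = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.8, 0.1],
                  [0.0, 0.1, 0.7]])
    print(np.round(graduated_assignment(Q), 2))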
|
|
12:45-13:00, Paper TuBT17.5 | |
>Edge-Based Visual Odometry with Stereo Cameras Using Multiple Oriented Quadtrees |
> Video Attachment
|
|
Kim, Changhyeon | Seoul National University |
Kim, Junha | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Visual-Based Navigation, Localization, Mapping
Abstract: We propose an efficient edge-based stereo visual odometry (VO) using multiple quadtrees created according to image gradient orientations. To characterize edges, we classify them into eight orientation groups according to their image gradient directions. Using the edge groups, we construct eight quadtrees and set overlapping areas belonging to adjacent quadtrees for robust and efficient matching. For further acceleration, previously visited tree nodes are stored and reused to warm-start the next iteration. We propose an edge culling method to extract prominent edgelets and prune redundant edges. The camera motion is estimated by minimizing point-to-edge distances within a re-weighted iterative closest points (ICP) framework, and simultaneously, 3-D structures are recovered by static and temporal stereo settings. To analyze the effects of the proposed methods, we conduct extensive simulations with various settings. Quantitative results on public datasets confirm that our approach has competitive performance with state-of-the-art stereo methods. In addition, we demonstrate the practical value of our system in author-collected modern building scenes with curved edges only.
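The orientation grouping can be sketched in a few lines (the bin convention is an assumption; the paper builds one quadtree per group):

    import numpy as np

    def orientation_groups(gx, gy, n_bins=8):
        """Assign each edge pixel to one of eight orientation groups
        according to its image-gradient direction."""
        theta = np.arctan2(gy, gx)  # in [-pi, pi]
        bins = np.floor((theta + np.pi) / (2.0 * np.pi / n_bins)).astype(int)
        return np.clip(bins, 0, n_bins - 1)

    gx = np.array([1.0, 0.0, -1.0])
    gy = np.array([0.0, 1.0, 1.0])
    print(orientation_groups(gx, gy))  # group index per edge pixel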
|
|
TuBT18 |
Room T18 |
Vision-Based Navigation IV |
Regular session |
Chair: Karaman, Sertac | Massachusetts Institute of Technology |
Co-Chair: Jawahar, C.V. | IIIT, Hyderabad |
|
11:45-12:00, Paper TuBT18.1 | |
>Perception-Aware Path Finding and Following of Snake Robot in Unknown Environment |
> Video Attachment
|
|
Yang, Weixin | University of Nevada, Reno |
Wang, Gang | University of Nevada |
Shen, Yantao | University of Nevada, Reno |
Keywords: Perception-Action Coupling, Biomimetics, Visual-Based Navigation
Abstract: In this paper, we investigate perception-aware path finding, planning and following for a class of snake robots autonomously serpentining in an unmodeled and unknown environment. The onboard LiDAR sensor mounted on the head of the snake robot is used to reconstruct the local environment; from this reconstruction, a modified rapidly-exploring random tree method obtains a feasible path from the robot's current position to a locally selected target position. Next, the parametric cubic spline interpolation path-planning method and potential functions are applied to smooth the path so as to prevent the multi-link and elongated robot body from hitting obstacles. For steering, a time-varying line-of-sight control law is designed to ensure that the robot moves to the local target position along the path generated by the perception-aware method. The robot repeatedly performs this search-find-move strategy until it reaches the final predefined target point. Simulation and experimental results demonstrate the good performance of the proposed perception-aware approach: the elongated and underactuated snake robot is capable of autonomously navigating in an unknown environment.
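The paper's line-of-sight law is time-varying; a minimal time-invariant LOS sketch conveys the underlying idea (the lookahead distance and segment parametrization are assumptions):

    import numpy as np

    def los_heading(pos, wp_prev, wp_next, lookahead=0.5):
        """Classic line-of-sight guidance: steer toward a point on the path
        segment a fixed lookahead ahead of the robot's projection onto it."""
        d = wp_next - wp_prev
        t = np.clip((pos - wp_prev) @ d / (d @ d), 0.0, 1.0)
        proj = wp_prev + t * d                          # closest point on segment
        target = proj + lookahead * d / np.linalg.norm(d)
        dx, dy = target - pos
        return np.arctan2(dy, dx)                       # desired heading

    print(los_heading(np.array([0.0, 1.0]),
                      np.array([0.0, 0.0]),
                      np.array([5.0, 0.0])))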
|
|
12:00-12:15, Paper TuBT18.2 | |
>Joint Feature Selection and Time Optimal Path Parametrization for High Speed Vision-Aided Navigation |
|
Spasojevic, Igor | MIT |
Murali, Varun | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Perception-Action Coupling, Visual-Based Navigation, Motion and Path Planning
Abstract: We study a problem in vision-aided navigation in which an autonomous agent has to traverse a specified path in minimal time while ensuring extraction of a steady stream of visual percepts with low latency. Vision-aided robots extract motion estimates from the sequence of images of their on-board cameras by registering the change in bearing to landmarks in their environment. The computational burden of the latter procedure grows with the range of apparent motion undertaken by the projections of the landmarks, incurring a lag in pose estimates that should be minimized while navigating at high speeds. This paper addresses the problem of selecting a desired number of landmarks in the environment, together with the time parametrization of the path, to allow the agent to execute it in minimal time while both (i) ensuring the computational burden of extracting motion estimates stays below a set threshold and (ii) respecting the actuation constraints of the agent. We provide two efficient approximation algorithms for the aforementioned problem. Also, we show how it can be reduced to a mixed integer linear program for which well-developed optimization packages exist. Ultimately, we illustrate the performance of our algorithms in experiments using a quadrotor.
|
|
12:15-12:30, Paper TuBT18.3 | |
>AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot |
> Video Attachment
|
|
Qin, Tong | Hong Kong University of Science and Technology |
Chen, Tongqing | Huawei Technology |
Chen, Yilun | Huawei Technology |
Su, Qing | Huawei Technologies Co., Ltd |
Keywords: Localization, Computer Vision for Automation, Visual-Based Navigation
Abstract: Autonomous valet parking is a specific application for autonomous vehicles. In this task, vehicles need to navigate in narrow, crowded and GPS-denied parking lots, so accurate localization ability is of great importance. Traditional visual-based methods suffer from tracking loss due to texture-less regions, repeated structures, and appearance changes. In this paper, we exploit robust semantic features to build the map and localize vehicles in parking lots. Semantic features include guide signs, parking lines, speed bumps, etc., which typically appear in parking lots. Compared with traditional features, these semantic features are long-term stable and robust to perspective and illumination change. We adopt four surround-view cameras to increase the perception range. Assisted by an IMU (Inertial Measurement Unit) and wheel encoders, the proposed system generates a global visual semantic map. This map is further used to localize vehicles at the centimeter level. We analyze the accuracy and recall of our system and compare it against other methods in real experiments. Furthermore, we demonstrate the practicability of the proposed system through the autonomous parking application.
|
|
12:30-12:45, Paper TuBT18.4 | |
>DGAZE: Driver Gaze Mapping on Road |
> Video Attachment
|
|
Dua, Isha | IIIT Hyderabad |
John, Thrupthi Ann | IIIT Hyderabad |
Gupta, Riya | IIIT Hyderabad |
Jawahar, C.V. | IIIT, Hyderabad |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Deep Learning for Visual Perception
Abstract: Driver gaze mapping is crucial to estimate driver attention and determine which objects the driver is focusing on while driving. We introduce DGAZE, the first large-scale driver gaze mapping dataset. Unlike previous works, our dataset does not require expensive wearable eye-gaze trackers and instead relies on mobile phone cameras for data collection. The data was collected in a lab setting designed to mimic real driving conditions and has point- and object-level annotation. It consists of 227,178 road-driver image pairs collected from 20 drivers and contains 103 unique objects on the road belonging to 7 classes: cars, pedestrians, traffic signals, motorbikes, auto-rickshaws, buses and signboards. We also present I-DGAZE, a fused convolutional neural network for predicting driver gaze on the road, which was trained on the DGAZE dataset. Our architecture combines facial features such as face location and head pose with the image of the left eye to obtain optimal results. Our model achieves an error of 186.89 pixels on the road view at a resolution of 1920x1080 pixels. We compare our model with state-of-the-art eye-gaze methods and present extensive ablation results.
|
|
TuBT19 |
Room T19 |
Navigation and Collision Avoidance |
Regular session |
Chair: Bera, Aniket | University of Maryland |
Co-Chair: Feng, Chen | New York University |
|
11:45-12:00, Paper TuBT19.1 | |
>Autonomous Obstacle Avoidance for UAV Based on Fusion of Radar and Monocular Camera |
> Video Attachment
|
|
Yu, Hang | Northwestern Polytechnical University |
Zhang, Fan | Northwestern Polytechnical Univeristy |
Huang, Panfeng | Northwestern Polytechnical University |
Wang, Chen | Chang’an University |
Yuanhao, Li | Northwestern Polytechnical University |
Keywords: Sensor Fusion, Collision Avoidance, Visual-Based Navigation
Abstract: UAVs face many challenges in autonomous obstacle avoidance in large outdoor scenarios, notably the long communication distance from ground stations, the limited computing power of onboard computers, and the difficulty of accurately detecting unknown obstacles. In this paper, an autonomous obstacle avoidance scheme based on the fusion of millimeter-wave radar and a monocular camera is proposed. The visual detector is designed to detect unknown obstacles and is more robust than traditional algorithms. Extended Kalman filter (EKF) data fusion is then used to estimate accurate 3D coordinates of the obstacles. Finally, an efficient path planning algorithm is used to obtain a path that avoids the obstacles. Based on this design, an experimental platform is built to verify the proposed UAV autonomous obstacle avoidance scheme. The experimental results show that the proposed scheme can not only detect different kinds of unknown obstacles but also runs on an onboard computer with very little computing resource. An outdoor flight experiment shows the feasibility of the proposed scheme.
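The EKF fusion step reduces to the standard measurement update, called once per sensor with its own measurement model (this generic sketch is not the paper's exact state or measurement design):

    import numpy as np

    def ekf_update(x, P, z, h, H, R):
        """One EKF measurement update. Radar (range/Doppler) and camera
        (bearing) measurements are fused by calling this with their own
        measurement function h, Jacobian H, and noise covariance R."""
        y = z - h(x)                                  # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
        return x + K @ y, (np.eye(len(x)) - K @ H) @ P

    # Toy example: fuse a camera bearing measurement of a 2D obstacle position.
    x, P = np.array([5.0, 2.0]), np.eye(2)
    h = lambda s: np.array([np.arctan2(s[1], s[0])])
    H = np.array([[-x[1], x[0]]]) / (x @ x)           # Jacobian of the bearing
    x, P = ekf_update(x, P, np.array([0.40]), h, H, np.array([[1e-3]]))
    print(np.round(x, 3))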
|
|
12:00-12:15, Paper TuBT19.2 | |
>UST: Unifying Spatio-Temporal Context for Trajectory Prediction in Autonomous Driving |
|
He, Hao | TuSimple |
Dai, Hengchen | Tusimple |
Wang, Naiyan | TuSimple |
Keywords: Big Data in Robotics and Automation, Autonomous Agents
Abstract: Trajectory prediction has always been a challenging problem for autonomous driving, since it needs to infer the latent intention from the behaviors of and interactions among traffic participants. This problem is intrinsically hard, because each participant may behave differently under different environments and interactions. The key is to effectively model the interlaced influence from both spatial context and temporal context. Existing work usually encodes these two types of context separately, which leads to inferior modeling of the scenarios. In this paper, we propose a unified approach that treats the time and space dimensions equally for modeling spatio-temporal context. The proposed module is simple and can be implemented within a few lines of code. In contrast to existing methods which heavily rely on recurrent neural networks for temporal context and hand-crafted structures for spatial context, our method automatically partitions the spatio-temporal space to adapt to the data. Lastly, we test our proposed framework on two recently proposed trajectory prediction datasets, ApolloScape and Argoverse. We show that the proposed method substantially outperforms the previous state-of-the-art methods while maintaining its simplicity. These encouraging results further validate the superiority of our approach.
|
|
12:15-12:30, Paper TuBT19.3 | |
>Automatic Failure Recovery and Re-Initialization for Online UAV Tracking with Joint Scale and Aspect Ratio Optimization |
|
Ding, Fangqiang | Tongji University |
Fu, Changhong | Tongji University |
Li, Yiming | Tongji University |
Jin, Jin | Tongji University |
Feng, Chen | New York University |
Keywords: Visual-Based Navigation, Computer Vision for Other Robotic Applications, Visual Learning
Abstract: Current unmanned aerial vehicle (UAV) visual tracking algorithms are primarily limited with respect to: (i) the kinds of size variation they can handle, and (ii) their implementation speed, which rarely meets the real-time requirement. In this work, a real-time UAV tracking algorithm with powerful size estimation ability is proposed. Specifically, the overall tracking task is allocated to two 2D filters: (i) a translation filter for location prediction in the space domain, and (ii) a size filter for scale and aspect ratio optimization in the size domain. Besides, an efficient two-stage re-detection strategy is introduced for long-term UAV tracking tasks. Large-scale experiments on four UAV benchmarks demonstrate the superiority of the presented method, which remains computationally feasible on a low-cost CPU.
|
|
12:30-12:45, Paper TuBT19.4 | |
>Asynchronous Event-Based Line Tracking for Time-To-Contact Maneuvers in UAS |
> Video Attachment
|
|
Gómez Eguíluz, Augusto | University of Seville |
Rodriguez-Gomez, Juan Pablo | University of Seville |
Martinez-de-Dios, Jose Ramiro | University of Seville |
Ollero, Anibal | University of Seville |
Keywords: Aerial Systems: Perception and Autonomy, Computer Vision for Other Robotic Applications
Abstract: This paper presents a bio-inspired event-based perception scheme for agile aerial robot maneuvering. It mimics birds, which perform purposeful maneuvers by closing the separation in the retinal image (w.r.t. the goal) to follow time-to-contact trajectories. The proposed approach is based on event cameras, also called artificial retinas, which provide fast response and robustness against motion blur and lighting conditions. Our scheme guides the robot by adjusting only the positions of features extracted in the event image plane toward their goal positions at a predefined time, using smooth time-to-contact trajectories. The proposed scheme is robust, efficient and can be added on top of commonly used aerial robot velocity controllers. It has been validated on board a UAV with real-time computation on low-cost hardware in sets of experiments with different descent maneuvers and lighting conditions.
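A time-to-contact reference of the kind described can be sketched as a gap-closing schedule with constant tau-dot (the exponent k and the pixel-error parametrization are assumptions, not the paper's exact trajectory):

    import numpy as np

    def tau_reference(e0, T, t, k=0.4):
        """Feature-position error shrinking to zero at time T under a
        constant tau-dot profile: e(t) = e0 * (1 - t/T)**(1/k)."""
        s = np.clip(1.0 - t / T, 0.0, 1.0)
        return e0 * s ** (1.0 / k)

    t = np.linspace(0.0, 2.0, 5)
    print(np.round(tau_reference(40.0, 2.0, t), 2))  # pixel-error schedule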
|
|
12:45-13:00, Paper TuBT19.5 | |
>Enhanced Transfer Learning for Autonomous Driving with Systematic Accident Simulation |
|
Akhauri, Shivam | University of Maryland College Park |
Zheng, Laura | University of Maryland, College Park |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Collision Avoidance, Transfer Learning, Autonomous Agents
Abstract: Simulation data can be used to extend real-world driving data in order to cover edge cases, such as vehicle accidents. The importance of handling edge cases can be observed in the high societal costs of car accidents, as well as the potential dangers to human drivers. To cover a wide and diverse range of edge cases, we systematically parameterize and simulate the most common accident scenarios. By applying this data to autonomous driving models, we show that transfer learning on simulated data sets provides better generalization and collision avoidance, as compared to random initialization methods. Our results illustrate that knowledge from a model trained on simulated data can be transferred to a model trained on real-world data, indicating the potential influence of simulation data on real-world models and advancements in handling anomalous driving scenarios.
|
|
13:00-13:15, Paper TuBT19.6 | |
>A Framework for Online Updates to Safe Sets for Uncertain Dynamics |
> Video Attachment
|
|
Shih, Jennifer | Uc Berkeley |
Meier, Franziska | Facebook |
Rai, Akshara | Facebook AI Research |
Keywords: Collision Avoidance, Robot Safety, Reinforcement Learning
Abstract: Safety is crucial for deploying robots in the real world. One way of reasoning about the safety of robots is by building safe sets through Hamilton-Jacobi (HJ) reachability. However, safe sets are often computed offline, assuming perfect knowledge of the dynamics, due to high compute time. In the presence of uncertainty, the safe set computed offline becomes inaccurate online, potentially leading to dangerous situations on the robot. We propose a novel framework to learn a safe control policy in simulation and use it to generate online safe sets under uncertain dynamics. We start with a conservative safe set and update it online as we gather more information about the robot dynamics. We also show an application of our framework to a model-based reinforcement learning problem, proposing a safe model-based RL setup. Our framework enables robots to simultaneously learn about their dynamics, accomplish tasks, and update their safe sets. It also generalizes to complex high-dimensional dynamical systems, like 3-link manipulators and quadrotors, and reliably avoids obstacles while achieving a task, even in the presence of unmodeled noise.
|
|
13:00-13:15, Paper TuBT19.7 | |
>Nonlinear MPC for Collision Avoidance and Control of UAVs with Dynamic Obstacles |
> Video Attachment
|
|
Lindqvist, Björn | Luleå University of Technology |
Mansouri, Sina Sharif | Lulea University of Technology |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Collision Avoidance, Aerial Systems: Applications
Abstract: This article proposes a novel Nonlinear Model Predictive Control (NMPC) scheme for navigation and obstacle avoidance of an Unmanned Aerial Vehicle (UAV). The proposed NMPC formulation allows for a fully parametric obstacle trajectory, while in this article we apply a classification scheme to differentiate between different kinds of trajectories in order to predict future obstacle positions. The trajectory calculation is done from an initial condition and fed to the NMPC as an additional input. The solver used is the nonlinear, non-convex solver Proximal Averaged Newton for Optimal Control (PANOC) and its associated software OpEn (Optimization Engine), in which we apply a penalty method to properly consider the obstacles and other constraints during navigation. The proposed NMPC scheme allows for real-time solutions using a sampling time of 50 ms and a two-second prediction horizon for both the obstacle trajectory and the NMPC problem, which implies that the scheme can be considered a local path-planner. This paper presents the NMPC cost function and constraint formulation, as well as the methodology for dealing with the dynamic obstacles. We include multiple laboratory experiments to demonstrate the efficacy of the proposed control architecture and to show that the proposed method delivers fast and computationally stable solutions to dynamic obstacle avoidance scenarios.
|
|
TuBT20 |
Room T20 |
Learning for Mapping and Navigation |
Regular session |
Chair: Hollinger, Geoffrey | Oregon State University |
Co-Chair: Bauer, Daniel | RWTH |
|
11:45-12:00, Paper TuBT20.1 | |
>DMLO: Deep Matching LiDAR Odometry |
|
Li, Zhichao | Tusimple.ai |
Wang, Naiyan | TuSimple |
Keywords: Novel Deep Learning Methods, SLAM
Abstract: LiDAR odometry is a fundamental task for various areas such as robotics and autonomous driving. The problem is difficult since it requires systems to be highly robust when running on noisy real-world data. Existing methods are mostly local iterative methods. Feature-based global registration methods are not preferred, since extracting accurate matching pairs from nonuniform and sparse LiDAR data remains challenging. In this paper, we present Deep Matching LiDAR Odometry (DMLO), a novel learning-based framework which makes feature matching methods applicable to the LiDAR odometry task. Unlike many recent learning-based methods, DMLO explicitly enforces geometry constraints in the framework. Specifically, DMLO decomposes the 6-DoF pose estimation into two parts: a learning-based matching network which provides accurate correspondences between two scans, and rigid transformation estimation with a closed-form solution via Singular Value Decomposition (SVD). Comprehensive experimental results on the real-world KITTI and Argoverse datasets demonstrate that DMLO dramatically outperforms existing learning-based methods and is comparable with state-of-the-art geometry-based approaches.
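The SVD step is the classical Kabsch solution and can be written compactly (this is the standard closed form, not the authors' full pipeline):

    import numpy as np

    def rigid_transform_svd(P, Q):
        """Least-squares rigid transform: find R, t minimizing
        ||R @ P + t - Q|| over matched 3D point sets of shape (3, N)."""
        p_mean = P.mean(axis=1, keepdims=True)
        q_mean = Q.mean(axis=1, keepdims=True)
        H = (P - p_mean) @ (Q - q_mean).T
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflections
        R = Vt.T @ D @ U.T
        return R, q_mean - R @ p_mean

    P = np.random.randn(3, 100)
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    Q = R_true @ P + np.array([[1.0], [2.0], [0.5]])
    R, t = rigid_transform_svd(P, Q)
    print(np.allclose(R, R_true))  # True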
|
|
12:00-12:15, Paper TuBT20.2 | |
>Accurate and Robust Teach and Repeat Navigation by Visual Place Recognition: A CNN Approach |
> Video Attachment
|
|
Camara, Luis G. | CIIRC CTU Prague |
Pivoňka, Tomáš | Czech Institute of Informatics, Robotics and Cybernetics |
Jilek, Martin | Czech Technical University in Prague |
Gäbert, Carl | Czech Institute of Informatics, Robotics and Cybernetics |
Kosnar, Karel | Czech Technical University in Prague |
Preucil, Libor | Czech Technical University in Prague |
Keywords: Localization, Deep Learning for Visual Perception, Visual Servoing
Abstract: We propose a novel teach-and-repeat navigation system, SSM-Nav, which is based on the output of the recently introduced SSM visual place recognition methodology. During the teach phase, a teleoperated wheeled robot stores in a database features of images taken along an arbitrary route. During the repeat phase, or navigation, a CNN-based comparison of each captured image is performed against the database. With the help of a particle filter, the image associated with the most likely location is selected at each time step, and its horizontal offset with respect to the current scene is used to correct the steering of the robot and thus to navigate. Indoor tests in our lab show a maximum error of less than 10 cm and excellent robustness to perturbations such as drastic changes in illumination, lateral displacements, different starting positions, or even kidnapping. Preliminary outdoor tests on a 0.22 km route show promising results, with an estimated maximum error of less than 25 cm.
|
|
12:15-12:30, Paper TuBT20.3 | |
>Self-Supervised Simultaneous Alignment and Change Detection |
|
Furukawa, Yukuko | National Institute of Advanced Industrial Science and Technology |
Suzuki, Kumiko | National Institute of Advanced Industrial Science and Technology |
Hamaguchi, Ryuhei | National Institute of Advanced Industrial Science and Technology |
Onishi, Masaki | National Inst. of AIST |
Sakurada, Ken | National Institute of Advanced Industrial Science and Technology |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Recognition
Abstract: This study proposes a self-supervised method for detecting scene changes from an image pair. For mobile cameras such as drive recorders, image alignment and change detection must be optimized simultaneously to alleviate the difference in camera viewpoints, because the two problems depend on each other. Moreover, lighting conditions make scene change detection more difficult because they vary widely across images taken at different times. To solve these challenges, we propose a self-supervised simultaneous alignment and change detection network (SACD-Net). The proposed network is robust to differences in camera viewpoint and lighting conditions: it simultaneously estimates warping parameters and multi-scale change probability maps, while excluding change regions from the computation of the feature-consistency and semantic losses. A comparative analysis between our self-supervised model and previous supervised models, together with an ablation study of SACD-Net's losses on a synthetic dataset and our new real dataset, shows the effectiveness of the proposed method.
|
|
12:30-12:45, Paper TuBT20.4 | |
>Deep Inverse Sensor Models As Priors for Evidential Occupancy Mapping |
> Video Attachment
|
|
Bauer, Daniel | RWTH |
Kuhnert, Lars | University of Siegen |
Keywords: Deep Learning for Visual Perception, Mapping
Abstract: With the recent boost in autonomous driving, increased attention has been paid to radars as an input for occupancy mapping. Besides their many benefits, the inference of occupied space based on radar detections is notoriously difficult because of the data sparsity and the environment-dependent noise (e.g., multipath reflections). Recently, deep learning-based inverse sensor models, from here on called deep ISMs, have been shown to improve over their geometric counterparts in retrieving occupancy information [Weston et al., 2018; Sless et al., 2019; Bauer et al., 2019]. Nevertheless, these methods perform a data-driven interpolation which has to be verified later on in the presence of measurements. In this work, we describe a novel approach to integrating deep ISMs together with geometric ISMs into the evidential occupancy mapping framework. Our method leverages the capability of the data-driven approach to initialize cells not yet observable to the geometric model, effectively enhancing the perception field and convergence speed, while at the same time using the precision of the geometric ISM to converge to sharp boundaries. We further define a lower limit on the deep ISM estimate's certainty, together with analytical proofs of convergence, which we use to distinguish cells that are solely allocated by the deep ISM from cells already verified using the geometric approach.
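On the frame {occupied, free}, evidential fusion of a deep-ISM prior with geometric evidence reduces to Dempster's rule over mass triples (m_occ, m_free, m_unknown); the certainty limit on the deep ISM corresponds to keeping its m_unknown bounded away from zero (the example masses below are illustrative, not from the paper):

    def dempster_fuse(m1, m2):
        """Dempster's rule on {occupied, free}. Each argument is a mass
        triple (m_occ, m_free, m_unknown); conflicting mass is renormalized."""
        o1, f1, u1 = m1
        o2, f2, u2 = m2
        norm = 1.0 - (o1 * f2 + f1 * o2)           # 1 - conflict
        occ = (o1 * o2 + o1 * u2 + u1 * o2) / norm
        free = (f1 * f2 + f1 * u2 + u1 * f2) / norm
        return occ, free, (u1 * u2) / norm

    deep_prior = (0.55, 0.15, 0.30)  # deep ISM prior with capped certainty
    geometric = (0.70, 0.05, 0.25)   # geometric ISM after a range measurement
    print(tuple(round(v, 3) for v in dempster_fuse(deep_prior, geometric)))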
|
|
12:45-13:00, Paper TuBT20.5 | |
>Online Exploration of Tunnel Networks Leveraging Topological CNN-Based World Predictions |
|
Saroya, Manish | Oregon State University |
Best, Graeme | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Keywords: Novel Deep Learning Methods, Reactive and Sensor-Based Planning, Mining Robotics
Abstract: Robotic exploration requires adaptively selecting navigation goals that result in the rapid discovery and mapping of an unknown world. In many real-world environments, subtle structural cues can provide insight about the unexplored world, which may be exploited by a decision maker to improve the speed of exploration. In sparse subterranean tunnel networks, these cues come in the form of topological features, such as loops or dead-ends, that are often common across similar environments. We propose a method for learning these topological features using techniques borrowed from topological image segmentation and image inpainting to learn from a database of worlds. These world predictions then inform a frontier-based exploration policy. Our simulated experiments with a set of real-world mine environments and a database of procedurally-generated artificial tunnel networks demonstrate a substantial increase in the rate of area explored compared to techniques that do not attempt to predict and exploit topological features of the unexplored world.
|
|
13:00-13:15, Paper TuBT20.6 | |
>Building Energy-Cost Maps from Aerial Images and Ground Robot Measurements with Semi-Supervised Deep Learning |
|
Wei, Minghan | University of Minnesota |
Isler, Volkan | University of Minnesota |
Keywords: Energy and Environment-Aware Automation, Motion and Path Planning, Field Robots
Abstract: Planning energy-efficient paths is an important capability in many robotics applications. Obtaining an energy-cost map for a given environment enables planning such paths between any given pair of locations within the environment. However, efficiently building an energy map is challenging, especially for large environments. Some prior work uses physics-based laws (friction and gravity force) to model energy costs across environments. These methods work well for uniform surfaces, but they do not generalize well to uneven terrains. In this paper, we present a method to address this mapping problem in a data-driven fashion for cases where an aerial image of the environment can be obtained. To efficiently build an energy-cost map, we train a neural network that learns to predict complete energy maps by combining aerial images and sparse ground robot energy-consumption measurements. Field experiments are performed to validate our results. We show that our method can efficiently build accurate energy-cost maps, even across different types of ground robots.
|
|
TuBT21 |
Room T21 |
Learning for Navigation |
Regular session |
Chair: Michmizos, Konstantinos | Rutgers University |
Co-Chair: Kanezaki, Asako | National Institute of Advanced Industrial Science and Technology |
|
11:45-12:00, Paper TuBT21.1 | |
>Learning Local Planners for Human-Aware Navigation in Indoor Environments |
> Video Attachment
|
|
Güldenring, Ronja | Mobile Industrial Robots ApS |
Görner, Michael | University of Hamburg |
Hendrich, Norman | University of Hamburg |
Jacobsen, Niels Jul | Mobile Industrial Robots A/S |
Zhang, Jianwei | University of Hamburg |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Intelligent Transportation Systems
Abstract: Established indoor robot navigation frameworks build on the separation between global and local planners. Whereas global planners rely on traditional graph search algorithms, local planners are expected to handle driving dynamics and resolve minor conflicts. We present a system to train neural-network policies for such a local planner component, explicitly accounting for humans navigating the space. DRL agents are trained in randomized virtual 2D environments with simulated human interaction. Transferability to the real world is achieved through sufficiently abstract state representations relying on 2D lidar. The trained agents can be deployed as a drop-in replacement for other local planners and significantly improve on traditional implementations. Performance is demonstrated on a MiR-100 transport robot.
|
|
12:00-12:15, Paper TuBT21.2 | |
>Efficient Exploration in Constrained Environments with Goal-Oriented Reference Path |
|
Ota, Kei | Mitsubishi Electric |
Sasaki, Yoko | National Inst. of Advanced Industrial Science and Technology |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Yoshiyasu, Yusuke | CNRS-AIST JRL |
Kanezaki, Asako | National Institute of Advanced Industrial Science and Technology |
Keywords: Reinforcement Learning, Motion and Path Planning, AI-Based Methods
Abstract: In this paper, we consider the problem of building learning agents that can efficiently learn to navigate in constrained environments. The main goal is to design agents that can efficiently learn to understand and generalize to different environments using high-dimensional inputs (a 2D map), while following feasible paths that avoid obstacles in obstacle-cluttered environments. To achieve this, we make use of traditional path planning algorithms, supervised learning, and reinforcement learning algorithms in a synergistic way. The key idea is to decouple the navigation problem into planning and control, the former of which is achieved by supervised learning whereas the latter is done by reinforcement learning. Specifically, we train a deep convolutional network that can predict collision-free paths based on a map of the environment; this is then used by a reinforcement learning algorithm to learn to closely follow the path. This allows the trained agent to achieve good generalization while learning faster. We test our proposed method in the recently proposed Safety Gym suite, which allows testing of safety constraints during training of learning agents. We compare our proposed method with existing work and show that our method consistently improves the sample efficiency and generalization capability to novel environments.
|
|
12:15-12:30, Paper TuBT21.3 | |
>Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-Efficient Reinforcement Learning and Safe Sim-To-Real Transfer |
> Video Attachment
|
|
Rana, Krishan | Queensland University of Technology |
Dasagi, Vibhavari | Queensland University of Technology |
Talbot, Ben | Queensland University of Technology |
Milford, Michael J | Queensland University of Technology |
Sünderhauf, Niko | Queensland University of Technology |
Keywords: Reactive and Sensor-Based Planning, Reinforcement Learning, Collision Avoidance
Abstract: Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior, since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back on the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning.
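For a Gaussian policy and a Gaussian prior, the multiplicative fusion step is a normalized product of Gaussians, so the lower-variance (more certain) component dominates; the annealing schedule over the prior's influence is omitted in this sketch:

    def multiplicative_fusion(mu_pi, var_pi, mu_prior, var_prior):
        """Precision-weighted product of the learned policy and the prior
        controller: an uncertain policy falls back toward the prior."""
        var = 1.0 / (1.0 / var_pi + 1.0 / var_prior)
        mu = var * (mu_pi / var_pi + mu_prior / var_prior)
        return mu, var

    # Uncertain policy (var 1.0) + confident prior (var 0.04): prior dominates.
    print(multiplicative_fusion(0.8, 1.0, -0.2, 0.04))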
|
|
12:30-12:45, Paper TuBT21.4 | |
>Reinforcement Learning-Based Hierarchical Control for Path Following of a Salamander-Like Robot |
> Video Attachment
|
|
Zhang, Xueyou | Nankai University |
Guo, Xian | Nankai University |
Fang, Yongchun | Nankai University |
Zhu, Wei | Nankai University |
Keywords: Biologically-Inspired Robots, Reinforcement Learning, Legged Robots
Abstract: Path following is a challenging task for legged robots. In this paper, we present a hierarchical control architecture for path following of a quadruped salamander-like robot, in which the tracking problem is decomposed into two sub-tasks: high-level policy learning within the framework of reinforcement learning (RL) and low-level traditional controller design. More specifically, the high-level policy is learned in a physics simulator with a low-level controller designed in advance. To improve the tracking accuracy and eliminate static errors, a Soft Actor-Critic algorithm with state integral compensation is proposed. Additionally, to enhance generalization and transferability, a compact state representation is proposed that contains only information about the target path and abstract actions akin to front-back and left-right. The proposed algorithm is trained offline in the simulation environment and tested on a self-developed real quadruped salamander-like robot for different path following tasks. Simulation and experimental results validate the satisfactory performance of the proposed method.
|
|
12:45-13:00, Paper TuBT21.5 | |
>Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning |
> Video Attachment
|
|
Qiao, Zhiqian | Carnegie Mellon University |
Tyree, Zachariah | General Motors Research and Development |
Mudalige, Priyantha | General Motors |
Schneider, Jeff | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Keywords: Behavior-Based Systems, Reinforcement Learning, Autonomous Agents
Abstract: Behavioral decision making is an important aspect of autonomous vehicles (AVs). In this work, we propose a behavior planning structure based on hierarchical reinforcement learning (HRL) which is capable of performing autonomous vehicle planning tasks in simulated environments with multiple sub-goals. In this hierarchical structure, the network is capable of 1) learning one task with multiple sub-goals simultaneously; 2) extracting state attention according to the changing sub-goals during the learning process; and 3) reusing the well-trained network of sub-goals for other tasks with the same sub-goals. A hybrid reward mechanism is designed for the different hierarchical layers in the proposed HRL structure. Compared to traditional RL methods, our algorithm is more sample-efficient, since its modular design allows reusing the policies of sub-goals across similar tasks for various transportation scenarios. The results show that the proposed method converges to an optimal policy faster than traditional RL methods.
|
|
13:00-13:15, Paper TuBT21.6 | |
>Reinforcement Co-Learning of Deep and Spiking Neural Networks for Energy-Efficient Mapless Navigation with Neuromorphic Hardware |
> Video Attachment
|
|
Tang, Guangzhi | Rutgers University |
Kumar, Neelesh | Rutgers University |
Michmizos, Konstantinos | Rutgers University |
Keywords: Neurorobotics, Reinforcement Learning, Motion and Path Planning
Abstract: Energy-efficient mapless navigation is crucial for mobile robots as they explore unknown environments with limited on-board resources. Although recent deep reinforcement learning (DRL) approaches have been successfully applied to navigation, their high energy consumption limits their use in several robotic applications. Here, we propose a neuromorphic approach that combines the energy efficiency of spiking neural networks with the optimality of DRL, and benchmark it in learning control policies for mapless navigation. Our hybrid framework, spiking deep deterministic policy gradient (SDDPG), consists of a spiking actor network (SAN) and a deep critic network, where the two networks are trained jointly using gradient descent. The co-learning enables synergistic information exchange between the two networks, allowing them to overcome each other's limitations through shared representation learning. To evaluate our approach, we deployed the trained SAN on Intel's Loihi neuromorphic processor. When validated in simulated and real-world complex environments, our method on Loihi consumed 75 times less energy per inference compared to DDPG on a Jetson TX2, and also exhibited a higher rate of successful navigation to the goal, by a margin ranging from 1% to 4.2% depending on the forward-propagation timestep size. These results reinforce our ongoing efforts to design brain-inspired algorithms for controlling autonomous robots with neuromorphic hardware.
|
|
TuBT22 |
Room T22 |
RL for Navigation and Locomotion |
Regular session |
Chair: Meger, David Paul | McGill University |
Co-Chair: Ruiz-del-Solar, Javier | Universidad De Chile |
|
11:45-12:00, Paper TuBT22.1 | |
>Learning Agile Locomotion Via Adversarial Training |
> Video Attachment
|
|
Tang, Yujin | Google |
Tan, Jie | Google |
Harada, Tatsuya | The University of Tokyo |
Keywords: Reinforcement Learning, Multi-Robot Systems, Legged Robots
Abstract: Developing controllers for agile locomotion is a long-standing challenge for legged robots. Reinforcement learning (RL) and Evolution Strategy (ES) hold the promise of automating the design process of such controllers. However, dedicated and careful human effort is required to design training environments to promote agility. In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape. We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort. In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility. Through extensive experiments, we show that the locomotion controller learned with adversarial training significantly outperforms carefully designed baselines.
|
|
12:00-12:15, Paper TuBT22.2 | |
>Stochastic Grounded Action Transformation for Robot Learning in Simulation |
> Video Attachment
|
|
Desai, Siddharth | The University of Texas at Austin |
Karnan, Haresh | The University of Texas at Austin |
Hanna, Josiah | The University of Texas at Austin |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Keywords: Reinforcement Learning, Humanoid and Bipedal Locomotion, Transfer Learning
Abstract: Robot control policies learned in simulation often do not transfer well to the real world. Many existing solutions to this sim-to-real problem, such as the Grounded Action Transformation (GAT) algorithm, seek to correct for, or ground, these differences by matching the simulator to the real world. However, the efficacy of these approaches is limited if they do not explicitly account for stochasticity in the target environment. In this work, we analyze the problems associated with grounding a deterministic simulator in a stochastic real-world environment, and we present examples where GAT fails to transfer a good policy due to stochastic transitions in the target domain. In response, we introduce the Stochastic Grounded Action Transformation (SGAT) algorithm, which models this stochasticity when grounding the simulator. We find experimentally, for both simulated and physical target domains, that SGAT can find policies that are robust to stochasticity in the target domain.
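One grounded step can be sketched as follows; the stochastic real-world model, the deterministic simulator step, and the candidate grid are all assumptions standing in for the paper's learned forward and inverse models:

    import numpy as np

    def sgat_ground(s, a, sample_real_next, sim_step, candidates):
        """Sample a next state from a *stochastic* model of the real
        environment, then replace the policy's action with the candidate
        whose simulator transition best matches that sampled outcome."""
        s_real = sample_real_next(s, a)  # s' ~ p_real(. | s, a)
        errs = [np.linalg.norm(sim_step(s, c) - s_real) for c in candidates]
        return candidates[int(np.argmin(errs))]

    # Toy 1-D dynamics standing in for the learned models:
    sim = lambda s, a: s + 0.8 * a
    real = lambda s, a: s + 1.2 * a + np.random.normal(0.0, 0.01)
    cands = np.linspace(-1.0, 1.0, 41)
    print(sgat_ground(np.array([0.0]), 0.5, real, sim, cands))  # ~0.75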
|
|
12:15-12:30, Paper TuBT22.3 | |
>Learning Domain Randomization Distributions for Training Robust Locomotion Policies |
|
Mozifian, Melissa | McGill University |
Gamboa Higuera, Juan Camilo | McGill University |
Meger, David Paul | McGill University |
Dudek, Gregory | McGill University |
Keywords: Reinforcement Learning, Transfer Learning
Abstract: This paper considers the problem of learning behaviors in simulation without knowledge of the precise dynamical properties of the target robot platform(s). In this context, our learning goal is to mutually maximize task efficacy on each environment considered and generalization across the widest possible range of environmental conditions. The physical parameters of the simulator are modified by a component of our technique that learns the Domain Randomization (DR) distribution that is appropriate at each learning epoch: one that maximally challenges the current behavior policy without being overly challenging, which can hinder learning progress. This so-called sweet-spot distribution is a selection of simulated domains with the following properties: 1) the trained policy should be successful in environments sampled from the domain randomization distribution; and 2) the DR distribution is made as wide as possible, to increase variability in the environments. These properties aim to ensure that the trajectories encountered in the target system are close to those observed during training, as existing methods in machine learning are better suited to interpolation than extrapolation. We show how adapting the DR distribution while training context-conditioned policies results in improvements in jump-start and asymptotic performance when transferring a learned policy to the target environment. Our code is available at https://github.com/melfm/lsdr.
|
|
12:30-12:45, Paper TuBT22.4 | |
>Robust RL-Based Map-Less Local Planning: Using 2D Point Clouds As Observations |
|
Leiva, Francisco | Universidad De Chile |
Ruiz-del-Solar, Javier | Universidad De Chile |
Keywords: Reinforcement Learning, Reactive and Sensor-Based Planning
Abstract: In this paper, we propose a robust approach to training map-less navigation policies that rely on variable-size 2D point clouds, using Deep Reinforcement Learning (Deep RL). The navigation policies are trained in simulation using the DDPG algorithm. Through experimental evaluations in simulated and real-world environments, we showcase the benefits of our approach when compared to more classical RL-based formulations: better performance, the possibility to interchange sensors at deployment time, and the ability to easily augment the environment observability through sensor preprocessing and/or sensor fusion. Videos showing trajectories traversed by agents trained with the proposed approach can be found at https://youtu.be/AzvRJyN6rwQ.
|
|
12:45-13:00, Paper TuBT22.5 | |
>Deep Reinforcement Learning for Safe Local Planning of a Ground Vehicle in Unknown Rough Terrain |
> Video Attachment
|
|
Josef, Shirel | Technion - Israel Institute of Technology |
Degani, Amir | Technion - Israel Institute of Technology |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: Safe unmanned ground vehicle navigation in unknown rough terrain is crucial for various tasks such as exploration, search and rescue and agriculture. Offline global planning is often not possible when operating in harsh, unknown environments, and therefore, online local planning must be used. Most online rough terrain local planners require heavy computational resources, used for optimal trajectory searching and estimating vehicle orientation in positions within the range of the sensors. In this work, we present a deep reinforcement learning approach for local planning in unknown rough terrain with zero-range to local-range sensing, achieving superior results compared to potential fields or local motion planning search spaces methods. Our approach includes reward shaping which provides a dense reward signal. We incorporate self-attention modules into our deep reinforcement learning architecture in order to increase the explainability of the learnt policy. The attention modules provide insight regarding the relative importance of sensed inputs during training and planning. We extend and validate our approach in a dynamic simulation, demonstrating successful safe local planning in environments with a continuous terrain and a variety of discrete obstacles. By adding the geometric transformation between two successive timesteps and the corresponding action as inputs, our architecture is able to navigate on surfaces with different levels of friction.
|
|
13:00-13:15, Paper TuBT22.6 | |
>Exploration Strategy Based on Validity of Actions in Deep Reinforcement Learning |
|
Yoon, Hyungsuk | Seoul National University |
Lee, Sang-Hyun | Seoul National University |
Seo, Seung-Woo | Seoul National University |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: How to explore environments is one of the most critical factors for the performance of an agent in reinforcement learning. Conventional exploration strategies, such as the epsilon-greedy algorithm and Gaussian exploration noise, simply depend on pure randomness. However, an agent must consider its training progress and the long-term usefulness of actions to efficiently explore complex environments, which remains a major challenge in reinforcement learning. To address this challenge, we propose a novel exploration method that selects actions based on their validity. The key idea behind our method is to estimate the validity of actions by leveraging the zero-avoiding property of the Kullback-Leibler divergence to comprehensively evaluate actions in terms of both exploration and exploitation. We also introduce a framework that allows an agent to explore efficiently in environments where the reward is sparse or cannot be defined intuitively. The framework uses expert demonstrations to guide an agent to visit task-relevant state space by combining our exploration strategy with imitation learning. We demonstrate our exploration strategy on several tasks ranging from classical control tasks to high-dimensional urban autonomous driving scenarios at a roundabout. The results show that our exploration strategy encourages an agent to visit task-relevant state space and enhances the validity of actions, outperforming several previous methods.
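The zero-avoiding property referred to above is easy to see numerically: the forward divergence KL(p||q) grows very large whenever q puts near-zero mass on an action that p considers likely. The toy computation below is our own illustration of that property, not the paper's estimator:

import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p = [0.5, 0.5, 0.0]               # reference distribution over three actions
q_covering = [0.45, 0.45, 0.10]   # keeps mass on everything p supports
q_ignoring = [0.98, 0.01, 0.01]   # nearly drops the second action

print(kl(p, q_covering))  # small (about 0.11)
print(kl(p, q_ignoring))  # large (about 1.6), dominated by the ignored action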
|
|
13:15-13:30, Paper TuBT22.7 | |
>Autonomous Exploration under Uncertainty Via Deep Reinforcement Learning on Graphs |
> Video Attachment
|
|
Chen, Fanfei | Stevens Institute of Technology |
Martin, John D. | Stevens Institute of Technology |
Huang, Yewei | Stevens Institute of Technology |
Wang, Jinkun | Stevens Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Reactive and Sensor-Based Planning, Reinforcement Learning, Sensor-based Control
Abstract: We consider an autonomous exploration problem in which a range-sensing mobile robot is tasked with accurately mapping the landmarks in an a priori unknown environment efficiently in real time; it must choose sensing actions that both curb localization uncertainty and achieve information gain. For this problem, belief space planning methods that forward-simulate robot sensing and estimation may often fail in real-time implementation, scaling poorly with the increasing size of the state, belief, and action spaces. We propose a novel approach that uses graph neural networks (GNNs) in conjunction with deep reinforcement learning (DRL), enabling decision-making over graphs containing exploration information to predict a robot's optimal sensing action in belief space. The policy, which is trained in different random environments without human intervention, offers a real-time, scalable decision-making process whose high-performance exploratory sensing actions yield accurate maps and high rates of information gain.
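For readers unfamiliar with GNN-based decision-making, the sketch below shows one round of neighborhood message passing over an exploration graph followed by a linear readout that scores candidate sensing actions; the features, weights, and graph are random stand-ins, our own illustration rather than the authors' architecture:

import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_feat = 6, 4
X = rng.normal(size=(n_nodes, n_feat))    # per-node exploration features,
                                          # e.g. expected gain, uncertainty
A = (rng.uniform(size=(n_nodes, n_nodes)) < 0.4).astype(float)
A = np.maximum(A, A.T)                    # undirected graph
np.fill_diagonal(A, 1.0)                  # include self-loops
D_inv = np.diag(1.0 / A.sum(axis=1))      # degree normalization

W = rng.normal(size=(n_feat, n_feat))     # untrained weights, for shape only
w_out = rng.normal(size=n_feat)

H = np.tanh(D_inv @ A @ X @ W)            # one message-passing layer
scores = H @ w_out                        # value of each candidate action
print("best sensing action: node", int(np.argmax(scores)))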
|
|
TuBT23 |
Room T23 |
Semantic Mapping and Navigation |
Regular session |
Chair: Wang, Danwei | Nanyang Technological University |
Co-Chair: Buerger, Stephen P. | Sandia National Laboratories |
|
11:45-12:00, Paper TuBT23.1 | |
>No Map, No Problem: A Local Sensing Approach for Navigation in Human-Made Spaces Using Signs |
|
Liang, Claire Yilan | Cornell University |
Knepper, Ross | -- |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Reactive and Sensor-Based Planning, Human-Centered Robotics, Service Robotics
Abstract: Robot navigation in human spaces today largely relies on the construction of precise geometric maps and a global motion plan. In this work, we navigate with only local sensing by using available signage, as designed for humans, in human-made environments such as airports. We propose a formalization of "signage" and define four levels of signage that we call complete, fully-specified, consistent, and valid. The signage formalization can be used on many space skeletonizations, but we specifically provide an approach for navigation on the medial axis. We prove that we can achieve global completeness guarantees without requiring a global map to plan. We validate with two sets of experiments: (1) real-world airports with their real signs, and (2) real New York City neighborhoods. In (1), we show that we can use real-world airport signage to improve on a simple random-walk approach, and we explore augmenting signage to further study signs' impact on trajectory length. In (2), we navigate in variously sized subsets of New York City to show that, since we only use local sensing, our approach scales linearly with trajectory length rather than with free-space area.
|
|
12:00-12:15, Paper TuBT23.2 | |
>Rapid Autonomous Semantic Mapping |
|
Parikh, Anup | Sandia National Laboratories |
Koch, Mark | Sandia National Laboratories |
Blada, Timothy | Sandia National Laboratories |
Buerger, Stephen P. | Sandia National Laboratories |
Keywords: Mapping, Semantic Scene Understanding, Task Planning
Abstract: A semantic understanding of the environment is needed to enable high-level autonomy in robotic systems. Recent results have demonstrated rapid progress in the underlying technology areas, but few results have been reported on end-to-end systems that enable effective autonomous perception in complex environments. In this paper, we describe an approach for rapidly and autonomously mapping unknown environments with integrated semantic and geometric information. We use surfel-based RGB-D SLAM techniques, with incremental object segmentation and classification methods, to update the map in real time. Information-theoretic and heuristic measures are used to quickly plan sensor motion and drive down map uncertainty. Preliminary experimental results in simple and cluttered environments are reported.
|
|
12:15-12:30, Paper TuBT23.3 | |
>Lifelong Update of Semantic Maps in Dynamic Environments |
|
Narayana, Manjunath | IRobot Corp |
Kolling, Andreas | Amazon |
Nardelli, Lucio | IRobot |
Fong, Philip | IRobot |
Keywords: Mapping, SLAM, Visual-Based Navigation
Abstract: A robot understands its world through the raw information it senses from its surroundings. This raw information is not suitable as a shared representation between the robot and its user. A semantic map, containing high-level information that both the robot and the user understand, is better suited to be a shared representation. We use the semantic map as the user-facing interface on our fleet of floor-cleaning robots. Jitter in the robot's sensed raw map, dynamic objects in the environment, and exploration of new space by the robot are common challenges for robots. Solving these challenges effectively in the context of semantic maps is key to enabling lifelong mapping. First, as a robot senses new changes and alters its raw map in successive missions, the semantics must be updated appropriately. We update the map using a spatial transfer of semantics. Second, it is important to keep semantics and their relative constraints consistent even in the presence of dynamic objects. Inconsistencies are automatically determined and resolved through the introduction of a map layer of meta-semantics. Finally, a discovery phase allows the semantic map to be updated with new semantics whenever the robot uncovers new information. Deployed commercially on thousands of floor-cleaning robots in real homes, our user-facing semantic maps provide an intuitive user experience through a lifelong mapping robot.
|
|
12:30-12:45, Paper TuBT23.4 | |
>Efficient Object Search through Probability-Based Viewpoint Selection |
> Video Attachment
|
|
Hernandez Silva, Alejandra Carolina | University Carlos III of Madrid |
Derner, Erik | Czech Technical University in Prague |
Gomez, Clara | University Carlos III of Madrid |
Barber, Ramon | Universidad Carlos III of Madrid |
Babuska, Robert | Delft University of Technology |
Keywords: Service Robots, Semantic Scene Understanding
Abstract: The ability to search for objects is a precondition for various robotic tasks. In this paper, we address the problem of finding objects in partially known indoor environments. Using knowledge of the floor plan and the mapped objects, we consider object–object and object–room co-occurrences as prior information for identifying promising locations where an unmapped object may be present. We propose an efficient search strategy that determines the best pose of the robot based on an analysis of the candidate locations. Through a cost function, we jointly optimize the probability of finding the target object and the distance travelled. To evaluate our method, several experiments in simulated and real-world environments were performed. The results show that the robot successfully finds the target object in the environment while covering only a small portion of the search space. The real-world experiments with the TurtleBot 2 mobile robot validate the proposed approach and demonstrate that the method also performs well in real environments.
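A minimal form of the trade-off the abstract describes is a scalar utility that rewards the probability of finding the object and penalizes travel distance; the linear combination and the weight below are our own illustrative assumptions, not the authors' cost function:

def viewpoint_utility(p_find, travel_dist, lam=0.2):
    # Higher detection probability is good; longer travel is bad.
    return p_find - lam * travel_dist

candidates = [
    {"pose": "kitchen counter", "p": 0.6, "d": 4.0},
    {"pose": "dining table",    "p": 0.4, "d": 1.0},
    {"pose": "office desk",     "p": 0.7, "d": 9.0},
]
best = max(candidates, key=lambda c: viewpoint_utility(c["p"], c["d"]))
print("next viewpoint:", best["pose"])  # the nearby, moderately likely pose wins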
|
|
12:45-13:00, Paper TuBT23.5 | |
>Dense Incremental Metric-Semantic Mapping Via Sparse Gaussian Process Regression |
> Video Attachment
|
|
Zobeidi, Ehsan | University of California San Diego |
Koppel, Alec | University of Pennsylvania |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Mapping, Semantic Scene Understanding, RGB-D Perception
Abstract: We develop an online probabilistic metric-semantic mapping approach for autonomous robots relying on streaming RGB-D observations. We cast this problem as a Bayesian inference task, requiring encoding both the geometric surfaces and semantic labels (e.g., chair, table, wall) of the unknown environment. We propose an online Gaussian Process (GP) training and inference approach, which avoids the complexity of GP classification by regressing a truncated signed distance function representation of the regions occupied by different semantic classes. Online regression is enabled through sparse GP approximation, compressing the training data to a finite set of inducing points, and through spatial domain partitioning into an Octree data structure with overlapping leaves. Our experiments demonstrate the effectiveness of this technique for large-scale probabilistic metric-semantic mapping of 3D environments. A distinguishing feature of our approach is that the generated maps contain full continuous distributional information about the geometric surfaces and semantic labels, making them appropriate for uncertainty-aware planning.
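To give a feel for the inducing-point idea, the 1D sketch below uses a subset-of-regressors approximation: N noisy samples of a function (standing in for signed-distance values) are compressed into M << N inducing points, so prediction cost depends on M rather than N. This is our own minimal illustration, not the paper's octree-partitioned implementation:

import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, 200)                         # training inputs (N=200)
y = np.sin(6 * X) + 0.1 * rng.normal(size=X.size)  # noisy function samples
Z = np.linspace(0, 1, 15)                          # inducing inputs (M=15)
sn2 = 0.1 ** 2                                     # noise variance

Kzz = rbf(Z, Z) + 1e-8 * np.eye(Z.size)
Kzx = rbf(Z, X)
# Subset-of-regressors predictive mean:
#   m(x*) = K(x*, Z) (Kzx Kxz + sn2 Kzz)^{-1} Kzx y
w = np.linalg.solve(Kzx @ Kzx.T + sn2 * Kzz, Kzx @ y)
Xs = np.linspace(0, 1, 5)
print(np.round(rbf(Xs, Z) @ w, 2))                 # predicted mean at Xs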
|
|
13:00-13:15, Paper TuBT23.6 | |
>Collaborative Semantic Perception and Relative Localization Based on Map Matching |
|
Yue, Yufeng | Nanyang Technological University |
Zhao, Chunyang | Nanyang Technological University |
Wen, Mingxing | Nanyang Technological University |
Wu, Zhenyu | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Mapping, Semantic Scene Understanding, Cooperating Robots
Abstract: In order for a team of robots to operate successfully, retrieving an accurate relative transformation between robots is a fundamental requirement. So far, most research on relative localization has focused mainly on geometric features such as points, lines, and planes. To go beyond purely geometric matching, collaborative semantic map matching is proposed to perform semantic perception and relative localization. This paper performs semantic perception, probabilistic data association, and nonlinear optimization within an integrated framework. Since the voxel correspondence between partial maps is a hidden variable, a probabilistic semantic data association algorithm is proposed based on Expectation-Maximization. Instead of specifying hard geometric data associations, semantic and geometric associations are jointly updated and estimated. Experimental verification on the Semantic KITTI benchmark demonstrates improved robustness and accuracy.
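To illustrate the E-step of such an approach, the sketch below computes soft correspondence weights between the voxels of two partial maps, combining geometric distance with agreement of semantic labels rather than a hard nearest-neighbor assignment; the Gaussian kernel and the semantic bonus are our own assumptions, not the authors' model:

import numpy as np

def soft_association(src_xyz, src_lbl, dst_xyz, dst_lbl,
                     sigma=0.5, semantic_bonus=4.0):
    # Log-weights: geometric proximity plus a bonus for matching labels.
    d2 = ((src_xyz[:, None, :] - dst_xyz[None, :, :]) ** 2).sum(-1)
    logw = -0.5 * d2 / sigma ** 2
    logw = logw + semantic_bonus * (src_lbl[:, None] == dst_lbl[None, :])
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)        # rows sum to 1

rng = np.random.default_rng(4)
src = rng.normal(size=(5, 3))
dst = src + 0.1 * rng.normal(size=(5, 3))          # slightly perturbed copy
lbl = np.array([0, 1, 1, 2, 0])
print(np.round(soft_association(src, lbl, dst, lbl), 2))

In a full EM loop, an M-step would then re-estimate the relative transformation between the partial maps from these soft weights.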
|
|
TuCT1 |
Room T1 |
Performance Evaluation and Benchmarking |
Regular session |
Chair: Ye, Cang | Virginia Commonwealth University |
Co-Chair: Paull, Liam | Université De Montréal |
|
14:00-14:15, Paper TuCT1.1 | |
>3D Odor Source Localization Using a Micro Aerial Vehicle: System Design and Performance Evaluation |
> Video Attachment
|
|
Ercolani, Chiara | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Performance Evaluation and Benchmarking, Aerial Systems: Applications, Environment Monitoring and Management
Abstract: Finding chemical compounds in the air has applications in situations such as gas leaks, environmental emergencies, and toxic chemical dispersion. Enabling robots to undertake this task would provide a powerful tool to prevent dangerous situations and assist humans when emergencies arise. While the dispersion of chemical compounds in the air is intrinsically a three-dimensional (3D) phenomenon, the scientific community has so far tackled primarily two-dimensional (2D) scenarios. This is mainly due to the challenges of developing a platform able to successfully sample chemical compounds throughout a 3D space. In this paper, a 3D bio-inspired algorithm for odor source localization, previously validated in a controlled physical environment leveraging a robotic manipulator, is adapted for deployment on a micro aerial vehicle equipped with an odor sensor. Given the effect that the propellers have on the gas distribution, the algorithmic adaptation focuses on enhancing the sensing strategy of the platform. Additionally, two sensor placement configurations are assessed to determine which one yields the best sensing results. A performance evaluation in different environmental scenarios is carried out to test the robustness of the implementation. Two different localization systems are used in the performance evaluation experiments to quantify the impact of localization accuracy on the algorithm's outcome.
|
|
14:15-14:30, Paper TuCT1.2 | |
>BARK: Open Behavior Benchmarking in Multi-Agent Environments |
|
Bernhard, Julian | Fortiss GmbH |
Esterle, Klemens | Fortiss GmbH |
Hart, Patrick | Fortiss GmbH |
Kessler, Tobias | Fortiss GmbH |
Keywords: Performance Evaluation and Benchmarking, Agent-Based Systems, Planning, Scheduling and Coordination
Abstract: Predicting and planning interactive behaviors in complex traffic situations presents a challenging task. Especially in scenarios with multiple traffic participants that interact densely, autonomous vehicles still struggle to interpret situations and to eventually achieve their own driving goals. As driving tests are costly and challenging scenarios are hard to find and reproduce, simulation is widely used to develop, test, and benchmark behavior models. However, most simulations rely on datasets and simplistic behavior models for traffic participants and do not cover the full complexity. In this work, we introduce our open-source behavior benchmarking environment BARK, which is designed to mitigate the above-stated shortcomings. In BARK, behavior models are (re-)used for planning, prediction, and simulation. Currently, a wide range of models is available, such as an interaction-aware Monte-Carlo Tree Search and a Reinforcement Learning-based behavior model. We use a public dataset and sampling-based scenario generation to show the interchangeability of the behavior models. We evaluate how well the used models cope with interactions and how robust they are towards exchanging behavior models.
|
|
14:30-14:45, Paper TuCT1.3 | |
>The VCU-RVI Benchmark: Evaluating Visual Inertial Odometry for Indoor Navigation Applications with an RGB-D Camera |
> Video Attachment
|
|
Zhang, He | Virginia Commonwealth University |
Jin, Lingqiu | Virginia Commonwealth University |
Ye, Cang | Virginia Commonwealth University |
Keywords: Performance Evaluation and Benchmarking, Visual-Based Navigation, SLAM
Abstract: This paper presents VCU-RVI, a new visual inertial odometry (VIO) benchmark with a set of diverse data sequences in different indoor scenarios. The benchmark was captured using a Structure Core (SC) sensor, consisting of an RGB-D camera and an IMU. It provides aligned color and depth images at 640x480 resolution and 30 Hz. The camera's data is synchronized with the IMU's data at 100 Hz. Thirty-nine data sequences covering a total trajectory of ~3.7 kilometers were recorded in various indoor environments using two experimental setups: hand-holding the SC sensor or installing it on a wheeled robot. Of the data sequences from the handheld SC, some were recorded in our laboratory under three challenging conditions: fast sensor motion, drastic illumination changes, and dynamic objects; the rest were collected in various indoor spaces outside the laboratory in the East Engineering Building, including corridors, halls, and stairways, during long-distance navigation scenarios. Of the data sequences captured using the wheeled robot, half were recorded with sufficient IMU excitation at the beginning of the sequence, to meet the need of testing VIO methods that require sufficient motion for initialization. We placed three bumpers on the floor of the lab to create an uneven terrain that makes the robot motion 6-DOF. The sequences also include data collected from navigational courses with a long trajectory. For trajectory evaluation, a motion capture system is used to generate accurate pose data (at a rate of 120 Hz), which is used as the ground truth. We conducted experiments to evaluate state-of-the-art VIO algorithms using our benchmark. These algorithms, together with the evaluation tools and the VCU-RVI dataset, are made publicly available.
|
|
14:45-15:00, Paper TuCT1.4 | |
>A Framework for Human-Robot Interaction User Studies |
> Video Attachment
|
|
Rajendran, Vidyasagar | University of Waterloo |
Carreno, Pamela | Monash University |
Fisher, Wesley | University of Waterloo |
Werner, Alexander | University of Waterloo |
Kulic, Dana | Monash University |
Keywords: Performance Evaluation and Benchmarking, Human-Centered Robotics, Physical Human-Robot Interaction
Abstract: Human-Robot Interaction (HRI) user studies are challenging to evaluate and compare due to a lack of standardization and the infrastructure required to implement each study. The lack of experimental infrastructure also makes it difficult to systematically evaluate the impact of individual components (e.g., the quality of perception software) on overall system performance. This work proposes a framework to ease the implementation and reproducibility of human-robot interaction user studies. The framework utilizes ROS middleware and is implemented with four modules: perception, decision, action, and metrics. The perception module aggregates sensor data to be used by the decision and action modules. The decision module is the task-level executive and can be designed by the HRI researcher for their specific task. The action module takes subtask requests from the decision module and breaks them down into motion primitives for execution on the robot. The metrics module tracks and generates quantitative metrics for the study. The framework is implemented with modular interfaces to allow for alternate implementations within each module and can be generalized for a variety of tasks and human/robot roles. The framework is illustrated through an example scenario involving a human and a Franka Emika Panda arm collaboratively assembling a toolbox.
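As a schematic of the four-module decomposition (plain Python with no ROS dependency; the class and method names are our own illustrative assumptions, not the framework's actual interfaces):

class Perception:
    def aggregate(self, sensor_data):
        # Fuse raw sensor streams into a world state for the other modules.
        return {"human_pose": sensor_data.get("camera"),
                "robot_state": sensor_data.get("joints")}

class Decision:
    def next_subtask(self, world):
        # Task-level executive; designed by the HRI researcher per study.
        return "hand_over_part" if world["human_pose"] else "wait"

class Action:
    def execute(self, subtask):
        # Break the subtask into motion primitives for the robot.
        print("executing motion primitives for:", subtask)

class Metrics:
    def __init__(self):
        self.log = []
    def record(self, subtask):
        self.log.append(subtask)   # quantitative study metrics

perception, decision, action, metrics = Perception(), Decision(), Action(), Metrics()
world = perception.aggregate({"camera": (0.4, 0.1), "joints": [0.0] * 7})
subtask = decision.next_subtask(world)
action.execute(subtask)
metrics.record(subtask)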
|
|
15:00-15:15, Paper TuCT1.5 | |
>Autonomous Vehicle Benchmarking Using Unbiased Metrics |
> Video Attachment
|
|
Paz, David | University of California, San Diego |
Lai, Po-Jung | University of California San Diego |
Chan, Nathan | UCSD |
Jiang, Yuqing | UC San Diego |
Christensen, Henrik Iskov | UC San Diego |
Keywords: Performance Evaluation and Benchmarking, Autonomous Vehicle Navigation, Robot Safety
Abstract: With the recent development of autonomous vehicle technology, there have been active efforts to deploy this technology at different scales, including urban and highway driving. While many of the prototypes showcased have been shown to operate under specific cases, little effort has been made to better understand their shortcomings and generalizability to new areas. Distance, uptime, and the number of manual disengagements performed during autonomous driving provide a high-level idea of the performance of an autonomous system, but without proper data normalization, testing location information, and the number of vehicles involved in testing, the disengagement reports alone do not fully capture system performance and robustness. Thus, in this study a complete set of metrics is applied for benchmarking autonomous vehicle systems in a variety of scenarios, and these metrics can be extended for comparison with human drivers and other autonomous vehicle systems. The metrics have been used to benchmark UC San Diego's autonomous vehicle platforms during early deployments for micro-transit and autonomous mail delivery applications.
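The need for normalization is easy to demonstrate with a toy comparison; the generic per-distance rate below is our own illustration, not the paper's full metric set:

fleets = [
    {"name": "A", "disengagements": 10, "km": 5000.0},
    {"name": "B", "disengagements": 30, "km": 60000.0},
]
for f in fleets:
    rate = 1000.0 * f["disengagements"] / f["km"]
    print(f["name"], "disengagements per 1000 km:", round(rate, 2))
# Raw counts favor fleet A (10 vs. 30), but the normalized rate shows that
# fleet B (0.5 vs. 2.0 per 1000 km) is actually performing better.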
|
|
15:15-15:30, Paper TuCT1.6 | |
>Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents |
|
Tani, Jacopo | Swiss Federal Institute of Technology in Zurich (ETH Zurich) |
Daniele, Andrea F | Toyota Technological Institute at Chicago |
Camus, Amaury | ETHZ |
Petrov, Aleksandar | ETH Zurich |
Courchesne, Anthony | Mila, Université De Montréal |
Mehta, Bhairav | Mila |
Suri, Rohit | ETH Zurich |
Bernasconi, Gianmarco | ETHZ |
Walter, Matthew | Toyota Technological Institute at Chicago |
Frazzoli, Emilio | ETH |
Paull, Liam | Université De Montréal |
Censi, Andrea | ETH Zürich & NuTonomy |
Keywords: Performance Evaluation and Benchmarking
Abstract: As robotics matures and increases in complexity, it is more necessary than ever that robot autonomy research be "reproducible". Compared to other sciences, there are specific challenges to benchmarking autonomy, such as the complexity of the software stacks, the variability of the hardware and the reliance on data-driven techniques, amongst others. In this paper, we describe a new concept for reproducible robotics research that integrates development and benchmarking, so that reproducibility is obtained "by design" from the beginning of the research/development processes. We first provide the overall conceptual objectives to achieve this goal and then a concrete instance that we have built: the DUCKIENet. One of the central components of this setup is the Duckietown Autolab, a remotely accessible standardized setup that is itself also relatively low-cost and reproducible. When evaluating agents, careful definition of interfaces allows users to choose among local versus remote evaluation using simulation, logs, or remote automated hardware setups. We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
|
|
TuCT2 |
Room T2 |
Robot Safety |
Regular session |
Chair: Sadigh, Dorsa | Stanford University |
Co-Chair: Spenko, Matthew | Illinois Institute of Technology |
|
14:00-14:15, Paper TuCT2.1 | |
>Provably Safe Trajectory Optimization in the Presence of Uncertain Convex Obstacles |
|
Dawson, Charles | MIT |
M. Jasour, Ashkan | MIT |
Hofmann, Andreas | MIT |
Williams, Brian | MIT |
Keywords: Motion and Path Planning, Robot Safety, Optimization and Optimal Control
Abstract: Real-world environments are inherently uncertain, and to operate safely in these environments, robots must be able to plan around this uncertainty. In the context of motion planning, we desire systems that can maintain an acceptable level of safety as the robot moves, even when the exact locations of nearby obstacles are not known. In this paper, we solve this chance-constrained motion planning problem using a sequential convex optimization framework. To constrain the risk of collision incurred by planned movements, we employ geometric objects called epsilon-shadows to compute upper bounds on the risk of collision between the robot and uncertain obstacles. We use these epsilon-shadow-based estimates as constraints in a nonlinear trajectory optimization problem, which we then solve by iteratively linearizing the non-convex risk constraints. This sequential optimization approach quickly finds trajectories that accomplish the desired motion while maintaining a user-specified limit on collision risk. Our method can be applied to robots and environments with arbitrary convex geometry; even in complex environments, it runs in less than a second and provides provable guarantees on the safety of planned trajectories, enabling fast, reactive, and safe robot motion in realistic environments.
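The sequential linearization loop can be illustrated on a toy problem: keep a nonconvex risk constraint satisfied by re-linearizing it around the current iterate and projecting, rather than solving the full nonlinear program at once. The single-waypoint example below is our own sketch; the paper handles whole trajectories with epsilon-shadow risk bounds:

import numpy as np

goal = np.array([2.0, 0.0])
obstacle, r_max = np.array([1.0, 0.3]), 4.0   # risk limit (toy units)

def risk(x):
    # Toy risk model: inverse squared clearance from the obstacle.
    return 1.0 / (np.linalg.norm(x - obstacle) ** 2 + 1e-9)

x = np.array([0.0, 0.0])
for _ in range(20):
    x = x + 0.2 * (goal - x)                  # step toward the objective
    g = -2.0 * (x - obstacle) * risk(x) ** 2  # gradient of the risk at x
    if risk(x) > r_max:
        # Project onto the linearized constraint risk(x) + g.(x'-x) <= r_max.
        x = x - g * (risk(x) - r_max) / (g @ g)
print(np.round(x, 2), round(risk(x), 3))      # near the goal, below the limit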
|
|
14:15-14:30, Paper TuCT2.2 | |
>Safety Considerations in Deep Control Policies with Safety Barrier Certificates under Uncertainty |
> Video Attachment
|
|
Hirshberg, Tom | Technion |
Vemprala, Sai | Texas A&M University |
Kapoor, Ashish | MicroSoft |
Keywords: Robot Safety, Collision Avoidance, Perception-Action Coupling
Abstract: Recent advances in Deep Machine Learning have shown promise in solving complex perception and control loops via methods such as reinforcement and imitation learning. However, guaranteeing safety for such learned deep policies has been a challenge due to issues such as partial observability and difficulty in characterizing the behavior of the neural networks. While much of the emphasis in safe learning has been placed on training, it is non-trivial to guarantee safety at deployment or test time. This paper shows how, under mild assumptions, Safety Barrier Certificates can be used to guarantee safety with deep control policies despite uncertainty arising from perception and other latent variables. Specifically, for scenarios where the dynamics are smooth and the uncertainty has finite support, the proposed framework wraps around an existing deep control policy and generates safe actions by dynamically evaluating and modifying the actions of the embedded network. Our framework utilizes control barrier functions to create spaces of control actions that are safe under uncertainty, and when the original actions are found to violate the safety constraint, it uses quadratic programming to minimally modify them so that they lie in the safe set. Representations of the environment are built through Euclidean signed distance fields, which are then used to infer the safety of actions and to guarantee forward invariance. We implement this method in simulation in a drone-racing environment and show that it results in safer actions compared to a baseline that relies only on imitation learning to generate control actions.
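To make the "minimally modify the action" step concrete: for a single affine control-barrier-function constraint a.u >= b, the QP min ||u - u_nom||^2 subject to a.u >= b has the closed-form projection below. This is a generic single-constraint sketch under our own assumptions, not the paper's implementation:

import numpy as np

def safety_filter(u_nom, a, b):
    # Keep the policy's action when it already satisfies a.u >= b;
    # otherwise project it onto the constraint boundary (minimal change).
    if a @ u_nom >= b:
        return u_nom
    return u_nom + (b - a @ u_nom) / (a @ a) * a

# Toy single-integrator case: barrier h(x) = d(x) - d_min, with d the signed
# distance to the nearest obstacle (e.g., from a signed distance field).
# The safety condition h_dot >= -alpha*h yields a = grad_d(x), b = -alpha*h(x).
grad_d = np.array([1.0, 0.0])   # obstacle lies to the robot's left
h, alpha = 0.2, 1.0             # 0.2 m of margin remaining
u_policy = np.array([-1.0, 0.5])
print(safety_filter(u_policy, grad_d, -alpha * h))  # leftward motion clipped

With multiple constraints the projection no longer has a closed form, and a generic QP solver would be used instead.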
|
|
14:30-14:45, Paper TuCT2.3 | |
>Infusing Reachability-Based Safety into Planning and Control for Multi-Agent Interactions |
|
Wang, Xinrui | Stanford University |
Leung, Karen | Stanford University |
Pavone, Marco | Stanford University |
Keywords: Robot Safety, Collision Avoidance, Path Planning for Multiple Mobile Robots or Agents
Abstract: Within a robot autonomy stack, the planner and controller are typically designed separately, and serve different purposes. As such, there is often a diffusion of responsibilities when it comes to ensuring safety for the robot. We propose that a planner and controller should share the same interpretation of safety but apply this knowledge in a different yet complementary way. To achieve this, we use Hamilton-Jacobi (HJ) reachability theory at the planning level to provide the robot planner with the foresight to avoid entering regions with possible inevitable collision. However, this alone does not guarantee safety. In conjunction with this HJ reachability-infused planner, we propose a minimally-interventional multi-agent safety-preserving controller also derived via HJ-reachability theory. The safety controller maintains safety for the robot without unduly impacting planner performance. We demonstrate the benefits of our proposed approach in a multi-agent highway scenario where a robot car is rewarded to navigate through traffic as fast as possible, and we show that our approach provides strong safety assurances yet achieves the highest performance compared to other safety controllers.
|
|
14:45-15:00, Paper TuCT2.4 | |
>Multi-Agent Safe Planning with Gaussian Processes |
|
Zhu, Zheqing | Stanford University |
Bıyık, Erdem | Stanford University |
Sadigh, Dorsa | Stanford University |
Keywords: Robot Safety, Multi-Robot Systems
Abstract: Multi-agent safe systems have become an increasingly important area of study as we can now easily have multiple AI-powered systems operating together. In such settings, we need to ensure the safety of not only each individual agent, but also the overall system. In this paper, we introduce a novel multi-agent safe learning algorithm that enables decentralized safe navigation when there are multiple different agents in the environment. This algorithm makes mild assumptions about other agents and is trained in a decentralized fashion, i.e., with very little prior knowledge about other agents' policies. Experiments show that our algorithm performs well alongside robots running other algorithms when optimizing various objectives.
|
|
15:00-15:15, Paper TuCT2.5 | |
>Safe Path Planning with Multi-Model Risk Level Sets |
> Video Attachment
|
|
Huang, Zefan | Singapore-MIT Alliance for Research and Technology |
Schwarting, Wilko | Massachusetts Institute of Technology (MIT) |
Pierson, Alyssa | Massachusetts Institute of Technology |
Hongliang, Guo | Singapore MIT Alliance of Research and Technology |
Ang Jr, Marcelo H | National University of Singapore |
Rus, Daniela | MIT |
Keywords: Robot Safety, Motion and Path Planning
Abstract: This paper investigates the safe path planning problem with a large number of moving objects in cluttered environments. Some of the objects can be detected and tracked very well with canonical perception algorithms, while others can only be roughly detected from differences between LiDAR scan snapshots. For objects with good detection and tracking algorithms, we use a Gaussian Process (GP) regulated risk map to describe the risk information; for objects with poorer detection and/or tracking results, we construct an overall occupancy and velocity field from LiDAR scan snapshots and use the results for risk level set (RLS) calculation. Several methods are proposed for combining the GP risk map and the RLS, and the resulting hybrid risk map is used by the proposed safe path planning algorithm. Experimental results show that the hybrid risk map enables a safe path planner to navigate the autonomous testbed within cluttered environments.
|
|
15:15-15:30, Paper TuCT2.6 | |
>Localization Safety Validation for Autonomous Robots |
|
Duenas Arana, Guillermo | Illinois Institute of Technology |
Abdul Hafez, Osama | Illinois Institute of Technology |
Joerger, Mathieu | Virginia Tech |
Spenko, Matthew | Illinois Institute of Technology |
Keywords: Localization, Autonomous Vehicle Navigation, Robot Safety
Abstract: This paper presents a method to validate localization safety for a preplanned trajectory in a given environment. Localization safety is defined as integrity risk and quantified as the probability of an undetected localization failure. Integrity risk differs from previously used metrics in robotics in that it accounts for unmodeled faults and evaluates safety under the worst possible combination of faults. The methodology can be applied prior to mission execution and thus can be employed to evaluate the safety of potential trajectories. The work has been formulated for localization via smoothing, which differs from previously reported integrity monitoring methods that rely on Kalman filtering. Simulation and experimental results are analyzed to show that localization safety is effectively quantified.
|
|
TuCT3 |
Room T3 |
Trust and Explainability |
Regular session |
Chair: Bryant, De'Aira | Georgia Institute of Technology |
Co-Chair: Soh, Harold | National Universtiy of Singapore |
|
14:00-14:15, Paper TuCT3.1 | |
>Human-Robot Trust Assessment Using Motion Tracking & Galvanic Skin Response |
> Video Attachment
|
|
Hald, Kasper | Aalborg University |
Rehm, Matthias | Aalborg University |
Moeslund, Thomas B. | Aalborg University |
Keywords: Cooperating Robots, Visual Tracking, Human-Centered Robotics
Abstract: In this study, we set out to design a computer vision-based system to assess human-robot trust in real time during close-proximity human-robot collaboration. This paper presents the setup and hardware for an augmented reality-enabled human-robot collaboration cell, as well as a method of measuring operator proximity using an infrared camera. We tested this setup as a tool for assessing trust through physical apprehension signals in a collaborative drawing task, where participants hold a piece of paper on a table while the robot draws between their hands. Midway through the test, we attempt to induce a decrease in trust with an unexpected change in robot speed, and we evaluate subject motions along with self-reported trust and emotional arousal through galvanic skin response. After performing the experiment with forty participants, we found that reported trust was significantly affected when the robot movement speed was increased. The galvanic skin response measurements were not significantly different between the test conditions. The motion tracking method used in this study did not suggest that subjects' motions were significantly affected by the decrease in trust.
|
|
14:15-14:30, Paper TuCT3.2 | |
>Organizing the Internet of Robotic Things: The Effect of Organization Structure on Users’ Evaluation and Compliance Toward IoRT Service Platform |
|
Moon, Byeong June | Seoul National University |
Kwak, Sonya Sona | Korea Institute of Science and Technology (KIST) |
Choi, Jongsuk | Korea Inst. of Sci. and Tech |
Keywords: Social Human-Robot Interaction, Service Robots, Domestic Robots
Abstract: As robots and robotic things come to have more agency, the IoRT, which consists of robots and robotic things, can be considered a social organization. Accordingly, the social organization structure of the IoRT could affect users' behavior and perception of it. In this study, in order to examine the effect of social organization structure on people's acceptance of the IoRT, we conducted a 2 (social organization structure: flat vs. hierarchical) within-participants experiment (N=30). In the experiment, participants were asked to take part in a cooking task with the aid of a robot, a robotic measuring cup, and a robotic mixer. We administered a post-experimental survey and measured the duration of participants' compliance with the instructions given by the platform. People gave higher trustworthiness and purchase intention scores to the platform with a flat organization structure than to the one with a hierarchical structure. On the contrary, participants were more compliant with the hierarchical IoRT service platform than with the flat one. Implications for the theory and design of the IoRT are discussed.
|
|
14:30-14:45, Paper TuCT3.3 | |
>Getting to Know One Another: Calibrating Intent, Capabilities, and Trust for Human-Robot Collaboration |
|
Lee, Joshua Kai Sheng | National University of Singapore |
Fong, Jeffrey | National University of Singapore |
Kok, Bing Cai | National University of Singapore |
Soh, Harold | National Universtiy of Singapore |
Keywords: Cognitive Human-Robot Interaction, Human Factors and Human-in-the-Loop, Social Human-Robot Interaction
Abstract: Common experience suggests that agents who know each other well are better able to work together. In this work, we address the problem of calibrating intention and capabilities in human-robot collaboration. In particular, we focus on scenarios where the robot is attempting to assist a human who is unable to directly communicate her intent. Moreover, both agents may have differing capabilities that are unknown to one another. We adopt a decision-theoretic approach and propose the TICC-POMDP for modeling this setting, with an associated online solver. Experiments show our approach leads to better team performance both in simulation and in a real-world study with human subjects.
|
|
14:45-15:00, Paper TuCT3.4 | |
>Online Explanation Generation for Planning Tasks in Human-Robot Teaming |
|
Zakershahrak, Mehrdad | Arizona State University |
Gong, Ze | Arizona State University |
Sadassivam, Nikhillesh | Arizona State University |
Zhang, Yu (Tony) | Arizona State University |
Keywords: Cognitive Human-Robot Interaction, Task Planning, Human Factors and Human-in-the-Loop
Abstract: As AI becomes an integral part of our lives, the development of explainable AI, embodied in the decision-making process of an AI or robotic agent, becomes imperative. For a robotic teammate, the ability to generate explanations to justify its behavior is one of the key requirements of explainable agency. Prior work on explanation generation has focused on supporting the rationale behind the robot's decision or behavior. These approaches, however, fail to consider the mental demand of understanding the received explanation. In other words, the human teammate is expected to understand an explanation no matter how much information is presented. In this work, we argue that explanations, especially those of a complex nature, should be made in an online fashion during execution, which helps spread out the information to be explained and thus reduces the mental workload of humans in highly cognitively demanding tasks. A challenge here, however, is that the different parts of an explanation may be dependent on each other, which must be taken into account when generating online explanations. To this end, a general formulation of online explanation generation is presented, with three variations satisfying different "online" properties. The new explanation generation methods are based on a model reconciliation setting introduced in our prior work. We evaluated our methods both with human subjects in a simulated rover domain, using the NASA Task Load Index (TLX), and synthetically with ten different problems across two standard IPC domains. Results strongly suggest that our methods generate explanations that are perceived as less cognitively demanding, are much preferred over the baselines, and are computationally efficient.
|
|
TuCT4 |
Room T4 |
Actuator & Joint Mechanisms I |
Regular session |
Chair: Taylor, Rebecca | Carnegie Mellon University |
Co-Chair: Gaponov, Igor | Innopolis University |
|
14:00-14:15, Paper TuCT4.1 | |
>IMU-Based Parameter Identification and Position Estimation in Twisted String Actuators |
|
Nedelchev, Simeon | Innopolis University |
Kirsanov, Daniil | Innopolis University |
Gaponov, Igor | Innopolis University |
Keywords: Calibration and Identification, Actuation and Joint Mechanisms, Kinematics
Abstract: This study proposes a technique to estimate the output state of twisted string actuators (TSAs) based on the payload's acceleration measurements. We outline the differential kinematics relationships of the actuator, re-formulate them into a nonlinear parameter identification problem, and then apply linearization techniques to solve it efficiently as a quadratic program. Using accurate estimates of the string parameters obtained with the proposed method, we can predict the TSA position with sub-millimeter accuracy via conventional kinematic relationships. In addition, the proposed method supports accurate estimation under varying operating conditions, unpredictable perturbations, and poorly-excited trajectories. This technique can be employed to improve the accuracy of trajectory tracking when the use of direct position measurements is challenging, with potential applications including flexible and soft robots, long-span cable robots, multi-DOF joints, and others.
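As a flavor of how linearization can turn such identification into a tractable least-squares problem, consider the standard TSA kinematic model x = L - sqrt(L^2 - (r*theta)^2) (contraction x, motor angle theta, string length L, effective radius r). Squaring and rearranging gives 2*L*x - r^2*theta^2 = x^2, which is linear in (L, r^2). The synthetic-data sketch below is our own illustration; the paper estimates from payload accelerations and solves a quadratic program:

import numpy as np

rng = np.random.default_rng(5)
L_true, r_true = 0.25, 0.0006                # meters
theta = np.linspace(0, 300, 50)              # motor angle samples (rad)
x = L_true - np.sqrt(L_true**2 - (r_true * theta)**2)
x = x + 1e-6 * rng.normal(size=x.size)       # measurement noise

# (L - x)^2 = L^2 - r^2 * theta^2   =>   2*L*x - r^2*theta^2 = x^2
A = np.column_stack([2 * x, -theta**2])
L_hat, r2_hat = np.linalg.lstsq(A, x**2, rcond=None)[0]
print(L_hat, np.sqrt(r2_hat))                # recovers L and r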
|
|
14:15-14:30, Paper TuCT4.2 | |
>Reliable Chattering-Free Simulation of Friction Torque in Joints Presenting High Stiction |
|
Cisneros Limon, Rafael | National Institute of Advanced Industrial Science and Technology |
Benallegue, Mehdi | AIST Japan |
Kikuuwe, Ryo | Hiroshima University |
Morisawa, Mitsuharu | National Inst. of AIST |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Simulation and Animation, Contact Modeling, Actuation and Joint Mechanisms
Abstract: The simulation of static friction, and especially the effect of stiction, is cumbersome to perform in discrete time due to the discontinuity at zero velocity and the switching behavior. However, reliable simulation of friction is essential for developing compliant torque control algorithms, as they are strongly disturbed by this phenomenon. This paper takes as its basis an elastoplastic friction model, which is free from chattering and drift. It proposes two closed-form solutions that can be used to reliably simulate the effect of stiction consistently with the physics-based Stribeck model. These solutions account for the nonlinearity and velocity dependency that are the main characteristics of lubricated joints. One is directly inspired by the Stribeck nonlinear terms, and the other is a simplified rational approximation. The reliability of the simulation method is shown in simulation, where consistency and stability are assessed. We also demonstrate the accuracy of these methods by comparing them to experimental data obtained from a robot joint equipped with a high-gear-reduction harmonic drive.
|
|
14:30-14:45, Paper TuCT4.3 | |
>A Study on the Elongation Behaviour of Synthetic Fibre Ropes under Cyclic Loading |
|
Asane, Deoraj | Waseda University |
Schmitz, Alexander | Waseda University |
Wang, Yushi | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Actuation and Joint Mechanisms, Mechanism Design, Performance Evaluation and Benchmarking
Abstract: Synthetic fibre ropes have high tensile strength, a lower friction coefficient, and more flexibility than steel ropes, and are therefore increasingly used in robotics. However, their characteristics are not well studied. In particular, previous work investigated long-term behaviour only under static loading. In this paper, we investigate the elongation behaviour of synthetic fibre ropes under cyclic loading. In particular, we use ropes made from Dyneema DM20 (UHMWPE) and Zylon HM (PBO), which according to prior work exhibit low creep. While Dyneema is more widely used, Zylon has a higher tensile strength. We show that under cyclic loading the Dyneema DM20 rope elongated by more than 9% and kept extending even after 500 cycles. Zylon exhibited a more stable and lower elongation of less than 3%.
|
|
14:45-15:00, Paper TuCT4.4 | |
>Steering Magnetic Robots in Two Axes with One Pair of Maxwell Coils |
> Video Attachment
|
|
Benjaminson, Emma | Carnegie Mellon University |
Travers, Matthew | Carnegie Mellon University |
Taylor, Rebecca | Carnegie Mellon University |
Keywords: Actuation and Joint Mechanisms
Abstract: This work demonstrates a novel approach to steering a magnetic swimming robot in two dimensions with a single pair of Maxwell coils. By leveraging the curvature of the magnetic field gradient, we achieve motion along two axes. This method allows us to control medical magnetic robots using only existing MRI technology, without requiring additional hardware or posing any additional risk to the patient. We implement a switching time optimization algorithm which generates a schedule of control inputs that direct the swimming robot to a goal location in the workspace. By alternating the direction of the magnetic field gradient produced by the single pair of coils per this schedule, we are able to move the swimmer to desired points in two dimensions. Finally, we demonstrate the feasibility of our approach with an experimental implementation on the millimeter scale and discuss future opportunities to expand this work to the microscale, as well as other control problems and real-world applications.
|
|
TuCT5 |
Room T5 |
Actuator & Joint Mechanisms II |
Regular session |
Chair: Park, Jaeheung | Seoul National University |
Co-Chair: Verstraten, Tom | Vrije Universiteit Brussel |
|
14:00-14:15, Paper TuCT5.1 | |
>Scaling Laws for Parallel Motor-Gearbox Arrangements |
|
Saerens, Elias | Vrije Universiteit Brussel |
Crispel, Stein | Vrije Universiteit Brussel |
Lopez Garcia, Pablo | Vrije Universiteit Brussel |
Ducastel, Vincent | Vrije Universiteit Brussel |
Beckers, Jarl | Vrije Universiteit Brussel |
De Winter, Joris | Vrije Universiteit Brussel |
Furnémont, Raphaël | Vrije Universiteit Brussel |
Vanderborght, Bram | Vrije Universiteit Brussel |
Verstraten, Tom | Vrije Universiteit Brussel |
Lefeber, Dirk | Vrije Universiteit Brussel |
Keywords: Mechanism Design, Actuation and Joint Mechanisms
Abstract: Research on (compliant) actuators, especially redundant ones like the Series Parallel Elastic Actuator (SPEA), has led to the development of drive trains which have been demonstrated to increase efficiency, torque-to-mass ratio, power-to-mass ratio, etc. In the field of robotics, such drive trains enable technological improvements like safe, adaptable, and energy-efficient robots. The choice of motor and transmission system, as well as of the compliant elements composing the drive train, is highly dependent on the application, and more specifically on the allowable weight and size. In order to optimally design an actuator adapted to the desired characteristics and the available space, scaling laws governing the specific actuator can simplify and enhance the reliability of the design process. Although scaling laws for electric motors and links are known, none have been investigated for a complete redundant drive train. The present study proposes to fill this gap by providing scaling laws for electric motors in combination with their transmission system. These laws are extended towards parallelization, i.e., replacing one big motor with gearbox by several smaller ones in parallel. The results of this study show that the torque/mass ratio of a motor-gearbox cannot be increased by parallelization, but that the torque/volume ratio can. This is, however, only the case if a good topology is chosen.
|
|
14:15-14:30, Paper TuCT5.2 | |
>A Concept of a Miniaturized MR Clutch Utilizing MR Fluid in Squeeze Mode |
|
Pisetskiy, Sergey | Western University |
Kermani, Mehrdad R. | University of Western Ontario |
Keywords: Mechanism Design, Compliant Assembly, Physical Human-Robot Interaction
Abstract: This paper presents a novel design concept for a miniaturized Magneto-Rheological (MR) clutch. The design uses a set of spur gears as a means to control the torque. MR clutches with various configurations, such as disk-, drum-, and armature-based, have been reported in the literature. However, to the best of our knowledge, the design of a clutch with spur gears that uses MR fluid in squeeze mode is a novel concept that has never been reported previously. After a brief description of MR clutch principles, the details of the mechanical design of the spur gear MR clutch are discussed. The distribution of the magnetic flux inside the MR clutch is studied using finite element analysis in the COMSOL Multiphysics software. Preliminary experimental results using a prototype MR clutch validate the new concept. To clearly show the performance of the proposed design, we compare the experimentally obtained torque capacity of our MR clutch with that of a simulated disk-type MR clutch of similar size.
|
|
14:30-14:45, Paper TuCT5.3 | |
>Development and Evaluation of a Linear Series Clutch Actuator for Vertical Joint Application with Static Balancing |
|
Kulkarni, Shardul | Waseda University |
Schmitz, Alexander | Waseda University |
Funabashi, Satoshi | Waseda University, Sugano Lab |
Sugano, Shigeki | Waseda University |
Keywords: Industrial Robots, Mechanism Design, Robot Safety
Abstract: Future robots are expected to share their workspace with humans. Controlling and limiting the forces that such robots exert on their environment is crucial. While force control can be achieved actively with the help of force sensing, passive mechanisms have no time delay in their response to external forces, and would therefore be preferable. Series clutch actuators can be used to achieve high levels of safety and backdriveability. This work presents the first implementation of a linear series clutch actuator. It can exert forces of more than 110 N while weighing less than 2 kg. Force controllability and safety are demonstrated. Static balancing, which is important for the application in a vertical joint, is also implemented. The power consumption is evaluated: for a payload of 3 kg and at the maximum speed of 94 mm/s, the power consumed by the actuator is 11 W. Overall, a practical implementation of a linear series clutch actuator is reported, which can be used for future collaborative robots.
|
|
14:45-15:00, Paper TuCT5.4 | |
>Elastomeric Continuously Variable Transmission Combined with Twisted String Actuator |
> Video Attachment
|
|
Kim, Seungyeon | Graduate School of Convergence Science and Technology, Seoul Nat |
|