Last updated on August 10, 2022. This conference program is tentative and subject to change.
Technical Program for Tuesday September 13, 2022
TuO1O Regular Session, ArtsTwo Lecture Theatre
Predictive/Adaptive Behaviours
10:00-10:20, Paper TuO1O.1
Developing Hierarchical Anticipations Via Neural Network-Based Event Segmentation
Gumbsch, Christian (University of Tübingen & Max Planck Institute for Intelligent Systems), Adam, Maurits (University of Potsdam), Elsner, Birgit (University of Potsdam), Martius, Georg (Max Planck Institute for Intelligent Systems), Butz, Martin Volker (University of Tuebingen)
Keywords: Architectures for Cognitive Development and Open-Ended Learning, Development of skills in biological systems and robots, Sensorimotor development
Abstract: Humans can make predictions on various time scales and hierarchical levels. In this process, the learning of event encodings seems to play a crucial role. In this work we model the development of hierarchical predictions via autonomously learned latent event codes. We present a hierarchical recurrent neural network architecture whose inductive learning biases foster the development of sparsely changing latent states that compress sensorimotor sequences. A higher-level network learns to predict the situations in which the latent states tend to change. Using a simulated robotic manipulator, we demonstrate that the system (i) learns latent states that accurately reflect the event structure of the data, (ii) develops meaningful temporally abstract predictions on the higher level, and (iii) generates goal-anticipatory behavior similar to gaze behavior found in eye-tracking studies with infants. The architecture offers a step towards the autonomous learning of compressed hierarchical encodings of gathered experiences and the exploitation of these encodings to generate adaptive behavior.
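As a rough illustration of the core mechanism, the sketch below implements a latent state that is only rewritten when a learned gate opens, with a sparsity penalty that keeps the state piecewise constant. This is a minimal stand-in for the idea described in the abstract, not the authors' architecture; all layer sizes and the penalty weight are invented.

```python
import torch
import torch.nn as nn

class SparseEventRNN(nn.Module):
    """Latent state h is rewritten only where the gate g opens."""
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.candidate = nn.Linear(obs_dim + latent_dim, latent_dim)
        self.gate = nn.Linear(obs_dim + latent_dim, latent_dim)
        self.predict = nn.Linear(latent_dim, obs_dim)

    def step(self, obs, h):
        x = torch.cat([obs, h], dim=-1)
        g = torch.sigmoid(self.gate(x))              # how much of the latent to rewrite
        h = (1 - g) * h + g * torch.tanh(self.candidate(x))
        return h, self.predict(h), g

model = SparseEventRNN(obs_dim=8, latent_dim=16)
seq = torch.randn(20, 1, 8)                          # toy sensorimotor sequence
h, loss = torch.zeros(1, 16), 0.0
for t in range(19):
    h, pred, g = model.step(seq[t], h)
    # prediction error plus a penalty on gate activity, so h stays
    # constant within an "event" and changes only at event boundaries
    loss = loss + ((pred - seq[t + 1]) ** 2).mean() + 0.01 * g.mean()
loss.backward()
```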
10:20-10:40, Paper TuO1O.2
Learning Intrinsically Motivated Transition Models for Autonomous Systems
Doctor, Khoshrav (University of Massachusetts), Ghosh, Hia (University of Massachusetts), Grupen, Rod (University of Massachusetts)
Keywords: Intrinsic Motivation, Exploration and Play, Affordance learning and perception, Action selection and planning
Abstract: To support long-term autonomy and rational decision making, robotic systems should be risk aware and actively maintain the fidelity of critical state information. This is particularly difficult in natural environments that are dynamic, noisy, and partially observable. To support autonomy, predictive probabilistic models of robot-object interaction can be used to guide the agent toward rewarding and controllable outcomes with high probability, while avoiding undesired states and keeping the agent aware of the amount of risk associated with acting. In this paper, we propose an intrinsically motivated learning technique to model probabilistic transition functions in a manner that is task-independent and sample efficient. We model them as Aspect Transition Graphs (ATGs): state-dependent control roadmaps built on transition probability functions grounded in the sensory and motor resources of the robot. Experimental data that changes the relative perspective of an actively-controlled RGB-D camera is used to train empirical models of the transition probability functions. Our experiments demonstrate that the transition function of the underlying Partially Observable Markov Decision Process (POMDP) can be acquired efficiently using an intrinsically motivated structure learning approach.
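The empirical transition models at the heart of an ATG can be approximated by simple outcome counts over (aspect, action) pairs, as in the hedged sketch below; the state and action names are invented for illustration.

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))   # (aspect, action) -> outcome counts

def record(aspect, action, next_aspect):
    counts[(aspect, action)][next_aspect] += 1

def p_transition(aspect, action, next_aspect):
    seen = counts[(aspect, action)]
    total = sum(seen.values())
    return seen[next_aspect] / total if total else 0.0

# toy experience from orbiting an RGB-D camera around an object
record("front_view", "orbit_left", "side_view")
record("front_view", "orbit_left", "side_view")
record("front_view", "orbit_left", "front_view")   # occasional noisy outcome
print(p_transition("front_view", "orbit_left", "side_view"))   # 0.67
```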
10:40-11:00, Paper TuO1O.3
RAPid-Learn: A Framework for Learning to Recover for Handling Novelties in Open-World Environments
Goel, Shivam (Tufts University), Shukla, Yash (Tufts University), Sarathy, Vasanth (Tufts University), Scheutz, Matthias (Tufts University), Sinapov, Jivko (Tufts University)
Keywords: Prediction, planning and problem solving, Architectures for Cognitive Development and Open-Ended Learning, Action selection and planning
Abstract: We propose RAPid-Learn (Learning to Recover and Plan Again), a hybrid planning and learning method, to tackle the problem of adapting to sudden and unexpected changes in an agent’s environment (i.e., novelties). RAPid-Learn is designed to formulate and solve modifications to a task’s Markov Decision Process (MDP) on the fly. It exploits domain knowledge to learn action executors, which can then be used to resolve execution impasses and lead to successful plan execution. We demonstrate its efficacy by introducing a wide variety of novelties in a gridworld environment inspired by Minecraft, and compare our algorithm with transfer learning baselines from the literature. Our method is (1) effective even in the presence of multiple novelties, (2) more sample efficient than transfer learning RL baselines, and (3) robust to incomplete model information, as opposed to pure symbolic planning approaches.
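The plan-execute-recover control flow described above can be caricatured in a few lines. The following is a toy stand-in, not the RAPid-Learn implementation; the domain, the operators, and the random-search "learner" are all placeholders for the paper's symbolic planner and learned action executors.

```python
import random

def execute(step, world):
    if step in world["broken"]:            # a novelty invalidated this operator
        return False
    world["log"].append(step)
    return True

def learn_recovery(world, tries=100):
    # stand-in for learning an action executor: search for a working alternative
    for _ in range(tries):
        alt = random.choice(world["primitives"])
        if alt not in world["broken"]:
            world["log"].append(alt)
            return True
    return False

world = {"broken": {"open_door"}, "log": [],
         "primitives": ["push_door", "break_door", "open_door"]}
for step in ["goto_door", "open_door", "goto_goal"]:   # the symbolic plan
    if not execute(step, world):
        learn_recovery(world)              # resolve the impasse, then continue
print(world["log"])                        # e.g. ['goto_door', 'push_door', 'goto_goal']
```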
TuO2O Regular Session, ArtsTwo Lecture Theatre
Cognition
11:30-11:50, Paper TuO2O.1
MIMo: A Multi-Modal Infant Model for Studying Cognitive Development in Humans and AIs
Mattern, Dominik (Goethe Universität Frankfurt), López, Francisco Martín (Frankfurt Institute for Advanced Studies), Ernst, Markus Roland (Frankfurt Institute for Advanced Studies), Aubret, Arthur (University Clermont Auvergne, CNRS), Triesch, Jochen (Frankfurt Institute for Advanced Studies)
Keywords: Sensorimotor development, Embodiment, Multimodal perception
Abstract: A central challenge in the early cognitive development of humans is making sense of the rich multimodal experiences originating from interactions with the physical world. AIs that learn in an autonomous and open-ended fashion based on multimodal sensory input face a similar challenge. To study such development and learning in silico, we have created MIMo, a multimodal infant model. MIMo's body is modeled after an 18-month-old child and features binocular vision, a vestibular system, proprioception, and touch perception through a full body virtual skin. MIMo is an open source research platform based on the MuJoCo physics engine for constructing computational models of human cognitive development as well as studying open-ended autonomous learning in AI. We describe the design and interfaces of MIMo and provide examples illustrating its use.
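A typical interaction loop with such a platform might look like the sketch below. The observation keys, array sizes, and the stand-in environment class are assumptions made for illustration; consult the MIMo repository for the actual interface.

```python
import numpy as np

class ToyMIMoEnv:
    """Stand-in exposing the multimodal observations listed in the abstract."""
    def reset(self):
        return self._obs()

    def step(self, action):
        return self._obs(), 0.0, False, {}   # obs, reward, done, info

    def _obs(self):
        return {
            "vision_left": np.zeros((64, 64, 3)),    # binocular vision
            "vision_right": np.zeros((64, 64, 3)),
            "vestibular": np.zeros(6),               # rotation + acceleration (size assumed)
            "proprioception": np.zeros(40),          # joint states (size assumed)
            "touch": np.zeros(1000),                 # full-body virtual skin (size assumed)
        }

env = ToyMIMoEnv()
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(np.zeros(20))  # action dimension assumed
```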
11:50-12:10, Paper TuO2O.2
The Impact of Action in Visual Representation Learning
Devillers, Alexandre (Univ Lyon, Université Lyon 1, LIRIS, UMR5205), Chaffraix, Valentin (Univ Lyon, INSA Lyon, LIRIS, UMR5205), Armetta, Frederic (LIRIS, CNRS, France), Duffner, Stefan (INSA Lyon, LIRIS, CNRS, France), Lefort, Mathieu (LIRIS, CNRS, France)
Keywords: Grounding of Knowledge and Development of Representations, Sensorimotor development, Statistical Learning
Abstract: Sensori-motor theories, inspired by work in neuroscience, psychology and cognitive science, claim that actions, through the learning and mastering of a predictive model, are a key element in the perception of the environment. On the computational side, in the domains of representation learning and reinforcement learning, models increasingly use self-supervised pretext tasks, such as predictive or contrastive ones, to increase the performance on their main task. These pretext tasks are action-related even if the action itself is usually not used in the model. In this paper, we propose to study the influence of considering action in the learning of visual representations in deep neural network models, an aspect which is often underestimated with respect to sensori-motor theories. More precisely, we quantify two independent factors: (1) whether or not the action is used during the learning of visual characteristics, and (2) whether or not the action is integrated into the representations of the current images. Other aspects are kept as simple and comparable as possible: we do not consider any specific action policy, we combine simple architectures (VAE and LSTM), and we use datasets derived from MNIST. In this context, our results show that explicitly including action in the learning process and in the representations improves the performance of the model, which opens interesting perspectives for improving state-of-the-art models of representation learning.
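The two factors can be made concrete with a toy stand-in for the paper's VAE/LSTM setup: factor 1 conditions the predictive pretext task on the action, and factor 2 concatenates the action into the representation itself. Everything below (sizes, encoders, data) is invented for illustration.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32))   # toy image encoder
pred_with_action = nn.Linear(32 + 4, 32)    # factor 1: action-conditioned prediction
pred_without = nn.Linear(32, 32)            # ablation: action-free prediction

img_t = torch.randn(1, 1, 28, 28)           # MNIST-like frames at t and t+1
img_t1 = torch.randn(1, 1, 28, 28)
action = torch.zeros(1, 4)                  # e.g. a one-hot translation action

z_t, z_t1 = enc(img_t), enc(img_t1)
loss_action = ((pred_with_action(torch.cat([z_t, action], -1)) - z_t1) ** 2).mean()
loss_no_action = ((pred_without(z_t) - z_t1) ** 2).mean()

rep_with_action = torch.cat([z_t, action], -1)   # factor 2: action inside the representation
rep_without_action = z_t
```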
12:10-12:30, Paper TuO2O.3
Leveraging Developmental Psychology to Evaluate Artificial Intelligence
Moore, David (Pitzer College), Oakes, Lisa (University of California, Davis), Romero, Victoria (CACI International Inc), McCrink, Koleen (Barnard College Columbia University)
Keywords: Reasoning with abstract knowledge, Grounding of Knowledge and Development of Representations
Abstract: Artificial intelligence (AI) systems do not exhibit human-like common sense. The principles and practices of experimental psychology – specifically, work on infant cognition – can be used to develop and test AIs, providing insight into the building blocks of common sense. Here, we describe how the evaluation team for DARPA’s Machine Common Sense program is applying conceptual content, experimental design expertise, and analysis techniques used in the field of infant cognitive development to the field of AI evaluation.
TuTO Teasers Session, ArtsTwo Lecture Theatre
Poster Teasers 1
12:30-13:00, Paper TuTO.1
Identifying and Localizing Dynamic Affordances to Improve Interactions with Other Agents
Gay, Simon L. (LCIS, Univ. Grenoble Alpes), Jamont, Jean-Paul (Univ. Grenoble Alpes), Georgeon, Olivier (University of Lyon)
Keywords: Affordance learning and perception, Grounding of Knowledge and Development of Representations, Architectures for Cognitive Development and Open-Ended Learning
Abstract: Allowing robots to learn by themselves to coordinate their actions and cooperate requires that they be able to recognize each other and be capable of intersubjectivity. To comply with artificial developmental learning and self-motivation, we follow the radical interactionism hypothesis, in which an agent has no a priori knowledge of its environment (not even that the environment is 2D), and does not receive rewards defined as a direct function of the environment's state. We aim to design agents that learn to interact efficiently with other entities that may be static or may make irregular moves following their own motivation. This paper presents new mechanisms to identify and localize such mobile entities. The agent has to learn the relation between its perception of mobile entities and the interactions that they afford. These relations are recorded in the form of data structures, called signatures of interaction, that characterize entities from the agent's point of view, and whose properties are exploited to interact with distant entities. These mechanisms were tested in a simulated prey-predator environment. The obtained signatures showed that the predator successfully learned to identify mobile prey and their probabilistic moves, and to efficiently localize distant prey in the 2D environment.
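A signature of interaction can be pictured as a table from percepts to interaction outcomes, as in the hedged sketch below; the class, field names, and percepts are invented, not the paper's exact data structure.

```python
from collections import defaultdict

class InteractionSignature:
    """Links percepts of an entity to the outcome of one interaction."""
    def __init__(self, interaction):
        self.interaction = interaction
        self.stats = defaultdict(lambda: [0, 0])   # percept -> [successes, trials]

    def update(self, percept, succeeded):
        self.stats[percept][0] += int(succeeded)
        self.stats[percept][1] += 1

    def affordance(self, percept):
        s, n = self.stats[percept]
        return s / n if n else 0.5                  # unseen percepts stay uncertain

eat = InteractionSignature("eat")
eat.update(percept=("moving", "small"), succeeded=True)    # a prey-like entity
eat.update(percept=("static", "large"), succeeded=False)   # a wall
print(eat.affordance(("moving", "small")))                 # 1.0
```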
12:30-13:00, Paper TuTO.2
I Have Seen That Before: Memory Augmented Neural Network for Learning Affordances by Analogy
Schydlo, Paul (University of Lisbon, Instituto Superior Técnico, Institute For), Santos, Laura (Instituto Superior Tecnico, Universidade De Lisboa), Dehban, Atabak (IST-ID), John, Akhil (Instituto Superior Tecnico), Santos-Victor, José (Instituto Superior Técnico - Lisbon)
Keywords: Affordance learning and perception, Machine Learning methods for robot development, Lifelong learning
Abstract: Humans show a remarkable ability to quickly adapt to new situations without forgetting past ones. Existing learning methods still face problems with catastrophic forgetting when learning about new situations and contexts, while taking many iterations to adjust the network weights to new input and output pairs. In this work, we propose the application of a memory augmented network to the problem of learning tool affordances. We consider a network that explicitly indexes an episodic memory of past experiences and retrieves samples of past experience to reason about new situations by analogy, in an approach we call affordances by analogy. The work takes advantage of a tool-object interaction dataset to learn affordances. Our experiments show the model performs similarly to baselines in the low sample regime and retains information better when re-trained on a different data distribution, hinting at a promising direction for enabling learning algorithms to retain information better.
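The retrieval-by-analogy step can be illustrated with a generic k-nearest-neighbour lookup over an episodic memory of embeddings; this is a simplified stand-in for the paper's memory-augmented network, with all sizes and data invented.

```python
import numpy as np

rng = np.random.default_rng(0)
memory_keys = rng.normal(size=(50, 16))      # embeddings of past tool-object episodes
memory_values = rng.integers(0, 3, 50)       # effect category observed in each episode

def recall(query, k=5):
    sims = memory_keys @ query / (
        np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(query) + 1e-8)
    top = np.argsort(sims)[-k:]              # k most analogous past episodes
    return np.bincount(memory_values[top]).argmax()   # majority vote over their effects

print(recall(rng.normal(size=16)))
```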
12:30-13:00, Paper TuTO.3
Disentangling Patterns and Transformations from One Sequence of Images with Shape-Invariant Lie Group Transformer
Takada, Takumi (The University of Tokyo), Shimaya, Wataru (The University of Tokyo), Ohmura, Yoshiyuki (The University of Tokyo), Kuniyoshi, Yasuo (The University of Tokyo)
Keywords: Architectures for Cognitive Development and Open-Ended Learning, Cognitive vision, Grounding of Knowledge and Development of Representations
Abstract: An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans come to understand the compositionality of the real world through development, it is extremely difficult to equip robots with such a learning mechanism. In recent years, there has been significant research on autonomously learning representations of the world using deep learning; however, most studies have taken a statistical approach, which requires a large amount of training data. Contrary to such existing methods, we take a novel algebraic approach to representation learning, based on the simpler and more intuitive formulation that the observed world is the combination of multiple independent patterns and transformations that are invariant to the shape of patterns. Since the shape of patterns can be viewed as an invariant feature under symmetric transformations such as translation or rotation, we can expect that the patterns can naturally be extracted by expressing transformations with symmetric Lie group transformers and attempting to reconstruct the scene with them. Based on this idea, we propose a model that disentangles scenes into the minimum number of basic components of patterns and Lie transformations from only one sequence of images, by introducing learnable shape-invariant Lie group transformers as transformation components. Experiments show that given one sequence of images in which two objects are moving independently, the proposed model can discover the hidden distinct objects and the multiple shape-invariant transformations that constitute the scenes.
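The algebraic ingredient can be illustrated with a single Lie group transformer: one scalar parameter generates a transformation through the matrix exponential of a group generator, so the transformation is shape-invariant by construction. The sketch below shows only the so(2) (rotation) case; the full model learns and composes several such transformers.

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[0.0, -1.0],
              [1.0,  0.0]])                  # generator of 2D rotations, so(2)

def transform(points, theta):
    return points @ expm(theta * G).T        # exp(theta * G) is a rotation matrix

square = np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], dtype=float)
rotated = transform(square, np.pi / 4)       # the square's shape is preserved
print(np.round(rotated, 3))
```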
12:30-13:00, Paper TuTO.4
Symbol Emergence As Inter-Personal Categorization with Head-To-Head Latent Word
Furukawa, Kazuma (Ritsumeikan University), Taniguchi, Akira (Ritsumeikan University), Hagiwara, Yoshinobu (Ritsumeikan University), Taniguchi, Tadahiro (Ritsumeikan University)
Keywords: Concept formation and symbol grounding/emergence, Emergence of verbal and non-verbal communication, Grounding of Knowledge and Development of Representations
Abstract: In this study, we propose a head-to-head type (H2H-type) inter-personal multimodal Dirichlet mixture (Inter-MDM) by modifying the original Inter-MDM, a probabilistic generative model that represents symbol emergence between two agents as multiagent multimodal categorization. A Metropolis-Hastings-based naming game built on the Inter-MDM enables two agents to collaboratively perform multimodal categorization and share signs with a solid mathematical foundation of convergence. However, the conventional Inter-MDM presumes a tail-to-tail connection across a latent word variable, which makes it inflexible to extend toward modeling more complex symbol emergence. We therefore propose an H2H-type Inter-MDM that treats a latent word variable as a child node of an internal variable of each agent, in the same way as many prior studies of multimodal categorization. On the basis of the H2H-type Inter-MDM, we propose a naming game in the same way as for the conventional Inter-MDM. The experimental results show that the H2H-type Inter-MDM yields almost the same performance as the conventional Inter-MDM in terms of multimodal categorization and sign sharing.
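The Metropolis-Hastings naming game can be reduced to its acceptance step: the listener accepts the speaker's proposed sign with a probability given by how well that sign explains the listener's own categorization. The distributions below are toy values, not the Inter-MDM's actual posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy likelihoods P(listener's own category | sign); not the Inter-MDM posteriors
listener_likelihood = {"wa": 0.2, "bo": 0.7}

def mh_accept(current, proposed):
    ratio = listener_likelihood[proposed] / listener_likelihood[current]
    return rng.random() < min(1.0, ratio)    # Metropolis-Hastings acceptance

sign = "wa"
for _ in range(20):                          # repeated games drive signs to consensus
    proposal = "bo" if sign == "wa" else "wa"
    if mh_accept(sign, proposal):
        sign = proposal
print(sign)                                  # almost surely "bo"
```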
12:30-13:00, Paper TuTO.5
What Kind of Player Are You? Continuous Learning of a Player Profile for Adaptive Robot Teleoperation
Jouaiti, Melanie (University of Waterloo), Dautenhahn, Kerstin (University of Waterloo)
Keywords: Development of skills in biological systems and robots, Use of Robots in Applied Settings such as Autism Therapy
Abstract: Play is important for child development, and robot-assisted play is very popular in Human-Robot Interaction as it creates more engaging and realistic setups for user studies. Adaptive game-play is also an emerging research field and a good way to provide a personalized experience while adapting to individual users' needs. In this paper, we analyze joystick data and investigate player learning during a robot navigation game. We collected joystick data from healthy adult participants playing a game with our custom robot MyJay, while participants teleoperated the robot to perform goal-directed navigation. We evaluated the performance of both novice and proficient joystick users. Based on this analysis, we propose robot learning mechanisms to provide a personalized game experience. Our findings can help improve human-robot interaction in the context of teleoperation in general, and could be particularly impactful for children with disabilities who have problems operating off-the-shelf joysticks.
12:30-13:00, Paper TuTO.6
Evaluating Sensorimotor Abstraction on Curricula for Learning Mobile Manipulation Skills
Youngquist, Oscar (University of Massachusetts Amherst), Spiro, Alenna (University of Massachusetts Amherst), Doctor, Khoshrav (University of Massachusetts), Grupen, Rod (University of Massachusetts)
Keywords: Developmental Stages, Sensorimotor development, General Principles of Development and Learning
Abstract: Developmental mechanisms in newborn animals shepherd the infant through interactions with the world that form the foundation for hierarchical skills. An important part of this guidance resides in mechanisms of growth and maturation, wherein patterns of sensory and motor recruitment constrain learning complexity while building foundational expertise and transferable control knowledge. The resulting control policies represent a sensorimotor state abstraction that can be leveraged when developing new behaviors. This paper uses a computational model of developmental learning with parameters for controlling the recruitment of sensory and motor resources, and evaluates how this influences sample efficiency and fitness for a specific mobile manipulation task. We find that a developmental curriculum driven by sensorimotor abstraction drastically improves (by up to an order of magnitude) learning performance and sample efficiency over non-developmental approaches. Additionally, we find that the developmental policies/state abstractions offer significant robustness properties, enabling skill transfer to novel domains without additional training.
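The staged recruitment of sensorimotor resources can be sketched as a curriculum in which each stage unlocks more sensors and motors and starts from the previous stage's policy. The stages and the "training" below are placeholders for illustration only.

```python
stages = [                                   # invented curriculum: resources unlock gradually
    {"sensors": ["proprioception"], "motors": ["arm"]},
    {"sensors": ["proprioception", "vision"], "motors": ["arm"]},
    {"sensors": ["proprioception", "vision"], "motors": ["arm", "base"]},
]

def train(policy, sensors, motors, steps):
    # stand-in for RL restricted to the recruited sensors and motors;
    # the returned "policy" just records what each stage was trained on
    return policy + [f"{'+'.join(sensors)} / {'+'.join(motors)} ({steps} steps)"]

policy = []                                  # each stage builds on the previous one
for stage in stages:
    policy = train(policy, stage["sensors"], stage["motors"], steps=1000)
print(policy)
```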
12:30-13:00, Paper TuTO.7
Toddler-Inspired Embodied Vision for Learning Object Representations
Aubret, Arthur (University Clermont Auvergne, CNRS), Teuliere, Celine (Institut Pascal, Clermont Auvergne University), Triesch, Jochen (Frankfurt Institute for Advanced Studies)
Keywords: Embodiment, Statistical Learning, Cognitive vision
Abstract: Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this learning approach as a model of the development of human object recognition, it is important to consider what visual input a toddler would typically observe while interacting with objects. First, human vision is highly foveated, with high resolution only available in the central region of the field of view. Second, objects may be seen against a blurry background due to toddlers' limited depth of field. Third, during object manipulation a toddler mostly observes close objects filling a large part of the field of view due to their rather short arms. Here, we study how these effects impact the quality of visual representations learnt through time-contrastive learning. To this end, we let a visually embodied agent "play" with objects in different locations of a near photo-realistic flat. During each play session the agent views an object in multiple orientations before turning its body to view another object. The resulting sequence of views feeds a time-contrastive learning algorithm. Our results show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments. We argue that this effect is caused by the reduction of features extracted from the background, a neural network bias for large features in the image, and a greater similarity between novel and familiar background regions. The results of our model suggest that several influences on toddlers' visual input statistics support their unsupervised learning of object representations.
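The learning signal itself is standard time-contrastive (InfoNCE-style) training: views of the same object that are close in time are pulled together and views of other objects are pushed apart. The sketch below uses a toy encoder and random data; it shows the loss, not the paper's embodied data collection.

```python
import torch
import torch.nn.functional as F

def time_contrastive_loss(z_t, z_t1, temperature=0.1):
    z_t, z_t1 = F.normalize(z_t, dim=1), F.normalize(z_t1, dim=1)
    logits = z_t @ z_t1.T / temperature      # batch x batch similarity matrix
    targets = torch.arange(z_t.shape[0])     # positive pair = same object, later view
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
views_t = torch.randn(8, 3, 32, 32)                    # 8 objects, current views
views_t1 = views_t + 0.1 * torch.randn_like(views_t)   # slightly later views
loss = time_contrastive_loss(encoder(views_t), encoder(views_t1))
loss.backward()
```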
12:30-13:00, Paper TuTO.8
Master of Puppets: Multi-Modal Robot Activity Segmentation from Teleoperated Demonstrations
Coppola, Claudio (Queen Mary University of London), Jamone, Lorenzo (Queen Mary University of London)
Keywords: Haptic and tactile perception, Machine Learning methods for robot development, Action selection and planning
Abstract: Programming robots for complex tasks in unstructured settings (e.g., light manufacturing, extreme environments) cannot be accomplished solely by analytical methods. Learning from teleoperated human demonstrations is a promising approach to decrease the programming burden and to obtain more effective controllers. However, the recorded demonstrations need to be decomposed into atomic actions to facilitate the representation of the desired behaviour, which can be very challenging in real-world settings. In this study, we propose a method that uses features extracted from robot motion and tactile data to automatically segment atomic actions from a teleoperation sequence. We created a publicly available dataset with demonstrations of robotic pick-and-place of three different objects in single-object and cluttered situations. We use a custom-built teleoperation system that maps the user's hand and fingertip poses onto a three-fingered dexterous robot hand equipped with tactile sensors. Our findings suggest that the proposed feature set generalises across different episodes with the same object and between items of similar size. Furthermore, they suggest that tactile features contribute to higher performance in recognising activities within demonstrations.
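The segmentation idea can be sketched as sliding-window statistics over the motion and tactile streams, with boundaries declared where the feature profile changes sharply. Window sizes, features, and the threshold below are illustrative, not the paper's feature set.

```python
import numpy as np

def window_features(motion, tactile, win=20):
    feats = []
    for s in range(0, len(motion) - win, win // 2):
        m, t = motion[s:s + win], tactile[s:s + win]
        feats.append([m.std(), np.abs(np.diff(m)).mean(), t.mean(), t.max()])
    return np.array(feats)

rng = np.random.default_rng(0)
motion = np.concatenate([0.1 * rng.normal(size=100), rng.normal(size=100)])
tactile = np.concatenate([np.zeros(100), np.abs(rng.normal(size=100))])  # contact begins

feats = window_features(motion, tactile)
change = np.linalg.norm(np.diff(feats, axis=0), axis=1)   # feature-profile change
boundaries = np.where(change > change.mean() + 2 * change.std())[0]
print(boundaries)    # windows where a new atomic action likely starts
```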
12:30-13:00, Paper TuTO.9
Leveraging Symmetry Detection to Speed up Haptic Object Exploration in Robots
Bonzini, Aramis Augusto (Queen Mary University of London), Seminara, Lucia (University of Genova-DITEN), Macciò, Simone (University of Genoa), Carfì, Alessandro (University of Genoa), Jamone, Lorenzo (Queen Mary University of London)
Keywords: Haptic and tactile perception, Prediction, planning and problem solving, Machine Learning methods for robot development
Abstract: Most objects are symmetric. In fact, humans are very good at detecting symmetry, both by vision and by touch, and they use such information to facilitate the perception of other object properties, such as shape and size; overall, this contributes to humans' ability to successfully manipulate objects in unstructured environments. Inspired by this human skill, in this paper we propose a haptic exploration procedure that enables a robot to detect object symmetry and use that information to estimate the shape of an object with higher accuracy and in less time. We achieve this by incorporating symmetries in a Gaussian Process model, and by introducing a novel strategy to detect the presence of such symmetry. We report results obtained with a Baxter robot equipped with a custom tactile sensor on the gripper: we show that when the robot explores objects with unknown symmetries, the time required to estimate the object shape is reduced by up to 50% thanks to our method.
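The key trick of exploiting symmetry in a Gaussian Process can be shown in one dimension: if the surface is mirror-symmetric, the kernel can be symmetrized so that every contact also informs the model about its reflection. The toy example below uses an assumed mirror axis at x = 0.

```python
import numpy as np

def rbf(a, b, ell=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def symmetric_kernel(a, b):
    return rbf(a, b) + rbf(a, -b)     # k(x, x') + k(x, mirror(x'))

X = np.array([0.8, 1.2])              # contact locations, all on one side
y = np.array([0.3, 0.5])              # measured surface heights
K = symmetric_kernel(X, X) + 1e-6 * np.eye(2)

x_star = np.array([-1.0])             # query on the untouched, mirrored side
mean = symmetric_kernel(x_star, X) @ np.linalg.solve(K, y)
print(mean)                           # informed estimate without touching that side
```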
12:30-13:00, Paper TuTO.10
Real-Time Acoustic Touch Localization in Human-Robot Interaction Based on Steered Response Power
Gamboa-Montero, Juan Jose (Universidad Carlos III De Madrid), Basiri, Meysam (Instituto Superior Tecnico), Castillo, Jose Carlos (University Carlos III of Madrid), Marques Villarroya, Sara (Universidad Carlos III of Madrid), Salichs, Miguel A. (University Carlos III of Madrid)
Keywords: Haptic and tactile perception, Human-human and human-robot interaction and communication, Social robots and social learning
Abstract: The sense of touch plays an important role in Human-Robot Interaction, allowing a natural form of communication with humans and complementing the robot's other senses. This is even more important in social robotics, where robots must interact with people and adhere to social conventions while doing so. Touch interfaces can be implemented as part of a robotic platform serving multiple purposes. Some examples include the use of tactile commands to control the movement of a robot, or attempting to endow a robot with the ability to understand human emotional states. This work proposes a system to localize, in real time, a contact performed on the rigid, non-planar shell of a service robot, based on a set of spatially separated piezo transducers attached to the shell of the robot and the Steered Response Power sound source localization algorithm. Results show the potential of the system to correctly detect and localize human touches.
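Steered Response Power localization scores each candidate contact point by delaying and summing the transducer signals according to the expected travel times; the true contact yields the most coherent sum. The sketch below uses toy geometry, wave speed, and signals.

```python
import numpy as np

fs, c = 48000, 500.0                         # sample rate; toy wave speed (m/s)
sensors = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3]])
contact = np.array([0.2, 0.1])               # ground-truth tap location

pulse = np.zeros(2048)
pulse[100:110] = 1.0                         # toy tap transient
signals = [np.roll(pulse, int(fs * np.linalg.norm(contact - s) / c)) for s in sensors]

def srp(point):
    delays = [int(fs * np.linalg.norm(point - s) / c) for s in sensors]
    aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
    return np.sum(np.sum(aligned, axis=0) ** 2)   # coherent sum has maximal power

grid = [np.array([x, y]) for x in np.linspace(0, 0.3, 16)
        for y in np.linspace(0, 0.3, 16)]
print(max(grid, key=srp))                    # recovers approximately [0.2, 0.1]
```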
12:30-13:00, Paper TuTO.11
Brain-Inspired Probabilistic Generative Model for Double Articulation Analysis of Spoken Language
Taniguchi, Akira (Ritsumeikan University), Muro, Maoko (Ritsumeikan University), Yamakawa, Hiroshi (The Whole Brain Architecture Initiative), Taniguchi, Tadahiro (Ritsumeikan University)
Keywords: Language acquisition, Speech perception and production, Machine Learning methods for robot development
Abstract: The human brain, among its several functions, analyzes the double articulation structure in spoken language, i.e., it performs double articulation analysis (DAA). A hierarchical structure in which words are connected to form a sentence, and words are composed of phonemes or syllables, is called a double articulation structure. Where and how DAA is performed in the human brain has not been established, although some insights have been obtained. In addition, existing computational models based on a probabilistic generative model (PGM) do not incorporate neuroscientific findings, and their consistency with the brain has not been previously discussed. To bridge this gap, this study compares, maps, and integrates these existing computational models with neuroscientific findings, yielding insights relevant for future applications and further research. Specifically, we propose a PGM for a DAA hypothesis that can be realized in the brain, based on the outcomes of several neuroscientific surveys. The study involved (i) investigation and organization of anatomical structures related to spoken language processing, and (ii) design of a PGM that matches the anatomy and functions of the region of interest. This study therefore provides novel insights that will be foundational to further exploring DAA in the brain.
TuO3O Regular Session, ArtsTwo Lecture Theatre
Perception/Vision
15:00-15:20, Paper TuO3O.1
Active Gaze Control for Foveal Scene Exploration
Dias, Alexandre (Instituto Superior Técnico), Simões, Luís (Instituto Superior Técnico), Moreno, Plinio (IST-ID), Bernardino, Alexandre (IST - Técnico Lisboa)
Keywords: Cognitive vision, Affordance learning and perception, Active learning
Abstract: Active perception and foveal vision are the foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception changes the gaze direction towards the most promising parts of the visual field. We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene, identifying the objects present in their surroundings in the fewest gaze shifts. Our approach is based on three key methods. First, we take an off-the-shelf deep object detector, pre-trained on a large dataset of regular images, and calibrate the classification outputs to the case of foveated images. Second, a body-centered semantic map, encoding the object classifications and corresponding uncertainties, is sequentially updated with the calibrated detections, considering several data fusion techniques. Third, the next best gaze fixation point is determined based on information-theoretic metrics that aim at minimizing the overall expected uncertainty of the semantic map. Compared to the random selection of the next gaze shift, the proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts, and reduces the number of gaze shifts required to attain similar performance to one third.
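The third step, selecting the next best fixation, can be illustrated with a toy binary semantic map: each candidate fixation is scored by the expected posterior entropy of the map, given that detection accuracy decays away from the fovea. The map, accuracy profile, and grid below are invented stand-ins.

```python
import numpy as np

def H(p):                                    # binary entropy
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

belief = np.full((5, 5), 0.5)                # P(object present) per map cell

def expected_entropy(fixation):
    total = 0.0
    for idx, p in np.ndenumerate(belief):
        dist = np.linalg.norm(np.subtract(idx, fixation))
        acc = max(0.9 - 0.1 * dist, 0.5)     # detection accuracy decays off-fovea
        p1 = p * acc + (1 - p) * (1 - acc)   # P(detector reports "object")
        post1 = p * acc / p1                 # Bayes update for either report
        post0 = p * (1 - acc) / (1 - p1)
        total += p1 * H(post1) + (1 - p1) * H(post0)
    return total

fixations = [(i, j) for i in range(5) for j in range(5)]
print(min(fixations, key=expected_entropy))  # with a uniform prior: (2, 2)
```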
15:20-15:40, Paper TuO3O.2
Towards Third-Person Visual Imitation Learning Using Generative Adversarial Networks
Garello, Luca (Italian Institute of Technology and University of Genoa), Rea, Francesco (Istituto Italiano Di Tecnologia), Noceti, Nicoletta (University of Genova), Sciutti, Alessandra (Italian Institute of Technology)
Keywords: Cognitive vision, Embodiment, Machine Learning methods for robot development
Abstract: Imitation learning plays a key role during our development, since it allows us to learn from more expert agents. This cognitive ability implies remapping seen actions into our own perspective. However, in the field of robotics the perspective mismatch between demonstrator and imitator is usually neglected, under the assumption that the imitator has access to the explicit joint configuration of the demonstrator or that they both share the same perspective of the environment. Focusing on this perspective translation problem, in this paper we propose a generative approach that shifts the perspective of actions from third person to first person using RGB videos. In addition to the first-person view of the action, our model generates an embedded representation of it. This numerical description is learnt autonomously, following a time-consistent pattern and without the need for human supervision. In the experimental evaluation, we show that it is possible to exploit these two sources of information to infer robot control during the imitation phase. Additionally, after training on synthetic data, we tested our model in a real scenario.
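Stripped of the adversarial training, the perspective-translation idea is an encoder-decoder that maps a third-person frame to the corresponding first-person frame through a bottleneck embedding of the action. The sketch below keeps only a reconstruction loss and uses toy sizes and random data; the paper's model adds a GAN objective on real RGB videos.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # 3rd-person frame -> embedding
decoder = nn.Linear(64, 3 * 32 * 32)                               # embedding -> 1st-person frame

third_person = torch.randn(4, 3, 32, 32)     # paired toy views of the same action
first_person = torch.randn(4, 3, 32, 32)

z = encoder(third_person)                    # embedded action representation
recon = decoder(z).view(4, 3, 32, 32)
loss = ((recon - first_person) ** 2).mean()  # reconstruction only; GAN term omitted
loss.backward()
```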
15:40-16:00, Paper TuO3O.3
Binding Dancers into Attractors
Kaltenberger, Franziska (University of Tuebingen), Otte, Sebastian (University of Tübingen), Butz, Martin Volker (University of Tuebingen)
Keywords: Models of human motion and state, Prediction, planning and problem solving, Body schema and body image
Abstract: Feature binding and perspective taking are crucial cognitive abilities for effectively perceiving and processing observations of our environment. Feature binding combines observed features into one entity, called a Gestalt. Perspective taking transfers the percept into a canonical, observer-centered frame of reference. Here we propose a recurrent neural network model that solves both challenges. We first train an LSTM to predict 3D motion dynamics from a canonical perspective. We then present similar motion dynamics from novel viewpoints and with novel feature arrangements. Retrospective inference enables the deduction of the canonical perspective. Combined with a robust mutual-exclusive softmax selection scheme, random feature arrangements are reordered and precisely bound into known Gestalt percepts. To corroborate evidence for the architecture's cognitive validity, we examine its behavior on the silhouette illusion, which elicits two competing Gestalt interpretations of a rotating dancer. Our system flexibly binds the information of the rotating figure into the alternative attractors, resolving the illusion's ambiguity and imagining the respective depth interpretation and the corresponding direction of rotation. We finally discuss the potential universality of the proposed mechanisms.
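The mutual-exclusive binding step can be approximated with a Sinkhorn-style normalization of a softmax score matrix, so that every observed feature binds to exactly one slot of the known Gestalt and vice versa. The 2D toy below illustrates that selection scheme, not the paper's full recurrent model.

```python
import numpy as np

rng = np.random.default_rng(0)
canonical = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])        # known Gestalt slots
observed = canonical[[2, 0, 1]] + 0.05 * rng.normal(size=(3, 2))  # shuffled, noisy features

scores = -np.linalg.norm(observed[:, None] - canonical[None, :], axis=2)
B = np.exp(scores / 0.1)                     # softmax-style binding scores
for _ in range(20):                          # alternating normalization enforces
    B /= B.sum(axis=1, keepdims=True)        # "each feature binds one slot" and
    B /= B.sum(axis=0, keepdims=True)        # "each slot is bound once"
print(np.round(B, 2))                        # near-permutation binding matrix
```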