"Non-rigid models in science and entertainment"
Visual tracking of non-rigid human and biological motion has received increased attention in several contexts, from replacing the hero character in movies to large-scale video analytics. The conventional way to track non-rigid motion is a two-step process: 1 -- learn a model from large amounts of training data; 2 -- track in new footage the degrees of freedom of the prior model. Most recently these two steps have been merged into so-called Non-Rigid Structure-From-Motion (NRSFM) techniques, which circumvent many shortcomings of the traditional model-based approach but also introduce new challenges. We discuss both paradigms and demonstrate them on a range of problems, from high-end Hollywood productions to very high-degree-of-freedom human pose and gesture analytics applied to athletes, academics, and politicians.
"What do deforming shapes reveal about structure from motion"
Q Zaidi, A Jain
Many organisms and objects deform when moving, requiring perceivers to separate shape changes from object motions. We have discovered interesting details about human perception of deforming shapes from motion cues, by using movies of rigid and flexing point-light cylinders rotating simultaneously around the depth and vertical axes in perspective: 1) Observers can discern cross-sectional shapes of flexing and rigid cylinders equally well, suggesting no advantage for structure-from-motion models using rigidity assumptions. 2) Symmetric cylinders appear asymmetric when oblique rotation axes generate asymmetric velocity profiles, highlighting the primacy of velocity patterns in shape perception. 3) Inexperienced observers are generally incapable of using motion cues to detect inflation/deflation of cylinders, but this handicap can be overcome with practice equally well for rigid and flexing objects. 4) Observers successfully classify cylinders as rigid, flexing in depth, or flexing in the image plane from combinations of motion and contour cues, but not contour cues alone. Parsing image velocity flows into kinematic differential invariants, the gradient of def is zero for rigid but non-zero for flexing cylinders, and combinations of def with curl and div classify plane and depth deformations respectively. The visual system could use these invariants to confirm rigidity and identify shape deformations.
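The kinematic decomposition described above (div, curl, and def of the image velocity field) can be computed numerically from estimated velocities. A minimal illustrative sketch in Python/NumPy, not the authors' actual analysis code:

```python
import numpy as np

def differential_invariants(u, v, dx=1.0):
    """Kinematic differential invariants of a 2-D image velocity field.

    u, v : 2-D arrays of horizontal/vertical velocity components on a grid.
    Returns (div, curl, def_mag) arrays following the standard decomposition
    of the velocity gradient into expansion, rotation, and pure deformation.
    """
    # np.gradient returns derivatives along (rows, cols) = (y, x)
    du_dy, du_dx = np.gradient(u, dx)
    dv_dy, dv_dx = np.gradient(v, dx)
    div = du_dx + dv_dy                  # expansion / contraction
    curl = dv_dx - du_dy                 # rotation
    # magnitude of the pure deformation (shear) component
    def_mag = np.sqrt((du_dx - dv_dy) ** 2 + (du_dy + dv_dx) ** 2)
    return div, curl, def_mag

# Sanity check on a rigid rotation field u = -y, v = x:
# div = 0, curl = 2, def = 0 everywhere, as expected for rigid motion.
y, x = np.mgrid[-2:3, -2:3].astype(float)
div, curl, dmag = differential_invariants(-y, x)
```

For a flexing object the def field would be spatially non-uniform, so a non-zero gradient of def flags deformation, consistent with the diagnostic described above.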
"Perceiving dynamic faces"
When faces move we can recover information about which parts of the face move together and which parts move independently. Motion thus allows us to determine how faces look on average, how they vary with respect to the average, and what information it makes sense to code separately; it is therefore critical to the representation of both moving and static faces. Principal Components Analysis can be used to construct a “face space” encoding a range of expressions from the movements of a single individual. Face adaptation has been used as evidence for norm-based coding of facial identity. We asked whether adaptation could alter perception in the case of a single individual’s expression space. We took expressions 1 standard deviation from the mean expression along the directions of the first and second principal components. Adapting to one end of a continuum in expression space shifted the perception of faces along that axis but had no effect on the orthogonal dimension. In this experiment subjects had no experience of the target face. The results imply that adaptation does not affect the internal representation of a particular identity; rather, adaptation modifies the representational structure on which the description of individual faces is based.
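The construction of a single individual's expression space via Principal Components Analysis, including expressions 1 standard deviation from the mean along the first two components, can be sketched as follows. The "landmark" data here are random stand-ins for real facial motion measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 frames x 20 tracked facial-landmark coordinates
# from one individual's movements (a stand-in for real motion data).
frames = rng.normal(size=(200, 20))

mean_face = frames.mean(axis=0)          # the "average" expression (the norm)
X = frames - mean_face
# PCA via SVD: rows of Vt are the principal components (movement axes)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
sd = S / np.sqrt(len(frames) - 1)        # SD of the data along each component

# Expressions 1 SD from the mean along PC1 and PC2, analogous to the
# adaptation continua described in the abstract.
expr_pc1 = mean_face + sd[0] * Vt[0]
expr_pc2 = mean_face + sd[1] * Vt[1]
```

Because the components are orthogonal, adapting along the PC1 axis need not affect positions along PC2, which is the independence the experiment tests perceptually.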
"Coding of static and acting bodies in monkey temporal cortex"
Visual actions are changes in body pose over time. Computational models of visual action recognition have suggested that actions are coded by motion-based and body-pose-based mechanisms. We studied the coding of visual actions in macaque inferior temporal cortex, in particular in the rostral Superior Temporal Sulcus (STS). In a first study we used a parameterized set of morphed stick-figure actions. A population of STS neurons represented the similarities amongst the actions, and this held both for neurons that required motion in order to respond and for neurons that responded as strongly to the static as to the acting agent. In a second study, we examined the coding of movies of walking agents. The population of STS neurons reliably classified walking direction and, to a lesser degree, forward from backward walking. The latter action patterns differed only in their pose sequence but not in their poses. Overall, the data suggest that macaque STS neurons code for momentary body pose but also carry a pose-sequence mechanism, in agreement with the computational models of action coding. Given the link between action coding and body representations, we are currently examining the representation of bodies in macaque temporal cortex, using both fMRI and single-cell recording.
"Primitive-based representations of human body motion"
M A Giese
Human body movements are characterized by complex hierarchical spatio-temporal patterns. The learning of the spatio-temporal structure of such patterns might be the key for their robust recognition. In motor control as well as robotics the idea of the organization of complex movements in terms of a limited number of movement primitives has been a cornerstone for the development of efficient representations of complex body movements. We demonstrate how this concept can be applied for the analysis of the visual perception of complex body movements. Starting from work on the analysis of the key features that determine the emotional style of movements of individual actors and its perception, we extend this work to the analysis of interactive emotional body movements. Combining supervised and unsupervised learning techniques with methods for the learning of movement dynamics, we analyze the parameters that characterize interactive emotional movements and investigate how such parameters influence the perception of emotional style.
"Learning hierarchical models of shape"
A L Yuille
This talk summarizes recent work on learning hierarchical models of shape. Objects are represented by recursive compositional models (RCMs), which are constructed from hierarchical dictionaries of more elementary RCMs. These dictionaries are learnt in an unsupervised manner using principles such as suspicious coincidences and competitive exclusion. Dictionary elements are analogous to receptive field structures found in the visual cortex. For multiple objects, we learn hierarchical dictionaries that encourage part-sharing (i.e. sharing dictionary elements between different objects). This gives an efficient representation of multiple objects while enabling efficient inference and learning. We describe how this work can be formalized in terms of a breadth-first search through the space of models. We demonstrate results on benchmarked real images.
"Bag of features and beyond"
Bag-of-features representations have recently shown very good performance for category classification of still images and action classification in videos. In this presentation we first review the underlying principles of bag-of-features, in particular the type of local features used and the way bags of features are built. We then present ways to integrate spatial and temporal information, namely spatial pyramids and actom sequence models. Spatial pyramids recognize scenes based on approximate global geometric correspondence. They partition the image into increasingly fine sub-regions and compute histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Actom sequence models localize actions in challenging video material based on sequences of atomic actions, representing the temporal structure by sequences of histograms of actom-anchored visual features. Our representation, which can be seen as a temporally structured extension of the bag-of-features, is flexible, sparse and discriminative. The resulting actom sequence model is shown to significantly improve performance over existing methods for temporal action localization.
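The spatial-pyramid construction described above (partition the image into increasingly fine sub-regions, histogram the visual words in each cell, concatenate) can be sketched in a few lines. The visual-word quantization of local features is assumed to have been done already:

```python
import numpy as np

def spatial_pyramid(positions, words, n_words, levels=2):
    """Spatial-pyramid bag-of-features histogram (illustrative sketch).

    positions : (N, 2) array of feature coordinates, normalized to [0, 1)^2
    words     : (N,) array of visual-word indices in [0, n_words)
    Concatenates per-cell word histograms over a 1x1, 2x2, 4x4, ... grid.
    """
    hists = []
    for level in range(levels + 1):
        cells = 2 ** level
        # cell index of each feature at this pyramid level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for ix in range(cells):
            for iy in range(cells):
                in_cell = (cx == ix) & (cy == iy)
                hists.append(np.bincount(words[in_cell], minlength=n_words))
    return np.concatenate(hists)
```

The level-0 cell reproduces the orderless bag-of-features; the finer levels add coarse geometric information. (The weighting of levels used when comparing pyramids is omitted here for brevity.)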
"Spike based recognition - from biology to hardware"
How can neurons learn to recognize stimuli? I will review recent work that has demonstrated that simple integrate-and-fire neurons equipped with Spike-Time Dependent Plasticity (STDP) will naturally become selective to patterns of afferent activity that occur repetitively [Masquelier & Thorpe, 2007, PLoS Comp Biol, 3, e21]. Indeed, modelling and experimental studies suggest that a few tens of presentations may be sufficient for selectivity to emerge. Furthermore, by adding inhibitory connections between neurons, the system functions as a competitive learning mechanism in which different neurons will learn to respond to different patterns [Masquelier et al, 2009, Neural Comp, 21, 1259]. Interestingly, these sorts of STDP-based learning mechanisms can potentially be implemented in electronic circuits that make use of memristor devices – a special type of resistor that can be programmed by applying an appropriately chosen pattern of voltages. This opens up the exciting prospect that future computer vision systems could be developed that implement computations in a way that is directly analogous to the way in which computations are performed in biological vision systems - an integrated approach to biological and computer vision systems that was proposed by David Marr over thirty years ago.
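The pair-based STDP rule that drives this kind of selectivity can be sketched as a simple weight update. The time constants and learning rates below are illustrative placeholders, not the values used in the cited studies:

```python
import numpy as np

def stdp_update(w, dt, a_plus=0.03, a_minus=0.025, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Additive pair-based STDP weight update (illustrative parameters).

    dt = t_post - t_pre (ms) for a spike pair: pre-before-post (dt >= 0)
    potentiates the synapse, post-before-pre (dt < 0) depresses it, both
    with an exponential dependence on the timing difference. Weights are
    clipped to [w_min, w_max].
    """
    if dt >= 0:
        w = w + a_plus * np.exp(-dt / tau)
    else:
        w = w - a_minus * np.exp(dt / tau)
    return float(np.clip(w, w_min, w_max))
```

Afferents that consistently fire just before the postsynaptic neuron (small positive dt, e.g. the repeating pattern's inputs) are repeatedly strengthened, which is the mechanism by which selectivity to repeated patterns emerges.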
"Computations, circuits and biophysics in visual cortex: Learning to learn"
I will begin by highlighting some of the remarkable systems emerging from the last two decades of research in ML/AI. Though some of them show human-level performance in narrow domains of intelligence, the problem of intelligence is still wide open. I will argue that a new phase of basic research combining computer science with neuroscience is needed to eventually replicate intelligence in machines. Vision is a good proxy for intelligence. I will review our understanding of systems-level computations for visual recognition in the ventral stream. Classical ideas stemming from the work of Hubel and Wiesel about hierarchical processing in visual cortex have been summarized by a class of hierarchical feedforward architectures for object and action recognition based on the anatomy and the physiology of the primate visual cortex. The most recent version of this straightforward class of models is consistent with many data at different levels -- from the computational to the biophysical level. Being testable across all these levels of understanding is a very high bar, reached by very few models in neuroscience. I will speak about recent work on (1) the role of bilateral symmetry in the tuning of IT neurons, (2) learning position and scale invariance during development, and (3) learning class-specific viewpoint-tolerance. Finally, I will argue that some of us should start looking beyond the classical model, beyond the question of what is where in order to develop theories that can pass a full Turing test for vision.
"Predicting scene memorability"
When glancing at a magazine, or a book, we are continuously being exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with their visual details (Brady et al., 2008; Konkle et al., 2010). But not all images are created equal. Some images stick in our minds, and we are able to recognize them even after long periods of time. In this talk, I will focus on the problem of predicting how memorable an image is. Making memorable images is a challenging task in visualization and photography, and is generally presented as a vague concept hard to quantify. In a recent work (Isola et al., 2011) we show that the memorability of a photograph is a stable property of an image that is shared across different viewers. We introduce a database for which we have measured the probability that each picture will be remembered after a single view. We analyze some of the image features and objects that contribute to making an image memorable, and we train a predictor based on global image descriptors. We show that predicting image memorability is a task that can be addressed with current computer vision techniques, including the type of global image features used for scene recognition tasks. Work in collaboration with: P. Isola, D. Parikh, A. Torralba, J. Xiao.
"Objects and parts: Bottom-up, top-down and context"
Object recognition provides not only category and object labels, but a full interpretation of an object in terms of parts and sub-parts at multiple levels. I will describe a model that learns to create such full object interpretations. Starting from image examples, the model constructs a hierarchical representation using parts and sub-parts selected by maximizing the information delivered for recognition. Recognition of objects and their parts is obtained by a feed-forward sweep from low to high levels of the hierarchy, followed by a sweep from the high to low levels. Residual part ambiguities are resolved by using their surrounding context. I will briefly discuss relationships between the model and parts of the human visual system involved in object perception.
"Behavioural signatures of brain rhythms"
A Landau, P Fries
Measurements of neural oscillations are prevalent across a wide range of cognitive functions. In addition to linking certain rhythms to cognitive processes, neural rhythms provide a rich conceptual framework for physiological models of perception and attention. In my introduction to the symposium “Rhythms for perception and attention”, I will demonstrate how behavioral measurements can provide an assay of physiological models and potentially shape and elaborate such models. In turn, the measurement of neural oscillations as they relate to behavior provides new insights into long-standing debates in cognitive theories of attention and perception.
"Attentional control in primate fronto-cingulate cortex: Selective neuronal synchronization at theta (9Hz) and beta rhythms reflect stimulus valuation and attentional rule implementation during shifts of attention"
Attentional control describes network processes that select the sensory information most relevant in a given context for prioritized processing. Understanding how brain circuitry controls attentional stimulus selection therefore requires elucidating (1) how the relevance of stimuli, i.e. their expected value, is computed, and (2) how contextual rule information is encoded and reactivated to guide flexible shifts of attention. We identify critical neuronal network nodes subserving both functions within fronto-cingulate cortex of macaques. I survey results revealing that spiking activity of neurons within and across these nodes in fronto-cingulate cortex predicts the expected value and location of attentional target stimuli at the time of attentional stimulus selection. These attentional control signals evolve with high temporal precision, with many neurons selectively synchronizing their spike times to an underlying theta or beta rhythm when attention is successfully shifted towards one versus another target stimulus. For a subset of neurons, spike timing changed from theta to beta rhythmicity depending on the expected value of attentional targets. These findings reveal a switch-like behavior of neuronal synchronization, which was frequency-specific and anatomically confined, and ultimately could reflect the dynamic instantiation of a functional neuronal assembly representing attentional target information.
"Gamma-band activity dissociates between bottom-up and top-down processes in human infero-temporal cortex"
We studied the role of ventral visual pathway areas in visual imagery and working memory. We analyzed intracerebral EEG recordings from the left inferior temporal lobe of an epileptic patient during working memory tasks and mental imagery. We found that high-frequency gamma-band activity (50-150 Hz) in the inferior temporal gyrus (ITG) increased with memory load only during visuo-spatial, but not verbal, working memory. Using a real-time set-up to measure and visualize gamma-band activity online - BrainTV - we found a systematic activity increase in ITG when the patient was visualizing a letter (visual imagery), but not during perception of letters. In contrast, only 7 mm more medially, neurons located in the fusiform gyrus exhibited the opposite pattern, responding during verbal working memory retention and letter presentation, but not during imagery or visuo-spatial working memory maintenance. We conclude that neural networks supporting imagination of a visual element are not necessarily the same as those underlying perception of that element. Additionally, we present evidence that, by just reading gamma-band activity in these two recording sites, it is possible to determine, accurately and in real-time, whether a given memory content is verbal or visuo-spatial.
"Gating by alpha inhibition in attention tasks"
In order to understand the working brain as a network, it is essential to identify the mechanisms by which information is gated between regions. We here propose that information is gated by inhibiting task-irrelevant regions, thus routing information to task-relevant regions. The functional inhibition is reflected in oscillatory activity in the alpha band (8–13 Hz). From a physiological perspective the alpha activity provides pulsed inhibition reducing the processing capabilities of a given area. Specifically, the framework predicts that optimal task performance will correlate with alpha activity in task-irrelevant areas. This framework is supported by recent MEG studies and by laminar and single-unit recordings in monkeys. These studies in conjunction support the notion that alpha activity plays an essential role in routing information through the brain. In particular, alpha activity produced in infragranular layers seems to phasically modulate neuronal processing in (supra)granular layers. Future challenges involve identifying the frontal, striatal and thalamic areas involved in controlling the alpha activity produced in posterior brain regions.
"Ongoing EEG phase as a trial-by-trial signature of perceptual and attentional rhythms"
Many theories posit a role for oscillations in sensory perception and attention. An often disregarded consequence of such theories is that perceptual and attentional processes should function periodically, preferentially exerting their effects at certain phases of the oscillatory cycle but not others. Recently, our group started testing this prediction by evaluating whether the precise phase of presentation of a visual stimulus with respect to ongoing oscillations could influence perceptual and attentional performance. We showed that the phase of an EEG oscillation reflecting the rapid waxing and waning of sustained attention can predict the perception of a subsequent visual stimulus at threshold. This phase dependency was also observed for illusory perceptions (phosphenes) that were triggered by direct activation (TMS) rather than external stimulation. Similar ongoing periodicities accounted for a portion of the trial-by-trial variability of saccadic reaction times, and attentional deployment latencies. All of these phase effects were observed over fronto-central regions (sometimes along with occipital effects), and for oscillation frequencies between 7Hz and 15Hz. Our findings imply that certain perceptual and attentional mechanisms operate periodically or rhythmically, and that ongoing oscillations can be used as a signature of these rhythms.
"A Rosetta stone for interpreting brain waves in visual perception?"
P Schyns, G Thut, J Gross
Neural oscillations are ubiquitous measures of cognitive processes and of the dynamic routing and gating of information. The fundamental and so far unresolved problem for neuroscience remains to understand how oscillatory activity in the brain codes information for human cognition. In a biologically relevant cognitive task, we instructed six human observers to categorize facial expressions of emotion while we measured the observers’ EEG. We combined state-of-the-art stimulus control with statistical information theory analysis to quantify how the three parameters of oscillations (i.e. power, phase and frequency) code the visual information relevant for behavior in a cognitive task. We make three points: First, we demonstrate that phase codes considerably more information (2.4 times) relating to the cognitive task than power. Second, we show that the conjunction of power and phase coding reflects detailed visual features relevant for behavioral response, i.e. features of facial expressions predicted by behavior. Third, we demonstrate, in analogy to communication technology, that oscillatory frequencies in the brain multiplex the coding of visual features, increasing coding capacity. Together, our findings about the fundamental coding properties of neural oscillations will redirect the research agenda in neuroscience by establishing the differential roles of frequency, phase and amplitude in coding behaviorally relevant information in the brain.
"The sight of sound: Interactions between audition and vision"
A Rich, R Chiou, M Stelter, T Horowitz
Auditory-visual synaesthesia, a rare condition in which sounds evoke involuntary visual experiences, provides a window into how the brain normally integrates audition and vision. I will discuss a series of experiments exploring this form of synaesthesia, and the effects of having visual experiences in response to sounds. We recorded the visual experiences reported in response to a variety of sounds, documenting consistency within an individual and patterns across synaesthetes. The characteristics were consistent with implicit mappings seen in non-synaesthetes: as pitch gets higher, synaesthetic objects become smaller, brighter/lighter and higher in spatial location. We demonstrated the involuntary nature of these experiences using a cross-modal interference paradigm manipulating the congruency of the synaesthetic experience elicited by a sound with display colour, shape and location. We then examined auditory and visual memory in these subjects. Auditory-visual synaesthesia appears to rely on the same mechanisms as cross-modal mapping in non-synaesthetes, but may have a different effect on cognitive processes like memory.
"Perceptual processing in synaesthesia"
Synaesthesia is a condition in which one property of a stimulus induces a conscious experience of an additional attribute. For example, in grapheme-colour synaesthesia, a visually presented achromatic grapheme results in synaesthetic experiences of colour. The authenticity of the condition is well established and there has been growing interest in using synaesthesia to investigate wider aspects of perception and cognition. Despite this, few studies have addressed whether synaesthesia is linked to more widespread differences in perception that extend beyond the synaesthetic experience itself. In this talk, I will discuss findings from our own neuroimaging and psychophysical investigations examining perceptual processing in synaesthesia. This will include findings from a voxel-based morphometry study in grapheme- and tone-colour synaesthesia showing that these variants of synaesthesia are associated with increased gray matter volume in posterior fusiform gyrus but a reduction in MT/V5, and psychophysical studies demonstrating differences between grapheme-colour synaesthetes and non-synaesthetes in their processing of colour and motion. I will argue that synaesthesia for colour is linked to wider perceptual manifestations than the synaesthetic experience and discuss potential mechanisms that may contribute to this.
"Unravelling grapheme-colour synaesthesia: From brain wiring to multisensory perception"
F N Newell
During the course of development, sensory systems become structurally and functionally differentiated, through a combination of genetically programmed and experience-dependent processes. Synaesthesia provides a unique model in which to investigate the interplay between these processes on brain and perceptual function. Several studies have provided insight into the neural basis of, in particular, grapheme-colour synaesthesia but, curiously, there is quite a degree of variability of findings across these studies: some have failed to find any difference in activation in colour regions, others have found increased activation in non-visual areas, e.g. in parietal cortex, whilst we recently found functional differences in other brain regions in synaesthetes, along with multiple regions across the brain where synaesthetes have greater volumes of grey or white matter. These findings suggest that connectivity differences may be more extensive in the brains of synaesthetes than previously thought. Moreover, our findings that multiple types of synaesthesia co-occur within individuals and families, that it is associated with other perceptual characteristics, such as multisensory integration, vivid imagery, and sensory sensitivity, and that differences in sensory function are observed earlier in brain processing than predicted, suggest a greater role of genetic influences on the multisensory brain.
"Neural basis of grapheme-colour synaesthesia"
Grapheme-colour synaesthetes arbitrarily associate a specific colour to each letter or number. As early as the nineteenth century, appealing theories proposed that synaesthetes exhibit extra neuronal connections between the neural centres responsible for grapheme identification and those related to colour perception, leading to spurious activation of ‘colour areas’ by graphemes. Tests of this hypothesis with brain-imaging techniques have delivered mixed results so far, with only some of the studies indeed suggesting activation of colour areas by achromatic graphemes and related increased structural connectivity in synaesthetes. We observed no activation of ‘colour areas’ by graphemes in 10 synaesthetes, whatever the strength of their synaesthetic associations, and no structural difference between synaesthetes and 25 control subjects in the ‘colour regions’. The localizationist conception of visual processing is therefore too simplistic to account for the synaesthetic experience, and further research should look for distributed correlates of synaesthetic colours. Moreover, we did discover structural differences between synaesthetes and controls: synaesthetes had more white matter in the retrosplenial cortex bilaterally. The key to synaesthetic colour experience might not lie in the colour system, but may be related to the complex construction of meaning by the brain, involving not only perception, but language, memory and emotion.
"What is the neural basis of synesthetic experiences?"
H Scholte, R Rouw
In this presentation, we discuss neuroimaging data on the neural basis of synesthetic experiences. First, in accordance with its sensory nature, it seems that a particular type of synesthetic experience is related to the corresponding sensory brain areas. Moreover, we have found with fractional anisotropy (FA) and voxel-based morphometry (VBM) analyses that individual differences in the highly subjective experiences are related to distinct neural mechanisms. Particularly, the outside-world or ‘projector’ experience is related to brain areas involved in perceiving and acting in the outside world, whereas the in-the-mind or ‘associator’ experience is related to the hippocampus. A review of neuroimaging results in synesthesia research furthermore shows that a network of brain areas rather than a single brain area underlies synesthesia. Six brain regions of overlapping results emerge. These regions could be related to the different cognitive processes that together shape the synesthetic experience. Finally, I will discuss that while individual differences in brain anatomy are relevant for identifying the brain systems involved in synesthesia, they do not address the structure of the neural processes that result in synesthetic experience.
"Observing and understanding action: A hierarchical approach"
Watching other people's actions provides a critical way to learn about other's intentions and about the physical world. A broad network of brain regions which respond to observation of actions has been identified, but the roles and cognitive contributions of different components of this network are still debated. I will argue that action understanding must be considered in a hierarchical framework, in which different representations of action kinematics, goals and intentions can exist simultaneously. I present data from behavioural, fMRI and new eye-tracking experiments to support these claims.
"Artificial agents to study social perception"
Artificial agents, thanks to the controlled variations in their appearance and behaviours, provide useful tools to test hypotheses about action understanding. I used these agents to investigate one theoretical framework, motor resonance, which is defined, at the behavioural and neural levels, as the automatic activation of motor control systems during perception of actions, and is particularly relevant for the motor foundations of embodied social interactions. In a series of experiments, we found that the perception of biological motion is influenced by the anthropomorphism of the computer avatar used to render the motion, but this effect is absent in autistic children. We reported that while perceptual processes in the human occipital and temporal lobes are more strongly engaged when perceiving a humanoid robot's action than a human action, the activation of higher-order representations depends more strongly on attentional processes for artificial agents than for human agents. Finally, investigation of android action perception in a repetition-priming fMRI experiment has led us to explain the Uncanny Valley phenomenon within the predictive coding framework. Altogether, these studies using artificial agents offer valuable insight into perceptual processes and in particular motor resonance.
"Like Me? Investigating the role of experience in action perception"
E S Cross
How do we understand other people’s actions? A dominant view holds that action understanding occurs via a direct matching process that automatically maps observed actions onto one’s own motor system. It has been further argued that the neural basis of this direct matching mechanism lies within the ‘mirror system’, comprising inferior frontal and parietal cortices of the human brain. A series of functional magnetic resonance imaging experiments examine this proposal by probing how different kinds of subjective and objective action experience influence action perception. Using complex action stimuli (drawn from the realms of dance, contortion and robotics), in concert with in-depth training paradigms and functional neuroimaging, these experiments evaluate how participants' prior experience or subjective evaluation of an action is manifest in neural and behavioural activity whilst perceiving that action. Findings from this work suggest that direct matching theories of action understanding within the mirror system require updating to take into account how an observer's experience influences action perception.
"Role of prior information in action understanding"
Explaining or predicting the behaviour of our conspecifics requires the ability to infer the intentions that motivate it. Such inferences are assumed to rely on two types of information: (1) the sensory information conveyed by movement kinematics and (2) the observer’s prior expectations – acquired from past experience or derived from prior knowledge. However, the respective contributions of these two sources of information are still controversial. This controversy could stem in part from the fact that “intention” is an umbrella term that may embrace various sub-types, each being assigned different scopes and targets. We hypothesized that variations in the scope and target of intentions may account for variations in the contribution of visual kinematics and prior knowledge to the intention inference process. To test this hypothesis, four behavioural experiments were conducted in which participants were instructed to identify different types of intention: motor intentions (i.e. the simple goal of a motor act), superordinate intentions (i.e. the general goal of a sequence of motor acts), or social intentions (i.e. intentions accomplished in a context of reciprocal interaction). For each of the above-mentioned intentions, we also varied (1) the amount of visual information available from the action scene and (2) participants’ prior expectations concerning the intention that was more likely to be accomplished. First, we showed that intentional judgments depend on a consistent interaction between visual information and participants’ prior expectations. Moreover, we demonstrated that this interaction varied according to the type of intention to be inferred, with participants’ priors exerting a greater effect on the inference of social and superordinate intentions, to the detriment of perceptual evidence. The results are discussed by appealing to the specific properties of each type of intention considered.
"Perceiving control over joint action outcomes"
Research on agency, the feeling of control over one's actions and their consequences, has so far exclusively addressed individual performance. However, when individuals perform actions together they also need to assess their individual contributions to joint action outcomes. I will report results from an experiment that addressed agency for individual contributions to joint action in a pole-balancing task. This task required continuous coordination between the two hands of two actors (joint action) or the two hands of one actor (individual condition). Agency judgments were compared between these two conditions. The results demonstrate that exclusivity was the main factor that affected perceived agency during joint action, whereas performance parameters governed perceived agency during individual action.
"Gestalts emerging again 100 years later: A modern view on a radical vision"
As a radical alternative to elementalism and associationism, Gestalt theory maintained that experienced objects are fundamentally different from collections of sensations. Gestalts are dynamic structures in experience that determine what will be wholes and parts, based on continuous “whole-processes” in the brain. Already in 1912, Wertheimer hypothesized a physiological short circuit to explain phi motion. Köhler compared the dynamics of self-organization in the brain to the tendency of physical systems to approach maximum order with minimal energy. He also measured cortical currents to support his electrical field theory, but experiments by Lashley and Sperry in the 1950s provided stronger counterevidence. After Hubel and Wiesel’s discovery of simple and complex cells in primary visual cortex, neuroscience assumed a predominantly reductionist, elementalist approach. Gestalt phenomena have now become of central interest again, but how can they be understood within our current views of how the brain operates? Can we decode Gestalts in the fMRI signals somewhere along the visual system’s hierarchy? Do they emerge in the interplay between bottom-up processing, lateral connections, and feedback processing, as synchronization of EEG activity between brain areas, or as traveling waves across the whole brain? These questions are addressed in the symposium on “Gestalts in the brain”.
"How Gestalt rules constrain the spread of attention in visual cortex"
P Roelfsema, L Stanisor, A Wannig
Visual attention can select spatial locations, features and objects. Theories of object-based attention suggest that attention enhances the representation of all parts of an object, even parts that are not task-relevant. However, the automaticity of the spread of attention to parts of an object that are not task-relevant has been disputed, because previous studies did not always rule out the possibility of strategic attention shifts. Here we investigated whether attention spreads automatically to task-irrelevant features in three macaque monkeys by monitoring neuronal activity in area V1 with chronically implanted electrode arrays. We trained the monkeys to make eye movements to one of two (relevant) image elements and also presented two task-irrelevant image elements that could be grouped with the relevant elements or not. One of these irrelevant image elements was placed in the receptive field of the V1 neurons to investigate whether attentional response modulation spreads from the relevant to the irrelevant elements according to a number of Gestalt-grouping cues. Our first experiment tested the spread of attention from relevant to irrelevant contours that were either collinear or orthogonal (good continuation). The second experiment tested whether attention spreads from relevant to irrelevant elements with the same or a different colour (similarity). Our third experiment tested the combined influence of colour similarity and collinearity, and the fourth experiment tested the effect of element motion in the same or in a different direction (common fate). When the task-irrelevant image elements in the receptive field were grouped with one of the relevant contour elements, the selection of this element for an eye movement response influenced V1 activity. Activity was stronger if the eye-movement target was grouped with the element in the receptive field.
In contrast, the effects of eye movement selection were comparatively weak if the relevant and irrelevant image elements were not related by grouping cues. In addition we found that the effects of grouping cues were additive: the strength of the attentional spread in case of grouping by collinearity and colour similarity was the sum of the spread caused by either grouping cue alone. We conclude that enhanced neuronal activity spreads automatically from attended image elements to elements that are not yet attended but are related to them by Gestalt grouping cues. Our results support the hypothesis that enhanced neuronal activity can highlight all the image elements that belong to a single perceptual object, and that it can thereby act to bind them together in perception.
"The neural instantiation of Gestalt principles as uncovered by lesion studies"
The Gestaltists held that figure-ground perception occurs before stored object representations are accessed. On this view, figure-ground processes first determine where a shaped entity (the figure) lies with respect to a border; the figure then accesses memory representations. The ground side of the border lacks shape, and therefore cannot access shape memories. This traditional theory replicates figure-ground phenomenology, yet phenomenology does not necessarily illuminate process. Indeed, recent evidence indicates that a figure is more likely to be perceived on that side of a border where the parts of well-known objects are present in their familiar spatial configuration rather than in a novel arrangement. Such results support the view that properties of objects that might be perceived on opposite sides of borders are assessed in a fast pass of processing that reaches high levels, including those representing the spatial configuration of well-known objects; properties on opposite sides of borders compete; the winner is perceived as the shaped figure, the loser as a shapeless ground. This view is consistent with neurophysiological evidence and is supported by tests of visual agnosics, amnesics, and non-brain-damaged individuals, which also reveal a critical role for feedback, supporting a dynamic view of figure-ground perception.
"Brain-decoding fMRI reveals the content of neural representations underlying visual Gestalts"
H Op De Beeck
Functional imaging in humans has been very useful to highlight which brain areas are active when global patterns or Gestalts are perceived. However, psychologists and cognitive scientists are not primarily interested in where certain visual properties are computed, but rather in how these properties are computed and the associated content of the neural representations. For example, to understand why “the whole is different from the sum of its parts”, we need to investigate the properties of the representations of the whole and the parts. Recently, it has been shown that the content of neural representations is accessible by the application of multi-voxel pattern analysis methods to functional imaging data. I will describe how these methods can elucidate the representations of parts and wholes and the relationship among them. This will be illustrated with several case studies, including the configural superiority effect and the representation of scenes composed of multiple objects.
"Restless minds, wandering brains: The neurodynamics of visual awareness"
C Van Leeuwen
Brain activity is, to a large extent, spontaneous activity. Large-scale electrocortical measurement (scalp EEG or MEG) shows that this activity is characterized by a great degree of variability. Yet this activity is far from random; it reveals a patterned structure in space and time. I will discuss the spatiotemporal nature of these patterns and address the following two questions: does spontaneous activity share some of its characteristics with patterns of EEG (or MEG) activity that are evoked by stimulation, and can we identify a meaningful relationship between stimulus information and the properties of evoked activity patterns? I will present evidence that in evoked activity there are intervals in which synchronized activity spreads through brain regions. The synchronized activity takes the form of traveling or standing waves. The duration of these intervals corresponds to the amount of information in the visual stimulus pattern: the more information in the pattern, the longer the interval. I propose that the intervals reflect the time needed to communicate information computed within a brain region to the rest of the brain, and that this activity reflects our awareness of a visual stimulus pattern.
"From perception to action: The role of ongoing and evoked activity"
We have studied the spatio-temporal organization of ongoing and evoked coherent activity in neuronal assemblies and the way it affects the actual behavior. In my talk I will bridge the gap between the recordings of single neurons and the recordings of large populations of neurons: from intracellular recording to LFP, VSD, EEG & fMRI; from sensory to motor processing; from anesthetized cat to alert human. We found that ongoing activity encompasses a set of dynamically switching cortical states, including the orientation pinwheel structure. The neuron is most likely to fire when this pattern (cortical state) emerges in the area surrounding the neuron. Following visual stimulation, there is a strong decrease in the correlation between the evoked cell and the surrounding neuronal population. So while in the absence of a stimulus the cortical population works together as a highly coordinated group, in the presence of a stimulus each cell tends to go its own way and follow the dictates of its receptive field properties. Furthermore, the ongoing activity affects the behavioral response of the monkey, indicating that the brain does not ‘average out’ the variability found in cortical evoked activity. Rather, this variability has a direct impact on the manifest behavior.
"Probabilistic computation: A possible functional role for spontaneous activity in the cortex"
J Fiser, P Berkes, G Orban, M Lengyel
Although a number of recent behavioral studies have implied that the brain maintains probabilistic internal models of the environment for perception, motor control, and higher-order cognition, the neural correlates of such models have not been characterized so far. To address this issue, we introduce a new framework with two key ingredients: the “sampling hypothesis” and spontaneous activity as a computational factor. The sampling hypothesis proposes that the cortex represents and computes with probability distributions by sampling from these distributions, and that neural activity reflects these samples. The second part of the proposal posits that spontaneous activity represents the prior knowledge of the cortex, based on internal representations of the outside world and internal states. First, I will describe the reasoning behind the proposals and the evidence supporting them, and will derive a number of empirically testable predictions from the framework. Next, I will provide some new results that confirm these predictions in both the visual and the auditory cortices. Finally, I will show how this framework can handle previously reported observations about trial-to-trial variability and contrast-independent coding. These results provide a general functional interpretation of the surprisingly high spontaneous activity in the sensory cortex.
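The logic of the sampling hypothesis can be illustrated with a toy conjugate-Gaussian model (a minimal sketch in Python; all parameters and variable names are hypothetical and not taken from the authors' work): spontaneous activity corresponds to samples from the prior, while evoked activity corresponds to samples from the posterior obtained after a noisy observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D stimulus feature (e.g. orientation); illustrative values.
prior_mean, prior_var = 0.0, 4.0   # "spontaneous activity" ~ samples from the prior
noise_var = 1.0                    # sensory noise variance

def posterior_samples(observation, n=10000):
    """Conjugate Gaussian update; the samples stand in for evoked activity."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return rng.normal(post_mean, np.sqrt(post_var), size=n)

# Without a stimulus, "activity" samples reflect the prior alone;
# with a stimulus, sampling sharpens around the posterior mean.
spontaneous = rng.normal(prior_mean, np.sqrt(prior_var), size=10000)
evoked = posterior_samples(observation=3.0)
```

In this caricature, the framework's prediction that spontaneous activity mirrors the prior shows up as the spontaneous samples being broader and centred on the prior mean, while evoked samples concentrate near the posterior mean.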
"Probing perceptual consequences of ongoing activity variations"
Using functional neuroimaging and sparse event-related paradigms, we have assessed the functional impact of spontaneous fluctuations of ongoing brain activity on evoked neural responses and human perceptual performance. We used sensory probes that could be either ambiguous with respect to perceptual categories (faces) or peri-liminal for a given feature (visual motion coherence). In both instances, fluctuations in the ongoing signal of accordingly specialized brain regions (FFA, hMT+) biased how upcoming stimuli were perceived. Moreover, the relation between evoked and ongoing activity was not simply additive but showed an interaction with perceptual outcome. This latter observation questions the logic of event-related averaging, in which responses are assumed to be unrelated to the level of pre-stimulus activity. We have further analyzed the functional connotation of the imaging signal by analyzing false-alarm trials. Counter to the notion that this signal is a proxy of sensory evidence, false alarms were preceded by especially low signal, suggesting that it codes precision rather than percept. A theoretical framework that is compatible with our observations comes from the family of predictive-coding models. Our findings underline that ongoing activity fluctuations are functionally relevant and should therefore not be left unaccounted for, as they are in mainstream data analysis.
"Baseline MEG activity fluctuations in decision-related regions bias upcoming perceptual choice"
F De Lange, D Rahnev, T Donner, H Lau
Based on noisy sensory evidence, our brain “decides” what we see. While it has often been assumed that this decision process is bottom-up and passively driven by input from the sensory areas, it is becoming increasingly clear that perception is an active inferential process, in which prior expectations are combined with sensory input. The result of this inference process is what constitutes the contents of our awareness. I will present recent data that show how perception is biased by both experimentally induced and spontaneous activity fluctuations in decision-related areas. Participants performed a visual motion discrimination task, while neural activity was measured using magneto-encephalography (MEG). On some trials, subjects were given a cue, informing them about the likely direction of upcoming motion. This expectation cue induced strong baseline shifts in decision-related activity during the interval before motion stimulus onset, and strongly biased perceptual choice. Interestingly, in the absence of any cue, spontaneous fluctuations in this decision-related baseline activity also strongly biased upcoming perceptual choice. Together, these results provide a neural mechanism for how implicit and explicit expectations shape upcoming perceptual choice.
"Inattentional inflation of subjective visibility reflected by spontaneous pre-stimulus neural activity"
Attention is known to boost perceptual capacity: inattentional and change blindness experiments show that unattended events are often missed. However, subjects are usually surprised at how poorly they perform in these tasks, a reaction which suggests that they may have an inflated subjective impression of vividly perceiving the individual unattended objects. Through signal detection theoretic analysis, we showed that spatial cuing led to a conservative bias in detection, as well as lowered subjective ratings in discrimination (although it boosted signal processing capacity as expected). Based on these findings, we predicted and confirmed a negative relationship between subjective ratings in a discrimination task and pre-stimulus fMRI activity in the dorsal attention network. Intracranial EEG recordings in presurgical epilepsy patients showed similar effects. These results point to a somewhat paradoxical notion: inattention can lead to relatively liberal subjective perceptual decision making (higher hit rates and subjective ratings). We suggest this may be because inattention distorts the internal representations of signal strength statistics.
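The signal-detection-theoretic distinction the abstract relies on, sensitivity (d') versus response criterion (c), can be sketched in a few lines of Python (the hit and false-alarm rates below are invented for illustration, not the study's data): two conditions can have nearly identical sensitivity while one shows a liberal criterion, the pattern attributed here to inattention.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf  # probit (inverse standard normal CDF)

def sdt(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: sensitivity d' and criterion c.
    c > 0 is conservative (fewer 'yes' responses), c < 0 is liberal."""
    d_prime = Z(hit_rate) - Z(fa_rate)
    c = -0.5 * (Z(hit_rate) + Z(fa_rate))
    return d_prime, c

# Hypothetical rates: same underlying sensitivity, different bias.
d_att, c_att = sdt(hit_rate=0.80, fa_rate=0.20)  # attended: neutral criterion
d_un,  c_un  = sdt(hit_rate=0.90, fa_rate=0.35)  # unattended: liberal (c < 0)
```

Decomposing performance this way is what allows the claim that inattention shifts the criterion (more hits and higher ratings) without a corresponding gain in d'.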
"The effect of brain state on variability in visual responses"
M Scholvinck, A Benucci, K Harris, M Carandini
Our perception of visual stimuli can vary from moment to moment, depending on factors such as wakefulness [Weissman et al, 2006, Nat Neurosci, 9(7), 971-978] and attention [Ress et al, 2000, Nat Neurosci, 3(9), 940-945]. I will argue that such differences in ‘brain state’ also affect trial-to-trial variability in responses of visual cortical neurons. Brain state generally varies between a ‘passive’ state dominated by low frequency EEG or LFP power, and an ‘active’ state characterised by increased neuronal firing. Using recordings from the visual cortex of anesthetised cats, I show that when the brain is in an active state, neurons respond independently to sequences of gratings and trial-to-trial variability is low. Conversely, during a passive brain state neurons are mostly engaged in synchronous network activity and respond less reliably to the stimulus. Current efforts aim to extend these findings to awake mice running on a spherical treadmill while being shown visual stimuli; I investigate several factors indicative of brain state, including low frequency LFP power and running behaviour, to explain trial-to-trial variability in neuronal responses to flickering gratings. Together, these studies suggest that brain state is crucial in shaping the brain’s response to a visual stimulus on a trial-to-trial basis.
"Space representation across eye movements"
Localization of targets in the environment is of fundamental importance in everyday life. Eye movements challenge this task because they continuously induce a shift of the retinal image of the outside world. Nevertheless, we perceive the world as being stable. Contrary to introspection, however, visual stability is not perfect. In recent years, many studies have demonstrated spatial mislocalization of stimuli flashed during eye movements. During visually guided slow eye movements (smooth pursuit and slow phases of optokinetic nystagmus, OKN), perceived locations are shifted in the direction of the eye movement. In a series of experiments, we asked (i) what the neural basis of this perceptual phenomenon is and (ii) whether this mislocalization also occurs during open-loop eye movements. To answer the latter question, human observers had to localize briefly flashed targets around the time of pursuit initiation or during a 300 ms blanking of the pursuit target during steady-state pursuit. Our data clearly show that mislocalization starts well before the onset of the eye movement and that it can also be found during the pursuit gap. The observed perceptual effects could result from a global shift of visual receptive fields (RFs) during slow eye movements with respect to their location during fixation. To test this hypothesis, we performed neurophysiological recordings in area MT of the macaque during slow-phase OKN. Our results clearly show identical RF locations during fixation and OKN. Taken together, our psychophysical and neurophysiological results point towards an efference copy of the eye-movement signals as the neural source of the observed perceptual effects.
"Spatiotopic coding and remapping in humans"
D Burr, C Morrone
Saccades cause profound transient changes to vision, both to the spatial properties of receptive fields of parietal cortex of the macaque monkey and to human perception. It remains unclear, however, how these transient events contribute to stability. One critical, but largely overlooked, aspect is that as receptive fields shift pre-saccadically in space to respond to the “new receptive field”, their responses are also delayed in time, leading to a receptive field that is oriented in space-time. We studied the transient peri-saccadic alterations of RFs by measuring mislocalization of pairs of brief visual stimuli presented successively at the same or different positions, one before and one after the saccade. When two bars were displayed within 40-120 ms of each other, no mislocalization occurred for either, even when the saccade caused them to be separated on the retina by up to 20 deg. We postulate that the interaction between the bars is mediated by a common neuronal mechanism that responds to both pre- and post-saccadic stimuli, producing peri-saccadic RFs that extend in both space and time to encompass both the “future” and “current” fields. These fields generate simultaneous responses to the asynchronous bar stimuli, which therefore become perceptually fused. We suggest that these mechanisms, which result in perceptual compression in both space and time, are sufficient to achieve perceptual stability.
"The search for spatiotopic representations in the human visual pathways"
Objects are initially mapped on the retina, but actions are carried out in a body-based coordinate frame. The search for an extra-retinotopic representation has therefore focused on areas implicated in visuo-motor action. An extra-retinotopic representation, however, may also be useful for object recognition and memory. We have been searching for non-retinotopic representations throughout the human visual system using behavioral and imaging techniques. Our results indicate that information on attended objects is accumulated across saccades, but trans-saccadic image matching is purely retinotopic. Spatiotopic encoding of a target is evident in the rapid inhibition of return (IOR) elicited by a cue positioned at the same screen location as the final target in a double-step saccade task. These findings are consistent with functional imaging results, indicating that retinotopic mapping is the prevalent representation throughout the visual system and the cortical eye fields. A notable exception to this rule is the middle section of the intraparietal sulcus (IPS), which contains information about a future saccade target in (at least) a head-centered coordinate frame. We conclude that spatial attention (when tightly linked to oculomotor intention) can be allocated to target positions defined in extra-retinal coordinates, a process that is likely to involve the middle IPS.
"Control of spatial attention by the parietal and frontal cortex"
Spatial attention mechanisms in humans and in non-human primates involve a large cortical network which encompasses discrete areas within the frontal and parietal cortices. Using reversible inactivation and single-unit recording techniques, we report that both the macaque frontal eye fields (FEF) and the lateral intraparietal area (LIP) exert top-down influence on the spatial and temporal deployment of visual attention. However, there appear to be differences in the specific contributions of the two areas, with the FEF possibly playing a relatively more important role in endogenous control of attention orienting while LIP might be more concerned with representing the exogenous salience of stimuli.
"Sensorimotor processing of space in the medial posterior parietal cortex"
P Fattori, R Breveglieri, C Galletti
In the dorsal visual stream, visuo-spatial information is processed for guiding actions. Visual space is represented in a retinotopic frame of reference in striate and extrastriate cortical areas, including V6, and in more complex reference frames in the medial posterior parietal cortex (PPC). In area V6A, in the medial PPC, visual space is encoded by retinotopically organized, gaze-dependent visual neurons and by “real-position” cells, whose visual receptive fields are stable in space regardless of eye movements. Area V6A is also involved in the control of prehension. It hosts reach-related neurons tuned for the direction of reach. In a number of cells, the spatial tuning is dependent on the location of the reaching target relative to the fixation point; in others, it is dependent on the target's spatial coordinates. Spatial attention drives V6A neurons and is likely useful for highlighting the position in space of objects, either for gazing or for manual reaching. Altogether, these data indicate that in the monkey medial PPC there is a combined representation of space and action that may be used by the effectors (eyes and arm) in visually guided actions. Interestingly, areas homologous to monkey V6 and V6A have recently been identified also in the human medial PPC.
"Egomotion, near space and obstacles: The role of human V6"
A Smith, V Cardin
Optic flow is specified in terms of local 2D velocities but egomotion occurs in 3D space. During egomotion, points near the centre of expansion of the flow field tend to be distant while those in the periphery are closer, creating gradients of horizontal binocular disparity. To assess whether the brain combines disparity gradients with optic flow when encoding egomotion, stereoscopic depth gradients were applied to expanding dot patterns during 3T MRI scanning. The gradients were radially symmetrical, forming a cone with the apex at the centre of expansion. The depth cues were either consistent with egomotion (concave disparity cone, as in a tunnel) or inconsistent (convex cone with central dots near, peripheral dots far). BOLD activity was compared in various pre-defined cortical visual areas. One area, V6, responded much more strongly to consistent than inconsistent depth added to expanding flow. All other visual areas examined (V1-V3B, V7, MT, MST, VIP, CSv) responded well to the stimuli but were indifferent to their depth-flow relationship. A possible interpretation is that MST, VIP and CSv perform general analysis of optic flow while V6 is specialized for encoding egomotion in the presence of objects in near space, perhaps to facilitate avoidance of obstacles.
© 2010 Cerco. Last change: 28/08/2011 19:39:08.