{"title": "Who Does What? A Novel Algorithm to Determine Function Localization", "book": "Advances in Neural Information Processing Systems", "page_first": 3, "page_last": 9, "abstract": null, "full_text": "Who Does What? A Novel Algorithm to \n\nDetermine Function Localization \n\nRanit Aharonov-Barki \n\nInterdisciplinary Center for Neural Computation \nThe Hebrew University, Jerusalem 91904, Israel \n\nranit@alice.nc.huji.ac.il \n\nIsaac Meilijson and Eytan Ruppin \n\nSchool of Mathematical Sciences \nTel-Aviv University, Tel-Aviv, Israel \n\nisaco@math.tau.ac.il, ruppin@math.tau.ac.il \n\nAbstract \n\nWe introduce a novel algorithm, termed PPA (Performance Prediction \nAlgorithm), that quantitatively measures the contributions of elements \nof a neural system to the tasks it performs. The algorithm identifies the \nneurons or areas which participate in a cognitive or behavioral task, given \ndata about performance decrease in a small set of lesions. It also allows \nthe accurate prediction of performances due to multi-element lesions. \nThe effectiveness of the new algorithm is demonstrated in two models \nof recurrent neural networks with complex interactions among the ele(cid:173)\nments. The algorithm is scalable and applicable to the analysis of large \nneural networks. Given the recent advances in reversible inactivation \ntechniques, it has the potential to significantly contribute to the under(cid:173)\nstanding of the organization of biological nervous systems, and to shed \nlight on the long-lasting debate about local versus distributed computa(cid:173)\ntion in the brain. \n\n1 \n\nIntroduction \n\nEven simple nervous systems are capable of performing multiple and unrelated tasks, often \nin parallel. Each task recruits some elements of the system (be it single neurons or cortical \nareas), and often the same element participates in several tasks. 
This poses a difficult challenge when one attempts to identify the roles of the network elements and to assess their contributions to the different tasks. The importance of single neurons or cortical areas to specific tasks is usually assessed either by measuring the deficit in performance after a lesion of a specific area, or by recording the activity during behavior, assuming that areas which deviate from baseline activity are more important for the task performed. These classical methods suffer from two fundamental flaws: First, they do not take into account the probable case that there are complex interactions among elements in the system. E.g., if two neurons have a high degree of redundancy, lesioning either one alone will not reveal its influence. Second, they are qualitative measures, lacking quantitative predictions. Moreover, the very nature of the contribution of a neural element is quite elusive and ill-defined. In this paper we propose both a rigorous, operative definition for the neuron's contribution and a novel algorithm to measure it. \n\nIdentifying the contributions of elements of a system to varying tasks is often used as a basis for claims concerning the degree of distribution of computation in that system (e.g. [1]). The distributed representation approach hypothesizes that computation emerges from the interaction between many simple elements, and is supported by evidence that many elements are important in a given task [2, 3, 4]. The local representation hypothesis suggests that activity in single neurons represents specific concepts (the grandmother-cell notion) or performs specific computations (see [5]). This question of distributed versus localized computation in nervous systems is fundamental and has attracted ample attention. However, there seems to be a lack of a unifying definition for these terms [5]. 
The ability of the new algorithm suggested here to quantify the contribution of elements to tasks allows us to deduce both the distribution of the different tasks in the network and the degree of specialization of each neuron. \n\nWe applied the Performance Prediction Algorithm (PPA) to two models of recurrent neural networks: The first model is a network hand-crafted to exhibit redundancy, feedback and modulatory effects. The second consists of evolved neurocontrollers for behaving autonomous agents [6]. In both cases the algorithm yields measures which are highly consistent with what is qualitatively known a priori about the models. The fact that these are recurrent networks, and not simple feed-forward ones, suggests that the algorithm can be used in many classes of neural systems which pose a difficult challenge for existing analysis tools. Moreover, the proposed algorithm is scalable and applicable to the analysis of large neural networks. It can thus make a major contribution to studying the organization of tasks in biological nervous systems, as well as to the long-debated issue of local versus distributed computation in the brain. \n\n2 Indices of Contribution, Localization and Specialization \n\n2.1 The Contribution Matrix \n\nAssume a network (either natural or artificial) of N neurons performing a set of P different functional tasks. For any given task, we would like to find the contribution vector c = (c_1, ..., c_N), where c_i is the contribution of neuron i to the task in question. We suggest a rigorous and operative definition for this contribution vector, and propose an algorithm for its computation. \n\nSuppose a set of neurons in the network is lesioned and the network then performs the specified task. The result of this experiment is described by the pair <m, p_m>, where m is an incidence vector of length N, such that m(i) = 0 if neuron i was lesioned and 1 if it was intact. 
p_m is the performance of the network divided by the performance in the baseline case of a fully intact network. \n\nLet the pair <f, c>, where f is a smooth monotone non-decreasing^1 function and c a normalized column vector such that sum_{i=1}^N |c_i| = 1, be the pair which minimizes the following error function: \n\nE = (1/2^N) sum_{m} [f(m . c) - p_m]^2 .    (1) \n\n^1 It is assumed that as more important elements are lesioned (m . c decreases), the performance (p_m) decreases; hence the postulated monotonicity of f. \n\nThis c will be taken as the contribution vector for the task tested, and the corresponding f will be called its adjoint performance prediction function. \n\nGiven a configuration m of lesioned and intact neurons, the predicted performance of the network is the sum of the contribution values of the intact neurons (m . c), passed through the performance prediction function f. The contribution vector c accompanied by f is optimal in the sense that this predicted value minimizes the mean square error relative to the real performance, over all possible lesion configurations. \n\nThe computation of the contribution vectors is done separately for each task, using some small subset of all 2^N possible lesioning configurations. The training error E_t is defined as in equation (1), but averaging only over the configurations present in the training set. \n\nThe Performance Prediction Algorithm (PPA): \n\n\u2022 Step 1: Choose an initial normalized contribution vector c for the task. If there is no bias for a special initial choice, set all entries to random values. \n\nRepeat steps 2 and 3 until the error E_t converges or a maximal number of steps has been reached: \n\n\u2022 Step 2: Compute f. Given the current c, perform isotonic regression [7] on the pairs <m . c, p_m> in the training set. Use a smoothing spline [8] on the result of the regression to obtain the new f. \n\n\u2022 Step 3: Compute c. 
Using the current f, compute new c values by training a perceptron with input m, weights c and transfer function f. The output of the perceptron is exactly f(m . c), and the target output is p_m. Hence training the perceptron yields a new vector c that, given the current function f, minimizes the error E_t on the training set. Finally, re-normalize c. \n\nThe output of the algorithm is thus a contribution value for every neuron, accompanied by a function, such that given any configuration of lesioned neurons, one can predict with high confidence the performance of the damaged network. Thus, the algorithm achieves two important goals: a) It automatically identifies the neurons or areas which participate in a cognitive or behavioral task. b) The function f predicts the result of multiple lesions, allowing for non-linear combinations of the effects of single lesions^2. \n\nThe application of the PPA to all tasks defines a contribution matrix C, whose kth column (k = 1, ..., P) is the contribution vector computed using the above algorithm for task k, i.e. C_ik is the contribution of neuron i to task k. \n\n2.2 Localization and Specialization \n\nIntroducing the contribution matrix allows us to approach issues relating to the distribution of computation in a network in a quantitative manner. Here we suggest quantitative measures for localization of function and specialization of neurons. \n\nIf a task is completely distributed in the network, the contributions of all neurons to that task should be identical (full equipotentiality [2]). Thus, we define the localization L_k of task k as a deviation from equipotentiality. Formally, L_k is the standard deviation of column k of the contribution matrix divided by the maximal possible standard deviation: 
\n\nL_k = std(C_*k) / sqrt((N - 1)/N^2) .    (2) \n\n^2 The computation of c, involving a simple perceptron-based function approximation, implies the immediate applicability of the PPA to large networks, given well-behaved performance prediction functions. \n\nFigure 1: Hand-crafted neural network: a) Architecture of the network. Solid lines are weights, all of strength 1. Dashed lines indicate modulatory effects. Neurons 1 through 6 are spontaneously active (activity equals 1) under normal conditions. The performance of the network is taken to be the activity of neuron 10. b) The activation functions of the non-spontaneous neurons (neurons 7 and 8). The x axis is the input field and the y axis is the resulting activity of the neuron. Neuron 8 has two activation functions. If both neurons 2 and 3 are switched on, they activate a modulating effect on neuron 8 which switches its activation function from the inactive case to the active case. \n\nNote that L_k is in the range [0, 1], where L_k = 0 indicates full distribution and L_k = 1 indicates localization of the task to one neuron alone. The degree of localization of function in the whole network, L, is the simple average of L_k over all tasks. Similarly, if neuron i is highly specialized for a certain task, the row C_i* will deviate strongly from a uniform distribution, and thus we define S_i, the specialization of neuron i, analogously as \n\nS_i = std(C_i*) / sqrt((P - 1)/P^2) .    (3) \n\n3 Results \n\nWe tested the proposed indices on two types of recurrent networks. We chose to study recurrent networks because they pose an especially difficult challenge, as the output units also participate in the computation, and in general complex interactions among elements may arise^3. We begin with a hand-crafted example containing redundancy, feedback and modulation, and continue with networks that emerge from an evolutionary process. 
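To make the procedure concrete, the fitting loop of Section 2.1 and the indices of Section 2.2 can be sketched in code. This is a minimal NumPy illustration under stated simplifications, not the authors' implementation: a pool-adjacent-violators routine stands in for the isotonic-regression-plus-smoothing-spline of Step 2, and a plain gradient step (with the slope of f folded into the learning rate) stands in for the perceptron training of Step 3. All function and variable names are ours.

```python
import numpy as np

def pav(x, y):
    """Pool-adjacent-violators: least-squares non-decreasing fit of y on x.
    (A stand-in for the isotonic regression + smoothing spline of Step 2.)"""
    order = np.argsort(x)
    ys = y[order].astype(float)
    vals, wts, cnt = [], [], []
    for yi in ys:
        vals.append(float(yi)); wts.append(1.0); cnt.append(1)
        # pool adjacent blocks while the fit decreases
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals[-2:], wts[-2:], cnt[-2:] = [v], [w], [cnt[-2] + cnt[-1]]
    fit = np.empty_like(ys)
    fit[order] = np.repeat(vals, cnt)   # map fitted values back to input order
    return fit

def ppa(lesions, perf, n_iter=100, lr=0.2, seed=0):
    """Minimal PPA sketch. `lesions` is a T x N 0/1 incidence matrix (rows are
    configurations m), `perf` holds the T relative performances p_m. A plain
    gradient step replaces the perceptron training of Step 3, with the slope
    of f absorbed into the learning rate."""
    lesions = np.asarray(lesions, dtype=float)
    perf = np.asarray(perf, dtype=float)
    n_train, n_neurons = lesions.shape
    rng = np.random.default_rng(seed)
    c = rng.random(n_neurons)
    c /= np.abs(c).sum()                      # normalize: sum_i |c_i| = 1
    for _ in range(n_iter):
        f_of_mc = pav(lesions @ c, perf)      # Step 2: monotone fit of p_m vs m.c
        resid = f_of_mc - perf                # training-set residuals
        c -= lr * lesions.T @ resid / n_train # Step 3: reduce squared error E_t
        c /= np.abs(c).sum()                  # re-normalize
    return c

def localization(C):
    """L_k of Eq. (2): std of each column of C over the maximal possible std."""
    N = C.shape[0]
    return C.std(axis=0) / np.sqrt((N - 1) / N ** 2)

def specialization(C):
    """S_i of Eq. (3): the same measure applied to rows (over the P tasks)."""
    P = C.shape[1]
    return C.std(axis=1) / np.sqrt((P - 1) / P ** 2)
```

Given a training set of lesion configurations and measured performances, `ppa` returns a normalized contribution vector; stacking the vectors of all P tasks as columns gives the contribution matrix C fed to `localization` and `specialization`.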
The evolved networks are not hand-crafted; rather, their structure emerges as an outcome of the selection pressure to successfully perform the defined tasks. Thus, we have no prior knowledge about their structure, yet they are tractable models to investigate. \n\n^3 In order to single out the role of output units in the computation, lesioning was performed by decoupling their activity from the rest of the network, not by knocking them out completely. \n\n3.1 Hand-Crafted Example \n\nFigure 2: Results of the PPA: a) Contribution values obtained using three methods: the correlation of activity to performance, single-neuron lesions, and the PPA. b) Predicted versus actual performance using c and its adjoint performance prediction function f obtained by the PPA. Insert: the shape of f. \n\nFigure 1 depicts a neural network we designed to include potential pitfalls for analysis procedures aimed at identifying important neurons of the system (see details in the caption). Figure 2(a) shows the contribution values computed by three methods applied to this network. The first estimate was computed as the correlation between the activity of the neuron and the performance of the network^4. To allow comparison between methods, these values were normalized to sum to 1. The second estimate was computed as the decrease in performance due to lesioning of single neurons. Finally, we used the PPA, training on a set of 64 examples. 
Note that, as expected, the activity correlation method assigns a high contribution value to neuron 9, even though neuron 9 actually has no significance in determining the performance. Single lesions fail to detect the significance of neurons involved in redundant interactions (neurons 4-6). The PPA successfully identifies the underlying importance of all neurons in the network, even the subtle significance of the feedback from neuron 10. We used a small training set (64 out of the 2^10 possible configurations) containing lesions of either small (up to 20% chance for each neuron to be lesioned) or large (more than 90% chance of lesioning) degree. Convergence was achieved after 10 iterations. \n\nAs opposed to the two other methods, the PPA not only identifies and quantifies the significance of elements in the network, but also allows the prediction of performance under multi-element lesions, even those absent from the training set. The predicted performance following a given configuration of lesioned neurons is given by f(m . c), as explained in section 2.1. Figure 2(b) depicts the predicted versus actual performances on a test set containing 230 configurations of varying degrees (0-100% chance of lesioning). The correlation between the predicted value and the actual one is 0.9978, corresponding to a mean prediction error of only 0.0007. In principle, the other methods offer no straightforward way to predict performance, as is evident from the non-linear form of the performance prediction function (see insert of figure 2(b)). The shape of the performance prediction function depends on the organization of the network, and can vary widely between different models (results not shown here). \n\n3.2 Evolved Neurocontrollers \n\nUsing evolutionary simulations we developed autonomous agents controlled by fully recurrent artificial neural networks. 
High performance levels were attained by agents performing simple life-like tasks of foraging and navigation. Using various analysis tools we found a common structure of a command neuron switching the dynamics of the network between radically different behavioral modes [6]. \n\n^4 Neuron 10 was omitted in this method of analysis since it is by definition in full correlation with the performance. \n\nAlthough the command neuron mechanism was a robust phenomenon, the evolved networks did differ in the roles other neurons performed. When only limited sensory information was available, the command neuron relied on feedback from the motor units. In other cases no such feedback was needed, but other neurons performed some auxiliary computation on the sensory input. We applied the PPA to the evolved neurocontrollers in order to test its capabilities in a system on which we had previously obtained qualitative understanding, yet which is still relatively complex. \n\nFigure 3 depicts the contribution values of the neurons of three successful evolved neurocontrollers obtained using the PPA. Figure 3(a) corresponds to the neurocontroller of an agent equipped with a position sensor (see [6] for details), which does not require any feedback from the motor units. As can be seen, these motor units indeed receive contribution values near zero. Figures 3(b) and 3(c) correspond to neurocontrollers that strongly relied on motor feedback for their memory mechanism to function properly. The algorithm easily identifies their significance. In all three cases the command neuron receives high values, as expected. The performance prediction capabilities are extremely high, giving correlations of 0.9999, 0.9922 and 0.9967 for the three neurocontrollers, on a test set containing 100 lesion configurations of mixed degrees (0-100% chance of lesioning). 
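Such predicted-versus-actual comparisons reduce to correlating f(m . c) with the measured performances over a held-out set of lesion configurations. A sketch with hypothetical fitted quantities (the arrays below are illustrative stand-ins, not the evolved networks' data):

```python
import numpy as np

# Hypothetical fitted quantities (illustrative only):
c = np.array([0.5, 0.3, 0.2])              # contribution vector
f = lambda s: s ** 2                       # stand-in monotone prediction function
test_lesions = np.array([[1, 1, 1],        # rows: incidence vectors m of a test set
                         [1, 0, 1],
                         [0, 1, 1],
                         [1, 1, 0]])
test_perf = np.array([1.0, 0.49, 0.25, 0.64])   # measured performances p_m

predicted = f(test_lesions @ c)            # f(m . c) for every configuration
r = np.corrcoef(predicted, test_perf)[0, 1]
mse = np.mean((predicted - test_perf) ** 2)
```

Here `r` is the reported predicted-versus-actual correlation and `mse` the mean prediction error; in this toy setup the measured performances were generated by the fitted pair itself, so both are trivially perfect.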
We also obtained the degree of localization of each network, as explained in section 2.2. The values are 0.56, 0.35 and 0.47 for the networks depicted in figures 3(a), 3(b) and 3(c) respectively. These values are in good agreement with the qualitative descriptions of the networks we have obtained using classical neuroscience tools [6]. \n\nFigure 3: Contribution values of neurons in three evolved neurocontrollers: Neurons 1-4 are motor neurons. CN is the command neuron that emerged spontaneously in all evolutionary runs. \n\n4 Discussion \n\nWe have introduced a novel algorithm, termed PPA (Performance Prediction Algorithm), to measure the contribution of neurons to the tasks that a neural network performs. These contributions allowed us to quantitatively define an index of the degree of localization of function in the network, as well as of the task-specialization of the neurons. The algorithm uses data from performance measures of the network when different sets of neurons are lesioned. Theoretically, pathological cases can be devised where very large training sets are needed for correct estimation. However, it is expected that many cases are well-behaved and will demonstrate behaviors similar to the models we have used as test beds, i.e. that a relatively small subset suffices as a training set. It is predicted that larger training sets containing different degrees of damage will be needed to achieve good results for systems with higher redundancy and complex interactions. 
We are currently studying the nature of the training set needed to achieve satisfying results, as this in itself may reveal information on the types of interactions between elements in the system. \n\nWe have applied the algorithm to two types of artificial recurrent neural networks, and demonstrated that its results agree with our qualitative a priori notions and with classical qualitative analysis methods. We have shown that estimating the importance of system elements using simple activity measures and single lesions may be misleading. The new PPA is more robust, as it takes into account interactions of higher degrees. Moreover, it serves as a powerful tool for predicting the damage caused by multiple lesions, a feat that is difficult even when one can accurately estimate the contributions of single elements. The shape of the performance prediction function itself may also reveal important features of the organization of the network, e.g. its robustness to neuronal death. \n\nThe prediction capabilities of the algorithm can be used for regularization of recurrent networks. Regularization in feed-forward networks has been shown to improve performance significantly, and algorithms have been suggested for effective pruning [9]. However, networks with feedback (e.g. Elman-like networks) pose a difficult problem, as it is hard to determine which elements should be pruned. Since the PPA can be applied at the level of single synapses as well as single neurons, it suggests a natural algorithm for effective regularization: pruning the elements in order of their contribution values. \n\nRecently, a large variety of reversible inactivation techniques (e.g. cooling) has emerged in neuroscience. 
These methods alleviate many of the problematic aspects of the classical lesion technique (ablation), enabling the acquisition of reliable data from multiple lesions of different configurations (for a review see [10]). It is most likely that a plethora of data will accumulate in the near future. The sensible integration of such data will require quantitative methods to complement the available qualitative ones. The promising results achieved with artificial networks and the potential scalability of the PPA lead us to believe that it will prove extremely useful in obtaining insights into the organization of natural nervous systems. \n\nAcknowledgments \n\nWe gratefully acknowledge the valuable contributions of Ehud Lehrer, Hanoch Gutfreund and Tuvik Beker. \n\nReferences \n\n[1] J. Wu, L. B. Cohen, and C. X. Falk. Neuronal activity during different behaviors in aplysia: A distributed organization? Science, 263:820-822, 1994. \n\n[2] K. S. Lashley. Brain Mechanisms in Intelligence. University of Chicago Press, Chicago, 1929. \n\n[3] J. L. McClelland, D. E. Rumelhart, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2: Psychological and Biological Models. MIT Press, Massachusetts, 1986. \n\n[4] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554-2558, 1982. \n\n[5] S. Thorpe. Localized versus distributed representations. In M. A. Arbib, editor, Handbook of Brain Theory and Neural Networks. MIT Press, Massachusetts, 1995. \n\n[6] R. Aharonov-Barki, T. Beker, and E. Ruppin. Emergence of memory-driven command neurons in evolved artificial agents. Neural Computation, in press. \n\n[7] R. Barlow, D. Bartholomew, J. Bremner, and H. Brunk. Statistical Inference Under Order Restrictions. John Wiley, New York, 1972. 
\n\n[8] G. D. Knott. Interpolating Cubic Splines. Progress in Computer Science and Applied Logic, J. C. Cherniavsky, editor. Birkhauser, 2000. \n\n[9] R. Reed. Pruning algorithms - a survey. IEEE Trans. on Neural Networks, 4(5):740-747, 1993. \n\n[10] S. G. Lomber. The advantages and limitations of permanent or reversible deactivation techniques in the assessment of neural function. J. of Neuroscience Methods, 86:109-117, 1999. ", "award": [], "sourceid": 1799, "authors": [{"given_name": "Ranit", "family_name": "Aharonov-Barki", "institution": null}, {"given_name": "Isaac", "family_name": "Meilijson", "institution": null}, {"given_name": "Eytan", "family_name": "Ruppin", "institution": null}]}