Connectionism
First published Sun May 18, 1997; substantive revision Fri Aug 16, 2019
Connectionism is a movement in cognitive science that hopes to explain intellectual abilities using artificial neural networks (also known as “neural networks” or “neural nets”). Neural networks are simplified models of the brain composed of large numbers of units (the analogs of neurons) together with weights that measure the strength of connections between the units. These weights model the effects of the synapses that link one neuron to another. Experiments on models of this kind have demonstrated an ability to learn such skills as face recognition, reading, and the detection of simple grammatical structure.
Philosophers have become interested in connectionism because it promises to provide an alternative to the classical theory of the mind: the widely held view that the mind is something akin to a digital computer processing a symbolic language. Exactly how and to what extent the connectionist paradigm constitutes a challenge to classicism has been a matter of hot debate in recent years.
1. A Description of Neural Networks
A neural network consists of a large number of units joined together in a pattern of connections. Units in a net are usually segregated into three classes: input units, which receive information to be processed, output units where the results of the processing are found, and units in between called hidden units. If a neural net were to model the whole human nervous system, the input units would be analogous to the sensory neurons, the output units to the motor neurons, and the hidden units to all other neurons.
Here is a simple illustration of a simple neural net:
Each input unit has an activation value that represents some feature external to the net. An input unit sends its activation value to each of the hidden units to which it is connected. Each of these hidden units calculates its own activation value depending on the activation values it receives from the input units. This signal is then passed on to output units or to another layer of hidden units. Those hidden units compute their activation values in the same way, and send them along to their neighbors. Eventually the signal at the input units propagates all the way through the net to determine the activation values at all the output units.
The pattern of activation set up by a net is determined by the weights, or strength of connections between the units. Weights may be either positive or negative. A negative weight represents the inhibition of the receiving unit by the activity of a sending unit. The activation value for each receiving unit is calculated according to a simple activation function. Activation functions vary in detail, but they all conform to the same basic plan. The function sums together the contributions of all sending units, where the contribution of a unit is defined as the weight of the connection between the sending and receiving units times the sending unit’s activation value. This sum is usually modified further, for example, by adjusting the activation sum to a value between 0 and 1 and/or by setting the activation to zero unless a threshold level for the sum is reached. Connectionists presume that cognitive functioning can be explained by collections of units that operate in this way. Since it is assumed that all the units calculate pretty much the same simple activation function, human intellectual accomplishments must depend primarily on the settings of the weights between the units.
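The weighted-sum scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic squashing function, the function names, and the weight values are chosen for the example and are not taken from any particular connectionist model.

```python
import math


def logistic(x):
    """Squash a summed input into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_activation(weights, sending_activations, threshold=None):
    """Activation of one receiving unit.

    Each sending unit contributes (connection weight * its activation).
    The contributions are summed; if a threshold is given and the sum
    does not reach it, the activation is zero, otherwise the sum is
    squashed into the range (0, 1).
    """
    total = sum(w * a for w, a in zip(weights, sending_activations))
    if threshold is not None and total < threshold:
        return 0.0
    return logistic(total)


def layer_activations(weight_rows, sending_activations):
    """Activations of a whole layer; one row of weights per receiving unit."""
    return [unit_activation(row, sending_activations) for row in weight_rows]


def feed_forward(inputs, hidden_weights, output_weights):
    """Propagate input activations through one hidden layer to the outputs."""
    hidden = layer_activations(hidden_weights, inputs)
    return layer_activations(output_weights, hidden)


# Hypothetical weights: 2 input units, 2 hidden units, 1 output unit.
# Negative weights are inhibitory.
HIDDEN_WEIGHTS = [[0.5, -0.4], [-0.3, 0.8]]
OUTPUT_WEIGHTS = [[1.0, -1.0]]

outputs = feed_forward([1.0, 0.0], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
print(outputs)
```

Because every unit runs the same function, the behavior of the sketch is fixed entirely by the numbers in `HIDDEN_WEIGHTS` and `OUTPUT_WEIGHTS`, which is the point the paragraph makes about the importance of the weights.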
The kind of net illustrated above is called a feed forward net. Activation flows directly from inputs to hidden units and then on to the output units. More realistic models of the brain would include many layers of hidden units, and recurrent connections that send signals back from higher to lower levels. Such recurrence is necessary in order to explain such cognitive features as short-term memory. In a feed forward net, repeated presentations of the same input produce the same output every time, but even the simplest organisms habituate to (or learn to ignore) repeated presentation of the same stimulus. Connectionists tend to avoid recurrent connections because little is understood about the general problem of training recurrent nets. However, Elman (1991) and others have made some progress with simple recurrent nets, where the recurrence is tightly constrained.
2. Neural Network Learning and Backpropagation
Finding the right set of weights to accomplish a given task is the central goal in connectionist research. Luckily, learning algorithms have been devised that can calculate the right weights for carrying out many tasks (see Hinton 1992 for an accessible review). These fall into two broad categories: supervised and unsupervised learning. Hebbian learning is the best known unsupervised form. As each input is presented to the net, weights between nodes that are active together are increased, while those weights connecting nodes that are not active together are decreased. This form of training is especially useful for building nets that can classify the input into useful categories. The most widely used supervised algorithm is called backpropagation. To use this method, one needs a training set consisting of many examples of inputs and their desired outputs for a given task. This external set of examples “supervises” the training process. If, for example, the task is to distinguish male from female faces, the training set might contain pictures of faces together with an indication of the sex of the person depicted in each one. A net that can learn this task might have two output units (indicating the categories male and female) and many input units, one devoted to the brightness of each pixel (tiny area) in the picture. The weights of the net to be trained are initially set to random values, and then members of the training set are repeatedly exposed to the net. The values for the input of a member are placed on the input units and the output of the net is compared with the desired output for this member. Then all the weights in the net are adjusted slightly in the direction that would bring the net’s output values closer to the values for the desired output. For example, when a male’s face is presented to the input units the weights are adjusted so that the value of the male output unit is increased and the value of the female output unit is decreased. After many repetitions of this process the net may learn to produce the desired output for each input in the training set. If the training goes well, the net may also have learned to generalize to the desired behavior for inputs and outputs that were not in the training set. For example, it may do a good job of distinguishing males from females in pictures that were never presented to it before.
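The supervised procedure just described (random initial weights, repeated exposure to the training set, small weight adjustments toward the desired output) can be sketched as follows. This is a minimal illustration and not a reconstruction of any published model: the four-pixel "pictures", the two category labels, the layer sizes, the learning rate, and the use of logistic units with squared error are all assumptions made for the example.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return (hidden, output) activations. The last weight in each row is a bias."""
    hidden = [logistic(sum(w * a for w, a in zip(row, x + [1.0]))) for row in w_hidden]
    out = [logistic(sum(w * a for w, a in zip(row, hidden + [1.0]))) for row in w_out]
    return hidden, out


def total_error(data, w_hidden, w_out):
    """Summed squared difference between desired and actual outputs."""
    return sum((t - o) ** 2
               for x, target in data
               for t, o in zip(target, forward(x, w_hidden, w_out)[1]))


def train_step(x, target, w_hidden, w_out, rate):
    """Nudge every weight in the direction that reduces the output error."""
    hidden, out = forward(x, w_hidden, w_out)
    # Error signal at each output unit.
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
    # Error signals passed back to the hidden units through the output weights.
    delta_hidden = [h * (1 - h) * sum(d * w_out[k][j] for k, d in enumerate(delta_out))
                    for j, h in enumerate(hidden)]
    for k, d in enumerate(delta_out):
        for j, a in enumerate(hidden + [1.0]):
            w_out[k][j] += rate * d * a
    for j, d in enumerate(delta_hidden):
        for i, a in enumerate(x + [1.0]):
            w_hidden[j][i] += rate * d * a


# Hypothetical training set: 4 pixel brightnesses -> [category A, category B].
DATA = [
    ([1.0, 0.9, 0.1, 0.0], [1.0, 0.0]),
    ([0.8, 1.0, 0.0, 0.2], [1.0, 0.0]),
    ([0.1, 0.0, 0.9, 1.0], [0.0, 1.0]),
    ([0.0, 0.2, 1.0, 0.8], [0.0, 1.0]),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]

error_before = total_error(DATA, w_hidden, w_out)
for _ in range(2000):                      # many rounds of weight adjustment
    for x, target in DATA:
        train_step(x, target, w_hidden, w_out, rate=0.5)
error_after = total_error(DATA, w_hidden, w_out)

print("error before training:", error_before)
print("error after training: ", error_after)
# Response to an input that was not in the training set.
print("novel input:", forward([0.9, 0.8, 0.1, 0.1], w_hidden, w_out)[1])
```

The last line probes generalization in the sense used above: the novel input resembles the category A examples but is not one of them.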
Training nets to model aspects of human intelligence is a fine art. Success with backpropagation and other connectionist learning methods may depend on quite subtle adjustment of the algorithm and the training set. Training typically involves hundreds of thousands of rounds of weight adjustment. Given the limitations of computers in the past, training a net to perform an interesting task took days or even weeks. More recently, the use of massively parallel dedicated processors (GPUs) has helped relieve these heavy computational burdens. But even here, some limitations to connectionist theories of learning will remain to be faced. Humans (and many less intelligent animals) display an ability to learn from single examples; for example, a child shown a novel two-wheeled vehicle and given the name “Segway”, knows right away what a Segway is (Lake, Zaremba et al. 2015). Connectionist learning techniques such as backpropagation are far from explaining this kind of “one shot” learning.
3. Samples of What Neural Networks Can Do
Connectionists have made significant progress in demonstrating the power of neural networks to master cognitive tasks. Here are three well-known experiments that have encouraged connectionists to believe that neural networks are good models of human intelligence. One of the most attractive of these efforts is Sejnowski and Rosenberg’s 1987 work on a net that can read English text called NETtalk. The training set for NETtalk was a large data base consisting of English text coupled with its corresponding phonetic output, written in a code suitable for use with a speech synthesizer. Tapes of NETtalk’s performance at different stages of its training are very interesting listening. At first the output is random noise. Later, the net sounds like it is babbling, and later still as though it is speaking English double-talk (speech that is formed of sounds that resemble English words). At the end of training, NETtalk does a fairly good job of pronouncing the text given to it. Furthermore, this ability generalizes fairly well to text that was not presented in the training set.
Another influential early connectionist model was a net trained by Rumelhart and McClelland (1986) to predict the past tense of English verbs. The task is interesting because although most of the verbs in English (the regular verbs) form the past tense by adding the suffix “-ed”, many of the most frequent verbs are irregular (“is” / “was”, “come” / “came”, “go” / “went”). The net was first trained on a set containing a large number of irregular verbs, and later on a set of 460 verbs containing mostly regulars. The net learned the past tenses of the 460 verbs in about 200 rounds of training, and it generalized fairly well to verbs not in the training set. It even showed a good appreciation of “regularities” to be found among the irregular verbs (“send” / “sent”, “build” / “built”; “blow” / “blew”, “fly” / “flew”). During learning, as the system was exposed to the training set containing more regular verbs, it had a tendency to overregularize, i.e., to combine both irregular and regular forms: (“break” / “broked”, instead of “break” / “broke”). This was corrected with more training. It is interesting to note that children are known to exhibit the same tendency to overregularize during language learning. However, there is hot debate over whether Rumelhart and McClelland’s is a good model of how humans actually learn and process verb endings. For example, Pinker and Prince (1988) point out that the model does a poor job of generalizing to some novel regular verbs. They believe that this is a sign of a basic failing in connectionist models. Nets may be good at making associations and matching patterns, but they have fundamental limitations in mastering general rules such as the formation of the regular past tense. These complaints raise an important issue for connectionist modelers, namely whether nets can generalize properly to master cognitive tasks involving rules. Despite Pinker and Prince’s objections, many connectionists believe that generalization of the right kind is still possible (Niklasson & van Gelder 1994).
Elman’s 1991 work on nets that can appreciate grammatical structure has important implications for the debate about whether neural networks can learn to master rules. Elman trained a simple recurrent network to predict the next word in a large corpus of English sentences. The sentences were formed from a simple vocabulary of 23 words using a subset of English grammar. The grammar, though simple, posed a hard test for linguistic awareness. It allowed unlimited formation of relative clauses while demanding agreement between the head noun and the verb. So, for example, in the sentence
Any man that chases dogs that chase cats … runs.
the singular “man” must agree with the verb “runs” despite the intervening plural nouns (“dogs”, “cats”) which might cause the selection of “run”. One of the important features of Elman’s model is the use of recurrent connections. The values at the hidden units are saved in a set of so called context units, to be sent back to the input level for the next round of processing. This looping back from hidden to input layers provides the net with a rudimentary form of memory of the sequence of words in the input sentence. Elman’s nets displayed an appreciation of the grammatical structure of sentences that were not in the training set. The net’s command of syntax was measured in the following way. Predicting the next word in an English sentence is, of course, an impossible task. However, these nets succeeded, at least by the following measure. At a given point in an input sentence, the output units for words that are grammatical continuations of the sentence at that point should be active and output units for all other words should be inactive. After intensive training, Elman was able to produce nets that displayed perfect performance on this measure including sentences not in the training set. The work of Christiansen and Chater (1999a) and Morris, Cottrell, and Elman (2000) extends this research to more complex grammars. For a broader view of progress in connectionist natural language processing see summaries by Christiansen and Chater (1999b), and Rohde and Plaut (2003).
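The role of the context units can be made concrete with a small sketch of one processing step in a simple recurrent net. It is an assumption-laden illustration and not Elman's actual model: the two-word vocabulary, the layer sizes, and the fixed (untrained) weights are hypothetical, and the logistic activation function is chosen for convenience.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def srn_step(word_vector, context, w_input, w_context, w_output):
    """One time step: returns (output activations, new context values)."""
    hidden = [
        logistic(sum(w * a for w, a in zip(w_input[j], word_vector))
                 + sum(w * c for w, c in zip(w_context[j], context)))
        for j in range(len(w_input))
    ]
    output = [logistic(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    # The context units simply save a copy of the hidden values
    # so they can be fed back in on the next step.
    return output, list(hidden)


def run_sentence(word_vectors, w_input, w_context, w_output):
    """Process a sequence of word vectors, starting from an empty context."""
    context = [0.0] * len(w_input)
    outputs = []
    for word_vector in word_vectors:
        out, context = srn_step(word_vector, context, w_input, w_context, w_output)
        outputs.append(out)
    return outputs


# Hypothetical, untrained weights: 2 word units, 2 hidden units, 2 output units.
W_INPUT = [[1.0, -1.0], [-0.5, 0.5]]
W_CONTEXT = [[0.8, -0.6], [0.4, 0.9]]
W_OUTPUT = [[1.0, -1.0], [-1.0, 1.0]]

WORD_A = [1.0, 0.0]  # one unit per word in the toy vocabulary

outputs = run_sentence([WORD_A, WORD_A], W_INPUT, W_CONTEXT, W_OUTPUT)
print(outputs[0])
print(outputs[1])
```

The same word is presented twice, yet the two outputs differ, because the second step also receives the saved hidden values from the first. A feed forward net would give the same output both times.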
Although this performance is impressive, there is still a long way to go in training nets that can process a language like English. Furthermore, doubts have been raised about the significance of Elman’s results. For example, Marcus (1998, 2001) argues that Elman’s nets are not able to generalize this performance to sentences formed from a novel vocabulary. This, he claims, is a sign that connectionist models merely associate instances, and are unable to truly master abstract rules. On the other hand, Phillips (2002) argues that classical architectures are no better off in this respect. The purported inability of connectionist models to generalize performance in this way has become an important theme in the systematicity debate. (See Section 7 below.)
A somewhat different concern about the adequacy of connectionist language processing focuses on tasks that mimic infant learning of simple artificial grammars. Data on reaction time confirms that infants can learn to distinguish well-formed from ill-formed sentences in a novel language created by experimenters. Shultz and Bale (2001) report success in training neural nets on the same task. Vilcu and Hadley (2005) object that this work fails to demonstrate true acquisition of the grammar, but see Shultz and Bale (2006) for a detailed reply.
4. Strengths and Weaknesses of Neural Network Models
Philosophers are interested in neural networks because they may provide a new framework for understanding the nature of the mind and its relation to the brain (Rumelhart & McClelland 1986: Chapter 1). Connectionist models seem particularly well matched to what we know about neurology. The brain is indeed a neural net, formed from massively many units (neurons) and their connections (synapses). Furthermore, several properties of neural network models suggest that connectionism may offer an especially faithful picture of the nature of cognitive processing. Neural networks exhibit robust flexibility in the face of the challenges posed by the real world. Noisy input or destruction of units causes graceful degradation of function. The net’s response is still appropriate, though somewhat less accurate. In contrast, noise and loss of circuitry in classical computers typically result in catastrophic failure. Neural networks are also particularly well adapted for problems that require the resolution of many conflicting constraints in parallel. There is ample evidence from research in artificial intelligence that cognitive tasks such as object recognition, planning, and even coordinated motion present problems of this kind. Although classical systems are capable of multiple constraint satisfaction, connectionists argue that neural network models provide much more natural mechanisms for dealing with such problems.
Over the centuries, philosophers have struggled to understand how our concepts are defined. It is now widely acknowledged that trying to characterize ordinary notions with necessary and sufficient conditions is doomed to failure. Exceptions to almost any proposed definition are always waiting in the wings. For example, one might propose that a tiger is a large black and orange feline. But then what about albino tigers? Philosophers and cognitive psychologists have argued that categories are delimited in more flexible ways, for example via a notion of family resemblance or similarity to a prototype. Connectionist models seem especially well suited to accommodating graded notions of category membership of this kind. Nets can learn to appreciate subtle statistical patterns that would be very hard to express as hard and fast rules. Connectionism promises to explain flexibility and insight found in human intelligence using methods that cannot be easily expressed in the form of exception free principles (Horgan & Tienson 1989, 1990), thus avoiding the brittleness that arises from standard forms of symbolic representation.
Despite these intriguing features, there are some weaknesses in connectionist models that bear mentioning. First, most neural network research abstracts away from many interesting and possibly important features of the brain. For example, connectionists usually do not attempt to explicitly model the variety of different kinds of brain neurons, nor the effects of neurotransmitters and hormones. Furthermore, it is far from clear that the brain contains the kind of reverse connections that would be needed if the brain were to learn by a process like backpropagation, and the immense number of repetitions needed for such training methods seems far from realistic. Attention to these matters will probably be necessary if convincing connectionist models of human cognitive processing are to be constructed. A more serious objection must also be met. It is widely felt, especially among classicists, that neural networks are not particularly good at the kind of rule based processing that is thought to undergird language, reasoning, and higher forms of thought. (For a well known critique of this kind see Pinker and Prince 1988.) We will discuss the matter further when we turn to the systematicity debate.
There has been a cottage industry in developing more biologically-plausible algorithms for error-driven training that can be shown to approximate the results of backpropagation without its implausible features. Prominent examples include O’Reilly’s Generalized Error Recirculation algorithm (O’Reilly 1996), using randomized error signals rather than error signals individually computed for each neuron (Lillicrap, Cownden, Tweed, & Akerman 2016), and modifying weights using spike-timing dependent plasticity, the last of which has been a favorite of prominent figures in deep learning research (Bengio et al. 2017). (For more on deep learning see section 11 below.)
5. The Shape of the Controversy between Connectionists and Classicists
The last forty years have been dominated by the classical view that (at least higher) human cognition is analogous to symbolic computation in digital computers. On the classical account, information is represented by strings of symbols, just as we represent data in computer memory or on pieces of paper. The connectionist claims, on the other hand, that information is stored non-symbolically in the weights, or connection strengths, between the units of a neural net. The classicist believes that cognition resembles digital processing, where strings are produced in sequence according to the instructions of a (symbolic) program. The connectionist views mental processing as the dynamic and graded evolution of activity in a neural net, each unit’s activation depending on the connection strengths and activity of its neighbors.
On the face of it, these views seem very different. However, many connectionists do not view their work as a challenge to classicism and some overtly support the classical picture. So-called implementational connectionists seek an accommodation between the two paradigms. They hold that the brain’s net implements a symbolic processor. True, the mind is a neural net; but it is also a symbolic processor at a higher and more abstract level of description. So the role for connectionist research according to the implementationalist is to discover how the machinery needed for symbolic processing can be forged from neural network materials, so that classical processing can be reduced to the neural network account.
However, many connectionists resist the implementational point of view. Such radical connectionists claim that symbolic processing was a bad guess about how the mind works. They complain that classical theory does a poor job of explaining graceful degradation of function, holistic representation of data, spontaneous generalization, appreciation of context, and many other features of human intelligence which are captured in their models. The failure of classical programming to match the flexibility and efficiency of human cognition is by their lights a symptom of the need for a new paradigm in cognitive science. So radical connectionists would eliminate symbolic processing from cognitive science forever.
The controversy between radical and implementational connectionists is complicated by the invention of what are called hybrid connectionist architectures. Here elements of classical symbolic processing are included in neural nets (Wermter & Sun 2000). For example, Miikkulainen (1993) champions a complex collection of neural net modules that share data coded in activation patterns. Since one of the modules acts as a memory, the system taken as a whole resembles a classical processor with separate mechanisms for storing and operating on digital “words”. Smolensky (1990) is famous for inventing so called tensor product methods for simulating the process of variable binding, where symbolic information is stored at and retrieved from known “locations”. More recently, Eliasmith (2013) has proposed complex and massive architectures that use what are called semantic pointers, which exhibit features of classical variable binding. Once hybrid architectures such as these are on the table, it becomes more difficult to classify a given connectionist model as radical or merely implementational. This opens the interesting prospect that whether symbolic processing is actually present in the human brain may turn out to be a matter of degree.
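The general idea behind binding a symbolic "filler" to a "role" with tensor products can be shown in a toy form. The sketch below is not Smolensky's full proposal; it assumes orthonormal role vectors (which makes retrieval exact), and the vectors for the roles and fillers are hypothetical values picked for the example.

```python
def outer(filler, role):
    """Outer product: one row per filler unit, one column per role unit."""
    return [[f * r for r in role] for f in filler]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def bind(pairs):
    """Superimpose the filler-role products into a single activity pattern."""
    total = None
    for filler, role in pairs:
        product = outer(filler, role)
        total = product if total is None else add(total, product)
    return total


def unbind(tensor, role):
    """Retrieve the filler stored at `role` (exact for orthonormal roles)."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical role vectors (orthonormal) and filler vectors.
SUBJECT = [1, 0]
OBJECT = [0, 1]
JOHN = [1, 0, 1]
MARY = [0, 1, 1]

# One pattern that stores "John" in the subject role and "Mary" in the object role.
loves = bind([(JOHN, SUBJECT), (MARY, OBJECT)])

print(unbind(loves, SUBJECT))  # [1, 0, 1]
print(unbind(loves, OBJECT))   # [0, 1, 1]
```

Both bindings live in the same set of units, yet each filler can be read back from its "location" by probing with the matching role vector, which is the variable-binding behavior the paragraph describes.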
The disagreement concerning the degree to which human cognition involves symbolic processing is naturally embroiled with the innateness debate—whether higher level abilities such as language and reasoning are part of the human genetic endowment, or whether they are learned. The success of connectionist models at learning tasks starting from randomly chosen weights gives heart to empiricists, who would think that the infant brain is able to construct intelligence from perceptual input using a simple learning mechanism (Elman et al. 1996). On the other hand, nativists in the rationalist tradition argue that at least for grammar-based language, the poverty of perceptual stimulus (Chomsky 1965: 58) entails the existence of a genetically determined mechanism tailored to learning grammar. However, the alignment between connectionism and non-nativism is not so clear-cut. There is no reason that connectionist models cannot be interpreted from a nativist point of view, where the ongoing “learning” represents the process of evolutionary refinement from generation to generation of a species. The idea that the human brain has domain specific knowledge that is genetically determined can be accommodated in the connectionist paradigm by biasing the initial weights of the models to make that knowledge easy or trivial to learn. Connectionist research makes best contact with the innateness debate by providing a new strategy for disarming poverty of stimulus arguments. Nativists argue that association of ideas, the mechanism for learning proposed by the traditional empiricist, is too slender a reed to support the development of higher level cognitive abilities. They suppose that innate mechanisms are essential for learning (for example) a grammar of English from a child’s linguistic input, because the statistical regularities available to “mere association” massively underdetermine that grammar. 
Connectionism could support an empiricism here by providing a proof-of-concept that such structured knowledge can be learned from inputs available to humans using only learning mechanisms found in non-classical architectures. Of course it is too soon to tell whether this promise can be realized.
6. Connectionist Representation 6. 联结主义代表
Connectionist models provide a new paradigm for understanding how information might be represented in the brain. A seductive but naive idea is that single neurons (or tiny neural bundles) might be devoted to the representation of each thing the brain needs to record. For example, we may imagine that there is a grandmother neuron that fires when we think about our grandmother. However, such local representation is not likely. There is good evidence that our grandmother thought involves complex patterns of activity distributed across relatively large parts of cortex.
连接主义模型为理解信息在大脑中的表示方式提供了一种新的范式。一个诱人但天真的想法是,单个神经元(或微小的神经束)可能专门用于表示大脑需要记录的每件事。例如,我们可以想象有一个祖母神经元,当我们想到我们的祖母时,它会被激发。但是,这种本地代表不太可能。有充分的证据表明,我们的祖母思维涉及分布在皮层相对大部分的复杂活动模式。
It is interesting to note that distributed, rather than local representations on the hidden units are the natural products of connectionist training methods. The activation patterns that appear on the hidden units while NETtalk processes text serve as an example. Analysis reveals that the net learned to represent such categories as consonants and vowels, not by creating one unit active for consonants and another for vowels, but rather in developing two different characteristic patterns of activity across all the hidden units.
有趣的是,隐藏单位上的分布式而不是局部表示是联结主义训练方法的自然产物。NETtalk 处理文本时出现在隐藏单元上的激活模式就是一个例子。分析表明,网络学会了表示辅音和元音等类别,不是通过为辅音创建一个活跃的单元,为元音创建另一个活跃的单元,而是在所有隐藏的单元中发展出两种不同的特征活动模式。
Given the expectations formed from our experience with local representation on the printed page, distributed representation seems both novel and difficult to understand. But the technique exhibits important advantages. For example, distributed representations (unlike symbols stored in separate fixed memory locations) remain relatively well preserved when parts of the model are destroyed or overloaded. More importantly, since representations are coded in patterns rather than firings of individual units, relationships between representations are coded in the similarities and differences between these patterns. So the internal properties of the representation carry information on what it is about (Clark 1993: 19). In contrast, local representation is conventional. No intrinsic properties of the representation (a unit’s firing) determine its relationships to the other symbols. This self-reporting feature of distributed representations promises to resolve a philosophical conundrum about meaning. In a symbolic representational scheme, all representations are composed out of symbolic atoms (like words in a language). Meanings of complex symbol strings may be defined by the way they are built up out of their constituents, but what fixes the meanings of the atoms?
鉴于我们对印刷页面上的本地表示的经验形成的期望,分布式表示似乎既新颖又难以理解。但该技术显示出重要的优势。例如,当模型的某些部分被销毁或过载时,分布式表示(与存储在单独的固定内存位置的符号不同)仍然保存得相对较好。更重要的是,由于表示是以模式编码的,而不是以单个单元的发射方式编码的,因此表示之间的关系是根据这些模式之间的相似性和差异性编码的。因此,表示的内部属性携带了有关其内容的信息(Clark 1993:19)。相比之下,地方代表是传统的。表示的内在属性 (单位的开火) 没有决定它与其他符号的关系。分布式表示的这种自我报告功能有望解决关于意义的哲学难题。在符号表示方案中,所有表示都由符号原子组成(就像语言中的单词一样)。复杂符号字符串的含义可以通过它们由其组成部分构建的方式来定义,但是是什么固定了原子的含义呢?
Connectionist representational schemes provide an end run around the puzzle by simply dispensing with atoms. Every distributed representation is a pattern of activity across all the units, so there is no principled way to distinguish between simple and complex representations. To be sure, representations are composed out of the activities of the individual units. But none of these “atoms” codes for any symbol. The representations are sub-symbolic in the sense that analysis into their components leaves the symbolic level behind.
连接主义表示方案通过简单地省略原子来提供绕过难题的终点。每个分布式表示都是跨所有单元的活动模式,因此没有原则性的方法来区分简单和复杂的表示。可以肯定的是,表征是由各个单位的活动组成的。但是这些 “atom” 都没有代表任何交易品种。这些表示是次符号的,因为对它们的组成部分的分析将符号层面抛在了后面。
The sub-symbolic nature of distributed representation provides a novel way to conceive of information processing in the brain. If we model the activity of each neuron with a number, then the activity of the whole brain can be given by a giant vector (or list) of numbers, one for each neuron. Both the brain’s input from sensory systems and its output to individual muscle neurons can also be treated as vectors of the same kind. So the brain amounts to a vector processor, and the problem of psychology is transformed into questions about which operations on vectors account for the different aspects of human cognition.
分布式表示的子符号性质为构想大脑中的信息处理提供了一种新颖的方法。如果我们用一个数字来模拟每个神经元的活动,那么整个大脑的活动可以由一个巨大的数字向量(或列表)给出,每个神经元一个。大脑来自感觉系统的输入及其对单个肌肉神经元的输出也可以被视为同类的向量。因此,大脑相当于一个向量处理器,心理学问题被转化为关于向量上的哪些操作解释了人类认知的不同方面的问题。
Sub-symbolic representation has interesting implications for the classical hypothesis that the brain must contain symbolic representations that are similar to sentences of a language. This idea, often referred to as the language of thought (or LOT) thesis, may be challenged by the nature of connectionist representations. It is not easy to say exactly what the LOT thesis amounts to, but van Gelder (1990) offers an influential and widely accepted benchmark for determining when the brain should be said to contain sentence-like representations. The benchmark is that when a representation is tokened, one thereby tokens the constituents of that representation. For example, if I write “John loves Mary” I have thereby written the sentence’s constituents: “John”, “loves” and “Mary”. Distributed representations for complex expressions like “John loves Mary” can be constructed that do not contain any explicit representation of their parts (Smolensky 1990). The information about the constituents can be extracted from the representations, but neural network models do not need to explicitly extract this information themselves in order to process it correctly (Chalmers 1990). This suggests that neural network models serve as counterexamples to the idea that the language of thought is a prerequisite for human cognition. However, the matter is still a topic of lively debate (Fodor 1997).
子符号表示对经典假设具有有趣的含义,即大脑必须包含类似于语言句子的符号表示。这个想法,通常被称为思想语言(或 LOT)论文,可能会受到连接主义表征的性质的挑战。要确切地说出 LOT 论文的含义并不容易,但 van Gelder (1990) 提供了一个有影响力且被广泛接受的基准,用于确定何时应该说大脑包含类似句子的表征。当一个表示被标记时,因此标记了该表示的组成部分。例如,如果我写“John loves Mary”,我就写了句子的成分:“John”、“loves”和“Mary”。可以构造复杂表达式(如 “John loves Mary”)的分布式表示,这些表示不包含其部分的任何显式表示 (Smolensky 1990)。有关成分的信息可以从表示中提取,但神经网络模型不需要自己显式提取此信息即可正确处理它 (Chalmers 1990)。这表明神经网络模型是思想语言是人类认知的先决条件这一观点的反例。然而,这个问题仍然是一个激烈争论的话题(Fodor 1997)。
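The contrast between tokening constituents and merely carrying information about them can be made concrete with a toy tensor-product encoding, loosely in the spirit of Smolensky (1990). The sketch below is only an illustration under stated assumptions: the role and filler vectors, and the names `ROLES`, `FILLERS`, `encode` and `unbind`, are invented for the example and are not taken from any published model. Because the role vectors are not axis-aligned, every cell of the resulting matrix mixes contributions from all three words, yet each word can still be recovered on request.

```python
# Toy tensor-product binding, loosely in the spirit of Smolensky (1990).
# All vectors are made-up illustrative values (assumptions, not real data).

# Mutually orthogonal role vectors; none of them is axis-aligned.
ROLES = {
    "subject": [1.0, 1.0, 1.0, 1.0],
    "verb":    [1.0, -1.0, 1.0, -1.0],
    "object":  [1.0, 1.0, -1.0, -1.0],
}

# Arbitrary filler vectors standing in for word meanings.
FILLERS = {
    "John":  [1.0, 0.5, -0.5],
    "loves": [0.0, 1.0, 1.0],
    "Mary":  [-0.5, 0.0, 1.0],
}

def outer(u, v):
    """Outer product of two vectors, returned as a list of rows."""
    return [[a * b for b in v] for a in u]

def add(m1, m2):
    """Element-wise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def encode(subject, verb, obj):
    """Superimpose the three filler-role bindings into one matrix."""
    total = [[0.0] * 4 for _ in range(3)]
    for word, role in ((subject, "subject"), (verb, "verb"), (obj, "object")):
        total = add(total, outer(FILLERS[word], ROLES[role]))
    return total

def unbind(matrix, role):
    """Recover the filler bound to `role` (exact because roles are orthogonal)."""
    norm_sq = sum(r * r for r in role)
    return [sum(x * r for x, r in zip(row, role)) / norm_sq for row in matrix]

john_loves_mary = encode("John", "loves", "Mary")
mary_loves_john = encode("Mary", "loves", "John")

print(john_loves_mary[0])                          # one row: a blend of all three words
print(unbind(john_loves_mary, ROLES["subject"]))   # the filler for "John"
print(unbind(mary_loves_john, ROLES["subject"]))   # the filler for "Mary"
```

No row or column of `john_loves_mary` is a copy of any word vector, so writing down the matrix does not thereby write down its constituents; the constituents are available only through the `unbind` operation.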
The novelty of distributed and superimposed connectionist information storage naturally causes one to wonder about the viability of classical notions of symbolic computation in describing the brain. Ramsey (1997) argues that though we may attribute symbolic representations to neural nets, those attributions do not figure in legitimate explanations of the model’s behavior. This claim is important because the classical account of cognitive processing (and folk intuitions) presume that representations play an explanatory role in understanding the mind. It has been widely thought that cognitive science requires, by its very nature, explanations that appeal to representations (Von Eckardt 2003). If Ramsey is right, the point may cut in two different ways. Some may use it to argue for a new and non-classical understanding of the mind, while others would use it to argue that connectionism is inadequate since it cannot explain what it must. However, Haybron (2000) argues against Ramsey that there is ample room for representations with an explanatory role in radical connectionist architectures. Roth (2005) makes the interesting point that, contrary to first impressions, it may also make perfect sense to explain a net’s behavior by reference to a computer program, even if there is no way to discriminate a sequence of steps of the computation through time.
分布式和叠加连接主义信息存储的新颖性自然使人们怀疑经典的符号计算概念在描述大脑方面的可行性。Ramsey (1997) 认为,尽管我们可以将符号表示归因于神经网络,但这些归因并不能用于对模型行为的合法解释。这个说法很重要,因为认知加工的经典解释(和民间直觉)假设表征在理解心灵方面起着解释作用。人们普遍认为,认知科学就其本质而言,需要吸引表征的解释(Von Eckardt 2003)。如果拉姆齐是对的,那么这一点可能会以两种不同的方式进行切割。有些人可能会用它来论证一种新的、非经典的心灵理解,而另一些人会用它来论证联结主义是不充分的,因为它无法解释它必须解释什么。然而,Haybron (2000) 反对 Ramsey 的观点,即在激进的连接主义建筑中,有足够的空间进行具有解释作用的表示。Roth (2005) 提出了一个有趣的观点,与第一印象相反,通过参考计算机程序来解释网络的行为也可能非常有意义,即使没有办法随时间区分计算的一系列步骤。
The debate concerning the presence of classical representations and a language of thought has been clouded by lack of clarity in defining what should count as the representational “vehicles” in distributed neural models. Shea (2007) makes the point that the individuation of distributed representations should be defined by the way activation patterns on the hidden units cluster together. It is the relationships between clustering regions in the space of possible activation patterns that carry representational content, not the activations themselves, nor the collection of units responsible for the activation. On this understanding, prospects are improved for locating representational content in neural nets that can be compared in nets of different architectures, that is causally involved in processing, and which overcomes some objections to holistic accounts of meaning.
关于经典表征和思维语言存在的争论由于在定义分布式神经模型中什么应该算作表征“载体”缺乏明确性而蒙上了一层阴影。Shea (2007) 指出,分布式表示的个性化应该由隐藏单元上的激活模式聚集在一起的方式来定义。它是承载表征内容的可能的激活模式空间中的聚类区域之间的关系,而不是激活本身,也不是负责激活的单元集合。基于这种理解,在神经网络中定位表征内容的前景得到了改善,这些内容可以在不同架构的网络中进行比较,这些内容因果参与处理,并克服了对意义的整体解释的一些反对意见。
In a series of papers, Horgan and Tienson (1989, 1990) have championed a view called representations without rules. According to this view, classicists are right to think that human brains (and good connectionist models of them) contain explanatorily robust representations; but they are wrong to think that those representations enter into hard and fast rules like the steps of a computer program. The idea that connectionist systems may follow graded or approximate regularities (“soft laws” as Horgan and Tienson call them) is intuitive and appealing. However, Aizawa (1994) argues that given an arbitrary neural net with a representation level description, it is always possible to outfit it with hard and fast representation-level rules. Guarini (2001) responds that if we pay attention to notions of rule following that are useful to cognitive modeling, Aizawa’s constructions will seem beside the point.
在一系列论文中,Horgan 和 Tienson (1989, 1990) 倡导了一种称为无规则表示的观点。根据这种观点,古典主义者正确地认为人类大脑(以及它们的良好连接主义模型)包含解释性的稳健表征;但是他们错误地认为这些表示进入了硬性规定,就像计算机程序的步骤一样。连接主义系统可能遵循分级或近似规律(Horgan 和 Tienson 称之为“软定律”)的想法是直观且有吸引力的。然而,Aizawa (1994) 认为,给定一个具有表示级描述的任意神经网络,总是可以为其配备硬性且快速的表示级规则。Guarini (2001) 回应说,如果我们关注对认知建模有用的规则遵循概念,相泽的建构就会显得无关紧要。
7. The Systematicity Debate 7. 系统性辩论
The major points of controversy in the philosophical literature on connectionism have to do with whether connectionists provide a viable and novel paradigm for understanding the mind. One complaint is that connectionist models are only good at processing associations. But such tasks as language and reasoning cannot be accomplished by associative methods alone and so connectionists are unlikely to match the performance of classical models at explaining these higher-level cognitive abilities. However, it is a simple matter to prove that neural networks can do anything that symbolic processors can do, since nets can be constructed that mimic a computer’s circuits. So the objection cannot be that connectionist models are unable to account for higher cognition; it is rather that they can do so only if they implement the classicist’s symbolic processing tools. Implementational connectionism may succeed, but radical connectionists will never be able to account for the mind.
关于联结主义的哲学文献中的主要争议点与联结论者是否为理解心灵提供了一个可行的、新颖的范式有关。一个抱怨是联结主义模型只擅长处理关联。但是,语言和推理等任务不能仅通过联想方法完成,因此连接论者在解释这些更高层次的认知能力方面不太可能与经典模型的表现相媲美。然而,证明神经网络可以做符号处理器可以做的任何事情是一件简单的事情,因为可以构建模拟计算机电路的网络。因此,反对意见不可能是联结主义模型无法解释更高的认知;相反,只有当他们实现 Classicist 的符号处理工具时,他们才能做到这一点。实施联结主义可能会成功,但激进的连接论者永远无法解释心智。
Fodor and Pylyshyn’s often cited paper (1988) launches a debate of this kind. They identify a feature of human intelligence called systematicity which they feel connectionists cannot explain. The systematicity of language refers to the fact that the ability to produce/understand/think some sentences is intrinsically connected to the ability to produce/understand/think others of related structure. For example, no one with a command of English who understands “John loves Mary” can fail to understand “Mary loves John.” From the classical point of view, the connection between these two abilities can easily be explained by assuming that masters of English represent the constituents (“John”, “loves” and “Mary”) of “John loves Mary” and compute its meaning from the meanings of these constituents. If this is so, then understanding a novel sentence like “Mary loves John” can be accounted for as another instance of the same symbolic process. In a similar way, symbolic processing would account for the systematicity of reasoning, learning and thought. It would explain why there are no people who are capable of concluding P from P & (Q & R), but incapable of concluding P from P & Q, why there are no people capable of learning to prefer a red cube to a green square who cannot learn to prefer a green cube to a red square, and why there isn’t anyone who can think that John loves Mary who can’t also think that Mary loves John.
Fodor 和 Pylyshyn 经常被引用的论文 (1988) 引发了这种辩论。他们确定了人类智能的一个特征,称为系统性,他们认为联结主义者无法解释。语言的系统性是指产生/理解/思考某些句子的能力与产生/理解/思考其他相关结构的能力有着内在的联系。例如,任何懂英语的人如果能理解 “John loves Mary” ,就不可能不理解 “Mary loves John”。从古典的角度来看,这两种能力之间的联系可以很容易地解释为假设英语大师代表“John loves Mary”的成分(“John”、“loves”和“Mary”),并从这些成分的含义中计算其含义。如果是这样,那么理解像 “Mary loves John” 这样的新句子可以被解释为同一象征过程的另一个实例。以类似的方式,符号处理将解释推理、学习和思考的系统性。这可以解释为什么没有人能够从P和(Q & R)得出P,但是不能从P和Q得出P,为什么没有能够学会喜欢红色方块而不是绿色方块的人,他们不能学会喜欢绿色方块而不是红色方块,以及为什么没有人能认为约翰爱玛丽,而他不能同时认为玛丽爱约翰。
Fodor and McLaughlin (1990) argue in detail that connectionists do not account for systematicity. Although connectionist models can be trained to be systematic, they can also be trained, for example, to recognize “John loves Mary” without being able to recognize “Mary loves John.” Since connectionism does not guarantee systematicity, it does not explain why systematicity is found so pervasively in human cognition. Systematicity may exist in connectionist architectures, but where it exists, it is no more than a lucky accident. The classical solution is much better, because in classical models, pervasive systematicity comes for free.
Fodor 和 McLaughlin (1990) 详细论证了联结论者不考虑系统性。尽管连接主义模型可以被训练成系统化的,但它们也可以被训练,例如,识别“John loves Mary”,而无法识别“Mary loves John”。既然联结主义不能保证系统性,它就不能解释为什么系统性在人类认知中如此普遍。系统性可能存在于联结主义架构中,但即使存在,也只不过是一个幸运的意外。经典解决方案要好得多,因为在经典模型中,普遍的系统性是免费的。
The charge that connectionist nets are disadvantaged in explaining systematicity has generated a lot of interest. Chalmers (1993) points out that Fodor and Pylyshyn’s argument proves too much, for it entails that all neural nets, even those that implement a classical architecture, do not exhibit systematicity. Given the uncontroversial conclusion that the brain is a neural net, it would follow that systematicity is impossible in human thought. Another often mentioned point of rebuttal (Aizawa 1997b; Matthews 1997; Hadley 1997b) is that classical architectures do no better at explaining systematicity. There are also classical models that can be programmed to recognize “John loves Mary” without being able to recognize “Mary loves John,” for this depends on exactly which symbolic rules govern the classical processing. The point is that neither the use of connectionist architecture alone nor the use of classical architecture alone enforces a strong enough constraint to explain pervasive systematicity. In both architectures, further assumptions about the nature of the processing must be made to ensure that “Mary loves John” and “John loves Mary” are treated alike.
关于连接主义网络在解释系统性方面处于劣势的指责引起了很多人的兴趣。Chalmers (1993) 指出,Fodor 和 Pylyshyn 的论点证明得太多了,因为它意味着所有的神经网络,即使是那些实现经典架构的神经网络,也不表现出系统性。鉴于大脑是一个神经网络这一无可争议的结论,那么系统性在人类思维中是不可能的。另一个经常被提及的反驳观点(Aizawa 1997b;马修斯 1997 年;Hadley 1997b)认为古典建筑在解释系统性方面也做得并不好。还有一些经典模型可以被编程来识别 “John loves Mary”,而无法识别 “Mary loves John”,因为这完全取决于哪些符号规则支配着经典处理。关键是,无论是单独使用联结主义建筑还是单独使用古典建筑,都不能强制执行足够强的约束来解释普遍的系统性。在这两种体系结构中,必须对处理的性质进行进一步的假设,以确保“Mary loves John”和“John loves Mary”的处理方式相同。
A discussion of this point should mention Fodor and McLaughlin’s requirement that systematicity be explained as a matter of nomic necessity, that is, as a matter of natural law. The complaint against connectionists is that while they may implement systems that exhibit systematicity, they will not have explained it unless it follows from their models as a nomic necessity. However, the demand for nomic necessity is a very strong one, and one that classical architectures clearly cannot meet either. So the only tactic for securing a telling objection to connectionists along these lines would be to weaken the requirement on the explanation of systematicity to one which classical architectures can and connectionists cannot meet. A convincing case of this kind has yet to be made.
对这一点的讨论应该提到 Fodor 和 McLaughlin 的要求,即系统性应该作为一个经济学的必然性问题来解释,即作为一个自然法则的问题。对联结主义者的抱怨是,虽然他们可能会实现表现出系统性的系统,但他们不会解释它,除非它从他们的模型中得出作为经济学的必要性。然而,对经济学必要性的需求非常强烈,而古典架构显然也无法满足。因此,确保按照这些思路对连接论者提出明显反对的唯一策略是将对系统性解释的要求削弱为古典建筑可以而连接论者无法满足的要求。这种令人信服的案例还没有提出。
As the systematicity debate has evolved, attention has been focused on defining the benchmarks that would answer Fodor and Pylyshyn’s challenge. Hadley (1994a, 1994b) distinguishes three brands of systematicity. Connectionists have clearly demonstrated the weakest of these by showing that neural nets can learn to correctly recognize novel sequences of words (e.g., “Mary loves John”) that were not in the training set. However, Hadley claims that a convincing rebuttal must demonstrate strong systematicity, or better, strong semantical systematicity. Strong systematicity would require (at least) that “Mary loves John” be recognized even if “Mary” never appears in the subject position in any sentence in the training set. Strong semantical systematicity would require as well that the net show abilities at correct semantical processing of the novel sentences rather than merely distinguishing grammatical from ungrammatical forms. Niklasson and van Gelder (1994) have claimed success at strong systematicity, though Hadley complains that this is at best a borderline case. Hadley and Hayward (1997) tackle strong semantical systematicity, but by Hadley’s own admission it is not clear that they have avoided the use of a classical architecture. Boden and Niklasson (2000) claim to have constructed a model that meets at least the spirit of strong semantical systematicity, but Hadley (2004) argues that even strong systematicity has not been demonstrated there. Whether one takes a positive or a negative view of these attempts, it is safe to say that no one has met the challenge of providing a neural net capable of learning complex semantical processing that generalizes to a full range of truly novel inputs.
随着系统性辩论的发展,人们的注意力一直集中在定义能够回答 Fodor 和 Pylyshyn 挑战的基准上。Hadley (1994a, 1994b) 区分了系统性的三个品牌。联结论者通过证明神经网络可以学习正确识别不在训练集中的新单词序列(例如,“Mary loves John”)来清楚地证明其中最薄弱的一面。然而,Hadley 声称,令人信服的反驳必须表现出强大的系统性,或者更好的是,强大的语义系统性。强大的系统性要求(至少)识别“Mary loves John”,即使“Mary”从未出现在训练集中任何句子的主语位置。强大的语义系统性还要求 net 显示出对新句子的正确语义处理的能力,而不仅仅是区分语法和非语法形式。Niklasson 和 van Gelder (1994) 声称在强系统性方面取得了成功,尽管 Hadley 抱怨这充其量只是一个边缘案例。Hadley 和 Hayward (1997) 处理了很强的语义系统性,但根据 Hadley 自己承认,他们是否避免使用古典架构并不清楚。Boden 和 Niklasson (2000) 声称已经构建了一个至少满足强语义系统性精神的模型,但 Hadley (2004) 认为,即使是强系统性也没有在那里得到证明。无论人们如何看待这些尝试,都可以肯定地说,没有人迎接过提供能够学习复杂语义处理的神经网络的挑战,该神经网络可以推广到所有真正新颖的输入。
Research on nets that clearly demonstrate strong systematicity has continued. Jansen and Watter (2012) provide a good summary of more recent efforts along these lines, and propose an interesting basis for solving the problem. They use a more complex architecture that combines unsupervised self-organizing maps with features of simple recurrent nets. However, the main innovation is to allow codes for the words being processed to represent sensory-motor features of what the words represent. Once trained, their nets displayed very good accuracy in distinguishing the grammatical features of sentences whose words never even appeared in the training set. This may appear to be cheating since the word codes might surreptitiously represent grammatical categories, or at least they may unfairly facilitate learning those categories. Jansen and Watter note however, that the sensory-motor features of what a word represents are apparent to a child who has just acquired a new word, and so that information is not off-limits in a model of language learning. They make the interesting observation that a solution to the systematicity problem may require including sources of environmental information that have so far been ignored in theories of language learning. This work complicates the systematicity debate, since it opens a new worry about what information resources are legitimate in responding to the challenge. However, this reminds us that architecture alone (whether classical or connectionist) is not going to solve the systematicity problem in any case, so the interesting questions concern what sources of supplemental information are needed to make the learning of grammar possible.
对明确证明强大系统性的网络的研究仍在继续。Jansen 和 Watter (2012) 很好地总结了这些方面的最新努力,并提出了解决问题的有趣基础。它们使用更复杂的架构,将无监督的自组织映射与简单递归网络的功能相结合。然而,主要的创新是允许被处理的单词的代码来表示单词所代表的感觉运动特征。经过训练后,他们的网络在区分句子的语法特征方面表现出非常好的准确性,这些句子的单词甚至从未出现在训练集中。这可能看起来是作弊,因为单词代码可能偷偷地代表语法类别,或者至少它们可能不公平地促进了学习这些类别。然而,Jansen 和 Watter 指出,一个词所代表的感觉运动特征对于刚刚获得一个新词的孩子来说是显而易见的,因此这些信息在语言学习模型中并不是禁区。他们提出了一个有趣的观察,即系统性问题的解决方案可能需要包括迄今为止在语言学习理论中被忽视的环境信息来源。这项工作使系统性辩论复杂化,因为它引发了新的担忧,即哪些信息资源在应对挑战时是合法的。然而,这提醒我们,仅靠建筑(无论是古典的还是联结主义的)在任何情况下都无法解决系统性问题,因此有趣的问题是需要哪些补充信息来源才能使语法学习成为可能。
Kent Johnson (2004) argues that the whole systematicity debate is misguided. Attempts at carefully defining the systematicity of language or thought leave us with either trivialities or falsehoods. Connectionists surely have explaining to do, but Johnson argues that it is fruitless to view their burden under the rubric of systematicity. Aizawa (2014) also suggests the debate is no longer germane given the present climate in cognitive science. What is needed instead is the development of neurally plausible connectionist models capable of processing a language with a recursive syntax, which react immediately to the introduction of new items in the lexicon without introducing the features of classical architecture. The “systematicity” debate may have already gone as Johnson advises, for Hadley’s demand for strong semantical systematicity may be thought of as the requirement that connectionists exhibit success in that direction.
Kent Johnson (2004) 认为整个系统性辩论是被误导的。试图仔细定义语言或思想的系统性,给我们留下的要么是琐碎的,要么是虚假的。联结论者当然需要做解释,但约翰逊建议,在系统性的名义下看待他们的负担是徒劳的。Aizawa (2014) 还指出,鉴于认知科学的当前气候,这场辩论不再密切相关。相反,需要的是开发神经上合理的连接主义模型,这些模型能够用递归语法处理语言,这些模型对词典中新项目的引入立即做出反应,而不会引入古典建筑的特征。正如约翰逊所建议的那样,“系统性”的争论可能已经结束了,因为哈德利对强语义系统性的要求可以被认为是连接论者在这个方向上表现出成功的要求。
Recent work (Loula, Baroni, & Lake 2018) sheds new light on the controversy. Here recurrent neural nets were trained to interpret complex commands in a simple language that includes primitives such as “jump”, “walk”, “left”, “right”, “opposite” and “around”. “Opposite” is interpreted as a request to perform a command twice, and “around” to do so four times. So “jump around left” requests a left jump four times. The authors report that their nets showed very accurate generalization at tasks that qualify for demonstrating strong semantic systematicity. The nets correctly parsed commands in the test set containing “jump around right” even though this phrase never appeared in the training set. Nevertheless, the nets’ failures at more challenging tasks point to limitations in their abilities to generalize in ways that would demonstrate genuine systematicity. The nets exhibited very poor performance when commands in the test set were longer (or even shorter) than those presented in the training set. So they appeared unable to spontaneously compose the meaning of complex expressions from the meanings of their parts. New research is needed to understand the nature of these failures, whether they can be overcome in non-classical architectures, and the extent to which humans would exhibit similar mistakes under analogous circumstances.
最近的工作(Loula, Baroni, & Lake 2018)为争议提供了新的视角。在这里,循环神经网络被训练以一种简单的语言解释复杂的命令,其中包括 “jump”、“walk”、“left”、“right”、“opposite” 和 “around” 等基元。“Opposite” 被解释为执行命令两次的请求,而 “around” 被解释为执行命令四次的请求。因此,“jump around left” 请求左跳四次。作者报告说,他们的网络在符合证明强语义系统性的任务中显示出非常准确的泛化。网络正确解析了测试集中包含 “jump around right” 的命令,即使这个短语从未出现在训练集中。然而,网络在更具挑战性的任务中的失败表明,它们以展示真正系统性的方式进行泛化的能力受到限制。当测试集中的命令比训练集中的命令更长(甚至更短)时,网络的性能非常差。因此,他们似乎无法从其部分的含义中自发地组合出复杂表达的含义。需要新的研究来了解这些失败的性质,它们是否可以在非经典建筑中克服,以及人类在类似情况下会表现出类似错误的程度。
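To make the target behavior concrete, the command language as glossed above can be written out as a few explicit composition rules. The sketch below follows only the simplified description in this section (“opposite” means twice, “around” means four times); the output format of `(action, direction)` pairs and the name `interpret` are assumptions made for illustration, and the grammar of the actual benchmark may differ in detail.

```python
# Toy rule-based interpreter for the mini command language described above.
# It encodes only the simplified gloss given in the text; the output format
# is a hypothetical choice made for this example.

ACTIONS = {"jump", "walk"}
DIRECTIONS = {"left", "right"}
REPEATS = {"opposite": 2, "around": 4}

def interpret(command):
    """Map a command such as 'jump around left' to a list of steps."""
    words = command.split()
    if not words or words[0] not in ACTIONS:
        raise ValueError(f"command must start with an action: {command!r}")
    action = words[0]
    repeat, direction = 1, None
    for word in words[1:]:
        if word in REPEATS:
            repeat = REPEATS[word]
        elif word in DIRECTIONS:
            direction = word
        else:
            raise ValueError(f"unknown word: {word!r}")
    # The meaning of the whole is built from the meanings of the parts.
    return [(action, direction)] * repeat

print(interpret("jump around left"))
# [('jump', 'left'), ('jump', 'left'), ('jump', 'left'), ('jump', 'left')]
```

A system that has these rules handles “jump around right” or “walk opposite left” whether or not it has met those exact phrases before, and command length makes no difference to it. That is the kind of generalization the trained nets are being tested for, without being given the rules.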
It has been almost thirty years since the systematicity debate first began, with over 3,000 citations to Fodor and Pylyshyn’s original paper. So this brief account is necessarily incomplete. Aizawa (2003) provides an excellent view of the literature, and Calvo and Symons (2014) serves as another more recent resource.
自系统性辩论首次开始以来,已经过去了将近 30 年,Fodor 和 Pylyshyn 的原始论文被引用了 3,000 多次。所以这个简短的叙述必然是不完整的。Aizawa (2003) 提供了很好的文献视图,Calvo 和 Symons (2014) 是另一个较新的资源。
8. Connectionism and Semantic Similarity 8. 联结主义和语义相似性
One of the attractions of distributed representations in connectionist models is that they suggest a solution to the problem of providing a theory of how brain states could have meaning. The idea is that the similarities and differences between activation patterns along different dimensions of neural activity record semantical information. So the similarity properties of neural activations provide intrinsic properties that determine meaning. However, when it comes to compositional linguistic representations, Fodor and Lepore (1992: Ch. 6) challenge similarity-based accounts on two fronts. The first problem is that human brains presumably vary significantly in the number of and connections between their neurons. Although it is straightforward to define similarity measures on two nets that contain the same number of units, it is harder to see how this can be done when the basic architectures of two nets differ. The second problem Fodor and Lepore cite is that even if similarity measures for meanings can be successfully crafted, they are inadequate to the task of meeting the desiderata which a theory of meaning must satisfy.
连接主义模型中分布式表示的吸引力之一是,它们为提供大脑状态如何具有意义的理论问题提出了解决方案。这个想法是,神经活动不同维度的激活模式之间的相似性和差异性记录了语义信息。因此,神经激活的相似性特性提供了决定意义的内在特性。然而,当涉及到组合语言表征时,Fodor 和 Lepore (1992: Ch. 6) 在两个方面挑战了基于相似性的解释。第一个问题是,人类大脑的神经元数量和神经元之间的连接可能有很大差异。虽然在包含相同单元数的两个网络上定义相似性度量很简单,但当两个网络的基本架构不同时,很难看出如何做到这一点。Fodor 和 Lepore 引用的第二个问题是,即使可以成功地制定意义的相似性度量,它们也不足以满足意义理论必须满足的任务。
Churchland (1998) shows that the first of these two objections can be met. Citing the work of Laakso and Cottrell (2000) he explains how similarity measures between activation patterns in nets with radically different structures can be defined. Not only that, Laakso and Cottrell show that nets of different structures trained on the same task develop activation patterns which are strongly similar according to the measures they recommend. This offers hope that empirically well defined measures of similarity of concepts and thoughts across different individuals might be forged.
Churchland (1998) 表明,这两个反对意见中的第一个是可以得到满足的。他引用了 Laakso 和 Cottrell (2000) 的工作,解释了如何定义具有完全不同结构的网络中激活模式之间的相似性度量。不仅如此,Laakso 和 Cottrell 还表明,根据他们推荐的措施,在同一任务上训练的不同结构的网络会发展出非常相似的激活模式。这带来了希望,即可能会形成对不同个体之间概念和思想相似性的实证明确定义的衡量标准。
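One general strategy for comparing nets of different sizes is to compare not the activation vectors themselves but the distances between them: if two nets place the same stimuli at similar relative distances from one another, their representations are alike even when the number of hidden units differs. The sketch below illustrates that strategy only. It is an assumption about the general approach, not a reconstruction of Laakso and Cottrell’s exact procedure, and the activation values and function names are invented for the example.

```python
import math
from itertools import combinations

def pairwise_distances(patterns):
    """Euclidean distance between every pair of activation patterns."""
    return [math.dist(p, q) for p, q in combinations(patterns, 2)]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def representational_similarity(net_a, net_b):
    """Correlate the two nets' inter-pattern distances for the same stimuli."""
    return pearson(pairwise_distances(net_a), pairwise_distances(net_b))

# Hypothetical hidden-unit activations for the same four stimuli.
net_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]            # 2 hidden units
net_b = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2],
         [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]                          # 3 hidden units
net_c = [[0.9], [0.1], [0.9], [0.1]]                                # 1 hidden unit

print(representational_similarity(net_a, net_b))  # close to 1: same grouping
print(representational_similarity(net_a, net_c))  # negative: different grouping
```

Here `net_a` and `net_b` both group the first two stimuli together and the last two together, so their distance profiles correlate strongly despite the different dimensionality, while `net_c` groups the stimuli differently and scores low.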
On the other hand, the development of a traditional theory of meaning based on similarity faces severe obstacles (Fodor & Lepore 1999), for such a theory would be required to assign sentences truth conditions based on an analysis of the meaning of their parts, and it is not clear that similarity alone is up to such tasks as fixing denotation in the way a standard theory demands. However, most connectionists who promote similarity-based accounts of meaning reject many of the presuppositions of standard theories. They hope to craft a working alternative which either rejects or modifies those presuppositions while still being faithful to the data on human linguistic abilities.
另一方面,基于相似性的传统意义理论的发展面临严重的障碍(Fodor & Lepore 1999),因为这样的理论需要根据对其部分含义的分析来分配句子的真值条件,而且并不清楚仅靠相似性是否能完成像按照标准理论要求的方式固定含义这样的任务。然而,大多数提倡基于相似性的意义解释的连接论者拒绝接受标准理论的许多假设。他们希望制定一个可行的替代方案,要么拒绝或修改这些假设,一边仍然忠实于人类语言能力的数据。
Calvo Garzón (2003) complains that there are reasons to think that connectionists must fail. Churchland’s response has no answer to the collateral information challenge. That problem is that the measured similarities between activation patterns for a concept (say: grandmother) in two human brains are guaranteed to be very low because two people’s (collateral) information on their grandmothers (name, appearance, age, character) is going to be very different. If concepts are defined by everything we know, then the measures for activation patterns of our concepts are bound to be far apart. This is a truly deep problem in any theory that hopes to define meaning by functional relationships between brain states. Philosophers of many stripes must struggle with this problem. Given the lack of a successfully worked out theory of concepts in either traditional or connectionist paradigms, it is only fair to leave the question for future research.
Calvo Garzón (2003) 抱怨说,有理由认为联结主义者必须失败。丘奇兰的回应没有回答附带信息的挑战。这个问题是,两个人脑中一个概念(比如:祖母)的激活模式之间的测量相似性保证非常低,因为两个人关于他们祖母的(附带)信息(姓名、外貌、年龄、性格)将非常不同。如果概念是由我们所知道的一切定义的,那么我们概念的激活模式的度量必然会相距甚远。在任何希望通过大脑状态之间的功能关系来定义意义的理论中,这都是一个真正深刻的问题。形形色色的哲学家必须为这个问题而苦苦挣扎。鉴于在传统或联结主义范式中缺乏成功制定的概念理论,将这个问题留给未来的研究是公平的。
9. Connectionism and the Elimination of Folk Psychology 9. 联结主义和民间心理学的消除
Another important application of connectionist research to philosophical debate about the mind concerns the status of folk psychology. Folk psychology is the conceptual structure that we spontaneously apply to understanding and predicting human behavior. For example, knowing that John desires a beer and that he believes that there is one in the refrigerator allows us to explain why John just went into the kitchen. Such knowledge depends crucially on our ability to conceive of others as having desires and goals, plans for satisfying them, and beliefs to guide those plans. The idea that people have beliefs, plans and desires is a commonplace of ordinary life; but does it provide a faithful description of what is actually to be found in the brain?
联结主义研究在关于心灵的哲学辩论中的另一个重要应用涉及民间心理学的地位。民俗心理学是我们自发地应用于理解和预测人类行为的概念结构。例如,知道 John 想要一杯啤酒,并且他相信冰箱里有啤酒,这使我们能够解释为什么 John 只是走进厨房。这种知识在很大程度上取决于我们能否将他人想象为有愿望和目标、满足这些目标的计划以及指导这些计划的信念。人们有信仰、计划和愿望的想法在日常生活中是司空见惯的;但它是否忠实地描述了大脑中实际存在的东西呢?
Its defenders will argue that folk psychology is too good to be false (Fodor 1988: Ch. 1). What more can we ask for the truth of a theory than that it provides an indispensable framework for successful negotiations with others? On the other hand, eliminativists will respond that the useful and widespread use of a conceptual scheme does not argue for its truth (Churchland 1989: Ch. 1). Ancient astronomers found the notion of celestial spheres useful (even essential) to the conduct of their discipline, but now we know that there are no celestial spheres. From the eliminativists’ point of view, an allegiance to folk psychology, like allegiance to folk (Aristotelian) physics, stands in the way of scientific progress. A viable psychology may require as radical a revolution in its conceptual foundations as is found in quantum mechanics.
它的捍卫者会争辩说,民间心理学好得不能假(Fodor 1988:第 1 章)。我们还能要求理论的真理,除了它为与他人成功谈判提供不可或缺的框架之外呢?另一方面,排除论者会回应说,概念方案的有用和广泛使用并不能证明其真实性(Churchland 1989:第 1 章)。古代天文学家发现天球的概念对他们的学科的进行有用(甚至是必不可少的),但现在我们知道没有天球。从排除论者的角度来看,对民间心理学的效忠,就像对民间(亚里士多德)物理学的效忠一样,阻碍了科学的进步。一个可行的心理学可能需要像量子力学中那样在其概念基础上进行彻底的革命。
Eliminativists are interested in connectionism because it promises to provide a conceptual foundation that might replace folk psychology. For example, Ramsey, Stich, & Garon (1991) have argued that certain feed-forward nets show that simple cognitive tasks can be performed without employing features that could correspond to beliefs, desires and plans. Presuming that such nets are faithful to how the brain works, concepts of folk psychology fare no better than do celestial spheres. Whether connectionist models undermine folk psychology in this way is still controversial. There are two main lines of response to the claim that connectionist models support eliminativist conclusions. One objection is that the models used by Ramsey et al. are feed-forward nets, which are too weak to explain some of the most basic features of cognition such as short term memory. Ramsey et al. have not shown that beliefs and desires must be absent in a class of nets adequate for human cognition. A second line of rebuttal challenges the claim that features corresponding to beliefs and desires are necessarily absent even in the feed-forward nets at issue (Von Eckardt 2005).
消除论者对联结主义感兴趣,因为它有望提供一个可能取代民间心理学的概念基础。例如,Ramsey, Stich, & Garon (1991) 认为,某些前馈网络表明,无需采用可能与信念、愿望和计划相对应的特征,就可以执行简单的认知任务。假设这样的网络忠实于大脑的运作方式,那么民间心理学的概念并不比天球好。联结主义模型是否以这种方式破坏了民间心理学,仍然存在争议。对于连接主义模型支持排除法结论的说法,有两条主要的回应路线。一个反对意见是 Ramsey 等人使用的模型是前馈网络,它太弱了,无法解释认知的一些最基本特征,例如短期记忆。Ramsey 等人没有证明信念和欲望必须在一类足以进行人类认知的网络中不存在。第二行反驳挑战了这样一种说法,即即使在有争议的前馈网络中,与信念和欲望相对应的特征也必然不存在(Von Eckardt 2005)。
The question is complicated further by disagreements about the nature of folk psychology. Many philosophers treat the beliefs and desires postulated by folk psychology as brain states with symbolic contents. For example, the belief that there is a beer in the refrigerator is thought to be a brain state that contains symbols corresponding to beer and a refrigerator. From this point of view, the fate of folk psychology is strongly tied to the symbolic processing hypothesis. So if connectionists can establish that brain processing is essentially non-symbolic, eliminativist conclusions will follow. On the other hand, some philosophers do not think folk psychology is essentially symbolic, and some would even challenge the idea that folk psychology is to be treated as a theory in the first place. Under this conception, it is much more difficult to forge links between results in connectionist research and the rejection of folk psychology.
由于对民间心理学性质的分歧,这个问题变得更加复杂。许多哲学家将民间心理学假设的信念和欲望视为具有象征内容的大脑状态。例如,相信冰箱里有啤酒被认为是一种大脑状态,其中包含对应于啤酒和冰箱的符号。从这个角度来看,民俗心理学的命运与符号加工假说密切相关。因此,如果连接论者能够证明大脑处理本质上是非符号的,那么排除论的结论就会随之而来。另一方面,一些哲学家并不认为民间心理学本质上是象征性的,有些人甚至会质疑民间心理学首先应该被视为一种理论的观点。在这种概念下,在联结主义研究的结果和对民间心理学的拒绝之间建立联系要困难得多。
10. Predictive Coding Models of Cognition10. 认知的预测编码模型
As connectionist research has matured from its “Golden Age” in the 1980s, the main paradigm has radiated into a number of distinct approaches. Two important trends worth mentioning are predictive coding and deep learning (which will be covered in the following section). Predictive coding is a well-established information processing tool with a wide range of applications. It is useful, for example, in compressing the size of data sets. Suppose you wish to transmit a picture of a landscape with a blue sky. Since most of the pixels in the top half of your image are roughly the same shade, it is very inefficient to record the color value (say Red: 46 Green: 78 Blue: FF in hexadecimal) over and over again for each pixel in the top half of the image. Since the value of one pixel strongly predicts the value of its neighbor, the efficient thing to do is to record, at each pixel location, the difference between the predicted value (an average of its neighbors) and the actual value for that pixel. (In the case of representing an evenly shaded sky, we would only need to record the blue value once, followed by lots of zeros.) This way, major coding resources are only needed to keep track of points in the image (such as edges) where there are large changes, that is, points of “surprise” or “unexpected” variation.
随着联结主义研究从 1980 年代的“黄金时代”走向成熟,主要范式已经辐射到许多不同的方法中。值得一提的两个重要趋势是预测编码和深度学习(将在下一节中介绍)。预测编码是一种成熟的信息处理工具,应用范围很广。例如,在压缩数据集的大小时,它很有用。假设您希望传输一张蓝天景观的图片。由于图像上半部分的大多数像素都大致具有相同的阴影,因此一遍又一遍地记录图像上半部分每个像素的颜色值(例如红色:46 绿色:78 蓝色:FF 十六进制)的效率非常低。由于一个像素的值强烈预测了其相邻像素的值,因此高效的做法是在每个像素位置记录预测值(其相邻像素的平均值)与该像素的实际值之间的差异。(在表示均匀阴影天空的情况下,我们只需要记录一次蓝色值,然后记录很多 0。这样,只需要主要编码资源来跟踪图像中存在较大变化的点(例如边缘),即“意外”或“意外”变化点。
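The compression idea described above can be made concrete with a minimal sketch. It makes a simplifying assumption that is not in the text: the prediction for each pixel is just the value of its left neighbor in a single row, rather than an average of several neighbors. The function names `encode` and `decode` and the sample row are hypothetical, chosen only for illustration.

```python
def encode(pixels):
    """Keep the first value, then store each pixel's difference from its left neighbor."""
    if not pixels:
        return []
    residuals = [pixels[0]]
    for previous, current in zip(pixels, pixels[1:]):
        # Assumed predictor: the previous pixel. The residual is the "surprise".
        residuals.append(current - previous)
    return residuals


def decode(residuals):
    """Rebuild the original row by adding each residual to the running prediction."""
    pixels = []
    for index, residual in enumerate(residuals):
        pixels.append(residual if index == 0 else pixels[-1] + residual)
    return pixels


# Blue channel of one image row: evenly shaded sky (0xFF), then a darker edge (0x40).
sky_row = [0xFF] * 6 + [0x40, 0x40]
code = encode(sky_row)
print(code)  # [255, 0, 0, 0, 0, 0, -191, 0]
```

The blue value is recorded once, the uniform sky becomes a run of zeros, and the only large entry marks the edge. A run of zeros is cheap to store with any standard run-length or entropy coder, which is where the saving would come from in practice.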
It is well known that early visual processing in the brain involves taking differences between nearby values (for example, to identify visual boundaries). It is only natural then to explore how the brain might take advantage of predictive coding in perception, inference, or even action. (See Clark 2013 for an excellent summary and entry point to the literature.) There is wide variety in the models presented in the predictive coding paradigm, and they tend to be specified at a higher level of generality than the connectionist models discussed so far. Assume we have a neural net with input, hidden and output levels that has been trained on a task (say face recognition) and so presumably has information about faces stored in the weights connecting the hidden level nodes. Three features would classify this net as a predictive coding (PC) model. First, the model will have downward connections from the higher levels that are able to predict the next input for that task. (The prediction might be a representation of a generic face.) Second, the data sent to the higher levels for a given input is not the value recorded at the input nodes, but the difference between the predicted values and the values actually present. (So in the example, the data provided tracks the differences between the face to be recognized and the generic face.) In this way the data being received by the net is already preprocessed for coding efficiency. Third, the model is trained by adjusting the weights in such a way that the error is minimized at the inputs. In other words, the trained net reduces as much as possible the “surprise” registered in the difference between the raw input and its prediction. In so doing, it comes to be able to predict the face of the individual to be recognized, thereby eliminating the error. Some advocates of predictive coding models suggest that this scheme provides a unified account of all cognitive phenomena, including perception, reasoning, planning and motor control.
By minimizing prediction error in interacting with the environment, the net is forced to develop the conceptual resources to model the causal structure of the external world, and so navigate that world more effectively.
众所周知,大脑中的早期视觉处理涉及获取附近值之间的差异(例如,识别视觉边界)。因此,探索大脑如何在感知、推理甚至行动中利用预测编码是很自然的。(参见 Clark 2013 的文献总结和切入点。预测编码范式中呈现的模型种类繁多,并且它们往往比迄今为止讨论的连接主义模型具有更高的通用性。假设我们有一个神经网络,其中包含输入、隐藏和输出级别,该网络已经针对任务(比如人脸识别)进行了训练,因此可能将有关人脸的信息存储在连接隐藏级别节点的权重中。三个特征将此网络归类为预测编码 (PC) 模型。首先,该模型将具有来自更高级别的向下连接,这些连接能够预测该任务的下一个输入。(预测可能是通用人脸的表示形式。其次,对于给定输入,发送到更高级别的数据不是在输入节点处记录的值,而是预测值与实际存在的值之间的差值。(因此,在此示例中,提供的数据跟踪要识别的人脸与通用人脸之间的差异。通过这种方式,网络接收的数据已经经过预处理以提高编码效率。第三,通过调整权重来训练模型,使输入处的误差最小。换句话说,经过训练的网络尽可能减少原始输入与其预测之间的差异中记录的 “惊喜”。这样做就能够预测要识别的个人面孔以消除错误。一些预测编码模型的倡导者认为,该方案提供了所有认知现象的统一说明,包括感知、推理、规划和运动控制。通过最大限度地减少与环境交互的预测误差,网络被迫开发概念资源来模拟外部世界的因果结构,从而更有效地导航该世界。
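The three defining features just listed can be illustrated with a deliberately tiny sketch. It is not a model from the PC literature: as a labeled simplification, a single stored vector `prediction` stands in for the whole set of top-down connections, and the "faces" are made-up three-number vectors. The function names and the learning rate are likewise hypothetical.

```python
def prediction_error(x, prediction):
    """Feature 2: what travels upward is input minus top-down prediction."""
    return [xi - pi for xi, pi in zip(x, prediction)]


def mean_squared_error(inputs, prediction):
    """Average squared 'surprise' registered at the input nodes."""
    total = 0.0
    for x in inputs:
        total += sum(e * e for e in prediction_error(x, prediction))
    return total / len(inputs)


def train(inputs, n_features, lr=0.1, epochs=200):
    # Feature 1: a top-down prediction of the next input (starts uninformed).
    prediction = [0.0] * n_features
    for _ in range(epochs):
        for x in inputs:
            error = prediction_error(x, prediction)
            # Feature 3: adjust the stored prediction so the input error shrinks.
            prediction = [p + lr * e for p, e in zip(prediction, error)]
    return prediction


# Hypothetical toy "faces": three feature values each.
faces = [[1.0, 0.0, 1.0], [1.0, 0.2, 0.8], [1.0, 0.1, 0.9]]
generic_face = train(faces, n_features=3)
error_before = mean_squared_error(faces, [0.0, 0.0, 0.0])
error_after = mean_squared_error(faces, generic_face)
print(generic_face, error_before, error_after)
```

Under these assumptions the stored prediction drifts toward something like the average face, and the error signal that remains carries only what distinguishes an individual face from that generic one.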
The predictive coding (PC) paradigm has attracted a lot of attention. There is ample evidence that PC models capture essential details of visual function in the mammalian brain (Rao & Ballard 1999; Huang & Rao 2011). For example, when trained on typical visual input, PC models spontaneously develop functional areas for edge, orientation and motion detection known to exist in visual cortex. This work also raises the interesting point that the visual architecture may develop in response to the statistics of the scenes being encountered, so that organisms in different environments have visual systems specially tuned to their needs.
预测编码 (PC) 范式引起了很多关注。有充分的证据表明,PC模型捕捉到了哺乳动物大脑中视觉功能的基本细节(Rao & Ballard 1999;Huang & Rao 2011)。例如,当使用典型的视觉输入进行训练时,PC 模型会自发地开发已知存在于视觉皮层中的边缘、方向和运动检测功能区域。这项工作还提出了一个有趣的观点,即视觉架构可能会根据所遇到场景的统计数据而发展,因此不同环境中的生物体具有专门针对其需求进行调整的视觉系统。
It must be admitted that there is still no convincing evidence that the essential features of PC models are directly implemented as anatomical structures in the brain. Although it is conjectured that superficial pyramidal cells may transmit prediction error, and deep pyramidal cells predictions, we do not know that that is how they actually function. On the other hand, PC models do appear more neurally plausible than backpropagation architectures, for there is no need for a separate process of training on an externally provided set of training samples. Instead, predictions replace the role of the training set, so that learning and interacting with the environment are two sides of a unified unsupervised process.
必须承认,仍然没有令人信服的证据表明 PC 模型的基本特征直接作为大脑中的解剖结构实现。虽然有人推测浅层锥体细胞可能会传递预测误差,而深层锥体细胞可能会传递预测,但我们不知道这就是它们的实际运作方式。另一方面,PC 模型在神经上确实比反向传播架构更合理,因为不需要对外部提供的一组训练样本进行单独的训练过程。相反,预测取代了训练集的角色,因此学习和与环境交互是统一的无监督过程的两个方面。
PC models also show promise for explaining higher-level cognitive phenomena. An often-cited example is binocular rivalry. When presented with entirely different images in two eyes, humans report an oscillation between the two images as each in turn comes into “focus”. The PC explanation is that the system succeeds in eliminating error by predicting the scene for one eye, but only to increase the error for the other eye. So the system is unstable, “hunting” from one prediction to the other. Predictive coding also has a natural explanation for why we are unaware of our blind spot, for the lack of input in that area amounts to a report of no error, with the result that one perceives “more of the same”.
PC 模型也显示出解释更高层次认知现象的前景。一个经常被引用的例子是双眼竞争。当两只眼睛看到完全不同的图像时,人类报告说,当每个图像依次成为“焦点”时,两个图像之间会出现振荡。PC 的解释是,系统通过预测一只眼睛的场景成功地消除了误差,但只是增加了另一只眼睛的误差。所以这个系统是不稳定的,从一个预测“狩猎”到另一个预测。预测编码也可以自然地解释为什么我们没有意识到我们的盲点,因为在该领域缺乏输入相当于没有错误的报告,结果是人们感知到“更多相同”。
PC accounts of attention have also been championed. For example, Hohwy (2012) notes that realistic PC models, which must tolerate noisy inputs, need to include parameters that track the desired precision to be used in reporting error. So PC models need to make predictions of the error precision relevant for a given situation. Hohwy explores the idea that mechanisms for optimizing precision expectations map onto those that account for attention, and argues that attentional phenomena such as change blindness can be explained within the PC paradigm.
关注度的 PC 账户也得到了支持。例如,Hohwy (2012) 指出,必须容忍嘈杂输入的现实 PC 模型需要包含跟踪报告错误所需的精度的参数。因此,PC 模型需要对与给定情况相关的误差精度进行预测。Hohwy 探讨了优化精确期望的机制映射到考虑注意力的机制的想法,并认为诸如变化盲度之类的注意力现象可以在 PC 范式中解释。
Predictive coding has interesting implications for themes in the philosophy of cognitive science. By integrating the processes of top-down prediction with bottom-up error detection, the PC account of perception views it as intrinsically theory-laden. Deployment of the conceptual categorization of the world embodied in higher levels of the net is essential to the very process of gathering data about the world. This underscores, as well, tight linkages between belief, imaginative abilities, and perception (Grush 2004). The PC paradigm also tends to support situated or embodied conceptions of cognition, for it views action as a dynamic interaction between the organism’s effects on the environment, its predictions concerning those effects (its plans), and its continual monitoring of error, which provides feedback to help ensure success.
预测编码对认知科学哲学中的主题具有有趣的意义。通过将自上而下的预测过程与自下而上的错误检测相结合,感知的 PC 帐户将其视为本质上充满理论。部署包含在更高层次网络中的世界概念分类对于收集有关世界的数据的过程至关重要。这也强调了信仰、想象力和感知之间的紧密联系(Grush 2004)。PC 范式也倾向于支持认知的情境或具体概念,因为它将行动视为有机体对环境的影响、对这些影响的预测(其计划)以及对错误的持续监控之间的动态互动,从而提供反馈以帮助确保成功。
It is too early to evaluate the importance and scope of PC models in accounting for the various aspects of cognition. Providing a unified theory of brain function in general is, after all, an impossibly high standard. Clark’s target article (2013) provides a useful forum for airing complaints against PC models and some possible responses. One objection that is often heard is that an organism with a PC brain can be expected to curl up in a dark room and die, for this is the best way to minimize error at its sensory inputs. However, that view may take too narrow a view of the sophistication of the predictions available to the organism. If it is to survive at all, its genetic endowment coupled with what it can learn along the way may very well endow it with the expectation that it go out and seek needed resources in the environment. Minimizing error for that prediction of its behavior will get it out of the dark room. However, it remains to be seen whether a theory of biological urges is usefully recast in PC terminology in this way, or whether PC theory is better characterized as only part of the explanation. Another complaint is that the top-down influence on our perception coupled with the constraint that the brain receives error signals rather than raw data would impose an unrealistic divide between a represented world of fantasy and the world as it really is. It is hard to evaluate whether that qualifies as a serious objection. Were PC models actually to provide an account of our phenomenological experience, and characterize the relations between that experience and what we count as real, then skeptical conclusions to be drawn would count as features of the view rather than objections to it. A number of responders to Clark’s target article also worry that PC-models count as overly general. In trying to explain everything they explain nothing. 
Without sufficient constraints on the architecture, it is too easy to pretend to explain cognitive phenomena by merely redescribing them in a story written in the vocabulary of prediction, comparison, error minimization, and optimized precision. The real proof of the pudding will come with the development of more complex and detailed computer models in the PC framework that are biologically plausible, and able to demonstrate the defining features of cognition.
现在评估 PC 模型在解释认知各个方面的重要性和范围还为时过早。毕竟,提供一个统一的大脑功能理论总体上是一个不可能的高标准。Clark 的目标文章 (2013) 提供了一个有用的论坛,用于表达对 PC 模型的抱怨和一些可能的回应。经常听到的一个反对意见是,一个拥有 PC 大脑的生物体可以预期会蜷缩在黑暗的房间里并死亡,因为这是最大限度地减少其感觉输入错误的最佳方法。然而,这种观点可能对生物体可用的预测的复杂性持过于狭隘的看法。如果它真的要生存,它的基因禀赋加上它在此过程中可以学到的东西,很可能赋予它出去寻找环境中所需资源的期望。最大限度地减少对其行为的预测的误差将使它走出暗室。然而,生物冲动理论是否以这种方式有效地重新塑造为 PC 术语,或者 PC 理论是否仅作为解释的一部分被更好地描述,还有待观察。另一个抱怨是,自上而下对我们感知的影响,加上大脑接收错误信号而不是原始数据的限制,将在所表现的幻想世界和真实世界之间造成不切实际的鸿沟。很难评估这是否属于严重的反对意见。如果 PC 模型真的提供了我们的现象学经验的解释,并描述了这种经验与我们所认为的真实之间的关系,那么要得出的怀疑结论将被视为该观点的特征,而不是对它的反对。克拉克的目标文章的许多回复者还担心 PC 模型算得过于笼统。在试图解释一切时,他们什么也解释不了。如果对架构没有足够的约束,仅仅通过用预测、比较、误差最小化和优化精度的词汇来重新描述认知现象,就很容易假装解释认知现象。布丁的真正证明将来自在 PC 框架中开发更复杂、更详细的计算机模型,这些模型在生物学上是合理的,并且能够展示认知的定义特征。
11. Deep Learning: Connectionism’s New Wave11. 深度学习:联结主义的新浪潮
Whereas connectionism’s ambitions seemed to mature and temper towards the end of its Golden Age from 1980–1995, neural network research has recently returned to the spotlight after a combination of technical achievements made it practical to train networks with many layers of nodes between input and output (Krizhevsky, Sutskever, & Hinton 2012; Goodfellow, Bengio, & Courville 2016). Amazon, Facebook, Google, Microsoft, and Uber have all since made substantial investments in these “deep learning” systems. Their many promising applications include recognition of objects and faces in photographs, natural language translation and text generation, prediction of protein folds, medical diagnosis and treatment, and control of autonomous vehicles. The success of the game-playing program AlphaZero (Silver et al. 2018) has brought intense publicity to deep learning in the popular press. What is especially telling about AlphaZero is that essentially the same algorithm was capable of learning to defeat human world champions and other top-performing artificial systems in three different rule-based games (chess, shogi, and Go) “without human knowledge” of strategy, that is, by using only information about the rules of these games and policies it learned from extensive self-play. Its ability to soundly defeat expert-knowledge-based programs at their forte has been touted as the death knell for the traditional symbolic paradigm in artificial intelligence.
尽管连接主义的雄心壮志似乎在1980-1995年的黄金时代结束时变得成熟和缓和,但在技术成就的结合使得训练具有输入和输出之间多层次节点的网络变得实用之后,神经网络研究最近又回到了人们的视线中,这些网络在输入和输出之间具有多层节点(Krizhevsky, Sutskever, & Hinton 2012;Goodfellow, Bengio, & Courville 2016)。此后,亚马逊、Facebook、谷歌、Microsoft 和 Uber 都对这些“深度学习”系统进行了大量投资。它们的许多有前途的应用包括识别照片中的物体和面部、自然语言翻译和文本生成、蛋白质折叠预测、医疗诊断和治疗以及自动驾驶汽车的控制。游戏程序 AlphaZero(Silver 等人,2018 年)的成功在大众媒体上为深度学习带来了强烈的宣传。AlphaZero 特别能说明问题的是,本质上相同的算法能够在三种不同的基于规则的游戏(国际象棋、将棋和围棋)中“没有人类知识”的策略中学习击败人类世界冠军和其他表现最好的人工系统,也就是说,通过使用有关这些游戏规则和策略的信息它从广泛的自我博弈中学到。它能够彻底击败基于专家知识的程序,这被吹捧为人工智能中传统符号范式的丧钟。
However, the new capabilities of deep learning systems have brought with them new concerns. Deep networks typically learn from vastly more data than their predecessors (AlphaZero learned from over 100 million self-played Go games), and can extract much more subtle, structured patterns. While the analysis of AlphaZero’s unusual approach to strategy has created a mini-revolution in the study of chess and Go (Sadler & Regan 2019), it also raised concerns that the solutions deep networks discover are alien and mysterious. It is natural, therefore, to have second thoughts about depending on deep learning technologies for tasks that must be responsive to human interests and goals.
然而,深度学习系统的新功能带来了新的担忧。深度网络通常从比其前辈更多的数据中学习(AlphaZero 从超过 1 亿个自下围棋游戏中学习),并且可以提取更微妙的结构化模式。虽然对AlphaZero不寻常的策略方法的分析在国际象棋和围棋的研究中引发了一场小型革命(Sadler & Regan,2019年),但它也引发了人们对深度网络发现的解决方案陌生而神秘的担忧。因此,很自然地会重新考虑依赖深度学习技术来完成必须响应人类利益和目标的任务。
The success of deep learning would not have been possible without specialized Graphics Processing Units (GPUs), massively-parallel processors optimized for the computational burden of training large nets. However, the crucial innovations behind deep learning’s successes lie in network architecture. Although the literature describes a bewildering set of variations in deep net design (Schmidhuber 2015), there are some common themes that help define the paradigm.
如果没有专门的图形处理单元 (GPU),深度学习就不可能取得成功,GPU 是针对训练大型网络的计算负担而优化的大规模并行处理器。然而,深度学习成功背后的关键创新在于网络架构。尽管文献描述了深度网络设计中一系列令人眼花缭乱的变化 (Schmidhuber 2015),但有一些共同的主题有助于定义范式。
The most obvious feature is a substantial increase in the number of hidden layers. Whereas Golden Age networks typically had only one or two hidden layers, deep neural nets have anywhere from five to several hundred. It has been proven that additional depth can exponentially increase the representational and computational power of a neural network, compared to a shallower network with the same number of nodes (Bengio & Delalleau 2011; Montúfar et al. 2014; Raghu et al. 2017). The key is that the patterns detected at a given layer may be used by the subsequent layers to repeatedly create more and more complex discriminations.
最明显的特征是隐藏层的数量大幅增加。黄金时代的网络通常只有一两个隐藏层,而深度神经网络则有五到几百个隐藏层。已经证明,与具有相同节点数量的较浅网络相比,额外的深度可以指数级地增加神经网络的表示和计算能力(Bengio & Delalleau 2011;Montúfar 等人,2014 年;Raghu 等人,2017 年)。关键是,在给定层检测到的模式可能会被后续层用于重复创建越来越复杂的区分。
The number of layers is not the only feature of deep nets that explains their superior abilities. An emerging consensus is that many tasks that are hard to learn are characterized by the presence of “nuisance parameters”, sources of variation in input signals that are not correlated with decision success. Examples of nuisance parameters in visual categorization tasks include pose, size, and position in the visual field; examples in auditory tasks include tone, pitch, and duration. Successful systems must learn to recognize deeper similarities hiding under this variation to identify objects in images, or words in audio data.
层数并不是解释其卓越能力的深网的唯一特征。一个正在形成的共识是,许多难以学习的任务的特点是存在“令人讨厌的参数”,即与决策成功无关的输入信号变化来源。视觉分类任务中的干扰参数示例包括视野中的姿势、大小和位置;听觉任务中的示例包括 Tone、Pitch 和 Duration。成功的系统必须学会识别隐藏在这种变化下的更深层次的相似性,以识别图像中的对象或音频数据中的单词。
One of the most commonly-deployed deep architectures—deep convolutional networks—leverages a combination of strategies that are well-suited to overcoming nuisance variation. Golden Age nets used the same activation function for all units, and units in a layer were fully connected to units in adjacent layers. However, deep convolutional nets deploy several different activation functions, and connections to units in the next higher layer are restricted to small windows, such as a square tile of an image or a temporal snippet of a sound file.
最常部署的深度架构之一 — 深度卷积网络 — 利用了非常适合克服干扰变化的策略组合。黄金时代的网络对所有单元使用相同的激活函数,并且一个层中的单元与相邻层中的单元完全连接。然而,深度卷积网络部署了几种不同的激活函数,并且与下一个更高层中的单元的连接仅限于小窗口,例如图像的方形图块或声音文件的时间片段。
A toy example of a deep convolutional net trained to recognize objects in images will help illustrate some of the details. The input to such a net consists of a digitized scene with red, green, and blue (RGB) values for the intensity of colors in each pixel. This input layer is fed to a layer of filter units, which are connected only to a small window of input pixels. Filter units detect specific, local features of the image using an operation called convolution. For example, they might find edges by noting where differences in the intensity of nearby pixels are the greatest. Outputs of these units are then passed to rectified linear units (or “ReLU” nodes), which only pass along activations from the filter nodes that exceed a certain threshold. ReLU units send their signals to a pooling layer, which collects data from many ReLU units and only passes along the most-activated features for each location. The result of this sandwich of convolution-ReLU-pooling layers is a “feature map”, which marks all and only the most salient features detected at each location across the whole image. This feature map can then be sent to a whole series of such sandwiches to detect larger and more abstract features. For example, one sandwich might build lines from edges, the next angles from lines, the next shapes from lines and angles, and the next objects from shapes. A final, fully-connected classification layer is then used to assign labels to the objects detected in the most abstract feature map delivered by the penultimate layer.
一个经过训练以识别图像中对象的深度卷积网络的玩具示例将有助于说明一些细节。此类网络的输入由一个数字化场景组成,其中每个像素中的颜色强度为红色、绿色和蓝色 (RGB) 值。此输入层被馈送到一个过滤器单元层,该层仅连接到输入像素的小窗口。Filter units 使用称为 convolution 的操作来检测图像的特定局部特征。例如,他们可能会通过注意附近像素强度差异最大的位置来找到边缘。然后,这些单元的输出被传递到修正的线性单元(或“ReLU”节点),这些单元仅传递来自超过特定阈值的滤波器节点的激活。ReLU 单元将其信号发送到池化层,该池化层从许多 ReLU 单元收集数据,并且只传递每个位置激活最多的特征。这个卷积 ReLU 池化层三明治的结果是一个“特征图”,它标记了整个图像中每个位置检测到的所有且仅最突出的特征。然后,可以将此特征图发送到一系列此类三明治,以检测更大、更抽象的特征。例如,一个三明治可能从边缘构建线条,从线条构建下一个角度,从线条和角度构建下一个形状,以及从形状构建下一个对象。然后使用最终的全连接分类层为倒数第二层提供的最抽象特征图中检测到的对象分配标签。
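One convolution-ReLU-pooling "sandwich" of the kind described above can be sketched in a few lines of plain Python. The example rests on several assumptions: the image is a tiny grayscale grid rather than RGB, the filter is a hand-written edge detector rather than a learned one, and the function names are hypothetical. As in most deep learning software, the "convolution" here slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide a small window over the image; each output sees only local pixels."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Pass positive activations through unchanged; zero out everything else."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep only the strongest activation in each non-overlapping size x size tile."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]


# Toy image: dark on the left, bright on the right, so there is one vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
edge_filter = [[-1, 1]]  # assumed filter: responds where brightness rises left to right

feature_map = max_pool(relu(convolve2d(image, edge_filter)))
print(feature_map)  # [[1, 0], [1, 0]]
```

The resulting feature map marks the edge in the left-hand tiles and nothing elsewhere. Feeding such a map into further sandwiches is what would allow later layers to respond to lines, angles, and shapes.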
This division-of-labor is extremely efficient at overcoming nuisance variation, compared to shallow Golden Age networks. Furthermore, limiting the inputs of the filter nodes to a small window significantly lowers the number of weights that must be learned at each level, compared to a fully-connected network. If features usually depend only on local relations (i.e. in the sense that one normally does not need to look at someone’s feet to read their facial expression), then this gain comes at no cost to classification accuracy. Furthermore, pooling the outputs of several different filter nodes helps detect the same feature across small differences in nuisance variables like pose or location. There is special enthusiasm for this kind of neurocomputational division-of-labor in cognitive science, because it was originally inspired by anatomical studies of mammalian neocortex (Hubel & Wiesel 1965; Fukushima 1980). Other sources of empirical evidence have demonstrated the potential of such networks as models for perceptual similarity and object recognition judgments in primates (Khaligh-Razavi & Kriegeskorte 2014; Hong et al. 2016; Kubilius, Bracci, & Beeck 2016; Lake, Zaremba et al. 2015; Yamins & DiCarlo 2016; and Guest & Love 2019 [Other Internet Resources, hereafter OIR]). These points also interface with the innateness controversy discussed in Section 6. For example, Buckner (2018) has recently argued that these activation functions combine to implement a form of cognitive abstraction which addresses problems facing traditional empiricist philosophy of mind, concerning the way that minds can efficiently discover abstract categorical knowledge in specific, idiosyncratic perceptions.
与浅层黄金时代网络相比,这种分工在克服滋扰变化方面非常有效。此外,与完全连接的网络相比,将滤波器节点的输入限制在一个小窗口可以显著降低必须在每个级别学习的权重数量。如果特征通常只取决于局部关系(即,从某种意义上说,人们通常不需要看某人的脚来阅读他们的面部表情),那么这种收益不会影响分类准确性。此外,池化多个不同过滤器节点的输出有助于在令人讨厌的变量(如姿势或位置)的微小差异中检测相同的特征。在认知科学中,人们对这种神经计算分工特别感兴趣,因为它最初受到哺乳动物新皮层解剖学研究的启发(Hubel & Wiesel 1965;福岛 1980 年)。其他经验证据来源已经证明了此类网络作为灵长类动物感知相似性和物体识别判断模型的潜力(Khaligh-Razavi & Kriegeskorte 2014;Hong 等人,2016 年;Kubilius, Bracci, & Beeck 2016;Lake, Zaremba 等人,2015 年;Yamins & DiCarlo 2016;和Guest & Love 2019 [其他互联网资源,以下简称OIR])。这些观点也与第 6 节中讨论的先天性争议相吻合。例如,Buckner (2018) 最近认为,这些激活函数结合起来实现了一种认知抽象形式,它解决了传统经验主义心灵哲学面临的问题,即心灵如何在特定的、特殊的感知中有效地发现抽象的分类知识。
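The saving in weights from restricting each filter node to a small window is easy to check with arithmetic. The image and window sizes below are illustrative assumptions, not figures from the text, and the helper name is hypothetical.

```python
def weights_per_unit(height, width, channels):
    """Number of incoming weights for one unit that sees a height x width x channels patch."""
    return height * width * channels


# Assumed sizes: a 32 x 32 RGB image and a 3 x 3 window.
fully_connected = weights_per_unit(32, 32, 3)   # unit wired to every input pixel
windowed = weights_per_unit(3, 3, 3)            # unit wired to a small window only

print(fully_connected, windowed)  # 3072 27
```

Under these assumptions each windowed unit needs about a hundredth of the weights of a fully connected one. Convolutional layers typically go further and reuse the same filter weights at every image location, which reduces the count again.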
The increase in computational power that comes with deep net architecture brings with it additional dangers. In fact, the representational power of deep networks is so great that they can simply memorize the correct answer for every item in a large, complex data set, even if the “correct” labels were randomly assigned (Zhang et al. 2016 in OIR). The result is poor generalization of the task to be learned—with total failure to properly respond to inputs outside the training set. Effective deep nets thus employ an array of strategies to prevent them from merely memorizing training data, mostly by biasing the network against the learning of fine-grained idiosyncrasies. Popular options include dropout, which randomly deactivates a small number of nodes during training, and weight decay rules, which cause weights to decrease in value if not constantly refreshed by different examples.
深度网络架构带来的计算能力的增加带来了额外的危险。事实上,深度网络的表示能力是如此之大,以至于它们可以简单地记住大型复杂数据集中每个项目的正确答案,即使“正确”的标签是随机分配的(Zhang et al. 2016 in OIR)。结果是要学习的任务的泛化能力很差,完全无法正确响应训练集之外的输入。因此,有效的深度网络采用一系列策略来防止它们仅仅记住训练数据,主要是通过使网络偏向于细粒度特性的学习。常用选项包括 dropout(在训练期间随机停用少量节点)和 weight decay rules(如果不经常刷新不同的示例),则会导致权重值减少。
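The two regularization strategies named above can each be sketched in a few lines. The details are assumptions made for illustration: the dropout shown is the common "inverted" variant that rescales surviving activations, weight decay is written as an extra shrinkage term in a plain gradient step, and the rates and function names are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


def step_with_weight_decay(weights, grads, lr=0.1, decay=0.01):
    """Gradient step plus a pull toward zero proportional to each weight."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


rng = random.Random(0)  # seeded so the sketch is repeatable
hidden = dropout([1.0] * 10, drop_prob=0.2, rng=rng)

# With no gradient "refreshing" them, the weights shrink on every step.
weights = [1.0, -2.0]
for _ in range(100):
    weights = step_with_weight_decay(weights, grads=[0.0, 0.0])

print(hidden)
print(weights)
```

Because a different random subset of nodes is silenced on each pass, no single node can be relied on to memorize an idiosyncratic training item. Decay likewise erodes any weight that the data do not keep reinforcing.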
While these general points may explain why deep convolutional nets tend to succeed on a wide variety of tasks, their complex structure makes it difficult to explain their decisions in specific cases. This concern interfaces with the XAI (explainable AI) movement, which aims to inspire the development of better tools to analyze the decisions of computer algorithms, especially so that AI systems can be certified to meet practical or legal requirements (Explainable Artificial Intelligence (XAI); B. Goodman & Flaxman 2017). Deep Visualization methods are important tools in addressing these goals for deep neural networks. One popular family of methods uses further machine learning to create an artificial image that maximizes the activation of some particular hidden layer unit (Yosinski et al. 2015). The image is intended to give one an impression of the kind of feature that unit detects when it fires. As expected, the images look more complex and more object-like as we ascend the level hierarchy (for examples and software, see http://yosinski.com/deepvis). Without additional processing, however, many of these visualizations appear chimerical and nonsensical, and it is not clear exactly how well this method reveals features that are genuinely important in the network’s processing. Another family of methods attempts to reveal the aspects of input images that are most salient for the nets’ decision-making. Relevance decomposition, for example, determines which nodes, if deactivated, would have had the greatest effect on some particular decision (Montavon, Samek, & Müller 2018). This can generate a “heatmap”, which shows the aspects of the input that were most influential in that decision. Further machine learning has also been used to build systems able to provide brief English phrases describing the features that lead to a net’s decisions (Hendricks et al. 2016 [OIR]; Ehsan et al. 2018). 
Despite these advances, the methodologies needed for an adequate explanation of a deep network’s behavior remain unclear and would benefit from further philosophical reflection (Lipton 2016 [OIR]; Zednik 2019 [OIR]).
虽然这些一般性观点可以解释为什么深度卷积网络往往在各种任务上取得成功,但它们复杂的结构使得在特定情况下很难解释它们的决策。这种关注与XAI(可解释的AI)运动相辅相成,该运动旨在激发开发更好的工具来分析计算机算法的决策,特别是使AI系统能够被认证以满足实际或法律要求(可解释的人工智能(XAI); B. Goodman和Flaxman 2017)。深度可视化方法是实现深度神经网络这些目标的重要工具。一种流行的方法系列使用进一步的机器学习来创建人工图像,以最大限度地激活某些特定的隐藏层单元(Yosinski 等人,2015 年)。该图像旨在给人一种该装置在开火时检测到的特征类型的印象。正如预期的那样,随着我们上升到级别层次结构,图像看起来更复杂,更像对象(有关示例和软件,请参见 http://yosinski.com/deepvis)。然而,如果没有额外的处理,这些可视化中的许多都会显得虚构和荒谬,并且目前尚不清楚这种方法在多大程度上揭示了在网络处理中真正重要的特征。另一类方法试图揭示对网络决策最突出的输入图像方面。例如,相关性分解决定了哪些节点,如果停用,将对某些特定决策产生最大的影响(Montavon, Samek, & Müller 2018)。这可以生成一个 “热图”,它显示了对该决策影响最大的输入方面。进一步的机器学习也被用于构建系统,能够提供简短的英语短语来描述导致网络决策的特征(Hendricks 等人,2016 [OIR];Ehsan 等人,2018 年)。尽管取得了这些进步,但充分解释深度网络行为所需的方法仍然不清楚,这将受益于进一步的哲学反思(Lipton 2016 [ OIR ];Zednik 2019 [ OIR ]).
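The heatmap idea can be illustrated with a much simpler stand-in than the relevance decomposition methods cited above. The sketch below uses an occlusion test instead: it deactivates one input at a time and records how much the decision score changes. The "net" is only a hypothetical linear scoring function with made-up weights, so this shows the logic of the analysis, not the behavior of a real deep network.

```python
def score(pixels, weights):
    """Hypothetical stand-in for a trained net: one linear decision score."""
    return sum(p * w for p, w in zip(pixels, weights))


def occlusion_heatmap(pixels, weights):
    """For each input, the drop in score caused by blanking that input alone."""
    baseline = score(pixels, weights)
    heat = []
    for index in range(len(pixels)):
        occluded = list(pixels)
        occluded[index] = 0.0  # deactivate a single input
        heat.append(baseline - score(occluded, weights))
    return heat


pixels = [1.0, 0.5, 0.0, 1.0]     # made-up input
weights = [2.0, -1.0, 3.0, 0.0]   # made-up "learned" weights

heatmap = occlusion_heatmap(pixels, weights)
print(heatmap)  # [2.0, -0.5, 0.0, 0.0]
```

The first input matters most to this decision, the second counts against it, and the last two have no influence here, either because the input is absent or because its weight is zero. For a real deep net the same loop would be run over image patches, which is considerably more expensive.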
The need for explainable deep nets is all the more pressing because of the discovery of so-called “adversarial examples” (Goodfellow et al. 2014; Nguyen, Yosinski, & Clune 2015). These come in at least two forms: “perturbed images” which are natural photographs modified very slightly in a way that causes dramatic changes in classification by deep nets even though the difference is imperceptible to humans, and “rubbish images”, which are purportedly meaningless to humans but are classified with high confidence scores by deep nets. Adversarial examples have led some to conclude that whatever understanding the net has of objects must be radically different than that of humans. Adversarial examples exhibit a number of surprising properties: though constructed from a particular training set, they are highly effective at fooling other nets trained on the same task, even nets with different training sets and different architectures. Furthermore, the search for effective countermeasures has led to frustrating failures. It has also been discovered, however, that perturbation methods can create images which fool humans (Elsayed et al. 2018), and human subjects can predict nets’ preferred labels for rubbish images with high accuracy (Z. Zhou & Firestone 2019). Others have noted that the features nets detect in adversarial examples lead to reliable classifications in naturally-occurring data, challenging the idea that the nets’ decisions should be counted as mistaken (Ilyas et al. 2019 [OIR]). These questions intersect with traditional issues about projectibility and induction, potentially offering new test cases for older philosophical conundrums in epistemology and philosophy of science (N. Goodman 1955; Quine 1969; Harman & Kulkarni 2007).
由于发现了所谓的“对抗性示例”(Goodfellow 等人,2014 年;Nguyen, Yosinski, & Clune 2015)。这些图像至少有两种形式:“扰动图像”,即自然照片,经过非常轻微的修改,即使人类无法察觉,也会导致深网分类发生巨大变化,以及“垃圾图像”,据称对人类没有意义,但被深网以高置信度分数分类。对抗性的例子使一些人得出结论,无论网络对物体的理解如何,都一定与人类的理解截然不同。对抗性示例表现出许多令人惊讶的特性:尽管由特定的训练集构建而成,但它们在欺骗针对同一任务训练的其他网络方面非常有效,即使是具有不同训练集和不同架构的网络。此外,寻找有效的对策也导致了令人沮丧的失败。然而,人们还发现,扰动方法可以创建欺骗人类的图像(Elsayed et al. 2018),人类受试者可以高精度地预测网络对垃圾图像的首选标签(Z. 周 & Firestone 2019)。其他人指出,网络在对抗性示例中检测到的特征导致在自然发生的数据中进行可靠的分类,这挑战了网络的决定应该被视为错误的想法(Ilyas 等人,2019 年 [ OIR ])。这些问题与关于可投射性和归纳的传统问题相交,可能为认识论和科学哲学中较旧的哲学难题提供新的测试案例(N. Goodman 1955;奎因 1969;Harman & Kulkarni 2007)。
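Why a change too small to notice can flip a classification is easiest to see in a linear toy case. The sketch below is an assumption-laden illustration in the spirit of gradient-sign perturbations: the classifier is a hypothetical linear one with made-up weights (for which the gradient of the score with respect to the input is just the weight vector), and every pixel is nudged by the same tiny amount in the direction that lowers the score.

```python
def sign(value):
    return (value > 0) - (value < 0)


def decision_score(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))


def classify(x, w):
    return 1 if decision_score(x, w) > 0 else 0


def perturb(x, w, epsilon):
    """Shift every pixel by epsilon against the gradient of the score (here, w)."""
    return [xi - epsilon * sign(wi) for xi, wi in zip(x, w)]


n_pixels = 100
w = [1.0 if i % 2 == 0 else -1.0 for i in range(n_pixels)]   # made-up weights
x = [0.5 + 0.01 * wi for wi in w]                             # made-up image
x_adv = perturb(x, w, epsilon=0.02)

largest_change = max(abs(a - b) for a, b in zip(x, x_adv))
print(classify(x, w), classify(x_adv, w), largest_change)
```

No pixel moves by more than 0.02, yet the label flips, because a hundred tiny, coordinated changes add up to a large change in the score. With the millions of inputs in a real image, the per-pixel change needed can be far smaller still.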
Although deep learning has received an enormous amount of attention in computer science and from the popular press, there is surprisingly little published about it directly among philosophers (though this is beginning to change—Buckner 2018, 2019 [OIR]; Miracchi 2019; Shevlin & Halina 2019; and Zednik 2019 [OIR]). However, there are rich opportunities for philosophical research on deep learning. Examples of some relevant questions include:
尽管深度学习在计算机科学和大众媒体中受到了极大的关注,但令人惊讶的是,哲学家直接就此发表的文章很少(尽管这种情况正在开始改变:Buckner 2018、2019 [OIR];Miracchi 2019;Shevlin & Halina 2019;以及 Zednik 2019 [OIR])。然而,深度学习的哲学研究存在丰富的机会。一些相关问题的示例包括:
- What kinds of explanation or justification are needed to satisfy our worries about the reliability of deep neural networks in practical applications? What results in deep net research would be needed to assure us that the relevant explanations or justifications are at hand?
需要什么样的解释或理由来满足我们对深度神经网络在实际应用中可靠性的担忧?深度网络研究需要什么结果来确保我们手头有相关的解释或理由?
- Can deep nets serve as explanatory models of biological cognition in cognitive neuroscience? If so, what kind of scientific explanations do they provide? Are they mechanistic, functional, or non-causal in nature?
深网可以作为认知神经科学中生物认知的解释模型吗?如果是这样,他们提供了什么样的科学解释?它们本质上是机械的、函数的还是非因果的?
- What are the prospects for new breakthroughs in deep net natural language processing, and what would it take for these to throw new light on the systematicity controversy?
深度网络自然语言处理的新突破前景如何,这些突破需要什么才能为系统性争议带来新的视角?
- Does deep learning research change the terms of the conflict between radical connectionists and those who claim that symbolic processing models are required to explain higher level cognitive functioning?
深度学习研究是否改变了激进的连接主义者和那些声称需要符号处理模型来解释更高层次认知功能的人之间的冲突?
- Do deep nets like AlphaZero vindicate classical empiricism about higher reasoning? Or must they ultimately replicate more human biases and domain-specific knowledge to reason in the way that humans do?
Bibliography
- Aizawa, Kenneth, 1994, “Representations without Rules, Connectionism and the Syntactic Argument”, Synthese, 101(3): 465–492. doi:10.1007/BF01063898
- –––, 1997a, “Exhibiting versus Explaining Systematicity: A Reply to Hadley and Hayward”, Minds and Machines, 7(1): 39–55. doi:10.1023/A:1008203312152
- –––, 1997b, “Explaining Systematicity”, Mind & Language, 12(2): 115–136. doi:10.1111/j.1468-0017.1997.tb00065.x
- –––, 2003, The Systematicity Arguments, Dordrecht: Kluwer.
- –––, 2014, “A Tough Time to be Talking Systematicity”, in Calvo and Symons 2014: 77–101.
- Bechtel, William, 1987, “Connectionism and the Philosophy of Mind: An Overview”, The Southern Journal of Philosophy, 26(S1): 17–41. doi:10.1111/j.2041-6962.1988.tb00461.x
- –––, 1988, “Connectionism and Rules and Representation Systems: Are They Compatible?”, Philosophical Psychology, 1(1): 5–16. doi:10.1080/09515088808572922
- Bechtel, William and Adele Abrahamsen, 1990, Connectionism and the Mind: An Introduction to Parallel Processing in Networks, Cambridge, MA: Blackwell.
- Bengio, Yoshua and Olivier Delalleau, 2011, “On the Expressive Power of Deep Architectures”, in International Conference on Algorithmic Learning Theory (ALT 2011), Jyrki Kivinen, Csaba Szepesvári, Esko Ukkonen, and Thomas Zeugmann (eds.) (Lecture Notes in Computer Science 6925), Berlin, Heidelberg: Springer Berlin Heidelberg, 18–36. doi:10.1007/978-3-642-24412-4_3
- Bengio, Yoshua, Thomas Mesnard, Asja Fischer, Saizheng Zhang, and Yuhuai Wu, 2017, “STDP-Compatible Approximation of Backpropagation in an Energy-Based Model”, Neural Computation, 29(3): 555–577. doi:10.1162/NECO_a_00934
- Bodén, Mikael and Lars Niklasson, 2000, “Semantic Systematicity and Context in Connectionist Networks”, Connection Science, 12(2): 111–142. doi:10.1080/09540090050129754
- Buckner, Cameron, 2018, “Empiricism without Magic: Transformational Abstraction in Deep Convolutional Neural Networks”, Synthese, 195(12): 5339–5372. doi:10.1007/s11229-018-01949-1
- Butler, Keith, 1991, “Towards a Connectionist Cognitive Architecture”, Mind & Language, 6(3): 252–272. doi:10.1111/j.1468-0017.1991.tb00191.x
- Calvo Garzón, Francisco, 2003, “Connectionist Semantics and the Collateral Information Challenge”, Mind & Language, 18(1): 77–94. doi:10.1111/1468-0017.00215
- Calvo, Paco and John Symons, 2014, The Architecture of Cognition: Rethinking Fodor and Pylyshyn’s Systematicity Challenge, Cambridge: MIT Press.
- Chalmers, David J., 1990, “Syntactic Transformations on Distributed Representations”, Connection Science, 2(1–2): 53–62. doi:10.1080/09540099008915662
- –––, 1993, “Connectionism and Compositionality: Why Fodor and Pylyshyn Were Wrong”, Philosophical Psychology, 6(3): 305–319. doi:10.1080/09515089308573094
- Chomsky, Noam, 1965, Aspects of the Theory of Syntax, Cambridge, MA: MIT Press.
- Christiansen, Morten H. and Nick Chater, 1994, “Generalization and Connectionist Language Learning”, Mind & Language, 9(3): 273–287. doi:10.1111/j.1468-0017.1994.tb00226.x
- –––, 1999a, “Toward a Connectionist Model of Recursion in Human Linguistic Performance”, Cognitive Science, 23(2): 157–205. doi:10.1207/s15516709cog2302_2
- –––, 1999b, “Connectionist Natural Language Processing: The State of the Art”, Cognitive Science, 23(4): 417–437. doi:10.1207/s15516709cog2304_2
- Churchland, Paul M., 1989, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, Cambridge, MA: MIT Press.
- –––, 1995, The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain, Cambridge, MA: MIT Press.
- –––, 1998, “Conceptual Similarity Across Sensory and Neural Diversity: The Fodor/Lepore Challenge Answered”, Journal of Philosophy, 95(1): 5–32. doi:10.5840/jphil19989514
- Clark, Andy, 1989, Microcognition: Philosophy, Cognitive Science, and Parallel Distributed Processing, (Explorations in Cognitive Science), Cambridge, MA: MIT Press.
- –––, 1990 [1995], “Connectionist Minds”, Proceedings of the Aristotelian Society, 90: 83–102. Reprinted in MacDonald and MacDonald 1995: 339–356. doi:10.1093/aristotelian/90.1.83
- –––, 1993, Associative Engines: Connectionism, Concepts, and Representational Change, Cambridge, MA: MIT Press.
- –––, 2013, “Whatever next? Predictive Brains, Situated Agents, and the Future of Cognitive Science”, Behavioral and Brain Sciences, 36(3): 181–204. doi:10.1017/S0140525X12000477
- Clark, Andy and Rudi Lutz (eds.), 1992, Connectionism in Context, London: Springer London. doi:10.1007/978-1-4471-1923-4
- Cottrell, G. W. and S. L. Small, 1983, “A Connectionist Scheme for Modeling Word Sense Disambiguation”, Cognition and Brain Theory, 6(1): 89–120.
- Cummins, Robert, 1991, “The Role of Representation in Connectionist Explanations of Cognitive Capacities”, in Ramsey, Stich, and Rumelhart 1991: 91–114.
- –––, 1996, “Systematicity”, Journal of Philosophy, 93(12): 591–614. doi:10.2307/2941118
- Cummins, Robert and Georg Schwarz, 1991, “Connectionism, Computation, and Cognition”, in Horgan and Tienson 1991: 60–73. doi:10.1007/978-94-011-3524-5_3
- Davies, Martin, 1989, “Connectionism, Modularity, and Tacit Knowledge”, The British Journal for the Philosophy of Science, 40(4): 541–555. doi:10.1093/bjps/40.4.541
- –––, 1991, “Concepts, Connectionism and the Language of Thought”, in Ramsey, Stich, and Rumelhart 1991: 229–257.
- Dinsmore, John (ed.), 1992, The Symbolic and Connectionist Paradigms: Closing the Gap, Hillsdale, NJ: Erlbaum.
- Ehsan, Upol, Brent Harrison, Larry Chan, and Mark O. Riedl, 2018, “Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations”, in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’18), New Orleans, LA: ACM Press, 81–87. doi:10.1145/3278721.3278736
- Eliasmith, Chris, 2007, “How to Build a Brain: From Function to Implementation”, Synthese, 159(3): 373–388. doi:10.1007/s11229-007-9235-0
- –––, 2013, How to Build a Brain: A Neural Architecture for Biological Cognition, New York: Oxford University Press.
- Elman, Jeffrey L., 1991, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure”, in Touretzky 1991: 91–122. doi:10.1007/978-1-4615-4008-3_5
- Elman, Jeffrey, Elizabeth Bates, Mark H. Johnson, Annette Karmiloff-Smith, Domenico Parisi, and Kim Plunkett, 1996, Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: MIT Press.
- Elsayed, Gamaleldin F., Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein, 2018, “Adversarial Examples That Fool Both Computer Vision and Time-Limited Humans”, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), 31: 3914–3924.
- Fodor, Jerry A., 1988, Psychosemantics: The Problem of Meaning in the Philosophy of Mind, Cambridge, MA: MIT Press.
- –––, 1997, “Connectionism and the Problem of Systematicity (Continued): Why Smolensky’s Solution Still Doesn’t Work”, Cognition, 62(1): 109–119. doi:10.1016/S0010-0277(96)00780-9
- Fodor, Jerry and Ernest Lepore, 1992, Holism: A Shopper’s Guide, Cambridge: Blackwell.
- Fodor, Jerry and Ernie Lepore, 1999, “All at Sea in Semantic Space: Churchland on Meaning Similarity”, Journal of Philosophy, 96(8): 381–403. doi:10.5840/jphil199996818
- Fodor, Jerry and Brian P. McLaughlin, 1990, “Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work”, Cognition, 35(2): 183–204. doi:10.1016/0010-0277(90)90014-B
- Fodor, Jerry A. and Zenon W. Pylyshyn, 1988, “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition, 28(1–2): 3–71. doi:10.1016/0010-0277(88)90031-5
- Friston, Karl, 2005, “A Theory of Cortical Responses”, Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456): 815–836. doi:10.1098/rstb.2005.1622
- Friston, Karl J. and Klaas E. Stephan, 2007, “Free-Energy and the Brain”, Synthese, 159(3): 417–458. doi:10.1007/s11229-007-9237-y
- Fukushima, Kunihiko, 1980, “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position”, Biological Cybernetics, 36(4): 193–202. doi:10.1007/BF00344251
- Garfield, Jay L., 1997, “Mentalese Not Spoken Here: Computation, Cognition and Causation”, Philosophical Psychology, 10(4): 413–435. doi:10.1080/09515089708573231
- Garson, James W., 1991, “What Connectionists Cannot Do: The Threat to Classical AI”, in Horgan and Tienson 1991: 113–142. doi:10.1007/978-94-011-3524-5_6
- –––, 1994, “Cognition without Classical Architecture”, Synthese, 100(2): 291–305. doi:10.1007/BF01063812
- –––, 1997, “Syntax in a Dynamic Brain”, Synthese, 110(3): 343–355.
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville, 2016, Deep Learning, Cambridge, MA: MIT Press.
- Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy, 2015, “Explaining and Harnessing Adversarial Examples”, in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, May 7–9, 2015, available online.
- Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, 2014, “Generative Adversarial Nets”, in Proceedings of the 27th International Conference on Neural Information Processing Systems, (NIPS’14), Cambridge, MA: MIT Press, 2: 2672–2680.
- Goodman, Bryce and Seth Flaxman, 2017, “European Union Regulations on Algorithmic Decision-Making and a ‘Right to Explanation’”, AI Magazine, 38(3): 50–57. doi:10.1609/aimag.v38i3.2741
- Goodman, Nelson, 1955, Fact, Fiction, and Forecast, Cambridge, MA: Harvard University Press.
- Grush, Rick, 2004, “The Emulation Theory of Representation: Motor Control, Imagery, and Perception”, Behavioral and Brain Sciences, 27(3): 377–396. doi:10.1017/S0140525X04000093
- Guarini, Marcello, 2001, “A Defence of Connectionism Against the ‘Syntactic’ Argument”, Synthese, 128(3): 287–317. doi:10.1023/A:1011905917986
- Hadley, Robert F., 1994a, “Systematicity in Connectionist Language Learning”, Mind & Language, 9(3): 247–272. doi:10.1111/j.1468-0017.1994.tb00225.x
- –––, 1994b, “Systematicity Revisited: Reply to Christiansen and Chater and Niklasson and van Gelder”, Mind & Language, 9(4): 431–444. doi:10.1111/j.1468-0017.1994.tb00317.x
- –––, 1997a, “Explaining Systematicity: A Reply to Kenneth Aizawa”, Minds and Machines, 7(4): 571–579. doi:10.1023/A:1008252322227
- –––, 1997b, “Cognition, Systematicity and Nomic Necessity”, Mind & Language, 12(2): 137–153. doi:10.1111/j.1468-0017.1997.tb00066.x
- –––, 2004, “On The Proper Treatment of Semantic Systematicity”, Minds and Machines, 14(2): 145–172. doi:10.1023/B:MIND.0000021693.67203.46
- Hadley, Robert F. and Michael B. Hayward, 1997, “Strong Semantic Systematicity from Hebbian Connectionist Learning”, Minds and Machines, 7(1): 1–37. doi:10.1023/A:1008252408222
- Hanson, Stephen J. and Judy Kegl, 1987, “PARSNIP: A Connectionist Network that Learns Natural Language Grammar from Exposure to Natural Language Sentences”, Ninth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum, 106–119.
- Harman, Gilbert and Sanjeev Kulkarni, 2007, Reliable Reasoning: Induction and Statistical Learning Theory, Cambridge, MA: MIT Press.
- Hatfield, Gary, 1991a, “Representation in Perception and Cognition: Connectionist Affordances”, in Ramsey, Stich, and Rumelhart 1991: 163–195.
- –––, 1991b, “Representation and Rule-Instantiation in Connectionist Systems”, in Horgan and Tienson 1991: 90–112. doi:10.1007/978-94-011-3524-5_5
- Hawthorne, John, 1989, “On the Compatibility of Connectionist and Classical Models”, Philosophical Psychology, 2(1): 5–15. doi:10.1080/09515088908572956
- Haybron, Daniel M., 2000, “The Causal and Explanatory Role of Information Stored in Connectionist Networks”, Minds and Machines, 10(3): 361–380. doi:10.1023/A:1026545231550
- Hinton, Geoffrey E., 1990 [1991], “Mapping Part-Whole Hierarchies into Connectionist Networks”, Artificial Intelligence, 46(1–2): 47–75. Reprinted in Hinton 1991: 47–76. doi:10.1016/0004-3702(90)90004-J
- ––– (ed.), 1991, Connectionist Symbol Processing, Cambridge, MA: MIT Press.
- –––, 1992, “How Neural Networks Learn from Experience”, Scientific American, 267(3): 145–151.
- –––, 2010, “Learning to Represent Visual Input”, Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1537): 177–184. doi:10.1098/rstb.2009.0200
- Hinton, Geoffrey E., James L. McClelland, and David E. Rumelhart, 1986, “Distributed Representations”, in Rumelhart, McClelland, and the PDP group 1986: chapter 3.
- Hohwy, Jakob, 2012, “Attention and Conscious Perception in the Hypothesis Testing Brain”, Frontiers in Psychology, 3(96): 1–14. doi:10.3389/fpsyg.2012.00096
- Hong, Ha, Daniel L. K. Yamins, Najib J. Majaj, and James J. DiCarlo, 2016, “Explicit Information for Category-Orthogonal Object Properties Increases along the Ventral Stream”, Nature Neuroscience, 19(4): 613–622. doi:10.1038/nn.4247
- Horgan, Terence E. and John Tienson, 1989, “Representations without Rules”, Philosophical Topics, 17(1): 147–174.
- –––, 1990, “Soft Laws”, Midwest Studies In Philosophy, 15: 256–279. doi:10.1111/j.1475-4975.1990.tb00217.x
- ––– (eds.), 1991, Connectionism and the Philosophy of Mind, Dordrecht: Kluwer. doi:10.1007/978-94-011-3524-5
- –––, 1996, Connectionism and the Philosophy of Psychology, Cambridge, MA: MIT Press.
- Hosoya, Toshihiko, Stephen A. Baccus, and Markus Meister, 2005, “Dynamic Predictive Coding by the Retina”, Nature, 436(7047): 71–77. doi:10.1038/nature03689
- Huang, Yanping and Rajesh P. N. Rao, 2011, “Predictive Coding”, Wiley Interdisciplinary Reviews: Cognitive Science, 2(5): 580–593. doi:10.1002/wcs.142
- Hubel, David H. and Torsten N. Wiesel, 1965, “Receptive Fields and Functional Architecture in Two Nonstriate Visual Areas (18 and 19) of the Cat”, Journal of Neurophysiology, 28(2): 229–289. doi:10.1152/jn.1965.28.2.229
- Jansen, Peter A. and Scott Watter, 2012, “Strong Systematicity through Sensorimotor Conceptual Grounding: An Unsupervised, Developmental Approach to Connectionist Sentence Processing”, Connection Science, 24(1): 25–55. doi:10.1080/09540091.2012.664121
- Johnson, Kent, 2004, “On the Systematicity of Language and Thought”, Journal of Philosophy, 101(3): 111–139. doi:10.5840/jphil2004101321
- Jones, Matt and Bradley C. Love, 2011, “Bayesian Fundamentalism or Enlightenment? On the Explanatory Status and Theoretical Contributions of Bayesian Models of Cognition”, Behavioral and Brain Sciences, 34(4): 169–188. doi:10.1017/S0140525X10003134
- Khaligh-Razavi, Seyed-Mahdi and Nikolaus Kriegeskorte, 2014, “Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation”, PLoS Computational Biology, 10(11): e1003915. doi:10.1371/journal.pcbi.1003915
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, 2012, “ImageNet Classification with Deep Convolutional Neural Networks”, Advances in Neural Information Processing Systems, 25: 1097–1105.
- Kubilius, Jonas, Stefania Bracci, and Hans P. Op de Beeck, 2016, “Deep Neural Networks as a Computational Model for Human Shape Sensitivity”, PLOS Computational Biology, 12(4): e1004896. doi:10.1371/journal.pcbi.1004896
- Laakso, Aarre and Garrison Cottrell, 2000, “Content and Cluster Analysis: Assessing Representational Similarity in Neural Systems”, Philosophical Psychology, 13(1): 47–76. doi:10.1080/09515080050002726
- Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum, 2015, “Human-Level Concept Learning through Probabilistic Program Induction”, Science, 350(6266): 1332–1338. doi:10.1126/science.aab3050
- Lake, Brenden M., Wojciech Zaremba, Rob Fergus, and Todd M. Gureckis, 2015, “Deep Neural Networks Predict Category Typicality Ratings for Images”, Proceedings of the 37th Annual Cognitive Science Society, Pasadena, CA, 22–25 July 2015, available online.
- Lillicrap, Timothy P., Daniel Cownden, Douglas B. Tweed, and Colin J. Akerman, 2016, “Random Synaptic Feedback Weights Support Error Backpropagation for Deep Learning”, Nature Communications, 7(1): 13276. doi:10.1038/ncomms13276
- Loula, João, Marco Baroni, and Brenden Lake, 2018, “Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks”, in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium: Association for Computational Linguistics, 108–114. doi:10.18653/v1/W18-5413
- MacDonald, Cynthia and Graham MacDonald (eds.), 1995, Connectionism, (Debates on Psychological Explanation, 2), Oxford: Blackwell.
- Matthews, Robert J., 1997, “Can Connectionists Explain Systematicity?”, Mind & Language, 12(2): 154–177. doi:10.1111/j.1468-0017.1997.tb00067.x
- Marcus, Gary F., 1998, “Rethinking Eliminative Connectionism”, Cognitive Psychology, 37(3): 243–282. doi:10.1006/cogp.1998.0694
- –––, 2001, The Algebraic Mind: Integrating Connectionism and Cognitive Science, Cambridge, MA: MIT Press.
- McClelland, James L. and Jeffrey L. Elman, 1986, “The TRACE Model of Speech Perception”, Cognitive Psychology, 18(1): 1–86. doi:10.1016/0010-0285(86)90015-0
- McClelland, James L., David E. Rumelhart, and the PDP Research Group (eds.), 1986, Parallel Distributed Processing, Volume II: Explorations in the Microstructure of Cognition: Psychological and Biological Models, Cambridge, MA: MIT Press.
- McLaughlin, Brian P., 1993, “The Connectionism/Classicism Battle to Win Souls”, Philosophical Studies, 71(2): 163–190. doi:10.1007/BF00989855
- Miikkulainen, Risto, 1993, Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory, Cambridge, MA: MIT Press.
- Miikkulainen, Risto and Michael G. Dyer, 1991, “Natural Language Processing With Modular PDP Networks and Distributed Lexicon”, Cognitive Science, 15(3): 343–399. doi:10.1207/s15516709cog1503_2
- Miracchi, Lisa, 2019, “A Competence Framework for Artificial Intelligence Research”, Philosophical Psychology, 32(5): 588–633. doi:10.1080/09515089.2019.1607692
- Montavon, Grégoire, Wojciech Samek, and Klaus-Robert Müller, 2018, “Methods for Interpreting and Understanding Deep Neural Networks”, Digital Signal Processing, 73: 1–15. doi:10.1016/j.dsp.2017.10.011
- Montúfar, Guido, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio, 2014, “On the Number of Linear Regions of Deep Neural Networks”, in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), Cambridge, MA: MIT Press, 2: 2924–2932.
- Morris, William C., Garrison W. Cottrell, and Jeffrey Elman, 2000, “A Connectionist Simulation of the Empirical Acquisition of Grammatical Relations”, in Wermter and Sun 2000: 175–193. doi:10.1007/10719871_12
- Nguyen, Anh, Jason Yosinski, and Jeff Clune, 2015, “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images”, Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 427–436, available online.
- Niklasson, Lars F. and Tim van Gelder, 1994, “On Being Systematically Connectionist”, Mind & Language, 9(3): 288–302. doi:10.1111/j.1468-0017.1994.tb00227.x
- O’Reilly, Randall C., 1996, “Biologically Plausible Error-Driven Learning Using Local Activation Differences: The Generalized Recirculation Algorithm”, Neural Computation, 8(5): 895–938. doi:10.1162/neco.1996.8.5.895
- Phillips, Steven, 2002, “Does Classicism Explain Universality?”, Minds and Machines, 12(3): 423–434. doi:10.1023/A:1016160512967
- Pinker, Steven and Jacques Mehler (eds.), 1988, Connections and Symbols, Cambridge, MA: MIT Press.
- Pinker, Steven and Alan Prince, 1988, “On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition”, Cognition, 28(1–2): 73–193. doi:10.1016/0010-0277(88)90032-7
- Pollack, Jordan B., 1989, “Implications of Recursive Distributed Representations”, in Touretzky 1989: 527–535, available online.
- –––, 1990 [1991], “Recursive Distributed Representations”, Artificial Intelligence, 46(1–2): 77–105. Reprinted in Hinton 1991: 77–106. doi:10.1016/0004-3702(90)90005-K
- –––, 1991, “Induction of Dynamical Recognizers”, in Touretzky 1991: 123–148. doi:10.1007/978-1-4615-4008-3_6
- Port, Robert F., 1990, “Representation and Recognition of Temporal Patterns”, Connection Science, 2(1–2): 151–176. doi:10.1080/09540099008915667
- Port, Robert F. and Timothy van Gelder, 1991, “Representing Aspects of Language”, Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum, 487–492, available online.
- Quine, W. V., 1969, “Natural Kinds”, in Essays in Honor of Carl G. Hempel, Nicholas Rescher (ed.), Dordrecht: Springer Netherlands, 5–23. doi:10.1007/978-94-017-1466-2_2
- Raghu, Maithra, Ben Poole, Jon Kleinberg, Surya Ganguli, and Jascha Sohl-Dickstein, 2017, “On the Expressive Power of Deep Neural Networks”, in Proceedings of the 34th International Conference on Machine Learning, 70: 2847–2854, available online.
- Ramsey, William, 1997, “Do Connectionist Representations Earn Their Explanatory Keep?”, Mind & Language, 12(1): 34–66. doi:10.1111/j.1468-0017.1997.tb00061.x
- Ramsey, William, Stephen P. Stich, and Joseph Garon, 1991, “Connectionism, Eliminativism, and the Future of Folk Psychology”, in Ramsey, Stich, and Rumelhart 1991: 199–228.
- Ramsey, William, Stephen P. Stich, and David E. Rumelhart, 1991, Philosophy and Connectionist Theory, Hillsdale, NJ: Erlbaum.
- Rao, Rajesh P. N. and Dana H. Ballard, 1999, “Predictive Coding in the Visual Cortex: A Functional Interpretation of Some Extra-Classical Receptive-Field Effects”, Nature Neuroscience, 2(1): 79–87. doi:10.1038/4580
- Rohde, Douglas L. T. and David C. Plaut, 2003, “Connectionist Models of Language Processing”, Cognitive Studies (Japan), 10(1): 10–28. doi:10.11225/jcss.10.10
- Roth, Martin, 2005, “Program Execution in Connectionist Networks”, Mind & Language, 20(4): 448–467. doi:10.1111/j.0268-1064.2005.00295.x
- Rumelhart, David E. and James L. McClelland, 1986, “On Learning the Past Tenses of English Verbs”, in McClelland, Rumelhart, and the PDP group 1986: 216–271.
- Rumelhart, David E., James L. McClelland, and the PDP Research Group (eds.), 1986, Parallel Distributed Processing, Volume I: Explorations in the Microstructure of Cognition: Foundations, Cambridge, MA: MIT Press.
- Sadler, Matthew and Natasha Regan, 2019, Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI, Alkmaar: New in Chess.
- Schmidhuber, Jürgen, 2015, “Deep Learning in Neural Networks: An Overview”, Neural Networks, 61: 85–117. doi:10.1016/j.neunet.2014.09.003
- Schwarz, Georg, 1992, “Connectionism, Processing, Memory”, Connection Science, 4(3–4): 207–226. doi:10.1080/09540099208946616
- Sejnowski, Terrence J. and Charles R. Rosenberg, 1987, “Parallel Networks that Learn to Pronounce English Text”, Complex Systems, 1(1): 145–168, available online.
- Servan-Schreiber, David, Axel Cleeremans, and James L. McClelland, 1991, “Graded State Machines: The Representation of Temporal Contingencies in Simple Recurrent Networks”, in Touretzky 1991: 57–89. doi:10.1007/978-1-4615-4008-3_4
- Shastri, Lokendra and Venkat Ajjanagadde, 1993, “From Simple Associations to Systematic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings Using Temporal Synchrony”, Behavioral and Brain Sciences, 16(3): 417–451. doi:10.1017/S0140525X00030910
- Shea, Nicholas, 2007, “Content and Its Vehicles in Connectionist Systems”, Mind & Language, 22(3): 246–269. doi:10.1111/j.1468-0017.2007.00308.x
- Shevlin, Henry and Marta Halina, 2019, “Apply Rich Psychological Terms in AI with Care”, Nature Machine Intelligence, 1(4): 165–167. doi:10.1038/s42256-019-0039-y
- Shultz, Thomas R. and Alan C. Bale, 2001, “Neural Network Simulation of Infant Familiarization to Artificial Sentences”, Infancy, 2(4): 501–536.
- –––, 2006, “Neural Networks Discover a Near-Identity Relation to Distinguish Simple Syntactic Forms”, Minds and Machines, 16(2): 107–139. doi:10.1007/s11023-006-9029-z
- Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al., 2018, “A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-Play”, Science, 362(6419): 1140–1144. doi:10.1126/science.aar6404
- Smolensky, Paul, 1987, “The Constituent Structure of Connectionist Mental States: A Reply to Fodor and Pylyshyn”, The Southern Journal of Philosophy, 26(S1): 137–161. doi:10.1111/j.2041-6962.1988.tb00470.x
- –––, 1988, “On the Proper Treatment of Connectionism”, Behavioral and Brain Sciences, 11(1): 1–23. doi:10.1017/S0140525X00052432
- –––, 1990 [1991], “Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems”, Artificial Intelligence, 46(1–2): 159–216. Reprinted in Hinton 1991: 159–216. doi:10.1016/0004-3702(90)90007-M
- –––, 1995, “Constituent Structure and Explanation in an Integrated Connectionist/Symbolic Cognitive Architecture”, in MacDonald and MacDonald 1995: .
- St. John, Mark F. and James L. McClelland, 1990 [1991], “Learning and Applying Contextual Constraints in Sentence Comprehension”, Artificial Intelligence, 46(1–2): 217–257. Reprinted in Hinton 1991: 217–257. doi:10.1016/0004-3702(90)90008-N
- Tomberlin, James E. (ed.), 1995, Philosophical Perspectives 9: AI, Connectionism and Philosophical Psychology, Atascadero: Ridgeview Press.
- Touretzky, David S. (ed.), 1989, Advances in Neural Information Processing Systems I, San Mateo, CA: Kaufmann, available online.
- ––– (ed.), 1990, Advances in Neural Information Processing Systems II, San Mateo, CA: Kaufmann.
- ––– (ed.), 1991, Connectionist Approaches to Language Learning, Boston, MA: Springer US. doi:10.1007/978-1-4615-4008-3
- Touretzky, David S., Geoffrey E. Hinton, and Terrence Joseph Sejnowski (eds.), 1988, Proceedings of the 1988 Connectionist Models Summer School, San Mateo, CA: Kaufmann.
- Van Gelder, Tim, 1990, “Compositionality: A Connectionist Variation on a Classical Theme”, Cognitive Science, 14(3): 355–384. doi:10.1016/0364-0213(90)90017-Q
- –––, 1991, “What is the ‘D’ in PDP?”, in Ramsey, Stich, and Rumelhart 1991: 33–59.
- Van Gelder, Timothy and Robert Port, 1993, “Beyond Symbolic: Prolegomena to a Kama-Sutra of Compositionality”, in Vasant G. Honavar and Leonard Uhr (eds.), Symbol Processing and Connectionist Models in AI and Cognition: Steps Towards Integration, Boston: Academic Press.
- Vilcu, Marius and Robert F. Hadley, 2005, “Two Apparent ‘Counterexamples’ to Marcus: A Closer Look”, Minds and Machines, 15(3–4): 359–382. doi:10.1007/s11023-005-9000-4
- Von Eckardt, Barbara, 2003, “The Explanatory Need for Mental Representations in Cognitive Science”, Mind & Language, 18(4): 427–439. doi:10.1111/1468-0017.00235
- –––, 2005, “Connectionism and the Propositional Attitudes”, in Christina Erneling and David Martel Johnson (eds.), The Mind as a Scientific Object: Between Brain and Culture, New York: Oxford University Press.
- Waltz, David L. and Jordan B. Pollack, 1985, “Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation”, Cognitive Science, 9(1): 51–74. doi:10.1207/s15516709cog0901_4
- Wermter, Stefan and Ron Sun (eds.), 2000, Hybrid Neural Systems, (Lecture Notes in Computer Science 1778), Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/10719871
- Yamins, Daniel L. K. and James J. DiCarlo, 2016, “Using Goal-Driven Deep Learning Models to Understand Sensory Cortex”, Nature Neuroscience, 19(3): 356–365. doi:10.1038/nn.4244
- Yosinski, Jason, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2015, “Understanding Neural Networks Through Deep Visualization”, Deep Learning Workshop, 31st International Conference on Machine Learning, Lille, France, available online.
- Zhou, Zhenglong and Chaz Firestone, 2019, “Humans Can Decipher Adversarial Images”, Nature Communications, 10(1): 1334. doi:10.1038/s41467-019-08931-6
Associationist Theories of Thought
*First published Tue Mar 17, 2015; substantive revision Wed Jun 24, 2020
Associationism is one of the oldest, and, in some form or another, most widely held theories of thought. Associationism has been the engine behind empiricism for centuries, from the British Empiricists through the Behaviorists and modern day Connectionists. Nevertheless, “associationism” does not refer to one particular theory of cognition per se, but rather a constellation of related though separable theses. What ties these theses together is a commitment to a certain arationality of thought: a creature’s mental states are associated because of some facts about its causal history, and having these mental states associated entails that bringing one of a pair of associates to mind will, ceteris paribus, ensure that the other also becomes activated.
1. What is Associationism?
Associationism is a theory that connects learning to thought based on principles of the organism’s causal history. Since the theory’s early roots, associationists have sought to treat the history of an organism’s experience as the main sculptor of cognitive architecture. In its most basic form, associationism has claimed that pairs of thoughts become associated based on the organism’s past experience. So, for example, a basic form of associationism (such as Hume’s) might claim that the frequency with which an organism has come into contact with Xs and Ys in its environment determines the frequency with which thoughts about Xs and thoughts about Ys will arise together in the organism’s future.
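The frequency claim in this basic form of associationism can be pictured with a small sketch. It is an illustration under stated assumptions, not a model proposed by Hume or by the entry: experience is represented as a list of “episodes” (sets of items encountered together), and the strength of the association from X to Y is taken to be the share of X’s pairings that involve Y. The function names and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def learn_associations(episodes):
    """Count how often each unordered pair of items is experienced together."""
    pair_counts = Counter()
    for episode in episodes:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(episode)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def association_strength(pair_counts, x, y):
    """Share of x's pairings that are pairings with y (0.0 if x is unseen)."""
    pair = tuple(sorted((x, y)))
    total_with_x = sum(n for p, n in pair_counts.items() if x in p)
    return pair_counts[pair] / total_with_x if total_with_x else 0.0

# Made-up experience: smoke is usually met together with fire.
episodes = [
    ["smoke", "fire"],
    ["smoke", "fire"],
    ["smoke", "fire", "heat"],
    ["smoke", "rain"],
]

counts = learn_associations(episodes)
smoke_fire = association_strength(counts, "smoke", "fire")
smoke_rain = association_strength(counts, "smoke", "rain")
```

On this toy data, a thought about smoke is linked more strongly to fire than to rain purely because of how often the items co-occurred. Nothing in the computation appeals to the content of the items or to any rational relation between them.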
Associationism’s popularity is in part due to how many different masters it can serve. In particular, associationism can be used as a theory of learning (e.g., as in behaviorist theorizing), a theory of thinking (as in Jamesian “streams of thought”), a theory of mental structures (e.g., as in concept pairs), and a theory of the implementation of thought (e.g., as in connectionism). All these theories are separable, but share a related, empiricist-friendly core. As used here, a “pure associationist” will refer to one who holds associationist theories of learning, thinking, mental structure, and implementation. The “pure associationist” is a somewhat idealized position, one that no particular theorist may have ever held, but many have approximated to differing degrees (e.g., Locke 1690/1975; Hume 1738/1975; Thorndike 1911; Skinner 1953; Hull 1943; Churchland 1986, 1989; Churchland and Sejnowski 1990; Smolensky 1988; Elman 1991; Elman et al. 1996; McClelland et al. 2010; Rydell and McConnell 2006; Fazio 2007).
Outside of these core uses of associationism the movement has also been closely aligned with a number of different doctrines over the years: empiricism, behaviorism, anti-representationalism (i.e., skepticism about the necessity of representational realism in psychological explanation), gradual learning, and domain-general learning. All of these theses are dissociable from core associationist thought (see section 7). While one can be an associationist without holding those theses, some of those theses imply associationism to differing degrees. These extra theses’ historical and sociological ties to associationism are strong, and so will be intermittently discussed below.
除了联想主义的这些核心用途之外,多年来,该运动还与许多不同的学说密切相关:经验主义、行为主义、反表征主义(即对表征实在论在心理学解释中的必要性持怀疑态度)、渐进学习和领域通用学习。所有这些论点都可以与核心联想主义思想相分离(见第 7 节)。虽然一个人可以不持有这些论点而成为联想主义者,但其中一些论点在不同程度上蕴含联想主义。这些额外的论点与联想主义在历史和社会学上的联系很强,因此下文将不时加以讨论。
2. Associationism as a Theory of Mental Processes: The Empiricist Connection 2. 联想主义作为心理过程理论:经验主义联系
Empiricism is a general theoretical outlook, which tends to offer a theory of learning to explain as much of our mental life as possible. From the British empiricists through Skinner and the behaviorists (see the entry on behaviorism) the main focus has been arguing for the acquisition of concepts (for the empiricists, “Ideas”; for the behaviorists, “responses”) through learning. However, the mental processes that underwrite such learning are almost never themselves posited to be learned.[1] So winnowing down the number of mental processes one has to posit limits the amount of innate machinery with which the theorist is saddled. Associationism, in its original form as in Hume (1738/1975), was put forward as a theory of mental processes. Associationists attempt to answer the question of how many mental processes there are by positing only a single mental process: the ability to associate ideas.[2]
经验主义是一种一般的理论观点,它倾向于提供一种学习理论来尽可能多地解释我们的心理生活。从英国经验主义者到斯金纳和行为主义者(见行为主义条目),主要焦点一直是争论通过学习获得概念(经验主义者的“想法”,行为主义者的“回应”)。然而,支撑这种学习的心理过程本身几乎从来没有被认为是可以学习的。[1]因此,减少一个人必须假设的心理过程的数量,限制了理论家所背负的先天机制的数量。联想主义,其原始形式与休谟 (1738/1975) 一样,被作为心理过程理论提出。联想主义者试图通过只假设一个心理过程来回答有多少心理过程的问题:关联思想的能力。[2]。
Of course, thinkers execute many different types of cognitive acts, so if there is only one mental process, the ability to associate, that process must be flexible enough to accomplish a wide range of cognitive work. In particular, it must be able to account for learning and thinking. Accordingly, associationism has been utilized on both fronts. We will first discuss the theory of learning and then, after analyzing that theory and seeing what is putatively learned, we will return to the associationist theory of thinking.
当然,思考者会执行许多不同类型的认知行为,因此,如果只有一个心理过程,即关联能力,那么这个过程必须足够灵活,才能完成广泛的认知工作。特别是,它必须能够考虑学习和思考。因此,关联主义在这两个方面都得到了利用。我们将首先讨论学习理论,然后,在分析该理论并了解假定的学习内容之后,我们将回到联想主义思维理论。
3. Associationism as a Theory of Learning 3. 联想主义作为一种学习理论
In one of its senses, “associationism” refers to a theory of how organisms acquire concepts, associative structures, response biases, and even propositional knowledge. It is commonly acknowledged that associationism took hold after the publishing of John Locke’s Essay Concerning Human Understanding (1690/1975).[3] However, Locke’s comments on associationism were terse (though fertile), and did not address learning to any great degree. The first serious attempt to detail associationism as a theory of learning was given by Hume in the Treatise of Human Nature (1738/1975).[4] Hume’s associationism was, first and foremost, a theory connecting how perceptions (“Impressions”) determined trains of thought (successions of “Ideas”). Hume’s empiricism, as enshrined in the Copy Principle,[5] demanded that there were no Ideas in the mind that were not first given in experience. For Hume, the principles of association constrained the functional role of Ideas once they were copied from Impressions: if Impressions IM1 and IM2 were associated in perception, then their corresponding Ideas, ID1 and ID2 would also become associated. In other words, the ordering of Ideas was determined by the ordering of the Impressions that caused the Ideas to arise.
在它的一个意义上,“联想主义”是指一种关于生物体如何获得概念、联想结构、反应偏差甚至命题知识的理论。人们普遍认为,联想主义是在约翰·洛克的《论人类理解》(1690/1975 年)出版后站稳脚跟的。[3]然而,洛克对联想主义的评论很简洁(尽管很丰富),并没有在很大程度上涉及学习。休谟在《人性论》(Treatise of Human Nature,1738/1975)中首次认真尝试将联想主义详细描述为一种学习理论。[4]休谟的联想主义首先是一种理论,它连接了知觉(“印象”)如何决定思路(“思想”的连续性)。休谟的经验主义,正如复制原则所奉行的那样,[5]要求头脑中没有不是首先在经验中给出的观念。对于休谟来说,一旦从印象中复制了想法,关联原则就限制了想法的功能作用:如果印象 IM1 和 IM2 在感知中相关联,那么它们相应的想法 ID1 和 ID2 也将相关联。换句话说,观念的排序是由导致观念产生的印象的顺序决定的。
Hume’s theory then needs to analyze what types of associative relations between Impressions mattered for determining the ordering of Ideas. Hume’s analysis consisted of three types of associative relations: cause and effect, contiguity, and resemblance. If two Impressions instantiated one of these associative relations, then their corresponding Ideas would mimic the same instantiation.[6] For instance, if Impression IM1 was cotemporaneous with Impression IM2, then (ceteris paribus) their corresponding Ideas, ID1 and ID2, would become associated.
然后,休谟的理论需要分析印象之间的哪些类型的联想关系对于确定理念的排序很重要。休谟的分析包括三种类型的关联关系:因果关系、邻接关系和相似性。如果两个 Impression 实例化了这些关联关系中的一个,那么它们相应的 Ideas 将模仿相同的实例化。[6]例如,如果印象 IM1 与印象 IM2 同时期,则 ( ceteris paribus ) 它们相应的想法 ID1 和 ID2 将关联起来。
As stated, Hume’s associationism was mostly a way of determining the functional profile of Ideas. But we have not yet said what it is for two Ideas to be associated (for that see section 4). Instead, one can see Hume’s contribution as introducing a very influential type of learning—associative learning—for Hume’s theory purports to explain how we learn to associate certain Ideas. We can abstract away from Hume’s framework of ideas and his account of the specific relations that underlie associative learning, and state the theory of associative learning more generally: if two contents of experiences, X and Y, instantiate some associative relation, R, then those contents will become associated, so that future activations of X will tend to bring about activations of Y. The associationist then has to explain what relation R amounts to. The Humean form of associative learning (where R is equated with cause and effect, contiguity, or resemblance) has been hugely influential, informing the accounts of those such as Jeremy Bentham, J.S. Mill, and Alexander Bain (see, e.g., the entries on John Stuart Mill and 19th Century Scottish Philosophy).[7]
如前所述,休谟的联想主义主要是确定理念功能概况的一种方式。但我们还没有说两个想法关联起来是什么(见第 4 节)。相反,人们可以将休谟的贡献看作是引入了一种非常有影响力的学习类型——联想学习——因为休谟的理论旨在解释我们如何学会关联某些概念。我们可以从休谟的思想框架和他对联想学习基础的特定关系的描述中抽象出来,并更普遍地陈述联想学习的理论:如果经验的两个内容 X 和 Y 实例化了某种联想关系 R,那么这些内容将变得关联,因此 X 的未来激活将倾向于导致 Y 的激活。然后,关联论者必须解释关系 R 相当于什么。联想学习的休米亚形式(其中 R 等同于因果、连续性或相似性)具有巨大的影响力,为杰里米·边沁、J.S. 穆勒和亚历山大·贝恩等人的叙述提供了信息(例如,参见约翰·斯图尔特·穆勒和 19th 世纪苏格兰哲学的条目)。[7]。
Associative learning didn’t hit its stride until the work of Ivan Pavlov, which spurred the subsequent rise of the behaviorist movement in psychology. Pavlov introduced the concept of classical conditioning as a modernized version of associative learning. For Pavlov, classical conditioning was in part an experimental paradigm for teaching animals to learn new associations between stimuli. The general method of learning was to pair an unconditioned stimulus (US) with a novel stimulus. An unconditioned stimulus is just a stimulus that instinctively, without training, provokes a response in an organism. Since this response is not itself learned, the response is referred to as an “unconditioned response” (UR). In Pavlov’s canonical experiment, the US was a meat powder, as the smell of meat automatically brought about salivation (UR) in his canine subjects. The US is then paired with a neutral stimulus, such as a bell. Over time, the contiguity between the US and the neutral stimulus causes the neutral stimulus to provoke the same response as the US. Once the bell starts to provoke salivation, the bell has become a “conditioned stimulus” (CS) and the salivating, when prompted by the bell alone, a “conditioned response” (CR). The associative learning here is learning to form new stimulus-response pairs between the bell and the salivation.[8]
联想学习直到伊万·巴甫洛夫 (Ivan Pavlov) 的工作才真正步入正轨,这项工作推动了随后心理学中行为主义运动的兴起。巴甫洛夫引入了经典条件反射的概念,作为联想学习的现代化版本。对于巴甫洛夫来说,经典条件反射在一定程度上是一种实验范式,用于教动物学习刺激之间的新关联。一般的学习方法是将无条件刺激 (US) 与新刺激配对。无条件刺激就是无需训练、本能地引发生物体反应的刺激。由于这种反应本身不是习得的,因此被称为“无条件反应”(UR)。在巴甫洛夫的经典实验中,US 是肉粉,因为肉的气味会自动引起他的犬类受试者流涎 (UR)。然后,US 与一个中性刺激(例如铃声)配对。随着时间的推移,US 与中性刺激之间的邻近性使中性刺激引发与 US 相同的反应。一旦铃声开始引起流涎,铃声就成了“条件刺激”(CS),而仅由铃声引起的流涎则成了“条件反应”(CR)。这里的联想学习就是学习在铃声和流涎之间形成新的刺激-反应对。[8]
Classical conditioning is a fairly circumscribed process. It is a “stimulus substitution” paradigm where one stimulus can be swapped for another to provoke a response.[9] However, the responses that are provoked are supposed to remain unchanged; all that changes is the stimulus that gets associated with the response. Thus, classical conditioning seemed to some to be too restrictive to explain the panoply of novel behavior organisms appear to execute.[10]
经典条件反射是一个相当有限的过程。这是一种“刺激替代”范式,其中一种刺激可以交换为另一种刺激以引发反应。[9]然而,被激起的反应应该保持不变;唯一改变的是与反应相关的刺激。因此,在一些人看来,经典条件反射似乎过于严格,无法解释生物体似乎执行的一系列新行为。[10]。
Edward Thorndike’s research with cats in puzzle boxes broadened the theory of associative learning by introducing the notion of consequences to associative learning. Thorndike expanded the notion of associative learning beyond instinctual behaviors and sensory substitution to genuinely novel behaviors. Thorndike’s experiments initially probed, e.g., how cats learned to lift a lever to escape the “puzzle boxes” (the forerunner of “Skinner boxes”) that they were trapped in. The cats’ behaviors, such as attempting to lift a lever, were not themselves instinctual behaviors like the URs of Pavlov’s experiments. Additionally, the cats’ behaviors were shaped by the consequences that they brought on. For Thorndike it was because lifting the lever caused the door to open that the cats learned the connection between the lever and the door. This new view of learning, operant conditioning (for the organism is “operating” on its environment), was not merely the passive learning of Pavlov, but a species-nonspecific, general, active theory of learning.
爱德华·桑代克 (Edward Thorndike) 对拼图盒中的猫的研究通过将后果的概念引入联想学习,拓宽了联想学习的理论。桑代克将联想学习的概念从本能行为和感官替代扩展到真正新颖的行为。例如,桑代克的实验最初探讨了猫如何学会抬起杠杆来逃离它们被困在其中的“拼图盒”(“斯金纳盒”的前身)。猫的行为,例如试图举起杠杆,本身并不是像巴甫洛夫实验中的 UR 那样的本能行为。此外,猫的行为受到它们带来的后果的影响。对桑代克来说,正是因为抬起拉杆导致门打开,猫才学会了拉杆和门之间的联系。这种新的学习观,即操作性条件反射(因为有机体正在其环境中“运作”),不仅仅是巴甫洛夫的被动学习,而是一种物种非特异性的、一般的、主动的学习理论。
This research culminated in Thorndike’s famous “Law of Effect” (1911), the first canonical psychological law of associationist learning. It asserted that responses accompanied by a feeling of satisfaction in the organism will, ceteris paribus, become more strongly associated with the situation in which they were executed, whereas responses accompanied by a feeling of discomfort will, ceteris paribus, become less likely to occur when the organism encounters the same situation again.[11] The greater the satisfaction or discomfort produced, the greater the strengthening or weakening of the connection. To this Thorndike added the “Law of Exercise”, that responses to situations will, ceteris paribus, be more connected to those situations in proportion to the frequency of past pairings between situation and response. Thorndike’s paradigm was popularized and extended by B.F. Skinner (see, e.g., Skinner 1953) who stressed the notion not just of consequences but of reinforcement as the basis of forming associations. For Skinner, a behavior would get associated with a situation according to the frequency and strength of reinforcement that would arise as a consequence of the behavior.
这项研究在桑代克著名的“效应定律”(1911 年)中达到顶峰,这是联想主义学习的第一条规范心理学定律。它断言,伴随着生物体感到满意的反应 ceteris paribus ,更有可能与执行行为的情况相关联,而伴随着对动物的不适感的反应,ceteris paribus ,使得当生物体遇到相同情况时,反应不太可能发生。[11]产生的积极或消极情绪越大,这种行为被证明的可能性就越大。对此,桑代克补充了“运动法则”,即对情境的反应将与过去情境和反应之间的配对频率成正比,与这些情境的联系更加紧密。桑代克的范式由 B.F. Skinner(参见,例如,Skinner 1953)推广和扩展,他强调不仅要考虑后果,还要强调强化的概念是形成协会的基础。对于 Skinner 来说,行为会根据行为产生的强化频率和强度与情况相关联。
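As a rough illustration of how the Law of Effect and the Law of Exercise might combine, the sketch below treats each situation-response connection as a single number. The additive update rule, the parameter names `exercise_gain` and `effect_gain`, and the numeric scale are all assumptions made for illustration; Thorndike stated both laws qualitatively and the text gives no formula.

```python
from collections import defaultdict


class ThorndikeLearner:
    """Toy model: one connection strength per (situation, response) pair."""

    def __init__(self, exercise_gain=0.1, effect_gain=0.5):
        self.exercise_gain = exercise_gain  # hypothetical weight for the Law of Exercise
        self.effect_gain = effect_gain      # hypothetical weight for the Law of Effect
        self.strength = defaultdict(float)

    def experience(self, situation, response, satisfaction):
        """Record one pairing of a situation with a response.

        satisfaction > 0 models a satisfying outcome, < 0 models discomfort;
        its magnitude scales the change (Law of Effect). Every pairing also
        adds a small fixed amount (Law of Exercise).
        """
        key = (situation, response)
        self.strength[key] += self.exercise_gain + self.effect_gain * satisfaction
        return self.strength[key]


learner = ThorndikeLearner()
for _ in range(5):
    learner.experience("puzzle box", "lift lever", satisfaction=1.0)     # door opens
    learner.experience("puzzle box", "scratch wall", satisfaction=-0.5)  # still trapped

lever = learner.strength[("puzzle box", "lift lever")]
scratch = learner.strength[("puzzle box", "scratch wall")]
```

With these made-up parameters the rewarded response ends up with a positive connection strength and the unrewarded one with a negative strength, which is the qualitative pattern the two laws describe.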
Since the days of Skinner, associative learning has come in many different variations. But what all varieties should share with their historical predecessors is that associative learning is supposed to mirror the contingencies in the world without adding additional structure to them (see section 9 for some examples of when supposedly associative theories smuggle in extra structure). The question of what contingencies associative learning detects (that is, one’s preferred analysis of what the associative relation R is), is up for debate and changes between theorists.
自 Skinner 时代以来,联想学习出现了许多不同的变化。但是,所有变体都应该与它们的历史前辈共享的是,联想学习应该反映世界上的偶然事件,而不给它们增加额外的结构(参见第 9 节,了解所谓的联想理论何时偷运进入额外结构的一些例子)。联想学习检测到什么偶发性(即一个人对联想关系 R 是什么的偏好分析)的问题有待争论和理论家之间的变化。
The final widely shared, though less central, property of associative learning concerns the domain generality of associative learning. Domain generality’s prevalence among associationists is due in large part to their traditional empiricist allegiances: excising domain-specific learning mechanisms constrains the amount of innate mental processes one has to posit. Thus it is no surprise to find that both Hume and Pavlov assumed that associative learning could be used to acquire associations between any contents, regardless of the types of contents they were. For example, Pavlov writes,
联想学习的最后一个广泛共享但不太核心的属性涉及联想学习的领域普遍性。领域普遍性在关联论者中盛行,很大程度上是由于他们传统的经验主义效忠:去除特定领域的学习机制限制了人们必须假设的先天心理过程的数量。因此,发现休谟和巴甫洛夫都假设联想学习可用于获得任何内容之间的关联也就不足为奇了,无论它们是什么类型的内容。例如,巴甫洛夫写道:
Any natural phenomenon chosen at will may be converted into a conditioned stimulus. Any ocular stimulus, any desired sound, any odor, and the stimulation of any portion of the skin, whether by mechanical means or by the application of heat or cold never failed to stimulate the salivary glands. (Pavlov 1906: 615)
任何随意选择的自然现象都可能转化为条件刺激。任何眼部刺激、任何想要的声音、任何气味和对皮肤任何部分的刺激,无论是通过机械手段还是通过热或冷来刺激唾液腺。(巴甫洛夫 1906:615)
For Pavlov the content of the CS doesn’t matter. Any content will do, as long as it bears the right functional relationship in the organism’s learning history. In that sense, the learning is domain general—it matters not what the content is, just the role it plays (for more on this topic, see section 9.4).[12]
对巴甫洛夫来说,CS 的内容并不重要。任何内容都可以,只要它在有机体的学习历史中具有正确的功能关系。从这个意义上说,这种学习是领域通用的:重要的不是内容是什么,而是它所扮演的角色(有关此主题的更多信息,请参见第 9.4 节)。[12]
4. Associationism as a Theory of Mental Structure 4. 作为心理结构理论的联想主义
Associative learning amounts to a constellation of related views that interprets learning as associating stimuli with responses (in operant conditioning), or stimuli with other stimuli (in classical conditioning), or stimuli with valences (in evaluative conditioning).[13] Associative learning accounts raise the question: when one learns to associate contents X and Y because, e.g., previous experiences with Xs and Ys instantiated R, how does one store the information that X and Y are associated?[14] A highly contrived sample answer to this question would be that a thinker learns an explicitly represented unconscious conditional rule that states “when a token of x is activated, then also activate a token of y”. Instead of such a highly intellectualized response, associationists have found a natural (though by no means necessary, see section 4.2) complementary view that the information is stored in an associative structure.
联想学习相当于一系列相关观点,它将学习解释为将刺激与反应相关联(在操作性条件反射中),或将刺激与其他刺激相关联(在经典条件反射中),或将刺激与效价相关联(在评价条件反射中)。[13]联想学习叙述提出了一个问题:当一个人学会将内容 X 和 Y 关联起来时,因为,例如,以前与 X s 和 Y s 实例化的 R 的经验,一个人如何存储 X 和 Y 相关联的信息?[14]对这个问题的一个高度人为的示例答案是,思考者学习了一个明确表示的无意识条件规则,该规则指出“当 x 的标记被激活时,也激活 y 的标记”。与这种高度理智化的反应相反,联想主义者找到了一种自然的(尽管绝不是必需的,参见第 4.2 节)互补的观点,即信息存储在联想结构中。
An associative structure describes the type of bond that connects two distinct mental states.[15] An example of such a structure is the associative pair salt/pepper.[16] The associative structure is defined, in the first instance, functionally: if X and Y form an associative structure, then, ceteris paribus, activations of mental state X bring about mental state Y and vice versa without the mediation of any other psychological states (such as an explicitly represented rule telling the system to activate a concept because its associate has been activated).[17] In other words, saying that two concepts are associated amounts to saying that there is a reliable, psychologically basic causal relation that holds between them—the activation of one of the concepts causes the activation of the other. So, saying that someone harbors the structure salt/pepper amounts to saying that activations of salt will cause activations of pepper (and vice versa) without the aid of any other cognitive states.
联想结构描述了连接两种不同心理状态的纽带类型。[15]这种结构的一个例子是结合对 salt/pepper 。[16]联想结构首先在功能上被定义:如果 X 和 Y 形成一个联想结构,那么, ceteris paribus ,心理状态 X 的激活带来心理状态 Y,反之亦然,没有任何其他心理状态的中介(例如明确表示的规则告诉系统激活一个概念,因为它的关联已被激活)。[17]换句话说,说两个概念是相关的,就等于说它们之间存在着一种可靠的、心理上基本的因果关系——其中一个概念的激活导致另一个概念的激活。因此,说某人拥有盐/胡椒的结构,相当于说盐的激活会导致胡椒的激活(反之亦然),而无需任何其他认知状态的帮助。
Associative structures are most naturally contrasted with propositional structures. A pure associationist is opposed to propositional structures—strings of mental representations that express a proposition—because propositionally structured mental representations have structure over and above the mere associative bond between two concepts. Take, for example, the associative structure green/toucan. This structure does not predicate green onto toucan. If we know that a mind has an associative bond between green and toucan, then we know that activating one of those concepts leads to the activation of the other. A pure associative theory rules out predication, for propositional structures aren’t just strings of associations. “Association” (in associative structures) just denotes a causal relation among mental representations, whereas predication (roughly) expresses a relation between things in the world (or intentional contents that specify external relations). Saying that someone has an associative thought green/toucan tells you something about the causal and temporal sequences of the activation of concepts in one’s mind; saying that someone has the thought there is a green toucan tells you that a person is predicating greenness of a particular toucan (see Fodor 2003: 91–94, for an expansion of this point).
联想结构与命题结构最自然地形成对比。纯粹的联想论者反对命题结构——表达一个命题的一串心理表征——因为命题结构的心理表征具有超越两个概念之间单纯联想纽带的结构。以关联结构 green/toucan 为例。这种结构并不代表 toucan。如果我们知道一个思想在绿色和巨嘴鸟之间有一个联想纽带,那么我们就知道激活其中一个概念会导致另一个概念的激活。纯粹的联想理论排除了谓词,因为命题结构不仅仅是一串关联。“Association”(在联想结构中)仅表示心理表征之间的因果关系,而谓词(大致)表示世界上事物之间的关系(或指定外部关系的有意内容)。说某人有一个联想思想 green/toucan 告诉你一些关于一个人脑海中概念激活的因果和时间顺序的信息;说某人认为有一只绿色的巨嘴鸟,就告诉你一个人在预测特定巨嘴鸟的绿色性(参见 Fodor 2003:91-94,关于这一点的扩展)。
Associative structures needn’t just hold between simple concepts. One might have reason to posit associative structures between propositional elements (see section 5) or between concepts and valences (see section 8). But none of the preceding is meant to imply that all structures are associative or propositional; there are other representational formats that the mind might harbor (e.g., analog magnitudes or iconic structures; see Camp 2007; Quilty-Dunn forthcoming). For instance, not all semantically related concepts are harbored in associative structures. Semantically related concepts may in fact also be directly associated (as in doctor/nurse) or they may not (as in horse/zebra; see Perea and Rosa 2002). The difference in structure is not just a theoretical possibility, as these different structures have different functional profiles: for example, conditioned associations appear to last longer than semantic associations do in subjects with dementia (Glosser and Friedman 1991).
关联结构不需要只存在于简单的概念之间。人们可能有理由假设命题元素之间(见第 5 节)或概念和价之间(见第 8 节)之间的联想结构。但是,这些程序并不意味着所有结构都是联想的或命题的——头脑可能包含其他表征格式(例如,模拟量级或标志性结构;参见 Camp 2007;Quilty-Dunn 即将出版)。例如,并非所有语义相关的概念都包含在关联结构中。语义相关的概念实际上也可能直接相关(如 doctor/nurse ),也可能不相关(如 horse/zebra;参见 Perea 和 Rosa 2002)。结构上的差异不仅仅是理论上的可能性,因为这些不同的结构具有不同的功能特征:例如,在痴呆受试者中,条件关联似乎比语义关联持续时间更长(Glosser 和 Friedman 1991)。
4.1 Associative Symmetry 4.1 关联对称
The analysis of associative structures implies that, ceteris paribus, associations are symmetric in their causal effects: if a thinker has a bond between salt/pepper, then salt should bring about pepper just as well as pepper brings about salt (for extensive discussion of the symmetry point see Quilty-Dunn and Mandelbaum 2019). But all else is rarely equal. For example, behaviorists such as Thorndike, Hull, and Skinner knew that the order of learning affected the causal sequence of recall: if one is always hearing “salt and pepper” then salt will be more poised to activate pepper than pepper to activate salt. So, included in the ceteris paribus clause in the analysis of associative structures is the idealization that the learning of the associative elements was equally well randomized in order.
对联想结构的分析表明,ceteris paribus,联想在其因果效应上是对称的:如果一个思考者在盐/胡椒之间有联结,那么盐引起胡椒应当与胡椒引起盐一样容易(有关对称性的广泛讨论,请参见 Quilty-Dunn 和 Mandelbaum 2019)。但其他条件很少完全相同。例如,桑代克、赫尔和斯金纳等行为主义者知道,学习的顺序会影响回忆的因果序列:如果一个人总是听到“盐和胡椒”,那么盐激活胡椒会比胡椒激活盐更容易。因此,联想结构分析中的 ceteris paribus 条款包含这样一个理想化假设:各联想元素的学习顺序是充分随机化的。
Similarly, associative symmetry is violated when there are differing amounts of associative connections between the individual associated elements. For example, in the green/toucan case, most thinkers will have many more associations stemming from green than stemming from toucan. Suppose we have a thinker that only associates toucan with green, but associates green with a large host of other concepts (e.g., grass, vegetables, tea, kermit, seasickness, moss, mold, lantern, ireland, etc). In this case one can expect that toucan will more quickly activate green than green will activate toucan, for the former bond will have its activation strength less weakened amongst other associates than the latter will.
同样,当各个关联元素之间存在不同数量的关联连接时,也会违反关联对称性。例如,在绿色/巨嘴鸟的情况下,大多数思想家来自绿色的联想比源自巨嘴鸟的联想要多得多。假设我们有一个思想家,他只将巨嘴鸟与绿色联系起来,但将绿色与一大堆其他概念(例如,草、蔬菜、茶、kermit、晕船、苔藓、霉菌、灯笼、爱尔兰等)联系起来。在这种情况下,可以预期巨嘴鸟会比绿色激活巨嘴鸟更快地激活绿色,因为前者在其他伙伴中的激活强度会比后者弱。
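The asymmetry described above can be sketched with a toy model in which a concept passes on a fixed amount of activation, divided equally among its associates. The equal split, the `total_activation` parameter, and the function name `spread` are assumptions made for illustration; the text only says that a bond's activation strength is weakened when the source concept has many other associates.

```python
def spread(source, associations, total_activation=1.0):
    """Split a fixed amount of activation equally among the source's associates."""
    targets = associations.get(source, [])
    if not targets:
        return {}
    share = total_activation / len(targets)
    return {target: share for target in targets}


# A thinker who associates toucan only with green, but green with many concepts.
associations = {
    "toucan": ["green"],
    "green": ["toucan", "grass", "vegetables", "tea", "moss", "mold"],
}

toucan_to_green = spread("toucan", associations)["green"]
green_to_toucan = spread("green", associations)["toucan"]
```

Under these assumptions toucan sends all of its activation to green, while green sends toucan only one sixth of its activation, so toucan activates green more strongly than green activates toucan.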
4.2 Activation Maps of Associative Structure 4.2 关联结构的激活映射
An associative activation map (sometimes called a “spreading activation” map, Collins and Loftus 1975) is a mapping for a single thinker of all the associative connections between concepts.[18] There are many ways of operationalizing associative connections. In the abstract, a psychologist will attempt to probe which concepts (or other mental elements) activate which other concepts (or elements). Imagine a subject who is asked to say whether a string of letters constitutes a word or not, which is the typical goal given to subjects in a “lexical decision task”. If a subject has just seen the word “mouse”, we assume that the concept mouse was activated. If the subject is then quicker to say that, e.g., “cursor” is a word than the subject is to say that “toaster” is, then we can infer that cursor was primed, and is thus associatively related to mouse, in this thinker. Likewise, if we find that “rodent” is also responded to quicker, then we know that rodent is associatively related to mouse. Using this procedure, one can generate an associative mapping of a thinker’s mind. Such a mapping would constitute a mapping of the associative structures one harbors. However, to be a true activation map (a true mapping of what concepts facilitate what), the mapping would also need to include information about the violations of symmetry between concepts.
联想激活图(有时称为“扩散激活”图,Collins 和 Loftus 1975)是针对单个思考者的、概念之间所有联想联系的映射。[18]有许多方法可以把联想联系操作化。抽象地说,心理学家会尝试探查哪些概念(或其他心理元素)激活了哪些其他概念(或元素)。设想一名被试被要求判断一串字母是否构成一个单词,这是“词汇判断任务”中给被试的典型目标。如果被试刚刚看到了单词“mouse”,我们就假设概念 mouse 已被激活。如果被试判断“cursor”是一个词的速度快于判断“toaster”是一个词的速度,那么我们可以推断,在这位思考者那里,cursor 受到了启动,因而与 mouse 存在联想关系。同样,如果我们发现被试对“rodent”的反应也更快,那么我们就知道 rodent 与 mouse 存在联想关系。使用此过程,可以生成思考者心灵的联想映射。这样的映射将构成一个人所拥有的联想结构的映射。然而,要成为真正的激活图,即关于哪些概念促进哪些概念的真实映射,该映射还需要包含有关概念之间对称性被违反的信息。
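The lexical decision procedure described above can be sketched as a small function that turns reaction times into one row of an activation map. The reaction times, the baseline, the 30 ms threshold, and the function name `activation_map` are hypothetical values chosen for illustration; they are not data from any study.

```python
def activation_map(prime, reaction_times_ms, baseline_ms, threshold_ms=30):
    """Return the targets answered faster than baseline by at least threshold_ms.

    Each such target is treated as associatively related to the prime;
    the stored value is the size of the speed-up (the priming effect).
    """
    related = {}
    for target, rt in reaction_times_ms.items():
        speed_up = baseline_ms - rt
        if speed_up >= threshold_ms:
            related[target] = speed_up
    return {prime: related}


# Hypothetical reaction times (ms) measured right after the prime word "mouse".
reaction_times_ms = {"cursor": 520, "rodent": 510, "toaster": 600}
mouse_map = activation_map("mouse", reaction_times_ms, baseline_ms=600)
```

The map is keyed by the prime and is therefore directed: a separate run with "cursor" as the prime would be needed to measure the cursor-to-mouse direction, which is one way to record the violations of symmetry that a true activation map has to include.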
4.3 Relation Between Associative Learning and Associative Structures 4.3 联想学习与联想结构之间的关系
The British Empiricists desired to have a thoroughgoing pure associationist theory, for it allowed them to lessen the load of innate machinery they needed to posit. Likewise, the behaviorists also tended to want a pure associationist theory (sometimes out of a similar empiricist tendency, other times because they were radical behaviorists like Skinner, who banned all discussion of mental representations). Pure associationists tend to be partial to a connection that Fodor (2003) refers to as “Bare-Boned Association”. The idea is that the current strength of an association connection between X and Y is determined, ceteris paribus, by the frequency of the past associations of X and Y. As stated, Bare-Boned Association assumes that associative structures encode, at least implicitly, the frequency of past associations of X and Y, and the strength of that associative bond is determined by the organism’s previous history of experiencing Xs and Ys.[19] In other words, the learning history of past associations determines the current functional profile of the corresponding associative structures.[20]
英国经验主义者希望有一个彻底的纯粹关联主义理论,因为它使他们能够减轻他们需要假设的先天机制的负担。同样,行为主义者也倾向于想要一个纯粹的联想主义理论(有时是出于类似的经验主义倾向,有时是因为他们是像斯金纳这样的激进行为主义者,他禁止所有关于心理表征的讨论)。纯粹的联想论者往往偏向于 Fodor (2003) 所说的“裸露的联想”的联系。这个想法是 X 和 Y 之间关联连接的当前强度是由 X 和 Y 的过去关联频率确定的。如前所述,Bare-Boned Association 假设关联结构至少隐含地编码了 X 和 Y 的过去关联频率,并且该关联键的强度由生物体之前经历 X s 和 Y s 的历史决定。[19]换句话说,过去关联的学习历史决定了相应关联结构的当前功能概况。[20]。
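Bare-Boned Association can be sketched as plain frequency counting. In the sketch below the strength of a bond is simply the number of past experiences pairing X with Y; the class name `BareBonedAssociator` and the decision to store ordered pairs are assumptions made for illustration, not something Fodor specifies.

```python
from collections import Counter


class BareBonedAssociator:
    """Strength of X -> Y is the count of past experiences pairing X with Y."""

    def __init__(self):
        self.pair_counts = Counter()

    def experience(self, x, y):
        # Order is kept, so hearing "salt, pepper" strengthens salt -> pepper only.
        self.pair_counts[(x, y)] += 1

    def strength(self, x, y):
        return self.pair_counts[(x, y)]


mind = BareBonedAssociator()
for _ in range(3):
    mind.experience("salt", "pepper")
mind.experience("pepper", "salt")

salt_to_pepper = mind.strength("salt", "pepper")
pepper_to_salt = mind.strength("pepper", "salt")
```

Because the learning history alone fixes the strengths, a history in which "salt" usually comes first yields a stronger salt-to-pepper bond than pepper-to-salt bond, matching the order-of-learning effect noted in section 4.1.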
Although the picture sketched above, where associative learning eventuates in associative structure, is appealing for many, it is not forced upon one, as there is no a priori reason to bar any type of structure to arise from a particular type of learning. One may, for example, gain propositional structures from associative learning (see Mitchell et al. 2009 and Mandelbaum 2016 for arguments that this is more than a mere logical possibility). This can happen in two ways. In the first, one may gain an associative structure that has a proposition as one of its associates. Assume that every time one’s father came home he immediately made dinner. In such a case one might associate the proposition daddy is home with the concept dinner (that is one might acquire: daddy is home/dinner). However, one might also just have a propositional structure result from associative learning. If every time one’s father came home he made dinner, then one might just end up learning if daddy is home then dinner will come soon, which is a propositional structure.
尽管上面勾勒的图片,即联想学习最终在联想结构中发生,对许多人来说很有吸引力,但它并不是强加给一个人的,因为没有先验的理由禁止任何类型的结构产生于特定类型的学习。例如,一个人可以从联想学习中获得命题结构(参见 Mitchell et al. 2009 和 Mandelbaum 2016 的论点,即这不仅仅是一种逻辑可能性)。这可以通过两种方式发生。在第一种情况下,一个人可能会获得一个联想结构,该结构有一个命题作为其关联体之一。假设每次父亲回家时,他都会立即做晚饭。在这种情况下,人们可能会将命题 daddy is home 与概念 dinner 联系起来(即人们可能会获得:daddy is home/dinner )。然而,一个人也可能只是联想学习的命题结构结果。如果每次一个人的爸爸回家都做了晚饭,那么一个人最终可能会知道如果爸爸在家,那么晚餐很快就会到来,这是一个命题结构。
4.4 Extinction and Counterconditioning 4.4 消退与反条件作用
There is a different, tighter relationship between associative learning and associative structures concerning how to modulate an association. Associative theorists, especially from Pavlov onward, have been clear on the functional characteristics necessary to modulate an already created association. There have been two generally agreed upon routes: extinction and counterconditioning. Suppose that, through associative learning, you have learned to associate a CS with a US. How do we break that association? Associationists have posited that one breaks an associative structure via two different types of associative learning (/unlearning). Extinction is the name for one such process. During extinction one decouples the external presentation of the CS and the US by presenting the CS without the US (and sometimes the US without the CS). Over time, the organism will learn to disconnect the CS and US.
关于如何调节一个联想,联想学习和联想结构之间存在着另一种更紧密的关系。联想理论家,尤其是巴甫洛夫以来的联想理论家,已经清楚地说明了调节一个已建立的联想所必需的功能特征。有两条普遍公认的途径:消退和反条件作用。假设通过联想学习,您已经学会了将 CS 与 US 相关联。我们如何打破这种关联?联想主义者认为,人们通过两种不同类型的联想学习(或去学习)来打破联想结构。消退就是其中一个过程的名称。在消退期间,通过只呈现 CS 而不呈现 US(有时是只呈现 US 而不呈现 CS),使 CS 和 US 的外部呈现相互脱钩。随着时间的推移,有机体将学会断开 CS 和 US 之间的联系。
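Acquisition followed by extinction can be sketched with a simple error-correcting update, in the spirit of models such as Rescorla-Wagner. That update rule, the `rate` parameter, and the 0-to-1 strength scale are assumptions brought in for illustration; the text does not commit to any equation. The sketch also adopts the traditional view that extinction weakens the original bond, which the text goes on to treat as an open empirical question.

```python
def run_trials(strength, trials, us_present, rate=0.2):
    """Update the CS-US association strength over a number of CS presentations.

    With the US present, strength moves toward 1.0 (acquisition); with the
    CS presented alone, strength moves toward 0.0 (extinction).
    """
    target = 1.0 if us_present else 0.0
    for _ in range(trials):
        strength += rate * (target - strength)
    return strength


after_pairing = run_trials(0.0, trials=10, us_present=True)
after_extinction = run_trials(after_pairing, trials=10, us_present=False)
```

On this toy model the association grows while the bell and the meat powder are paired and decays, without ever reaching exactly zero, once the bell is presented alone.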
Counterconditioning names a similar process to extinction, though one which proceeds via a slightly different method. Counterconditioning can only occur when an organism has an association between a mental representation and a valence, as acquired in an evaluative conditioning paradigm. Suppose that one associates ducks with a positive valence. To break this association via counterconditioning one introduces ducks not with a lack of positive valence (as would happen in extinction) but with the opposite valence, a negative valence. Over multiple exposures, the initial representation/valence association weakens, and is perhaps completely broken.[21]
反条件作用指的是一个与消退类似的过程,不过它通过一种略有不同的方法进行。只有当有机体在心理表征和效价之间存在关联(如在评价性条件反射范式中习得的那样)时,才会发生反条件作用。假设某人将 ducks 与正效价相关联。要通过反条件作用打破这种关联,呈现 ducks 时不是让正效价缺失(消退中就是这样),而是伴以相反的效价,即负效价。经过多次呈现,最初的表征/效价关联会减弱,甚至可能完全断开。[21]
How successful extinction and counterconditioning are, and how they work, is the source of some controversy, and there is some reason to see both methods as highly ineffectual (Bouton 2004). Although the traditional view is that extinction breaks associative bonds, it is an open empirical question whether extinction proceeds by breaking the previously created associative bonds, or whether it proceeds by leaving that bond alone but creating new, more salient (and perhaps context-specific) associations between the CS and other mental states (Bouton 2002, Bendana and Mandelbaum forthcoming). Additionally, reinstatement, the spontaneous reappearance of an associative bond after seemingly successful extinction, has been observed in many contexts (see, e.g., Dirikx et al. 2004 for reinstatement of fear in humans).[22]
消退和反条件作用的成功程度以及它们如何运作,是一些争议的来源,并且有理由认为这两种方法都非常无效(Bouton 2004)。尽管传统观点认为消退会打破联想纽带,但消退究竟是通过打破先前建立的联想纽带而进行,还是不触动该纽带、而是在 CS 和其他心理状态之间建立新的、更突出的(也许是特定于情境的)联想而进行,仍是一个开放的经验问题(Bouton 2002;Bendana 和 Mandelbaum 即将出版)。此外,在许多情境中都观察到了恢复现象,即联想纽带在看似成功的消退之后自发地重新出现(参见,例如,Dirikx 等人 2004 年关于人类恐惧恢复的研究)。[22]
One fixed point in this debate is that one reverses associative structures via these two types of associative learning/unlearning, and only via these two pathways. What one does not do is try to break an associative structure by using practical or theoretical reasoning. If you associate salt with pepper, then telling you that salt has nothing to do with pepper or giving you very good reasons not to associate the two (say, someone will give you $50,000 for not associating them) won’t affect the association. This much has at least been clear since Locke. In the Essay concerning Human Understanding, in his chapter “Of the Association of Ideas” (chapter XXXIII) he writes,
这场辩论中的一个固定点是,人们通过这两种类型的联想学习/去学习来逆转联想结构,而且只能通过这两种途径。人们不会做的是尝试用实践推理或理论推理来打破联想结构。如果你把盐和胡椒联系起来,那么告诉你盐和胡椒毫无关系,或者给你很好的理由不把两者联系起来(比如,有人会因为你不把它们联系起来而给你 50,000 美元),都不会影响这种联想。至少自洛克以来,这一点就已经很清楚了。在《人类理解论》的“论观念的联想”一章(第三十三章)中,他写道:
When this combination is settled, and while it lasts, it is not in the power of reason to help us, and relieve us from the effects of it. Ideas in our minds, when they are there, will operate according to their natures and circumstances. And here we see the cause why time cures certain affections, which reason, though in the right, and allowed to be so, has not power over, nor is able against them to prevail with those who are apt to hearken to it in other cases. (2.33.13)
当这种结合确立之后,只要它还持续,理性就无力帮助我们,也无法使我们摆脱它的影响。我们心中的观念一旦存在,就会按照它们的本性和环境起作用。在这里我们可以看到,为什么时间能治愈某些情感,而理性尽管是对的,也被承认是对的,却无力支配这些情感,也无法在那些在其他情况下乐于听从理性的人身上战胜它们。(2.33.13)
Likewise, say one has just eaten lutefisk and then vomited. The smell and taste of lutefisk will then be associated with feeling nauseated, and no amount of telling one that they shouldn’t be nauseated will be very effective. Say the lutefisk that made one vomit was covered in poison, so that we know that the lutefisk wasn’t the root cause of the sickness.[23] Having this knowledge won’t dislodge the association. In essence, associative structures are functionally defined as being modifiable by counterconditioning, extinction, and nothing else. Thus, assuming one sees counterconditioning and extinction as types of associative learning, we can say that associative learning does not necessarily eventuate in associative structures, but associative structures can only be modified by associative learning.
同样,假设一个人刚刚吃了北欧碱鱼然后呕吐了。此后,北欧碱鱼的气味和味道就会与恶心感相关联,无论怎样告诉这个人不应该感到恶心,都不会很有效。假设使人呕吐的北欧碱鱼上沾有毒药,因此我们知道北欧碱鱼并不是致病的根本原因。[23]拥有这一知识并不会消除这种联想。从本质上讲,联想结构在功能上被定义为只能通过反条件作用和消退来改变,别无他途。因此,假设把反条件作用和消退视为联想学习的类型,我们可以说:联想学习不一定会产生联想结构,但联想结构只能通过联想学习来改变。
5. Associative Transitions 5. 关联过渡
So far we’ve discussed learning and mental structures, but have yet to discuss thinking. The pure associationist will want a theory that covers not just acquisition and cognitive structure, but also the transition between thoughts. Associative transitions are a particular type of thinking, akin to what William James called “The Stream of Thought” (James 1890). Associative transitions are movements between thoughts that are not predicated on a prior logical relationship between the elements of the thoughts that one connects. In this sense, associative transitions are contrasted with computational transitions as analyzed by the Computational Theory of Mind (Fodor 2001; Quilty-Dunn and Mandelbaum 2018, 2019; see the entry on Computational Theory of Mind). CTM understands inferences as truth-preserving movements in thought that are underwritten by the formal/syntactic properties of thoughts. For example, inferring the conclusion of a modus ponens argument from its premises is possible based solely on the form of the major and minor premises, and not on their content. Associative transitions are transitions in thought that are not based on the logico-syntactic properties of thoughts. Rather, they are transitions in thought that occur based on the associative relations among the separate thoughts.
到目前为止,我们已经讨论了学习和心理结构,但尚未讨论 思考 .纯粹的联想主义者会想要一个不仅涵盖后天和认知结构,还涵盖思想之间过渡的理论。联想过渡是一种特殊类型的思维,类似于威廉·詹姆斯 (William James) 所说的“思潮”(James 1890)。联想过渡是思想之间的运动,它不以一个人所连接的思想元素之间的先验逻辑关系为前提。从这个意义上说,联想转换与计算心智理论分析的计算转换形成对比(Fodor 2001;Quilty-Dunn 和 Mandelbaum 2018,2019;参见 Computational Theory of Mind 上的条目)。CTM 将推理理解为思想中保留真理的运动,这些运动由思想的形式/句法属性支撑。例如,从前提推断出 modus ponens 的结论是可能的,只是根据主要和次要前提的形式,而不是根据前提的内容。联想过渡是思想中的过渡,它不是基于思想的逻辑句法属性的。相反,它们是基于独立思想之间的联想关系而发生的思想转变。
Imagine an impure associationist model of the mind, one that contains both propositional and associative structures. A computational inference might be one such as inferring you are a g from the thoughts if you are an f, then you are a g, and you are an f. However, an associative transition is just a stream of ideas that needn’t have any formal, or even rational, relation between them, such as the transition from this coffee shop is cold to russia should annex idaho, without there being any intervening thoughts. This transition could be subserved merely by one’s association of idaho and cold, or it could happen because the two thoughts have tended to co-occur in the past, and their close temporal proximity caused an association between the two thoughts to arise (or for many other reasons). Regardless of the etiology, the transition doesn’t occur on the basis of the formal properties of the thoughts.[24]
想象一个不纯粹的心智联想模型,一个同时包含命题和联想结构的模型。计算推理可能是这样一种,例如从思想中推断你是 g ,如果你是 f ,那么你是 g ,你是 f 。然而,联想过渡只是一连串的想法,它们之间不需要任何正式的,甚至不需要理性的关系,例如从这家咖啡店的过渡是冷的,俄罗斯应该吞并爱达荷州,没有任何干预的想法。这种转变可能仅仅通过一个人对 idaho 和 cold 的联想来支撑,或者可能是因为这两个想法在过去倾向于同时发生,并且它们在时间上的接近导致两个想法之间的联系出现(或出于许多其他原因)。无论病因如何,这种转变都不会根据思想的形式属性发生。[24]。
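The contrast between the two kinds of transition can be made concrete with a small sketch. It is only an illustration under stated assumptions: thoughts are represented as plain strings, a conditional is represented as a hypothetical `("if", antecedent, consequent)` tuple, and the names `modus_ponens`, `ASSOCIATIONS`, and `associative_transition` are invented for this example rather than taken from any theory discussed here.

```python
# Illustrative sketch only: all names and representations are assumptions.

def modus_ponens(conditional, minor_premise):
    """Formal transition: licensed purely by the *form* of the premises.

    `conditional` is a hypothetical ("if", antecedent, consequent) tuple.
    The rule never inspects what the antecedent or consequent are about.
    """
    tag, antecedent, consequent = conditional
    if tag == "if" and minor_premise == antecedent:
        return consequent
    return None  # the premises do not instantiate the modus ponens form


# Associative transition: a learned link between two particular thoughts.
# Nothing about the form of either thought explains the link.
ASSOCIATIONS = {
    "this coffee shop is cold": "russia should annex idaho",
}

def associative_transition(thought):
    """Return whatever thought happens to be associated with `thought`."""
    return ASSOCIATIONS.get(thought)


# The formal rule works for any content that fits the form.
print(modus_ponens(("if", "you are an f", "you are a g"), "you are an f"))
print(modus_ponens(("if", "it rains", "the street is wet"), "it rains"))
# The associative link works only for the specific pair that was stored.
print(associative_transition("this coffee shop is cold"))
print(associative_transition("this coffee shop is warm"))
```

The design choice to note is that `modus_ponens` generalizes to arbitrary new content without any further learning, whereas the lookup table must acquire each link separately.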
According to this taxonomy, talk of an “associative inference” (e.g., Anderson et al. 1994; Armstrong et al. 2012) is a borderline oxymoron. The easiest way to give sense to the idea of an associative inference is for it to involve transitions in thought that began because they were purely inferential (as understood by the computational theory of mind) but then became associated over time. For example, at first one might make the modus ponens inference because a particular series of thoughts instantiates the modus ponens form. Over time the premises and conclusion of that particular token of a modus ponens argument become associated with each other through their continued use in that inference and now the thinker merely associates the premises with the conclusion. That is, the constant contiguity between the premises and the conclusion occurred because the inference was made so frequently, but the inference was originally made so frequently not because of the associative relations between the premises and conclusion, but because of the form of the thoughts (and the particular motivations of the thinker). This constant contiguity then formed the basis for an associative linkage between the premises and the conclusion.[25]
根据这个分类法,谈论“结合推理”(例如,Anderson 等人,1994 年;Armstrong et al. 2012)是一个边缘矛盾的说法。理解联想推理概念的最简单方法是它涉及思想中的转变,这些转变始于因为它们是纯粹的推理(正如计算心智理论所理解的),但随后随着时间的推移变得关联。例如,起初,人们可能会进行 modus ponens 推理,因为一系列特定的思想实例化了 modus ponens 形式。随着时间的推移,modus ponens 论证的特定标记的前提和结论通过它们在该推理中的持续使用而相互关联,现在思想者只是将前提与结论联系起来。也就是说,前提和结论之间的持续连续性是因为推理如此频繁地进行,但推理最初之所以如此频繁地进行,不是因为前提和结论之间的关联关系,而是因为思想的形式(以及思考者的特定动机)。这种恒定的连续性随后构成了前提和结论之间联想联系的基础。[25]。
As was the case for associative structures, associative transitions in thought are not just a logical possibility. There are particular empirical differences associated with associative transitions versus inferential transitions. Associative transitions tend to move across different content domains, whereas inferential transitions tend to stay on a more focused set of contents. These differences have been seen to result in measurable differences in mood: associative thinking across topics bolsters mood when compared to logical thinking on a single topic (Mason and Bar 2012).
与联想结构一样,思想中的联想转换不仅仅是一种逻辑上的可能性。与联想过渡与推断过渡存在特殊的经验差异。关联过渡往往在不同的内容域之间移动,而推理过渡往往停留在更集中的内容集上。这些差异被认为会导致情绪的可测量差异:与单一主题的逻辑思维相比,跨主题的联想思维可以增强情绪(Mason 和 Bar 2012)。
6. Associative Instantiation 6. 关联实例化
The associationist position so far has been neutral on how associations are to be implemented. Implementation can be seen at a representational (that is psychological) level of explanation, or at the neural level. A pure associationist picture would posit an associative implementation base at one, or both, of these levels.[26]
到目前为止,协会主义者的立场在如何实施协会方面一直保持中立。实现可以在表征(即心理)解释层面或神经层面看到。纯粹的关联主义图景将假设在这些级别的一个或两个级别上有一个关联实现基础。[26]。
The most well-known associative instantiation base is a class of networks called Connectionist networks (see the entry on connectionism). Connectionist networks are sometimes pitched at the psychological level (see, e.g., Elman 1991; Elman et al. 1996; Smolensky 1988). This amounts to the claim that models of algorithms embedded in the networks capture the essence of certain mental processes, such as associative learning. Other times connectionist networks are said to be models of neural activity (“neural networks”). Connectionist networks consist in sets of nodes, generally input nodes, hidden nodes, and output nodes. Input nodes are taken to be analogs of sensory neurons (or sub-symbolic sensory representations), output nodes the analog of motor neurons (or sub-symbolic behavioral representations), and hidden nodes are stand-ins for all other neurons.[27] The network consists in these nodes being connected to each other with varying strengths. The topology of the connections gives one an associative mapping of the system, with the associative weights understood as the differing strengths of connections. On the psychological reading, these associations are functionally defined; on the neurological reading, they are generally understood to be representing synaptic conductance (and are the analogs of dendrites).[28] Prima facie, these networks are purely associative and do not contain propositional elements, and the nodes themselves are not to be equated with single representational states (such as concepts; see, e.g., Gallistel and King 2009).
最著名的关联实例化基是一类称为 Connectionist networks 的网络(参见 connectionism 上的条目)。联结主义网络有时在心理层面上提出(参见,例如,Elman 1991;Elman 等人,1996 年;Smolensky 1988 年)。这相当于声称嵌入网络中的算法模型捕捉了某些心理过程的本质,例如联想学习。其他时候,连接主义网络被称为神经活动的模型(“神经网络”)。连接主义网络由节点集组成,通常是输入节点、隐藏节点和输出节点。输入节点被视为感觉神经元(或亚符号感觉表征)的类似物,输出节点被视为运动神经元(或亚符号行为表征)的类似物,而隐藏节点是所有其他神经元的替身。[27]该网络由这些节点组成,这些节点以不同的强度相互连接。连接的拓扑结构为人们提供了系统的关联映射,其中关联权重被理解为连接的不同强度。在心理阅读中,这些关联在功能上是定义的;在神经学读数上,它们通常被理解为代表突触电导(并且是树突的类似物)。[28]从表面上看,这些网络是纯粹的结合性的,不包含命题元素,节点本身不应等同于单一的表征状态(例如概念;参见,例如 Gallistel 和 King 2009)。
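The layered architecture described above can be sketched in a few lines. This is a minimal illustration, not a model from the literature: the weight values are arbitrary, the sigmoid activation function is an assumption (the text does not specify one), and the names `layer`, `forward`, `W_INPUT_TO_HIDDEN`, and `W_HIDDEN_TO_OUTPUT` are hypothetical.

```python
import math

def sigmoid(x):
    """Squash a summed input into (0, 1). The choice of sigmoid is an assumption."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(activations, weights):
    """Propagate activations across one set of weighted connections.

    weights[j][i] is the connection strength from sending unit i to
    receiving unit j; each receiving unit sums its weighted inputs.
    """
    return [
        sigmoid(sum(w * a for w, a in zip(row, activations)))
        for row in weights
    ]

# Arbitrary illustrative weights: 3 input nodes -> 2 hidden nodes -> 1 output node.
W_INPUT_TO_HIDDEN = [
    [0.5, -0.3, 0.8],
    [-0.6, 0.9, 0.1],
]
W_HIDDEN_TO_OUTPUT = [
    [1.2, -0.7],
]

def forward(inputs):
    """Activation flows from input nodes through hidden nodes to output nodes."""
    hidden = layer(inputs, W_INPUT_TO_HIDDEN)
    return layer(hidden, W_HIDDEN_TO_OUTPUT)

print(forward([1.0, 0.0, 1.0]))
```

Note that nothing in the sketch assigns a meaning to any single node; whatever the network represents is carried by the pattern of weights and activations, which is the point behind not equating nodes with concepts.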
However, a connectionist network can implement a classical Turing machine architecture (see, e.g., Fodor and McLaughlin 1990; Chalmers 1993). Many, if not most, of the adherents of classical computation, for example proponents of CTM, think that the brain is an associative network, one which implements a classical computational program. Some adherents of CTM do deny that the brain runs an associative network (see, e.g., Gallistel and King 2009, who appear to deny that there is any scientific level of explanation that association is intimately involved in), but they do so on separate empirical grounds and not because of any logical inconsistency with an associative brain implementing a classical mind.
然而,连接主义网络可以实现经典的图灵机架构(参见,例如 Fodor 和 McLaughlin 1990;Chalmers 1993 年)。许多(如果不是大多数)经典计算的拥护者,例如 CTM 的支持者,认为大脑是一个联想网络,一个实现经典计算程序的网络。CTM 的一些追随者确实否认大脑运行着一个联想网络(参见,例如 Gallistel 和 King 2009,他们似乎否认存在任何科学层面的解释与联想密切相关),但他们这样做是基于不同的经验基础,而不是因为与实现经典思维的联想大脑有任何逻辑上的不一致。
When discussing an associative implementation base it is important to distinguish questions of associationist structure from questions of representational reality. Connectionists have often been followers of the Skinnerian anti-representationalist tradition (Skinner 1938). Because of the distributed nature of the nodes in connectionist networks, the networks have tended to be analyzed as associative stimulus/response chains of subsymbolic elements. However, the question of whether connectionist networks have representations which are distributed in patterns of activity throughout different nodes of the network, or whether connectionist networks are best understood as containing no representational structures at all, is orthogonal to both the question of whether the networks are purely associative or computational, and whether the networks can implement classical architectures.
在讨论联想实现基础时,区分联想结构问题和表征现实问题是很重要的。联结主义者通常是斯金纳反表征主义传统的追随者(斯金纳 1938)。由于连接主义网络中节点的分布式性质,这些网络倾向于被分析为子符号元素的联想刺激/反应链。然而,连接主义网络是否具有以活动模式分布在网络不同节点中的表示,或者连接主义网络是否最好理解为根本不包含表示结构的问题,与网络是纯粹的结合还是计算的问题,以及网络是否可以实现经典架构的问题。
7. Relation between the Varieties of Association and Related Positions 7. 联想主义各类型与相关立场之间的关系
These four types of associationism share a certain empiricist spiritual similarity, but are logically, and empirically, separable. The pure associationist who wants to posit the smallest number of domain-general mental processes will theorize that the mind consists of associative structures acquired by associative learning which enter into associative transitions and are implemented in an associative instantiation base. However, many hybrid views are available and frequently different associationist positions become mixed and matched, especially once issues of empiricism, domain-specificity, and gradual learning arise. Below is a partial taxonomy of where some well-known theorists lie in terms of associationism and these other, often related doctrines.
这四种类型的联想主义在精神上具有一定的相似性,但在逻辑上和实证上是可以分开的。想要假设最少数量的域通用心理过程的纯粹联想主义者将理论上认为,心智由通过联想学习获得的联想结构组成,这些结构进入联想过渡并在联想实例化基础中实现。然而,有许多混合观点可用,并且经常不同的关联主义立场变得混合和匹配,尤其是在出现经验主义、领域特异性和渐进学习的问题时。以下是一些著名理论家在关联主义和其他这些通常相关的学说方面所处位置的部分分类。
Prinz (2002) and Karmiloff-Smith (1995) are examples of empiricist non-associationists. It is rare to find an associationist who is a nativist, but plenty of nativists have aspects of associationism in their own work. For example, even the arch-nativist Jerry Fodor maintains that intramodular lexicons contain associative structures (Fodor 1983). Similarly, there are many non-behaviorist (at least non-radical, analytic, or methodological behaviorist) associationists, such as Elman (1991), Smolensky (1988), Baeyens (De Houwer and Baeyens 2001), and modern day dual process theorists such as Evans and Stanovich (2013). It is quite difficult to find a non-associationist behaviorist, though Tolman approximates one (Tolman 1948). Elman and Smolensky also qualify as representationalist associationists, and Van Gelder (1995) as an anti-representationalist non-associationist. Karmiloff-Smith (1995) can be interpreted as, for some areas of learning, a proponent of gradual learning without being associationist (some might also read contemporary Bayesian theorists, e.g., Tenenbaum et al. 2011 and Chater et al. 2006, as holding a similar position for some areas of learning). Rescorla (1988) and Heyes (2012) claim to be associationists who are pro step-wise, one-shot learning (though Rescorla sees his project as a continuation of the classical conditioning program, others see his data as grist for the anti-associationist, pro-computationalist mill; see Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019). Lastly, Tenenbaum and his contemporary Bayesian colleagues sometimes qualify as holding a domain-general learning position without it being associationist.[29]
Prinz (2002) 和 Karmiloff-Smith (1995) 是经验主义非关联主义者的例子。很少找到一个本土主义者的联想主义者,但很多本土主义者在他们自己的作品中都有联想主义的一面。例如,即使是原始主义者 Jerry Fodor 也坚持认为模内词典包含联想结构(Fodor 1983)。同样,有许多非行为主义(至少是非激进的、分析的或方法论的行为主义的)关联论者,如 Elman (1991)、Smolensky (1988)、Baeyens (De Houwer 和 Baeyens 2001),以及现代二元过程论者,如 Evans 和 Stanovich (2013)。要找到一个非关联主义的行为主义者是相当困难的,尽管托尔曼近似于一个(Tolman 1948)。Elman 和 Smolensky 也有资格成为代表性协会主义者,Van Gelder (1995) 也有资格成为反代表性非协会主义者。Karmiloff-Smith (1995) 可以解释为,对于某些学习领域,支持渐进式学习,而不是联想主义(有些人也可能将当代贝叶斯理论家,例如 Tenenbaum 等人,2011 年和 Chater 等人,2006 年)解释为在某些学习领域持有类似的立场)。Rescorla (1988) 和 Heyes (2012) 声称自己是支持循序渐进、一次性学习的关联主义者(尽管 Rescorla 将他的项目视为经典条件反射程序的延续,其他人将他的数据视为反关联主义、支持计算主义工厂的素材,参见 Gallistel 和 King 2009;Quilty-Dunn 和 Mandelbaum 2019 年)。最后,特南鲍姆和他同时代的贝叶斯同事有时有资格担任一般领域的学习立场,而不是联想主义。[29]。
8. Associationism in Social Psychology 8. 社会心理学中的联想主义
Since the cognitive revolution, associationism’s influence has mostly died out in cognitive psychology and psycholinguistics. This is not to say that all aspects of associative theorizing are dead in these areas; rather, they have just taken on much smaller, more peripheral roles (for example, it has often been suggested that mental lexicons are structured, in part, associatively, which is why lexical decision tasks are taken to be facilitation maps of one’s lexicon). In other areas of cognitive psychology (for example, the study of causal cognition), associationism is no longer the dominant theoretical paradigm, but vestiges of associationism still persist (see Shanks 2010 for an overview of associationism in causal cognition). Associationism is also still alive in the connectionist literature, as well as in the animal cognition tradition.
自认知革命以来,联想主义的影响在认知心理学和心理语言学中大多消失了。这并不是说联想理论化的所有方面在这些领域都已死去;相反,他们只是承担了更小、更边缘的角色(例如,经常有人认为心理词典在一定程度上是联想式的,这就是为什么词汇决策任务被视为一个人词典的促进图)。在认知心理学的其他领域(例如,因果认知的研究),关联主义不再是占主导地位的理论范式,但关联主义的残余仍然存在(参见 Shanks 2010 关于因果认知中关联主义的概述)。联想主义在联结主义文献以及动物认知传统中也仍然存在。
But the biggest contemporary stronghold of associationist theorizing resides in social psychology, an area which has traditionally been hostile to associationism (see, e.g., Asch 1962, 1969). The ascendance of associationism in social psychology has been a fairly modern development, and has caused a revival of associationist theories in philosophy (e.g., Madva and Brownstein 2019). The two areas of social psychology that have seen the greatest renaissance of associationism are the implicit attitude and dual-process theory literature. However, in the late 2010s social psychology has begun to take a critical look at associationist theories (e.g., Mann et al. 2019).
但联想主义理论化在当代最大的堡垒在于社会心理学,这是一个传统上对联想主义持敌对态度的领域(参见,例如,Asch 1962,1969)。联想主义在社会心理学中的崛起是一个相当现代的发展,并导致了哲学中联想主义理论的复兴(例如,Madva 和 Brownstein 2019)。社会心理学中联想主义复兴最大的两个领域是内隐态度和双过程理论文献。然而,在 2010 年代后期,社会心理学已经开始批判性地看待联想主义理论(例如,Mann 等人,2019 年)。
8.1 Implicit Attitudes 8.1 内隐态度
Implicit attitudes are generally operationally defined as the attitudes tested on implicit tests such as the Implicit Association Test (Greenwald et al. 1998), the Affect Misattribution Procedure (Payne et al. 2005), the Sorted Paired Feature Task (Bar-Annan et al. 2009) and the Go/No-Go Association Task (Nosek and Banaji 2001). Implicit attitudes are contrasted with explicit attitudes, attitudes operationalized as the ones being probed when one gives an explicit response like a marking on a Likert scale, feeling thermometer, or in free report. Such operationalizations leave open the question of whether there are any natural kinds to which explicit and implicit attitudes refer. In general, implicit attitudes are characterized as being mental representations that are unavailable for explicit report and inaccessible to consciousness (cf. Hahn et al. 2014; Berger 2020).
内隐态度通常在操作上定义为在内隐测试中测试的态度,例如内隐联想测试(Greenwald 等人,1998 年)、影响错误归因程序(Payne 等人,2005 年)、排序配对特征任务(Bar-Annan 等人,2009 年)和通过/不参与关联任务(Nosek 和 Banaji 2001 年)。内隐态度与外显态度形成对比,当一个人给出明确的回答时,如李克特量表上的标记、感觉温度计或免费报告,态度作为一个人被探究。这种操作化留下了一个悬而未决的问题,即是否存在显性和隐性态度所指代的任何自然类型。一般来说,内隐态度被描述为无法进行明确报告且意识无法接近的心理表征(参见 Hahn 等人,2014 年;Berger 2020 年)。
The default position among social psychologists is to treat implicit attitudes as if they are associations among mental representations (Fazio 2007), or among pairs of mental representations and valences. In particular, they treat implicit attitudes as associative structures which enter into associative transitions. Recently this issue has come under much debate. In an ever expanding series of studies De Houwer and his collaborators have sought to show that associative learning is, at base, relational, propositional contingency learning; i.e., that all putatively associative learning is in fact a nonautomatic learning process that generates and evaluates propositional hypotheses (Mitchell et al. 2009; De Houwer 2009, 2011, 2014, 2019; Hughes et al. 2019). Other researchers have also approached the question using learning as the entrance point to the debate, demonstrating that non-associative acquisition creates stronger attitudes than associative acquisition (Hughes et al. 2019). For example, one might demonstrate that learning through merely reading an evaluative statement creates a stronger implicit attitude than repeated associative exposures (Kurdi and Banaji 2017, 2019; Mann et al. 2019). Other researchers have championed propositional models not based on learning, but instead based on how implicit attitudes change regardless of how they are acquired. For instance, Mandelbaum (2016) argued that logical/evidential interventions modulate implicit attitudes in predictable ways (e.g., using double negation to cancel each other out), while others have used diagnosticity to show that implicit attitudes update in a non-associationistic, propositional way (e.g., after reading a story about a man who broke into a building and appeared to ransack it, you learn that he jumped in to save people from a fire and immediately change your opinion of the man from negative to positive; Mann and Ferguson 2015; Mann et al. 2017; Van Dessel et al. 2019).
(For more on implicit attitudes see the entry on implicit bias).
社会心理学家的默认立场是将内隐态度视为心理表征之间的关联(Fazio 2007),或者心理表征和效价对之间的关联。特别是,他们将隐性态度视为进入联想过渡的联想结构。最近,这个问题引起了很多争论。在不断扩大的一系列研究中,De Houwer 和他的合作者已经证明,联想学习从根本上说是关系性的、命题的或有学习;即,所有假定的联想学习实际上都是一个非自动学习过程,它产生和评估命题假设(Mitchell 等人,2009 年;De Houwer 2009、2011、2014、2019;Hughes 等人,2019 年)。其他研究人员也使用学习作为辩论的切入点来解决这个问题,证明了非联想习得比联想习得产生更强的态度(Hughes 等人,2019 年)。例如,人们可能会证明,仅通过阅读评价性陈述来学习比重复的联想暴露会产生更强烈的内隐态度(Kurdi 和 Banaji 2017,2019;Mann 等人,2019 年)。其他研究人员倡导的命题模型不是基于学习,而是基于内隐态度如何变化,而不管它们是如何获得的。例如,Mandelbaum (2016) 认为,逻辑/证据干预以可预测的方式调节内隐态度(例如,使用双重否定来相互抵消),而其他人则使用诊断来表明内隐态度以非关联、命题的方式更新(例如,在阅读了一个关于一个闯入建筑物并似乎洗劫一空的人的故事后,您了解到我们跳入了 Save People from a Fire,并立即改变了你的看法男人从消极到积极;Mann 和 Ferguson 2015;Mann 等人,2017 年;Van Dessel 等人,2019 年)。
8.2 Dual Process Theories 8.2 双重过程理论
Associative structures and transitions are widely implicated in a particular type of influential dual-process theory. Though there are many dual-process theories in social psychology (see, e.g., the papers in Chaiken and Trope 1999, or the discussion in Evans and Stanovich 2013), the one most germane to associationism is also the most popular. It originates from work in the psychology of reasoning and is often also invoked in the heuristics and biases tradition (see, e.g., Kahneman 2011). It has been developed by many different psychological theorists (Sloman 1996; Smith and Decoster 2000; Wilson et al. 2000; Evans and Stanovich 2013) and, in parts, taken up by philosophers too (see, e.g., Gendler 2008; Frankish 2009; see also some of the essays in Evans and Frankish 2009).
联想结构和过渡广泛涉及一种特定类型的有影响力的双过程理论。尽管社会心理学中有许多双过程理论(例如,参见 Chaiken 和 Trope 1999 年的论文,或 Evans 和 Stanovich 2013 年的讨论),但与联想主义最密切相关的理论也是最受欢迎的。它起源于推理心理学的工作,也经常在启发式和偏见传统中被引用(参见,例如,Kahneman 2011)。它已被许多不同的心理学理论家开发(Sloman 1996;Smith 和 Decoster 2000;Wilson 等人,2000 年;Evans 和 Stanovich 2013),并且部分也被哲学家所采用(参见,例如,Gendler 2008;法兰克语 2009;另见 Evans 和 Frankish 2009 中的一些文章)。
The dual-process strain most relevant to the current discussion posits two systems, one evolutionarily ancient intuitive system underlying unconscious, automatic, fast, parallel and associative processing, the other an evolutionarily recent reflective system characterized by conscious, controlled, slow, “rule-governed” serial processes (see, e.g., Evans and Stanovich 2013). The ancient system, sometimes called “System 1”, is often understood to include a collection of autonomous, distinct subsystems, each of which is recruited to deal with distinct types of problems (see Stanovich 2011 for a discussion of TASS, “the autonomous set of systems”). Although theories differ on how System 1 interacts with System 2,[30] the theoretical core of the view is that System 1’s processing is essentially associative. As in the implicit attitude debate, dual systems models have recently come under fire (see Kruglanski 2013; Osman 2013; Mandelbaum 2016; De Houwer 2019), though they remain very popular.
与当前讨论最相关的双过程菌株假设两个系统,一个是进化上古老的直觉系统,是无意识、自动、快速、平行和联想加工的基础,另一个是进化上最近的反思系统,其特征是有意识的、受控的、缓慢的、“规则支配的”系列过程(参见,例如,Evans 和 Stanovich 2013)。古老的系统,有时被称为“系统 1”,通常被理解为包括一组自主的、不同的子系统,每个子系统都被招募来处理不同类型的问题(参见 Stanovich 2011 对“TAS——自治系统集”的讨论)。尽管关于系统 1 如何与系统 2 交互的理论存在差异,但[30]系统 1 的理论核心是论证其处理本质上是结合的。与内隐态度辩论一样,双系统模型最近也受到了抨击(参见 Kruglanski 2013;奥斯曼 2013 年;Mandelbaum 2016 年;De Houwer 2019),尽管它们仍然非常受欢迎。
9. Criticisms of Associationism 9. 对联想主义的批评
Associationism has been a dominant theme in mental theorizing for centuries. As such, it has garnered an appreciable amount of criticism.
几个世纪以来,联想主义一直是心理理论的一个主导主题。因此,它受到了相当多的批评。
9.1 Learning Curves 9.1 学习曲线
The basic associative learning theories imply, either explicitly or implicitly, slow, gradual learning of associations (Baeyens et al. 1995). The learning process can be summarized in a learning curve which plots the frequency (or magnitude) of the conditioned response as a function of the number of reinforcements (Gallistel et al. 2004: 13124). Mappings between CRs and USs are gradually built up over numerous trials (in the lab) or experiences (in the world). Gradual, slow learning has come under fire from a variety of areas (see sections 9.3 and 9.4.1). However, here we just focus on the behavioral data. In a series of works re-analyzing animal behavior, Gallistel (Gallistel et al. 2004; Gallistel and King 2009) has argued that although group-level learning curves do display the properties of being negatively accelerated and gradually developing, these curves are misleading because no individual’s learning curve has these properties. Gallistel has argued that learning for individuals is generally step-like, rapid, and abrupt. An individual’s progression from a low level of responding to asymptotic responding is very quick. Sometimes, the learning is so quick that it is literally one-shot learning. For example, after analyzing multiple experiments on animal learning of spatial location, Gallistel writes,
基本的联想学习理论明确或隐含地暗示了对联想的缓慢、渐进的学习(Baeyens et al. 1995)。学习过程可以用学习曲线来概括,该曲线将条件响应的频率(或幅度)绘制为强化次数的函数(Gallistel et al. 2004: 13124)。CR 和 US 之间的映射是通过无数次试验(在实验室中)或经验(在世界上)逐渐建立起来的。渐进、缓慢的学习受到了来自各个领域的抨击(参见第 9.3 节和第 9.4.1 节)。但是,这里我们只关注行为数据。在一系列重新分析动物行为的作品中,Gallistel(Gallistel 等人,2004 年;Gallistel 和 King 2009) 认为,尽管群体层面的学习曲线确实显示出负加速和逐渐发展的特性,但这些曲线具有误导性,因为没有个体的学习曲线具有这些特性。Gallistel 认为,个人的学习通常是阶梯式的、快速的和突然的。一个人从低水平的反应中学习渐近反应是非常快的。有时,学习速度如此之快,以至于实际上是一次性学习。例如,在分析了动物学习空间位置的多个实验后,Gallistel 写道
The learning of a spatial location generally requires but a single experience. Several trials may, however, be required to convince the subject that the location is predictable from trial to trial. (Gallistel et al. 2004: 13130)
学习空间位置通常只需要一次体验。然而,可能需要几次试验才能让受试者相信该地点在一次又一次的试验中是可预测的。(Gallistel 等人,2004:13130)
Gallistel argues that the reason the group learning curves look smooth and gradual is that subjects differ widely in the onset latency of their step-wise curves (Gallistel et al. 2004: 13125); in other words, different animals take different amounts of time for the learning to commence. The differences between individual subjects’ learning curves depend on when the steps begin and not on the speed of the individual animal’s learning process. All individuals appear to show rapid rises in learning, but since each begins learning at a different time, averaging over the group makes the rapid step-wise learning look like slow, gradual learning (Gallistel et al. 2004: 13124).
Gallistel 认为,群体学习曲线看起来平滑和渐进的原因是,就逐步曲线的开始潜伏期何时开始而言,受试者之间存在很大的个体差异(Gallistel 等人,2004:13125);换句话说,不同的动物需要不同的时间来开始学习。个体受试者学习曲线之间的差异取决于步骤的开始时间,而不是个体动物的学习过程的速度。所有个体的学习似乎都表现出快速的提升,但由于每个人的学习时间不同,当我们对群体进行平均时,快速的逐步学习似乎看起来像是缓慢的、渐进的学习(Gallistel 等人,2004:13124)。
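The averaging argument can be checked with a toy simulation. The sketch below is not a reconstruction of Gallistel's data or analysis: the number of subjects, the number of trials, and the uniform distribution of onset latencies are all assumptions chosen for illustration, and the function names are hypothetical. Each simulated subject has a perfectly abrupt, one-trial learning curve, yet the group average rises gradually.

```python
import random

def step_curve(onset, n_trials, low=0.0, high=1.0):
    """One subject's curve: responding jumps from `low` to `high` at `onset`."""
    return [high if trial >= onset else low for trial in range(n_trials)]

def group_average(curves):
    """Average the response level across subjects, trial by trial."""
    n_subjects = len(curves)
    return [
        sum(curve[trial] for curve in curves) / n_subjects
        for trial in range(len(curves[0]))
    ]

def largest_jump(curve):
    """Biggest increase between two consecutive trials."""
    return max(later - earlier for earlier, later in zip(curve, curve[1:]))

random.seed(0)          # fixed seed so the run is repeatable
N_TRIALS = 50           # assumed session length
N_SUBJECTS = 200        # assumed group size

# Assumption: onset latencies are spread uniformly over trials 5..45.
onsets = [random.randint(5, 45) for _ in range(N_SUBJECTS)]
curves = [step_curve(onset, N_TRIALS) for onset in onsets]
average = group_average(curves)

# Every individual goes from floor to ceiling in a single trial...
print("largest individual jump:", max(largest_jump(c) for c in curves))
# ...but the group average never changes by much between adjacent trials.
print("largest group-average jump:", round(largest_jump(average), 3))
print("group average every 5 trials:", [round(v, 2) for v in average[::5]])
```

Because the only thing varying across subjects is the onset, the shape of the averaged curve reflects the distribution of onset latencies rather than the speed of any individual's learning.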
9.2 The Problem of Predication 9.2 谓词的问题
The problem of predication is, at its core, a problem of how an associative mechanism can result in the acquisition of subject/predicate structures, structures which many theorists believe appear in language, thought, and judgment. The first major discussion of the problem appears in Kant (1781/1787), but variants of the basic Kantian criticism can be seen across the contemporary literature (see, e.g., Chomsky 1959; Fodor and Pylyshyn 1988; Fodor 2003; Mandelbaum 2013a; for the details of the Kantian argument see the entry on Kant’s Transcendental Argument).
谓词问题的核心是联想机制如何导致主语/谓语结构的获得,许多理论家认为这些结构出现在语言、思想和判断中。对这个问题的第一次主要讨论出现在康德(1781/1787)中,但在当代文献中可以看到基本康德批评的变体(例如,参见乔姆斯基 1959;Fodor 和 Pylyshyn 1988;Fodor 2003 年;曼德尔鲍姆 2013a;有关康德论证的细节,请参阅康德的先验论证 )。
For a pure associationist, association is “semantically transparent” (see Fodor 2003), in that it purports to add no additional structure to thoughts. When a simple concept, X, and a simple concept, Y, become associated, one acquires the associative structure X/Y. But X/Y has no additional structure on top of their contents. Knowing that X and Y are associated amounts to knowing a causal fact: that activating Xs will bring about the activation of Ys and vice versa. However, so the argument goes, some of our thoughts appear to have more structure than this: the thought birds fly predicates the property of flying onto birds. The task for the associationist is to explain how associative structures can distinguish a thinker who has a single (complex) thought birds fly from a thinker who conjoins two simple thoughts in an associative structure where one thought, birds, is immediately followed by another, fly. As long as the two simple thoughts are reliably causally correlated so that, for a thinker, activations of birds regularly bring about fly, then that thinker has the associative structure birds/fly. Yet it appears that thinker hasn’t yet had the thought birds fly. The problem of predication is explaining how a purely associative mechanism could eventuate in complex thoughts. In Fodor’s terms the problem boils down to how association, a causal relation among mental representations, can effect predication, a relation among intentional contents (Fodor 2003).
对于纯粹的联想主义者来说,联想是“语义上透明的”(参见 Fodor 2003),因为它声称没有为思想添加额外的结构。当一个简单概念 X 和一个简单概念 Y 成为关联时,一个人获得了联想结构 X / Y 。但是 X / Y 在其内容之上没有额外的结构。知道 X 和 Y 是相关的,就等于知道一个因果事实:激活 X s 将导致 Y s 的激活,反之亦然。然而,正如争论所说,我们的一些思想似乎比这更有结构性:鸟飞的思想预示着飞到鸟身上的特性。联想论者的任务是解释联想结构如何区分具有单一(复杂)思想鸟飞的思想家和将两个简单思想结合在一个联想结构中的思想者,其中一种思想 birds 紧随其后,另一种思想 fly 。只要这两个简单的思想在因果关系上是可靠的,以至于对于一个思考者来说,鸟的激活经常带来飞,那么这个思考者就有联想结构鸟/飞。然而,这位 thinker 似乎还没有让思想鸟飞起来。谓词的问题在于解释纯粹的联想机制如何最终导致复杂的思想。用 Fodor 的话来说,问题归结为关联,即心理表征之间的因果关系,如何影响谓词,即意向内容之间的关系(Fodor 2003)。
A family of related objections to associationism can be interpreted as variations on this theme. For example, problems of productivity, compositionality, and systematicity for associationist theorizing appear to be variants of the problem of predication (for more on these specific issues see the entries on the Language of Thought Hypothesis and on compositionality). If association doesn’t add any additional structure to the mental representations that get associated, then it is hard to see how it can explain the compositionality of thought, which relies on structures that specify relations among intentional contents. Compositionality requires that the meaning of a complex thought is determined by the meanings of its simple constituents along with their syntactic arrangements. The challenge to associationism is to explain how an associative mechanism can give rise to the syntactic structures necessary to distinguish a complex thought like birds fly from the temporal succession of two simple thoughts birds and fly. Since the compositionality of thought is posited to undergird the productivity of thought (thinkers’ abilities to think novel sentences of arbitrary lengths, e.g., green birds fly, giant green birds fly, cuddly giant green birds fly, etc.), associationism has problems explaining productivity.
一系列对结社主义的相关反对意见可以解释为这个主题的变体。例如,联想主义理论化的生产力、组合性和系统性问题似乎是预测问题的变体(有关这些具体问题的更多信息,请参阅关于思想语言假说和组合性的条目)。如果联想没有为联想的心理表征添加任何额外的结构,那么就很难看出它如何解释思想的构成性,它依赖于指定意向内容之间关系的结构。组合性要求一个复杂思想的意义由其简单组成部分的意义及其句法安排决定。联想主义的挑战是解释联想机制如何产生区分复杂思想(如鸟飞)与两个简单思想的时间连续(鸟和飞)所必需的句法结构。由于思想的组合性被认为是思想生产力的基础(思考者思考任意长度的新句子的能力,例如,绿鸟飞、巨型绿鸟飞、可爱的巨型绿鸟飞等),联想主义在解释生产力方面存在问题。
Systematicity is the thesis that there are predictable patterns among which thoughts a thinker is capable of entertaining. Thinkers that can entertain thoughts of certain structures can always entertain distinct thoughts that have related structure. For instance, any thinker who can think a complex thought of the form “X transitive verb Y” can think “Y transitive verb X”.[31] Systematicity entails that we won’t find any thinker that can only think one of those two thoughts, in which case we could not find a person who could think audrey wronged max, but not max wronged audrey. Of course, these two thoughts have very different effects in one’s cognitive economy. The challenge for the associationist is to explain how the associative structure audrey/wronged/max can be distinguished from the structure max/wronged/audrey, while capturing the differences in those thoughts’ effects.
系统性是这样一个论点,即存在可预测的模式,其中思想是能够娱乐的。能够接受某些结构的思想的思想家总是可以接受具有相关结构的不同思想。例如,任何能够思考“ X 及物动词 Y ”形式的复杂思想的思想者都可以思考“ Y 及物动词 X ”。[31]系统性意味着我们不会找到任何只能思考这两种想法之一的思想家,在这种情况下,我们找不到一个可以认为奥黛丽冤枉了麦克斯,但不能认为麦克斯冤枉了奥黛丽的人。当然,这两种想法在一个人的认知经济中具有非常不同的影响。联想论者面临的挑战是解释如何区分联想结构 audrey/wronged/max 与结构 max/wronged/audrey ,同时捕捉这些思想效果的差异。
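The systematicity challenge can be illustrated with a deliberately crude sketch. The assumption, flagged here because it is a simplification and not a claim any associationist is committed to, is that a bare associative structure records only which concepts are linked, modeled below as an unordered set. A structured thought is modeled as an ordered triple with subject, verb, and object roles. The function names are hypothetical.

```python
def associative_structure(*concepts):
    """Assumed model of a bare association: it records which concepts are
    linked, with no roles and no order (hence an unordered frozenset)."""
    return frozenset(concepts)

def structured_thought(subject, verb, obj):
    """Assumed model of a syntactically structured thought: each concept
    fills a distinct role, so position matters."""
    return (subject, verb, obj)

assoc_1 = associative_structure("audrey", "wronged", "max")
assoc_2 = associative_structure("max", "wronged", "audrey")

thought_1 = structured_thought("audrey", "wronged", "max")
thought_2 = structured_thought("max", "wronged", "audrey")

# Under this model the two associative structures collapse into one...
print("associations identical:", assoc_1 == assoc_2)
# ...while the structured thoughts stay distinct, as the two thoughts'
# different effects in a cognitive economy would require.
print("structured thoughts identical:", thought_1 == thought_2)
```

An associationist could reply by enriching the model, for example with directed or temporally ordered links; the sketch only shows why some such enrichment is needed, which is the challenge stated above.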
Associationists have had different responses to the problem. Some have denied that human thought is actually compositional, productive, and systematic, and other non-associationists have agreed with this critique. For example, Prinz and Clark claim “concepts do not compose most of the time” (2002: 62), and Johnson (2004) argues that the systematicity criterion is wrongheaded (see Aydede 1997 for extended discussion of these issues). Rumelhart et al. offer a connectionist interpretation of “schemata”, one which is intended to cover some of the phenomena mentioned in this section (Rumelhart et al. 1986). Others have worked to show that classical conditioning can indeed give rise to complex associative structures (Rescorla 1988). In defense of the associationist construal of complex associations, Rescorla writes,
协会主义者对这个问题有不同的反应。有些人否认人类思想实际上是组合的、生产性的和系统的,而其他非关联论者也同意这种批评。例如,Prinz 和 Clark 声称“概念在大多数时候并不构成”(2002:62),而 Johnson (2004) 认为系统性标准是错误的(参见 Aydede 1997 对这些问题的扩展讨论)。Rumelhart et al. 对 “schemata” 提供了一种连接主义的解释,旨在涵盖本节中提到的一些现象 (Rumelhart et al. 1986)。其他人则努力证明经典条件反射确实可以产生复杂的联想结构(Rescorla 1988)。为了捍卫复杂联想的联想主义解释,Rescorla 写道:
Clearly, the animals had not simply coded the RH [complex] compound in terms of parallel associations with its elements. Rather they had engaged in some more hierarchical structuring of the situation, forming a representation of the compound and using it as an associate. (Rescorla 1988: 156)
Whether or not associationism by itself has the theoretical tools to explain such complex compounds is still debated (see, e.g., Fodor 2003; Mitchell et al. 2009; Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019).
9.3 Word Learning
Multiple issues in the acquisition of the lexicon appear to cause problems for associationism. Some of the most well known examples are reviewed below (for further discussion of word learning and associationism see Bloom 2000).
9.3.1 Fast Mapping
Children learn words at an incredible rate, acquiring around 6,000 words by age 6 (Carey 2010: 184). If gradual learning is the rule, then words too should be learned gradually across this time. However, this does not appear to be the case. Susan Carey discovered the phenomenon of “fast mapping”, which is one-shot learning of a word (Carey 1978a, 1978b; Carey and Bartlett 1978). Her most influential example investigated children’s acquisition of “chromium” (a color word referring to olive green). Children were shown two otherwise identical objects that differed only in color and were asked, “Can you get me the chromium tray, not the red one, the chromium one” (recited in Carey 2010: 2). All of the children handed over the correct tray at that time. When the children were later tested in differing contexts, more than half remembered the referent of “chromium”. These findings have been extended; for example, Markson and Bloom (1997) showed that they are not specific to the remembering of novel words, but also hold for novel facts.
Fast mapping poses two problems for associationism. The first is that the learning of a new word did not develop slowly, as would be predicted by proponents of gradual learning. The second is that in order for the word learning to proceed, the mind must have been aided by additional principles not given by the environment. Some of these principles, such as Markman’s (1989) taxonomic, whole object, and mutual exclusivity constraints and Gleitman’s syntactic bootstrapping (Gleitman et al. 2005), imply that the mind does add structure to what is learned. Consequently, the associationist claim that learning is just mapping external contingencies without adding structure is imperiled.
9.3.2 Syntactic Category Learning
“Motherese”, the name of the type of language that infants generally hear, consists of simple sentences such as “Nora want a bottle?” and “Are you tired?”. These sentences almost always contain a noun and a verb. Yet the infant’s vocabulary massively over-represents nouns in the first 100 words or so, while massively under-representing verbs (never mind adjectives or adverbs, which almost never appear in the first 100 words infants produce; see, e.g., Goldin-Meadow, Seligman, and Gelman 1976). Even more surprising is that the over-representation of nouns relative to verbs holds even though the incidence of each word (that is, the token frequency) is higher for the verbs than for the nouns in the common set used by mothers (Snedeker and Gleitman 2004: 259, citing data from Sandhofer, Smith, and Luo 2000).
Moreover, children hear a preponderance of determiners (“the” and “a”) but don’t produce them (Bloom 2000). These facts are not specific to English, but hold cross-culturally (see, e.g., Caselli et al. 1995). The disparity between the variation of the syntactic categories infants receive as input and produce as output is troublesome to associationism, insofar as associationism is committed to the learned structures (and the behaviors that follow from them) merely patterning what is given in experience.
9.4 Against the Contiguity Analysis of Associationism
Contiguity has been a central part of associationist analyses since the British Empiricists. In the experimental literature, the problem of figuring out the parameters needed for acquiring an association due to the contiguity of its relata has sometimes been termed the problem of the “Window of Association” (e.g., Gallistel and King 2009). Every associationist theory has to specify the temporal window within which two properties must be instantiated in order for those properties to be associated.[32] A related problem for contiguity theorists is that if the domain generality of associative learning is desired, then the window needs to be homogeneous across content domains. The late 1960s saw persuasive attacks on domain generality, as well as on the necessity and sufficiency of the contiguity criterion in general.
9.4.1 Against the Necessity of Contiguity
Research on “taste aversions” and “bait-shyness” raised a variety of problems for contiguity in the associative learning tradition of classical conditioning. Garcia observed that a gustatory stimulus (e.g., drinking water or eating a hot dog) but not an audiovisual stimulus (a light and a sound) would naturally become associated with feeling nauseated. For instance, Garcia and Koelling (1966) paired an audiovisual stimulus (a light and a sound) with a gustatory stimulus (flavored water). The two stimuli were then paired with the rats receiving radiation, which made the rats feel nauseated. The rats associated the feeling of nausea with the water and not with the sound, even though the sound was contiguous with the water. Moreover, the delay between ingesting the gustatory stimulus and feeling nauseated could be quite long, with the feeling not coming on until 12 hours later (Roll and Smith 1972), and the organism needn’t even be conscious when the negative feeling arises (for a review, see Seligman 1970; Garcia et al. 1974). The temporal delay shows that the CS (the flavored water) needn’t be contiguous with the US (the feeling of nausea) in order for learning to occur, thus showing that contiguity isn’t necessary for associative learning.
Garcia’s work also laid bare the problems with the domain-general aspect of associationism. In the above study the rat was prepared to associate the nausea with the gustatory stimulus, but would not associate it with the audiovisual stimulus. However, if one changes the US from feeling nauseated to receiving shocks in perfect contiguity with the audiovisual and gustatory stimuli, then the rats will associate the shocks with the audiovisual stimulus but not with the gustatory stimulus. That is, rats are prepared to associate audiovisual stimuli with the shock but are contraprepared to associate the shocks with the gustatory stimulus. Thus, learning does not seem to be entirely domain-general (for similar content specificity effects in humans, see Baeyens et al. 1990).[33]
Lastly, “the Garcia effect” has also been used to show problems with the learning curve (see section 9.1). “Taste aversions” are phenomena whereby an organism gets sick from ingesting a stimulus and the taste (or odor; Garcia et al. 1974) of that stimulus gets associated with the feeling of sickness. As anyone who has had food poisoning can attest, this learning can proceed in a one-shot fashion, and needn’t have a gradual rise over many trials (taste aversions have also been observed in humans; see, e.g., Bernstein and Webster 1980; Bernstein 1985; Logue et al. 1981; Rozin 1986).
9.4.2 Against the Sufficiency of Contiguity
Kamin’s famous blocking experiments (1969) showed that not all contiguous structures lead to classical conditioning. A rat that has already learned that CS1 predicts a US will not learn that a subsequent CS2 predicts the US if the CS2 is always paired with the CS1. Suppose that a rat has learned that a light predicts a shock because of the constant contiguity of the light and shock. After the rat has learned this, a sound is introduced which only arises in conjunction with the light and the shock. As long as the rat had previously learned that the light predicts the shock, it will not learn that the sound does (as can be seen on later trials that have the sound alone). In sum, having learned that the CS1 predicts the US blocks the organism from learning that the CS2 predicts the US.[34] So even though CS2 is perfectly contiguous with the US, the association between CS2 and the US remains unlearned, thus serving as a counterexample to the sufficiency of contiguity.[35]
Similarly, Rescorla (1968) demonstrated that a CS can appear only when the US appears and yet still have the association between them be unlearnable. If a tone is arranged to sound only when there are shocks, but there are still shocks when there are no tones (that is, the CS only appears with the US, but the US sometimes appears without the CS), no associative learning between the CS and the US will occur. Instead, subjects (in Rescorla 1968, rats) will only learn a connection between the shock and the experimental situation, e.g., the room in which the experiment is carried out.
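The role of contingency, as opposed to mere contiguity, can be made concrete with a small calculation. The sketch below uses the ΔP statistic, the difference between the probability of the US given the CS and the probability of the US in the absence of the CS. This measure is an assumption of the illustration (a common way of summarizing contingency, not something the text above commits to), and the function name and trial counts are hypothetical. When shocks are equally likely with and without the tone, ΔP is zero even though tone and shock are often contiguous.

```python
def delta_p(us_with_cs, cs_trials, us_without_cs, no_cs_trials):
    """Contingency as P(US | CS) - P(US | no CS); a hypothetical helper."""
    p_us_given_cs = us_with_cs / cs_trials
    p_us_given_no_cs = us_without_cs / no_cs_trials
    return p_us_given_cs - p_us_given_no_cs


# Made-up trial counts, for illustration only.
# In both conditions the shock accompanies the tone on 4 of 10 tone periods.
contingent = delta_p(us_with_cs=4, cs_trials=10, us_without_cs=0, no_cs_trials=10)
non_contingent = delta_p(us_with_cs=4, cs_trials=10, us_without_cs=4, no_cs_trials=10)

print(contingent, non_contingent)
```

The number of tone-shock pairings is the same in both conditions; only the rate of shocks without the tone differs, which is the kind of base-rate information a pure contiguity criterion ignores.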
In large part because of the problems discussed in 9.4, many classical conditioning theorists gave up the traditional program. Some, like Garcia, appeared to give up the classical theoretical framework altogether (Garcia et al. 1974); others, such as Rescorla and Wagner, tried to usher the framework into the modern era (see Rescorla and Wagner 1972; Rescorla 1988), where conditioning is seen as sensitive to base rates and driven by informational pick-up.[36] Whether this movement is interpreted as a substantive revision of classical conditioning (Rescorla 1988; Heyes 2012) or a wholesale abandoning of it (Gallistel and King 2009) is debatable.
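The Rescorla and Wagner (1972) proposal can be sketched in a few lines. On the standard statement of the model, the change in a cue’s associative strength on a trial is ΔV = αβ(λ − ΣV), where ΣV sums the strengths of all cues present on that trial. The code below is a minimal illustration under stated assumptions: α and β are folded into a single learning rate, λ is 1 when the shock occurs and 0 otherwise, and the function name, cue labels, and trial counts are hypothetical. It shows how an error-driven rule of this kind yields Kamin-style blocking despite perfect contiguity between the sound and the shock.

```python
def rescorla_wagner(trials, learning_rate=0.3, lam=1.0):
    """Return associative strengths after a sequence of trials.

    trials: list of (cues_present, us_present) pairs.
    learning_rate stands in for the product alpha * beta (an assumption).
    """
    strengths = {}
    for cues, us_present in trials:
        # Prediction is the summed strength of every cue present on the trial.
        prediction = sum(strengths.get(cue, 0.0) for cue in cues)
        error = (lam if us_present else 0.0) - prediction
        # Each present cue is updated by the same shared prediction error.
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + learning_rate * error
    return strengths


# Hypothetical schedules: 50 light-shock trials, then 50 light+sound-shock trials.
phase1 = [(("light",), True)] * 50
phase2 = [(("light", "sound"), True)] * 50

blocked = rescorla_wagner(phase1 + phase2)  # light pretrained, then compound
control = rescorla_wagner(phase2)           # compound only, no pretraining

print(round(blocked["sound"], 3), round(control["sound"], 3))
```

In the pretrained condition the light already predicts the shock, so the prediction error during compound trials is close to zero and the sound gains almost no strength. In the control condition the two cues share the available strength.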
9.5 Coextensionality
The Rescorla experiment also demonstrates another problem in associative theorizing: the question of why some property is singled out as a CS as opposed to different, equally contemporaneously instantiated properties. Put a different way, one needs a principle to say what the “same situation” amounts to in generalizations such as Thorndike’s laws. For instance, if a CS and a US, say a tone and a shock, are perfectly paired so that they are either both present or both absent, the organism won’t associate the location in which it received shocks (e.g., the experimental setting) with getting shocked; it will just associate the tone with the shocks. But in the condition where the US occurs without the CS, but the CS does not occur without the US, the organism will gain an association between the shocks and the location. However, in both cases the location is present on every trial.[37] In contrast to shocks, x-ray radiation, when used as a US, never appears to become associated with location, even if the two are always perfectly paired (Garcia et al. 1972).[38]
The problem of saying which properties become associated when multiple properties are coinstantiated sometimes goes by the name the “Credit Assignment Problem” (see, e.g., Gallistel and King 2009).[39] Some would argue that this problem is a symptom of a larger issue: trying to use extensional criteria to specify intentional content (see, e.g., Fodor 2003). Associationists need a criterion to specify which of the coextensive properties will in fact be learned, and which not.
An additional worry stems from the observation that sometimes the lack of a property being instantiated is an integral component of what is learned. To deal with the problem of missing properties, contemporary associationists have introduced an important element to the theory: inhibition. For example, if a US and a CS only appear when the other is absent, the organism will learn a negative relationship holds between them; that is, the organism will learn that the absence of the CS predicts the US.[40] Here the CS becomes a “conditioned inhibitor” of the US. Inhibition, using associations as modulators and not just activators, is a central part of current associationist thinking. For example, in connectionist networks, inhibition is implemented by the activation of certain nodes inhibiting the activation of other nodes. Connection weights can be positive or negative, with the negative weight standing in for the inhibitory strength of the association.
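The role of negative weights can be illustrated with a single unit. The sketch below is an assumption-laden toy, not a model from the literature cited here: the logistic activation function, the weight values, and the function name are all chosen for illustration. One input has a positive (excitatory) weight and the other a negative (inhibitory) weight, so activating the second input lowers the unit’s activation.

```python
import math


def unit_activation(inputs, weights, bias=0.0):
    """Logistic activation of one unit given input activations and weights."""
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net_input))


# Hypothetical weights: the first connection excites, the second inhibits.
weights = [2.0, -3.0]

excited = unit_activation([1.0, 0.0], weights)    # only the excitatory input is on
inhibited = unit_activation([1.0, 1.0], weights)  # the inhibitory input is on too

print(round(excited, 3), round(inhibited, 3))
```

With these values the net input drops from 2.0 to -1.0 when the inhibitory input is active, taking the unit from above its resting level of 0.5 to below it.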
Bibliography
- Anderson, J., K. Spoehr, and D. Bennett, 1994, “A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network”, in Neural Networks for Knowledge Representation and Inference, D. Levine and M. Aparicio IV (eds.), East Sussex: Psychology Press, pp. 311–335.
- Armstrong, K., S. Kose, L. Williams, A. Woolard, and S. Heckers, 2012, “Impaired Associative Inference in Patients with Schizophrenia”, Schizophrenia Bulletin, 38(3): 622–629.
- Asch, S., 1962, “A Problem in the Theory of Associations”, Psychologische Beitrage, (6): 553–563.
- –––, 1969, “A Reformulation of the Problem of Association”, American Psychologist, 24(2): 92–102.
- Aydede, M., 1997, “Language of Thought: The Connectionist Contribution”, Minds and Machines, 7(1): 57–101.
- Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez, 1990, “Flavor-Flavor and Color-Flavor Conditioning in Humans”, Learning and Motivation, 21(4): 434–455.
- Baeyens, F., P. Eelen, and G. Crombez, 1995, “Pavlovian Associations are Forever: On Classical Conditioning and Extinction”, Journal of Psychophysiology, 9(2): 127–141.
- Bar-Anan, Y., B. Nosek, and M. Vianello, 2009, “The Sorting Paired Features Task: A Measure of Association Strengths”, Experimental Psychology, 56(5): 329–343.
- Bates, E. and B. MacWhinney, 1987, “Competition, Variation, and Language Learning”, in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 157–193.
- Bendana, J. and E. Mandelbaum, forthcoming, “The Fragmentation of Belief”, in D. Kindermann, C. Borgoni, and A. Onofri (eds.), The Fragmented Mind, Oxford: Oxford University Press.
- Berger, J., 2020, “Implicit attitudes and awareness”, Synthese, 197(3): 1291–1312.
- Bernstein, I. and M. Webster, 1980, “Learned Taste Aversions in Humans”, Physiology and Behavior, 25(3): 363–366.
- Bernstein, I., 1985, “Learned Food Aversions in the Progression of Cancer and its Treatment”, in N. Braveman and P. Bronstein (eds.), Experimental Assessments and Clinical Applications of Conditioned Food Aversions, New York: New York Academy of Sciences, pp. 365–80.
- Black, W. and W. Prokasy (eds.), 1972, Classical Conditioning II: Current Research and Theory, New York: Appleton-Century-Crofts.
- Bloom, P., 2000, How Children Learn the Meanings of Words, Cambridge, MA: MIT Press.
- Bouton, M., 2002, “Context, Ambiguity, and Unlearning: Sources of Relapse after Behavioral Extinction”, Biological Psychiatry, 52(10): 976–986.
- –––, 2004, “Context and Behavioral Processes in Extinction”, Learning and Memory, 11(5): 485–494.
- Brett, L., W. Hankins, and J. Garcia, 1976, “Prey-Lithium Aversions. III: Buteo hawks”, Behavioral Biology, 17(1): 87–98.
- Camp, L., 2007, “Thinking with Maps”, Philosophical Perspectives, 21(1): 145–182.
- Carey, S., 1978a, “Less May Never Mean More”, in R. Campbell and P. Smith (eds.), Recent Advances in the Psychology of Language, New York: Plenum Press, pp. 109–132.
- –––, 1978b, “The Child as Word Learner”, in J. Bresnan, G. Miller, and M. Halle (eds.), Linguistic Theory and Psychological Reality, Cambridge, MA: MIT Press, pp. 264–293.
- –––, 2010, “Beyond Fast Mapping”, Language Learning and Development, 6(3): 184–205.
- Carey, S. and E. Bartlett, 1978, “Acquiring a Single New Word”, Proceedings of the Stanford Child Language Conference, 15: 17–29.
- Caselli, M.C., E. Bates, P. Casadio, J. Fenson, L. Fenson, L. Sanderl, and J. Weir, 1995, “A Cross-linguistic Study of Early Lexical Development”, Cognitive Development, 10(2): 159–199.
- Chaiken, S. and Y. Trope (eds.), 1999, Dual-Process Theories in Social Psychology, New York: Guilford Press.
- Chalmers, D., 1993, “Connectionism and Compositionality: Why Fodor and Pylyshyn Were Wrong”, Philosophical Psychology, 6(3): 305–319.
- Chater, N., 2009, “Rational Models of Conditioning”, Behavioral and Brain Sciences, 32(2): 204–205.
- –––, J. Tenenbaum, and A. Yuille, 2006, “Probabilistic Models of Cognition: Conceptual Foundations”, Trends in Cognitive Sciences, 10(7): 287–291.
- Chomsky, N., 1959, “A Review of B.F. Skinner’s Verbal Behavior”, Language, 35(1): 26–58.
- Churchland, P., 1986, “Some Reductive Strategies in Cognitive Neurobiology”, Mind, 95(379): 279–309.
- –––, 1989, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, Cambridge, MA: MIT.
- Churchland, P. and T. Sejnowski, 1990, “Neural Representation and Neural Computation”, Philosophical Perspectives, 4: 343–382.
- Collins, A. and E. Loftus, 1975, “A Spreading-Activation Theory of Semantic Processing”, Psychological Review, 82(6): 407–428.
- Danks, D., 2013, “Moving from Levels and Reduction to Dimensions and Constraints”, Proceedings of the 35th Annual Conference of the Cognitive Science Society, 35: 2124–2129.
- De Houwer, J., 2009, “The Propositional Approach to Associative Learning as an Alternative for Association Formation Models”, Learning & Behavior, 37(1): 1–20.
- –––, 2011, “Evaluative Conditioning: A Review of Procedure Knowledge and Mental Process Theories”, in T. Schachtman and S. Reilly (eds.), Associative Learning and Conditioning Theory: Human and Non-Human Applications, New York: Oxford University Press, pp. 399–416.
- –––, 2014, “A Propositional Model of Implicit Evaluation”, Social and Personality Psychology Compass, 8(7): 342–353.
- –––, 2018, “Propositional Models of Evaluative Conditioning”, Social Psychological Bulletin, 13(2): 1–21.
- –––, 2019, “Moving Beyond System 1 and System 2: Conditioning, Implicit Evaluation, and Habitual Responding Might Be Mediated by Relational Knowledge”, Experimental Psychology, 66(4): 257–265.
- De Houwer, J., S. Thomas, and F. Baeyens, 2001, “Association Learning of Likes and Dislikes: A Review of 25 years of Research on Human Evaluative Conditioning”, Psychological Bulletin, 127(6): 853–869.
- Dehaene, S., 2011, The Number Sense: How the Mind Creates Mathematics, Oxford: Oxford University Press.
- Diaz, E., G. Ruis, and F. Baeyens, 2005, “Resistance to Extinction of Human Evaluative Conditioning Using a Between-Subjects Design”, Cognition and Emotion, 19(2): 245–268.
- Dickinson, A., D. Shanks, and J. Evenden, 1984, “Judgment of Act-Outcome Contingency: The role of Selective Attribution”, The Quarterly Journal of Experimental Psychology, 36(1): 29–50.
- Dirikx, T., D. Hermans, D. Vansteenwegen, F. Baeyens, and P. Eelen, 2004, “Reinstatement of Extinguished Conditioned Responses and Negative Stimulus Valence as a Pathway to Return of Fear in Humans”, Learning and Memory, 11: 549–54.
- Elman, J., 1991, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure”, Machine learning, 7(2–3): 195–225.
- Elman, J., E. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett, 1996, Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: MIT Press.
- Evans, G., 1982, The Varieties of Reference, J. McDowell (ed.), Oxford: Clarendon Press.
- Evans, J., and K. Frankish (eds.), 2009, In Two Minds: Dual Processes and Beyond, Oxford: Oxford University Press.
- Evans, J., and K. Stanovich, 2013, “Dual-Process Theories of Higher Cognition: Advancing the Debate”, Perspectives on Psychological Science, 8(3): 223–241.
- Fazio, R., 2007, “Attitudes as Object-Evaluation Associations of Varying Strength”, Social Cognition, 25(5): 603–637.
- Festinger, L. and J. Carlsmith, 1959, “Cognitive Consequences of Forced Compliance”, The Journal of Abnormal and Social Psychology, 58(2): 203–210.
- Field, A. and G. Davey, 1999, “Reevaluating Evaluative Conditioning: A Nonassociative Explanation of Conditioning Effects in the Visual Evaluative Conditioning Paradigm”, Journal of Experimental Psychology: Animal Behavior Processes, 25(2): 211–224.
- Fodor, J., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
- –––, 2001, The Mind Doesn’t Work that Way, Cambridge, MA: MIT Press.
- –––, 2003, Hume Variations, Oxford: Clarendon Press.
- Fodor, J., and B. McLaughlin, 1990, “Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work”, Cognition, 35(2): 183–204.
- Fodor, J., and Z. Pylyshyn, 1988, “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition, 28(1–2): 3–71.
- Frankish, K., 2009, “Systems and Levels: Dual-System Theories and the Personal-Subpersonal Distinction”, in Evans and Frankish 2009: pp. 89–107.
- Gagliano, M., V. Vyazovsky, A. Borbely, M. Grimonprez, and M. Depczynski, 2016, “Learning by Association in Plants”, Scientific Reports, 6(38427): 1–8.
- Gallistel, C., S. Fairhurst, and P. Balsam, 2004, “The Learning Curve: Implications of a Quantitative Analysis”, Proceedings of the National Academy of Sciences of the United States of America, 101(36): 13124–13131.
- Gallistel, C., and A. King, 2009, Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, West Sussex: Wiley Blackwell.
- Garcia, J., 1981, “Tilting at the Paper Mills of Academe”, American Psychologist, 36(2): 149–158.
- Garcia, J., R. Kovner, and K. Green, 1970, “Cue Properties vs Palatability of Flavors in Avoidance Learning”, Psychonomic Science, 20(5): 313–314.
- Garcia, J., B. McGowan, and K. Green, 1972, “Biological Constraints on Conditioning II”, in Black and Prokasy 1972: pp. 3–27.
- Garcia, J., W. Hankins, and K. Rusiniak, 1974, “Behavioral Regulation of the Milieu Interne in Man and Rat”, Science, 185(4154): 824–831.
- Garcia, J. and R.A. Koelling, 1966, “Relationship of Cue to Consequence in Avoidance Learning”, Psychonomic Science, 4: 123–124.
- Gendler, T., 2008, “Alief and Belief”, Journal of Philosophy, 105(10): 634–63.
- Gleitman, L., K. Cassidy, R. Nappa, A. Papafragou, and J. Trueswell, 2005, “Hard Words”, Language Learning and Development, 1(1): 23–64.
- Glosser, G. and R. Friedman, 1991, “Lexical but not Semantic Priming in Alzheimer’s Disease”, Psychology and Aging, 6(4): 522–27.
- Goldin-Meadow, S., M. Seligman, and S. Gelman, 1976, “Language in the Two-Year Old”, Cognition, 4(2): 189–202.
- Greenwald, A., D. McGhee, and J. Schwartz, 1998, “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test”, Journal of Personality and Social Psychology, 74(6): 1464–1480.
- Hahn, A., C. Judd, H. Hirsch, and I. Blair, 2014, “Awareness of Implicit Attitudes”, Journal of Experimental Psychology: General, 143(3): 1369–1392.
- Heyes, C., 2012, “Simple Minds: A Qualified Defence of Associative Learning”, Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603): 2695–2703.
- Hughes, S., Y. Ye, P. Van Dessel, and J. De Houwer, 2019, “When people co-occur with good or bad events: Graded effects of relational qualifiers on evaluative conditioning”, Personality and Social Psychology Bulletin, 45(2): 196–208.
- Hull, C., 1943, Principles of Behavior, New York: Appleton-Century-Crofts.
- Hume, D., 1738, A Treatise of Human Nature, L.A. Selby-Bigge (ed.), 2nd ed., revised by P.H. Nidditch, Oxford: Clarendon Press, 1975.
- James, W., 1890, The Principles of Psychology (Vol. 1), New York: Holt.
- Johnson, K., 2004, “On the Systematicity of Language and Thought”, Journal of Philosophy, 101(3): 111–139.
- Kahneman, D., 2011, Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
- Kamin, L., 1969, “Predictability, Surprise, Attention, and Conditioning”, in B. Campbell and R. Church (eds.), Punishment and Aversive Behavior, New York: Appleton-Century-Crofts, pp. 279–296.
- Kant, I., 1781/1787, Critique of Pure Reason, in P. Guyer and A. Wood (eds.), Critique of Pure Reason, New York: Cambridge University Press.
- Karmiloff-Smith, A., 1995, Beyond Modularity: A Developmental Perspective on Cognitive Science, Cambridge, MA: MIT Press/Bradford Books.
- Kruglanski, A., 2013, “Only One? The Default Interventionist Perspective as a Unimodel—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 242–247.
- Kurdi, B., and M. Banaji, 2017, “Repeated evaluative pairings and evaluative statements: How effectively do they shift implicit attitudes?”, Journal of Experimental Psychology: General, 146(2): 194–213.
- –––, 2019, “Attitude change via repeated evaluative pairings versus evaluative statements: Shared and unique features”, Journal of Personality and Social Psychology, 116(5): 681–703.
- Locke, J., 1690, An Essay Concerning Human Understanding, in Peter H. Nidditch (ed.), An Essay Concerning Human Understanding, Oxford: Clarendon Press, 1975.
- Logue, A., I. Ophir, and K. Strauss, 1981, “The Acquisition of Taste Aversion in Humans”, Behaviour Research and Therapy, 19(4): 319–33.
- Luka, B., and L. Barsalou, 2005, “Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension”, Journal of Memory and Language, 52: 444–467.
- Lycan, W., 1990, “The Continuity of the Levels of Nature”, in W. Lycan (ed.), Mind and Cognition: A Reader, Cambridge: Basil Blackwell, pp. 77–96.
- Madva, A., and M. Brownstein, 2018, “Stereotypes, Prejudice, and the Taxonomy of the Implicit Social Mind”, Nous, 52(3): 611–644.
- Mandelbaum, E., 2013a, “Against Alief”, Philosophical Studies, 165(1): 197–211.
- –––, 2013b, “Numerical Architecture”, Topics in Cognitive Science, 5(2): 367–386.
- –––, 2016, “Attitude, Inference, Association: On the Propositional Structure of Implicit Attitudes”, Nous, 50(3): 629–658.
- –––, 2017, “Seeing and Conceptualizing: Modularity and the Shallow Contents of Vision”, Philosophy and Phenomenological Research, 97(2): 267–283.
- –––, 2019, “Troubles with Bayesianism: An Introduction to the Psychological Immune System”, Mind & Language, 34(2): 141–157.
- Mann, T., and M. Ferguson, 2015, “Can we undo our first impressions? The role of reinterpretation in reversing implicit evaluations”, Journal of Personality and Social Psychology, 108(6): 823–849.
- –––, 2017, “Reversing implicit first impressions through reinterpretation after a two-day delay”, Journal of Experimental Social Psychology, 68: 122–127.
- Mann, T., B. Kurdi, and M. Banaji, 2019, “How effectively can implicit evaluations be updated? Using evaluative statements after aversive repeated evaluative pairings”, Journal of Experimental Psychology: General, doi:10.1037/xge0000701.
- Markman, E., 1989, Categorization and Naming in Children: Problems of Induction, Cambridge, MA: MIT Press.
- Markson, L. and P. Bloom, 1997, “Evidence Against a Dedicated System for Word Learning in Children”, Nature, 385(6619): 813–815.
- Marr, D., 1982, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, New York: W.H. Freeman and Co.
- Mason, M. and M. Bar, 2012, “The Effect of Mental Progression on Mood”, Journal of Experimental Psychology: General, 141(2): 217–221. doi:10.1037/a0025035
- McClelland, J., M. Botvinick, D. Noelle, D. Plaut, T. Rogers, M. Seidenberg, and L. Smith, 2010, “Letting Structure Emerge: Connectionist and Dynamic Systems Approaches to Cognition”, Trends in Cognitive Sciences, 14(8): 348–356.
- Minsky, M., 1963, “Steps toward Artificial Intelligence”, in E. Feigenbaum and J. Feldman (eds.), Computers And Thought, New York, NY: McGraw-Hill, pp. 406–450.
- Mitchell, C., J. De Houwer, and P. Lovibond, 2009, “The Propositional Nature of Human Associative Learning”, Behavioral and Brain Sciences, 32(2): 183–246.
- Nosek, B. and M. Banaji, 2001, “The Go/No-Go Association Task”, Social Cognition, 19(6): 625–66.
- Osman, M., 2013, “A Case Study Dual-Process Theories of Higher Cognition—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 248–252.
- Pavlov, I., 1906, “The Scientific Investigation of the Psychical Faculties or Processes in the Higher Animals”, Science, 24(620): 613–619.
- –––, 1927, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex, Oxford: Oxford University Press.
- Payne, B., C. Cheng, O. Govorun, and B. Stewart, 2005, “An Inkblot for Attitudes: Affect Misattribution as Implicit Measurement”, Journal of Personality and Social Psychology, 89(3): 277–293.
- Perea, M. and E. Rosa, 2002, “The Effects of Associative and Semantic Priming in the Lexical Decision Task”, Psychological Research, 66(3): 180–194.
- Prinz, J., 2002, Furnishing the Mind: Concepts and their Perceptual Basis, Cambridge, MA: MIT Press.
- ––– and A. Clark, 2004, “Putting Concepts to Work: Some Thoughts for the 21st Century”, Mind & Language, 19(1): 57–69.
- Quilty-Dunn, J., forthcoming, “Perceptual Pluralism”, Noûs, 1–41.
- Quilty-Dunn, J. and E. Mandelbaum, 2018, “Inferential Transitions”, Australasian Journal of Philosophy, 96(3): 532–547.
- –––, 2019, “Non-Inferential Transitions: Imagery and Association”, in T. Chan and A. Nes (eds.), Inference and Consciousness, New York: Routledge, pp. 151–171.
- Rescorla, R., 1968, “Probability of Shock in the Presence and Absence of CS in Fear Conditioning”, Journal of Comparative and Physiological Psychology, 66(1): 1–5.
- –––, 1988, “Pavlovian Conditioning: It’s Not What You Think It Is”, American Psychologist, 43(3): 151–160.
- Rescorla, R., and A. Wagner, 1972, “A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement”, in Black and Prokasy 1972, pp. 64–99.
- Roll, D. and J. Smith, 1972, “Conditioned Taste Aversion in Anesthetized Rats”, in M. Seligman and J. Hager (eds.), Biological Boundaries of Learning, New York: Appleton-Century-Crofts, pp. 98–102.
- Rozin, P., 1986, “One-Trial Acquired Likes and Dislikes in Humans: Disgust as a US, Food Predominance, and Negative Learning Predominance”, Learning and Motivation, 17(2): 180–189.
- Rumelhart, D., P. Smolensky, J. McClelland, and G. Hinton, 1986, “Sequential Thought Processes in PDP Models”, in J. McClelland and D. Rumelhart (eds.), Parallel Distributed Processing Vol. 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models, Cambridge, MA: MIT Press, pp. 7–57.
- Rusiniak, K., W. Hankins, J. Garcia, and L. Brett, 1979, “Flavor-illness Aversions: Potentiation of Odor by Taste in Rats”, Behavioral and Neural Biology, 25(1): 1–17.
- Rydell, R. and A. McConnell, 2006, “Understanding Implicit and Explicit Attitude Change: A Systems of Reasoning Analysis”, Journal of Personality and Social Psychology, 91(6): 995–1008.
- Sandhofer, C., L. Smith, and J. Luo, 2000, “Counting Nouns and Verbs in the Input: Differential Frequencies, Different Kinds of Learning?”, Journal of Child Language, 27(3): 561–585.
- Seligman, M., 1970, “On the Generality of the Laws of Learning”, Psychological Review, 77(5): 406–418.
- Shanks, D., 2010, “Learning: From Association to Cognition”, Annual Review of Psychology, 61: 273–301.
- Skinner, B., 1938, The Behavior of Organisms: An Experimental Analysis, Oxford: Appleton-Century.
- –––, 1953, Science and Human Behavior, New York: Simon and Schuster.
- Sloman, S., 1996, “The Empirical Case for Two Systems of Reasoning”, Psychological Bulletin, 119(1): 3–22.
- Smith, E. R. and J. DeCoster, 2000, “Dual-Process Models in Social and Cognitive Psychology: Conceptual Integration and Links to Underlying Memory Systems”, Personality and Social Psychology Review, 4(2): 108–131.
- Smith, J. and D. Roll, 1967, “Trace Conditioning with X-rays as an Aversive Stimulus”, Psychonomic Science, 9(1): 11–12.
- Smolensky, P., 1988, “On the Proper Treatment of Connectionism”, Behavioral and Brain Sciences, 11(1): 1–23.
- Snedeker, J. and L. Gleitman, 2004, “Why it is Hard to Label Our Concepts”, in D. Hall and S. Waxman (eds.), Weaving a Lexicon, Cambridge, MA: MIT Press, pp. 257–294.
- Stanovich, K., 2011, Rationality and the Reflective Mind, New York: Oxford University Press.
- Tenenbaum, J., C. Kemp, T. Griffiths, and N. Goodman, 2011, “How to Grow a Mind: Statistics, Structure, and Abstraction”, Science, 331(6022): 1279–1285.
- Thorndike, E., 1911, Animal Intelligence: Experimental Studies, New York: Macmillan.
- Todrank, J., D. Byrnes, A. Wrzesniewski, and P. Rozin, 1995, “Odors can Change Preferences for People in Photographs: A Cross-Modal Evaluative Conditioning Study with Olfactory USs and Visual CSs”, Learning and Motivation, 26(2): 116–140.
- Tolman, E., 1948, “Cognitive Maps in Rats and Men”, Psychological Review, 55(4): 189–208.
- Van Dessel, P., Y. Ye, and J. De Houwer, 2019, “Changing deep-rooted implicit evaluation in the blink of an eye: negative verbal information shifts automatic liking of Gandhi”, Social Psychological and Personality Science, 10(2): 266–273.
- Van Gelder, T., 1995, “What Might Cognition Be, If not Computation?”, The Journal of Philosophy, 92(7): 345–381.
- Vansteenwegen, D., G. Francken, B. Vervliet, A. De Clercq, and P. Eelen, 2006, “Resistance to Extinction in Evaluative Conditioning”, Journal of Experimental Psychology: Animal Behavior Processes, 32(1): 71–79.
- Wilson, T., S. Lindsey, and T. Schooler, 2000, “A Model of Dual Attitudes”, Psychological Review, 107(1): 101–126.
via:
- Connectionism
- Associationist Theories of Thought