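Classic Paper | Lotka’s Law: The Frequency Distribution of Scientific Productivity

The Frequency Distribution of Scientific Productivity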

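The Frequency Distribution of Scientific Productivity is a classic paper published by Alfred J. Lotka in 1926. It was the first to show how unevenly productivity is distributed among scientists, and it gave the quantitative description now known as Lotka’s law.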

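In the paper, Lotka analyzed the decennial index of Chemical Abstracts (1907-1916) and the name index of Auerbach’s Geschichtstafeln der Physik. He found that a small minority of scientists produce a large number of papers or results, while most produce only one or a few. The distribution follows a power law: the number of scientists who make $n$ contributions falls off roughly as $\dfrac{1}{n^2}$.

Specifically, Lotka’s law states that the number of scientists making two contributions is about $\dfrac{1}{4}$ of the number making one, the number making three contributions is about $\dfrac{1}{9}$, and so on.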

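The paper’s main contribution is that it revealed the uneven distribution of scientific productivity and proposed a quantitative mathematical model for it. This matters for understanding research activity, the composition of research teams, and the distribution of scientific output, and the paper became one of the foundations of scientometrics and research evaluation.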

  • Source: Journal of the Washington Academy of Sciences, Vol. 16, No. 12 (June 19, 1926), pp. 317-323
  • Original article: https://www.jstor.org/stable/24529203

Main text

It would be of interest to determine, if possible, the part which men of different calibre contribute to the progress of science.

Considering first simple volume of production, a count was made of the number of names, in the decennial index of Chemical Abstracts 1907-1916, against which appeared 1, 2, 3, ... entries. Names of firms (e.g. Aktiengesellschaft, etc.) were omitted from reckoning, since they represent the output, not of a single individual, but of an unknown number of persons. The letters A and B of the alphabet only were covered. These were treated both separately and in the aggregate, with the results shown in the table and in figures 1 and 2 below.

A similar process was also applied to the name index of Auerbach’s Geschichtstafeln der Physik (J. A. Barth, Leipzig, 1910), which covers the entire range of history up to and including the year 1900. In this case we obtain a measure not merely of volume of productivity, but account is taken, in some degree, also of quality, since only the outstanding contributions find a place in this little volume, with its 110 pages of tabular text. The figures and relations thus obtained are shown in the table and in figures 1 and 2.

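Table 1. Frequency distribution of scientific productivity

n is the number of contributions. The count columns give the number of persons making n contributions; the % columns give the same counts as a per cent of the column total. CA = Chemical Abstracts (letters A and B); Auerbach = entire alphabet. Empty cells are blank in the source.

| n | CA: A | CA: B | CA: A+B | Auerbach | % CA: A | % CA: B | % CA: A+B (obs.) | % CA: A+B (calc.) [1] | % Auerbach (obs.) | % Auerbach (calc.) [2] |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 1543 | 5348 | 6891 | 1325 | | | | | | |
| 1 | 890 | 3101 | 3991 | 784 | 57.68 | 57.98 | 57.92 | 56.69 | 59.17 | 60.79 |
| 2 | 230 | 829 | 1059 | 204 | 14.91 | 15.50 | 15.37 | 15.32 | 15.40 | 15.20 |
| 3 | 111 | 382 | 493 | 127 | 7.19 | 7.14 | 7.15 | 7.12 | 9.58 | 6.75 |
| 4 | 58 | 229 | 287 | 50 | 3.76 | 4.28 | 4.16 | 4.14 | 3.77 | 3.80 |
| 5 | 41 | 143 | 184 | 33 | 2.66 | 2.67 | 2.67 | 2.72 | 2.49 | 2.43 |
| 6 | 42 | 89 | 131 | 28 | 2.72 | 1.66 | 1.90 | 1.92 | 2.11 | 1.69 |
| 7 | 20 | 93 | 113 | 19 | 1.30 | 1.74 | 1.64 | 1.44 | 1.43 | 1.24 |
| 8 | 24 | 61 | 85 | 19 | 1.56 | 1.14 | 1.23 | 1.12 | 1.43 | 0.95 |
| 9 | 21 | 43 | 64 | 6 | 1.36 | 0.80 | 0.93 | 0.90 | 0.45 | 0.75 |
| 10 | 15 | 50 | 65 | 7 | 0.97 | 0.93 | 0.94 | 0.73 | 0.53 | 0.61 |
| 11 | 9 | 32 | 41 | 6 | 0.58 | 0.60 | 0.59 | 0.61 | 0.45 | 0.50 |
| 12 | 11 | 36 | 47 | 7 | 0.71 | 0.67 | 0.68 | 0.52 | 0.53 | 0.42 |
| 13 | 6 | 26 | 32 | 4 | 0.39 | 0.49 | 0.46 | 0.45 | 0.30 | 0.36 |
| 14 | 7 | 21 | 28 | 4 | 0.45 | 0.39 | 0.41 | 0.39 | 0.30 | 0.31 |
| 15 | 3 | 18 | 21 | 5 | 0.19 | 0.34 | 0.30 | 0.34 | 0.38 | 0.27 |
| 16 | 4 | 20 | 24 | 3 | 0.26 | 0.37 | 0.35 | 0.30 | 0.23 | 0.24 |
| 17 | 4 | 14 | 18 | 3 | 0.26 | 0.26 | 0.26 | 0.27 | 0.23 | 0.21 |
| 18 | 5 | 14 | 19 | 1 | 0.32 | 0.26 | 0.28 | 0.24 | | |
| 19 | 3 | 14 | 17 | 0 | 0.19 | 0.26 | 0.25 | 0.22 | | |
| 20 | 6 | 8 | 14 | 0 | 0.39 | 0.15 | 0.20 | 0.20 | | |
| 21 | 0 | 9 | 9 | 1 | | 0.17 | 0.13 | 0.18 | | |
| 22 | 2 | 9 | 11 | 3 | 0.13 | 0.17 | 0.16 | 0.17 | | |
| 23 | 4 | 4 | 8 | 0 | 0.26 | 0.07 | 0.12 | 0.15 | | |
| 24 | 4 | 4 | 8 | 3 | 0.26 | 0.07 | 0.12 | 0.14 | | |
| 25 | 0 | 9 | 9 | 2 | | 0.17 | 0.13 | 0.13 | | |
| 26 | 3 | 6 | 9 | 0 | 0.19 | 0.11 | 0.13 | 0.12 | | |
| 27 | 1 | 7 | 8 | 1 | 0.06 | 0.13 | 0.12 | 0.11 | | |
| 28 | 2 | 8 | 10 | 0 | 0.13 | 0.15 | 0.15 | 0.11 | | |
| 29 | 2 | 6 | 8 | 0 | 0.13 | 0.11 | 0.12 | 0.10 | | |
| 30 | 2 | 5 | 7 | 1 | 0.13 | 0.09 | 0.10 | 0.09 | | |
| 31 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 32 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 33 | 3 | 3 | 6 | 0 | 0.19 | 0.06 | 0.09 | | | |
| 34 | 1 | 3 | 4 | 1 | 0.06 | 0.06 | 0.06 | | | |
| 35 | 0 | 0 | 0 | 0 | | | | | | |
| 36 | 0 | 1 | 1 | 0 | | 0.02 | 0.01 | | | |
| 37 | 0 | 1 | 1 | 1 | | 0.02 | 0.01 | | | |
| 38 | 1 | 3 | 4 | 0 | 0.06 | 0.06 | 0.06 | | | |
| 39 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 40 | 1 | 1 | 2 | 0 | 0.06 | 0.02 | 0.03 | | | |
| 41 | 0 | 1 | 1 | 0 | | 0.02 | 0.01 | | | |
| 42 | 0 | 2 | 2 | 0 | | 0.04 | 0.03 | | | |
| 43 | 0 | 0 | 0 | 0 | | | | | | |
| 44 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 45 | 0 | 4 | 4 | 0 | | 0.07 | 0.06 | | | |
| 46 | 1 | 1 | 2 | 0 | 0.06 | 0.02 | 0.03 | | | |
| 47 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 48 | 0 | 0 | 0 | 2 | | | | | | |
| 49 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 50 | 1 | 1 | 2 | | 0.06 | 0.02 | 0.03 | | | |
| 51 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 52 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 53 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 54 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 55 | 2 | 1 | 3 | | 0.13 | 0.02 | 0.04 | | | |
| 56 | 0 | 0 | 0 | | | | | | | |
| 57 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 58 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 59-60 | 0 | 0 | 0 | | | | | | | |
| 61 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 62-65 | 0 | 0 | 0 | | | | | | | |
| 66 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 67 | 0 | 0 | 0 | | | | | | | |
| 68 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 69-72 | 0 | 0 | 0 | | | | | | | |
| 73 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 74-77 | 0 | 0 | 0 | | | | | | | |
| 78 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 79 | 0 | 0 | 0 | | | | | | | |
| 80 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |
| 81-83 | 0 | 0 | 0 | | | | | | | |
| 84 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 85-94 | 0 | 0 | 0 | | | | | | | |
| 95 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 96-106 | 0 | 0 | 0 | | | | | | | |
| 107 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |
| 108 | 0 | 0 | 0 | | | | | | | |
| 109 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 110-113 | 0 | 0 | 0 | | | | | | | |
| 114 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 115-345 | 0 | 0 | 0 | | | | | | | |
| 346 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |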

[1]: Computed from $f = \dfrac{56.69}{n^{1.888}}$

[2]: Computed from $f = \dfrac{600}{\pi^2 n^2}$
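The two "calculated" columns of Table 1 can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the constant 56.69 is taken from the calculated value that Table 1 lists for n = 1, which is an assumption about the intended fit constant.

```python
import math


def chem_abstracts_calc(n: int) -> float:
    """Note [1]: fitted power law for Chemical Abstracts, letters A+B (per cent).

    Assumption: the constant 56.69 is the calculated value shown for n = 1.
    """
    return 56.69 / n ** 1.888


def auerbach_calc(n: int) -> float:
    """Note [2]: inverse square law, f = 600 / (pi^2 * n^2) (per cent)."""
    return 600 / (math.pi ** 2 * n ** 2)


if __name__ == "__main__":
    # Print the first few rows of both calculated columns.
    for n in range(1, 6):
        print(n, f"{chem_abstracts_calc(n):.2f}", f"{auerbach_calc(n):.2f}")
```

The printed values can be compared row by row with the two calculated columns of the table.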

On plotting the frequencies of persons having made 1, 2, 3, ... contributions, against these numbers 1, 2, 3, ... of contributions, both variables on a logarithmic scale, it is found that in each case the points are rather closely scattered about an essentially straight line having a slope of approximately two to one. The approach to this ratio is particularly close in the case of the data taken from Auerbach’s tables. Determined by least squares, the slope of the curve to Auerbach’s data, as determined from the first 17 points,¹ was found to be $2.021 \pm 0.017$. Similarly, the slope for the data in the Chemical Abstracts, letters A and B jointly, as determined from the first thirty points, came out as $1.888 \pm 0.007$. The general formula for the relation thus found to exist between the frequency $y$ of persons making $x$ contributions is

$$
x^n y = \text{const} \tag{1}
$$
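The least-squares step can be sketched as follows. The counts are the first 17 Auerbach values from Table 1. The fit below is an ordinary unweighted regression of log(count) on log(n), which is an assumption, since the paper does not say whether any weighting was used. The names `AUERBACH_COUNTS` and `loglog_slope` are illustrative only.

```python
import math

# Auerbach counts for n = 1..17 contributions, copied from Table 1.
AUERBACH_COUNTS = [784, 204, 127, 50, 33, 28, 19, 19, 6, 7, 6, 7, 4, 4, 5, 3, 3]


def loglog_slope(counts):
    """Unweighted least-squares slope of log(count) against log(n), n = 1, 2, ..."""
    xs = [math.log(n) for n in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # The slope is negative; the exponent n in x^n * y = const is its magnitude.
    print(f"fitted exponent: {-loglog_slope(AUERBACH_COUNTS):.3f}")
```

With these 17 points the fitted exponent should land close to the 2.021 quoted in the text. The standard error is not computed here.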

For the special case that $n = 2$ (inverse square law of scientific productivity) the value of the constant in $(1)$ is found as follows:

$$
\begin{align*}
y_1 &= \frac{c}{1^2}\tag{2} \\
y_2 &= \frac{c}{2^2}\tag{3} \\
y_n &= \frac{c}{n^2}\tag{4}
\end{align*}
$$

$$
\begin{align*}
\sum_{1}^{\infty} y &= c\left(\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right)\tag{5} \\
&= c\sum_{1}^{\infty}\frac{1}{x^2}\tag{6} \\
&= c\,\frac{\pi^2}{6}\tag{7} \\
c &= \frac{6}{\pi^2}\sum_{1}^{\infty} y\tag{8}
\end{align*}
$$

But, since $y$ is a frequency, the summation $\sum_{1}^{\infty} y$ gives unity. Then finally

$$
\begin{align*}
c &= \frac{6}{\pi^2}\tag{9} \\
&= \frac{6}{9.87}\tag{10} \\
&= 0.6079 \text{ or } 60.79 \text{ per cent}\tag{11}
\end{align*}
$$
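As a quick numerical check of this derivation, the sketch below sums the series of inverse squares and compares it with π²/6, then evaluates the constant c. The function names are illustrative, not from the paper.

```python
import math


def partial_sum_inverse_squares(terms: int) -> float:
    """Sum of 1/x^2 for x = 1..terms; approaches pi^2 / 6 as terms grows."""
    return sum(1 / x ** 2 for x in range(1, terms + 1))


def inverse_square_constant() -> float:
    """c = 6 / pi^2, the share of contributors with exactly one contribution."""
    return 6 / math.pi ** 2


if __name__ == "__main__":
    print(f"partial sum (100000 terms): {partial_sum_inverse_squares(100_000):.5f}")
    print(f"pi^2 / 6:                   {math.pi ** 2 / 6:.5f}")
    print(f"c = 6 / pi^2:               {inverse_square_constant():.4f}")
```

The partial sum converges slowly (the remainder after N terms is roughly 1/N), so a large number of terms is used.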

Thus, according to the inverse square law, the proportion of all contributors who contribute a single item should be just over 60 per cent. In the cases here examined the actual proportion of this class to the whole was 59.2 per cent in Auerbach’s data (1325 contributors), 57.7 per cent in the Chemical Abstracts under initial A (1543 contributors), 57.98 under letter B (5348 contributors) and 57.9 under letters A and B jointly (6891 contributors).
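The observed proportions quoted here follow directly from the n = 1 row and the Total row of Table 1. A minimal sketch, with the dictionary and function names chosen only for illustration:

```python
# (persons with exactly one contribution, all persons), copied from the
# n = 1 row and the Total row of Table 1.
SAMPLES = {
    "Auerbach, entire alphabet": (784, 1325),
    "Chemical Abstracts, A": (890, 1543),
    "Chemical Abstracts, B": (3101, 5348),
    "Chemical Abstracts, A+B": (3991, 6891),
}


def single_share(single: int, total: int) -> float:
    """Per cent of all contributors who made exactly one contribution."""
    return 100 * single / total


if __name__ == "__main__":
    for name, (single, total) in SAMPLES.items():
        print(f"{name}: {single_share(single, total):.2f} per cent")
```

All four shares fall a little below the 60.79 per cent predicted by the inverse square law.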

Fig. 1. Frequency diagram showing per cent of authors mentioned once, twice, etc., in Auerbach’s Geschichtstafeln der Physik, entire alphabet, and in the decennial index of Chemical Abstracts 1907-1916, letters A and B. The dotted line indicates frequencies computed according to the inverse square law.

Fig. 2. Logarithmic frequency diagram showing number of authors mentioned once, twice, etc., in Auerbach’s tables (points indicated by crosses), and in Chemical Abstracts, letters A and B (points indicated by circles). The fully drawn line indicates points given by inverse square law, exponent = 2; the line of dashes corresponds to exponent 1.89.

Frequency distributions of the general type $(1)$ have a wide range of applicability to a variety of phenomena,² and the mere form of such a distribution throws little or no light on the underlying physical relations.³ The fact that the exponent has, in the examples shown, approximately the value 2 enables us to state the result in the following simple form:

In the cases examined it is found that the number of persons making 2 contributions is about one-fourth of those making one; the number making 3 contributions is about one-ninth, etc.; the number making $n$ contributions is about $\dfrac{1}{n^2}$ of those making one;⁴ and the proportion, of all contributors, that make a single contribution, is about 60 per cent.
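To see what this simple form implies in head counts, the sketch below computes the number of persons the inverse square law would predict for Auerbach’s 1325 contributors and prints it next to the observed counts from Table 1. The function name and the choice to show only n = 1 to 5 are illustrative assumptions.

```python
import math

AUERBACH_TOTAL = 1325
# Observed Auerbach counts for n = 1..5, copied from Table 1.
AUERBACH_OBSERVED = {1: 784, 2: 204, 3: 127, 4: 50, 5: 33}


def expected_persons(total: int, n: int) -> float:
    """Persons expected to make n contributions under the inverse square law."""
    return total * 6 / (math.pi ** 2 * n ** 2)


if __name__ == "__main__":
    for n, observed in AUERBACH_OBSERVED.items():
        expected = expected_persons(AUERBACH_TOTAL, n)
        print(f"n={n}: expected {expected:.1f}, observed {observed}")
```

By construction the expected count at n = 2 is exactly one-fourth of the count at n = 1, and at n = 3 exactly one-ninth.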

The fact that two such widely different sources as Chemical Abstracts (listing practically all current work in chemistry over a ten year period) and Auerbach’s tables (listing selected important contributions only, in physics, for all historical time) give very similar results, seems somewhat remarkable. It would be interesting to extend this study to such a work as Darmstaedter’s Handbuch der Geschichte der Naturwissenschaften und der Technik. Unfortunately the index of this work does not indicate multiple entries of the same year under one author’s name, but distinguishes only separately dated entries. It would therefore be necessary in each case to refer to the text. On the other hand the work could be abridged by restricting the inquiry to one or two letters of the alphabet, as was here done in the case of the Chemical Abstracts.


  1. Beyond this point fluctuations become excessive owing to the limited number of persons in the sample.

  2. Compare especially Corrado Gini, Biblioteca dell’Economista, ser. 5a, 20: Indici di concentrazione e di dipendenza. See also the Report of Commission of Housing and Regional Planning, State of New York, Jan. 11, 1926: 59-73; and Income in the United States, by W. I. King and others; 2: 344 et seq. 1922.

  3. J. C. Willis’ conclusions regarding the mechanism of evolution, inferred as they are from the occurrence of curves of this type in the relation between numbers of species and genera, seem for this reason to carry little conviction. See A. J. Lotka, Physical Biology: 311. 1925.

  4. Fortunately, however, there are somewhat more persons of very great productivity than would be expected under this simple law. The very high figures (e.g., Abderhalden, 346 contributions in ten years) should perhaps be considered separately, since they are not the product of one person unassisted. Joint contributions have in all cases been credited to the senior author only.

Reposted from blog.csdn.net/YuvalNoah/article/details/131797006