Google Scholar and the Spectra of the Scientists

Google Scholar can be used to construct a metric which can show the relative "merit" of scientists in their corresponding fields of research, based on the work they've done.

Assuming that an author's name is unique (which is not always the case), one can construct a characteristic publication number, or publication eigenvalue, for a given author, say "john doe", as follows:

The publication eigenvalue for this author can then be taken to be the number C(john doe), which has the continued fraction expansion:

To simplify the ordering which is present in the set {C(x):x\in author}, without loss of generality we can set a₀=1 and look instead at the number:

C(john doe)=[1;a₀',a₁',a₂',...,a_n',...], with a_{n-1}'=a_n, which maps the set {C(x):x\in author} into the interval (1,∞).

Note that in this case, sup_x{C(x):x\in author}=∞ and inf_x{C(x):x\in author}=1.

Adding a citation entry a>0 to an existing continued fraction expansion of C(x) can make C(x) either larger or smaller, depending on where a is added and on the number of citations at level n^[18]. Specifically:
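A minimal Python sketch of this effect, under the assumption that "adding" a citation means incrementing an existing term of the expansion. The sequence `base` and the helper names `cf_value` and `bump` are hypothetical and not part of the original construction. For positive terms, raising an odd-indexed term (a₁, a₃, ...) lowers the value, while raising an even-indexed term (a₂, a₄, ...) raises it.

```python
from fractions import Fraction

def cf_value(terms):
    """Exact value of the finite continued fraction [a0; a1, ..., an].

    Assumes every term after a0 is a positive integer.
    """
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

def bump(terms, index, amount=1):
    """Return a copy of terms with terms[index] increased by amount."""
    changed = list(terms)
    changed[index] += amount
    return changed

base = [1, 13, 10, 4]                 # hypothetical truncated sequence
c_base = cf_value(base)               # 578/537
c_more_a1 = cf_value(bump(base, 1))   # one more publication (a1: 13 -> 14)
c_more_a2 = cf_value(bump(base, 2))   # one more citation at level 2 (a2: 10 -> 11)

print(c_base, c_more_a1, c_more_a2)
print(c_more_a1 < c_base < c_more_a2)
```

In this sketch an extra publication moves C(x) toward 1, whereas an extra citation at level 2 moves it slightly away from 1.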

The main "weight" of the number C(x) will then be carried by the term a₁, which is the number of publications of author x and which provides a good approximation of C(x), as C(x)~C₂(x)=1+1/a₁, which is fairly reasonable.

The formal definition of C(x) is slightly more involved, mainly because one needs to define it uniquely. Here's then the formal definition:

It can now be seen that the definition above gives rise to a unique number C(x), as in the first definition for "john doe", above, because the suprema are taken over finite sets indexed by k,l,m,...,w.

The definition above gives rise to the metric: d(x,y)=|C(x)-C(y)|. Let's verify the metric's fundamental properties:
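A sketch of that verification, assuming C(x), C(y) and C(z) are all finite (that is, each author has at least one publication). The third author z is introduced here only for illustration.

```latex
\begin{align*}
d(x,y) &= |C(x)-C(y)| \ge 0, \qquad d(x,x) = 0,\\
d(x,y) &= |C(x)-C(y)| = |C(y)-C(x)| = d(y,x),\\
d(x,z) &= |C(x)-C(y)+C(y)-C(z)| \le |C(x)-C(y)| + |C(y)-C(z)| = d(x,y)+d(y,z).
\end{align*}
```

Note that d(x,y)=0 only says that C(x)=C(y), so d separates two authors only when their characteristic numbers differ.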

It is clear that a person with no publications will have a characteristic number equal to infinity, and that the more publications an author has, the closer C(x) is to 1. This gives rise to a tempered distribution, and one can then define the publication percentile P(x) of a scientist x in this distribution to be: P(x)=100/C(x).
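As a quick illustration, the percentile can be sketched in Python as follows. The function name `percentile` is hypothetical, and treating C(x)=∞ as a floating-point infinity is an assumption of this sketch.

```python
import math

def percentile(c):
    """Publication percentile P(x) = 100 / C(x), with P = 0 when C(x) is infinite."""
    if math.isinf(c):
        return 0.0
    return 100.0 / c

p_author = percentile(1.076349896)       # C(x) quoted for "This author"
p_one_paper = percentile(2.0)            # one publication, no citations
p_no_papers = percentile(float("inf"))   # no publications

print(p_author, p_one_paper, p_no_papers)
```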

Fixing t=now(18/11/2010) and omitting the term a₀=1, let's then see these numbers for some scientists:

x	a_n, n≥1^[1]	C(x)	P(x)(%)	Class^[2]	Non-repeating Block^[3]	Repeating Block^[4]
Albert Einstein	4470,6443,5293,2374,2344,770, 589,332,314,332,314	1.000223714	99.97763364	q.i.	7	2
Paul Erdos	3030,1310,6984,4953,1809,1619, 332,161,16,161,16	1.000330033	99.96700760	q.i.	7	2
Isaac Newton	2480,1777,1331,1762,831,1762, 831	1.000403226	99.95969368	q.i.	3	3
Donald Knuth	1570,3685,3950,938,430,408, 496,5809,1743,1206,485,222, 15,15,17,4	1.000636943	99.93634629	r	16	0
Leonhard Euler	1390,259,2665,1820,1619,332, 161,16,161,16	1.000719422	99.92810947	q.i.	6	2
Henri Poincare	1120,827,2721,4859,4396,3809, 4396,3809	1.000892856	99.91079403	q.i.	4	2
John von Neumann	1020,12150,16445,6460,5214,3361, 2470,2156,4633,3275,2725,3771, 4842,9666,34313,8985,9666,34313, 8985	1.000980392	99.90205681	q.i.	13	3
Robert Oppenheimer^[5]	989,940,704,492,605,611, 314,195,191,195,191	1.001011121	99.89899001	q.i.	7	2
Benoit Mandelbrot	896,22145,14018,6668,12394,16883, 12394	1.001116071	99.88851729	q.i.	6	2
Carl Friedrich Gauss	888,607,1980,2812,3493,2812, 3493	1.001126124	99.88751427	q.i.	3	2
Georg Cantor	866,407,1164,1363,1319,465, 84,465,84	1.001154731	99.88466007	q.i.	5	2
Werner Heisenberg	863,1718,7160,7624,3251,2696, 3750,4792,9556,33976,8861,9556, 33976,8861	1.001158748	99.88425934	q.i.	8	3
Richard Feynman	847,3914,2621,1439,2313,1845, 1156,2078,1156,2078	1.001180637	99.88207551	q.i.	6	2
Johannes Kepler	791,80,3371,1219,67,38, 67,38	1.001264203	99.87373937	r	8	0
Max Planck	785,351,844,706,1100,3624, 3440,823,205,128,65,26, 6,1	1.001273881	99.87277400	r	14	0
Andrew Wiles	562,1127,2812,3493,2812,3493, 2812	1.001779357	99.82238039	q.i.	2	2
Subhash Kak	527,130,58,64,144,30, 35,15,8,4	1.001897506	99.81060882	r	10	0
Johann Heinrich Lambert	477,192,1007,2611,1028,379, 123,9,1	1.002096413	99.79079726	r	9	0
Claude Shannon	466,38281,11721,7672,3274,2724, 3768,4835,9646,34270,8975,9646, 34270,8975,9646	1.002145923	99.78586725	q.i.	9	3
Erwin Schroedinger	431,1168,2238,2166,830,524, 830,524	1.002320181	99.76851898	q.i.	4	2
Robert Devaney^[22]	382,3247,5433,6581,2552,1399, 1170,476,987,676,308,218, 581,237,382,278,382,278	1.002617799	99.73890360	q.i.	13	2
Srinivasa Ramanujan	349,446,481,128,58,44, 18,44,18	1.002865311	99.71428754	q.i.	5	2
Kurt Goedel	328,2024,4096,10422,3557,1496, 3557,1496	1.003048776	99.69604909	q.i.	4	2
Augustin-Louis Cauchy	278,180,841,2703,1991,3083, 3459,1328,7007,10158,1879,10158, 1879	1.003597050	99.64158420	q.i.	9	2
Douglas R. Hofstadter	249,821,3663,14885,1559,12637, 14885,1559,12637	1.004016045	99.60000195	q.i.	3	3
John Baez	207,559,291,335,188,894, 492,808,705,159,57,5	1.004830876	99.51923490	r	12	0
Grigori Perelman	172,701,522,549,191,180, 89,442,1097,463,1097,463, 1097	1.005813905	99.42197008	q.i.	8	2
John F. Waymouth^[30]	171,348,116,162,78,43, 113,504,65,94,87,7,1	1.005847855	99.41861436	r	13	0
Henri Lebesgue	148,167,788,2799,7119,8978, 7799,2557,7799,2557	1.006756483	99.32888603	q.i.	6	2
Gerald A. Edgar^[10]	141,436,3022,6600,2558,1412, 1178,484,1001,685,309,219, 586,238,388,284,388,284	1.007092083	99.29578604	q.i.	14	2
Constantin Caratheodory	131,326,7119,8978,7799,2557, 7799,2557	1.007633409	99.24244185	q.i.	4	2
Heinrich Begehr^[24]	119,142,40,181,74,107, 23,60,22,17,10,2, 2	1.008402864	99.16671556	q.i.	11	1
Bernhard Riemann	107,303,1980,2812,3493,2812, 3493	1.009345506	99.07410237	q.i.	3	2
A.O.L. Atkin^[6]	95,359,2812,3493,2812,3493	1.010526007	98.95836356	q.i.	2	2
Pierre de Fermat	85,15,177,766,160,243, 273,107,130,273,107,130,273	1.011755489	98.83810965	q.i.	6	3
Gottfried Leibniz	76,6,87,3430,11657,7624, 3251,2696,3750,4792,9556,33976, 8861,9556,33976,8861	1.013129158	98.70409832	q.i.	10	3
Menelaos Karanikolas^[7]	76,26,163,148,145,191, 84,134,114,432,92,432, 92	1.013151241	98.70194693	q.i.	8	2
Donald L. Shell^[8]	75,145,2536,15236,10092,1870, 10092,1870	1.013332107	98.68432992	q.i.	4	2
Vassili Nestoridis^[9]	71,291,169,56,52,119, 159,256,159,256,159	1.014083825	98.61117740	q.i.	6	2
Robert B. Israel^[10]	70,226,1664,7903,2659,1206, 485,222,15,15,17,4	1.014284811	98.59163707	r	12	0
Arturo Magidin^[28]	55,21,14,3	1.018166142	98.21579787	r	4	0
Victor D. Roberts^[30]	47,41,82,99,32,149, 233,105,77,41,25,17, 11,17,11	1.021265563	97.91772442	q.i.	11	2
Michael S. Lambrou^[31]	29,50,67,51,126,357, 693,344,79,1934,9484,14466, 10203,1882,10203,1882	1.034459001	96.66888675	q.i.	12	2
Donald L. Klipstein^[23]	28,16,8,8,6,1, 11,20,9,4,10,4	1.035635350	96.55908328	r	12	0
Johan E. Mebius^[28]	24,4,4,3,7,6, 3,6,4,6,4	1.041260390	96.03745703	q.i.	7	2
Paris Pamfilos^[16]	22,3,2	1.044871795	95.70552147	r	3	0
Ioannis Papadoperakis^[11]	22,291,169,56,52,119, 159,256,159,256	1.045447447	95.65282341	q.i.	6	2
This author	13,10,4,17,3,17, 3	1.076349896	92.90659143	q.i.	3	2
Dave L. Renfro^[20]	9,5,7,9,3,3	1.108760363	90.19081433	q.i.	4	1
Mikes Glinatsis^[19]	8,51,289,211,384,543, 726,1900,2369,1900,2369	1.124694397	88.91304184	q.i.	7	2
The author's father^[17]	6,9,18,29,17,13,6,2,1,1	1.163654584	85.93615441	r	10	0
Robert P. Munafo^[21]	4,3,1	1.235294118	80.95238095	r	3	0
James D. Hooker^[25]	3	1.333333333	75	r	1	0
...	...	...	...	...	...	...
2 publications no citations	2,0,0,0,0	3/2	66.66	r	1	0
1 publication no citations	1,0,0,0,0	2	50	r	1	0
no publications	0	∞	0	r	0	0

We can now define the Google Scholar Eigenspectrum of the author x to be the sequence of convergents of C(x), C_n(x)=[a₀=1;a₁,a₂,...,a_{n-1}].

The spectra of some authors are shown below. The dominant spectral line for each author lies approximately at C₂(x)=a₀+1/a₁=1+1/a₁. These spectra give you a rough idea of the colossal amount of work the corresponding authors have done in their fields.

x	C(x)	Eigenspectrum
Albert Einstein	1.000223714
Paul Erdos	1.000330033
Isaac Newton	1.000403226
Donald Knuth	1.000636943
Leonhard Euler	1.000719422
Henri Poincare	1.000892856
John von Neumann	1.000980392
Robert Oppenheimer^[5]	1.001011121
Benoit Mandelbrot	1.001116071
Carl Friedrich Gauss	1.001126124
Georg Cantor	1.001154731
Werner Heisenberg	1.001158748
Richard Feynman	1.001180637
Johannes Kepler	1.001264203
Max Planck	1.001273881
Andrew Wiles	1.001779357
Subhash Kak	1.001897506
Johann Heinrich Lambert	1.002096413
Claude Shannon	1.002145923
Erwin Schroedinger	1.002320181
Robert Devaney	1.002617799
Srinivasa Ramanujan	1.002865311
Kurt Goedel	1.003048776
Augustin-Louis Cauchy	1.003597050
Douglas R. Hofstadter	1.004016045
John Baez	1.004830876
Grigori Perelman	1.005813905
John F. Waymouth	1.005847855
Henri Lebesgue	1.006756483
Gerald A. Edgar	1.007092083
Constantin Caratheodory	1.007633409
Heinrich Begehr	1.008402864
Bernhard Riemann	1.009345506
A.O.L. Atkin	1.010526007
Pierre de Fermat	1.011755489
Gottfried Leibniz	1.013129158
Menelaos Karanikolas	1.013151241
Donald L. Shell	1.013332107
Vassili Nestoridis	1.014083825
Robert B. Israel	1.014284811
Arturo Magidin	1.018166142
Victor D. Roberts	1.021265563
Michael S. Lambrou	1.034459001
Donald L. Klipstein	1.035635350
Johan E. Mebius^[14]	1.041260390
Paris Pamfilos	1.044871795
Ioannis Papadoperakis	1.045447447
This author	1.076349896
Dave L. Renfro^[14]	1.108760363
Mikes Glinatsis	1.124694397
The author's father^[14]^[26]	1.163654584
Robert P. Munafo^[14]	1.235294118
Bronze Ratio-2^[27]	1.302775638
James D. Hooker	1.333333333
Silver Ratio-1^[27]	1.414213562
[1;2,10]	1.476190476
[1;2,14,2] (Tl?)	1.483333333
[1;2]	1.5
[1;1,1]	1.5
Golden Ratio^[27]	1.618033989
[1;1,1,1,32,1,2] (Na?!)	1.663333333
[1;1,3,4,5,6,3,2]	1.764064436
[1;1,10]	1.909090909
[1;1]	2

Euler proved that whenever the sequence of the associated continued fraction is periodic, C(x) will equal a certain quadratic irrational ζ of the form (P+sqrt(D))/Q. We find this ζ for this author.

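> # L2C: evaluate the finite continued fraction [L[1];L[2],...,L[nops(L)]].
> L2C:=proc(L)
> local l,c,n;
> l:=nops(L);c:=0;
> # Work from the last term back to the first: c becomes 1/(L[k]+c).
> for n from 1 to l do
> c:=1/(c+L[l-n+1]);
> od;
> # c now holds the reciprocal of the continued fraction's value.
> 1/c;
> end: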

Consider then the simplest periodic continued fraction, with a period 2 block p:

What does it mean for this continued fraction to be periodic? It means exactly:
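A sketch of this condition, writing the period-2 block as p=(a,b) with a and b positive integers. The symbols a and b are introduced here for illustration only.

```latex
\begin{align*}
\zeta &= [a;b,a,b,a,b,\dots] = a + \cfrac{1}{b + \cfrac{1}{\zeta}}
       = a + \frac{\zeta}{b\zeta + 1},\\
0 &= b\zeta^{2} - ab\,\zeta - a,\\
\zeta &= \frac{ab \pm \sqrt{a^{2}b^{2} + 4ab}}{2b}.
\end{align*}
```

For the block (a,b)=(17,3) this reads 3ζ²-51ζ-17=0, with roots (51±sqrt(2805))/6; only the positive root can be the value of a continued fraction with positive terms.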

The above gives two solutions. For this author the continued fraction of works and citations (including the initial 1) is:

L=[1;13,10,4,17,3,17,3,17,3,...]. Therefore, we can recover the periodic part, as:

Note that this is the complete quotient ζ₄. We can then use the recursion for complete quotients found on that same page, ζ_k=a_k+1/ζ_{k+1}, to recover the full continued fraction, using Maple:

> zeta4:=sol[1];
> zeta3:=4+1/zeta4;
> zeta2:=10+1/zeta3;
> zeta1:=13+1/zeta2;
> zeta0:=1+1/zeta1;

> evalf(zeta0);
1.076349896
> L:=[1,13,10,4,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3];
> evalf(L2C(L));
1.076349896
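The same unwinding can be cross-checked numerically in Python. This is a sketch that assumes ζ₄ is the positive root of 3ζ²-51ζ-17=0 (the periodicity condition for the block 17,3), and it compares the result with the closed form quoted for ζ. The variable names are hypothetical.

```python
from math import isclose, sqrt

# Assumed complete quotient: positive root of 3*z^2 - 51*z - 17 = 0,
# i.e. z = 17 + 1/(3 + 1/z) for the repeating block (17, 3).
zeta4 = (51 + sqrt(2805)) / 6

# Unwind zeta_k = a_k + 1/zeta_{k+1} through the non-repeating terms 4, 10, 13, 1.
zeta0 = zeta4
for a in (4, 10, 13, 1):
    zeta0 = a + 1 / zeta0

# Closed form quoted in the text for this author's eigenvalue.
closed_form = 906371 / 842062 - sqrt(2805) / 2526186

print(zeta0, closed_form)
```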

> zeta:=simplify(zeta0);
> conj:=denom(zeta)-2*op(denom(zeta))[2];#get rid of roots on denominator!
> zetan:=expand(numer(zeta)*conj);
> zetad:=expand(denom(zeta)*conj);
> zeta:=zetan/zetad;

And ζ=906371/842062-(1/2526186)*2805^(1/2), so the Publication Eigenvalues for this author, as a function of t=now, are ζ and ζ*, which are Quadratic Irrationals^[32].

> zetac:=zeta-2*op(2,zeta);
> eq:=x^2-(zeta+zetac)*x+zeta*zetac=0;
> eq:=simplify(eq);
> eq:=denom(op(3,op(1,eq)))*eq;

> solve(eq,x);
{906371/842062+(1/2526186)*2805^(1/2), 906371/842062-(1/2526186)*2805^(1/2)} = {ζ*,ζ}.
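For readers without Maple, the integer form of this quadratic can be sketched in exact integer arithmetic. The approach below is an alternative to the Maple steps above, not a transcription of them: it writes C(x) as a Möbius transform of the purely periodic tail and substitutes into the tail's own quadratic. The function names `cf_matrix` and `characteristic_equation` are hypothetical.

```python
from math import gcd, sqrt

def cf_matrix(terms):
    """Product of the matrices [[a, 1], [1, 0]] over terms, as (p, q, r, s)."""
    p, q, r, s = 1, 0, 0, 1  # identity matrix [[p, q], [r, s]]
    for a in terms:
        p, q = p * a + q, p
        r, s = r * a + s, r
    return p, q, r, s

def characteristic_equation(pre, period):
    """Integers (c2, c1, c0) with c2*x^2 + c1*x + c0 = 0 for x = [pre; period, period, ...]."""
    A, B, C, D = cf_matrix(pre)     # x = (A*z + B) / (C*z + D), z the periodic tail
    p, q, r, s = cf_matrix(period)  # z = (p*z + q) / (r*z + s)
    u = s - p                       # hence r*z^2 + u*z - q = 0
    # Substitute z = (D*x - B) / (A - C*x) and clear denominators.
    c2 = r * D * D - u * D * C - q * C * C
    c1 = -2 * r * D * B + u * (D * A + B * C) + 2 * q * A * C
    c0 = r * B * B - u * B * A - q * A * A
    g = gcd(gcd(abs(c2), abs(c1)), abs(c0))
    sign = -1 if c2 < 0 else 1      # normalise to a positive leading coefficient
    return sign * c2 // g, sign * c1 // g, sign * c0 // g

c2, c1, c0 = characteristic_equation([1, 13, 10, 4], [17, 3])
disc = c1 * c1 - 4 * c2 * c0
roots = sorted((-c1 + sgn * sqrt(disc)) / (2 * c2) for sgn in (-1, 1))
print(c2, c1, c0, disc, roots)
```

For this author the sketch gives 1263093x²-2719113x+1463387=0 with discriminant 2805, which agrees with the two roots returned by Maple.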

This equation, then, is something like a characteristic equation or eigenequation for this author, and the left-hand side of the equation is something like a publication eigenfunction for this author, as a function of this author's publications at the current time. As more works are published and more citations are shown, this characteristic equation changes as a function of time. It is a useful exercise for the reader to calculate the characteristic equation of other authors higher on the table above.

The convergents of C(x), are {C_n(x)}, n\in N, and they can be calculated with Maple:

> C:=proc(L,n)
> local cvgts;
> convert(L2C(L),confrac,cvgts);
> cvgts[n];
> end:
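An equivalent computation can be sketched in Python with the standard recurrence h_n=a_n·h_{n-1}+h_{n-2}, k_n=a_n·k_{n-1}+k_{n-2}. The function name `convergents` is hypothetical, and the sketch assumes all terms after a₀ are positive. With the document's indexing, C_n(x) is the entry at position n-1 of the returned list.

```python
from fractions import Fraction

def convergents(terms):
    """List of convergents h_n/k_n of [a0; a1, a2, ...] as exact fractions."""
    h_prev2, h_prev1 = 0, 1   # h_{-2}, h_{-1}
    k_prev2, k_prev1 = 1, 0   # k_{-2}, k_{-1}
    result = []
    for a in terms:
        h = a * h_prev1 + h_prev2
        k = a * k_prev1 + k_prev2
        result.append(Fraction(h, k))
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
    return result

cs = convergents([1, 13, 10, 4, 17, 3])
c2_author = cs[1]   # C_2(x) = 1 + 1/a1, the dominant spectral line
print([str(c) for c in cs])
```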

If δ is the Dirac Delta, the Google Scholar Publication Eigenspectrum of an author x shown on the above table then, is the convergent pulse train or Dirac Comb:

The Google Scholar Publication Signal or Publication Eigensignal of an author in the time domain then, will be the Inverse Fourier Transform of the author's Eigenspectrum:

If C(x) is rational the sum will consist of a finite number of terms and hence the Eigensignal will be periodic in the time domain^[12]. If C(x) is a quadratic irrational the sum will consist of infinitely many terms and the eigensignal will not be periodic in the time domain.

Let's calculate the real and imaginary components of the eigensignal for this author with Maple, by considering an approximation with the periodic part of C(x) repeating 4 times:

> a0:=2/T*Int(rexpr(t),t=0..T):
> a:=n->2/T*Int(rexpr(t)*cos(2*Pi*n*t/T),t=-T/2..T/2):
> b:=n->2/T*Int(rexpr(t)*sin(2*Pi*n*t/T),t=-T/2..T/2): #evaluates to 0

If C(x) is a quadratic irrational, the author's Google Scholar Eigensignal will be "almost" periodic, with a minimal period T=1/C_n(x), which can be computed with Maple:

Now we can construct a Fourier Series approximation for the real Eigensignal with Maple:

> with(plots): # display comes from the plots package
> p1:=plot(rexpr(t),t=0..T,color=red):
> p2:=plot(F(12,t),t=0..T,color=green):
> display(p1,p2);

The harmonics of the author's real Eigensignal will then be a_n and the amplitude of the harmonics will be given as c_n=|a_n|. The author's Harmonic Spectrum with up to 10 harmonics is shown below.

> eps:=1e-1;
> PS:=[[[eps,0],[eps,evalf(abs(a0))]],seq([[eps+n,0],[eps+n,evalf(abs(sqrt(a(n)^2+b(n)^2)))]],n=1..10)]:
> plot(PS,n=0..10);

The 0-th harmonic a₀ (dc-term) is the red line (almost suppressed). The amplitude of the dominant harmonic (green), which corresponds to the dominant spectral line shown on the author's Eigenspectrum on the table above, is |a₁|~11.95 and the author's Eigensignal broadcasts at a frequency f=1/T=C(x)~1.082 Hz.

We now have a Fourier Series approximation. Let's recover the Google Eigenspectrum from it.

The above is an approximation of the Google Scholar Eigenspectrum for this author. The function jumps very high exactly at the convergents C_n(x), and in particular at C₂(x):

The publication Eigensignal's minimal period is roughly the time between two adjacent publications. For this author:

Accordingly, one can now define the blue-shift of an author x relative to another author y, via the Google Scholar Metric, as d(x,y)=|C(x)-C(y)|. For example:

> LG:=[1,13,10,4,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17];#author
> LP:=[1,22,291,169,56,52,119,159,256,159,256,159,256,159];#author's advisor
> BS:=evalf(abs(L2C(LG)-L2C(LP)));
BS:=0.03090244910

The author's advisor is blue-shifted 0.03090244910 more than the author. That x is more blue-shifted than y means that x has worked harder than y. Notice how high the blue-shifts of the big guns in Mathematics are.

Alternatively one may define an author's red-shift relative to unity. For example:

The author's advisor is 0.04544744665 and the author is 0.07634989575 red-shifted away from unity. Smaller red-shifts mean more work.
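Both shifts can be reproduced with a short Python sketch. The helper `cf_value` and the variable names are hypothetical, and the periodic blocks are truncated after a few repetitions, which is an approximation (its error is far below the digits shown).

```python
from fractions import Fraction

def cf_value(terms):
    """Exact value of the finite continued fraction [a0; a1, ..., an] (positive terms)."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# Non-repeating part followed by a truncated number of repeating blocks.
author = [1, 13, 10, 4] + [17, 3] * 10
advisor = [1, 22, 291, 169, 56, 52, 119] + [159, 256] * 4

c_author = cf_value(author)
c_advisor = cf_value(advisor)

blue_shift = float(abs(c_author - c_advisor))   # d(x, y) = |C(x) - C(y)|
red_shift_author = float(c_author - 1)          # distance from unity
red_shift_advisor = float(c_advisor - 1)

print(blue_shift, red_shift_author, red_shift_advisor)
```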