Google Scholar can be used to construct a metric which can show the relative "merit" of scientists in their corresponding fields of research, based on the work they've done.
Assuming that an author's name is unique (which is not always the case), one can construct a characteristic publication number, or publication eigenvalue, for a given author, say "john doe", as follows:
The publication eigenvalue for this author then, can be the number C(john doe), which has the continued fraction expansion:
C(john doe)=[a0;a1,a2,...,an,...].
To simplify the ordering which is present in the set {C(x):x\in author}, without loss of generality we can set a0=1 and look instead at the number:
C(john doe)=[1;a0',a1',a2',...,an',...], with an-1'=an, which maps the set {C(x):x\in author} into the interval (1,∞).
Note that in this case, supx{C(x):x\in author}=∞ and infx{C(x):x\in author}=1.
Adding a citation entry a>0 to an existing continued fraction expansion of C(x) can make C(x) either larger or smaller, depending on where a is added and on the number of citations at level n[18]. Specifically:
The main "weight" of the number C(x) will then be carried by the term a1, which is the number of publications of author x and which provides a good approximation of C(x), as C(x)~C2(x)=1+1/a1, which is fairly reasonable.
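The quality of this approximation can be made explicit. A short derivation, assuming a1≥1 and a2≥1 (at least one publication and at least one first-level citation), with ε standing for the tail 1/(a2+1/(a3+...)):

```latex
C(x) = 1 + \cfrac{1}{a_1 + \varepsilon}, \qquad 0 < \varepsilon \le 1
\qquad\Longrightarrow\qquad
0 < C_2(x) - C(x) = \frac{\varepsilon}{a_1\,(a_1 + \varepsilon)} < \frac{1}{a_1^{2}} .
```

For a1=4470, for instance, C2(x)=1+1/4470≈1.000223714 and the error is below 1/4470^2≈5·10^-8.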
The formal definition of C(x) is slightly more involved, mainly because one needs to define it uniquely. Here's then the formal definition:
It can now be seen that the definition above gives rise to a unique number C(x), as in the first definition for "john doe", above, because the suprema are taken over finite sets indexed by k,l,m,...,w.
A Metric Based On Google Scholar
The definition above gives rise to the metric: d(x,y)=|C(x)-C(y)|. Let's verify the metric's fundamental properties:
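The properties in question are inherited from the absolute value on the real line. A sketch, assuming C(x), C(y) and C(z) are all finite (each author has at least one publication):

```latex
\begin{align*}
d(x,y) &= |C(x)-C(y)| \ge 0, \\
d(x,y) &= |C(y)-C(x)| = d(y,x), \\
d(x,z) &= \bigl|\,(C(x)-C(y)) + (C(y)-C(z))\,\bigr| \le d(x,y) + d(y,z), \\
d(x,y) &= 0 \iff C(x) = C(y).
\end{align*}
```

The last line shows that d separates two authors only when their characteristic numbers differ, so strictly speaking d is a metric on the set of values {C(x)} and a pseudometric on the set of authors.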
It is clear that a person with no publications will have a characteristic number equal to infinity, and the more publications an author has, the closer C(x) is to 1. This gives rise to a tempered distribution, and one can then define the publication percentile P(x) of a scientist x in this distribution to be: P(x)=100/C(x).
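The percentile formula is simple enough to check by hand or in a few lines of code. The following Python sketch is only an illustration: the function name publication_percentile is hypothetical, and mapping C(x)=∞ to 0 is an assumption that matches the convention stated above for an author with no publications.

```python
import math

def publication_percentile(c):
    """Return P(x) = 100 / C(x); C(x) = infinity (no publications) maps to 0."""
    if math.isinf(c):
        return 0.0
    return 100.0 / c

# Sample values of C(x): a generic author, one publication, no publications.
for label, c in [("C(x) = 1.076349896", 1.076349896),
                 ("C(x) = 2", 2.0),
                 ("C(x) = infinity", math.inf)]:
    print(f"{label}: P(x) = {publication_percentile(c):.6f} %")
```

Since C(x) lies between 1 and ∞, P(x) always lies between 0 and 100.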
Fixing t=now(18/11/2010) and omitting the term a0=1, let's then see these numbers for some scientists:
x | an, n≥1[1] | C(x) | P(x)(%) | Class[2] | Non-repeating Block[3] | Repeating Block[4] |
Albert Einstein | 4470,6443,5293,2374,2344,770,589,332,314,332,314 | 1.000223714 | 99.97763364 | q.i. | 7 | 2 |
Paul Erdos | 3030,1310,6984,4953,1809,1619,332,161,16,161,16 | 1.000330033 | 99.96700760 | q.i. | 7 | 2 |
Isaac Newton | 2480,1777,1331,1762,831,1762,831 | 1.000403226 | 99.95969368 | q.i. | 3 | 3 |
Donald Knuth | 1570,3685,3950,938,430,408,496,5809,1743,1206,485,222,15,15,17,4 | 1.000636943 | 99.93634629 | r | 16 | 0 |
Leonhard Euler | 1390,259,2665,1820,1619,332,161,16,161,16 | 1.000719422 | 99.92810947 | q.i. | 6 | 2 |
Henri Poincare | 1120,827,2721,4859,4396,3809,4396,3809 | 1.000892856 | 99.91079403 | q.i. | 4 | 2 |
John von Neumann | 1020,12150,16445,6460,5214,3361,2470,2156,4633,3275,2725,3771,4842,9666,34313,8985,9666,34313,8985 | 1.000980392 | 99.90205681 | q.i. | 13 | 3 |
Robert Oppenheimer[5] | 989,940,704,492,605,611,314,195,191,195,191 | 1.001011121 | 99.89899001 | q.i. | 7 | 2 |
Benoit Mandelbrot | 896,22145,14018,6668,12394,16883,12394 | 1.001116071 | 99.88851729 | q.i. | 6 | 2 |
Carl Friedrich Gauss | 888,607,1980,2812,3493,2812,3493 | 1.001126124 | 99.88751427 | q.i. | 3 | 2 |
Georg Cantor | 866,407,1164,1363,1319,465,84,465,84 | 1.001154731 | 99.88466007 | q.i. | 5 | 2 |
Werner Heisenberg | 863,1718,7160,7624,3251,2696,3750,4792,9556,33976,8861,9556,33976,8861 | 1.001158748 | 99.88425934 | q.i. | 8 | 3 |
Richard Feynman | 847,3914,2621,1439,2313,1845,1156,2078,1156,2078 | 1.001180637 | 99.88207551 | q.i. | 6 | 2 |
Johannes Kepler | 791,80,3371,1219,67,38,67,38 | 1.001264203 | 99.87373937 | r | 8 | 0 |
Max Planck | 785,351,844,706,1100,3624,3440,823,205,128,65,26,6,1 | 1.001273881 | 99.87277400 | r | 14 | 0 |
Andrew Wiles | 562,1127,2812,3493,2812,3493,2812 | 1.001779357 | 99.82238039 | q.i. | 2 | 2 |
Subhash Kak | 527,130,58,64,144,30,35,15,8,4 | 1.001897506 | 99.81060882 | r | 10 | 0 |
Johann Heinrich Lambert | 477,192,1007,2611,1028,379,123,9,1 | 1.002096413 | 99.79079726 | r | 9 | 0 |
Claude Shannon | 466,38281,11721,7672,3274,2724,3768,4835,9646,34270,8975,9646,34270,8975,9646 | 1.002145923 | 99.78586725 | q.i. | 9 | 3 |
Erwin Schroedinger | 431,1168,2238,2166,830,524,830,524 | 1.002320181 | 99.76851898 | q.i. | 4 | 2 |
Robert Devaney[22] | 382,3247,5433,6581,2552,1399,1170,476,987,676,308,218,581,237,382,278,382,278 | 1.002617799 | 99.73890360 | q.i. | 13 | 2 |
Srinivasa Ramanujan | 349,446,481,128,58,44,18,44,18 | 1.002865311 | 99.71428754 | q.i. | 5 | 2 |
Kurt Goedel | 328,2024,4096,10422,3557,1496,3557,1496 | 1.003048776 | 99.69604909 | q.i. | 4 | 2 |
Augustin-Louis Cauchy | 278,180,841,2703,1991,3083,3459,1328,7007,10158,1879,10158,1879 | 1.003597050 | 99.64158420 | q.i. | 9 | 2 |
Douglas R. Hofstadter | 249,821,3663,14885,1559,12637,14885,1559,12637 | 1.004016045 | 99.60000195 | q.i. | 3 | 3 |
John Baez | 207,559,291,335,188,894,492,808,705,159,57,5 | 1.004830876 | 99.51923490 | r | 12 | 0 |
Grigori Perelman | 172,701,522,549,191,180,89,442,1097,463,1097,463,1097 | 1.005813905 | 99.42197008 | q.i. | 8 | 2 |
John F. Waymouth[30] | 171,348,116,162,78,43,113,504,65,94,87,7,1 | 1.005847855 | 99.41861436 | r | 13 | 0 |
Henri Lebesgue | 148,167,788,2799,7119,8978,7799,2557,7799,2557 | 1.006756483 | 99.32888603 | q.i. | 6 | 2 |
Gerald A. Edgar[10] | 141,436,3022,6600,2558,1412,1178,484,1001,685,309,219,586,238,388,284,388,284 | 1.007092083 | 99.29578604 | q.i. | 14 | 2 |
Constantin Caratheodory | 131,326,7119,8978,7799,2557,7799,2557 | 1.007633409 | 99.24244185 | q.i. | 4 | 2 |
Heinrich Begehr[24] | 119,142,40,181,74,107,23,60,22,17,10,2,2 | 1.008402864 | 99.16671556 | q.i. | 11 | 1 |
Bernhard Riemann | 107,303,1980,2812,3493,2812,3493 | 1.009345506 | 99.07410237 | q.i. | 3 | 2 |
A.O.L. Atkin[6] | 95,359,2812,3493,2812,3493 | 1.010526007 | 98.95836356 | q.i. | 2 | 2 |
Pierre de Fermat | 85,15,177,766,160,243,273,107,130,273,107,130,273 | 1.011755489 | 98.83810965 | q.i. | 6 | 3 |
Gottfried Leibniz | 76,6,87,3430,11657,7624,3251,2696,3750,4792,9556,33976,8861,9556,33976,8861 | 1.013129158 | 98.70409832 | q.i. | 10 | 3 |
Menelaos Karanikolas[7] | 76,26,163,148,145,191,84,134,114,432,92,432,92 | 1.013151241 | 98.70194693 | q.i. | 8 | 2 |
Donald L. Shell[8] | 75,145,2536,15236,10092,1870,10092,1870 | 1.013332107 | 98.68432992 | q.i. | 4 | 2 |
Vassili Nestoridis[9] | 71,291,169,56,52,119,159,256,159,256,159 | 1.014083825 | 98.61117740 | q.i. | 6 | 2 |
Robert B. Israel[10] | 70,226,1664,7903,2659,1206,485,222,15,15,17,4 | 1.014284811 | 98.59163707 | r | 12 | 0 |
Arturo Magidin[28] | 55,21,14,3 | 1.018166142 | 98.21579787 | r | 4 | 0 |
Victor D. Roberts[30] | 47,41,82,99,32,149,233,105,77,41,25,17,11,17,11 | 1.021265563 | 97.91772442 | q.i. | 11 | 2 |
Michael S. Lambrou[31] | 29,50,67,51,126,357,693,344,79,1934,9484,14466,10203,1882,10203,1882 | 1.034459001 | 96.66888675 | q.i. | 12 | 2 |
Donald L. Klipstein[23] | 28,16,8,8,6,1,11,20,9,4,10,4 | 1.035635350 | 96.55908328 | r | 12 | 0 |
Johan E. Mebius[28] | 24,4,4,3,7,6,3,6,4,6,4 | 1.041260390 | 96.03745703 | q.i. | 7 | 2 |
Paris Pamfilos[16] | 22,3,2 | 1.044871795 | 95.70552147 | r | 3 | 0 |
Ioannis Papadoperakis[11] | 22,291,169,56,52,119,159,256,159,256 | 1.045447447 | 95.65282341 | q.i. | 6 | 2 |
This author | 13,10,4,17,3,17,3 | 1.076349896 | 92.90659143 | q.i. | 3 | 2 |
Dave L. Renfro[20] | 9,5,7,9,3,3 | 1.108760363 | 90.19081433 | q.i. | 4 | 1 |
Mikes Glinatsis[19] | 8,51,289,211,384,543,726,1900,2369,1900,2369 | 1.124694397 | 88.91304184 | q.i. | 7 | 2 |
The author's father[17] | 6,9,18,29,17,13,6,2,1,1 | 1.163654584 | 85.93615441 | r | 10 | 0 |
Robert P. Munafo[21] | 4,3,1 | 1.235294118 | 80.95238095 | r | 3 | 0 |
James D. Hooker[25] | 3 | 1.333333333 | 75 | r | 1 | 0 |
... | ... | ... | ... | ... | ... | ... |
2 publications no citations | 2,0,0,0,0 | 3/2 | 66.66 | r | 1 | 0 |
1 publication no citations | 1,0,0,0,0 | 2 | 50 | r | 1 | 0 |
no publications | 0 | ∞ | 0 | r | 0 | 0 |
The Eigenspectrum of an Author
We can now define the Google Scholar Eigenspectrum of the author x to be the sequence of convergents for C(x), Cn(x)=[a0=1;a1,a2,...,an-1].
The spectra of some authors are shown below. The dominant spectral line for each author lies approximately at C2(x)=a0+1/a1=1+1/a1. These spectra give you a rough idea of the colossal amount of work the corresponding authors have done in their fields.
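The convergents that make up a spectrum can be generated with the standard recurrence for continued fractions. The Python sketch below is an illustration only (the function name convergents is hypothetical); it follows the indexing used here, so the first value produced is C1=[a0] and the second is C2=[a0;a1].

```python
from fractions import Fraction

def convergents(terms):
    """Yield C_1, C_2, ... for the partial quotients [a0; a1, a2, ...]."""
    h_prev, h = 0, 1   # numerators   h_{-2}, h_{-1}
    k_prev, k = 1, 0   # denominators k_{-2}, k_{-1}
    for a in terms:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)

# Partial quotients of "This author" (a0 = 1 included), taken from the table above.
spectrum = list(convergents([1, 13, 10, 4, 17, 3, 17, 3]))
print([str(c) for c in spectrum[:3]])   # the first three spectral lines
print(float(spectrum[-1]))              # close to the tabulated C(x)
```

The second entry, 14/13=1+1/13, is the dominant spectral line for this list of partial quotients.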
x | C(x) | Eigenspectrum |
Albert Einstein | 1.000223714 | |
Paul Erdos | 1.000330033 | |
Isaac Newton | 1.000403226 | |
Donald Knuth | 1.000636943 | |
Leonhard Euler | 1.000719422 | |
Henri Poincare | 1.000892856 | |
John von Neumann | 1.000980392 | |
Robert Oppenheimer[5] | 1.001011121 | |
Benoit Mandelbrot | 1.001116071 | |
Carl Friedrich Gauss | 1.001126124 | |
Georg Cantor | 1.001154731 | |
Werner Heisenberg | 1.001158748 | |
Richard Feynman | 1.001180637 | |
Johannes Kepler | 1.001264203 | |
Max Planck | 1.001273881 | |
Andrew Wiles | 1.001779357 | |
Subhash Kak | 1.001897506 | |
Johann Heinrich Lambert | 1.002096413 | |
Claude Shannon | 1.002145923 | |
Erwin Schroedinger | 1.002320181 | |
Robert Devaney | 1.002617799 | |
Srinivasa Ramanujan | 1.002865311 | |
Kurt Goedel | 1.003048776 | |
Augustin-Louis Cauchy | 1.003597050 | |
Douglas R. Hofstadter | 1.004016045 | |
John Baez | 1.004830876 | |
Grigori Perelman | 1.005813905 | |
John F. Waymouth | 1.005847855 | |
Henri Lebesgue | 1.006756483 | |
Gerald A. Edgar | 1.007092083 | |
Constantin Caratheodory | 1.007633409 | |
Heinrich Begehr | 1.008402864 | |
Bernhard Riemann | 1.009345506 | |
A.O.L. Atkin | 1.010526007 | |
Pierre de Fermat | 1.011755489 | |
Gottfried Leibniz | 1.013129158 | |
Menelaos Karanikolas | 1.013151241 | |
Donald L. Shell | 1.013332107 | |
Vassili Nestoridis | 1.014083825 | |
Robert B. Israel | 1.014284811 | |
Arturo Magidin | 1.018166142 | |
Victor D. Roberts | 1.021265563 | |
Michael S. Lambrou | 1.034459001 | |
Donald L. Klipstein | 1.035635350 | |
Johan E. Mebius[14] | 1.041260390 | |
Paris Pamfilos | 1.044871795 | |
Ioannis Papadoperakis | 1.045447447 | |
This author | 1.076349896 | |
Dave L. Renfro[14] | 1.108760363 | |
Mikes Glinatsis | 1.124694397 | |
The author's father[14][26] | 1.163654584 | |
Robert P. Munafo[14] | 1.235294118 | |
Bronze Ratio-2[27] | 1.302775638 | |
James D. Hooker | 1.333333333 | |
Silver Ratio-1[27] | 1.414213562 | |
[1;2,10] | 1.476190476 | |
[1;2,14,2] (Tl?) | 1.483333333 | |
[1;2] | 1.5 | |
[1;1,1] | 1.5 | |
Golden Ratio[27] | 1.618033989 | |
[1;1,1,1,32,1,2] (Na?!) | 1.663333333 | |
[1;1,3,4,5,6,3,2] | 1.764064436 | |
[1;1,10] | 1.909090909 | |
[1;1] | 2 |
The Quadratic Eigenequation of an Author
Euler proved that whenever the sequence of the associated continued fraction is periodic, C(x) will equal a certain quadratic irrational ζ of the form (P+sqrt(D))/Q. We find this ζ for this author.
First we program some Maple code to calculate continued fractions.
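> L2C:=proc(L)
> local l,c,n;
> l:=nops(L);c:=0;
> # work from the last partial quotient back to the first
> for n from 1 to l do
> c:=1/(c+L[l-n+1]);
> od;
> 1/c; # c holds the reciprocal of the continued fraction
> end: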
The above proc, takes as input a list of the form:
>L:=[a0=1,a1,a2,a3,a4,...];
and calculates the corresponding continued fraction.
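For readers without Maple, the same back-to-front evaluation can be sketched in Python with exact rational arithmetic. This is an illustrative port, not the author's code: the name l2c is chosen here, and the sketch assumes every partial quotient after a0 is a positive integer.

```python
from fractions import Fraction

def l2c(terms):
    """Evaluate the finite continued fraction [a0; a1, ..., an] exactly.

    Like the Maple proc, it starts from the last partial quotient
    and works back to the first.
    """
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# [1; 2, 10] is listed in the spectrum table above as 1.476190476.
print(l2c([1, 2, 10]))          # exact rational value
print(float(l2c([1, 2, 10])))   # decimal approximation
```

Working with Fraction avoids rounding until the final conversion to a float.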
Consider then the simplest periodic continued fraction, with a period 2 block p:
[a1;a2,a1,a2,...,a1,a2,...].
What does it mean for this continued fraction to be periodic? It means exactly:
p=a1+1/(a2+1/p) (1)
Equation (1) translated into continued fraction notation, is:
p=[a1;a2,p] (2)
Equation (2) translated into Maple notation, is:
L2C([a1,a2,p])=p
The last equation can be solved quickly with Maple.
> eq:=L2C([a1,a2,p])=p;
> sol:=solve(eq,p);
The above gives two solutions. For this author the continued fraction of works and citations (including the initial 1) is:
L=[1;13,10,4,17,3,17,3,17,3,...]. Therefore, we can recover the periodic part, as:
> sol:=subs({a1=17,a2=3},sol[1]),subs({a1=17,a2=3},sol[2]);
sol:=17/2+/-2805^(1/2)/6
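The quadratic that Maple solves here can also be written out by hand. Clearing denominators in equation (1) gives, as a short derivation:

```latex
p = a_1 + \cfrac{1}{a_2 + \cfrac{1}{p}} = a_1 + \frac{p}{a_2\,p + 1}
\;\Longrightarrow\;
a_2\,p^{2} - a_1 a_2\,p - a_1 = 0
\;\Longrightarrow\;
p = \frac{a_1 a_2 \pm \sqrt{a_1^{2} a_2^{2} + 4\,a_1 a_2}}{2\,a_2} .
```

With a1=17 and a2=3 this is 3p^2-51p-17=0, whose roots are (51±sqrt(2805))/6=17/2±sqrt(2805)/6. The positive root is the value of the periodic tail.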
Note that this is the complete quotient ζ4. We can then use the recursion found on that same page to recover the full continued fraction, using Maple:
> zeta4:=sol[1];
> zeta3:=4+1/zeta4;
> zeta2:=10+1/zeta3;
> zeta1:=13+1/zeta2;
> zeta0:=1+1/zeta1;
Check:
> evalf(zeta0);
1.076349896
> L:=[1,13,10,4,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3];
> evalf(L2C(L));
1.076349896
Check #2:
> convert(evalf(zeta0),confrac);
[1, 13, 10, 4, 17, 2, 1]
Success!
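The same back-substitution can be reproduced numerically outside Maple. The Python sketch below assumes the positive root of 3p^2-51p-17=0 (equation (1) with a1=17, a2=3) as the periodic tail, and then prepends the non-repeating partial quotients 4, 10, 13 and the leading 1, innermost first. It works in floating point, so it only approximates the exact Maple result.

```python
import math

# Positive root of 3*p^2 - 51*p - 17 = 0: the periodic tail [17; 3, 17, 3, ...].
zeta = (51 + math.sqrt(2805)) / 6

# Unwind the complete quotients: zeta3 = 4 + 1/zeta4, zeta2 = 10 + 1/zeta3, ...
for a in (4, 10, 13, 1):
    zeta = a + 1 / zeta

print(round(zeta, 9))   # compare with the value of evalf(zeta0) above
```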
> zeta:=simplify(zeta0);
> conj:=denom(zeta)-2*op(denom(zeta))[2];#get rid of roots on denominator!
> zetan:=expand(numer(zeta)*conj);
> zetad:=expand(denom(zeta)*conj);
> zeta:=zetan/zetad;
And ζ=906371/842062-(1/2526186)*2805^(1/2), so the Publication Eigenvalues for this author, as a function of t=now, are ζ and ζ*, which are Quadratic Irrationals[32].
We finally recover the quadratic equation via Viete's Expressions:
> zetac:=zeta-2*op(2,zeta);
> eq:=x^2-(zeta+zetac)*x+zeta*zetac=0;
> eq:=simplify(eq);
> eq:=denom(op(3,op(1,eq)))*eq;
1263093*x^2-2719113*x+1463387 = 0
Check:
> solve(eq,x);
{906371/842062+(1/2526186)*2805^(1/2),
906371/842062-(1/2526186)*2805^(1/2)} = {ζ*,ζ}.
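The eigenequation can be cross-checked independently of Maple. In the Python sketch below (variable names are illustrative), the discriminant is computed in exact integer arithmetic and comes out as 2805, the same radicand that appears in ζ; the roots themselves are then evaluated in floating point.

```python
from math import sqrt

# Coefficients of the eigenequation A*x^2 + B*x + C = 0 quoted above.
A, B, C = 1263093, -2719113, 1463387

disc = B * B - 4 * A * C   # exact integer arithmetic
roots = [(-B + sign * sqrt(disc)) / (2 * A) for sign in (-1, 1)]

print(disc)    # radicand under the square root
print(roots)   # [zeta, zeta*], smaller root first
```

The smaller root reproduces the value of C(x) found for this author, and the larger one is its conjugate ζ*.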
This equation then, is something like a characteristic equation or eigenequation for this author and the left part of the equation is something like a publication eigenfunction for this author as a function of this author's publications at the current time. As more works are published and more citations are shown, it is obvious that this characteristic equation changes as a function of time. It is a useful exercise for the reader to calculate the characteristic equation of other authors, higher on the table, above.
The convergents of C(x), are {Cn(x)}, n\in N, and they can be calculated with Maple:
> C:=proc(L,n)
> local cvgts;
> convert(L2C(L),confrac,cvgts);
> cvgts[n];
> end:
If δ is the Dirac Delta, the Google Scholar Publication Eigenspectrum of an author x shown on the above table then, is the convergent pulse train or Dirac Comb:
The Google Scholar Publication Signal or Publication Eigensignal of an author in the time domain then, will be the Inverse Fourier Transform of the author's Eigenspectrum:
The latter evaluates to:
If C(x) is rational the sum will consist of a finite number of terms and hence the Eigensignal will be periodic in the time domain[12]. If C(x) is a quadratic irrational the sum will consist of infinitely many terms and the eigensignal will not be periodic in the time domain.
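Written out in a form consistent with the Maple procs S and AS used below, the three statements above read as follows. The symbols S_x and s_x are labels chosen here for illustration, not necessarily the author's notation, and the last equality is the sifting property of δ:

```latex
\begin{align*}
S_x(\xi) &= \sum_{n} \delta\bigl(\xi - C_n(x)\bigr), \\
s_x(t) &= \int_{-\infty}^{\infty} S_x(\xi)\, e^{2\pi i t \xi}\, d\xi
        = \sum_{n} e^{2\pi i\, t\, C_n(x)} .
\end{align*}
```

The real and imaginary parts of s_x(t) are therefore sums of cos(2πtCn(x)) and sin(2πtCn(x)) over the convergents.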
Let's calculate the real and imaginary components of the eigensignal for this author with Maple, by considering an approximation with the periodic part of C(x) repeating 4 times:
> L:=[1,13,10,4,17,3,17,3,17,3,17,3];
> S:=proc(L,xi)
> local i;
> # Dirac comb with one spike at each convergent of L
> add(Dirac(xi-C(L,i)),i=1..nops(L));
> end:
> AS:=t->Int(S(L,xi)*exp(2*Pi*I*t*xi),xi=-infinity..infinity);
> with(plots):
> rexpr:=t->evalf(Re(AS(t)));
> imexpr:=t->evalf(Im(AS(t)));
> plot(rexpr(t),t=0..10*T);
> plot(imexpr(t),t=0..10*T);
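Because the spectrum is a finite sum of Dirac deltas, the integral in AS collapses to a finite sum of complex exponentials, one per convergent. The Python sketch below evaluates that sum directly. It is an illustration under that assumption, and the names convergents and eigensignal are hypothetical.

```python
import cmath
from fractions import Fraction

def convergents(terms):
    """Yield the convergents of [a0; a1, a2, ...] as exact fractions."""
    h_prev, h, k_prev, k = 0, 1, 1, 0
    for a in terms:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)

def eigensignal(terms, t):
    """Sum of exp(2*pi*i*t*C_n) over all convergents C_n of the list."""
    return sum(cmath.exp(2j * cmath.pi * t * float(c)) for c in convergents(terms))

L = [1, 13, 10, 4, 17, 3, 17, 3, 17, 3, 17, 3]   # same list as in the Maple session
for t in (0.0, 0.25, 0.5):
    s = eigensignal(L, t)
    print(t, s.real, s.imag)   # real and imaginary eigensignal at time t
```

At t=0 every exponential equals 1, so the real part is the number of convergents and the imaginary part vanishes.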
The two signals can now be approximated via the Fourier Series. For the real Eigensignal the Fourier Coefficients are given as:
>a0:=2/T*Int(rexpr(t),t=0..T):
>a:=n->2/T*Int(rexpr(t)*cos(2*Pi*n*t/T),t=-T/2..T/2):
>b:=n->2/T*Int(rexpr(t)*sin(2*Pi*n*t/T),t=-T/2..T/2): #evaluates to 0
These evaluate to functions of the convergents of C(x)[13].
If C(x) is a quadratic irrational, the author's Google Scholar Eigensignal will be "almost" periodic, with a minimal period T=1/Cn(x), which can be computed with Maple:
> T:=1/C(L,nops(L));#find minimal period! (which we used in calculations above)
Now we can construct a Fourier Series approximation for the real Eigensignal with Maple:
> F:=(m,t)->a0/2+add(a(n)*cos(2*n*Pi*t/T)+b(n)*sin(2*n*Pi*t/T),n=1..m):
The two plots:
> p1:=plot(rexpr(t),t=0..T,color=red):
> p2:=plot(F(12,t),t=0..T,color=green):
> display(p1,p2);
The harmonics of the author's real Eigensignal will then be an and the amplitude of the harmonics will be given as cn=|an|. The author's Harmonic Spectrum with up to 10 harmonics is shown below.
> eps:=1e-1;
> PS:=[[[eps,0],[eps,evalf(abs(a0))]],seq([[eps+n,0],[eps+n,evalf(abs(sqrt(a(n)^2+b(n)^2)))]],n=1..10)]:
> plot(PS,n=0..10);
The 0-th harmonic a0 (dc-term) is the red line (almost suppressed). The amplitude of the dominant harmonic (green), which corresponds to the dominant spectral line shown on the author's Eigenspectrum on the table above, is |a1|~11.95 and the author's Eigensignal broadcasts at a frequency f=1/T=C(x)~1.082 Hz.
We now have a Fourier Series approximation. Let's recover the Google Eigenspectrum from it.
> SSS:=xi->evalc(Re(int(F(12,t)*exp(-2*Pi*t*xi*I),t=-10..10))):
> plot(SSS(xi),xi=1..2);
The above is an approximation of the Google Scholar Eigenspectrum for this author. The function jumps very high exactly at the convergents Cn(x) and in particular at C2(x):
> evalf(SSS(C(L,2)));
119.3746313
We therefore have verified the commutativity in the following diagram:
The publication Eigensignal's minimal period is roughly the time between two adjacent publications. For this author:
> evalf(T*365/30);
11.23774294 (months)[15]
Accordingly, one can now define the blue-shift of an author x relative to another author y, via the Google Scholar Metric, as d(x,y)=|C(x)-C(y)|. For example:
> LG:=[1,13,10,4,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17,3,17];#author
> LP:=[1,22,291,169,56,52,119,159,256,159,256,159,256,159];#author's advisor
> BS:=evalf(abs(L2C(LG)-L2C(LP)));
BS:=0.03090244910
The author's advisor is blue-shifted 0.03090244910 more than the author. If x is more blue-shifted than y, then x has worked harder than y. Notice how high the blue-shifts of the big guns in Mathematics are.
Alternatively one may define an author's red-shift relative to unity. For example:
> RSG:=evalf(L2C(LG)-1);
> RSP:=evalf(L2C(LP)-1);
0.7634989575e-1
0.04544744665
The author's advisor is 0.04544744665 and the author is 0.07634989575 red-shifted away from unity. Smaller red-shifts mean more work.
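Both shifts are simple differences of continued fraction values, so they are easy to reproduce outside Maple. The Python sketch below is an illustration only: cf_value is a hypothetical helper, and the periodic block of the author's list is repeated ten times here, an arbitrary truncation that does not change the digits shown.

```python
from fractions import Fraction

def cf_value(terms):
    """Exact value of the finite continued fraction [a0; a1, ..., an]."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# Partial quotients from the Maple session (periodic tails truncated).
author = [1, 13, 10, 4] + [17, 3] * 10 + [17]
advisor = [1, 22, 291, 169, 56, 52, 119] + [159, 256] * 3 + [159]

blue_shift = float(abs(cf_value(author) - cf_value(advisor)))   # d(x,y)
red_author = float(cf_value(author) - 1)                         # C(x) - 1
red_advisor = float(cf_value(advisor) - 1)

print(f"{blue_shift:.10f} {red_author:.10f} {red_advisor:.10f}")
```

Since both values exceed 1, the blue-shift between two authors equals the difference of their red-shifts.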