Chris Harrison

Royal Society Archive Visualization: 1665-2005

Introduction

The Royal Society recently provided access to an archive of papers published in the scientific academy's prestigious journals. Some 25,000 scholarly works, dating from 1665 to 2005, are included. Many notable scientific minds appear, including Isaac Newton, Michael Faraday and Charles Darwin. This interesting data set was ripe for some visual tinkering. The database I used was put together by Brian Amento and Mike Yang of AT&T Labs.

The images are extremely large due to the huge volume of content and the need for high-resolution printouts. The entire timeline has been segmented into 10 sections. Medium resolution versions (5000x500 pixels) are linked from the thumbnails. Contact me for high or custom resolution versions. I think this would be a unique and educational installation for a hallway or ceiling. The length could range from 10 feet to 10,000 feet (I can render at any resolution).

The following journals are included:

  • Philosophical Transactions of the Royal Society A (1665-2005)
  • Philosophical Transactions of the Royal Society B (1887-2005)
  • Proceedings of the Royal Society A (1800-2005)
  • Proceedings of the Royal Society B (1905-2005)



Author Distribution

This visualization displays papers chronologically. Paper titles radiate downward from the vertical midpoint at a 45 degree angle. Within a single year, papers are sorted alphabetically. The year a volume was published is shown, centered among its respective block of papers. The size of the year label varies linearly with the number of papers published in that year's volume. Authors are shown radiating upwards from the vertical midpoint at a 45 degree angle. Their positions are computed by calculating the average position of the papers they authored. The size of the author's name reflects how prolific they were (linear relationship). Essentially, author names are "centered" above the time period they were active.
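The author placement rule can be sketched in a few lines of Python. This is only an illustration, not the original rendering code: the `papers` input format, the function name `author_layout`, and the `base_size` and `scale` parameters are all assumptions made for the example.

```python
from collections import defaultdict


def author_layout(papers, base_size=6.0, scale=1.5):
    """Compute a horizontal position and font size for each author.

    papers: list of (x_position, [author names]) tuples (hypothetical format).
    Returns {author: (mean_x, font_size)}.
    """
    # Collect the x position of every paper each author contributed to.
    positions = defaultdict(list)
    for x, authors in papers:
        for name in authors:
            positions[name].append(x)

    layout = {}
    for name, xs in positions.items():
        mean_x = sum(xs) / len(xs)               # "centered" over their papers
        font_size = base_size + scale * len(xs)  # linear in paper count (assumed constants)
        layout[name] = (mean_x, font_size)
    return layout
```

With two papers at x = 10 and x = 30, an author on both would be placed at x = 20 and drawn larger than an author who appears on only one of them.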

Technical Note: Many of the papers in the Royal Society database are missing author names. This is probably because of the labor needed to copy them from the old texts. In addition, names vary in format and spelling. For example, Edmond Halley is also spelled Edmund Halley, E. Halley and Edm. Halley. To compensate for the latter, names were truncated to single-letter first names and full last names (e.g. E. Halley). However, this reduces uniqueness, increasing the likelihood of collisions between different authors who share an initial and a last name. To avoid biasing the computation of average dates, a filtering process is applied. The process is roughly as follows: The standard deviation of an author's paper dates is computed. If the standard deviation is large (which indicates multiple prolific authors active in different periods), the name is simply excluded. However, if the standard deviation is sufficiently small, the average date is recomputed excluding outliers. This is often the case if there is one major author and one or more lesser authors.

It's really interesting to explore these images! For example, the first section (1665-1710) has Edmond Halley (of Halley's Comet fame), Isaac Newton, Antony van Leeuwenhoek (a pioneer of microscopy) and other famous scholars.
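The name truncation and date filtering described in the technical note could look roughly like the sketch below. The text only says the process is "roughly" as described, so the function names and the two thresholds (`max_std` and `outlier_k`) are assumptions for illustration, not the values actually used.

```python
import statistics


def normalize_name(full_name):
    """Reduce a name to first initial plus last name, e.g. 'Edm. Halley' -> 'E. Halley'."""
    parts = full_name.replace(".", " ").split()
    if len(parts) < 2:
        return full_name.strip()
    return f"{parts[0][0].upper()}. {parts[-1]}"


def filtered_mean_year(years, max_std=40.0, outlier_k=1.5):
    """Return the average year for one normalized name, or None if excluded.

    max_std and outlier_k are assumed thresholds, chosen only for this sketch.
    """
    if not years:
        return None
    if len(years) == 1:
        return float(years[0])

    mean = statistics.mean(years)
    std = statistics.pstdev(years)

    # Large spread: probably several different authors sharing one name.
    if std > max_std:
        return None
    if std == 0:
        return float(mean)

    # Small spread: drop outliers, then recompute the average.
    kept = [y for y in years if abs(y - mean) <= outlier_k * std]
    return float(statistics.mean(kept))
```

Under these assumed thresholds, a name with papers in 1686, 1687, 1690, 1692 and a stray 1750 keeps its place near 1689, while a name split evenly between the 1670s and the 1900s is dropped.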

What does this show? Well, you should take a look yourself. Here are some obvious observations:

  • Paper titles generally become shorter over time.
  • In 1763, the journal becomes a yearly publication, causing the number of publications per volume to drop.
  • There is a drastic drop in the number of papers published during World War II (1939-1945).
  • There is a boom of papers starting in 1965 and continuing to the present day. Barend Erasmus emailed me; he believes this was caused by large injections of government funding into science and engineering during World War II and the Cold War (e.g. Apollo program). David Hagen suggests the launch of Sputnik 1 in 1957 + extra science funding + 8 year lag in Ph.D. graduations = climb from 1965 onwards.
  • There are numerous prolific authors in the 1700s, who are unmatched in any other time period. Notable: Edmond Halley, Joseph Banks, William Herschel, Everard Home, William Watson, and John Desaguliers.



Word Distribution

This visualization has the same visual characteristics as the author distribution (above). However, instead of authors, this visualization explores the distribution of words in publication titles. Word size is determined with a square root function, which helps dampen extremely common words (e.g. 'the' and 'of'). Only words used three or more times are shown. It's interesting to see how words evolve and fields like photography and electronics emerge.
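A minimal sketch of the word sizing, assuming a simple lowercase tokenizer and a hypothetical `scale` constant (neither is specified above):

```python
import math
import re
from collections import Counter


def word_sizes(titles, min_count=3, scale=4.0):
    """Map each sufficiently common title word to a font size.

    The tokenizer and the scale factor are assumptions for this sketch.
    """
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[a-z]+", title.lower()))

    # Square root scaling: a word used 100 times is drawn 10x larger
    # than a word used once, not 100x larger.
    return {
        word: scale * math.sqrt(count)
        for word, count in counts.items()
        if count >= min_count  # only words used three or more times
    }
```

Compared with linear sizing, the square root keeps words like 'the' from dwarfing everything else while still ranking words by frequency.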

Some interesting and popular words with their average year:

Fathoms - 1670
Voyages - 1686
Thereupon - 1702
Observationum - 1714
Refractions - 1742
Love - 1751
Intergalactic - 1823
Photographic - 1862
Cadmium - 1870
Magnetic - 1898
Deuterium - 1920
Circuit - 1925
Microprocessors - 1956
Nanotechnology - 2004
Terahertz - 2004

Special Note: Average location can be deceiving. Words can have parabolic or other irregular distributions, which cause them to "center" above a time period that may have no relevance. However, after an inspection of the data, I believe this is a limited problem, affecting a small minority of words.
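To make the caveat concrete, here is a small hypothetical check (not part of the original pipeline) that measures how far a word's average year sits from the nearest year it was actually used:

```python
import statistics


def mean_and_nearest_gap(years):
    """Return (mean year, distance from the mean to the closest actual usage)."""
    mean = statistics.mean(years)
    gap = min(abs(y - mean) for y in years)
    return mean, gap
```

For a word used in 1700, 1705, 1710, 1900, 1905 and 1910, the mean is 1805, which is 95 years away from any actual use. A large gap like this would flag a word that is "centered" over a period where it never appears.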



Combination

I considered several designs for combining the author and word distributions into a single timeline. Ultimately, I settled on the design below. However, from a visualization viewpoint, this is far less understandable because of overlapping elements. Since the rendering was already plagued with readability issues, I figured I'd go all out and include almost all keywords and authors regardless of significance. The resulting infographic leans more on the side of aesthetics.

© Chris Harrison