I was curious about how people used the internet. Specifically, I wanted to see how internet behavior changed over the course of a day. Search engines are the gateway to the internet for most people, and so search queries provide insight into what people are doing and thinking. I had several assumptions before I started:
I was curious to see if data from search engines would support my anecdotal observations. I built a simple clock-like visualization that displays the top search terms over a 24-hour period. Displaying search terms in a cyclical layout (like a clock) allows continuous examination of trends that would otherwise be broken up at midnight. The data I had access to was both large and noisy. In response, I combined hourly data into weekly or yearly averages. All search strings were broken up into single words (periods, commas and similar punctuation were treated as whitespace). This helped pool frequent terms and better illuminate search motivation (e.g. "information about taxes" and "information about chinchillas" counted as two hits for "information"). The top five search terms were shown for each hour, sized to reflect their relative frequency (larger = more popular). A list of stop words was developed to eliminate uninteresting terms (e.g. that, for, an, not, free). Otherwise, I have not modified the data in any way – you see it as it is.
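The word-splitting and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code behind the visualization: the function names, the (hour, query) input format and the ASCII-only word pattern are assumptions, and the stop list contains only the five examples mentioned in the text.

```python
import re
from collections import Counter, defaultdict

# Assumption: only the five example stop words named in the text;
# the full list used for the visualization is not given.
STOP_WORDS = {"that", "for", "an", "not", "free"}


def tokenize(query):
    """Lowercase a query and split it into words.

    Anything that is not an ASCII letter or digit (periods, commas,
    spaces, ...) is treated as whitespace. This is a simplification.
    """
    return [word for word in re.split(r"[^a-z0-9]+", query.lower()) if word]


def top_terms_by_hour(records, n=5):
    """Count single words per hour and keep the n most frequent.

    records: iterable of (hour, query) pairs, with hour in 0..23.
    Returns {hour: [(word, hits), ...]} sorted by descending hits.
    """
    counts = defaultdict(Counter)
    for hour, query in records:
        for word in tokenize(query):
            if word not in STOP_WORDS:
                counts[hour][word] += 1
    return {hour: counter.most_common(n) for hour, counter in counts.items()}


if __name__ == "__main__":
    sample = [
        (9, "information about taxes"),
        (9, "information about chinchillas"),
        (9, "free games for kids"),
    ]
    print(top_terms_by_hour(sample))
```

With the sample input, "information" is counted twice for hour 9, while "free" and "for" are dropped by the stop list.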
Some might be wondering if international users in different time zones affected the search distribution. They probably did. However, my guess is that most users were based in North America (especially for Magellan in the late 90s and AOL in general). The data seems to support this as well, with search activity slowing down at night (western hemisphere time).
I ran the visualization with two unique data sets:
Magellan Voyeur Data Visualization
Magellan, search engine of yesteryear, offered a service called Voyeur, which displayed the last 10 search queries. Brian Amento of AT&T Labs archived this data in 10-minute intervals from 1997 to 2001. There are gaps in the data set from outages and changes to the Voyeur service. However, these events are assumed to be random, and thus have little impact on the distribution of search terms. Furthermore, because the data spanned a four-year period, I combined hourly data into yearly averages, which further helped to compensate for gaps and noise.
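One way to combine gappy 10-minute snapshots into yearly averages per hour is sketched below. It is an illustration under stated assumptions, not the original processing: the input format (a timestamp plus the words seen in that snapshot) and the choice to divide by the number of snapshots actually collected in each (year, hour) bucket are both hypothetical. Dividing by the snapshot count is what keeps outages from deflating an hour's totals.

```python
from collections import Counter, defaultdict
from datetime import datetime


def yearly_hourly_averages(snapshots):
    """Average word hits per snapshot for every (year, hour) bucket.

    snapshots: iterable of (timestamp, words) pairs, where timestamp is a
    datetime and words is the list of terms seen in that snapshot.
    Returns {(year, hour): {word: mean hits per snapshot}}.
    """
    totals = defaultdict(Counter)  # summed hits per bucket
    seen = Counter()               # snapshots actually collected per bucket
    for timestamp, words in snapshots:
        key = (timestamp.year, timestamp.hour)
        totals[key].update(words)
        seen[key] += 1
    return {
        key: {word: hits / seen[key] for word, hits in counter.items()}
        for key, counter in totals.items()
    }


if __name__ == "__main__":
    sample = [
        (datetime(1997, 5, 1, 9, 0), ["weather", "weather", "news"]),
        (datetime(1997, 8, 3, 9, 10), ["weather"]),
    ]
    print(yearly_hourly_averages(sample))
```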
This data set is interesting for a few reasons. Foremost, it is more than a decade old. People were searching for different things back then, and it shows. Secondly, the data spans a multi-year period, which helps accentuate overarching trends. Lastly, and perhaps most importantly, Magellan was used to search for a variety of content by a diverse user group (including people at work, unlike the AOL data set).
The innermost ring is the average for 1997. Rings then work outward one year at a time until 2000. 2001 was not included because only a fraction of the year was collected. Font size scales linearly with the number of times the term appeared in that hour (e.g. 100 hits = Courier size 100). Time is EST.
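The ring layout and linear font sizing can be sketched as a small placement function. The radii, the default of one point per hit, and the choice to put midnight at the top of the clock with hours running clockwise are assumptions made for illustration; the text only states that rings run outward by year and that font size is linear in hits.

```python
import math


def place_term(hour, ring, hits,
               inner_radius=150.0, ring_spacing=100.0, points_per_hit=1.0):
    """Return (x, y, font_size) for one term on a 24-hour clock face.

    hour: 0..23; ring: 0 is the innermost ring (e.g. year - 1997).
    Coordinates are relative to the clock center with y pointing up.
    Assumption: midnight sits at the top and hours advance clockwise.
    """
    angle = 2 * math.pi * (hour % 24) / 24
    radius = inner_radius + ring * ring_spacing
    x = radius * math.sin(angle)
    y = radius * math.cos(angle)
    font_size = points_per_hit * hits  # linear: 100 hits -> size 100
    return x, y, font_size


if __name__ == "__main__":
    # A term with 100 hits at midnight in the 1997 (innermost) ring.
    print(place_term(hour=0, ring=1997 - 1997, hits=100))
```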
I could explain every trend for you, but half the fun is exploring the data! For those who are lazy, here are some major (and obvious) trends to get you started:
1999 & 2000:
AOL Data Visualization
The AOL data set will live in infamy for its much-hyped breach of privacy. The data is a nice complement to the Voyeur data set as it is different in several important ways. First, it is significantly larger (~30 million search queries). Secondly, the data was collected from March to May, 2006, a three-month period, and for a subset of users. Third, AOL caters to a very different user demographic; it is primarily targeted at home users, and thus search queries seem to reflect more personal and less work-related topics. Adding to this difference is the fact that the population on the internet has changed dramatically since the late 90s.
This data only spans nine weeks, and searching trends seem to have changed little over this period (unlike the drastic differences in the multi-year Voyeur data set). However, this is not necessarily bad – it simply shows that searching behavior on a weekly basis is not that volatile.
The most obvious trend is that myspace is popular - searches for the social website increase as people get home from work and fall off as people go to bed.
Perhaps more revealing are the second through fifth search terms. eBay picks up in the afternoon and evening, as one would expect. Entertainment-related terms (lyrics and games) grow from 4pm onwards until bedtime. Sex and other porn-related terms are prevalent at night, starting around 11pm, although their frequency pales in comparison to daytime searches. Civic terms, such as state, county, gov and Florida, are surprisingly ubiquitous, although mostly popular during the workday. Is AOL's average user a retired Floridian?
There are a few week-specific blips. Some are explainable, such as "Easter" and "happy" (see 24:00 hours on the 6th ring out, aka the week before Easter in 2006). I have no clue why "profileedit" and "myspace" become so popular in the 8th week (22-23:00 hours). Adultfriendfinder(.com) is also popular for a week (5th ring out, 3am-6am).
© Chris Harrison