March 31, 2009

Scientific user-generated content

On Monday we had a 4.3 earthquake centered near San Jose. It was the strongest earthquake I recall in 15 years, probably because I was on the 5th floor of a concrete building that swayed for more than 10 seconds.

I quickly went to earthquake.usgs.gov to verify the magnitude. What I hadn’t noticed before is that the site also has an automated system for gathering and displaying citizen responses. The system gathers location data (by zip or street address) and then walks through a structured questionnaire to classify local intensity from I (not felt) to X (very heavy damage).

This is a great example of user-generated content, in some ways better than Wikipedia. There are fewer motivations for bias (than, say, editing a post on abortion or George Bush). A large number of reports arrive quickly, which tends to minimize the effect of error by any one contributor.

Most importantly, unlike Wikipedia, earthquake observations require no coordination or personal editing to combine the disparate contributions into a coherent whole.
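To make the point concrete, here is a minimal sketch of how independent reports could be combined with no human editor in the loop. It is not the USGS method; the function name `aggregate_intensity`, the `min_reports` threshold, and the sample data are all made up for illustration. It assumes each report is just a zip code and a numeric intensity, and it takes the median per zip code so that a single exaggerated report has little effect.

```python
from collections import defaultdict
from statistics import median


def aggregate_intensity(reports, min_reports=3):
    """Return the median reported intensity per zip code.

    `reports` is an iterable of (zip_code, intensity) pairs.
    Zip codes with fewer than `min_reports` reports are left out,
    since one or two reports are too few to trust.
    (Hypothetical helper, not the USGS algorithm.)
    """
    by_zip = defaultdict(list)
    for zip_code, intensity in reports:
        by_zip[zip_code].append(intensity)

    return {
        zip_code: median(values)
        for zip_code, values in by_zip.items()
        if len(values) >= min_reports
    }


# Invented sample data: intensities as integers (IV -> 4, and so on).
reports = [
    ("95112", 4), ("95112", 5), ("95112", 4), ("95112", 9),  # one exaggerated report
    ("94301", 3), ("94301", 3), ("94301", 2),
    ("94102", 2),  # only one report, so this zip is skipped
]

print(aggregate_intensity(reports))
```

The median is only one possible choice. It is used here because it stays close to what most contributors reported even when one report is far off.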

Will we see self-reported epidemiology? Alas, between hypochondria and the litigation lottery mentality in this country, there is a much higher risk of self-serving bias for such reports than for earthquakes.

2 comments:

Jeremy said...

This seems like a pretty interesting area for user-driven content, but the question that then comes to mind is: what enterprising soul is going to come up with some actually cool uses for it? And also, how do you get people to contribute and care outside of an "event"?

Joel West said...

I think the latter is an example of the strength of this kind of UGC.

The contributions happen during a particular time, they are auto-tied to a specific locus (in this case a seismic event), and users have a high level of motivation to contribute. I think the immediacy and motivation also reduce some of the bias risks.

To me, the question is perhaps how to aggregate this event-driven content in new ways. Could we determine that certain regions of the Bay Area are more sensitive than others? This could suggest other areas prone to liquefaction or other local effects, which was a huge issue in Loma Prieta.

The other question -- perhaps to your point -- is how this generalizes beyond the government gathering data on natural disasters. (I could see this working well with tornadoes, unless of course your PC was in the mobile home that got decimated by the twister.)