Monday, September 12, 2011

Is Distributed Computing Really Citizen Science?

For the past two weeks we've been working on a definition for citizen science.  After much research and many good reader comments we came up with a pretty good definition last week.  It covers the large range of activities performed by citizen scientists while still providing room for "professional" scientists to be involved as well. 

Most people were very happy with this answer, but one commenter brought up an issue that deserves a much fuller discussion.  He noted that he considers "...citizen scientists to be providing either brute force observation or brute force analysis. I think this would (rightly) exclude someone, for example, who installs SETI@home (or similar application) as they are not providing a service that couldn't be provided by a super computer."  My initial reaction was to disagree, as this blog has treated SETI@home and other Distributed Computing projects as Citizen Science.  But it definitely merits further discussion.

First of all, there are a few reasons weighing against Distributed Computing as part of Citizen Science.
  • Lack of Scientific Involvement: This argument can be made in two ways.
    • In the first, citizen users are not actively participating in science by running these programs. They are just observers donating resources to the cause. They do not perform any independent research, develop hypotheses, make predictions, design the program, or analyze the results of the program.  Basically, they are not involved with any part of the scientific method.  This is the argument our commenter initially made.
    • There is also the related argument that the algorithms being run are not scientific, or at least are not part of the scientific method. The computer program itself is not engaged in research, hypothesis development, or analysis. It is just running numbers in the exact same way it does to run an accounting spreadsheet or even play a YouTube video. When it comes down to it, all those videos are just computer calculations displayed as cute videos. So unless the Distributed Computing program performs some task besides number-crunching, it would not be scientific in nature.
  • Non-Scientific Purpose: Many Distributed Computing projects do not have a scientific focus. When used for data mining stock tables to find a hot tip, or to crack a heavily-encrypted message, it is not being used to understand natural phenomena and is thus not scientific. So it is a tool useful for science, but it is not its own field or branch of (Citizen) science. This view treats distributed computing like spreadsheet analysis...useful for science but not a science itself.

Then there are a number of reasons I've included it so far, reasons that merit continuing to consider Distributed Computing a part of Citizen Science.
  • Historical: Over the last 15 years distributed computing projects have been some of the most-cited examples of Citizen Science in both scientific journals and mainstream media.  Much of the credit goes to SETI@home, which re-invigorated the field when it began in 1999.  Its easy-to-use and simple-to-understand design gained the project rapid public recognition.  Writers have built upon this recognition as an effective way to describe the Citizen Science concept.  So despite being twelve years old it is still one of the first examples writers use when describing Citizen Science.  Rightly or wrongly, the common convention has become to include Distributed Computing as part of the field.
  • Common Uses:  A quick look at lists of distributed computing projects shows that a large number (usually a majority) are used for "traditionally" scientific projects.  These chemistry, biology, pharmacology, and physics projects are designed to explore and test hypotheses according to the scientific method.  Distributed computing can be applied to many other fields too, such as computer science, mathematics, cryptology, and economics, that many people consider part of the sciences (and I'm not getting into that argument here).  But even taking a highly conservative approach that doesn't count these as sciences, distributed computing is clearly an important scientific tool in the more "traditional" areas.
  • Ladder to Increased Citizen Involvement:  The lack of personal involvement that some see as a negative can also be considered a strength.  Looking at the Citizen Science field there are many ways to get involved and many different levels of involvement.  The highest level of involvement usually requires large amounts of time, energy, dedication, and knowledge, and it takes a while to get to that point.  Many users instead play around with many different projects until they find the ones that are most interesting, then spend most of their time on just those few.  So we need to see Distributed Computing projects as part of a ladder to increasing involvement.  Even if it's just the first rung, we shouldn't discount this starting point for public scientific involvement.
  • Potential for Expanded Uses:  Some of the criticisms of Distributed Computing revolve around its lack of involvement by users.  Once a program is downloaded the computer does all the work...there is no intervention by the user.  Although that's the current model, I don't believe it will remain that way.  There's no reason Distributed Computing can't involve the user more and combine the best aspects of human creativity with the brute-force calculations of computers.  In this scenario a problem is presented that requires a user to make certain assumptions, hypothesize the starting value of certain variables, or do independent research that sets the project's initial state.  Once that is done a computer can run a series of time-consuming algorithms, reporting both the initial conditions and the results to a central server.  For example, this could work well for developing a global climate model.  The computer program can crunch numbers and provide 50 years' worth of temperature projections, but it still requires input of initial temperatures and of the relative impact of variables on each other.  So a person can guess that humidity is twice as important as cloud cover, or can do independent research to set better starting points for current temperatures.  Then the computer can run the program based on those conditions.  Or once the computation finishes the user can analyze the final product and modify assumptions in case the results seem faulty.  All of these future implementations would allow Distributed Computing models to include much more significant user involvement than is currently provided.
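
To make that last idea a little more concrete, here is a rough Python sketch of the human-plus-computer workflow. It is only a toy illustration and not a real climate model: the function names, the weights, and the yearly update rule are all made up for this example, and a real project would define its own.

```python
import json


def run_projection(initial_temp_c, humidity_weight, cloud_weight, years=50):
    """Toy stand-in for the time-consuming computation (NOT real climate science)."""
    temps = [initial_temp_c]
    for _ in range(years):
        # Hypothetical update rule: the yearly change depends on the
        # relative weights the volunteer chose for each variable.
        change = 0.01 * humidity_weight - 0.005 * cloud_weight
        temps.append(temps[-1] + change)
    return temps


def build_report(assumptions, temps):
    """Bundle the volunteer's assumptions with the results for a central server."""
    return json.dumps({
        "assumptions": assumptions,
        "years": len(temps) - 1,
        "final_temp_c": round(temps[-1], 2),
    })


# The human step: a starting temperature plus a guess that humidity
# is twice as important as cloud cover.
assumptions = {"initial_temp_c": 14.0, "humidity_weight": 2.0, "cloud_weight": 1.0}

# The computer step: crunch the numbers, then package both the
# initial conditions and the results for upload.
temps = run_projection(**assumptions)
report = build_report(assumptions, temps)
print(report)
```

If the results look faulty, the volunteer can change the values in `assumptions` and run it again, which is exactly the kind of user involvement today's projects lack.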

Looking over these arguments I'm still strongly swayed that Distributed Computing does belong as an important aspect of Citizen Science.  Some of the strongest arguments are for areas where the field may lead and the potential it has.  Just because we are in the field's (relatively) early stages is no reason to discount it as a science.  Just in writing this post the answer became increasingly undeniable to me (hopefully that didn't introduce a bias in my arguments).  I'm also loath to go against the many experts who agree that it's a science, though that's not my ultimate argument.  At the end of the day Citizen Science is all about incorporating unlikely elements into scientific research, and recognizing the scientific value of including everyday people.  So it seems disingenuous to deny Distributed Computing fans the title of Citizen Scientist.  Their efforts deserve recognition too.

So those are my thoughts.  But does anyone out there disagree?  I've tried to provide some arguments on both sides though there may be many more I've missed.  Or I may have misconstrued some arguments already out there.  So let me know about it in the comments below.  I'd love to see the conversation continue.
