Tuesday, July 10, 2012

Proving My Theories of Success (aka Keeping my Fingers Crossed!)

Over the last few weeks I've laid out my theory for predicting success in citizen science projects.  I've quantified that theory and evaluated the likely success of all the projects highlighted so far on my blog.  I've also collected results data including public popularity (Google hits) and peer-reviewed paper citations (Google Scholar).  I've even scrubbed the data and run some initial tests.  So let's see if my theory holds up.

First, just a quick note on the data.  I'm still scrubbing it a tiny amount as I have time to dig deeper into each result.  So you'll see a new category added in my Google Scholar search footnotes for "individually reviewed".  In these cases a project, such as the Sungrazer project, was providing odd results so I did some much broader searches and manually picked out the relevant hits.  It's time consuming and not necessary for all the projects, but in this case it was useful (and called for by the ambiguous initial results I'd had).

Now back to the results.  First I did some preliminary checks of the hypothesized success ranking against Google popularity.  This was not successful.  There was too much going on and no strong correlations emerged.  I'd go through the results but they're pretty dull and uninformative.  So I next turned to the rankings compared to Google Scholar results.  This was a bit more promising, especially when using the Google Scholar rank (compared to all results) versus hypothesized success score.  The results are below:

There are two main things of interest.  First, note the regression line and its slightly positive slope.  This would tend to disprove my thesis, but look more closely at the groupings.  We have a sizable group of highly successful projects, all with high hypothesized success scores, in the upper right-hand corner, just where they should be under the theory.  But the mass of projects below is what pulls the regression away.  So looking a bit closer and controlling for the type of project, we get a cleaner result that also fits the theory much more nicely.

The most logical place to start is controlling by primary area of science.  Projects within the same area focus on approximately the same things, so they should provide good comparisons.  And indeed they do:

The only problem above is with the Astronomy set which shows only a very small correlation (a zero-slope regression line). I have some answers for that as well, but that will be a whole other posting! I also want to dive even deeper into each success factor to further control our variables.

Overall this is a much better result.  Apart from Astronomy, all the regressions have a negative slope, meaning the higher the hypothesized success score the greater the number of peer-reviewed papers generated by that project.  So clearly there is something going on here.  This also fits in with our earlier discussion of separating project types when analyzing success, such as distributed computing vs. non-distributed computing (aka "interactive") projects.  But what exactly is going on here?  There are still more clues hiding in the data.
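For readers who want to try this kind of by-category check themselves, here is a minimal sketch in Python.  The rows are made-up placeholders, not the actual project data, and the function name `linear_fit` and the tuple layout are assumptions chosen for illustration.  It fits a separate least-squares line to each science area and reports the slope and R-squared, with Google Scholar rank on the y-axis.

```python
from collections import defaultdict


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Placeholder rows: (science area, hypothesized success score, Google Scholar rank).
# These values are invented for the example only.
projects = [
    ("Biology", 4, 9), ("Biology", 6, 7), ("Biology", 8, 4), ("Biology", 10, 2),
    ("Astronomy", 5, 6), ("Astronomy", 7, 3), ("Astronomy", 9, 6),
]

# Group the (score, rank) pairs by science area.
by_area = defaultdict(list)
for area, score, rank in projects:
    by_area[area].append((score, rank))

# Fit one regression line per area.
fits = {}
for area, pairs in by_area.items():
    scores = [p[0] for p in pairs]
    ranks = [p[1] for p in pairs]
    fits[area] = linear_fit(scores, ranks)
    slope, _, r2 = fits[area]
    print(f"{area}: slope={slope:.2f}, R-squared={r2:.2f}")
```

In this toy data the Biology slope comes out negative while the Astronomy slope is flat, which is the same pattern of reading described above: a negative slope on a rank axis means higher-scoring projects sit nearer the top of the paper rankings.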

So far we've used an aggregate hypothesized success score using every single trait I identified and weighting them equally.  But are some more important than others?  Do some have very little impact on project success?  Let me know your thoughts in the comments below and we can check them out in the next set of data runs.
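The equal-weighting idea can be sketched in a few lines of Python.  The trait names and ratings here are hypothetical stand-ins (the real traits are described in the earlier ranking post), and `aggregate_score` is just an illustrative name.  The point is only to show how the same project can score differently once the traits stop counting equally.

```python
def aggregate_score(traits, weights=None):
    """Weighted sum of trait ratings; every trait counts 1.0 if no weights are given."""
    if weights is None:
        weights = {name: 1.0 for name in traits}
    return sum(weights[name] * value for name, value in traits.items())


# Hypothetical trait ratings for a single project (illustration only).
project = {"trait_a": 2, "trait_b": 1, "trait_c": 3}

# Equal weighting, as used for the aggregate score so far.
equal = aggregate_score(project)

# An alternative weighting that doubles one trait and ignores another.
reweighted = aggregate_score(
    project, {"trait_a": 2.0, "trait_b": 0.0, "trait_c": 1.0}
)

print(equal, reweighted)
```

Re-running the regressions with different weightings is one simple way to see which traits actually move the results.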

UPDATE: Hyperlinked to previous posts explaining the concept of "Hypothesized Success Ranking" and added R-squared values to charts to better indicate how well the regression line fits.  My next post will be much more statistically-based so I was holding off the math a bit in this post.  But in hindsight some additional data is necessary. 


  1. What is the source and methodology behind the "hypothesized success ranking"? I may have missed it in an earlier post. If that is the case, I apologize.

  2. Good question...I was referring to the projects scoring highest under my ranking system (http://www.openscientist.org/2012/06/ranking-citizen-science-projects-by.html). Under my theory those should be the most successful, and that's what I'm trying to show. Hopefully. :)

  3. ah yes, this makes good sense. Truly enjoying the exercise.


  6. The next time I read a blog, I hope that it doesn't disappoint me as much as this one. I mean, I know it was my choice to read, but I actually thought you had something interesting to say. All I hear is a bunch of whining about something that you could fix if you weren't too busy looking for attention.

