First let's look at the data broadly. The most significant result is no result: nearly fifty percent of the success factors have zero correlation with either the popularity or the scholarly success of these projects. This is not unexpected, and it is part of the reason we separated results for distributed computing projects in the first place, but it still merits discussion. Looking closely at the data, all of these items show absolutely no difference between projects in their success ranking factor. In other words, each project addresses these factors in the same way, or they just aren't expressed at all. So a more limited number of factors differentiate the projects, limiting the potential factors for success. A positive interpretation is that project designers only need to focus on a few key elements to make a highly successful project, since most will not impact the result. However, it could also be interpreted as proof that further research is needed on success factors for distributed computing projects beyond what we've done here. While I do promise to address that in future posts, let's see what the data we do have says.
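To see why so many factors come out as "n/a" rather than as a low number, it helps to remember how a Pearson correlation behaves: when every project scores identically on a factor, the factor has zero variance and the correlation is mathematically undefined. A minimal sketch, using hypothetical scores rather than my actual data set:

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined: every project scores the same on this factor
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example: five projects all address "Stay Focused" identically,
# so the factor cannot explain any difference in their success rankings.
factor_scores = [1, 1, 1, 1, 1]
success_ranks = [5, 3, 1, 4, 2]
print(pearson(factor_scores, success_ranks))  # None -> reported as "n/a"
```

This is why the charts below show "n/a" for factors like "Entertain" and "Stay Focused" rather than a correlation near zero: there is simply no variation to correlate.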
Now let's look at what affects the scholarly impact of distributed computing projects. By far the largest correlation, almost 1:1, is Providing Data Access. This confused me for a while and made me double-check my data to make sure it wasn't just one outlier or a series of unreliable data points causing the trend. And that's what it turned out to be. The only project that does anything different in this aspect than the other projects is SETI@Home, and the overall success of that project is likely driving the correlation. So it may be a real factor, but there is not enough data here to prove it. I also need to review the other projects more closely on this factor to make sure the lack of differentiation seen in my experience truly holds up. That may be a future post as well, and I would encourage anyone who sees differences from their experience to let me know in the comments below.
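The outlier effect described above is easy to reproduce. In the hypothetical data below (not my actual scores), four projects score 0 on a factor and one project, the SETI@Home analogue, scores 1 while also being far more successful; that single point is enough to push the Pearson correlation near 1:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: only one project differs on "Provide Data Access",
# and that same project dominates the success ranking.
data_access = [0, 0, 0, 0, 1]
success     = [1, 2, 3, 4, 10]
print(round(pearson(data_access, success), 3))  # 0.949 -- one point drives r
```

With only one differentiated project, a high coefficient tells us almost nothing; this is why the apparent 1:1 correlation can't be taken at face value.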
"Be Audacious" has a very similar problem. While there are differences with two projects (and not just one), it's still a minor differentiation and nothing I can build a statistical argument on. So we will also need to disregard that item for lack of evidence. What does that leave us with? The correlation chart of success factors for scholarly success below shows the answer:
| Success Factor | Correlation |
| --- | --- |
| Entertain | n/a |
| Reward | 0.006933365 |
| Challenge | n/a |
| Educate | n/a |
| Motivate the User | n/a |
| Create a Community | n/a |
| Interact in Real Time | 0.006933365 |
| Provide Feedback | 0.699078158 |
| Offer Excitement | 0.515624856 |
| Encourage Dialogue | 0.444842746 |
| Provide Data Access | 0.967159265 |
| Allow for Errors | n/a |
| Be Audacious | 0.705797165 |
| Stay Focused | n/a |
| Make it Convenient | n/a |
| Make Learning Easy | 0.567665542 |
| Make Participating Easy | 0.071321992 |
The importance of Providing Feedback sticks out the most to me. This data point shows significant variation, as project designers work to varying degrees to provide timely updates on the project and its results. Some use e-mail newsletters, some update information on their web sites, some provide usage statistics in the program interface, and others use combinations of these techniques. This helps keep projects popular by keeping them in the participants' (and the public's) eye, and it keeps people motivated by showing the benefits they provide to science. But how is this a scholarly success factor?
The important thing to remember is that distributed computing projects typically have a single goal (or set of goals) that doesn't change over time. Researchers design the project for a very particular problem and use the brute force of everyone's computers to solve it. So there is no benefit to a project that is only half-completed: the problem remains unsolved and the work already done gets discarded. If there are not enough participants to see a project through to its end, there will be no results and all the time will have been wasted, meaning no final result and no academic papers. So even though this is a popularity measure, it is vital to scholarly success.
Speaking of popular success, below is the correlation chart for those success factors:
| Success Factor | Correlation |
| --- | --- |
| Entertain | n/a |
| Reward | -0.075083869 |
| Challenge | n/a |
| Educate | n/a |
| Motivate the User | n/a |
| Create a Community | n/a |
| Interact in Real Time | -0.075083869 |
| Provide Feedback | 0.021533374 |
| Offer Excitement | 0.231458187 |
| Encourage Dialogue | 0.052357277 |
| Provide Data Access | 0.274326387 |
| Allow for Errors | n/a |
| Be Audacious | 0.144201148 |
| Stay Focused | n/a |
| Make it Convenient | n/a |
| Make Learning Easy | 0.053787176 |
| Make Participating Easy | 0.077344773 |
The important idea in this chart is the need to "Offer Excitement". This should not be a surprise when you think of how projects gain popularity in the first place. Most bring people in through existing citizen science portals (such as OpenScientist) advertising them or, more likely, through people reading popular press articles about the project. The press loves an exciting story and will focus most on the projects with the best narrative. So in some ways this is not directly related to what we consider scientific success. Except for one important thing.
Remember that from a participant's point of view, none of these projects take significant time, energy, or resources. There is minimal operational effort, and many even share the same infrastructure (such as the BOINC platform). So projects can't differentiate themselves on ease of use or on a network infrastructure effect. Instead they compete on how exciting the project is. The more exciting it is, the more participants will join and the more computing cycles will be performed. So excitement is a necessary component of project success, bringing in a critical mass of participants and ensuring enough computer time will be devoted to the project.