This series returns by popular demand (that is, because of one email to the author). If you have no idea what this is about, please consult parts one, two and three.
As I mentioned in my last post on the Good Judgment Project (GJP), in season 3 I was part of a team of “super forecasters”. My team did OK, but we were significantly less successful than the other “supers”. This season has now come to an end and the project is about to launch the next one in August. I’d like to offer some reflections.
What went well?
We were able to exchange information across “super” groups, and some people were really impressive. I saw spreadsheets and discussions that were much more sophisticated than I had expected in this “just for fun” setting. Apparently, the increasingly challenging task (with many rather tough questions and a high workload) really sparked the participants’ ambitions.
From what I could see in the cross-team forum, communication was very lively and mostly helpful. But I’m not sure how much it really mattered in the end. Given the number of questions and the old-fashioned bulletin board format, it was hard to keep up with every possibly relevant bit of information.
My own team brought together people from very different backgrounds, and it was nice to have a round of introductions. Given that the questions belong to different clusters, we proceeded to assign everyone two areas to prioritize when researching and answering questions.
What didn’t go so well?
Judging from the experience in my particular group, not everything was “super” in the end. My new group did not invest much more time and effort than my previous, “normal” teams. After all, the incentives and potential pitfalls are very similar: if communication does not yield results, people stop typing long messages; if there are only three or four active users, the team cannot perform as well; high cognitive load due to many open questions can be discouraging.
One innovation for season 3, the use of (paid) facilitators to ensure smoothly working teams, fell completely short, at least in our group. In theory, the facilitator was supposed to help us coordinate tasks and make sure that no items were forgotten. But the person in charge did not really live up to that promise, and the only email I can remember getting from him is a goodbye note. This might be worth trying again, though.
Finally, one lesson I draw from my experience in the newly established and ultimately lowest-ranking “super” team: Being put in an environment that’s supposedly excellent and seeing the amount of work and experience the other teams bring to the table can be intimidating. I’m not sure how much work the GJP organizers should put into creating a positive team spirit and providing regular feedback, but at least in our case it might have helped.
The future of forecasting
I’m curious to see whether season 4 will be able to push the limits of what works in a “just for fun” effort. (After all, teenagers in World of Warcraft guilds spend a lot of time on coordination and planning, too.) It seems that the Good Judgment Project operators are considering some changes to the user interface to help manage the workload. I agree that there is some room for improvement.
I would love to hear what the GJP researchers have to say on the merit of inter-group exchange. Theoretically, it could either lead to groupthink or help everyone improve through efficient information sharing. More generally, I am not sure to what extent the “meta” discussions and well-meaning exchange of tips that took place between teams are at odds with optimizing performance. Given that everyone has limited resources, maybe this process should be streamlined and formalized. Easy sharing of links to news sources is probably a good idea.
Of course none of that can change the fact that many questions are just impossible to answer with any kind of certainty. For example: predicting the behavior of small groups with secretive proceedings (Vatican, North Korea, Taliban…) or factoring in different layers of scientific and political uncertainty (“will the Swiss lab report that Arafat’s body contained a significantly elevated level of polonium-210?”). But it’s still fun to try your best, and I’m looking forward to season 4.
I may be the only one who has followed this complete series! I joined the GJP in Season 3 and continued in Season 4. After the first year, my score for accuracy put me in the 77th percentile, whereas this year I was in the 96th. Obviously, a year’s experience helped the second time round, as well as a greater emphasis on the process of forecasting rather than the result. However, in my opinion the main reason for my improvement was that I was in a more engaged team. In Season 3, there were no more than 4 regular forecasters (out of 13), whereas in Season 4 there were at least 7 out of 10.
Khalid, good to hear from you! I hope I’ll have part 5 ready soon, with some reflections on the final season.