At the beginning of August, the Good Judgment Project (GJP) kicked off its third round of forecasting. As I’ve described in two earlier blog posts (#1 and #2), the idea is to find the best way to predict the outcomes of political negotiations and elections, or just the likelihood of hypothetical events.
I’ve been promoted to one of eight teams of “superforecasters”. One of the perks coming with this upgrade was seeing the training materials before anyone else. The GJP researchers also invited us to a small conference/workshop in Philadelphia to discuss last season and prepare for the new one. I was unable to attend, but the slides were very informative. In addition, each “super” team now has a facilitator assigned to it, who is meant to help with coordination.
According to the information provided by Phil Tetlock, the GJP convincingly won the first two rounds of the tournament. The four competing research programs have now been shut down, and some of their forecasters have joined this project. The GJP’s repeat success suggests that geopolitical forecasting rewards skill rather than luck. This is supported by the GJP’s internal data: 50 of the 60 top forecasters from season 1 ended up at the top again in season 2. So there appears to be less regression to the mean than one might expect.
There was, however, room for criticism. First, the tournament has so far asked very rigorous questions, but these did not necessarily connect to very relevant problems. Solving this requires thinking in Bayesian terms about the link between individual events and bigger, theory-driven questions. Second, the focus on predicting a specific event might distract from very serious tail risks. Sure, we can be relatively confident in our prediction that a confrontation in region X causing 50 casualties is unlikely, but shouldn’t we spend more time thinking about the potential for a far more deadly incident? Tetlock proposed thinking about early warning indicators that should trigger contingency plans, since just acknowledging that there can always be “black swans” doesn’t really help with planning. So ideally, each forecasting item should be related to (i) its implications for competing hypotheses about broader geopolitical developments, and (ii) an assessment of its possible association with long-term, high-impact risks.
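To make the “Bayesian terms” a bit more concrete, here is a minimal sketch of how a single resolved forecasting item could shift the weight between two competing hypotheses. This is my own illustration, not something taken from the GJP materials: the function name `bayes_update`, the hypothesis labels, and all numbers are made up.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for competing hypotheses.

    priors:      dict mapping hypothesis -> prior probability (should sum to 1)
    likelihoods: dict mapping hypothesis -> P(observed event | hypothesis)
    """
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed event is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical example: two broad views of a region, and one small
# event (say, a border incident) that has just been observed.
priors = {"escalation": 0.2, "status_quo": 0.8}
# Assumed chance of seeing the incident under each hypothesis.
likelihoods = {"escalation": 0.6, "status_quo": 0.1}

posterior = bayes_update(priors, likelihoods)
print(posterior)
```

With these invented inputs, the “escalation” hypothesis moves from 20% to 60%. The point is only that an individual item becomes relevant to a bigger question once you can say how much more likely the event is under one hypothesis than under the other.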
After just two weeks of forecasting (and no resolved items), we don’t have a measure of performance yet. But my team is extremely active in the forum and in the comments section for each forecasting item, which is great! As I have noted previously, interaction within teams is crucial. While there’s always the danger of groupthink and overconfidence, lively discussions are beneficial:
- new information is disseminated quickly
- different styles of analysis are combined: one team member might be good at calculating odds using historical base rates, while another adds on-the-ground experience or a great overview of news reports
- team members can spot and correct logical inconsistencies in each other’s comments, or just oversights: sometimes you provide a great analysis, but then vote “95% yes” instead of “95% no” by accident
- there’s a certain peer pressure to regularly reassess your predictions; or to put it more positively: I’m far more motivated by the “committed teamwork” setting than by a prediction market or an inactive team
- seeing other people deliberate allows for a good assessment of your own (relative) knowledge regarding the respective question
I’ll post an update about the new questions and my first experiences in season 3 once a number of questions have been closed/resolved. I’m very curious how this experiment will continue…
The article made me laugh until I suddenly lost my conviction that it was intended to be funny.
Personally, I find involuntary comedy highly enjoyable.
Care to explain?