Search results for: forecaster diary

Mathis Lohaus

The Amateur Forecaster’s Diary Pt. 4

Good Judgment Project

This series returns by popular demand (well, one email to the author). If you have no idea what this is about, please consult parts one, two and three.

As I mentioned in my last post on the Good Judgment Project (GJP), in season 3 I was part of a team of “super forecasters”. My team did OK, but we were significantly less successful than the other “supers”. This season has now come to an end, and the project is about to launch the next one in August. I’d like to offer some reflections.

What went well?

We were able to exchange information across “super” groups, and some people were really impressive. I saw spreadsheets and discussions that were much more sophisticated than what I expected in this “just for fun” setting. Apparently, the increasingly challenging task (with many rather tough questions and a high workload) really sparked the participants’ ambitions.

From what I could see in the cross-team forum, communication was very lively and mostly helpful. But I’m not sure how much it really mattered in the end. Given the number of questions and the old-fashioned bulletin board format, it was hard to keep up with every possibly relevant bit of information.

My own team brought together people from very different backgrounds, and it was nice to have a round of introductions. Given that the questions belong to different clusters, we proceeded to assign everyone two areas to prioritize when researching and answering questions.

What didn’t go so well?

Judging from the experience in my particular group, not everything was “super” in the end. My new group did not invest much more time and effort than my previous, “normal” teams. After all, the incentives and potential pitfalls are very similar: if communication does not yield results, people stop typing long messages; if there are only three or four active users, the team cannot perform as well; high cognitive load due to many open questions can be discouraging.

One innovation for season 3, the use of (paid) facilitators to ensure smoothly working teams, fell completely short, at least in our group. In theory, the facilitator would have helped us coordinate tasks and make sure that no items were forgotten. But the person in charge did not really live up to that promise, and the only email I can remember getting from him is a goodbye note. This might be worth trying again, though.

Finally, one lesson I draw from my experience in the newly established and ultimately lowest-ranking “super” team: being put in an environment that’s supposedly excellent, and seeing the amount of work and experience the other teams bring to the table, can be intimidating. I’m not sure how much work the GJP organizers should put into creating a positive team spirit and providing regular feedback, but at least in our case it might have helped.

The future of forecasting

I’m curious to see whether season 4 will be able to push the limits of what works in a “just for fun” effort. (After all, teenagers in World of Warcraft guilds spend a lot of time on coordination and planning, too.) It seems that the Good Judgment Project operators are considering some changes to the user interface to help manage the workload. I agree that there is some room for improvement.

I would love to hear what the GJP researchers have to say on the merits of inter-group exchange. In theory, it could either lead to groupthink or help everyone improve through efficient information sharing. Generally, I am not sure to what extent the “meta” discussions and well-meaning exchange of tips that took place between teams are at odds with optimizing performance. Given that everyone has limited resources, maybe this process should be streamlined and formalized. Easy sharing of links to news sources is probably a good idea.

Of course none of that can change the fact that many questions are just impossible to answer with any kind of certainty. For example: predicting the behavior of small groups with secretive proceedings (Vatican, North Korea, Taliban…) or factoring in different layers of scientific and political uncertainty (“will the Swiss lab report that Arafat’s body contained a significantly elevated level of polonium-210?”). But it’s still fun to try your best, and I’m looking forward to season 4.

Mathis Lohaus

The Amateur Forecaster’s Diary Pt. 3

Good Judgment Project

At the beginning of August, the Good Judgment Project (GJP) kicked off the third round of forecasting. As I’ve described in two earlier blog posts (#1 and #2), the idea is to come up with the best way to predict the outcomes of political negotiations and elections, or just the likelihood of hypothetical events.

I’ve been promoted to one of eight teams of “superforecasters”. One of the perks of this upgrade was seeing the training materials before anyone else. The GJP researchers also invited us to a small conference/workshop in Philadelphia to discuss last season and prepare for the new one. I was unable to attend, but the slides were very informative. In addition, each “super” team now has a facilitator assigned to it, who is meant to help with coordination.

According to the info given by Phil Tetlock, the GJP convincingly won the first two rounds of the tournament. The four competing research programs have now been shut down, and some of their forecasters joined this project. Judging from the GJP’s repeat success, geopolitical forecasting rewards skill rather than luck. This is supported by the GJP’s internal data: 50 of the 60 top forecasters from season 1 ended up at the top in season 2 as well. So there appears to be less regression to the mean than one might expect.
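To get a feel for why that persistence matters: if the rankings were pure luck, almost nobody from the top 60 should stay there. Here is a toy simulation in Python (my own illustration with made-up parameters, not the project’s data):

    import random

    # Toy simulation (made-up parameters): if rankings were pure luck,
    # how many of the top 60 out of 2000 forecasters would stay on top
    # in the following season?
    random.seed(42)
    N, TOP = 2000, 60

    def season_scores(skill):
        # Lower is better; skill shifts the mean, luck adds per-season noise.
        return [s + random.gauss(0, 1.0) for s in skill]

    def top_set(scores):
        return set(sorted(range(N), key=lambda i: scores[i])[:TOP])

    # Pure luck: everyone has the same underlying skill.
    no_skill = [0.0] * N
    luck = len(top_set(season_scores(no_skill)) & top_set(season_scores(no_skill)))

    # Persistent skill differences, as large as the per-season noise.
    skill = [random.gauss(0, 1.0) for _ in range(N)]
    talent = len(top_set(season_scores(skill)) & top_set(season_scores(skill)))

    print(luck, talent)
    # Pure luck keeps about 60 * 60/2000 = 1.8 forecasters on top; persistent
    # skill keeps many times more. A 50-out-of-60 repeat rate suggests that
    # skill dominates luck by a wide margin.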

This slide from Phil Tetlock’s presentation illustrates the scale of possible Brier scores for measuring forecast accuracy
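In case the metric is unfamiliar: a Brier score is the squared error of a probabilistic forecast. In the original multi-category formulation, it runs from 0 (a perfect forecast) to 2 (the worst possible result on a binary question). A minimal sketch in Python (my own illustration, not project code):

    def brier_score(forecast, outcome):
        # Sum of squared differences between the forecast probabilities
        # and the actual outcome vector (1 for what happened, 0 otherwise).
        return sum((p - o) ** 2 for p, o in zip(forecast, outcome))

    # "Will X happen?" forecast at P(yes) = 0.7; X does happen:
    print(brier_score([0.7, 0.3], [1, 0]))    # (0.3)^2 + (0.3)^2 = 0.18

    # A confident forecast that turns out wrong is punished severely:
    print(brier_score([0.05, 0.95], [1, 0]))  # 1.805

    # Hedging at 50/50 always yields 0.5 on a binary question:
    print(brier_score([0.5, 0.5], [1, 0]))    # 0.5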

Continue reading

Mathis Lohaus

The Amateur Forecaster’s Diary Pt. 2

Good Judgment Project

Yesterday, all of us in the Good Judgment forecasting team received feedback on round 2, which has just finished. (If you have no idea what I’m talking about, please read part 1 of this series.) Time for me to reflect on the past months.

What kinds of questions were asked?

Early in the season, we were told that it would be more difficult than in the first round. This turned out to be true for three reasons. First, the admins simply asked more questions, making it harder to keep up with the tournament. Second, the share of rather obscure items was higher. Questions like the one about “the removal of Traian Basescu from the office of President of Romania in a referendum vote before 1 August 2012” did not immediately ring a bell with me. Third, and most importantly, the admins introduced conditional and ordered items.

Conditional items looked roughly like this: “Will Israel invade Gaza?”, (a) “if a Hamas rocket reaches Jerusalem”, (b) “if no rocket reaches Jerusalem”. While it is fairly obvious how the condition is thought to affect the probabilities in this example, other cases were less straightforward. In any case, this type of question further complicated the process, since it offers another opportunity to instinctively overstate probabilities or to construct illogical connections between the conditions. Some items even had two sets of intertwined conditions.
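One way to keep such answers coherent (my own sanity check, not an official GJP tool) is the law of total probability: the two conditional forecasts, combined with the probability of the condition itself, imply an unconditional forecast. A short Python sketch with made-up numbers:

    # All numbers are hypothetical, purely for illustration.
    p_rocket = 0.30               # P(a Hamas rocket reaches Jerusalem)
    p_invade_if_rocket = 0.60     # P(invasion | rocket)
    p_invade_if_no_rocket = 0.15  # P(invasion | no rocket)

    # Law of total probability: the implied unconditional forecast.
    p_invade = (p_invade_if_rocket * p_rocket
                + p_invade_if_no_rocket * (1 - p_rocket))
    print(f"Implied P(invasion) = {p_invade:.3f}")  # 0.285

    # If this clashes with your gut estimate for the unconditional question,
    # at least one of the three inputs needs revising.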

Continue reading

Mathis Lohaus

The Amateur Forecaster’s Diary

Good Judgment Project

One and a half years ago, I signed up for the Good Judgment Project, which is run by a team from the University of Pennsylvania and UC Berkeley. The project is one of five competing in a forecasting tournament sponsored by IARPA. Its main objective is to “dramatically enhance the accuracy, precision, and timeliness of forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many intelligence analysts”.

This post is part one of a series on (amateur) forecasting: First, how does the tournament work?

Continue reading

Mathis Lohaus

Scenarios for European External Relations in 2025

Dahrendorf Symposium

Last week I had the pleasure of attending (parts of) the 2016 Dahrendorf Symposium hosted by the Hertie School of Governance, LSE, and the Mercator Foundation. The event focused on European foreign policy. I will summarize the debates on the final day in a separate blog post.

A few months ago, the Hertie School hosted a scenario planning workshop as part of the Dahrendorf project. It focused on the EU’s relations with other world regions, trying to draw up scenarios for the year 2025. Meeting in five different working groups, the participants developed scenarios for the future relations between the EU and the U.S., China, Russia and Ukraine, Turkey, and the MENA region. Given my interest in forecasting and curiosity about scenario planning, I gladly signed up and contributed to the EU/U.S. working group.

At the Dahrendorf Symposium last week, Monika Sus and Franziska Pfeifer (who are coordinating the scenario project) briefly described our method and results to the audience. The publication with our 18 (!) brief scenarios is available via the Dahrendorf blog: European Union in the World 2025 – Scenarios for EU relations

The results are interesting and I really encourage you to download the document! Personally, I particularly enjoyed the process. It was a great exercise to think about the basic assumptions we have about transatlantic relations; to identify the key drivers relevant for change; and to come up with scenarios that reflect the most relevant combinations of those drivers taking particular directions.
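The “combinations of key drivers” step is easy to make concrete. A toy sketch in Python (a generic illustration of the method, not the workshop’s actual procedure; the drivers and directions are simplified from my group’s scenario):

    from itertools import product

    # Each key driver can take one of several directions (simplified here).
    drivers = {
        "EU tech regulation": ["harmonised", "balkanised"],
        "U.S. stance on new tech": ["embrace", "restrain"],
        "transatlantic trigger events": ["none", "series of crises"],
    }

    # Cross all directions; the working group then keeps only the most
    # relevant, plausible combinations and turns each into a narrative.
    for combo in product(*drivers.values()):
        print(dict(zip(drivers.keys(), combo)))
    # 2 * 2 * 2 = 8 raw combinations; the scenario quoted below roughly
    # corresponds to (balkanised, embrace, series of crises).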

“Transatlantic mistrust on tech”: one of Jorge Martin’s illustrations for the scenario report

Let me indulge in a bit of self-promotion and quote the intro to my group’s scenario:

“In the years up to 2025 there will be a situation of balkanised technological regulation in the EU, driven by political debates which emphasise the need to shield national markets and societies against the uncertain effects of technological progress. On the other side of the Atlantic, political leaders will continue to embrace new technologies, with an emphasis on keeping the competitive edge also in terms of offensive capabilities in the cyber and AI realms. Only after a series of trigger events, increasing the pressure on decision-makers, will transatlantic leaders be willing to invest in a new institutional framework to manage the political problems associated with technological progress.” (‘Transatlantic Frankenstein’ scenario)

Then, of course, there was the Dahrendorf Symposium, which included a couple of workshop sessions (that I couldn’t attend) and two round-table panels on the final day. I will put my summary of these discussions into a separate post.

Mathis Lohaus

IR Blog Anniversary #1

Birthday cake (image credit: Wikimedia Commons)

We’re celebrating one year of IR Blog with some virtual cake and, unless you’re underage, sparkling wine. Many, many thanks to all contributors and readers!

This is a heat map indicating where our readers came from:

[Heat map of visitor locations in year one]

Not surprisingly, almost two thirds of our traffic originated in Germany, the U.S., and Canada. Still, it’s nice to see that there is some diversity in the remaining third…

And these are our top-10 posts by visits:

  1. “A North American Perspective on Doing a PhD in Europe”
  2. “Impostor Syndrome as a PhD Student”
  3. “Paper Stacks vs. Android Apps”
  4. “Elections in Germany: Forecasts and Polls”
  5. “Nap Your Way to a PhD!”
  6. “The Toddler-Thesis Nexus”
  7. “German Foreign Policy Bingo”
  8. “Protests in Brazil and Turkey: Not Yet Social Movements”
  9. “About ‘The Gender Gap in IR and Political Science'”
  10. “The Amateur Forecaster’s Diary”

We’re looking forward to the next year(s)! Please consider spreading the word if you (occasionally) like what you see here.

Mathis Lohaus

Links: Drones; Forecasting; Ranking Researchers; Surveillance Logic

A combat drone, because that’s the most photogenic of all topics covered here today… (Wikimedia Commons)

I hope you’re having a great week so far! My fellow bloggers have other obligations, so you’ll have to tolerate my incoherent link lists for the time being…

At the Duck of Minerva, Charli Carpenter makes a crucial point regarding the debate on military drones (emphasis added):

In my view, all these arguments have some merit but the most important thing to focus on is the issue of extrajudicial killing, rather than the means used to do it, for two reasons. First, if the US ended its targeted killings policy this would effectively stop the use of weaponized drones in the war on terror, whereas the opposite is not the case; and it would effectively remove the CIA from involvement with drones. It would thus limit weaponized drones to use in regular armed conflicts that might arise in the future, and only at the hands of trained military personnel. If Holewinski and Lewis are right, this will drastically reduce civilian casualties from drones.

I’d like to recommend a couple of links on attempts to forecast political events. First, the always excellent Jay Ulfelder has put together some links on prediction markets, including a long story in the Pacific Standard on the now defunct platform Intrade. Ulfelder also comments on “why it is important to quantify our beliefs”.

Second (also via Ulfelder), I highly recommend the Predictive Heuristics blog, which is run by the Ward Lab at Duke University. Their most recent post covers a dataset on political conflict called ICEWS and its use in the Good Judgment Project, a forecasting tournament that I have covered here on the blog as well. (#4 of my series should follow soon-ish.)

A post by Daniel Sgroi at VoxEU suggests a way for panelists in the UK Research Excellence Framework (REF) to judge the quality of research output. Apparently, there is a huge effort underway to rank scholars based on their output (i.e., publications) — and the judges have been explicitly told not to consider the journals in which articles were published. Sgroi doesn’t think that’s a good idea:

Of course, economists are experts at decision-making under uncertainty, so we are uniquely well-placed to handle this. However, there is a roadblock that has been thrown up that makes that task a bit harder – the REF guidelines insist that the panel cannot make use of journal impact factors or any hierarchy of journals as part of the assessment process. It seems perplexing that any information should be ignored in this process, especially when it seems so pertinent. Here I will argue that journal quality is important and should be used, but only in combination with other relevant data. Since we teach our own students a particular method (courtesy of the Reverend Thomas Bayes) for making such decisions, why not practise what we preach?

This resonates with earlier debates here and elsewhere on how to assess academic work. There’s a slippery slope if you rely on publications: in the end, are you just going to count the number of peer-reviewed articles in a CV without ever reading any of them? However, Sgroi is probably right to point out that it’s absurd to disregard entirely the most important mechanism of quality control this profession has to offer, despite all its flaws.
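Sgroi’s Bayesian point is easy to make concrete: treat the journal’s track record as a prior on a paper’s quality, and the panelist’s own reading as evidence. A toy Python sketch with made-up numbers (not Sgroi’s actual model):

    def posterior(prior, p_impressed_if_top, p_impressed_if_not):
        # Bayes' rule: P(top-rated paper | panelist impressed by reading it)
        num = p_impressed_if_top * prior
        return num / (num + p_impressed_if_not * (1 - prior))

    # Hypothetical base rates of top-rated papers by publication venue:
    prior_leading_journal = 0.50
    prior_other_outlet = 0.10

    # Suppose a reading impresses the panelist for 80% of genuinely top
    # papers, but also for 30% of the rest:
    print(posterior(prior_leading_journal, 0.8, 0.3))  # ~0.73
    print(posterior(prior_other_outlet, 0.8, 0.3))     # ~0.23

    # The same reading evidence yields very different conclusions depending
    # on the prior -- which is exactly the information the REF guidelines
    # tell panelists to throw away.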

Next week, the Körber-Stiftung will hold the 3rd Berlin Foreign Policy Forum. One of the panels deals with transatlantic relations. I wonder if any interesting news on the spying scandal will pop up in time. Meanwhile, this talk by Dan Geer on “tradeoffs in cyber security” illustrates the self-reinforcing logic of surveillance (via Bruce Schneier):

Unless you fully instrument your data handling, it is not possible for you to say what did not happen. With total surveillance, and total surveillance alone, it is possible to treat the absence of evidence as the evidence of absence. Only when you know everything that *did* happen with your data can you say what did *not* happen with your data.

Mathis Lohaus

Links: Voting reform, Forecasting, PRISM, Germany

gerrymandering-smbc
Detail from “A Simple Proposal to Stop Gerrymandering”, Saturday Morning Breakfast Cereal

Summer break has begun in Germany. Wherever you are, enjoy your time in the sun! In case you’re stuck inside (or using a handheld device instead of just relaxing in the park), here are some links:

  • One of my favorite web comics has an episode on how to reform voting districts; it involves strict rules, is based on incentives and public scrutiny, and leaves little room for corruption.
  • The forecasting competition in which I take part (Good Judgment Project) is about to kick off season 3. I plan to cover the next steps here on the blog, in particular because I have now been promoted to “super forecaster” status. Please consider reading part 1 and part 2 of my coverage so far.
  • Edward Snowden’s fate is still undecided, and the news about U.S./UK surveillance will probably keep coming. For Germany, there is a new angle to the whole story in the aftermath of interior minister Friedrich’s visit to Washington: “many were critical of his trip, saying he was given little information and came across like an obedient school boy” (SPIEGEL).
  • Friedrich is now under fire for suggesting that several terrorist attacks on German soil have been avoided thanks to PRISM, a statement that was not backed up by facts. He also neatly summarized the ‘let’s give up civil liberties for counter-terrorism’ logic: “The noble intention of saving lives in Germany justifies working with our American friends and partners …” (my translation; via law blog)
  • Chancellor Merkel, on the other hand, is extremely careful not to say anything at all in her recent interviews on the topic.