Working Paper

Abstract or Description

Forecasting the future hinges on understanding the present. The web, particularly the social web, now gives us an up-to-the-minute snapshot of the world as it is and as many people perceive it, but that snapshot is distributed in a way that is incomprehensible to a human. Much of this data is encoded in text, which is noisy, unstructured, and sparse; yet recent developments in natural language processing now permit us to analyze text and connect it to real-world measurable phenomena through statistical models. We propose text-driven forecasting as a challenge for natural language processing and machine learning:

  • Given a body of text T pertinent to a social phenomenon, make a concrete prediction about a measurement M of that phenomenon, obtainable only in the future, that rivals the best-known methods for forecasting M.

We seek methods that work in many settings, for many kinds of text and many kinds of measurements.

Accurate text-driven forecasting will be of use to the intelligence community, policymakers, and businesses. Statistical models are the norm in natural language processing, making it straightforward to develop models that provide posterior probabilities over measurements. Evaluating and comparing forecasting algorithms is simple and inexpensive. We present encouraging recent results across several domains, emphasizing that a broad suite of forecasting problems and text sources will best support progress on this task.

Further, advances in text-driven forecasting will have broad impact in natural language processing, giving a concrete, theory-independent platform that encourages exploration of new ideas for tackling various aspects of text-oriented computational intelligence.