
Computational Journalism

Journalism is at a crossroads. We have long relied on investigative reporting by traditional news organizations to hold governments, corporations, and individuals accountable to society. In recent years, misinformation has grown at an alarming rate while the resources and talent devoted to investigative reporting have stagnated---a trend with profound consequences for the health of democracy. At the same time, there is also an opportunity. With technological advances and the movement toward transparency, the amount of data available to the public is ever increasing. However, the potential of this "democratization of data" cannot be realized while the growth of data far outpaces the investment in investigative journalism.

Computing is key to bridging this divide. Computational journalism aims to develop computational techniques and tools that increase the effectiveness of, and broaden participation in, journalism---especially public-interest journalism---to help preserve its watchdog tradition. In this project, we consider how we, as computer science researchers, can contribute to journalism and help promote computational journalism as an emerging discipline.

At Duke, UTexas at Arlington, Stanford, and Google, our interdisciplinary team of computer scientists, journalists, and public policy researchers is currently focusing on computational fact-checking, to help guard against "lies, damned lies, and statistics"---claims that are factually incorrect, or correct but still misleading. We seek to quantify various measures of the "goodness" of claims over data, and to develop techniques for computing these measures and rebutting misinformation. We are also working on lead-finding, which helps uncover patterns from data that can lead to interesting, robust claims or news stories.
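As a concrete (and purely illustrative) example of one such "goodness" measure, a claim of the form "X changed between year A and year B" can be tested for robustness by perturbing the claimed window and checking whether nearby windows point the same way. The data, function names, and scoring below are hypothetical sketches, not code from the project:

```python
# Illustrative sketch: measure how "robust" a claim over data is by
# perturbing its parameters (here, the start/end years of a window).
# Data and names are invented for illustration only.

def pct_change(series, start, end):
    """Percent change of a yearly series between two years."""
    return 100.0 * (series[end] - series[start]) / series[start]

def robustness(series, start, end, max_shift=1):
    """Fraction of nearby (start, end) windows whose percent change
    has the same sign as the originally claimed window's."""
    claimed = pct_change(series, start, end)
    same_sign = total = 0
    years = sorted(series)
    for s in years:
        for e in years:
            if s >= e:
                continue
            if abs(s - start) <= max_shift and abs(e - end) <= max_shift:
                total += 1
                if (pct_change(series, s, e) > 0) == (claimed > 0):
                    same_sign += 1
    return same_sign / total

# Toy yearly counts: a drop that holds only for one cherry-picked window.
counts = {2009: 100, 2010: 60, 2011: 95, 2012: 90, 2013: 105}
score = robustness(counts, 2011, 2012)  # claim: counts fell 2011 -> 2012
```

Here the claimed drop (95 to 90) reverses under almost every nearby window, so the claim scores poorly; a low score flags a claim as fragile or cherry-picked rather than false.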

This project is a collaboration with the IDIR Lab at UTexas at Arlington (project website) and the DeWitt Wallace Center for Media and Democracy at Duke University (project website).


Computer scientists:

Journalism & public policy collaborators:
Graduate students:
Undergraduate students:
  • At Duke: [We are looking for interested undergraduate researchers!]
High school students:
  • At Duke: Dylan Dsouza (Enloe High, 2017), Brandon Wu (Enloe High, 2017)
Alumni:
  • From Duke:
    • PhD: You (Will) Wu (2015; first employment: Google)
    • MS: Rohit Paravastu (2012), Rozemary Scarlat (2012)
    • Undergraduate: Emre Sonmez (2013-2017), Seokhyun (Alex) Song (2014-15), Eric Wu (2014), Kevin Wu (2014), Peggi Li (2014), Andrew Shim (2014), Yubo Tian (2016), Charles Xu (2016), Yuxiang He (2016-17), Yuansong Feng (2017), Dhrumil Patel (2017)
  • From UTexas Arlington:
    • PhD: Naeemul Hassan (2016; first employment: University of Mississippi)

Websites and Demos

  • iCheck analyzes the voting records of the U.S. Congress and lets you compare how legislators vote with party majorities and the President, and more importantly, explore how the comparison stacks up under different contexts---over time, among groups of peers, and for "key votes" identified by lobbying/political organizations.
  • ClaimBuster is a machine learning tool that helps find political claims from text sources to fact-check. It has been watching the 2016 elections in the U.S. as well as the Australian Parliament.
  • FactWatcher monitors sports and weather data and automatically finds interesting or newsworthy factoids. It won the Excellent Demonstration Award at VLDB 2014.
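To make the claim-spotting idea behind ClaimBuster concrete, here is a toy sketch of scoring sentences by "check-worthiness." The actual system uses a supervised classifier trained on labeled sentences with many features; the keywords, regexes, and weights below are invented stand-ins that only show the input/output shape:

```python
import re

# Toy "claim spotting": score each sentence by how likely it is to
# contain a checkable factual claim. Heuristic weights are hypothetical.

def checkworthiness(sentence):
    score = 0.0
    if re.search(r"\d", sentence):              # numbers often signal statistics
        score += 0.5
    if re.search(r"\b(percent|million|billion|votes?|increased|decreased|rose|fell)\b",
                 sentence, re.I):
        score += 0.3
    if re.search(r"\b(I think|maybe|should|great|terrible)\b", sentence, re.I):
        score -= 0.4                            # opinion markers lower the score
    return max(0.0, min(1.0, score))

sentences = [
    "Unemployment fell by 2 percent last year.",
    "I think our country is great.",
]
ranked = sorted(sentences, key=checkworthiness, reverse=True)
```

Ranking sentences this way lets a fact-checker skim only the most checkable statements from a debate transcript instead of reading everything.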


For a general introduction to computational journalism, see the article below by Cohen, Hamilton, and Turner in Communications of the ACM, 2011. Our CIDR 2011 paper (which was the third-place winner of the Best Outrageous Ideas and Vision Track Paper Award) outlined some research challenges from the standpoint of database researchers.

Journalists may find our papers in the Computational+Journalism series over the years helpful.

Database researchers interested in our approach to checking "correct but misleading" claims can read our PVLDB 2014 paper on computational fact-checking and our PVLDB 2016 paper on perturbation analysis.


Past Efforts

  • Data+Journalism: A Duke Speaker Series (2013-2014). This series was jointly organized by the Department of Computer Science and the DeWitt Wallace Center for Media and Democracy, with support from the Office of the Dean of the Faculty of Trinity College of Arts & Sciences, and from the Knight Foundation. Our speakers were:
    • Oct. 7, 2013: Derek Willis, Interactive News Developer, The New York Times
    • Nov. 4, 2013: Brendan Nyhan, Assistant Professor of Government, Dartmouth College, and Media Critic, Columbia Journalism Review
    • Jan. 27, 2014: David R. Karger, Professor, MIT Computer Science and Artificial Intelligence Laboratory
    • Apr. 14, 2014: Jeffrey Heer, Associate Professor, Computer Science & Engineering, University of Washington, and Co-Founder, Trifacta
  • Project course on computational journalism (Spring 2012 at Duke).

     See course website for additional details.

  • FirstPass (2012). Journalists often need to sift through gigantic files containing scanned images of thousands of pages without document breaks (often the result of a FOIA request, for example). We developed a system that combines the intelligence of humans (through crowdsourcing) and machines to help journalists make a first pass over these files.
  • Collaborative query formulation (2012). While in theory we can answer many questions and check many claims by querying public databases, doing so requires significant effort as well as a breadth of knowledge and expertise that is difficult to find in any one individual. We built a system that helps people pool their efforts and skills in formulating complex, meaningful queries.
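The FirstPass idea above---dividing labor between machines and crowdsourced humans---can be sketched as a confidence-based triage: the machine decides the easy pages and routes only the uncertain ones to people. The thresholds and names below are hypothetical, not from the actual system:

```python
# Illustrative human-in-the-loop triage in the spirit of FirstPass:
# a machine scores each scanned page for "starts a new document";
# confident pages are decided automatically, uncertain ones go to humans.

def triage(pages, machine_score, low=0.2, high=0.8):
    auto_break, auto_continue, ask_humans = [], [], []
    for page in pages:
        p = machine_score(page)
        if p >= high:
            auto_break.append(page)       # machine is confident: new document
        elif p <= low:
            auto_continue.append(page)    # machine is confident: same document
        else:
            ask_humans.append(page)       # crowdsource only the hard cases
    return auto_break, auto_continue, ask_humans

# Toy per-page scores keyed by page number.
scores = {1: 0.95, 2: 0.05, 3: 0.5, 4: 0.85}
breaks, continues, humans = triage(list(scores), scores.get)
```

The design choice is the usual crowdsourcing trade-off: widening the (low, high) band buys accuracy at the cost of more human effort.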

Funding Acknowledgments

Our work has been supported by funding from Google, HP, Knight Foundation, and National Science Foundation (on perturbation analysis of data queries).

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding organizations.