Dollars for Docs

By Guido Romeo

Journalists have always fed on data. Sometimes they even choke on it, to the point of losing sight of the overall sense of what they are trying to describe.

In this sense, the most interesting twist brought about by data journalism is not so much the use of digital data per se as the development of new and original data sets. These are de facto new pieces of information that did not exist before, originated rather than discovered by reporters.

A wonderful example is the "Dollars for Docs" database developed at ProPublica, the nonprofit chaired by Paul Steiger, who also sits on Ahref's scientific committee.

In a marvelous piece of work (to which the motto "Journalism in the public interest" fits perfectly), Dan Nguyen, Charles Ornstein and Tracy Weber have lined up the data on 320 million dollars of payments from eight Big Pharma companies (such as GSK, Eli Lilly, AstraZeneca, Roche, Novartis and Pfizer) to more than 17,000 doctors across the US.

The result is an amazingly rich, searchable database with the names of practitioners from all over the country. It obviously does not cover all doctors or all companies, but it brings great benefits that go beyond the news. It is proving to be a great educational tool for citizens and, given its impact in the media, many companies are trying to make their compensation and collaborations more transparent.

The project, launched last November, is now an "ongoing investigation" attracting more and more readers who are also able to provide fresh new data and expand the original investigation.
So here's the question that comes to mind for those willing to play with data themselves: how did the ProPublica reporters lay their hands on such precious material?

The short answer is that last year US courts forced the eight companies to make all that data public.

How this was turned into a database is a much longer story, as the data was provided in formats that often made it very awkward to scrape from websites or to aggregate (e.g. JPEG sheets instead of Excel or PDF). Charles Ornstein and Tracy Weber talk about the hurdles they faced in this podcast.
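To give a feel for what "aggregating" means here, the following is a minimal sketch in Python. It is not ProPublica's actual code. The two sample disclosures, their column names and the helper functions are all invented for illustration, and it assumes the data is already available as text (image-only sheets would first need to be transcribed). The idea is simply that each company's list is mapped onto one common set of fields before the payments are summed per doctor.

```python
import csv
import io

# Hypothetical disclosures: every company uses its own layout and column names.
COMPANY_A_CSV = """Physician,State,Amount
Jane Doe,CA,"$1,500.00"
John Roe,NY,$250
"""

COMPANY_B_CSV = """recipient_name;location;payment_usd
DOE, JANE;CA;2000
"""


def parse_amount(text):
    """Turn strings such as '$1,500.00' into a float."""
    return float(text.replace("$", "").replace(",", "").strip())


def normalize_name(text):
    """Rewrite 'DOE, JANE' or ' jane  doe ' as 'Jane Doe'."""
    text = text.strip()
    if "," in text:
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    return " ".join(text.split()).title()


def load(text, company, delimiter, name_col, state_col, amount_col):
    """Read one company's disclosure into records with a common schema."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {
            "company": company,
            "doctor": normalize_name(row[name_col]),
            "state": row[state_col].strip(),
            "amount": parse_amount(row[amount_col]),
        }
        for row in reader
    ]


def total_by_doctor(records):
    """Sum payments per (doctor, state) across all companies."""
    totals = {}
    for record in records:
        key = (record["doctor"], record["state"])
        totals[key] = totals.get(key, 0.0) + record["amount"]
    return totals


records = (
    load(COMPANY_A_CSV, "Company A", ",", "Physician", "State", "Amount")
    + load(COMPANY_B_CSV, "Company B", ";", "recipient_name", "location", "payment_usd")
)
totals = total_by_doctor(records)

for (doctor, state), amount in sorted(totals.items()):
    print(f"{doctor} ({state}): ${amount:,.2f}")
```

Matching doctors by a cleaned-up name and state is a deliberate simplification. On real data, different people can share a name and the same person can be spelled in several ways, so a real project would need more careful matching and manual checks.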
