DataBlog
By Guido Romeo
Data journalism scores a Pulitzer
This year's Pulitzer jury has once more shown a keen interest in innovative applications and approaches to journalism. The investigative journalism award went to Paige St. John of the Sarasota Herald-Tribune for her wonderful work exposing the weakness of insurance contracts in Florida, which puts the assets of millions of homeowners at risk.
St. John put two years of work into this investigation, producing a great series of stories as well as a unique database that has turned out to be an essential resource for understanding what insurance companies are actually doing.
The Herald's interactive applications add even more value to this resource, showing the local risk in different hurricane-exposed areas and the differences among contracts.
The context of the investigation is striking for those not familiar with the US home insurance market. In the last five years, in spite of the lack of significant hurricanes, Florida's policy on insurance companies has taken many different stances, but the most significant facts are the 350% rise in the price of contracts on the coast and the cancellation of more than two million policies. As a consequence, thousands of homeowners are relying on state insurance agencies that many believe are too weak to cope with a real disaster, which could leave many without compensation.
The prize awarded to St. John is not only very much deserved, but also stands as a wonderful example of how high-quality journalism, digital media and innovation can produce very valuable content able to attract a large readership and affirm the brand of a media outlet.
Last but not least, kudos to ProPublica for its second Pulitzer in two years (and the 18th for its Director Paul Steiger).
Dollars for Docs
By Guido Romeo
Journalists have always fed on data. Sometimes they even choke on it to the point of losing sight of the overall sense of what they're trying to describe.
In this sense, the most interesting twist brought about by data journalism is not so much the use of digital data per se as the development of new and original data sets. These are de facto new pieces of information that did not exist before and are originated, rather than discovered, by reporters.
A wonderful example is the "Dollars for Docs" database developed at ProPublica, the nonprofit chaired by Paul Steiger, who also sits on Ahref's scientific committee.
In their marvelous work (to which the claim "Journalism in the public interest" fits perfectly), Dan Nguyen, Charles Ornstein and Tracy Weber have lined up the data on 320 million dollars of payments from eight Big Pharma companies (such as GSK, Eli Lilly, AstraZeneca, Roche, Novartis and Pfizer) to more than 17,000 doctors across the US.
The result is an amazingly rich and searchable database with the names of practitioners from all over the Union. The database obviously does not cover all doctors and all companies, but it brings great benefits that go beyond the news. It is proving to be a great educational tool for citizens and, given its impact in the media, many companies are trying to make their compensation and collaborations more transparent.
The project, launched last November, is now an "ongoing investigation" attracting more and more readers who are also able to provide fresh new data and expand the original investigation.
So here's the question that springs to the mind of those willing to play with data themselves: how did the ProPublica reporters get their hands on such precious material?
The short answer is that last year US courts forced the eight companies to make all that data public.
How this was turned into a database was a much longer process, as the data was provided in formats that often made it very awkward to scrape from websites or aggregate (e.g. JPEG sheets instead of Excel or PDF). Charles Ornstein and Tracy Weber talk about the hurdles they faced in this podcast.
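To give a feel for the aggregation step, here is a minimal Python sketch. It is not ProPublica's actual code: the two company layouts, the column names, the placeholder doctors and the helper functions (`normalize_name`, `parse_amount`, `load`, `total_for`) are all made up for illustration. It only shows the general idea of bringing differently formatted disclosures into one searchable index.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder disclosures: each hypothetical company uses its own layout.
COMPANY_A = '''doctor,state,amount
JANE DOE,FL,"$1,500"
JOHN ROE,NY,$800
'''
COMPANY_B = '''Recipient;ST;Payment (USD)
Doe, Jane;FL;2000
'''


def normalize_name(raw):
    """Return 'First Last' in title case, also accepting 'Last, First'."""
    raw = raw.strip()
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    return " ".join(raw.split()).title()


def parse_amount(raw):
    """Turn strings such as '$1,500' or '2000' into a float."""
    return float(raw.replace("$", "").replace(",", "").strip())


def load(text, delimiter, name_col, state_col, amount_col, company):
    """Read one company's file and yield records in a common shape."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        yield {
            "doctor": normalize_name(row[name_col]),
            "state": row[state_col].strip().upper(),
            "amount": parse_amount(row[amount_col]),
            "company": company,
        }


# Merge both sources and index the records by (doctor, state).
records = list(load(COMPANY_A, ",", "doctor", "state", "amount", "Company A"))
records += list(load(COMPANY_B, ";", "Recipient", "ST", "Payment (USD)", "Company B"))

index = defaultdict(list)
for rec in records:
    index[(rec["doctor"], rec["state"])].append(rec)


def total_for(doctor, state):
    """Sum all payments found for one doctor in one state."""
    key = (normalize_name(doctor), state.strip().upper())
    return sum(rec["amount"] for rec in index.get(key, []))


print(total_for("Doe, Jane", "fl"))
```

Real disclosures are far messier than this (think of scanned images that first need transcribing), and matching people by name and state alone would merge homonyms, so a real project needs manual checks on top of anything like this.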
Killing Roads
By Guido Romeo
«So what are you doing with the data... statistics!?!» I've been getting a lot of this in the last couple of days when speaking about the iData project, and I've understood that, particularly with journo colleagues, the best answer is an example of what has already been done.
A beautiful piece of work (for once not from overseas...) is Killing Roads, a project on road casualties developed for Bergens Tidende, the main daily paper in Bergen, Norway (kudos to Michele Kettmaier for pointing me to Beta Tales).
The beauty of the project, on top of its impact, is the multitude of levels typical of data stories, which rapidly turns it into a multiplier of possible stories at the local level.
The starting point for Tidende's reporters, who teamed up with developers, was the Norwegian Road Authority's release of data on 11,400 road accidents all over Norway. The first move of the Norwegian daily was a massively data-rich Google Map allowing endless browsing experiences at the national as well as the local level.
Every marker on the map details the exact position and date, plus information such as the number of victims and the weather conditions. Reporters quickly understood that the only detail missing was the names of the victims, removed to protect their privacy.
That was obviously no obstacle for a team of reporters who are used to finding names for a living, especially after they opened the database to colleagues from local outlets, who were even more interested in digging out their own angle on the story. The result is a disturbingly powerful collective portrait of the victims of road accidents.
Tidende's multimedia journalist Lasse Lambrechts (interviewed here on YouTube) has been one of the main actors of the Killing Roads project (which, by his own admission, turned out to be "much much more work than expected").
And this is when your listener always throws in that casual question: «Could this be done in Italy as well?!?».
The data is certainly there. Here is an example of what the statistics bureau of the Province of Parma has done with its local data (downloadable PDF). Every casualty is very data rich: metadata range from the kind of vehicle to weather conditions. But getting them in a usable form (e.g. a nice and tidy Excel sheet!) is a different story.
But this is a project we'll come back to, in order to explain the issues and possibilities of accessing and scraping data through legal requests and web tools.
One thing to keep in mind is that Italian roads are the scene of a daily slaughter: about 500 accidents every day, causing more than 10 deaths and hundreds of injuries.


