DataBlog

By Guido Romeo

Data journalism: a mailing list for Italian users

We are pleased to announce a useful resource for Italy’s data journalism community. The mailing list <datajournalismitaly@googlegroups.com> already has more than a dozen subscribers and a great discussion plan. To be clear, this initiative is mostly due to the persistent support of Maurizio Napolitano, researcher at FBK in Trento and Italian ambassador for the Open Knowledge Foundation, and to the extremely positive experience of the Spaghetti Open Data list.
Here is a basic list of topics we plan to cover in the near future:
- best and most inspiring practices in data journalism in Italy and abroad;
- reviews of tools for scraping, analysis and visualization techniques;
- sources and ways of obtaining data when not readily available;
- info on conferences and relevant events worldwide.
Above all, we do hope this initiative will stimulate new contributions and collaborative data-based narratives.
For more info and to subscribe: http://groups.google.com/group/datajournalismitaly
Guido Romeo

The image "My C: Drive", used for the home page, is by bsimser and released under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic (CC BY-NC-SA 2.0) License.

Nov 21, 2011 09:45 AM

New ebooks on data journalism

Will data save journalism? Of course not, but to create innovative information in the public interest we cannot underestimate these tools that are producing valuable results for both media outlets and citizens in the English-speaking world.

In our continuing effort to support this trend, the <ahref Foundation is gathering a sort of "data toolkit", currently in its beta version and aimed at reporters (in a broader sense) interested in applying these techniques.

Given the few resources available in Italian, we are pleased to announce the release of an ebook titled "Open Data e Data Journalism" by Lsdi, an organization devoted to freedom of information. Produced by Andrea Fama, this release takes into account both open data and data journalism, thus integrating two different but convergent strategies that are taking their baby steps in Italy. This project also reveals a great need to increase the level of training and awareness both in the journalism community and in society at large. (Full disclosure: the ebook also includes positive reviews of the iData research project promoted by the <ahref Foundation.)

Finally, the non-profit Open Knowledge Foundation and the European Journalism Centre are working on a comprehensive handbook on data journalism. Aimed at explaining “how you can approach data journalism from scratch with no prior knowledge,” this project is a direct result of a series of collaborative sessions held at the recent 2011 Mozilla Festival in London.

Stay tuned for more details about these and other exciting projects!

Nov 14, 2011 11:20 AM

The Emilia-Romagna Region launches its open data portal: dati.emilia-romagna.it

Joining Piemonte, Lombardia, Veneto and other Italian regions, Emilia Romagna is now also making its own data available to the general public in an open format.

Part of the Regional Information Plan 2011-2013, this step is aimed at «increasing the transparency of local governance and enhancing the information assets of Public Administration, in order to strengthen citizen access and participation.»

At the moment, only raw data can be downloaded, but soon a variety of linked data will also be available. Current datasets mostly cover mapping and demographic statistics. After agreeing to the terms of use, users can download them, post comments and suggest new ways to use them. «These open data initiatives also include an agreement with the Piemonte Region to share technical elements, information indexing and organizational structures related to both portals, thus pursuing a common evolution of these projects. The Emilia-Romagna Region is also actively involved in the CISIS Inter-regional project “Open Data Italia”, along with several Italian regions. Other activities aimed at further expanding the public's right to data access include a regional portal for geographical data (“Geo-portal”), based on an open data format and fully integrated with current datasets.»

Are these first datasets easy for citizens and media professionals to work with? The initial impression is that much more work needs to be done before they are. However, this is another positive signal confirming the overall trend toward broader openness and transparency in Italy's Public Administration.

(Marco Trotta)

Oct 18, 2011 10:03 AM

How NOT to Lie with Visualization

Data and information visualization has become a field of experimentation in publishing and the arts, resulting in beautiful and high-impact products. While bending some rules of graphics, many visualization efforts certainly deserve our attention, such as the artwork by David McCandless and the educational productions by Hans Rosling.


However, before trying new recipes, we must be familiar with the various ingredients, as Nathan Yau explains on his FlowingData website and in his beautiful “Visualize This”, a great book for both data-viz beginners and experts. We should therefore take into consideration the most useful reference papers on the whole issue, starting with the essential research “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods” by Cleveland and McGill.

Although it was originally published in ‘Science’ magazine in 1985, and so seems to belong to the pre-web age, that paper is still very current. It explains how users tend to interpret graphic information visually and proposes an “understanding” classification based on experimental data, according to the following list by Nathan Yau:

1. The most immediate and simple visualization is the good old “scatter plot”, which has become Rosling’s flagship.

2. Multiple scatter plots that use axes which are identical but not aligned with each other

3. Histograms such as http://flowingdata.com

4. The ‘reviled’ sliced pie that resembles a business slide-show

5. Visualizations based on such popular charts as bubbles or other shapes.

6. Heatmaps, that is, a table with lighter and darker colors in place of numbers

7. Newsmap charts where each box size is proportional to the news item coverage.

In addition to Cleveland and McGill’s work, Enrico Bertini, data-viz researcher at the University of Konstanz, Germany, lists six more scientific papers on his ‘Fell in love with data’ website. Among those papers, the following two are the most accessible to beginners in the visualization field:

1. “How NOT to Lie with Visualization” by Bernice E. Rogowitz and Lloyd A. Treinish. As suggested by its descriptive title, the paper details the human eye’s reaction to colors and how to avoid misleading users, providing the basic skills for building effective color scales that can also be applied to other graphic characteristics.

2. “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations” by Ben Shneiderman. It provides a classification of different visualization techniques based on the various kinds of data available, something particularly useful for publishing projects. Shneiderman also gives an in-depth explanation of what has now become a data-viz mantra: “overview first, zoom and filter, details on demand”, a growing trend in a digital realm pointing more and more toward dynamic visualizations.

Sep 16, 2011 10:30 AM

iData tutorial: Document Cloud

DocumentCloud is a free and open platform for sharing vetted documents developed by three journalists (Aron Pilhofer of NYTimes, Scott Klein and Eric Umansky now at ProPublica) and running on Open Calais.

Here's a quick “How To” on DocumentCloud.

Who can use and access it? “Journalist” here is used in the broad sense and includes whoever takes the time and effort to retrieve and vet a primary source document. This means that among the many dozens of contributors to DC you will find the NYTimes and other big outlets such as 60 Minutes, as well as many independent reporters and NGOs. And, since a few weeks ago, Ahref Foundation as well.

The technology behind DC is truly powerful and makes it a breeze to annotate and share documents. It is particularly suited to collaborative fact-checking exercises.

Free and open does not mean that you can just log on to the platform and start working. You must register, be verified by the managers and agree to their conditions.

Inside you'll find a bit of everything: from Wikileaks docs edited by NYTimes reporters to many other materials, mostly originating from US sources, as well as a few dozen that specifically refer to Italy.



DC's strong point is its internal semantic search engine and the optical reader, which gives you access to many analytical functions and lets several people annotate the same document collaboratively.

The platform is still in English, but editions in other languages are on the way. This means that, at the moment, you may load docs in languages other than English and work on them, but you won't benefit from the most sophisticated semantic tools.



The document management interface has two main tabs:

- documents

- entities

In “documents” you may upload new docs and assign them to new projects in a simple and straightforward manner, as well as share them among collaborators before making them public to other DC users or publishing them on the web.

This section also lets you search with several operators. “Source: Library of Congress”, for instance, will retrieve all materials attributed to that source.

Last but not least, you may upload new docs, create new projects and invite others to work on them.



A double click will open a given document, allowing you to choose from different formats and to annotate it privately or publicly.

Another important and very well designed function is the redaction tool, which allows you to cover names and details you want to protect. This is actually much more than a simple patch: once you're done, the system optically scans the document once more and the underlying text disappears from the file, just to make sure nobody can retrieve it (so save a copy of your docs somewhere else).



The other tab, “Entities”, lets you in on another set of functions. The software extracts all relevant details from the documents you selected (people's names, locations, emails, figures and important terms).



One important function is “Timeline”, under the “Analyse” menu, which builds a timeline of the events in the documents, helping you to frame the info they contain.



The last step is publication.



DC offers different options: download the docs, view them as PDF or embed them in a public viewer.

Did I mention it's journalist-proof? Anybody can learn to work with it in a few minutes, and it's truly fun and powerful.

Enjoy!

 

(By Guido Romeo)

Aug 02, 2011 07:45 PM

The incredibly shrinking Italian schools

Lacking investments and with no reliable evaluation system, Italian schools' educational capacity seems bound to shrink ever more rapidly.

With no reliable evaluation system and a persistent lack of funds, the educational capacity of Italian schools seems bound to shrink even more rapidly. In its eagerness to contain Italy's enormous debt, the government has compounded the problem with its latest budget cuts to schools and to support for students' families.

One of the most worrisome consequences is likely to be even less emphasis on tracking the school dropout rate, a widely underreported and very complex issue now at the core of one of Ahref Foundation's collaborative inquiries: «La scuola abbandonata».

The project is aimed at uncovering the many aspects of school dropout through traditional journalistic reporting, participatory storytelling by those directly involved in the school system, and an array of data retrieval efforts. The combination of these elements should describe more accurately what is happening to one of Italy's most vital resources for remaining competitive and fair in the future.

Data on school dropout in Italian provinces, provided by Istat, the OECD and the Bank of Italy's research center, highlight a high dropout rate, with many students failing to graduate from secondary school and often losing access to any kind of higher education forever.

 

The many reasons behind this dramatic educational bleeding are carefully addressed in this project, but one consequence is already very clear: only 1 out of 5 young Italians between the ages of 18 and 24 holds a secondary school diploma. This ratio is bound to have dire and profound consequences for Italy's future capacity for innovation and for its performance in the knowledge economy at the center of the Lisbon Strategy for Europe.

Finding useful data to describe this situation in detail is the task of the iData project in the coming weeks. It is not an easy task, as there is no comprehensive database and figures often have to be pooled and linked in order to draw a consistent picture at national level.

In this respect, a useful and greatly inspiring piece of work is provided by ProPublica's recent investigation on investments and the quality of schools.

ProPublica's Jennifer LaFleur and her team scoured a rich government database to produce an impressive series of reports pinpointing which institutions across the United States offer better courses and have the best performing students.

ProPublica's work is very valuable, as their reporters are good at explaining the methods and rationale behind the database construction. Moreover, building on the current positive trends toward data journalism and citizen involvement, they have developed an interactive app ("The Opportunity Gap") which enables citizens to compare the performance of schools (and their districts) across the country. This simple tool provides greater awareness and accurate information about specific school districts, including which schools offer the best courses and which schools need improvement.

This is another good example of how an original inquiry, combined with today’s digital technologies, might turn journalism's first rough draft of history into a longer lasting tool of progress and development.

(By Elisabetta Tola)

Jul 19, 2011 10:42 AM

Notes from news: rewired – noise to signal

Journalists, developers, media executives and data specialists from The Guardian, the BBC, Thomson Reuters and other news outlets met to discuss data journalism and social media.

Held at Thomson Reuters’ London offices, the "News: Rewired" event could signal the dawn of a new, promising step for data-driven journalism on both the editorial and business sides of the news industry.

Keynote speech

Heather Brooke kicked off the event with a keynote on data journalism, based on her experience as a freedom of information campaigner advocating for more data transparency from the UK government.

In Brooke's opinion, the main problem facing data journalism in the UK is the lack of meaningful data. For example, it was illegal to disclose the details of fire inspections, gathering court information is a lengthy process, and arrest reports are not publicly available documents. Public authorities are reluctant to release data, as control over it is considered crucial to controlling public opinion.

Data can be the new dawn of journalism in the digital age. In a society overloaded with a constant stream of information from multiple sources, time-strapped people need someone to signpost what is important, and that is what journalists and data specialists should concentrate on.

Data journalism isn't just about learning to use web tools and software; it's about having something meaningful to put on the table. This is what journalists can offer because of their ability to sift through huge amounts of data for what is both important and true. According to Brooke, these skills, together with access to resources such as the time and money to perform these tasks, are the only things that set a professional journalist apart from a citizen.

Asked about justified exceptions to the Freedom of Information Act, Brooke replied that we should think about the costs and dangers of keeping information secret rather than worry about the costs and dangers of making it public.

Some highlights from the sessions:

Resources

As highlighted in the following sessions, the digital revolution has cut the cost of producing information, opening up the newsmaking process to contributions from new players and offering new tools. However, human resources such as language skills remain essential for verifying information coming from social media, as demonstrated by the work of BBC Monitoring and the Guardian during the coverage of the Japanese earthquake.

Use and abuse of statistics

A powerful presentation by James Ball, formerly of Wikileaks and now on the investigative team at the Guardian.

He highlighted how easy it is to get data wrong if it is not managed carefully, and how often wrong data gets published, in particular in eye-catching infographics.

Business model

OWNI’s Federica Cocco described how they had brought together developers, designers and journalists to do great storytelling and innovative interactive content, within a business model which sustains the not-for-profit activities by providing paid multimedia services to clients.

Tracking down eyewitnesses with social media
In an interesting talk, Nicola Hughes from DataMinerUK explained how she had used social media tools such as Trendsmap, Tweetdeck and Topsy at CNN to track down eyewitnesses to events and get them on air.

Tips on tools in the data journalist’s toolkit

Tools suggested by speakers to help journalists develop their data stories:

-Google Docs

-Google Fusion Tables and ManyEyes as data visualisation tools

-Outwit Hub, a Firefox plugin which allows you to pull in and export links

-ThinkMap

-Zeemaps to create interactive maps

-Tableau as data analysis and visualisation software

-Dipity, a tool to create timelines

-OpenCalais, the Thomson Reuters toolkit of capabilities that allows you to incorporate state-of-the-art semantic functionality within a blog, content management system, website or application.

(By Andrea Menapace)

 


Jun 15, 2011 06:10 PM

Nuclear referendum: the power of data

This coming Sunday and the following Monday morning Italians will vote on their government's decision to restart Italy's nuclear power production, entirely dismantled after the 1987 referendum.

Voters are called to decide not simply on a technology, but also on the long-term consequences for the country's health and environment, as well as on their trust in their delegates (which, at the 150th anniversary of the country, seems to be at a historical low).

Supporters of the "Sì" (yes) aim at abolishing paragraphs 1 and 8 of article 5 of the government's proposal, while the "No" side campaigns for pursuing a nuclear infrastructure.

The issue seems slightly less ideological than 24 years ago, as many hardcore environmentalists, including Stewart Brand, author of the "Whole Earth Catalog" and more recently "Whole Earth Discipline", have come out in favor of fission. It is also interesting to see how the two fronts have made, at points, very different communication choices.

Forum Nucleare, the non-profit association launched to stimulate a debate on atomic energy, has invested in a video campaign aiming to address the issue in a simple and straightforward way. This has backfired, as the clip has been widely perceived as less balanced than initially announced and quite biased in favour of nuclear power.



More innovative is the work of the 12 information designers behind the Atlante Nucleare (Nuclear atlas). This collaborative project supporting the Sì front was conceived and coordinated by Gianni Sinni and Cristiano Lucchi.

This is openly grass-roots communication. The data are correct, well reported and selected to support the argument. Result: the "Say-No-to-Nuke" line comes across loud and clear.

Among the 12 infographics released under a Creative Commons licence to encourage republication in print and on the web, it is worth pointing out the first one, describing the growing number of opponents of atomic energy as measured by Ipsos, as well as number seven, reacting to Professor Umberto Veronesi's communication blunder. Veronesi, oncologist, former health minister and now president of Italy's Nuclear safety agency, said on TV that nuclear waste canisters are so safe you could store them in your bedroom.

Last but not least, we have a map of the country's eligible nuclear plant sites (table 6), drafted by Enea (enea.it), earlier known as Cnen, as well as the comparative grid (table 4) on the actual price of kilowatts produced from alternative sources, fossil fuels, and uranium.

A more in-depth view of the choices to be made comes from Marco Cattaneo, Le Scienze's editor in chief, in a recent blogpost.

The problem, says Cattaneo (who also explains why the referendum is technically useless), is not how much we like nuclear, but those 29,000 megawatts. This, as shown on Terna's homepage, is the lowest average level of energy demand reached in Italy, and that power must always be produced.
Some areas of Italy are well endowed with sun and wind, as data from the JRC and Italy's wind atlas show. But not all, and those 29,000 MW must be produced even at 4:00 am on a night with no wind.
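
To put the 29,000 MW figure in perspective, a constant load turns into energy simply by multiplying by time. The short Python sketch below does only that arithmetic; the function name and the 24-hour window are illustrative assumptions and do not come from Cattaneo's post or from Terna's data.

```python
def energy_gwh(power_mw: float, hours: float) -> float:
    """Energy in GWh delivered by a constant load of power_mw megawatts over `hours` hours."""
    return power_mw * hours / 1000.0  # MW x h = MWh, then MWh -> GWh


# Minimum demand level quoted in the post (assumed constant for illustration).
BASELOAD_MW = 29_000

daily_gwh = energy_gwh(BASELOAD_MW, 24)
print(f"{daily_gwh:.0f} GWh per day")
```

Under that assumption, the minimum load alone corresponds to roughly 700 GWh every day, whatever the weather.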

Maybe, as Luca De Biase suggested a few weeks ago in the wake of Fukushima's disaster, it's time to discuss Italy's strategic choices on energy supply with a more open mind.
The first step is improving information on what is actually at stake if Italians don't want to end up as gas-dependent citizens. In fact, according to the World Energy Outlook this is the fastest growing energy source.

(By Guido Romeo)

Jun 09, 2011 10:37 AM

Data always confess

Preparing your data for a media story is a tricky and complex task, often quite different from what scientific researchers do.

«If you torture them long enough, data always confess» was the mantra of one of my statistics professors in college. The ones who felt they were on the waterboard, I should say, were much more often his students, but the line keeps popping up when I collect and assemble a dataset for a story.
Sometimes it's smooth and quick, at other times it's a real pain.
Marc McCormick and others from the Guardian's DataBlog have summarized what they do to their data before readers see them. This is worth showing, as it's a useful blueprint of the steps to take in working your data.
The Guardian team, who presented wonderful things at the Festival di Perugia, is also opening up its toolkit, revealing some goodies in their Open Platform. As for iData, although we don't have the same resources, we're not sitting still.
Here's a first round of tool benchmarking completed by Elisabetta Tola (twitter) for the project.
This blog will return to individual tools and how to use them. Comments on these and others are very welcome.
May 16, 2011 11:45 AM

Data journalism scores a Pulitzer

This year's Pulitzer jury has once more shown keen attention to innovative applications and approaches to journalism. The investigative journalism award went to Paige St. John, from the Sarasota Herald-Tribune (By Guido Romeo).

St. John won for her wonderful work exposing the weakness of insurance contracts in Florida, which puts the assets of millions of homeowners at risk.

St. John put two years of work into this investigation, producing a great series of stories as well as a unique database that turns out to be an essential resource for understanding what insurance companies are actually doing.

The Herald's interactive applications add even more value to this resource, showing the local risk in different hurricane-exposed areas and the differences among contracts.

The context of the investigation is striking for those not familiar with the US home insurance market. In the last five years (in spite of the lack of significant hurricanes) Florida's policy on insurance companies has taken many different stances, but the most significant facts are the 350% rise in the price of contracts on the coast and the cancellation of more than two million policies. As a consequence, thousands of homeowners are relying on state insurance agencies that many believe are too weak to cope with a real disaster, which could leave many without compensation.
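
A 350% rise is easy to misread as "prices multiplied by 3.5". As a minimal sketch of the arithmetic, a rise of p percent multiplies the original value by 1 + p/100. The function name and the 1,000-dollar starting premium below are made up for illustration and are not taken from the Herald-Tribune's data.

```python
def apply_percent_rise(original: float, rise_percent: float) -> float:
    """Value after increasing `original` by `rise_percent` percent."""
    return original * (1 + rise_percent / 100)


# Hypothetical annual premium before the increase (illustrative only).
premium_before = 1_000.0

premium_after = apply_percent_rise(premium_before, 350)
print(f"{premium_before:.0f} -> {premium_after:.0f} "
      f"({premium_after / premium_before:.1f}x the original)")
```

Read this way, a 350% rise means a contract costs four and a half times what it did before.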

The prize awarded to St. John is not only very much deserved, but also stands as a wonderful example of how high-quality journalism, digital media and innovation can produce very valuable content able to attract a large readership and affirm a media brand.

Last but not least, kudos to ProPublica for its second Pulitzer in two years (and the 18th for its Director Paul Steiger).

Apr 21, 2011 09:10 AM
© 2012 Fondazione <ahref | Sede legale: Vicolo Dallapiccola 12 - 38122 Trento - Italy | P. IVA 02178080228 Creative Commons License