iData tutorial: DocumentCloud

Here's a quick “How To” on DocumentCloud (DC), a free and open platform for sharing vetted documents. It was developed by three journalists (Aron Pilhofer of the NYTimes, and Scott Klein and Eric Umansky, now at ProPublica) and uses OpenCalais to extract entities from the documents.

Who can use and access it? “Journalist” here is used in the broad sense and includes anyone who takes the time and effort to retrieve and vet a primary source document. This means that among the many dozens of contributors to DC you will find the NYTimes and other big outlets such as 60 Minutes, as well as many independent reporters and NGOs. And, as of a few weeks ago, the Ahref Foundation as well.

The technology behind DC is truly powerful: it makes annotating and sharing documents a breeze and is particularly suited to collaborative fact-checking exercises.

Free and open does not mean you can just log on and start working. You must register, be verified by the platform's managers and accept their terms.

Inside you'll find a bit of everything: from WikiLeaks docs edited by NYTimes reporters to many other materials, mostly originating from US sources, plus a few dozen that deal specifically with Italy.


Schermata001.png

DC's strong points are its internal semantic search engine and its optical character recognition (OCR). Together they give you many analytical functions and let several people annotate the same document collaboratively.

The platform is still English-only, but editions in other languages are on the way. This means that, at the moment, you can upload docs in languages other than English and work on them, but you won't benefit from the most sophisticated semantic tools.


Schermata002.png

The document management interface has two main tabs:

- documents

- entities

In “documents” you can upload new docs and assign them to projects in a simple, straightforward way, and share them with collaborators before making them public to other DC users or publishing them on the web.

This section also lets you search with several operators. “Source: Library of Congress”, for instance, will retrieve all materials attributed to that source.
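
If you run the same kind of search often, it can help to assemble the query string in a small script and paste it into the search box. The sketch below is only an illustration: `build_query` is a hypothetical helper, not part of DocumentCloud, and putting quotes around values that contain spaces is an assumption about how the search box groups words.

```python
def build_query(text="", **filters):
    """Compose a search string from field filters plus free text.

    The `field: value` form mirrors the "Source: Library of Congress"
    example. This helper is hypothetical, not a DocumentCloud function.
    """
    parts = []
    for field, value in filters.items():
        value = str(value)
        # Assumption: quoting keeps a multi-word value attached to its field.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}: {value}")
    if text:
        parts.append(text)
    return " ".join(parts)


query = build_query("budget", source="Library of Congress")
print(query)
```

Adding further keyword arguments adds further `field: value` pairs, in the order you pass them.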

Last but not least, you can create new projects and invite others to work on them.


Schermata004.png

Double-clicking a document opens it, letting you choose among different formats and annotate it privately or publicly.

Another important and very well designed function is the redaction tool, which lets you cover names and details you want to protect. This is actually much more than a simple patch: once you're done, the system runs the optical scan on the document once more and the underlying text disappears from the file, so nobody can retrieve it (so save a copy of your original docs somewhere else).
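
To see why a real redaction has to touch the text and not just the picture, here is a toy model in Python. It is not how DC does it (as described above, DC scans the redacted page again); it simply treats a page as a list of words with positions and drops every word that falls under the black box. The `Word` class, the coordinates and the sample sentence are all made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its rectangle on the page (hypothetical model)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(word, box):
    """True if the word's rectangle intersects the box (x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = box
    return word.x0 < bx1 and word.x1 > bx0 and word.y0 < by1 and word.y1 > by0


def redact(words, box):
    """Drop every word touched by the box, so it leaves the text layer too."""
    return [w for w in words if not overlaps(w, box)]


page = [
    Word("Source:", 10, 10, 60, 22),
    Word("John", 65, 10, 95, 22),
    Word("Smith", 100, 10, 140, 22),
    Word("confirmed", 145, 10, 210, 22),
]

# The box covers the two words of the name and nothing else.
clean = redact(page, box=(63, 8, 142, 24))
print(" ".join(w.text for w in clean))
```

A patch that only painted over the image would leave all four words searchable and copyable; here the name is gone from the text as well.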


Schermata003.png

The other tab, “Entities”, gives you access to another set of functions. The software extracts all relevant details from the documents you have selected: people's names, locations, email addresses, figures and important terms.
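
If “entity extraction” sounds abstract, this tiny Python sketch shows the idea on the two easiest cases, email addresses and figures. It is a deliberately naive stand-in built on regular expressions, not what DC runs: names, places and important terms need a proper language-analysis service. The function name and the sample sentence are invented for the example.

```python
import re

# Naive patterns, good enough for a demo but not for production use.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
FIGURE = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def extract_entities(text):
    """Return the email addresses and figures found in the text."""
    return {
        "emails": EMAIL.findall(text),
        "figures": FIGURE.findall(text),
    }


sample = "Contact j.smith@example.org about the $1,200,000 contract, up 15% from 2010."
entities = extract_entities(sample)
print(entities["emails"])
print(entities["figures"])
```

Note that the year 2010 lands among the figures: telling a date from an amount is exactly the kind of judgment a real extraction engine adds.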


Schermata005.png

One important function is “Timeline”, under the “Analyse” menu, which builds a timeline of the events mentioned in the documents and helps you put the information they contain in context.
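
The idea behind a timeline is simple enough to sketch in a few lines of Python: find the dates mentioned in each document and sort them. This is a toy version under loud assumptions. It only recognizes dates written like “3 March 2011”, and the file names and texts are made up; DC's own timeline surely handles many more formats.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}
# Matches dates such as "3 March 2011" (day, month name, four-digit year).
DATE = re.compile(r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})")


def build_timeline(docs):
    """Map {title: text} to a chronologically sorted list of (date, title)."""
    events = []
    for title, text in docs.items():
        for day, month, year in DATE.findall(text):
            try:
                events.append((date(int(year), MONTHS[month], int(day)), title))
            except ValueError:
                # Skip impossible dates such as "31 February 2011".
                continue
    return sorted(events)


docs = {
    "memo.pdf": "Signed on 3 March 2011 and amended on 12 May 2011.",
    "letter.pdf": "Received 20 January 2011.",
}
timeline = build_timeline(docs)
for when, title in timeline:
    print(when.isoformat(), title)
```

Seeing the letter turn up before the memo it supposedly answers is the kind of thing a timeline makes obvious at a glance.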


Schermata006.png

The last step is publication.


Schermata007.png

DC offers different options: download the docs, view them as PDFs or embed them in a public viewer.

Did I mention it's journalist-proof? Anybody can learn to work with it in a few minutes, and it's truly fun and powerful.

Enjoy!

 

(By Guido Romeo)

© 2012 Fondazione <ahref | Sede legale: Vicolo Dallapiccola 12 - 38122 Trento - Italy | P. IVA 02178080228 Creative Commons License