Gephi visualisation
Here is the Facebook network I created with the Netvizz app and Gephi.
Test post for Open Science podcasts
Here is the first open science podcast.
The Digital Public Library of America
The Digital Public Library of America (DPLA) is an initiative that aims to make the cultural and scientific heritage of humanity available, free of charge, to all. While Google Books is caught up in a seemingly endless legal battle, a group of Harvard-led scholars has decided to launch its own project to put all of history online.
When Google launched its Google Books project in 2004 with the goal of scanning all the world’s books into its database, it was both praised and heavily criticised. Praised for its bold attempt to make it technically possible to digitise books on a scale never seen before; criticised because a private company would control all of the world’s knowledge. In 2008, after being sued for copyright infringement for years, Google agreed to pay large sums to authors and publishers in return for permission to develop a commercial database of books. Under the terms of the deal, Google would be able to sell subscriptions to the database to libraries and other institutions, while also using the service to sell e-books and display advertisements. This led to even more controversy, and several authors and libraries demanded to be excluded from Google’s database.
In response to this, Robert Darnton, one of the biggest critics of Google Books, proposed building a true ‘digital public library of America’ which would be ‘truly free and democratic’. Here, libraries and universities would work together to establish a distributed system aggregating collections from many institutions. Harvard’s Berkman Center for Internet and Society took up Darnton’s ideas and is now incubating the project. The project has several similarities with that other initiative that emerged in response to Google Books, Europeana, and the two have already forged partnerships. Google still has to decide what its next steps are.
The vision of the DPLA is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full-text access to books in the public domain, for example from HathiTrust, the Internet Archive, and U.S. and international research libraries. Most of its board members, including Brewster Kahle of the Internet Archive, favour a decentralised network of different public libraries rather than a centralised organisation responsible for all of the content, but this is still being discussed.
In April 2013 the Harvard-funded research programme ends and the digital library has to be operational. A lot of progress has been made over the past year through several meetings and workshops, and many volunteers have been recruited. Still, many obstacles have to be overcome.
As Google has also discovered, the hardest part is not the technical implementation but copyright. Today, copyright in a work extends for 70 years after the death of the author and applies by default to any created work. This means that it is now almost impossible to publish a work from the last century. Even when the copyright holders are unknown or cannot be found (so-called ‘orphan works’), the work cannot be published online, because copyright was applied retroactively to all works, without the copyright holder having to register anything.
Many copyright experts argue that without a proper revision of the current copyright act, it will be very hard to include these orphan works in a digital database. Robert Darnton, however, believes that Congress might grant a non-commercial public library the right to digitise orphan books, which would make thousands of books available and be an enormous step forward in the copyright debate.
The Digital Public Library of America is an ambitious project with great promise. Over the next year its organisers will continue to address the challenges that lie before them: a daunting task, but one with a potentially great outcome, in which everybody with an internet connection can enjoy millions of books from America’s history.
ePSI Platform Conference 2012
On Friday the 16th of March, the European Public Sector Information (ePSI) Platform conference was held in Rotterdam. More than 300 guests from all over the world gathered for what turned out to be a very busy and interesting day. The big turnout showed the huge current interest in open data.
The ePSI Platform is an organisation working to stimulate and promote the re-use of Public Sector Information (PSI) and open data initiatives. It works to achieve the goals of the PSI Directive, created in 2003, which encourages EU member states to make as much public sector information available for re-use as possible. Now, almost ten years later, there is still a lot of work to be done. Instead of embracing the idea of open data, many large public organisations are fighting to maintain the right to charge for their information. It is in response to this that the European Commission proposed its ‘Open Data Strategy‘ in December 2011. It includes the following proposed changes to the European PSI Directive:
- All data made available by government institutions must be generally re-usable for commercial and non-commercial purposes;
- In principle, the costs charged by government institutions may not exceed the costs involved in handling the individual request for information (marginal costs – in practice usually free of charge);
- An obligation for government institutions to provide data in common machine-readable formats, to ensure that information can actually be re-used;
- Member States must introduce regulatory supervision to monitor compliance with the aforementioned principles;
- Information from libraries, museums and archives will also be eligible for re-use.
From the Open GLAM perspective, the last proposed change to the directive is of course very interesting. It would mean that all European cultural memory institutions have to make their publicly funded work freely and openly available. It is important to note that this only concerns their metadata, that is, the data about the actual cultural objects they hold, such as author, year and location. By making this data freely available for re-use, data from cultural institutions can be linked to other collections and reused in new and innovative applications. Many traditional institutions are still anxious about this idea because they fear losing control over their data, and this is just one of their concerns. The white paper “The Problem of the Yellow Milkmaid” offers a more thorough study of the potential benefits and perceived risks of open metadata for cultural institutions.
Bringing cultural heritage institutions under the purview of the PSI Directive would improve citizens’ access to our shared knowledge and culture and should increase the amount of digitised cultural heritage available online. At the end of 2011, however, the Dutch government expressed some concerns about including libraries, archives and museums. Its main reason is that it believes compliance would place too great an administrative burden on the institutions. The Dutch government suggested instead that institutions should make their data available on a more voluntary basis, for example through the Europeana project.
During the presentation sessions on cultural heritage, Richard Sweetenham (Head of Unit, Access to Information at the European Commission) gave his response to the Dutch government’s position on the matter. He said that he could not think of a reason why cultural institutions should not be included in the directive: the data is already there, and the institutions are not only funded with public money but also have a public mission. The content of an archive, museum or library only has value when it is found and used. It gains even more value when the data is formatted in such a way that it can be linked with data from other cultural institutions across Europe and all over the world.
After his talk, Harry Verwayen, business development director at Europeana, and David Haskiya, product developer at Europeana, presented the value proposition of open cultural heritage metadata. To make the most of this data, institutions should not be afraid to publish their metadata under a CC0 licence. Waiving all rights to the data sounds scary, but it actually enables institutions to pursue their public mission more successfully, while they still control the copyright of the actual digitised objects. A more thorough study of the impact of the proposed amendments to the PSI Directive has been carried out by the Communia association and can be found here.
The next couple of months will be crucial for the PSI directive. All updates can be found on the ePSI platform website.
German National Library releases more Linked Open Data under a more Open License
This is a direct copy of my blogpost on openGLAM. Soon there will be some original content here…
“In 2010 the German National Library (DNB) started publishing authority data as Linked Data. The existing Linked Data service of the DNB is now extended with title data. In this context the licence for linked data is shifted to ‘Creative Commons Zero’.
The bibliographic data of the DNB’s main collection (apart from the printed music and the collection of the Deutsches Exilarchiv) and the serials (magazines, newspapers and series) of the German Union Catalogue of Serials (ZDB) have been converted. This is an experimental service that will be continually expanded and improved.”
The release of the bibliographic data as Linked Open Data means that the DNB joins a host of other cultural heritage institutions, such as the British Library and the Dutch National Archive, that have taken a similar course.
Linked Open Data ensures that information from one cultural dataset can be linked with information from another dataset in a meaningful way. This could be two datasets from different institutions, or, indeed, two datasets from the same organisation. The possibilities are endless as long as everybody uses unique URIs for their data. More information about how Linked Open Data works can be found here.
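To make this a bit more concrete, here is a minimal sketch in Python using the rdflib library. The URIs, record identifiers and titles are made up purely for illustration and do not reflect the DNB’s actual data model; the point is only that a single shared-URI triple is enough to connect two independently published descriptions.

```python
# A minimal sketch of how records from two different collections can be
# linked through shared URIs. The URIs and titles below are made up.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import OWL, RDFS

# Hypothetical identifiers for the "same" book in two different datasets.
record_a = URIRef("http://example.org/library-a/record/123")
record_b = URIRef("http://example.org/library-b/record/abc")

g = Graph()
# Each institution describes its own record...
g.add((record_a, RDFS.label, Literal("Faust. Eine Tragödie")))
g.add((record_b, RDFS.label, Literal("Faust: A Tragedy")))
# ...and a single triple states that both records describe the same work,
# which lets applications merge the two descriptions automatically.
g.add((record_a, OWL.sameAs, record_b))

print(g.serialize(format="turtle"))
```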
Now that more and more cultural institutions see the importance of Linked Open Data, Richard Wallis, from the Data Liberate blog, predicts that this will be the first of many such announcements this year.
Open GLAM in Germany
This post is the original blogpost I wrote for the open GLAM website. The final (edited) version can be found on www.openglam.org.
Here in Germany, it appears that the open data debate is not quite as far along as in the UK or the Netherlands (although people are working on it). All the more reason for different groups to set up new initiatives to fire up the discussion about making digital heritage available under an open license. Especially given the major role Germany has played in the history of Europe, amazing things can be achieved when the data can be freely (re)used by anybody. With millions of paintings, photos, videos, maps, sculptures and archives available, the possibilities will be endless. Imagine watching any event during WWII through the eyes of both a German and a British soldier, or seeing the famous Pergamon Altar enriched with objects from Greek institutions. New stories can be told and new insights into history can be found.
Different projects are being organized in different parts of Germany with GLAM (Galleries, Libraries, Archives and Museums) institutions. The goal is to bring different groups of people together and to make as much open-access, freely reusable cultural content as possible available to the public.
A great example is the cooperation between Wikimedia Germany and the German Federal Archives (Deutsches Bundesarchiv). In 2008, the archive donated 100,000 photos from its huge collection to Wikimedia under an open license. The photos made it possible for Wikipedia volunteers to enrich articles with images and so bring them to life. The archive itself also benefited greatly from the donation: the cooperation dramatically increased the visibility of its holdings, while volunteers constantly improved the metadata and descriptions of the photos.
The cooperation between Wikimedia and the German Federal Archives has since been one of the prime examples of how successful releasing digital heritage under an open license can be. The full case study can be found here.
The Wikimedia Foundation is currently the driving force behind most Open GLAM projects, not only in Germany but in many other countries as well, as the Wiki Loves Monuments project shows.
Organizing more successful GLAM projects is all about bringing people together. Many people at institutions are thinking about opening up their data but lack the expertise, both technical and legal. Others do not see the point of opening up their data or are skeptical about it. By showing the wide range of possibilities and letting programmers create new tools and visualizations with the data, we can demonstrate the advantages of making cultural data available under an open license.
In the future, the Open Knowledge Foundation will work together with different organizations to organize even more Open GLAM projects in Germany and help them make it easier for everyone to add, find and reuse cultural works that are under an open license.
Wikimedia Conference 2011: Cultural Heritage, Commons and lots of Data. Pt. 2
This is part 2 of my report on the Wikimedia Conference 2011. The first part can be found here.
Teun Lucassen – Wikipedia is reliable [Citation needed]
Almost all the sessions in the cultural heritage track were presentations about specific projects. This was interesting, but nothing I had not heard before. I therefore chose to go to a presentation by Teun Lucassen (@tlucassen) about how users experience the reliability of Wikipedia. Lucassen is a PhD student at the faculty of Behavioral Sciences at the University of Twente. The first question Lucassen asks in his research is whether Wikipedia users need help deciding whether an article is reliable. The problem with Wikipedia is that it is hard to find out who the authors of an article are and whether they can be considered a reliable source. Throughout the history of Wikipedia, several attempts have been made to help users decide. Lucassen first showed WikiViz, a data visualization tool developed by IBM. The tool adds a bar to the article showing a number of statistics, for example that it has been edited 87 times by 23 users. The problem with this kind of information is: what does it say about reliability? Especially when you realize that most of the edits are made by automated bots. Lucassen said that he always uses this tool as an example of what not to do.
On this point, however, I do not entirely agree with him. His research reminded me of my own research on Wikipedia in last year’s Digital Methods class, where I analyzed how different articles were built. This showed that most articles have been created by several different users, but that the majority of the text was written by only one or two people. Most of the other edits made by human editors were grammatical and linguistic improvements. This is a problem for an encyclopedia whose goal is to present a neutral point of view. Showing how many people are actually responsible for the text can therefore be a useful way to give an indication of an article’s reliability. My full report can be found on my blog.
Lucassen studied three methods that could help users decide whether an article is reliable. The first is a user-based rating system, which is currently implemented in the English-language Wikipedia. The second was a simple algorithm that computes a rating from the number of edits and editors an article has (a rough sketch of this kind of heuristic follows below). The third is what Lucassen calls an ‘Adaptive Neural Network Rating system’, a complex algorithm that is impossible for the user to understand; Lucassen did not tell his test group that this system was in fact complete nonsense. He gave the test group the same articles to read with different ratings in order to see how this would influence their sense of trustworthiness. His results showed that people considered an article less reliable when the user-based rating system was used: they did not trust the opinion of other people, or thought that there were not enough votes. The simple algorithm made people more positive about the article, even though all test users agreed that the mark created by this algorithm cannot convey much useful information about the article. The third, made-up algorithm produced both positive and negative results. This can be explained by a phenomenon called ‘over-reliance’, where people start making their own assumptions about what the algorithm means. It was funny to see how people came to believe an algorithm that was completely made up.
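To give an idea of what such a simple edit/editor-count heuristic could look like, here is a rough Python sketch against the public MediaWiki API. To be clear, this is not Lucassen’s actual algorithm: the weighting and the cap are made up purely for illustration, and a score like this says nothing about whether the text itself is trustworthy.

```python
# A hypothetical sketch of a naive "rating" based on the number of edits
# and distinct editors of a Wikipedia article. Not Lucassen's algorithm.
import requests

API = "https://en.wikipedia.org/w/api.php"

def edit_stats(title, limit=500):
    """Fetch up to `limit` recent revisions and count edits and distinct editors."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user",
        "rvlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    revisions = page.get("revisions", [])
    editors = {rev.get("user", "?") for rev in revisions}
    return len(revisions), len(editors)

def naive_rating(edits, editors):
    # Made-up weighting: more edits and more distinct editors -> higher score,
    # capped at 10. Purely illustrative.
    return min(10.0, 0.01 * edits + 0.2 * editors)

edits, editors = edit_stats("Wikipedia")
print(f"{edits} edits by {editors} editors -> rating {naive_rating(edits, editors):.1f}/10")
```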
Lucassen concludes that, because of the varying quality of Wikipedia, helping users can be a good strategy for making Wikipedia more reliable, but that there are many pitfalls in how to achieve this. He proposes a user-based rating system in which the voter has to add a short piece of text explaining the grade they gave. I found Lucassen’s presentation extremely interesting, and I think this kind of research can definitely be combined with the research done in the Digital Methods course of the MA New Media at the University of Amsterdam. More information about Lucassen’s research can be found on his blog.
Ronald Beelaard – The lifecycle of Wikipedia
Ronald Beelaard did extensive research into the lifecycle of Wikipedia users. The reason for this was an article in the Wall Street Journal which concluded that editors were leaving Wikipedia on a larger scale than ever. Beelaard started his own research to find out how many Wikipedia users ‘die’ each month and how many are ‘born’. He also took into account the phenomenon of the ‘wikibreak’, where editors stop editing Wikipedia for a while and come back later. Beelaard showed a large amount of figures, which were not always easy to follow, and concluded that the dropout rate is only a fraction of the numbers mentioned in the Wall Street Journal. It is true, however, that fewer people are starting to edit Wikipedia and that young editors ‘die’ earlier than the old ones. The total community is shrinking, but the seniors are more vital than ever.
Erik Zachte – Wikipedia, still a world to win
The last presentation of the day was given by Erik Zachte (@infodisiac), a data analyst. He looked at Wikipedia’s mission to embrace the whole world. He showed in a graph (inspired by Hans Rosling’s Gapminder) that Wikipedia is growing in all languages, but that some editions are relatively small compared to the number of people who speak the language. The English Wikipedia is of course the biggest, while the Arabic and Hindi Wikipedias are still relatively small, despite the millions of people who speak these languages. This is partly because internet penetration in these countries is not as high as in Western countries, which is also the reason the Scandinavian Wikipedias are doing so extremely well. But this is not the only reason: Zachte showed, for example, that the English-language Wikipedia is edited by a very high number of people from India. He also showed that most edits come from Europe, which can be explained by the large number of languages spoken there. When a big disaster or worldwide event happens, a Wikipedia page about it appears in all the different languages.
There is a strong correlation between the number of inhabitants of a country and the size of the Wikipedia in that language. By overlaying a population-density map with a map of the sizes of each Wikipedia, Zachte revealed interesting outliers: a nice piece of data visualization. Zachte ended his presentation by addressing the rise of mobile internet. In African countries, not many people own a desktop computer with an internet connection, but there is a big rise in the use of smartphones. Zachte therefore believes that the Wikimedia Foundation should make its site easier to edit from these devices in order to grow Wikipedia.
In the end I can look back on a very well organized event with lots of interesting presentations. The cultural heritage track gave a nice overview of what is happening in the field at the moment and how open content and standards can help spread that content. The Wiki-world track was, for me, the most fascinating, though. It reminded me of all the research done last year in the Digital Methods and data visualization classes of my MA New Media studies, and of the fact that Wikipedia and all its data is such an interesting object of study. I want to thank the organization for a great day and I hope to be able to be part of it next year.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Netherlands licence.
Wikimedia Conference 2011: Cultural Heritage, Commons and lots of Data. Pt. 1
On Saturday 5 November, the Wikimedia Foundation held a conference in Utrecht. I took the opportunity to go there and write this report about it. Because of its size I decided to split the report into two parts: the first is mainly about cultural heritage and Creative Commons, the second about Wikipedia itself.
The Wikimedia Foundation is a non-profit organization that oversees several open source projects dedicated to bringing free content to the world. Its most famous project is of course Wikipedia itself, but there are several other projects that deserve attention, such as Wiktionary and Wikimedia Commons, which was discussed a lot today.
The conference was opened with a speech by a man who introduced himself as the CEO of the Wikimedia Foundation and talked about the commercial successes they had achieved, by creating ad space on Wikipedia pages and receiving sponsorship money from, for example, Neelie Kroes in order to keep her page clean. During his speech it soon became clear that this was all part of a comedy act about exactly everything that Wikimedia is not.
Jill Cousins – Europeana without Walls
After this little piece of comedy theater it was time for Jill Cousins, executive director of the Europeana project, to open the conference with a keynote. Cousins presented the current status of the project and its relation to the Wikimedia Foundation. Europeana’s goal is to digitize all of Europe’s heritage and make it publicly available. Europeana aggregates objects and their metadata from institutions all over Europe. Here Cousins addressed the copyright problem. The goal is to release all the metadata collected by Europeana under a Creative Commons licence that allows commercial use by other parties (CC0). The institutions are quite anxious about this because they fear losing control of their material and a loss of income if others can use their content for free. However, as Cousins mentioned, without the possibility of commercial use the objects can barely be used, because the material cannot be embedded on sites that carry out commercial activities, for example by showing ads. This also means that the objects cannot be used in Wikipedia articles, since Wikipedia’s rules require that media content is openly available, including for commercial use.
Europeana realizes that most of its objects are not found directly on its own portal website but on other sites that embed its content, so being able to work together with other sites is vital for the project.
Another issue Europeana faces is the lack of good metadata (something I also described in my MA thesis, which can be found here). In order to make full use of the semantic possibilities of the web, especially with more than 15 million objects, good metadata is essential. Europeana has recently launched several projects and a handbook to encourage the different institutions to fill in their metadata in a correct and unified way. Cousins also noted that, whatever the status of the information, the metadata should always be in the public domain.
After the plenary opening, visitors could choose between three different ‘tracks’. The first was purely focused on cultural heritage, the second was about the world and data around the different Wikimedia projects, and the third, the ‘Incore Wikimedia track’, consisted of various technical sessions for Wikipedia editors. Because of my focus on digital heritage and Europeana, I chose the first.
Maarten Dammers – GLAMWiki Overview
The first speaker was Maarten Dammers (@mdammers), a very active Wikimedia volunteer. He presented the GLAMwiki project, which stands for Galleries, Libraries, Archives, Museums & Wikimedia. The goal of this project is to build a bridge between these cultural institutions and Wikimedia, and several different projects have been set up to achieve this. The first project Maarten talked about was Wiki Loves Art, in which users were asked to go to museums, take pictures of different art objects and upload them to the Wikimedia Commons image bank. Because these pictures are under a CC-BY-SA license, which means commercial use is allowed, they can be embedded in Wikipedia pages about the artist or the object itself. By crowdsourcing these images and making a contest out of it, the image bank quickly filled with thousands of pictures. Other Wikipedia users started to add metadata to the images and to place them in articles, which greatly enriched the Wikipedia pages.
In the second part of the presentation, Lodewijk Gelauff (@effeietsanders) joined Maarten to talk about Wiki Loves Monuments, the successor of the Wiki Loves Art project. Where the Art project took place only in the Netherlands, the Monuments project focused on monuments from all over Europe. By the time it finished, the project had resulted in 165,000 photos in the Wikimedia Commons image bank.
Maarten Zeinstra – The value of putting photo collections in the public domain
After a short break, Knowledgeland employee Maarten Zeinstra (@mzeinstra) presented the results of his research into the benefits for institutions of putting their (photo) collections in the public domain. Maarten analyzed 1,200 photos released by the Dutch National Archive, all pictures of Dutch political party members throughout the history of the Netherlands. When these photos were put in the Wikimedia Commons image bank, Wikipedia users quickly started to add them to Wikipedia articles. As a result, the National Archive’s pictures gained a lot more attention and new metadata was added automatically. To analyze this, Maarten made use of various tools created by members of the Wikimedia Foundation; several of these tools can also be very helpful when analyzing Wikipedia at an academic level.
What was interesting in this presentation was that the analysis actually showed to what extent the materials put in the image bank are used. This information is extremely helpful when institutions are in doubt about whether to put their collection in the public domain. Maarten’s research also showed that materials are more likely to be used when a specific set is chosen. He compared the collection of the National Archive with a much bigger, uncategorized one from the Deutsche Fotothek. Of the 1,200 photos from the National Archive, 55% were used in Wikipedia articles in various languages; of the Deutsche Fotothek collection, only around 3.5% was used. The main reason for this is that an uncategorized collection requires more effort from Wikipedia editors to sort out. The full report can be found on the website of Images for the Future.
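As a side note for anyone who wants to try something similar: how widely a Commons file is reused across the different language Wikipedias can be queried through the Commons API. Below is a minimal sketch in Python, assuming the requests library and the ‘globalusage’ query property of the GlobalUsage extension running on Commons; the file name is a made-up placeholder, not one of the National Archive photos.

```python
# A minimal sketch: list the wikis and pages where a Commons file is embedded,
# using the "globalusage" query property of the Commons API.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def global_usage(file_title, limit=50):
    """Return (wiki, page title) pairs where the given Commons file is used."""
    params = {
        "action": "query",
        "prop": "globalusage",
        "titles": file_title,
        "gulimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return [(u["wiki"], u["title"]) for u in page.get("globalusage", [])]

# Hypothetical example file name.
for wiki, title in global_usage("File:Example.jpg"):
    print(f"{wiki}: {title}")
```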
Sebastiaan ter Burg – Creative Commons in practice
Sebastiaan ter Burg (@ter_burg) is an independent photographer. When he makes a photo report for a company he has one clear condition: all his work becomes freely available, immediately or after a delay, under a CC-BY-SA license on his Flickr account. This means that all his work can be freely used and shared, even for commercial purposes. In his presentation, Sebastiaan talked about the benefits this way of working has for him. First of all, it saves him a lot of paperwork. In the ‘old’ way of making money with photos, an invoice and a contract are created for each picture that is sold. By releasing the material under a Creative Commons license, this is no longer necessary: Sebastiaan sets a fixed price for a photo shoot, so there is only one contract. The more important advantage is that his work is spread and used in all kinds of other media. He noted that he has a better income than most freelance photographers. It has to be noted, however, that Sebastiaan’s business model is not better than the old one per se. Quality is still the most important aspect when making money in the creative industry. It will, however, become harder for photographers who are not at the top to generate an income: when more photos are released under a Creative Commons license, fewer photographers are needed to report an event, and when a couple of photographers take good pictures, other media can use them. Sebastiaan’s presentation showed that a business model built on openly licensed work can succeed, which is a refreshing thought.
Johan Oomen – Open Images
Johan Oomen (@johanoomen) is head of the research and development department at the Netherlands Institute for Sound and Vision. He presented Open Images, a sub-project of Images for the Future, a Dutch project whose goal is to digitize Dutch audio-visual heritage and make it publicly available under an open license. Oomen explained that ‘open’ has to be understood in its broadest sense: open source, open media formats (Ogg), open standards (HTML5) and open content (CC). This way of working stimulates the reuse and remixing of old content. The project will also work together with the Europeana project to make the content more easily accessible. The project will continue for two more years and will mainly focus on this reuse and on the possibilities of crowdsourcing new information, metadata and products.
Jan-Bart de Vreede – Wikiwijs, the use of open content for educational purposes
Jan-Bart de Vreede is active in the Wikiwijs project, which aims to get teachers to use more open content in the classroom. This content varies from images and videos to complete lessons. Different objects or parts of lessons can also be combined to create new lessons, as long as these are also shared under a Creative Commons license. To guarantee quality, educational institutions and teachers can add their opinion of the material. It was interesting to hear that the number one reason teachers give for not sharing their content is that they think their material is not good enough, which is odd given that they have been using it themselves for years.
This is the end of part 1, which covered presentations from the cultural heritage track of the conference. In part 2, presentations from the ‘Wiki-World’ track are discussed.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Netherlands licence.
MA Thesis: Europeana – Building a European Identity
Last month I finally received my Master’s diploma from the University of Amsterdam. My research on Europeana and European identity was found interesting enough to let me pass. However, a lot of questions remain now that the research is finished. My supervisor Theo Thomassen commented that one more year of study would probably be required to really build a complete study.
I want to thank Theo Thomassen for supervising me, as well as the many other people I met during this year in the MA New Media at the University of Amsterdam. Beforehand, I did not expect it to be so interesting and fun. The focus on many different skill sets, like blogging and theoretical analysis, but also on more practical research in the Digital Methods class, really gave me a better understanding of so many different fields. The last course, on data visualization, was very intensive, especially working together with so many people from different disciplines, but it was very interesting and really opened my eyes to the possibilities of this kind of analysis when studying huge datasets.
I can now add MA to my name, which is a good feeling. I still believe, however, that I am only just starting to delve into the material, and hopefully I will be able to continue exploring a field that has so many interesting aspects.
For anybody who is interested in my MA thesis: it is freely available under a CC-BY licence and can be found here.