Open GLAM in Germany

January 22, 2012

This post is the original blog post I wrote for the OpenGLAM website. The final (edited) version can be found on www.openglam.org

Here in Germany, it appears that the open data debate is not as far along as in the UK or the Netherlands (although they are working on it). All the more reason for different groups to set up new initiatives to fire up the discussion about making digital heritage available under an open license. Especially given the major role Germany has played in the history of Europe, amazing things can be achieved when this data can be freely (re)used by anybody. With the millions of paintings, photos, videos, maps, sculptures and archives available, the possibilities are endless. Imagine watching any event of WWII through the eyes of both a German and a British soldier, or seeing the famous Pergamon Altar enriched with objects from Greek institutions. New stories can be told and new insights into history can be found.

Different projects are being organized in different parts of Germany with GLAM (Galleries, Libraries, Archives and Museums) institutions. The goal is to bring different groups of people together and to make as much open-access, freely reusable cultural content available to the public as possible.

A great example is the cooperation between Wikimedia Germany and the German Federal Archives (Deutsches Bundesarchiv). In 2008, the archive donated 100,000 photos out of its huge collection to Wikimedia under an open license. The photos made it possible for Wikipedia volunteers to enrich Wikipedia articles with images and bring them to life. The archive itself also benefited greatly from the donation: the cooperation dramatically increased the visibility of its holdings, while the metadata and descriptions of the photos were constantly improved by volunteers.
The cooperation between Wikimedia and the German Federal Archives has since been one of the prime examples of how successful releasing digital heritage under an open license can be. The full case study can be found here.

The Wikimedia Foundation is currently the driving force behind most Open GLAM projects, not only in Germany but in many other countries as well; the Wiki Loves Monuments project is one example.

Organizing more successful GLAM projects is all about bringing people together. Many people at institutions are thinking about opening up their data but lack the expertise, both technical and legal. Others do not see the point of opening up their data, or are skeptical about it. By showing the rich range of possibilities and letting programmers create new tools and visualizations with their data, we can demonstrate the advantages of making cultural data available under an open license.

In the future, the Open Knowledge Foundation will work together with different organizations to organize even more Open GLAM projects in Germany and help make it easier for everyone to add, find and reuse cultural works that are under an open license.

If you are interested in GLAM outreach and want to join the effort to encourage cultural heritage institutions to open up the data they hold on their collections and digital copies of works, please join the discussion on the Open GLAM/Open Heritage mailing list, or send me an email: joris.pekel@okfn.org

Wikimedia Conference 2011: Cultural Heritage, Commons and lots of Data. Pt. 2

November 8, 2011

This is part 2 of my report on the Wikimedia Conference 2011. The first part can be found here.

Teun Lucassen – Wikipedia is reliable [Citation needed]

Almost all the sessions in the cultural heritage track were presentations about a particular project. This was interesting, but not much that I had not heard before. I therefore chose to go to a presentation by Teun Lucassen (@tlucassen) about how users perceive the reliability of Wikipedia. Lucassen is a PhD student at the faculty of Behavioural Sciences at the University of Twente. The first question Lucassen asks in his research is whether Wikipedia users need help deciding if an article is reliable. The problem with Wikipedia is that it is hard to find out who the authors of an article are and whether they can be considered a reliable source. Throughout the history of Wikipedia, several attempts have been made to help users decide. Lucassen first showed WikiViz, a data visualization tool developed by IBM. The tool adds a bar to the article showing a number of statistics about it, for example that it has been edited 87 times by 23 users. The problem with this kind of information is: what does it actually say about reliability? Especially when you realize that most of the edits are made by automated bots. Lucassen said that he always uses this tool as a bad example.

Here, however, I do not entirely agree with him. His research reminded me of the research I did myself in the Digital Methods class about Wikipedia last year, in which I analyzed how different articles were built. It showed that most articles have been created by several different users, but that the majority of the text was written by only one or two people. All the other edits made by human editors were mainly grammatical and linguistic improvements. This is a problem for an encyclopedia whose goal is to present a neutral point of view. Showing how many people are actually responsible for the text can therefore be a useful way to give an indication of the reliability of an article. My full report can be found on my blog.
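
As a rough illustration of the kind of numbers a tool like WikiViz surfaces, the MediaWiki API makes them easy to collect yourself. Below is a minimal sketch; the bot heuristic at the end is my own crude assumption (proper bot detection would check user group membership):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def edit_stats(title):
    """Count revisions and distinct editors of one article via the
    MediaWiki revisions API, flagging likely bot accounts."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "user",
        "rvlimit": "max",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    edits, editors = 0, {}
    while True:
        data = requests.get(API, params=params).json()
        for rev in data["query"]["pages"][0].get("revisions", []):
            edits += 1
            user = rev.get("user", "?")
            editors[user] = editors.get(user, 0) + 1
        if "continue" not in data:       # no more revision batches
            break
        params.update(data["continue"])  # follow the continuation token
    # Crude heuristic (my assumption): account names ending in "bot" are bots.
    bots = {u for u in editors if u.lower().endswith("bot")}
    return edits, len(editors), bots

edits, n_editors, bots = edit_stats("Pergamon Altar")
print(f"{edits} edits by {n_editors} editors ({len(bots)} likely bots)")
```
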
Lucassen studied three methods intended to help users decide whether an article is reliable. The first is a user-based rating system, which is currently implemented on the English-language Wikipedia. The second was a simple algorithm that calculates a rating from the number of edits and editors an article has. The third is what Lucassen calls an 'Adaptive Neural Network Rating system', which uses a complicated algorithm that is impossible for the user to understand. Lucassen did not tell his test group that this system was complete nonsense. He gave the test group the same articles to read with different ratings in order to see how this would influence their sense of trustworthiness. His results showed that people considered an article less reliable when the user-based rating system was used: people did not trust the opinion of other people, or thought that there were not enough votes. The simple algorithm made people more positive about the article, even though all test users agreed that the mark it produces does not give much useful information about the article. The third, made-up, algorithm produced both positive and negative results. This can be explained by a phenomenon called 'over-reliance', where people start making their own assumptions about what the algorithm means. It was funny to see how people had come to believe an algorithm that was completely made up.
Lucassen concludes that, given the varying quality of Wikipedia, helping users can be a good strategy for making Wikipedia more reliable, but that there are many pitfalls in how to achieve this. He proposes a user-based rating system in which the voter has to add a short piece of text explaining the grade. I found Lucassen's presentation extremely interesting, and I think this kind of research can definitely be combined with the research done in the Digital Methods course of the MA New Media at the University of Amsterdam. More information about Lucassen's research can be found on his blog.

Ronald Beelaard – The lifecycle of Wikipedia

Ronald Beelaard has done extensive research into the lifecycle of Wikipedia users, prompted by an article in the Wall Street Journal which concluded that editors were leaving Wikipedia on a larger scale than ever. Beelaard started his own research to find out how many Wikipedia editors 'die' each month and how many are 'born'. He also took into account the phenomenon of the 'Wikibreak', where editors stop editing for a while, only to come back later. Beelaard showed a large amount of figures, which were not always easy to follow, and concluded that the dropout rate is only a fraction of the numbers mentioned in the Wall Street Journal. It is true, however, that fewer people start editing Wikipedia and that young editors 'die' earlier than the old ones. The total community is shrinking, but the seniors are more vital than ever.
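
Beelaard did not spell out his exact definitions, but the bookkeeping behind this kind of study can be sketched roughly as follows; the thresholds below are my own illustrative assumptions, not his:

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- my own assumptions, not Beelaard's.
BREAK_GAP = timedelta(days=30)   # a month of silence followed by a return: a 'Wikibreak'
DEATH_GAP = timedelta(days=180)  # half a year without edits: the editor has 'died'

def classify_editor(timestamps, today):
    """Classify one editor from a sorted list of edit timestamps:
    when the editor was 'born', how many Wikibreaks they took, and
    whether they count as 'dead' as of `today`."""
    birth = timestamps[0]
    breaks = sum(
        1
        for prev, nxt in zip(timestamps, timestamps[1:])
        if BREAK_GAP <= nxt - prev < DEATH_GAP
    )
    dead = today - timestamps[-1] >= DEATH_GAP
    return birth, breaks, dead

# Tiny usage example with made-up dates.
edits = [datetime(2010, 1, 5), datetime(2010, 3, 1), datetime(2010, 11, 20)]
print(classify_editor(edits, today=datetime(2011, 11, 5)))
```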

Erik Zachte – Wikipedia, still a world to win

The last presentation of the day was given by Erik Zachte (@infodisiac), a data analyst. He looked at Wikipedia's mission to embrace the whole world. In a graph (inspired by Hans Rosling's Gapminder) he showed that Wikipedia is growing in all languages, but that some editions are relatively small compared to the number of people who speak the language. The English Wikipedia is of course the biggest, but the Arabic and Hindi Wikipedias are still relatively small, despite the millions of people who speak those languages. This is partly due to internet penetration in those countries, which is not as high as in Western countries; it is also the reason the Scandinavian Wikipedias are doing so extremely well. But this is not the only factor: Zachte showed, for example, that the English-language Wikipedia is edited by a very high number of people from India. He also showed that most edits come from Europe, which can be explained by the large number of languages spoken here. When a big disaster or worldwide event happens, a page about it appears in all the different languages.
There is a strong correlation between the number of inhabitants of a country and the size of the Wikipedia in that language. By overlaying a geographical map of population density with a map of the sizes of each Wikipedia, Zachte revealed interesting outliers; a nice piece of data visualization. Zachte ended his presentation by addressing the rise of mobile internet. In African countries, not many people own a desktop computer with an internet connection, but there is a big rise in the use of smartphones. Zachte therefore believes that the Wikimedia Foundation should make its site more accessible for editing pages on these devices in order to grow Wikipedia.
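
Zachte's overlay invites a simple back-of-the-envelope check: correlate speaker numbers with article counts. A minimal sketch with made-up illustrative figures (the real numbers live at stats.wikimedia.org):

```python
import math

# Made-up illustrative figures: (speakers in millions, articles in thousands).
languages = {
    "English": (360, 3800),
    "German": (100, 1300),
    "Dutch": (24, 680),
    "Arabic": (290, 150),
    "Hindi": (260, 100),
}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

speakers, articles = zip(*languages.values())
print(f"r = {pearson(speakers, articles):.2f}")
```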

In the end, I can look back on a very well-organized event with lots of interesting presentations. The cultural heritage track gave a nice overview of what is currently happening in the field and of how open content and standards can help spread that content. The Wiki-World track, however, was the most fascinating for me. It reminded me of all the research done last year in the Digital Methods and data visualization classes of my MA New Media studies, and of the fact that Wikipedia and all its data are such an interesting object of study. I want to thank the organizers for a great day, and I hope to be able to be part of it next year.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Netherlands license

Wikimedia Conference 2011: Cultural Heritage, Commons and lots of Data. Pt. 1

November 8, 2011

On Saturday 5 November, the Wikimedia Foundation held a conference in Utrecht. I took the opportunity to go there and write this report about it. Because of its size, I decided to split the report into two parts. The first is mainly about cultural heritage and Creative Commons; the second is about Wikipedia itself.

The Wikimedia Foundation is a non-profit organization behind several open source projects dedicated to bringing free content to the world. Its most famous project is of course Wikipedia itself, but there are several other projects that deserve attention, such as Wiktionary and Wikimedia Commons, which was discussed a lot today.

The conference was opened with a speech by a man who introduced himself as the CEO of the Wikimedia Foundation and talked about the commercial successes they had achieved by creating ad space on Wikipedia pages and receiving sponsor money from, for example, Neelie Kroes in order to keep her page clean. During his speech it soon became clear that this was all part of a comedy act about exactly everything that Wikimedia is not.

Jill Cousins – Europeana without Walls

After this little piece of comedy theatre it was time for Jill Cousins, executive director of the Europeana project, to open the conference with a keynote. Cousins presented the current status of the project and its relation to the Wikimedia Foundation. Europeana's goal is to digitize all of Europe's heritage and make it publicly available. Europeana aggregates objects and their metadata from institutions all over Europe. Here Cousins addressed the copyright problem. The goal is to release all the metadata collected by Europeana under a Creative Commons license that allows commercial use by other parties (CC0). Institutions are quite anxious about this: they believe they will lose control of their material, and they fear a loss of income if others can use their content for free. However, as Cousins pointed out, without the possibility of commercial use the objects can barely be used at all, because the material cannot be embedded on sites that engage in commercial activity, for example by running ads. It also means the objects cannot be used in Wikipedia articles, since Wikipedia's rules prescribe that media content must be openly available, including for commercial use.
Europeana realizes that most of its objects are found not on its own portal site but on other sites that embed its content, so being able to work together with other sites is vital for the project.
Another issue Europeana faces is the lack of good metadata (something I also described in my MA thesis, which can be found here). In order to make full use of the semantic possibilities of the web, especially with more than 15 million objects, good metadata is essential. Europeana has recently launched several projects and a handbook to encourage institutions to fill in their metadata in a correct and unified way. Cousins also noted that, whatever the status of the information, the metadata should always be in the public domain.

After the plenary opening, visitors could choose between three different 'tracks'. The first focused purely on cultural heritage, the second was about the world and data around the different Wikimedia projects, and the third, the 'Incore Wikimedia track', consisted of various technical sessions for Wikipedia editors. Because of my focus on digital heritage and Europeana, I chose the first.

Maarten Dammers – GLAMWiki Overview

The first speaker was Maarten Dammers (@mdammers), a very active Wikimedia volunteer, who presented the GLAMwiki project. This stands for Galleries, Libraries, Archives, Museums & Wikimedia. The goal of this project is to build a bridge between these cultural institutions and Wikimedia. To achieve this, several projects have been founded. The first project Maarten talked about was Wiki Loves Art, in which users were asked to go to museums, take pictures of art objects and upload them to the Wikimedia Commons image bank. Because these pictures are under a CC-BY-SA license, which means commercial use is allowed, they can be embedded in Wikipedia pages about the artist or the object itself. By crowdsourcing these images and turning it into a contest, the image bank quickly filled with thousands of pictures. Other Wikipedia users started adding metadata to the images and placing them in articles, which greatly enriched the Wikipedia pages.
In the second part of the presentation, Lodewijk Gelauff (@effeietsanders) joined Maarten to talk about Wiki Loves Monuments, the successor to the Wiki Loves Art project. Where the Art project was limited to the Netherlands, the Monuments project covered monuments from all over Europe. By the time the project finished, it had brought 165,000 photos into the Wikimedia Commons image bank.

Maarten Zeinstra – The value of putting photo collections in the public domain

After a short break, Knowledgeland employee Maarten Zeinstra (@mzeinstra) presented the results of his research into what institutions gain when they put their (photo) collections in the public domain. Maarten analyzed 1,200 photos released by the Dutch National Archive, all of them pictures of Dutch political party members throughout the history of the Netherlands. When these photos were put in the Wikimedia Commons image bank, Wikipedia users quickly started adding them to Wikipedia articles. As a result, the pictures of the National Archive gained a lot more attention, and new metadata was added along the way. For his analysis, Maarten used various tools created by members of the Wikimedia Foundation; several of these tools can be very helpful when analyzing Wikipedia, also at an academic level.
What made this presentation interesting was that the analysis actually showed to what extent materials put in the image bank are used. This information is extremely helpful for institutions that are in doubt about whether to put their collections in the public domain. Maarten's research also showed that materials are more likely to be used when a specific set is chosen. He compared the collection of the National Archive with a much bigger, uncategorized one from the Deutsche Fotothek. Of the 1,200 photos from the National Archive, 55% were used in Wikipedia articles in various languages; of the Deutsche Fotothek collection, only around 3.5% were used. The main reason is that an uncategorized collection requires more effort from Wikipedia editors to sort out. The full report can be found on the website of Images for the Future.
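
Usage figures like these can be reproduced: Wikimedia Commons exposes where each file is embedded through its GlobalUsage API. A minimal sketch (the file names are hypothetical placeholders):

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"

def global_usage(file_title):
    """List the wiki pages that embed a Commons file,
    via the GlobalUsage API (prop=globalusage)."""
    params = {
        "action": "query",
        "prop": "globalusage",
        "titles": file_title,
        "gulimit": "500",
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    usage = []
    for page in data["query"]["pages"].values():
        usage.extend(page.get("globalusage", []))
    return usage

# Hypothetical donated set; measure what share is used anywhere.
donated = ["File:Example_photo_1.jpg", "File:Example_photo_2.jpg"]
used = sum(1 for f in donated if global_usage(f))
print(f"{used}/{len(donated)} files in use ({used / len(donated):.0%})")
```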

Sebastiaan ter Burg – Creative Commons in practice.

Sebastiaan ter Burg (@ter_burg) is an independent photographer. When he shoots a photo report for a company, he has one clear condition: all his work becomes freely available, immediately or after a delay, under a CC-BY-SA license on his Flickr account. This means that all his work can be freely used and shared, even for commercial purposes. In his presentation, Sebastiaan talked about the benefits this way of working has for him. First of all, it saves him a lot of paperwork. In the 'old' way of making money with photos, an invoice and a contract are created for each picture sold; by releasing the material under a Creative Commons license, this is no longer necessary. Sebastiaan sets a fixed price for a photo shoot, so there is only one contract. The more important advantage is that his work is spread and used in all kinds of other media. He noted that he has a better income than most freelance photographers. It has to be noted, however, that Sebastiaan's business model is not necessarily better than the old one. Quality is still the most important factor in making money in the creative industry, and it will become harder for photographers who are not at the top to generate an income: when more photos are released under a Creative Commons license, fewer photographers are needed to cover an event, because once a couple of photographers take good pictures, other media can use them. Sebastiaan's presentation showed that a business model built on open data can work, which is a refreshing thought.

Johan Oomen – Open Images

Johan Oomen (@johanoomen) is head of the research and development department at the Netherlands Institute for Sound and Vision. He presented Open Images, a sub-project of the Images for the Future project: a Dutch effort to digitize the Dutch audio-visual heritage and make it publicly available under an open license. Oomen explained that 'open' has to be understood in its broadest sense: open source, open media formats (Ogg), open standards (HTML5) and open content (CC). This way of working stimulates the reuse and remixing of old content. The project will also work together with Europeana to make the content more easily accessible. It will continue for two more years and will mainly focus on this reuse and on the possibilities of crowdsourcing new information, metadata and products.

Jan-Bart de Vreede – Wikiwijs, the use of open content for educational purposes

Jan-Bart de Vreede is active in the Wikiwijs project, which aims to get teachers to use more openly licensed content in the classroom. This content varies from images and videos to complete lessons. Different objects or parts of lessons can also be combined to create new lessons, as long as these are also shared under a Creative Commons license. To guarantee quality, educational institutions and teachers can add their opinion of the material. It was interesting to hear that the number one reason teachers do not share their content is that they think their material is not good enough, which is rather odd given that they have been using it themselves for years.

This is the end of part 1, which covered the presentations from the cultural heritage track of the conference. In part 2, presentations from the 'Wiki-World' track are discussed.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Netherlands license