Posts Tagged ‘datavisualization’

Wikimedia Conference 2011: Cultural Heritage, Commons and lots of Data. Pt. 2

November 8, 2011

This is part 2 of my report on the Wikimedia Conference 2011. The first part can be found here.

Teun Lucassen – Wikipedia is reliable [Citation needed]

Almost all the sessions in the cultural heritage track were presentations about a specific project. This was interesting, but nothing I had not heard before. I therefore chose to go to a presentation by Teun Lucassen (@tlucassen) about how users experience the reliability of Wikipedia. Lucassen is a PhD student at the Faculty of Behavioral Sciences at the University of Twente. The first question Lucassen asks in his research is whether Wikipedia users need help deciding if an article is reliable. The problem with Wikipedia is that it is hard to find out who the authors of an article are and whether they can be considered a reliable source. Throughout the history of Wikipedia, several attempts have been made to help the user decide. Lucassen first showed WikiViz, a datavisualization tool developed by IBM. The tool adds a bar to the article that shows a number of statistics about it, for example that it has been edited 87 times by 23 users. The problem with this kind of information is: what does it say about reliability? Especially when you realize that most of the edits are made by automated bots. Lucassen said that he always uses this tool as a bad example. On this point, however, I do not entirely agree with him. His research reminded me of my own research on Wikipedia in the Digital Methods class last year, in which I analyzed how different articles were built. This showed that most articles have been created by several different users, but that the majority of the text was written by only one or two people. All the other edits made by human editors were mainly grammatical and linguistic improvements. This is a problem for an encyclopedia whose goal is to present a neutral point of view. Showing how many people are actually responsible for the text can therefore be a useful way to give an indication of the reliability of the article. My full report can be found on my blog.
Lucassen studied three methods that could help the user decide whether an article is reliable. The first is a user-based rating system, which is currently implemented in the English-language Wikipedia. The second is a simple algorithm that shows a rating depending on the number of edits and users the article has. The third is what Lucassen calls an ‘Adaptive Neural Network Rating system’. This supposedly uses a complex algorithm that is impossible for the user to understand. Lucassen did not tell his test group that this system was complete nonsense. He gave the test group the same articles to read with different ratings in order to see how this would influence their sense of trustworthiness. His test results showed that people considered the article less reliable when the user-based rating system was used. People did not trust the opinion of other people, or thought that there were not enough votes. The simple algorithm made people more positive about the article, even though all test users agreed that the mark created by this algorithm cannot give much useful information about the article. The third, made-up algorithm showed both positive and negative results. This can be explained by a phenomenon called ‘over-reliance’, in which people start making their own assumptions about what the algorithm means. It was funny to see how people had started to believe an algorithm that was completely made up.
Lucassen concludes from his research that, because of the ambiguous quality of Wikipedia, helping the users can be a good strategy to make Wikipedia more reliable, but that there are many pitfalls in how to achieve this. Lucassen proposes a user-based rating system in which the voter has to add a short piece of text that explains why he gave the grade. I found Lucassen’s presentation extremely interesting, and I think this kind of research can definitely be used in combination with the research that is done in the Digital Methods course of the MA New Media at the University of Amsterdam. More information about Lucassen’s research can be found on his blog.

Ronald Beelaard – The lifecycle of Wikipedia

Ronald Beelaard did extensive research into the lifecycle of Wikipedia users. The reason for this was an article in the Wall Street Journal which concluded that editors were leaving Wikipedia on a larger scale than ever. Beelaard started his own research in order to find out how many Wikipedia users ‘die’ each month and how many are ‘born’. He also took into account the phenomenon called a ‘Wikibreak’, where editors stop editing Wikipedia for a while and come back later. Beelaard showed a large number of figures which weren’t always easy to understand, and concluded that the dropout rate is only a fraction of the numbers mentioned in the Wall Street Journal. It is, however, true that fewer people start editing Wikipedia and that the young editors die earlier than the old ones. The total community is shrinking, but the seniors are more vital than ever.
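To make the idea of ‘born’, ‘dead’ and ‘Wikibreak’ a bit more concrete, here is a minimal sketch of how an editor could be classified from the dates of his edits. This is not Beelaard’s method: the function name `classify_editor` and both thresholds (90 days of silence for a break, 365 days for ‘dead’) are my own assumptions, chosen only for illustration.

```python
from datetime import date, timedelta


def classify_editor(edit_dates, today, break_days=90, death_days=365):
    """Classify one editor from the dates of his edits.

    break_days and death_days are hypothetical thresholds, not values
    taken from Beelaard's research.
    Returns (status, returned_from_break).
    """
    dates = sorted(edit_dates)
    if not dates:
        raise ValueError("an editor needs at least one edit")

    # Longest silence between two consecutive edits (zero for a single edit).
    longest_gap = max(
        (later - earlier for earlier, later in zip(dates, dates[1:])),
        default=timedelta(0),
    )
    # Time that has passed since the most recent edit.
    silent_for = today - dates[-1]

    if silent_for > timedelta(days=death_days):
        status = "dead"
    elif silent_for > timedelta(days=break_days):
        status = "on break"
    else:
        status = "active"

    # A long gap that was followed by another edit counts as a Wikibreak.
    returned_from_break = longest_gap > timedelta(days=break_days)
    return status, returned_from_break
```

With thresholds like these, an editor who was silent for more than a year but then came back is not counted as ‘dead’, which is the kind of correction a Wikibreak requires.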

Erik Zachte – Wikipedia, still a world to win

The last presentation of the day was given by Erik Zachte (@infodisiac), a data analyst. He researched Wikipedia’s mission to embrace the whole world. He showed in a graph (inspired by Hans Rosling’s Gapminder) that Wikipedia is growing in all languages, but that some of them are relatively small compared to the number of people who speak the language. The English Wikipedia is of course the biggest, but the Arabic and Hindi Wikipedias are still relatively small, despite the millions of people who speak these languages. This is partly because internet penetration in these countries is not as high as in Western countries. This is also the reason why the Scandinavian Wikipedias are doing so extremely well. But it is not the only reason. Zachte showed, for example, that the English-language Wikipedia is edited by a very high number of people from India. Zachte also showed that most edits come from Europe, which can be explained by the large number of languages here. When a big disaster or worldwide event happens, a Wiki page about it appears in all the different languages.
There is a strong correlation between the number of inhabitants of a country and the size of the Wikipedia in that language. By putting a geographical map with the population density over a map with the sizes of each Wikipedia, Zachte showed interesting outliers. A nice piece of datavisualization. Zachte ended his presentation by addressing the rise of mobile internet. In African countries, not many people own a desktop computer with an internet connection. There is, however, a big rise in the use of smartphones. Zachte therefore believes that the Wikimedia Foundation should make its site more accessible for editing pages with these devices in order to create a larger Wikipedia.

In the end I can look back on a very well organized event with lots of interesting presentations. The cultural heritage track gave a nice overview of what is happening in the field at the moment and how open content and standards can help spread the content. For me, however, the Wiki-world track was the most fascinating. It reminded me of all the research that was done last year in the Digital Methods and datavisualization classes of my MA New Media studies, and of the fact that Wikipedia and all its data is such an interesting object of study. I want to thank the organization for a great day, and I hope to be able to be part of it next year.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Netherlands license


Wikipedia and the Utopia of Openness: How Wikipedia Becomes Less Open to Improve its Quality

October 15, 2011

I found out today that I never posted my final paper for the Digital Methods of Internet Research course. During my year in the Master New Media at the UvA, this was one of the most interesting research projects I worked on. With a final grade of 8.5, I was also asked to present it at the Digital Methods Conference. In this blog post, I have put down the abstract and the method. If you find it interesting, the full paper can be found here under a CC-BY-SA license.


Wikipedia has become an enormous source of information in the last decade. Because of its ubiquitous presence on the internet and the speed with which it is updated, it has become more than a reference. It has become ‘a first rough draft of history’. In this study the changing politics of openness are analyzed. By looking at both small articles and one extremely popular one, the role of openness and transparency within Wikipedia is discussed. In this study I point out that in order to improve the quality of Wikipedia, it is sometimes necessary to limit the amount of openness, which is not a problem as long as the process remains completely transparent. At the same time, more transparency is needed to improve the smaller articles, which are often created by a single person.


In this paper, I want to take a deeper look inside Wikipedia and the way that its articles are created. Who is responsible for the content that can be found on Wikipedia? What is the consequence of the fact that ‘anyone can edit’ at any time, and how does one deal with a project that has become so incredibly large? In the first part I will point out how Wikipedia works. The basics of Wikipedia will be explained and a more in-depth analysis of the politics of Wikipedia is done. By looking at the rules and regulations of Wikipedia, as well as how they are actually enforced by the community, I will point out how Wikipedia has managed to control such a large group of editors and created an encyclopedia of high quality instead of an anarchistic chaos.

In the second part, a closer look is taken at how an article is created and how it develops. Who creates the article? Is it a dedicated member of the community or an anonymous user who believes he can add something to the encyclopedia? It is also interesting to see what happens after the creation. How does the community respond and what kind of edits are made? A couple of articles are taken as case studies to answer these questions. They show that a user should look at the average Wikipedia article more critically. Since this is hard for the average, not so media-savvy Wikipedia user, Wikipedia should make this process of creation more transparent.

In the third part, a closer look will be taken at articles that are subject to heavy editing. By taking a deeper look into the Wiki article about Julian Assange, it will be made clear how the community responds to a topic like this and what this means for the idea of the ‘open’ and collaboration.

From this analysis, I conclude that the role of Wikipedia has changed: it has become more than an encyclopedia, as it also functions as an up-to-date news source. This has implications for the openness of Wikipedia and other ideas from the early days. To make sure Wikipedia can remain, and increasingly become, a reliable source of information, transparency is the key.


The fact that Wikipedia is becoming bigger every day, both in size and in its ubiquitous presence, makes it an important object of study. On a daily basis, millions of people use Wikipedia as a source of knowledge. The Wikipedia community is well aware of this and does its utmost to create articles of better quality. This is done not only by having both humans and bots check new edits, but also by creating new policies and guidelines. It seems that in the ten years of its existence, the ideology of the early days has been abandoned. Rules can in fact be made and changed and the amount of openness can be reduced, as long as it benefits the quality of the content.

Wikipedia has developed from a small and open project into a huge bureaucracy. This has several implications. It has become harder to start editing Wikipedia: new users are often frustrated by the wall of bureaucracy they run into and are therefore discouraged from becoming a Wikipedian. The consequence of this is that a declining group of people is shaping one of the biggest sources of knowledge. At the moment this does not affect the popular articles. As shown in the study of Julian Assange’s page, it is checked and discussed more than ever, despite the limited accessibility. It can, however, affect the quality of smaller articles, since more expertise is required, and it may also lead to more conflicts between editors.

The increasing bureaucracy has two effects. On the one hand it decreases the amount of transparency. Because of the enormous growth of the policies and guidelines, it becomes harder to grasp the basic rules of Wikipedia and to see why a decision is made. At the same time, the user can assume that the article is of better quality, because the content that is actually in the article complies with all the rules. This, however, does not apply to articles where only one editor created all the content, since most of the rules have to be checked by other users. As this research has shown, the text created in less popular articles is usually not changed much afterwards. The only edits that were made were formatting changes or the addition of categories and inlinks.

Therefore, I suggest that Wikipedia must give more attention to how a specific article is created and make this visible to every visitor. This way it brings back the transparency that has always been so important and improves the knowledge of the reader. It should be shown in the article how many users created it. For example, a percentage at the top could show how much of the content of the article is written by the same person and how many edits were made altogether. This gives the user a better idea of whether an article is trustworthy and unbiased. By making the creation process even more transparent, it becomes easier for the user to decide how to approach the given information.
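The percentage suggested above could be computed from an article’s revision history roughly as follows. This is only a minimal sketch under stated assumptions: the function name `authorship_summary` and the input format (one `(author, chars_added)` pair per edit) are hypothetical, and characters added per edit is a crude stand-in for the text that actually survives in the current version, which would require comparing revisions.

```python
from collections import Counter


def authorship_summary(revisions):
    """Summarize who wrote an article.

    revisions: iterable of (author, chars_added) pairs, one per edit
    (a hypothetical input format, not a Wikipedia API response).
    Returns (total_edits, distinct_authors, top_author, top_share_percent).
    """
    chars_by_author = Counter()
    total_edits = 0
    for author, chars_added in revisions:
        total_edits += 1
        # Ignore removals so that deletions do not produce negative shares.
        chars_by_author[author] += max(chars_added, 0)

    total_chars = sum(chars_by_author.values())
    if total_chars == 0:
        return total_edits, len(chars_by_author), None, 0.0

    top_author, top_chars = chars_by_author.most_common(1)[0]
    top_share = 100.0 * top_chars / total_chars
    return total_edits, len(chars_by_author), top_author, top_share
```

For an article with five edits in which one editor added 950 of the 1000 characters, this would report a share of 95 percent for that editor, which is exactly the kind of signal a reader could use to judge how many people really stand behind the text.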

It is up to Wikipedia as well as scholars to study better ways of indicating the quality of an article. With more than 3.5 million articles in the English-language Wikipedia, this cannot be done efficiently by the human contributors, whose numbers are slowly declining. New ways have to be found to automatically identify the quality of an article, as some researchers have already started discussing. This way, Wikipedia can indicate the quality of the article and show this to the user. This not only makes the user more aware of the fact that the content of Wikipedia is not perfect, it also makes it possible to automatically generate lists for the Wikipedians of articles that need to be checked for quality. It might even be possible to regulate the edit options automatically, giving more access when an article has proven to be of lower quality and thereby decreasing the amount of bureaucracy for starting editors.

This study has shown that Wikipedia has transformed since it was founded, leading to a more bureaucratic organization. This has several implications, mainly for the openness of Wikipedia. As pointed out, these decisions can benefit the quality of Wikipedia, as long as the process remains completely transparent. Making less popular articles more transparent as well will not only improve the quality of the content, but also show the reader how reliable an article is.

Europeana and the possibilities of data-visualization

June 2, 2011

I just received my grade for the paper I wrote for the datavisualization course.

In this paper, I discuss Europeana and its problem of showing its objects in a European context. By visualizing the data in different ways, the objects can tell a story about the history of Europe. Here, I also discuss our visualization project and how we tried to achieve this goal of showing the European context of the Europeana project. I left the comments of the lecturer, Bernhard Rieder, in there to show some points of critique.

Click here to Download