Figure 1: Number of publications per country (2007)
Figure 2: Number of articles per country in Wikispeedia (2007)
Today, content on the internet is still mostly skewed towards Western societies [1] [2]. Interestingly, those same societies also produce most of the world’s scientific knowledge, as proxied by the number of citable publications [3]. Wikispeedia is an online game built on 4604 Wikipedia articles from 2007, in which players navigate from a given start article to a target article by following the links contained in the articles. In this project, we investigate how players navigate through the game and how this navigation is influenced by the production of scientific knowledge in the world. More precisely, we want to understand whether players are attracted towards articles linked to countries that produce a lot of scientific knowledge.
As a small appetizer, let us take a quick look at Figures 1 and 2. We can already see a link between the two issues at hand: the distribution of articles per country in the Wikispeedia graph seems very closely related to the distribution of scientific knowledge production in the world. But how strong is this link? And how does it impact the players’ behavior in the game? Let’s dive into the details to find out!
The first step will be to understand the navigation patterns of players in the game. For this, we will compare two hypotheses, namely the “passive” and the “active” hypothesis:
Once this first analysis is done, we will be in a good position to investigate what the players’ intrinsic bias is, and whether it is related in any way to the production of scientific knowledge in the world.
Now, to succeed in this quest, we need to meet a few requirements. First, as we are working with a global database and intend to investigate worldwide geographical biases, we need to associate each article with a country. Next, to quantify the players’ behavior in the game, we will use their clicking patterns, more precisely the number of times each article is clicked. Finally, as we are interested in showing how the production of scientific knowledge impacts players’ behavior, we need to match each of these countries to the number of publications it produced during the year 2007.
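To make these three requirements more concrete, here is a minimal Python sketch of how they could fit together. It is only an illustration under stated assumptions, not our actual pipeline: the article names, the article-to-country mapping, the paths and the publication counts are all made-up toy values, the function names are hypothetical, and we assume here that every article visited in a path (including the start article) counts as one click.

```python
from collections import Counter

# Hypothetical toy data: the real mapping, paths and publication counts
# come from the project's own preprocessing, not from these values.
article_country = {
    "Paris": "France",
    "Eiffel_Tower": "France",
    "Tokyo": "Japan",
    "Mount_Fuji": "Japan",
    "Nairobi": "Kenya",
}
paths = [
    ["Paris", "Eiffel_Tower", "Tokyo"],
    ["Nairobi", "Paris", "Mount_Fuji"],
    ["Tokyo", "Paris"],
]
publications_2007 = {"France": 300, "Japan": 500, "Kenya": 10}  # made-up numbers


def clicks_per_article(paths):
    """Count how many times each article appears across all paths.

    Assumption: every visited article, including the start, is one click.
    """
    return Counter(article for path in paths for article in path)


def clicks_per_country(article_clicks, article_country):
    """Aggregate article click counts by the country of each article."""
    totals = Counter()
    for article, n in article_clicks.items():
        country = article_country.get(article)
        if country is not None:  # skip articles without an associated country
            totals[country] += n
    return totals


def join_with_publications(country_clicks, publications):
    """Pair each country's click count with its publication count.

    Countries missing from the publication data are dropped.
    """
    return {
        country: (clicks, publications[country])
        for country, clicks in country_clicks.items()
        if country in publications
    }


article_clicks = clicks_per_article(paths)
country_clicks = clicks_per_country(article_clicks, article_country)
table = join_with_publications(country_clicks, publications_2007)
```

With a table like this in hand, each country has one click count and one publication count, which is the shape of data needed to compare players’ behavior with the production of scientific knowledge.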
Let’s jump to the next section to gain valuable insights into our methods and data preprocessing!