ICT4D Blog

Data Analysis and Visualization: The 15M Movement and Other Case Studies

Notes from the research seminar Data Analysis and Visualization: The 15M Movement and Other Case Studies, organized by the Internet Interdisciplinary Institute in Barcelona, Spain, on October 1st 2012.

Present: Javier Toret, Pablo Aragón and Oscar Marín, members of the group Datanalysis15M.

The research group #datanalysis15m was created to analyse the new movements that emerged in 2011: Arab Spring, Spanish Indignants, Occupy, etc. The main questions are: How can we measure augmented events? How can we measure new ways of organization, of communication, of engagement? How do ideas spread (virally)? How can we characterize network-systems?

About data

In 1969, ARPANET was born as a packet-switching network, a major improvement in communications. With the World Wide Web in 1990, users could consume information passively online through web browsers, and circa 2004 Web 2.0 was born, in which the consumer also becomes a producer. All this activity is increasingly being traced and produces huge amounts of data. This is yet another evolution of the Internet, which has been called Big Data.

Generating such a large amount of data has many implications: privacy, security, commoditization of uses and users’ behavior, dematerialization of the economy, information overload, economics of attention, neuromarketing, etc.

Michael Cooley (Architect or Bee?) distinguishes between data, information (organized data), knowledge (comprehended and applied information) and wisdom (knowledge put at the service of achieving some specific goals). Wisdom cannot be transmitted and always carries an ethical connotation.

After data acquisition, data analysis is crucial to be able to transform data into information: understanding and structuring data is the core of the information-building process. Last, but still very important, information can be presented in several ways, in what has been called information visualization.

How to organize information:

A further question is how technology shapes the moods that engage people to act. If we can tell how mood is shaped by technology (or how technology can help in mood-shaping), then technology can help in choosing the appropriate time to invite and engage people to participate.

Engagement is also related to language: the first person plural is much more engaging and viral than the alternatives. “We are”, “we can”, etc. have far more punch than “I am” or “they can”.

Network or data laws:

Network Analysis

Network Analysis is deeply rooted in Graph Theory models.

Types of social relationships:

Average distance: the average number of intermediaries between any two different nodes.

Diameter (or effective diameter) of the network is the maximum distance between the two nodes that are farthest apart. The diameter usually decreases as the network grows (Leskovec, 2007).

Density: the proportion of links present in a network relative to the total number of possible links.
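As a rough illustration of the two distance measures, the sketch below computes them with a breadth-first search on a small undirected network. The toy network and the function names are hypothetical, chosen only for this example. Distances are counted in links (hops), so the number of intermediaries on a path is one less than its distance, and only pairs that can reach each other are included.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_distance_and_diameter(graph):
    """Mean and maximum shortest-path length over all reachable pairs."""
    lengths = []
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                lengths.append(d)
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical toy network: a simple chain a - b - c - d.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_distance, diameter = average_distance_and_diameter(toy_graph)
print(avg_distance, diameter)
```

Running one search per node is fine for small examples; on networks with millions of nodes, sampling a subset of source nodes is a common shortcut.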
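Density can be sketched in a few lines. The function below is a hypothetical helper, not part of any library: it divides the links observed by the links that could exist, which is n(n-1)/2 for an undirected network of n nodes and n(n-1) for a directed one.

```python
def density(num_nodes, num_links, directed=False):
    """Share of possible links that are actually present."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2  # each undirected pair is counted once
    return num_links / possible if possible else 0.0

# Hypothetical example: 4 users joined by 3 undirected links.
print(density(4, 3))                 # 3 of 6 possible links
print(density(4, 3, directed=True))  # 3 of 12 possible links
```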

Giant component: the biggest connected component in a network. Outside of the giant component, groups are very small.
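A minimal sketch of how the giant component could be found: group the nodes into connected components and keep the largest one. The network and the function names below are made up for illustration.

```python
def connected_components(graph):
    """Split an undirected network into its connected components."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        components.append(component)
    return components

def giant_component(graph):
    """Return the component with the most nodes."""
    return max(connected_components(graph), key=len)

# Hypothetical network: one group of three, one pair, one isolated node.
toy_graph = {
    "a": ["b", "c"], "b": ["a"], "c": ["a"],
    "d": ["e"], "e": ["d"],
    "f": [],
}
giant = giant_component(toy_graph)
print(sorted(giant))
```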

Clustering coefficient: measures the density of connections between the neighbours of a node, that is, the probability that two connections of the same node are also connected to each other. Clusters are linked one to another through weak ties (Granovetter, 1983). Weak ties foster serendipity: they have a higher potential to expose people to information that they would otherwise never discover.

Reciprocity of a directed network measures how many of its relationships are really bi-directional.

Assortativity, associated with homophily, is the preference for relationships between users with same or different characteristics. If assortativity (r) is bigger than zero, the network is endogamic; if r is smaller than zero, users tend to connect with users unlike themselves.

Degree distribution: how connections are distributed among nodes. In scale-free networks, a small group of nodes has a high number of connections, followed by a long tail of nodes with few connections.
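The local version of the clustering coefficient can be sketched as the share of a node's neighbour pairs that are themselves linked. The network below is hypothetical, and the helper assumes an undirected network stored as a dictionary of neighbour sets.

```python
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)

# Hypothetical network: a triangle a-b-c plus d, attached only to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for node in sorted(toy_graph):
    print(node, local_clustering(toy_graph, node))
```

Averaging the per-node values gives one common network-level figure; other definitions (for instance, based on counting triangles across the whole network) can yield different numbers.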
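One simple way to quantify reciprocity, shown here as a sketch with invented data, is the share of directed links whose reverse link also exists. The "follows" list is hypothetical and self-links are ignored.

```python
def reciprocity(edges):
    """Share of directed links (u, v) for which (v, u) is also present."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-links
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Hypothetical "follows" relationships: only ana and ben follow each other.
follows = [("ana", "ben"), ("ben", "ana"), ("ana", "cai"), ("dee", "ana")]
print(reciprocity(follows))
```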
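The sketch below illustrates r for one particular characteristic, the number of connections of each node, which is an assumption made for this example (the notes do not say which characteristic was used). It computes the Pearson correlation between the degrees found at the two ends of every link, counting each undirected link in both directions. Function and variable names are hypothetical.

```python
from math import sqrt

def degree_assortativity(edges):
    """Pearson correlation of node degrees across the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # An undirected link contributes both (u, v) and (v, u).
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # r is undefined when all degrees are equal; 0.0 is a placeholder
    return cov / sqrt(var_x * var_y)

# Hypothetical star: one hub linked to three otherwise unconnected users.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
# Hypothetical network where like links to like: a triangle plus a separate pair.
like_with_like = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
print(degree_assortativity(star), degree_assortativity(like_with_like))
```

In the star, the highly connected hub links only to poorly connected users, so r is negative; in the second network every link joins users with the same number of connections, so r is positive.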
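A degree distribution can be tabulated by counting how many nodes have each number of connections. The sketch below uses a made-up network with a single hub; it is far too small to show a real long tail, but the shape of the computation is the same.

```python
from collections import Counter

def degree_distribution(graph):
    """Map each degree to the fraction of nodes that have it."""
    degrees = [len(neighbours) for neighbours in graph.values()]
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}

# Hypothetical network: one hub with four connections, four users with one each.
hub_graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
distribution = degree_distribution(hub_graph)
print(distribution)
```

On real data, plotting these fractions on logarithmic axes is a common first check for a long tail, although a straight line on such a plot is not by itself proof that a network is scale-free.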

Discussion

There are different social networks that are but different layers of the same reality. The purpose of social network analysis is to try to understand one of these layers and how that layer feeds back into the other layers and into reality. One can usually find correlations between different networks and how they sync in emotions, contents, bodies.

It is also interesting to note that we increasingly see online behaviours being translated/transposed into “real life”. Not that online networks (their composition) are replicated offline, but that the practices of sharing, communication, decentralization, etc. are also put into practice even without digital technologies, thus reshaping traditionally organized networks.

We also see how information, communications and contents in online networks transcend the platform and permeate other (offline) media, such as newspapers or TV news. Thus, even if the original network was not significantly representative of reality, the final message does get to a significantly representative share of people.

Another aspect to take into account is whether, even if the users of a specific social networking site are not representative of the population, their behaviour can be a good proxy to predict the general behaviour of the whole population. While this might sound like a contradiction (a non-representative sample predicting the whole population), the key could be in how this sample shapes the agenda of the whole population and thus, in the short or medium term, ends up being a good proxy for prediction.
