Searching information on the Internet: legal implications

Notes from the research seminar Searching information on the Internet: legal implications, by Julià Minguillón, held at the Open University of Catalonia, Barcelona, Spain, on April 29th, 2010.

Searching information on the Internet: legal implications
Julià Minguillón, IT, Multimedia and Telecommunications Department

Tim Berners-Lee created the World Wide Web on a structure and protocols that require linking to work. URLs (or URIs) identify documents that can be found on the Internet, and the links between them form a directed graph: A points to B, but we (usually) cannot walk the other way. The link is not reversible: going from B to A requires a separate link, because the original A-to-B link does not serve that purpose.
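The one-way nature of links can be illustrated with a minimal sketch. The page names and the `links` dictionary below are hypothetical, made up for illustration only: a page knows what it points to, but finding out who points to a page requires a pass over the whole graph.

```python
from collections import defaultdict

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}

def outgoing(page):
    """Pages reachable in one step; this is known from the page itself."""
    return links.get(page, [])

def build_backlinks(graph):
    """Invert the graph. This needs to visit every known page,
    because a page carries no record of who links to it."""
    backlinks = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            backlinks[target].append(source)
    return dict(backlinks)

backlinks = build_backlinks(links)
print(outgoing("A"))    # follows the A-to-B link
print(outgoing("B"))    # B does not lead back to A
print(backlinks["B"])   # only the inverted graph tells us A links to B
```

Building the inverted graph is, roughly, what a crawler has to do at Internet scale before anyone can ask "who links here?".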

There are two main strategies to explore the Internet and find information within: browsing and searching.


One of the “problems” of the Internet is that, as a graph, it has no centre: there is no place that can be considered its beginning.

There are some initiatives to map the Internet and index it (like the Open Directory Project), but the speed at which the Internet grows has made them difficult to maintain… and even to use.


  • a web crawler explores the Internet, retrieving information about the content and the structure of a web site;
  • an index is created where the information is listed and categorized, and
  • a query manager enables the user to ask the index and retrieve the desired information.
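The three components above can be sketched in a few lines. This is a toy model under stated assumptions, not how any real search engine is built: the `PAGES` dictionary stands in for the web, and the URLs, page texts and function names are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (text, outgoing links).
PAGES = {
    "http://example.org/": ("welcome to the museum", ["http://example.org/hours"]),
    "http://example.org/hours": ("the museum closes at 20:00", ["http://example.org/"]),
    "http://example.org/hidden": ("unlinked page nobody points to", []),
}

def crawl(seed):
    """Web crawler: breadth-first walk that follows links from a seed URL."""
    seen, queue, fetched = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, out_links = PAGES[url]
        fetched[url] = text
        for link in out_links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def build_index(fetched):
    """Index: map every word to the set of URLs where it appears."""
    index = defaultdict(set)
    for url, text in fetched.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def query(index, terms):
    """Query manager: return the URLs that contain every query term."""
    results = [index.get(term.lower(), set()) for term in terms.split()]
    return set.intersection(*results) if results else set()

index = build_index(crawl("http://example.org/"))
print(query(index, "museum closes"))
print(query(index, "unlinked"))  # the unlinked page was never crawled
```

Note that the page at the hypothetical `/hidden` URL never reaches the index, because no crawled page links to it.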

Web crawlers can only visit pages that are linked from somewhere. Besides unlinking, ways to prevent web crawlers from exploring a web site include protection by username/password, CAPTCHAs, exclusion protocols (e.g. robots.txt files), etc.

Protocol of exclusion (robots.txt):

  • Has to be public;
  • It is an indication, not compulsory;
  • Discloses sensitive information;
  • Google hack: intitle:index.of robots.txt
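The points above can be seen with Python's standard `urllib.robotparser` module. The robots.txt content, the URLs and the `ExampleBot` agent name below are hypothetical; the sketch only shows that a well-behaved crawler chooses to consult the file, and that the file itself lists the paths its owner would rather keep out of sight.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as any visitor could download it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="ExampleBot"):
    """A polite crawler asks before fetching; nothing forces it to."""
    return parser.can_fetch(agent, url)

# The Disallow lines are public, so they disclose the "hidden" paths.
disclosed = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]

print(allowed("http://example.org/index.html"))
print(allowed("http://example.org/private/data.html"))
print(disclosed)
```

A crawler that simply skips the `allowed` check can still request the disallowed paths, which is why the protocol is an indication and not a protection.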


  • Search engines find sensitive information.
  • Content and links are different things. Linked content might not be in the same place as the source content where the link is published.
  • Users can link to sensitive information/contents.
  • Broken links and permalinks: content might be moved, but engines/users might track and re-link that content.
  • Outdated versions (cache): to avoid repeated visits, search engines save old versions of sites (caches), which remain available for some time even if some content is deleted.
  • Software vulnerabilities.
  • Browsing patterns (case of AOL): what a user does on the Internet can be tracked and can reveal personal information.

Nowadays, the main way to remain anonymous on the Internet is to opt out of services such as web crawling by search engines.

With the Web 2.0 things become more complicated. Initially, “all” content was originated by the “owner” of a website: you needed hosting and had to manage the site directly. Now that everyone can create or share content in a very easy and immediate way, the relationship server/hosting-manager-content is not as straightforward as it used to be.

Linking and tagging complicate the landscape even more. And with the upcoming semantic web, cross-searching and crossing data from different sources can make it easy to retrieve complex information and to find out really sensitive information.


  • Users demand more and more services and are willing to give their privacy away for a handful of candies.
  • Personalization is often in a trade-off relationship with privacy, and people demand more personalization.
  • Opt-in should be the default, but it raises barriers to quick access to sites/services, hence opt-out is the default.
  • An increased trend in egosurfing and aim for e-stardom is accompanied by an increasing trail of data left behind by users.


  • The creator of content
  • The uploader
  • The one who links
  • The one who tags
  • Search engines
  • End users
  • ISPs
  • Aggregators
  • Developers
  • Social networking sites
  • etc.


Ramon Casas points at the Google cache: while it is not strictly necessary to run the search engine, it represents an illegal copy of and/or access to content that (in many cases) was removed from its original website. In his example, the museum closes at 20:00 but Google leaves the back door open until 22:00.


Bruce Kasanoff (2001). Making It Personal: How to Profit from Personalization without Invading Privacy. See a review by Julià Minguillón at UOC Papers.


The micro and macro approaches of ICTs in Education

On April 15th and 16th, 2010, the advisory board of the Horizon Report Latin America 2010 met to reach a consensus on what educational technology will look like in the following years, after having worked online for several weeks.

Besides the debates on the technologies that were mentioned throughout the online process and during the face-to-face meeting (collaborative environments, social networking sites, social media, open content, mobile technologies, cloud computing, personal learning environments, augmented reality, smart classrooms, the semantic web, gesture-based computing, etc.), one of the main take-aways I bring home is realizing the huge chasm between the micro and the macro approaches of ICTs in Education.

Approaches in ICTs in Education

By a micro approach of ICTs in Education I understand the analysis of the impact of ICTs on the educational process (teaching and learning), that is, how methodologies and the daily work will change when ICTs enter a specific educational process. A simple example is whether the dynamics of the classroom will change if kids come in with their laptops, in what direction, and what the extent of the impact will be (if any).

By a macro approach of ICTs in Education I understand the analysis of the impact of ICTs on Education as an institution (and/or its institutions). In other words, how the arrival of ICTs will change the role of schools and universities and their teachers, their legitimacy, their added value and “business” plans, etc. A simple example is whether the abundance of (digital) information will reinforce informal education and render formal education out-dated and useless in the end.

In my opinion, most people share the micro approach, fewer people share the macro approach, and only a very few try to combine both visions. Ironically enough, both the micros and the macros see each other as technophiles, techno-optimists or techno-utopians.

Macros think that micros do not “think out of the box” and just look at the technologies and their role in the tiny universe of the classroom, while forgetting about the wide (socioeconomic) context outside of it, which is what is, in fact, ruling all changes.

Micros think that macros forget about pedagogy — which is what the whole thing was about — and focus instead on cool and trendy lucubrations that have little to do with the real life of teachers and students.

Example: digital natives

Let us take as a first example the case of digital natives (for the sake of simplicity, let us use the term to describe a set of students that grew up using technology routinely and comfortably). A micro approach will consider digital natives worth taking into account for several reasons: they might have new (digital) competences that can be leveraged for learning; they might be able to retrieve information quicker than the teacher himself (with the related legitimacy issues for the latter); they might have or develop different cognitive strategies, hence teaching methodologies should be revised; etc.

Macros will look at digital natives from a very different point of view: digital natives define their identities and their socialization strategies in new ways, thus affecting the role of all institutions (Education amongst them); their concept of success (enjoying what one is doing at work) might be different from that of baby-boomers (money and power) or generation-Xers (self-realization), thus requiring radically different roles and outcomes from education; they might have learnt new horizontal and networked communication techniques, and so ask for horizontal and non-hierarchical relationships with peers, institutions and leaders (politicians, bosses, teachers…); etc.

Example: personal learning environments

The micro approach will probably compare personal learning environments with portfolios (or e-portfolios in the best case) and consider them a good thing for creativity and a good way to track students’ progress, but also quite a mess in the medium run and something that will require a good deal of effort on the teacher’s side to acquire digital skills and get monitoring tools. All in all, a practicality.

For macros, PLEs are the (punk) revolution. PLEs enable autonomy, the richness of non-hierarchical connections, the rise of informal education. Combined with social media and open educational resources, PLEs capsize not the classroom but the entire education system as we know it. Really.

Example: Smart classrooms

Micros see smart classrooms (from digital blackboards to remotely controlling a telescope orbiting the Earth) as the quintessence of ICTs in education. At last, “real” and “cheap” simulations are possible. Rivers of data flow into the classroom and can be managed at will. It is a teaching and learning experience on steroids: rich, visual, hands-on (without the inconvenience of things blowing up in your face or the expensive investments in bricks-and-mortar labs).

For macros, smart classrooms are, in most cases, just the perpetuation of the old-fashioned and out-dated way of teaching in a world that has changed (except in the classrooms). That simple.

Micros and macros

In the best of scenarios (e.g. digital natives), a technology or a technology-based trend or change is acknowledged by both sides. The reasons differ, but there is agreement on its importance. In the worst of scenarios, there is not only disagreement but opposition.

From my own experience (though generalizations are always wrong and cruel exercises), the micro approach is more often adopted by older generations, deeply rooted or interested in the hard-core parts of pedagogy and educational methodologies… and sometimes not mastering, or even not knowing, some of the technologies they are talking about. On the other hand, amongst macros I have mainly found younger people, tech-savvy or simply geeks, often coming not from Pedagogy but from Sociology, Communication Science, Economics or Information Science, which shifts them towards context because they are not knowledgeable about the core issues.

We absolutely need to bridge these two. In my opinion, the micro approach seriously lacks a good amount of e-awareness: micros are many times refurbishing a ship without noticing that it is heading for the highest waterfall. The macro approach sometimes, surprisingly, seems to forget the very role of institutions and how these are often more emergent systems than top-down designs and, as emergent systems, are made of little pieces working with small but stone-written codes.

Daniel Jiménez writes a very interesting commentary on these thoughts of mine, adding very interesting reflections, insights and references: Una postura crítica ante la relación entre tecnología y aprendizaje (comentario crítico)
