Towards Large Scale Technology Impact Analyses: Automatic Residential Localization from Mobile Phone-Call Data
Type of work: Communication
Studies to understand the impact that demographic and socio-economic factors have on the use of cell phones have traditionally been carried out by social and technical researchers through questionnaires and personal interviews. In recent years, due to the pervasiveness of cell phones in emerging and developing economies, telecommunication and internet companies generate, anonymize and store in real time large datasets containing millions of interactions. However, these datasets typically do not contain any socio-economic information that characterizes the users. As a result, in order to understand the impact of socio-economic parameters on the use of mobile phones at larger scales, researchers have typically correlated the behavioral analyses drawn from the anonymized cell phone usage datasets with aggregated demographic or socio-economic parameters compiled by institutions such as the National Statistical Institutes (NSI) or the World Bank (WB). Computing these correlations requires the approximate residential location of the anonymized users. In general, carriers only have this information for users with a contract, who in emerging economies account for less than 5% of users. In this paper, we propose a new technique to automatically predict the approximate residential location of anonymized cell phone users based on their calling behavior, assuming that the approximate residential location is known for a small set of users (the subscribers with a contract). Our results indicate that we can correctly predict the residential location of up to 70% of users at a coverage of 50%. By automatically associating cell phone users with geographical areas, we aim to provide a tool that facilitates the analysis, at a national or global scale, of the impact that socio-economic factors might have on the use of cell phones.
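The abstract does not describe the prediction model itself, so the following is only an illustration under stated assumptions, not the authors' technique. It sketches a hypothetical majority-vote baseline in which each user without a contract inherits the most frequent home region among the contract users they call, with the share of agreeing calls used as a confidence score. It also shows one common reading of "accuracy at a given coverage": predictions are ranked by confidence and only the most confident fraction of users receives a label. The function names (`predict_regions`, `accuracy_at_coverage`), the toy call records and the region labels are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data, not from the paper: each tuple is one call between two users.
CALLS = [
    ("u1", "c1"), ("u1", "c2"),
    ("u2", "c3"),
    ("u3", "c1"), ("u3", "c3"), ("u3", "c2"),
    ("u4", "c3"), ("u4", "c3"), ("u4", "c1"),
    ("u5", "u6"),  # neither user has a contract, so no prediction is possible
]
# Known home regions of the (few) contract users.
KNOWN_REGION = {"c1": "North", "c2": "North", "c3": "South"}
# Hypothetical held-out ground truth used only for evaluation.
TRUE_REGION = {"u1": "North", "u2": "South", "u3": "South", "u4": "North"}


def predict_regions(calls, known_region):
    """Hypothetical baseline: each call to a contract user is one vote for that
    user's region. Returns {user: (region, confidence)}, where confidence is the
    fraction of votes won by the chosen region."""
    contacts = {}
    for a, b in calls:
        contacts.setdefault(a, []).append(b)
        contacts.setdefault(b, []).append(a)
    predictions = {}
    for user, peers in contacts.items():
        if user in known_region:
            continue  # location already known
        votes = Counter(known_region[p] for p in peers if p in known_region)
        if not votes:
            continue  # no calls to contract users: leave unpredicted
        region, count = votes.most_common(1)[0]
        predictions[user] = (region, count / sum(votes.values()))
    return predictions


def accuracy_at_coverage(predictions, true_region, coverage):
    """Keep the most confident predictions for a `coverage` fraction of the
    evaluation users and return the share of kept users predicted correctly.
    Users without any prediction can never be kept."""
    ranked = sorted(
        (u for u in predictions if u in true_region),
        key=lambda u: predictions[u][1],
        reverse=True,
    )
    n_keep = min(len(ranked), round(coverage * len(true_region)))
    if n_keep == 0:
        raise ValueError("no predictions available at this coverage")
    correct = sum(predictions[u][0] == true_region[u] for u in ranked[:n_keep])
    return correct / n_keep


PREDICTIONS = predict_regions(CALLS, KNOWN_REGION)

if __name__ == "__main__":
    print(accuracy_at_coverage(PREDICTIONS, TRUE_REGION, 0.5))  # 1.0
    print(accuracy_at_coverage(PREDICTIONS, TRUE_REGION, 1.0))  # 0.5
```

On this toy data, labeling only the most confident half of the users gives 100% accuracy, while labeling all of them drops accuracy to 50%. Under this reading of the metric, that is the kind of accuracy/coverage trade-off summarized by a figure such as "70% at a coverage of 50%".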