Detailed information about the course


The Wisdom of Crowds at the Service of Big Data


December 5th, 2017


Prof. Philippe Cudré-Mauroux, Université de Fribourg

Dr. Valerio Schiavoni, Université de Neuchâtel


Prof. Paolo Papotti, EURECOM, Nice, France

Prof. Paolo Merialdo, Università degli Studi Roma3, Rome, Italy


We are living in an era of data deluge. This is a direct consequence of the widespread adoption of low-cost internet connections and connected smart devices, and of the ease with which end users can become data "producers". As a result, an increasing amount of data is continuously pushed to the web for anyone to scrutinize. Analyzing such data calls for novel techniques that improve the quality of the analysis. This doctoral school event will introduce graduate students to state-of-the-art techniques that exploit the wisdom of crowds to carefully select relevant data and improve the quality of the analysis results. We invite two renowned experts and excellent speakers in the fields of advanced data extraction, data mapping, data cleaning, and databases. The teachers will present basic techniques as well as the most recent advances, from advanced data cleaning techniques to crowd-based support (e.g., Amazon Mechanical Turk) for data analytics.

Agenda of the event
Each lecture lasts 2 hours, with a 15-minute break.


Prof. Paolo Papotti will talk about the following:
Title: Cleaning data with the crowd
Abstract: Data is often dirty for several reasons, such as typos, missing values, and duplicates. The intrinsic problem with dirty data is that it can lead to poor results in analytic tasks. For instance, Experian reported that poor customer data cost British businesses £8 billion in lost yearly revenue. Data cleaning is therefore an unavoidable step toward reliable data for final applications, such as querying and mining, but it is a hard problem that requires a great amount of manual work. In the "big data" era, the cleaning process cannot be handled by a single person, as no individual can cope with the scale of the errors in large datasets. To address this challenge, several systems have been proposed that exploit crowdsourcing platforms, such as Amazon Mechanical Turk and CrowdFlower, to involve large numbers of users in data cleaning activities.

In this lecture, we first describe data cleaning and crowdsourcing, highlighting the main challenges and solutions. We then look at recent results in tackling data cleaning with the help of the crowd. Finally, we discuss how these experiences are pushing several groups to explore new approaches for human-in-the-loop data preparation.

Prof. Merialdo will speak about the following:

Title: Crowdsourcing for data management
Abstract: Crowdsourcing provides access to a pool of human workers who can contribute solutions to tasks that are challenging for computers. Proposals have been made for the use of crowdsourcing in a wide range of data management tasks, including data gathering, query processing, data integration, and cleaning. We provide a classification of key features of these proposals and survey results to date, identifying recurring themes and open issues.

• 09h30-09h45 : welcome and intro
• 10h00-12h00 : Teacher 1, Prof. Paolo Papotti
• 14h00-16h00 : Teacher 2, Prof. Paolo Merialdo
• 16h00-17h00 : open discussion


UniMail, Université de Neuchâtel, Room A-017



Deadline for registration 04.12.2017