Detailed information about the course

Winter School 2022: «Data Science in the Digital Era»


31 January to 4 February 2022

Workshop language: English
Organizer

Pascal Felber

Speakers

  • Prof. Jacques Savoy, Université de Neuchâtel
  • Prof. Pascal Felber, Université de Neuchâtel
  • Prof. Adrian Holzer, Université de Neuchâtel
  • Dr Valerio Schiavoni, Université de Neuchâtel
  • Dr Aris Xanthos, Université de Lausanne
  • Prof. Mike Kestemont, Univ. Antwerp, Belgium
  • Dr Folgert Karsdorp, KNAW Meertens Instituut, Netherlands
  • Nava Tintarev, Maastricht University (connecting remotely)
  • Prof. Etienne Rivière, Univ. Louvain, Belgium

The first part focuses on digital humanities. In this field, Python has rapidly emerged as one of the most popular programming languages in education and science. The language has a powerful ecosystem of third-party packages that offer strong support to researchers in data science, natural language processing and machine learning. We will teach selected chapters from a recently published intermediate textbook, Humanities Data Analysis (Princeton University Press, 2021), which we (Mike Kestemont and Folgert Karsdorp) co-authored with Allen Riddell. We will cover as many chapters of the book as time allows, teaching with interactive notebooks and placing the emphasis on practical exercises. The course content builds on complex, real-world case studies from humanities disciplines such as history, literature and linguistics, and is organized into four sessions:

  • Session 1: Using examples from the world of Shakespeare, we will learn how to parse various structured data sources, such as JSON and XML.
  • Session 2: Using historical baby-name data, we will perform a quantitative analysis of tabular data to chart diachronic trends.
  • Session 3: We will cover the vector space model using a collection of French plays and delve into numerical computing.
  • Session 4: In this final session, we will cover the basics of stylometry and authorship attribution, including techniques for unsupervised learning (e.g. clustering).


The second part focuses on measuring viewpoint diversity in news consumption. The growing volume of digital data has spurred the adoption of recommender systems in many socioeconomic domains, including e-commerce, music and news. While news recommenders help consumers deal with information overload and increase their engagement and satisfaction, their use also raises a growing number of societal concerns, such as "Matthew effects", "filter bubbles" and an overall lack of transparency. Considerable recommender systems research has addressed how to balance content diversification with relevance; however, this work has focused specifically on topical diversity. For news readers, diversity of _viewpoint_ on a given topic is arguably more relevant. It allows for measures of diversity that are multi-faceted rather than driven only by individual consumption habits. This talk introduces work on helping users explore viewpoint diversity, and describes our first steps toward informing diverse content selection in a way that is meaningful and understandable to both content providers and news readers.


The third part is titled 'Systems and infrastructures for data science: How to make big data processing efficient, robust, and maybe a little greener.' Data science relies on processing vast amounts of data. Several frameworks, such as Apache Spark, Apache Flink and TensorFlow, help data scientists and machine learning experts express their computations. This lecture focuses on the efficient management of the infrastructure needed to run computations with these frameworks (servers, storage and networking), and also on the robustness of such executions: when so many servers support a long-running computation, how can we ensure that not everything is lost if one server fails? The lecture draws on recent advances from academia and industry and on examples from prominent open-source projects, and covers use cases ranging from smaller installations, such as a data lake used by an SME, to massive-scale ones such as those found at companies like Google.


Hôtel Suisse, Champéry

Attendees are required to attend the entire school (5 days, 4 nights).

Partial attendance is not permitted (e.g., you cannot arrive a few days late or leave a few days early). Attendance signatures will be collected daily. In case of partial attendance, CUSO will invoice the student for the full price of the hotel accommodation.

Exceptions will be considered on a case-by-case basis.


Lodging and meals

Accommodation in a shared double room at the Hôtel Suisse in Champéry is included in the registration fee. If you have a preference regarding whom to share your room with, let us know and we will do our best to accommodate it.

Breakfast is included.


Health restrictions

Due to the ongoing health situation, attendees will be required to follow the OFSP (Swiss Federal Office of Public Health) rules in force at the time of the school in order to attend in-person lectures in (large) classrooms.


CUSO covers transportation costs corresponding to a 2nd-class half-fare ticket from your home institution to Champéry.

Lunches are self-organized and not included in the participation fee.

Dinners, including a special social evening, are included and covered by the participation fee.



Registration deadline: 30.01.2022