Detailed information about the course

Title

Winter School 2022: «Data Science in the Digital Era»

Dates

31 January - 4 February 2022

Language: EN (the workshop language is English)
Activity leader

Pascal Felber

Organizer(s)
  • Prof. Jacques Savoy, Université de Neuchâtel
  • Prof. Pascal Felber, Université de Neuchâtel
  • Prof. Adrian Holzer, Université de Neuchâtel
  • Dr Valerio Schiavoni, Université de Neuchâtel
  • Dr Aris Xanthos, Université de Lausanne
Speakers
  • Prof. Mike Kestemont, Univ. Antwerp, Belgium
  • Dr Folgert Karsdorp, KNAW Meertens Instituut, Netherlands
  • Prof. Nava Tintarev, Maastricht University (remotely connected)
  • Prof. Etienne Rivière, Univ. Louvain, Belgium
Description

The first part focuses on digital humanities. In this context, Python has rapidly emerged as one of the most popular programming languages across education and science. The language is characterized by a rich ecosystem of third-party packages that offer strong support to researchers in data science, natural language processing and machine learning. We will be teaching selected chapters from a recently published intermediate textbook, Humanities Data Analysis (Princeton University Press, 2021), which we co-authored with Allen Riddell. We will cover as many chapters from the book as possible, using the interactive notebook format for teaching, with an emphasis on practical exercises. The course content builds on complex, real-world case studies from Humanities disciplines, such as history, literature and linguistics, organized in four sessions:

  • Session 1: Using examples from the world of Shakespeare, we will learn how to parse various structured data sources, such as JSON and XML.
  • Session 2: Using historic baby naming data, we'll perform a quantitative analysis of tabular data to chart diachronic trends.
  • Session 3: We'll cover the vector space model using a collection of French plays and delve into the topic of numerical computing.
  • Session 4: In this final session, we will cover the basics of stylometry and authorship attribution, including techniques for unsupervised learning (e.g. clustering).
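
As a rough illustration of the vector space model mentioned for Session 3, the sketch below represents texts as bag-of-words count vectors and compares them with cosine distance. It is not taken from the course material: the tokenizer is deliberately naive, the toy documents are invented, and the function names `bag_of_words` and `cosine_distance` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty document")
    return 1.0 - dot / (norm_a * norm_b)


# Toy documents, invented for illustration only.
docs = {
    "a": "le roi parle et le roi part",
    "b": "le roi parle",
    "c": "la reine chante",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}
d_ab = cosine_distance(vectors["a"], vectors["b"])
d_ac = cosine_distance(vectors["a"], vectors["c"])
print(f"a-b: {d_ab:.3f}  a-c: {d_ac:.3f}")
```

Documents that share vocabulary end up closer together (smaller distance) than documents with no words in common, which is the intuition behind distance-based exploration of a text collection.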

 

The second part focuses on measuring viewpoint diversity in news consumption. In this context, the growing volume of digital data stimulates the adoption of recommender systems in different socioeconomic domains, including e-commerce, music, and news industries. While news recommenders help consumers deal with information overload and increase their engagement and satisfaction, their use also raises an increasing number of societal concerns, such as "Matthew effects", "filter bubbles", and an overall lack of transparency. Considerable recommender systems research has been conducted on balancing diversification of content with relevance; however, this work focuses specifically on topical diversity. For readers, however, diversity of _viewpoint_ on a topic in news is more relevant. This allows for measures of diversity that are multi-faceted, and not only driven by individual consumption habits. This talk introduces work aiming to find ways to help users explore viewpoint diversity. The talk will describe our first steps toward informing diverse content selection in a way that is meaningful and understandable to both content providers and news readers.

 

The third part is titled 'Systems and infrastructures for data science: How to make big data processing efficient, robust, and maybe a little greener'. Data science relies on the processing of vast amounts of data. Multiple frameworks exist that help data scientists or machine learning experts express their computations, such as Apache Spark, Apache Flink, or TensorFlow. In this lecture we will focus on the efficient management of the infrastructure necessary to run computations using these frameworks (servers, storage, and networking) but also on the robustness of such executions: When so many servers are used to support a long-running computation, how can we ensure that not everything is lost if one server fails? The content of the lecture will be based on recent advances from academia and industry and on examples from prominent open source projects, and will cover use-case scenarios ranging from smaller installations, such as a data lake used by SMEs, to massive-scale ones, such as those found at companies like Google.
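
One common answer to the robustness question above is periodic checkpointing; whether the lecture covers this particular technique is an assumption. The sketch below is a single-process toy, not how Spark, Flink, or TensorFlow implement fault tolerance: the function `run_with_checkpoints`, its parameters, and the simulated failure are all hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, checkpoint_path, every=2, fail_at=None):
    """Sum `items`, saving progress every `every` items; resume if a checkpoint exists."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated server failure")
        state["total"] += items[i]
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            # Write to a temporary file, then rename it over the checkpoint,
            # so a crash never leaves a half-written checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state["total"]


data = [1, 2, 3, 4, 5]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_with_checkpoints(data, path, fail_at=3)  # fails before processing index 3
except RuntimeError:
    pass
result = run_with_checkpoints(data, path)  # resumes from the last saved checkpoint
print(result)
```

After the simulated failure, the second run redoes only the work since the last checkpoint instead of starting over, and still produces the same total as an uninterrupted run.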

Program
Monday 31.01 Conference room next to the Hotel Suisse    
morning Arrival of participants    
13:50 Intro/welcome day-1 10 min.  
14:00 Lecture 1.1: Data Science in Python: An introduction using historic cookbooks 90min. Pr Mike Kestemont (in person)
15:30 Break 15 min  
15:45 Lecture 1.2: Parsing structured data in Python: the case of Shakespeareana 90min. Pr Mike Kestemont (in person)
17:15 Cake and Tea Time (@ hotel's lounge) 15min  
17:30 Lecture 1.3: A dramatic vector space: exploring French theatre with distance metrics 90min. Pr Mike Kestemont (in person)
19:00 Dinner (At'Home, included)    
Tuesday 01.02 Conference room next to the Hotel Suisse    
08:25 Welcome day-2 5 min.  
08:30 Lecture 2.1: Pandas: revealing trends in baby naming (1) 90min. Dr Folgert Karsdorp (in person)
10:00 Break 15min  
10:15 Lecture 2.2: Pandas: revealing trends in baby naming (2) 90min. Dr Folgert Karsdorp (in person)
11:45 Free Time    
16:30 Cake and Tea Time (@ hotel's lounge) 30min  
17:00 Lecture 2.3: Stylometry in Python: the unreasonable effectiveness of the bag of words model 90min. Dr Folgert Karsdorp (in person)
19:00 Social Dinner (Cantine sur Coux)    
Wednesday 02.02 Conference room next to the Hotel Suisse    
08:50 Welcome day-3 10 min.  
09:15 Student activity (Elevator Pitch Session 1) 60min.  
10:15 Break 15 min  
10:30 Student activity (Elevator Pitch Session 2) 60min.  
11:30 Free Time    
16:00 Cake and Tea Time (@ hotel's lounge)    
17:00 Free Time and informal discussions    
19:00 Dinner (At'Home, included)    
Thursday 03.02 Conference room next to the Hotel Suisse    
08:50 Welcome day-4 10 min.  
09:00 Lecture 3.1: Toward Measuring Viewpoint Diversity in News Consumption (1) 45min. Pr Nava Tintarev (online)
09:45 Break 15 min  
10:00 Lecture 3.2: Toward Measuring Viewpoint Diversity in News Consumption (2) 45min. Pr Nava Tintarev (online)
10:45 Free Time    
16:30 Cake and Tea Time (@ hotel's lounge) 30min  
17:00 Lecture 4.1: Systems and infrastructures for data science (1) 90min. Pr Etienne Rivière (in person)
18:30 Free Time 30min  
19:00 Dinner (Le Nord, included)    
Friday 04.02 Conference room next to the Hotel Suisse    
08:50 Welcome day-5 10 min.  
09:00 Lecture 4.2: Systems and infrastructures for data science (2) 90min. Pr Etienne Rivière (in person)
10:30 Break 15 min  
10:45 Lecture 4.3: Systems and infrastructures for data science (3) 90min. Pr Etienne Rivière (in person)
12:15 Closing  10 min.
Location

Hôtel Suisse, Champéry

Map

Information

Attendance.

Attendees are required to be present for the whole duration of the school (5 days, 4 nights).

Partial attendance is not permitted (e.g., you cannot arrive X days later or leave Y days earlier). We will collect attendance signatures on a daily basis. In case of partial attendance, CUSO will invoice the full price of the hotel accommodation to the student.

Exceptions will be considered on a case-by-case basis.

 

Lodging and meals.

Accommodation in a shared double room at the Hotel Suisse in Champéry is included in the registration fee. If you have a preference regarding the person with whom to share your room, let us know; we will try to accommodate it as best we can.

Breakfast is included. Lunch meals are self-organized and not included in the participation fees.

Dinners (excluding alcoholic beverages), including a special social evening event, are covered by the participation fees.

 

Health restrictions

Due to the ongoing health situation, attendees will be required, at the time of the School, to follow the rules of the OFSP (Swiss Federal Office of Public Health) in order to attend in-person lectures inside a (large) classroom.

Expenses

CUSO covers the costs of transportation corresponding to a 2nd-class half-fare ticket from your home institution to Champéry.

Lunch meals are self-organized and not included in the participation fees.

Dinners, including a special social evening, are covered by the participation fees.

Places

32

Deadline for registration 30.01.2022
Contact

[email protected]
