AI4EN Update Program - Tuesday, March 26, 2024

09:45am - 10:00am

Professor Julien Longhi

Title: Welcome and presentation of the ARENAS project

Bio: Julien is a full professor at CY Cergy Paris Université (AGORA laboratory). He specializes in political discourse analysis and in digital humanities applied to social networks. He is the overall coordinator of the ARENAS project and leader of WP1, which is responsible for coordination and project management. Julien is also a member of the WP2 team, which examines the characterization and detection of extremist narratives, where he works on task 2.4. As a member of WP4 he will examine the impact of extremist narratives, while his work on WP6 will focus on ethics, norms, and politically sensitive science.

10:00am - 11:00am

Professor Farah Benamara

Title: Hate Speech Detection @IRIT-Melodi: An overview

Resume: In this talk, I'll give an overview of my research activities on hate speech detection, with a focus on sexism, racial stereotypes, and model generalization across abusive-language datasets.
Bio: Farah Benamara is a Professor of Computer Science at Toulouse University Paul Sabatier. She is a member of the IRIT laboratory and co-head of the MELODI group. Her research concerns Natural Language Processing, focusing on the development of semantic and pragmatic models for language understanding, with particular attention to evaluative language processing, discourse processing, and information extraction from texts. She has published more than 100 papers in peer-reviewed international conferences and journals. She has served as area chair at ACL 2019 and EACL 2021 and 2024, and as Senior Area Chair at NAACL 2024 and ECAI 2024. She is also an associate editor of IEEE Transactions on Affective Computing and a member of the editorial boards of Traitement Automatique des Langues (TAL) and Dialogue and Discourse (D&D). She co-edited a special issue on contextual phenomena in evaluative language processing in the journal Computational Linguistics. She is PI of several projects, among which DesCartes at CNRS@CREATE Singapore on hybrid AI for NLP, STERHEOTYPES, an EU project on the detection of racial stereotypes, QualityOnto, an ANR-DFG project on fact-checking for knowledge-graph validation, and INTACT, a CNRS pre-maturation project on NLP-based crisis management from social media.

11:00am - 11:30am

Coffee break

11:30am - 12:15pm

Dr. Senja Pollak

Title: Detecting offensive speech and analysing migration-related discourse

Resume: The talk will first present selected natural language processing methods for detecting offensive comments, with a focus on cross-lingual methods, active learning, and interpretability of results. Next, we will cover a range of computational social science methods for analysing migration-related discourse on Twitter, in parliamentary data, and in news media, including methods for analysing differences in the discourse of left- and right-wing politicians and in media coverage of the Syrian and Ukrainian migration periods. Special focus will be placed on the dehumanisation perspective.
Bio: Senja Pollak is a researcher at the Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia. Her research focuses on natural language processing for media analysis. She coordinated the H2020 project EMBEDDIA on cross-lingual news analysis, co-led the RobaCOFI project on comment filtering, and led the national project CANDAS on multilingual computational news discourse analysis. She participates in several other projects, including the national projects Embeddings-based techniques for Media Monitoring Applications and Hate speech in contemporary conceptualizations of nationalism, racism, gender and migration. Between 2018 and 2019 she was a research fellow at the University of Edinburgh.

12:15pm - 1:15pm

Professor Julien Velcin

Title: DIKé project: some attempts to improve fairness in (compressed) language models

Resume: Large language models (LLMs) have caused a significant upheaval in practices for the automated processing of textual data. For researchers, LLMs are both a powerful tool for expanding the capabilities of the models we develop and an exciting object of study. Research has, for instance, shown that LLMs exhibit numerous biases that must be accounted for to ensure their proper use. In this presentation, I will illustrate this with several examples drawn from projects recently conducted at the ERIC laboratory.
Bio: Julien Velcin is a Professor of Computer Science at Lumière Lyon 2 University and a member of the ERIC laboratory. After completing a thesis in Artificial Intelligence in Paris under the supervision of J.-G. Ganascia, he was recruited in 2007 as an Assistant Professor to develop text-mining activities at Lyon 2 University, and became a full Professor in 2018. His research focuses on the analysis of information networks, developing new models and machine learning algorithms capable of processing textual data structured as graphs. He has participated in several national research projects (ANR ImagiWeb, ANR LIFRANUM, ANR DIKé), some with a multidisciplinary dimension addressing research questions in the humanities and social sciences. He has published more than 100 articles, including 80 in the proceedings of international conferences such as IJCAI, WWW, and ICDM. He is also the coordinator of the Computer Science Master's program at Lyon 2 University, responsible for the first year of the program, and co-organizer of the HuNIS specialty pole (Digital Humanities, Connected Individuals and Societies). For more information, feel free to visit his website.

1:15pm - 2:45pm

Lunch break

2:45pm - 3:15pm

Dimitra Niaouri (Ph.D. ETIS - CY Agora)

Title: Machine Learning is heading to the SUD (Socially Unacceptable Discourse) analysis: from Shallow Learning to Large Language Models to the rescue, where do we stand?

Resume: With the rapid growth of social media, the prevalence of Socially Unacceptable Discourse (SUD) on online platforms presents a substantial challenge. SUD encompasses offensive language, controversial narratives, and distinctive grammatical features, necessitating a precise approach to its characterization and detection within online discourse. In this talk we introduce a comprehensive corpus of manually annotated texts from various online sources, enabling a global assessment of SUD detection solutions over 12 different classes. Our investigation distinguishes between Masked Language Models (MLMs) and Causal Language Models (CLMs), exemplified by models such as BERT, Llama 2, Mistral, and MPT, and takes a step towards their interpretability. Traditional ML models, including Support Vector Machines (SVMs), are also examined. We find that SUD classification is promising, but its susceptibility to class imbalance highlights the need for improved discriminative power. Our analysis emphasizes the nuanced trade-off between bidirectional contextual awareness (favoring MLMs) and sequential dependency modeling (advantageous for CLMs). We further underscore the need for sustained efforts in the ML community, note broader implications for linguistics, discourse analysis, and semantics, and advocate for formal guidelines. By focusing on interpretability, we avoid the black-box effect: we can learn about specific features of different aspects of our object of study, opening fruitful scientific interactions with research on CMC corpora from corpus linguistics. Such exchange is necessary for future annotation campaigns on SUD corpora or extremist narratives, to ensure good interoperability between projects.
Bio: Dimitra is a newly enrolled Ph.D. candidate in the ETIS and AGORA labs, with an academic background rooted in linguistics and natural language processing (NLP). She is driven by a passion for advancing the understanding of extremist narrative detection. Throughout her research, Dimitra is committed to exploring state-of-the-art machine learning and deep learning methodologies, with the aim of unveiling intricate narratives in the digital sphere. Her primary objective is to develop innovative tools that can effectively delineate extremist narratives within diverse corpora, spanning contexts such as social media and political discourse.
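The shallow-learning baselines the abstract contrasts with MLMs and CLMs operate on surface features such as word counts. As a minimal stdlib sketch of that family of approaches (a bag-of-words Naive Bayes stand-in, not the project's actual pipeline; the texts and labels below are invented toy examples, not drawn from the ARENAS corpus):

```python
# Toy bag-of-words Naive Bayes classifier for binary "unacceptable" vs
# "acceptable" discourse, standing in for shallow-learning baselines (e.g. SVM).
from collections import Counter
import math

def train_nb(docs, labels):
    """Count word frequencies per class (multinomial Naive Bayes training)."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    vocab = set()
    for doc, y in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return classes, word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Pick the class maximizing log prior + Laplace-smoothed log likelihoods."""
    classes, word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c in classes:
        lp = math.log(class_counts[c] / total)
        denom = sum(word_counts[c].values()) + len(vocab)  # add-one smoothing
        for tok in doc.lower().split():
            lp += math.log((word_counts[c][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Invented toy training data (illustrative only).
docs = [
    "you people are worthless and should leave",
    "this community welcomes everyone warmly",
    "get out of our country you are scum",
    "what a lovely and respectful discussion",
]
labels = ["unacceptable", "acceptable", "unacceptable", "acceptable"]
model = train_nb(docs, labels)
print(predict_nb(model, "you are worthless scum"))  # -> unacceptable
```

Such lexical baselines ignore word order entirely, which is precisely the gap the talk's MLM/CLM comparison probes: bidirectional context (MLMs) versus left-to-right sequential modeling (CLMs).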

3:15pm - 4:45pm

Panel and final discussion

AI4EN Workshop, organized by Michele Linardi