We propose a set of experiments with the general objective of improving the understanding of technical health documents. The experiments address different steps of this complex and ambitious process: (1) categorization of documents according to their complexity; (2) detection of complex passages within documents; (3) acquisition of resources for the lexical and semantic simplification of documents; (4) alignment of parallel sentences from comparable corpora in order to generate rules for syntactic transformation. Depending on the step and task, various methods are used (rule-based, machine learning, with and without linguistic knowledge). In addition to text simplification, the results and resources can be used for other NLP applications and tasks (e.g., information retrieval and extraction, question answering, textual entailment).


Natalia Grabar obtained her PhD from Université Paris 6. Her main research domain is Natural Language Processing applied to specialized languages, including medicine, biology, electrical engineering, chemistry, ideological texts, and computer science. She worked for over a year at the NGO Health on the Net in Geneva, Switzerland. She held a university and hospital position (AHU) for three years at Hôpital Européen Georges Pompidou (HEGP) and a research position at an Inserm lab, working in medical informatics. Currently, Natalia is a researcher at CNRS (chargé de recherche classe 1). She is affiliated with UMR 8163 STL in Lille. Her research topics include the building and use of terminologies; information retrieval and extraction; document typology; and the quality, understanding, and reliability of health and medical information.