Abstract: Collecting high-quality annotated data is a major bottleneck in developing multilingual NLP applications, as it often demands formulating and adhering to rigorous linguistic annotation guidelines, which require considerable effort and expertise from task developers and recruited annotators. In this talk, I will present a line of work that obviates the need for formal linguistic guidelines through simple QA annotation, shown to be intuitive enough for large-scale, non-expert annotation and leading to state-of-the-art models. I will outline the implementation of this approach in ongoing work towards datasets and models for three longstanding NLP tasks in six diverse languages (including Hebrew, Arabic, and Yiddish), as well as a promising model capable of transcribing missing parts of cuneiform tablets in extinct languages (Akkadian and Sumerian).
Bio: Dr. Stanovsky is a senior lecturer at the Hebrew University of Jerusalem. He did his postdoctoral research at the University of Washington and the Allen Institute for AI in Seattle, working with Prof. Luke Zettlemoyer and Prof. Noah Smith, and his Ph.D. with Prof. Ido Dagan at Bar-Ilan University. He is interested in developing text-processing models that exhibit facets of human intelligence and benefit users in real-world applications. His work has received awards at top-tier conferences, including ACL and CoNLL, and recognition in popular outlets such as Science and The New York Times.