Dr Kat

Wominjeka!

I am a Lecturer at the School of Computing and Information Systems, University of Melbourne. I am also affiliated with the Complex Human Data Hub. Most of my current research is in computational linguistics. More specifically, I use technology to explore systematicity and regularities in Language and, conversely, use the regularities in linguistic structures to improve NLP models and learning algorithms, with a focus on under-resourced languages and their documentation.

The ultimate goal of my research is to find universal principles and underlying structures in the organization of Human Language in general and languages in particular. Using statistical models, I aim to understand the constraints that led to the shared properties observed in most languages around the world.

Research Areas

— Technology for Under-resourced Languages and Field Linguistics:
    PhD students: Raphael Merx, Chris Guest, Aso Mahmudi
    Frequent Collaborators: Trevor Cohn, Nick Thieberger, Hanna Suominen, Nick Evans, Andreas Shcherbakov, Jey Han Lau, John Mansfield, Borja Herce
— Linguistic Typology and Cognition:
    PhD students: Zheng Wei Lim, Temuulen Khishigsuren, Demian Aaron Inostroza Amestica
    Frequent Collaborators: Charles Kemp, Trevor Cohn, Mae Carroll, Terry Regier, Mel Mistica
— Computational Social Science/AI in Education:
    PhD students: Naomi Baes, Noor De Bruijn, Pagnarith Pit
    Frequent Collaborators: Nick Haslam, Yoshi Kashima, Christine de Kock, Ed Hovy, Tanya Linden, Simon D'Alfonso
— Computational Approaches to Linguistic Morphology:
    PhD students: Aso Mahmudi
    Frequent Collaborators: Ryan Cotterell, David Yarowsky, Khuyagbaatar Batsuren, Jason Eisner, Mans Hulden, Chris Kirov, Sabrina J. Mielke, Eleanor Chodroff, Elizabeth Salesky, Omer Goldman, Reut Tsarfaty

I co-organize the SIGTYP workshops and shared tasks (2019--) and the SIGMORPHON shared tasks on morphological reinflection (2017--). Since 2022, I have also been co-running FieldMatters and LoResMT. I am an active member of the UniMorph Project.

Most Recent Publications

Merx, R., Suominen, H., Hong, L., Thieberger, N., Cohn, T., & Vylomova, E. (2025). Tulun: Transparent and Adaptable Low-resource Machine Translation. To Appear at ACL 2025 (System Demonstration). [Demo] [Video] [Code]

Baes, N., Merx, R., Haslam, N., Vylomova, E., & Dubossarsky, H. (2025). A General Framework to Evaluate Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data. To Appear at ACL 2025 (Findings).

Merx, R., Correia, A. J. G., Suominen, H., & Vylomova, E. (2025). Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service. In Proceedings of the 8th Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025) @ NAACL (Nominated for the Best Paper Award). [Talk]

Khishigsuren, T., Regier, T., Vylomova, E., & Kemp, C. (2025). A Computational Analysis of Lexical Elaboration Across Languages. Proceedings of the National Academy of Sciences, 122(15). [OSF] [Demo] Discussed on The Conversation, Language Log, Anthropology.Net

Mahmudi, A., Herce, B., Améstica, D. I., Scherbakov, A., Hovy, E., & Vylomova, E. (2025). Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection. In Proceedings of the 18th Workshop on Building and Using Comparable Corpora (BUCC) (pp. 62-72).

Lim, Z. W., Vylomova, E., Kemp, C., & Cohn, T. (2024). Predicting Human Translation Difficulty with Neural Machine Translation. Transactions of the Association for Computational Linguistics, 12, 1479-1496. [Code] Discussed on Slator

Merx, R., Vylomova, E., & Kurniawan, K. (2024). Generating Bilingual Example Sentences with Large Language Models as Lexicography Assistants. In Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association (Best Paper Award).

Baes, N., Haslam, N., & Vylomova, E. (2024). A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1390-1415). [Code]

Lim, Z. W., Vylomova, E., Cohn, T., & Kemp, C. (2024). Simpson’s Paradox and the Accuracy-Fluency Tradeoff in Translation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 92-103). [Code] Discussed on Slator

Lim, Z. W., Stuart, H., De Deyne, S., Regier, T., Vylomova, E., Cohn, T., & Kemp, C. (2024). A Computational Approach to Identifying Cultural Keywords Across Languages. Cognitive Science, 48(1). [OSF] [Code]

Please visit my Google Scholar profile for the full list.