DIG4EL: Computer-assisted generation of didactic grammars for endangered languages
Topic: Intersection of research, theory, and practice
Type: Papers
Abstract
This paper presents an effort to support the teaching and learning of endangered languages by providing a dedicated method and software that facilitate the creation of didactic grammars. Over 60% of the world's languages are documented only by basic grammatical sketches, so pedagogical grammars for them are rare. Developing explicit grammatical competence is important for second-language learners, because relying solely on naturalistic learning approaches may not lead to advanced proficiency. Systems that automatically produce grammatical descriptions have been well received by teachers. However, these systems are designed primarily for linguists and can only support languages that already have substantial corpora and documentation.
Our method and software, Digital Inferential Grammars for Endangered Languages (DIG4EL), enable the creation of didactic content with minimal resources, even for languages without a pre-existing digital corpus. This is achieved through a combination of:
• A robust theoretical framework based on the Radical Construction Grammar of William Croft.
• The guided collection of rich grammatical data using the Conversational Questionnaires method of Alexandre François.
• The wealth of descriptive data accumulated by linguists, starting with the World Atlas of Language Structures (WALS) and Grambank.
• Insights from neuroscience and second-language acquisition research, particularly concerning endangered languages.
• Dedicated natural language processing (NLP) algorithms combining Abstract Grammatical Representation, Bayesian networks, and loopy belief propagation.
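To make the inference idea concrete, here is a minimal, hypothetical sketch of loopy belief propagation (sum-product message passing) over a small pairwise network of word-order parameters. The variables (loosely modelled on WALS features 83A, 85A, 86A and 90A), the binary head-final/head-initial coding, and the harmony weight of 4 are illustrative assumptions; this is not DIG4EL's model or code, and it leaves out the Abstract Grammatical Representation layer entirely. Because VO, Adp and GenN form a cycle, messages are passed iteratively until they stop changing, which is what makes the propagation "loopy".

```python
# Hypothetical sketch: sum-product loopy belief propagation over a tiny
# pairwise network of binary word-order parameters. State 0 = head-final,
# state 1 = head-initial. Variables and potentials are illustrative
# assumptions, not DIG4EL's actual model.

def loopy_bp(unary, edges, pair, max_iter=100, tol=1e-9):
    """Return approximate marginals for every variable.

    unary: var -> list of non-negative weights (evidence or prior)
    edges: list of (u, v) pairs forming the (possibly cyclic) graph
    pair:  (u, v) -> matrix psi[x_u][x_v] of compatibility weights
    """
    neighbors = {v: [] for v in unary}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def psi(u, v, xu, xv):
        # Look up the edge potential in whichever direction it was stored.
        if (u, v) in pair:
            return pair[(u, v)][xu][xv]
        return pair[(v, u)][xv][xu]

    size = {v: len(w) for v, w in unary.items()}
    # One message per directed edge, initialised to uniform.
    msgs = {(u, v): [1.0 / size[v]] * size[v]
            for u in neighbors for v in neighbors[u]}

    for _ in range(max_iter):
        new = {}
        for (u, v) in msgs:
            out = []
            for xv in range(size[v]):
                total = 0.0
                for xu in range(size[u]):
                    term = unary[u][xu] * psi(u, v, xu, xv)
                    # Combine messages from all neighbours of u except v.
                    for w in neighbors[u]:
                        if w != v:
                            term *= msgs[(w, u)][xu]
                    total += term
                out.append(total)
            z = sum(out)
            new[(u, v)] = [x / z for x in out]
        delta = max(abs(a - b) for key in msgs
                    for a, b in zip(msgs[key], new[key]))
        msgs = new  # synchronous (flooding) update schedule
        if delta < tol:
            break

    # Belief = local evidence times all incoming messages, normalised.
    beliefs = {}
    for v in unary:
        b = list(unary[v])
        for w in neighbors[v]:
            b = [bi * mi for bi, mi in zip(b, msgs[(w, v)])]
        z = sum(b)
        beliefs[v] = [x / z for x in b]
    return beliefs


# Hypothetical harmony potential: matching head direction is 4x more likely.
HARMONY = [[4.0, 1.0], [1.0, 4.0]]
EDGES = [("VO", "Adp"), ("Adp", "GenN"), ("GenN", "VO"), ("GenN", "RelN")]
pair = {e: HARMONY for e in EDGES}

# Evidence from collected data: OV order and postpositions are observed.
unary = {
    "VO": [1.0, 0.0],    # observed: object-verb (head-final)
    "Adp": [1.0, 0.0],   # observed: postpositions (head-final)
    "GenN": [1.0, 1.0],  # unknown
    "RelN": [1.0, 1.0],  # unknown
}

beliefs = loopy_bp(unary, EDGES, pair)
for var in ("GenN", "RelN"):
    print(var, "P(head-final) =", round(beliefs[var][0], 3))
```

With these assumed weights, the sketch prints P(head-final) of about 0.941 for GenN and 0.765 for RelN. Because the evidence clamps two of the three nodes on the cycle, the result here coincides with exact inference; on graphs with unobserved cycles, loopy belief propagation is only an approximation and is not guaranteed to converge.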
An initial evaluation focused on canonical word order shows that DIG4EL correctly infers 75% of unknown word-order parameters across the 116 WALS languages that provide enough parameters for testing. An experimental version of DIG4EL is already online and is used with Oceanic languages by linguists and language teachers.
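The abstract does not detail the evaluation protocol, so the following is only one plausible, hypothetical way to measure how often hidden parameters are recovered: hide each known parameter of a language in turn, predict it from the remaining ones, and count correct predictions. The toy data and the `harmony_vote` predictor (a simple majority vote on head direction) are illustrative stand-ins, not DIG4EL's inference engine or the WALS test set.

```python
# Hypothetical leave-one-out evaluation of parameter inference.
# 0 = head-final, 1 = head-initial. Data and predictor are toy assumptions.

def leave_one_out_accuracy(languages, predict):
    """Hide each known parameter in turn, predict it from the rest, and
    return the fraction of hidden values that were predicted correctly."""
    correct = total = 0
    for known in languages:
        for param, value in known.items():
            rest = {p: v for p, v in known.items() if p != param}
            if not rest:
                continue  # nothing left to infer from
            total += 1
            if predict(param, rest) == value:
                correct += 1
    return correct / total if total else 0.0


def harmony_vote(param, rest):
    """Toy stand-in predictor: choose the head direction most common among
    the remaining known parameters (ties default to head-initial, 1)."""
    head_final = sum(1 for v in rest.values() if v == 0)
    return 0 if head_final > len(rest) - head_final else 1


languages = [
    {"VO": 0, "Adp": 0, "GenN": 0},  # consistently head-final
    {"VO": 1, "Adp": 1, "GenN": 1},  # consistently head-initial
    {"VO": 1, "Adp": 1, "GenN": 0},  # mixed ordering
]
acc = leave_one_out_accuracy(languages, harmony_vote)
print(f"accuracy = {acc:.2f}")
```

On this toy data, 8 of the 9 hidden values are recovered; only the disharmonic genitive order of the third language is missed, which illustrates why mixed-order languages are the hard cases for any harmony-based inference.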
All algorithms, methods, and software code are publicly available and adhere to the FAIR data principles and the CARE Principles for Indigenous Data Governance.