DSpace Repository

A Corpus of Modern Spoken Māori

dc.contributor.author Boyce, Mary Teresa
dc.date.accessioned 2008-07-30T02:22:48Z
dc.date.accessioned 2022-10-26T00:54:28Z
dc.date.available 2008-07-30T02:22:48Z
dc.date.available 2022-10-26T00:54:28Z
dc.date.copyright 2006
dc.date.issued 2006
dc.identifier.uri https://ir.wgtn.ac.nz/handle/123456789/23844
dc.description.abstract The Māori Broadcast Corpus (MBC) is a representative corpus of contemporary spoken Māori. The corpus was designed and compiled, then used to identify and describe various aspects of the lexicon of modern spoken Māori. The corpus contains approximately one million words of running text across several text categories, selected and transcribed from Māori-medium broadcasts in 1995 and 1996. The broadcast sources were Te Reo Irirangi o te Upoko-o-te-ika, Radio New Zealand, and Television New Zealand. The corpus files, with accompanying explanatory and descriptive information and word lists, are available on the compact disc that accompanies this document. Initial analysis of the corpus identified 10,289 different word types in the 1,005,364 tokens, or running words of text. The particular focus of the analysis was on high-frequency vocabulary and on patterns of distribution. A small number of high-frequency words provide most coverage of texts: 165 word types make up approximately 80% of all the words in the texts in the corpus; 200 word types give 82.4% coverage, and 2,000 give 97.62% coverage. This has implications for learners of Māori. Knowing the most frequent word types, their meanings and their uses, is crucial to the comprehension not only of Māori broadcast texts but also of other texts. The analysis extended beyond the identification of word types and explored word sense and word sense distribution in selected high-frequency word types, using concordance data from the MBC. This analysis revealed that, in those instances where word types could be used as both function words and content words, the function word uses were far more frequent. There was a degree of polysemy in the word types examined, with some meanings far more frequent than others. Word senses were identified that have yet to be recorded in dictionaries.
The analysis showed the potential of the MBC for adding to what is already known about the lexicon of Māori by providing frequency and other distributional information, together with new word senses currently absent from the available dictionaries and grammars of Māori. Implications of the MBC for the learning and teaching of Māori were discussed, and some applications to language learning and teaching were outlined. Future corpus-based research was suggested. en_NZ
dc.language en_NZ
dc.language.iso en_NZ
dc.publisher Te Herenga Waka—Victoria University of Wellington en_NZ
dc.subject Māori language en_NZ
dc.subject Spoken Māori en_NZ
dc.subject New Zealand en_NZ
dc.subject Discourse analysis en_NZ
dc.subject Reo Māori mi_NZ
dc.subject Tātari whakatakotoranga reo mi_NZ
dc.title A Corpus of Modern Spoken Māori en_NZ
dc.type Text en_NZ
vuwschema.type.vuw Awarded Doctoral Thesis en_NZ
thesis.degree.discipline Applied Linguistics en_NZ
thesis.degree.grantor Te Herenga Waka—Victoria University of Wellington en_NZ
thesis.degree.level Doctoral en_NZ
thesis.degree.name Doctor of Philosophy en_NZ
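
The coverage figures quoted in the abstract (for example, 165 word types accounting for roughly 80% of tokens) are cumulative shares of running words taken in descending frequency order. The sketch below illustrates that calculation only; it is not the thesis's own tooling. The function names `coverage_by_rank` and `types_needed` are hypothetical, the whitespace tokenisation is a simplifying assumption, and the toy sentence is invented for illustration rather than drawn from the MBC.

```python
from collections import Counter


def coverage_by_rank(tokens):
    """Return cumulative text coverage after each frequency rank.

    Element i is the share of all tokens accounted for by the
    i + 1 most frequent word types.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    running = 0
    coverage = []
    # most_common() yields word types from most to least frequent.
    for _word, freq in counts.most_common():
        running += freq
        coverage.append(running / total)
    return coverage


def types_needed(tokens, target):
    """Smallest number of top-ranked word types reaching `target` coverage."""
    for rank, share in enumerate(coverage_by_rank(tokens), start=1):
        if share >= target:
            return rank
    return None  # only reached for an empty token list or a target above 1.0


# Toy input (an assumption, not corpus data), split on whitespace.
tokens = "ko te reo te mauri o te mana".split()

n_tokens = len(tokens)      # running words
n_types = len(set(tokens))  # distinct word types
```

On this toy input there are 8 tokens and 6 word types; the single most frequent type already covers 3 of the 8 tokens, which mirrors on a small scale the pattern the abstract reports for high-frequency vocabulary.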

