Corpus Linguistics
Presented by:
Song Chao
Wang Zeyu
Li Zhanyu
Outline
Chapter I: Introduction
Chapter II: Analyzing Corpus Data
Chapter III: Current Issues in Corpus Linguistics
Chapter I: Introduction
What is a corpus?
Formal: a large number of articles, books, magazines, etc. that have been deliberately collected together for some purpose; also a collection of writings or the complete works of an author.
Technical: a large collection of written or spoken language that is used for studying the language; a compilation of language data.
What is corpus linguistics?
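Corpus linguistics: the study of machine-readable spoken and written language samples that have been assembled in a principled way for the purpose of linguistic research. It is concerned with language use in real contexts.
Corpus linguistics mainly studies the collection, storage, retrieval, statistical analysis, grammatical tagging, and syntactic and semantic analysis of machine-readable natural language texts.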
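As a minimal sketch of the retrieval and statistics tasks that corpus linguistics deals with, the following Python snippet builds a word-frequency list and a simple keyword-in-context (KWIC) concordance from a tiny in-memory sample. The sample sentences, the function names `tokenize`, `frequency_list`, and `concordance`, and the regular-expression tokenizer are illustrative assumptions, not part of the original slides; a real corpus would be far larger and would normally be read from files.

```python
import re
from collections import Counter

# Hypothetical toy sample; a real corpus would hold millions of words.
SAMPLE_TEXTS = [
    "The corpus is a collection of texts.",
    "A corpus shows how language is used in real contexts.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, keyword, window=3):
    """Return keyword-in-context lines with up to `window` tokens on each side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines

freq = frequency_list(SAMPLE_TEXTS)
print(freq.most_common(3))          # the most frequent word forms
for line in concordance(SAMPLE_TEXTS, "corpus"):
    print(line)                     # each hit shown in its context
```

The frequency list answers "how often?" questions, while the concordance shows each occurrence in its real context, which is the kind of evidence corpus linguistics relies on.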
Types of Corpora
Specialised corpus: texts that belong to a particular type, e.g. academic prose.
General corpus: different types of texts, assembled to serve as a reference resource for linguistic research or to produce reference materials such as dictionaries.
Historical corpora: texts from different periods of time, which allow the study of language change when compared with corpora from other periods.
Monitor corpora: corpora that focus on current changes in the language.
Parallel corpora: texts in at least two languages that have either been directly translated or produced in different languages for the same purpose.
Learner corpora: texts produced by learners of a language.
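To illustrate how historical corpora can support the study of language change, here is a small sketch that compares the normalised frequency of a word in two period subcorpora. The two toy texts, the period labels, and the helper `per_million` are hypothetical and only for illustration; frequencies are normalised per million words because subcorpora usually differ in size.

```python
import re

# Hypothetical miniature subcorpora keyed by period; real ones would be far larger.
PERIOD_TEXTS = {
    "earlier": "thou art kind and thou art wise",
    "later": "you are kind and you are wise and you know it",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(word, text):
    """Return the occurrences of `word` per million tokens in `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens) * 1_000_000

for period, text in PERIOD_TEXTS.items():
    thou = round(per_million("thou", text))
    you = round(per_million("you", text))
    print(f"{period}: thou={thou} per million, you={you} per million")
```

With samples this small the figures are meaningless in themselves; the point is only that normalised counts make subcorpora of different sizes comparable.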
History of corpus design
Two periods are distinguished:
One: 1950s-1970s
Two: 1980s onwards
1950s-1970s:
1) London-Lund Corpus of Spoken English (LLC)
2) Brown Corpus, based on written American English
3) Lancaster-Oslo/Bergen Corpus, based on written British English
1980s onwards:
1) Collins Birmingham University International Language Database (COBUILD), the project behind the Bank of English
2) British National Corpus (BNC)
Note: COBUILD and the BNC are two major corpora.
Many publishing houses developed their own corpora: