语料库语言学.ppt


Category: Higher Education | Pages: about 37
Corpus Linguistics
Presented by:
Song Chao
Wang Zeyu
Li Zhanyu
Outline
Chapter I: Introduction
Chapter II: Analyzing Corpus Data
Chapter III: Current Issues in Corpus Linguistics
Chapter I: Introduction
What is a corpus?
Formal: a large number of articles, books, magazines, etc. that have been deliberately collected together for some purpose; a collection of writings; the complete works (of an author)
Technical: a large collection of written or spoken language that is used for studying the language; a compilation of language data
What is corpus linguistics?
Corpus linguistics: the study of machine-readable spoken and written language samples that have been assembled in a principled way for the purpose of linguistic research. It is concerned with language use in real contexts.
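Because corpus linguistics looks at language use in real contexts, its most basic tool is the concordance: every occurrence of a word listed with the words around it, in keyword-in-context (KWIC) format. The sketch below is a minimal illustration, assuming a toy sample text and naive regex tokenisation rather than any particular corpus tool; `kwic` is a hypothetical helper name.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return keyword-in-context lines with `width` tokens on each side of every hit."""
    # Naive tokenisation (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Hypothetical sample text, not drawn from any real corpus.
sample = ("Corpus data show language in use. A corpus is a collection of texts, "
          "and each corpus is assembled in a principled way.")

for line in kwic(sample, "corpus"):
    print(line)
```

Aligning the keyword in a central column is what lets a researcher scan many contexts at once and spot recurring patterns on either side of it.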
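Corpus linguistics is chiefly concerned with the collection, storage, retrieval and statistical analysis of machine-readable natural-language texts, together with their grammatical annotation and syntactic and semantic analysis.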
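Retrieval and statistics usually begin with a word-frequency list and simple measures such as the type/token ratio (distinct word forms divided by running words). A minimal sketch, assuming naive lowercase regex tokenisation on a toy sentence; real corpora use proper tokenisers and often grammatical annotation, and `frequency_list` and `type_token_ratio` are hypothetical names.

```python
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Naive tokeniser (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(text: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` most frequent word forms with their raw counts."""
    return Counter(tokenise(text)).most_common(top)


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenise(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical sample text: 16 tokens, 12 types.
sample = ("the corpus is a collection of texts and the texts in the corpus "
          "are machine readable")

print(frequency_list(sample, 3))
print(type_token_ratio(sample))
```

Ties in `Counter.most_common` keep first-seen order, so "corpus" is listed before "texts" here even though both occur twice.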
Types of Corpora
Specialised corpus: texts that belong to a particular type, e.g. academic prose.
General corpus: texts of many different types, assembled to serve as a reference resource for linguistic research or to produce reference materials such as dictionaries.
Historical corpora: texts from different periods of time; compared with corpora from other periods, they allow language change to be studied.
Monitor corpora: focus on current changes in the language.
Parallel corpora: texts in at least two languages that have either been directly translated or been produced in different languages for the same purpose.
Learner corpora: texts produced by learners of a language.
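Comparing historical or monitor corpora usually means comparing corpora of different sizes, so raw counts are normally normalised, commonly to a rate per million words. A minimal sketch with invented figures: the period names, hit counts and corpus sizes are hypothetical and not taken from any real corpus.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    # Multiply before dividing so round figures stay exact in floating point.
    return count * 1_000_000 / corpus_size


# Hypothetical counts for one word in two period sub-corpora of different sizes.
periods = {
    "period A": {"hits": 120, "size": 2_000_000},
    "period B": {"hits": 90, "size": 1_000_000},
}

for name, data in periods.items():
    print(name, per_million(data["hits"], data["size"]))
```

Here the raw count falls from 120 to 90, yet the normalised rate rises from 60 to 90 per million words, which is why raw counts alone can mislead when corpora differ in size.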
History of corpus design
A distinction is made between two periods:
One: 1950s-1970s
Two: 1980s onward
1950s-1970s:
1) London-Lund Corpus of Spoken English (LLC)
2) Brown Corpus, based on written American English
3) Lancaster-Oslo/Bergen Corpus (LOB), based on written British English
1980s onward:
1) Collins Birmingham University International Language Database (COBUILD), which built the Bank of English
2) British National Corpus (BNC)
(Note: COBUILD and the BNC are two major corpora.)
Many publishing houses developed their own corpora:


Document information: uploaded by 分享精品; file size 2.44 MB; date 2018-01-05