下载此文档

2_清华云计算课件--MapReduce原理和应用.ppt


文档分类:IT计算机 | 页数:约36页 举报非法文档有奖
1/36
下载提示
  • 1.该资料是网友上传的,本站提供全文预览,预览什么样,下载就什么样。
  • 2.下载该文档所得收入归上传者、原创者。
  • 3.下载的文档,不会出现我们的网址水印。
1/36 下载此文档
文档列表 文档介绍
Mass Data Processing Technology on Large Scale Clusters Summer, 2007, Tsinghua University All course material (slides, labs, etc) is licensed under the Creative Commons Attribution License . Many thanks to Aaron Kimball & Sierra Michels-Slettvet for their original version 12 Some Slides from : Jeff Dean, Sanjay Ghemawat http://labs./papers/ Motivation 3 ? 200+ processors ? 200+ terabyte database ? 10 10 total clock cycles ? second response time ?5¢ average advertising revenue From: /~bryant/presentations/DISC- Motivation: Large Scale Data Processing ? Want to process lots of data ( > 1 TB) ? Want to parallelize across hundreds/thousands of CPUs ?… Want to make this easy 4 "Google Earth uses TB : 70 TB for the raw imagery and 500 GB for the index data." From: http://googlesystem./2006/09/how- much-data-does-google- MapReduce ? Automatic parallelization & distribution ? Fault-tolerant ? Provides status and monitoring tools ? Clean abstraction for programmers 5 Programming Model ? Borrows from functional programming ? Users implement interface of two functions: ? map (in_key, in_value) -> (out_key, intermediate_value) list ? reduce (out_key, intermediate_value list) -> out_value list 6 map ? Records from the data source (lines out of files, rows of a database, etc) are fed into the map function as key * value pairs: ., (filename, line). ? map() produces one or more intermediate values along with an output key from the input. 7 reduce ? After the map phase is over, all the intermediate values for a given output key bined together into a list ? reduce() combines those intermediate values into one or more final values for that same output key ?(in practice, usually only one final value per key) 8 Architecture 9 Parallelism ? map() functions run in parallel, creating different intermediate values from different input data sets ? reduce() functions also run in parallel, each working on a diff

2_清华云计算课件--MapReduce原理和应用 来自淘豆网m.daumloan.com转载请标明出处.

相关文档 更多>>
非法内容举报中心
文档信息
  • 页数36
  • 收藏数0 收藏
  • 顶次数0
  • 上传人2982835315
  • 文件大小0 KB
  • 时间2016-06-08
最近更新