GFS: The Google File System
Brad Karp
UCL Computer Science
CS Z03 / 4030
30th October, 2006
Motivating Applicay)
Access control information (per-file)
Mapping from files to chunks
Current locations of chunks (chunkservers)
Manages chunk leases to chunkservers
Garbage collects orphaned chunks
Migrates chunks between chunkservers
Holds all metadata in RAM; very fast operations on file system metadata
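The in-memory metadata listed above can be pictured as a pair of lookup tables. This is a minimal sketch, assuming plain Python dictionaries; the class, field, and method names (MasterMetadata, file_to_chunks, chunk_locations, orphaned_chunks) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Toy in-RAM metadata tables; every name here is an illustrative assumption."""
    # file name -> ordered list of chunk handles
    file_to_chunks: dict = field(default_factory=dict)
    # chunk handle -> set of chunkservers currently holding a replica
    chunk_locations: dict = field(default_factory=dict)

    def add_chunk(self, file_name, handle, servers):
        """Record a new chunk of a file and where its replicas live."""
        self.file_to_chunks.setdefault(file_name, []).append(handle)
        self.chunk_locations[handle] = set(servers)

    def lookup(self, file_name, chunk_index):
        """Return (chunk handle, sorted replica locations) for one chunk of a file."""
        handle = self.file_to_chunks[file_name][chunk_index]
        return handle, sorted(self.chunk_locations[handle])

    def orphaned_chunks(self):
        """Chunks with known replicas that no file references (garbage-collection candidates)."""
        referenced = {h for handles in self.file_to_chunks.values() for h in handles}
        return sorted(set(self.chunk_locations) - referenced)
```

Because both tables are ordinary in-memory dictionaries, a lookup touches no disk, which is the point of keeping metadata in RAM.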
Chunkserver
Stores 64 MB file chunks on local disk using standard Linux filesystem, each with version number and checksum
Read/write requests specify chunk handle and byte range
Chunks replicated on configurable number of chunkservers (default: 3)
No caching of file data (beyond standard Linux buffer cache)
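A chunkserver read by chunk handle and byte range might look roughly like the following. This is a sketch under stated assumptions: chunks are kept in a dictionary rather than on disk, and a single CRC32 over the whole chunk stands in for the checksum, which is a simplification of mine. The names Chunk and ChunkServer are hypothetical.

```python
import zlib


class Chunk:
    """One stored chunk with its version number and checksum (illustrative only)."""

    def __init__(self, handle, version, data):
        self.handle = handle
        self.version = version
        self.data = bytes(data)
        self.checksum = zlib.crc32(self.data)  # assumption: one CRC32 per chunk


class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> Chunk

    def store(self, handle, version, data):
        self.chunks[handle] = Chunk(handle, version, data)

    def read(self, handle, offset, length):
        """Serve a byte range, refusing to return data that fails its checksum."""
        chunk = self.chunks[handle]
        if zlib.crc32(chunk.data) != chunk.checksum:
            raise OSError("checksum mismatch for chunk %s" % handle)
        return chunk.data[offset:offset + length]
```

Usage: after cs.store("c1", 1, b"hello world"), the call cs.read("c1", 6, 5) returns b"world".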
Client
Issues control (metadata) requests to master server
Issues data requests directly to chunkservers
Caches metadata
Does no caching of data
No consistency difficulties among clients
Streaming reads (read once) and append writes (write once) don’t benefit much from caching at client
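The split between cached metadata and uncached data can be sketched as a small client-side lookup cache. The class MetadataCache and the ask_master callable are hypothetical, and cache expiry is deliberately left out of this sketch.

```python
class MetadataCache:
    """Client-side cache of chunk locations; file data is never cached here."""

    def __init__(self, ask_master):
        # ask_master is an assumed callable: (file_name, chunk_index) -> reply
        self._ask_master = ask_master
        self._cache = {}
        self.master_requests = 0  # counts how often the master was contacted

    def locate(self, file_name, chunk_index):
        """Return the master's reply for a chunk, contacting the master only on a miss."""
        key = (file_name, chunk_index)
        if key not in self._cache:
            self.master_requests += 1
            self._cache[key] = self._ask_master(file_name, chunk_index)
        return self._cache[key]
```

Repeated reads of the same chunk then cost one control request to the master, while every data request still goes to a chunkserver.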
Client API
Is GFS a filesystem in traditional sense?
Implemented in kernel, under vnode layer?
Mimics UNIX semantics?
No; a library apps can link in for storage access
API:
open, delete, read, write (as expected)
snapshot: quickly create copy of file
append: at least once, possibly with gaps and/or inconsistencies among clients
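Because append is at-least-once, a reader may see the same record more than once, with gaps in between. One way an application could cope, sketched here as an assumption rather than as part of the API above, is to tag each record with a unique ID and filter on the reading side. The function name deduplicate and the use of None to mark a gap are hypothetical.

```python
def deduplicate(records):
    """Yield each (record_id, payload) once, skipping gaps (None) and repeats."""
    seen = set()
    for rec in records:
        if rec is None:  # a gap left behind by a failed append attempt
            continue
        record_id, payload = rec
        if record_id in seen:  # a duplicate produced by a retried append
            continue
        seen.add(record_id)
        yield record_id, payload
```

Usage: list(deduplicate([(1, "a"), None, (2, "b"), (2, "b")])) keeps one copy of each of records 1 and 2.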
Client Read
Client sends master:
read(file name, chunk index)
Master’s reply:
chunk ID, chunk version number, locations of replicas
Client sends to “closest” chunkserver w/replica:
read(chunk ID, byte range)
“Closest” determined by IP address on simple rack-based network topology
Chunkserver replies with data
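Two client-side steps in the read path above lend themselves to a short sketch: turning a byte offset into a chunk index, and picking the "closest" replica. The slide says only that closeness is determined by IP address; counting shared leading address bits is my assumption about how that could be done, and the function names are hypothetical.

```python
import ipaddress

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the Chunkserver slide


def chunk_index(offset):
    """Map a byte offset within a file to the index of the chunk containing it."""
    return offset // CHUNK_SIZE


def shared_prefix_bits(a, b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def closest_replica(client_ip, replica_ips):
    """Assumed heuristic: the replica sharing the longest address prefix is closest."""
    return max(replica_ips, key=lambda ip: shared_prefix_bits(client_ip, ip))
```

Usage: a read at byte offset 200 MB falls in chunk index 3, and a client at 10.1.2.5 choosing among 10.2.0.7, 10.1.2.9 and 10.1.7.1 would pick 10.1.2.9 under this heuristic.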
Client Write
Some chunkserver is primary for each chunk
Master grants lease to primary (typically for 60 sec.)
Leases renewed using periodic heartbeat messages between master and chunkservers
Client asks master for primary and secondary replicas for each chunk
Client sends data to replicas in daisy chain
Pipeli
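The lease on the primary, with its typical 60-second duration and heartbeat-driven renewal, can be sketched as a small expiry check. The class Lease and its methods are hypothetical, and time is passed in explicitly so the example is deterministic; a real caller might supply time.monotonic().

```python
LEASE_SECONDS = 60  # typical lease duration given on the slide


class Lease:
    """Lease held by the primary replica of one chunk (illustrative only)."""

    def __init__(self, chunk_handle, primary, now):
        self.chunk_handle = chunk_handle
        self.primary = primary
        self.expires_at = now + LEASE_SECONDS

    def is_valid(self, now):
        """The primary may act on the chunk only while the lease is unexpired."""
        return now < self.expires_at

    def renew(self, now):
        """Extend the lease, e.g. when a heartbeat arrives; expired leases are not renewed."""
        if not self.is_valid(now):
            return False
        self.expires_at = now + LEASE_SECONDS
        return True
```

In this sketch an expired lease cannot be renewed, so a new primary would have to be granted a fresh lease instead.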