A highly scalable, eventually consistent, distributed, structured key-value store. outline ? Enter Cassandra ? Related concepts & techniques ? General architecture ? Highlights ? Limitations ? Data model ? Storage model ? Partitioning & repication ? Cluster membership ? Accrual failure detector ? API ? Hadoop support ? Performance benchmark ? configuration ? References and links 2 /63 history <<enter Cassandra ? Created at facebook Avinash Lakshman ( committer to Dynamo) Prashant Malik ? Open-sourced in 2008 ? Apache incubator in early 2009 ? Graduation in March 2010 ? Apache license ? Current release: (-beta1) 3 /63 4 /63 The largest production cluster has over 100 TB of data in over 150 machines. user <<enter Cassandra ? Consistency – all nodes see the same data at the same time ? Availability – node failures do not prevent survivors from continuing to operate ? Partition Tolerance – the system continues to operate despite arbitrary message loss Can Only Choose Two From Above Three – Eric Brewer 2000 PODC 5 /63 CAP (1/3) <<related concepts & techniques 6 /50 CAP (2/3) <<related concepts & techniques 7 /63 CAP (3/3) <<related concepts & techniques ? Why AP? – CA-corruption possible if live nodes can ’ municate – pletely essible if any nodes are dead – AP-always available, but may not always read most recent Cassandra trade-off strong C in favor of high A choose AP but allows them to be tunable to have more C. ? eventual consistency (weak consistency) – when no updates occur for a long period of time, eventually all updates will propagate through the system and all the replicas will be consistent 8 /63 consistency level ( 1/2 ) <<related concepts & techniques ? write Level Behavior ZERO Ensure nothing. A write happens asynchronously in background. Until CASSANDRA-685 is fixed: If too many of these queue up, buffers will explode and bad things will happen. ANY (Requires ) Ensure that the write has been writ