Apache Cassandra Tutorial

Apache Cassandra is a distributed NoSQL database, often described as a key/value or wide-column store. A NoSQL database is a database system that stores and retrieves data using looser consistency models than a traditional relational database, which makes it better suited to storing large volumes of data. Cassandra was originally developed at Facebook and is now developed under the Apache Software Foundation. It is used by many enterprises around the globe, and more and more of them are adopting it as the back end of their enterprise applications. This chapter gives a basic introduction to Apache Cassandra.

Apache Cassandra vs Traditional Databases

A brief comparison is shown below.

1) Big data is data that exceeds the capacity of traditional relational databases. Cassandra is well suited to big data applications.

2) Read/write throughput is typically higher in Cassandra when the data volume is large.

3) The chance of failure is much lower in Cassandra than in a relational database. This is achieved by replicating the same data on multiple nodes, and optionally in multiple data centers, in a cluster.

4) In Cassandra, every node in a cluster has the same role. There is no master/slave mechanism in a cluster, so any node can service any request.

5) Cassandra has a key/value style architecture. It is entirely different from the architecture of relational databases.

Apache Cassandra Architecture

Apache Cassandra does not follow the legacy master/slave architecture. Instead, each node in a cluster has the same responsibility. A cluster can contain any number of nodes. Each piece of data stored on a node is replicated on one or more other nodes, so the chance of losing data or availability when a single node fails is minimal.
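The idea that every node is a peer and that each piece of data lives on several nodes can be sketched with a toy token ring. This is a simplified illustration, not Cassandra's actual implementation: the MD5-based `token` function, the node names and the `replicas_for` helper are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a position on the ring (toy hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key, nodes, replication_factor):
    """Return the nodes that would hold copies of row_key in this toy ring."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node clockwise from the key's token, wrapping around the ring.
    start = bisect_right(tokens, token(row_key)) % len(ring)
    count = min(replication_factor, len(ring))
    # The next nodes clockwise hold the additional replicas.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)
```

No node in the list is special: any of them can compute the same owner list for a key, which is what lets every node accept a request.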

Cassandra Data Model

Column

A column is the smallest increment of data in Apache Cassandra. It has a column name, a value and a timestamp.

Column Family

A column family is the unit in which data is stored in a Cassandra database. Each row in a column family has a unique id known as the row key. There can be multiple columns against a row key in a column family. Each column has its own column name and value.
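As a rough mental model, a column family can be pictured as a map from row key to a map of columns. The sketch below is an in-memory illustration only; the `ColumnFamily` class and the `users` data are hypothetical. It also assumes that when the same column is written twice, the write with the newer timestamp wins, which is the purpose of the timestamp on each column.

```python
import time
from collections import namedtuple

# A column is a (name, value, timestamp) triple.
Column = namedtuple("Column", ["name", "value", "timestamp"])


class ColumnFamily:
    def __init__(self, name):
        self.name = name
        self.rows = {}  # row key -> {column name -> Column}

    def insert(self, row_key, column_name, value, timestamp=None):
        if timestamp is None:
            timestamp = time.time_ns() // 1000  # microseconds since the epoch
        row = self.rows.setdefault(row_key, {})
        current = row.get(column_name)
        # Keep the write only if it is at least as new as what is stored.
        if current is None or timestamp >= current.timestamp:
            row[column_name] = Column(column_name, value, timestamp)

    def get_row(self, row_key):
        """Return the row's columns ordered by column name."""
        row = self.rows.get(row_key, {})
        return [row[name] for name in sorted(row)]


users = ColumnFamily("users")
users.insert("u1", "name", "Alice", timestamp=1)
users.insert("u1", "email", "alice@example.com", timestamp=1)
users.insert("u1", "name", "Alicia", timestamp=2)  # newer timestamp wins
users.insert("u1", "name", "Al", timestamp=0)      # stale write is ignored
for column in users.get_row("u1"):
    print(column.name, column.value, column.timestamp)
```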

Super Column

A super column has a key and a number of sub columns as its value. There can be any number of sub columns in a super column. The sub columns are kept in sorted order of their column names.

Super Column Family

A super column family is similar to a column family. The difference is that it holds super columns instead of columns. There can be any number of super columns in a super column family, and each super column can accommodate any number of columns. The super columns are arranged in sorted order of super column names.
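The extra level of nesting can be sketched with plain dictionaries: row key, then super column name, then sub column name. The `addresses` data and the `sorted_view` helper are hypothetical and only illustrate the sorted ordering described above.

```python
def sorted_view(super_column_family):
    """Return rows with super columns and sub columns ordered by name."""
    return {
        row_key: {
            sc_name: dict(sorted(sub_columns.items()))
            for sc_name, sub_columns in sorted(super_columns.items())
        }
        for row_key, super_columns in super_column_family.items()
    }


# row key -> super column name -> sub column name -> value
addresses = {
    "u1": {
        "work": {"zip": "10001", "city": "New York"},
        "home": {"zip": "94107", "city": "San Francisco"},
    }
}

view = sorted_view(addresses)
print(list(view["u1"]))          # super column names, sorted
print(list(view["u1"]["home"]))  # sub column names, sorted
```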

Keyspace

A keyspace is the container for our data. It is similar to a schema in relational databases.

Write Process in Cassandra

1) A write request can be sent to any node, which forwards the data to the nodes responsible for it.

2) Each responsible node first appends the data to its commit log and then writes it to its memtable.

3) If a responsible node is down when the write is attempted, the data is written to another node, which holds it temporarily. Once the intended node comes back up, it is updated from the node holding the data. This mechanism is known as hinted handoff.

4) When the memtable fills up, its data is written to an SSTable on disk.
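The write steps above can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions, not Cassandra's code: the `Node` and `Coordinator` classes, the `memtable_limit` threshold and the choice to clear the commit log on flush are all made up for the example.

```python
class Node:
    def __init__(self, name, memtable_limit=2):
        self.name = name
        self.up = True
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory writes
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted table and start a fresh one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()


class Coordinator:
    def __init__(self):
        self.hints = []  # (target node, key, value) for nodes that were down

    def write(self, replicas, key, value):
        for node in replicas:
            if node.up:
                node.write(key, value)
            else:
                self.hints.append((node, key, value))  # hold the data for later

    def replay_hints(self):
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.write(key, value)
            else:
                remaining.append((node, key, value))
        self.hints = remaining


a, b = Node("a"), Node("b")
b.up = False
coordinator = Coordinator()
coordinator.write([a, b], "k1", "v1")  # b is down, so a hint is kept
b.up = True
coordinator.replay_hints()             # b catches up
coordinator.write([a, b], "k2", "v2")  # memtables reach the limit and flush
print(a.sstables, b.sstables)
```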

Read Process in Cassandra

Reads are distributed across the nodes of a cluster, so many requests can be served in parallel. If a node holding the requested data is down, the data is read from another node that holds a replica of it.
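A minimal sketch of reading from replicas follows. The `read` function and the dictionary layout of each replica are assumptions for illustration; a real cluster decides how many replicas to consult based on the requested consistency level, which this sketch ignores.

```python
def read(key, replicas):
    """Return the newest value for key among the replicas that are up.

    Each replica is a dict: {"up": bool, "data": {key: (value, timestamp)}}.
    """
    newest = None
    for replica in replicas:
        if not replica["up"]:
            continue  # skip unreachable replicas and try the next one
        cell = replica["data"].get(key)
        if cell is not None and (newest is None or cell[1] > newest[1]):
            newest = cell
    return None if newest is None else newest[0]


replicas = [
    {"up": False, "data": {"u1": ("Alicia", 2)}},  # this node is down
    {"up": True, "data": {"u1": ("Alice", 1)}},
    {"up": True, "data": {"u1": ("Alice", 1)}},
]
print(read("u1", replicas))
```

Here the request still succeeds even though one replica is down, because the other replicas hold a copy of the row.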

Cassandra Query Language (CQL)

CQL is now widely used in Cassandra client applications. It provides an SQL-like language for querying the Cassandra database. The important keywords in CQL include:

SELECT, UPDATE, INSERT, TRUNCATE, DROP

CQL also has a few special statements:

1) CREATE KEYSPACE

2) CREATE COLUMNFAMILY

3) CREATE INDEX

We will use these in the coming discussions.
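To give a feel for the syntax, the sketch below lists a few CQL statements as Python strings and runs them through a session object. The `demo` keyspace, the `users` table and the index name are hypothetical. The `run_all` helper only assumes that `session` has an `execute(statement)` method, as the session object of the DataStax Python driver does; connecting to a cluster is left out. In current CQL the column family statement is usually written as CREATE TABLE.

```python
STATEMENTS = [
    # Container for the data, replicated once (suitable for a single test node).
    "CREATE KEYSPACE demo WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}",
    # user_id plays the role of the row key.
    "CREATE TABLE demo.users (user_id text PRIMARY KEY, name text, email text)",
    # Secondary index so rows can be looked up by email.
    "CREATE INDEX users_email_idx ON demo.users (email)",
    "INSERT INTO demo.users (user_id, name, email) "
    "VALUES ('u1', 'Alice', 'alice@example.com')",
    "SELECT name, email FROM demo.users WHERE user_id = 'u1'",
]


def run_all(session, statements=STATEMENTS):
    """Execute each statement in order and collect whatever execute() returns."""
    return [session.execute(statement) for statement in statements]


class PrintingSession:
    """Stand-in session that prints statements instead of contacting a cluster."""

    def execute(self, statement):
        print(statement + ";")
        return statement


results = run_all(PrintingSession())
```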

Data Types in Apache Cassandra

Internally, Cassandra stores column names and values as byte arrays. When we create a schema we can also specify data types, but this is not required; by default Cassandra treats everything as a byte array. The data type of a row key or a column value is called a validator, and the data type of a column name is called a comparator. The comparator also determines the order in which columns are sorted.
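The effect of a comparator can be illustrated with a short sketch. Column names are stored as bytes, and the comparator decides how those bytes are ordered. The two functions below are simple stand-ins written for this example (loosely modelled on comparing raw bytes versus comparing 64-bit integers); they are not Cassandra's own classes.

```python
import struct


def long_name(n):
    """Encode an integer column name as 8 big-endian bytes."""
    return struct.pack(">q", n)


def bytes_comparator(name):
    return name  # order by the raw bytes


def long_comparator(name):
    return struct.unpack(">q", name)[0]  # order by the signed integer value


names = [long_name(n) for n in (10, -1, 2)]

# Decode after sorting so the two orders are easy to compare.
by_bytes = [long_comparator(n) for n in sorted(names, key=bytes_comparator)]
by_long = [long_comparator(n) for n in sorted(names, key=long_comparator)]
print(by_bytes)
print(by_long)
```

The same stored bytes come back in a different column order depending on the comparator, which is why the comparator is chosen when the schema is created.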

In the coming sections we will be discussing various operations on Apache Cassandra with suitable examples.


6 thoughts on “Apache Cassandra Tutorial”

dakshina, January 22, 2014

Excellent tutorial. Can you talk about how to set up a cluster?
- dakshina, January 22, 2014
  
  Found some useful info at
  
  http://www.datastax.com/2012/01/how-to-set-up-and-monitor-a-multi-node-cassandra-cluster-on-linux
  
  It uses Python and Linux boxes… I would prefer to see a Java-based setup/monitoring.
jkrishna, February 12, 2015

Excellent tutorial.
Neha, February 17, 2015

Could you please elaborate on the reading and writing process and include information about shards, racks, routing etc.?
Carlo, April 20, 2015

There are also a lot of interesting tutorials and training videos on Parleys.com
- Bijoy (post author), May 15, 2015
  
  Hello Carlo. It looks very good. Thank you for sharing the link.
