Apache Cassandra Tutorial

Apache Cassandra is a key/value type NoSQL (A NoSQL database is a database system which provides storage and retrieval of data  that uses looser consistency models .When comparing with traditional relational databases , NoSQL is more suitable for storing large volume of data .Cassandra database was  developed by facebook. Now the development  of Cassandra is under Apache Foundation.As of now , Cassandra is used by many enterprises around the globe. More and more customers are accepting Cassandra for their enterprise  application  back end.This chapter gives a basic Apache Cassandra Tutorial

Apache Cassandra vs Traditional Databases

A brief comparison is shown below.

1)Big data is the data which exceeds the capacity of traditional relational databases. Cassandra is the suitable database system for Big data applications.

2)Read/Write performance is higher for Cassandra

3)Chances of failure is very very minimal in Cassandra when comparing with relational databases.This is attained by replication of same data in multiple data centers in a cluster.

4)In case of Cassandra ,every node  in a cluster has the same role. There is no master/slave mechanism in a cluster.So every node can service each request.

5)Cassandra has  Key/value type  architecture.It is entirely different from architecture of relational databases.

Apache Cassandra Architecture

Apache Cassandra violates the legacy master/slave architecture. In case of Cassandra  each node in a cluster has the same responsibility. A cluster can have  number of nodes in it. Each piece of  data in a node is replicated in some other node. So chance for a failure is very minimal.

Cassandra Data Model

Column

A column is the smallest increment of data in Apache Cassandra. It has a column name , value and a time stamp.

column

Column Family

column family

A Column family is a unit in which data is stored in Cassandra  database. Each column family has a unique id known as row key.There will be multiple columns against a row key in a column family.Each column has its own column name and value.

Super Column

A super column has a key and a number of sub columns as values.There can be any number of sub columns in a super column.The columns will be in the sorted order of column names.

supcolumn3

Super Column family

A super column family is almost similar to a column family.The difference is instead of columns here it is super columns.There can be any number of super columns in a super column family.Each super column can accommodate any number of columns.The super columns are arranged in sorted order of super column  names.

keyspace

It is the container for our data.It is similar to schema in Relational Databases.

Write Process in Cassandra

1)When data is writing to Cassandra , first the data is going to a  commit log.

2)Sending data to responsible nodes . Corresponding node is writing the data to memtable.

3)If the responsible node is down at the time when write attempt is happening, then data will be simply written to another node .It holds the data.Once the intended node comes up , then it is updating from the node which holds  the data temporarily.

4)From the memtable  data will be written to sstable.

Read Process in  Cassandra

Reading of data is taking place in parallel across all nodes in a cluster. If the node with requested data is down then the data will be read from the node  which is holding replica of the data required.

Cassandra Query Language(CQL)

The CQL is now widely used in cassandra client applications. It is simply providing an SQL like alternative to query the Cassandra database. The important keywords in CQL are :

SELECT,UPDATE ,INSERT ,TRUNCATE ,DROP etc.

Also there are few special statements in CQL. They are :

1)CREATE KEYSPACE

2)CRETE COLUMNFAMILY

3)CREATE INDEX

We will be using these things in our coming discussions

Data Types in Apache Cassandra

Internally Cassandra stores  column names and values as hex byte array.When we create schema , we can provide the data type too.But it is not required.Cassandra  treats everything as  hex byte array.The data type of row key is called as validator and of column name is comparator.

In the coming sections we will be discussing various  operations on Apache Cassandra with suitable examples.

See Related Topics

Configuring Apache Cassandra in local machine

Inserting data into Apache Cassandra using Java

Reading data from Apache Cassandra database using Java

Listing columns in a column family using Java

Inserting objects into Apache Cassandra using Hector API

Reading object data from Apache Cassandra using Hector API

Deleting a column from Apache Cassandra using Hector API

6 thoughts on “Apache Cassandra Tutorial

  1. Neha Reply

    Could you please elaborate the reading and writing process and include the information about shards,racks, routing etc.

  2. Carlo Reply

    There are also a lof of interesting tutorials and training videos on Parleys.com

    • Bijoy Post authorReply

      Hello Carlo.It looks very good.Thank you for sharing the link.

Leave a Reply

Your email address will not be published. Required fields are marked *