What is Cassandra?
Apache Cassandra is one of the most widely used open source NoSQL distributed database management systems. It is designed to store and manage large volumes of data with no single point of failure. Originally developed at Facebook, Cassandra is written in Java, scales well for Big Data workloads, and supports flexible schemas.
What are the advantages of Cassandra?
Advantages of Cassandra:
- Cassandra was designed to handle big data workloads across multiple nodes with no single point of failure.
- It delivers near real-time performance, which simplifies the work of developers, administrators, data analysts, and software engineers.
- It scales horizontally and can be scaled up or down as requirements change.
- It is fault tolerant and offers tunable consistency.
- It is a column-oriented (wide-column) database.
- In many deployments there is no need for a separate caching layer.
- It has a flexible schema design.
- It has flexible data storage, easy data distribution, and fast writes.
- It does not provide full ACID (Atomicity, Consistency, Isolation, and Durability) transactions; instead it offers atomic, isolated, and durable writes at the row level with tunable consistency.
- It is multi-data center and cloud capable.
How does Cassandra store data?
Cassandra stores all data as bytes. When you specify a validator, Cassandra checks that those bytes are encoded as required, and a comparator then orders the columns according to the ordering specific to that encoding.
What are the main components of the Cassandra data model?
The main components of the Cassandra data model are:
- Cluster
- Keyspace
- Column
- Column Family
What are the other components of Cassandra?
Some other components of Cassandra are:
- Node
- Data Center
- Commit log
- Mem-table
- SSTable
- Bloom Filter
What is a keyspace in Cassandra?
In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.
What is the syntax to create a keyspace in Cassandra?
The syntax for creating a keyspace in Cassandra is:
CREATE KEYSPACE <identifier> WITH <properties>
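As a concrete illustration of the syntax above, the following Python sketch builds one complete statement. The helper name `create_keyspace_cql`, the keyspace name `demo_ks`, and the choice of `SimpleStrategy` with a replication factor of 3 are assumptions made for this example; the source only gives the general form.

```python
def create_keyspace_cql(name: str, replication_factor: int) -> str:
    """Return a CREATE KEYSPACE statement using SimpleStrategy (hypothetical helper)."""
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


# 'demo_ks' and the factor 3 are illustrative values, not taken from the source.
statement = create_keyspace_cql("demo_ks", 3)
print(statement)
```

The resulting string could be pasted into cqlsh; `SimpleStrategy` is generally suited to single data center setups.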
What are the values stored in a Cassandra Column?
A Cassandra Column holds three values:
- Column Name
- Value
- Time Stamp
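The timestamp matters because Cassandra uses it to decide which version of a column wins when two writes conflict. The sketch below models that idea in Python; the `Column` class and `reconcile` function are hypothetical names for illustration, and tie handling is simplified compared with the real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Keep the version with the newer timestamp (last write wins).

    Simplification: on equal timestamps this sketch just keeps `a`.
    """
    return a if a.timestamp >= b.timestamp else b


old = Column("email", "a@example.com", timestamp=1000)
new = Column("email", "b@example.com", timestamp=2000)
latest = reconcile(old, new)
print(latest.value)
```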
What is a column family in Cassandra?
In Cassandra, a collection of rows is referred to as a "column family". In CQL, a column family is called a table.
How does Cassandra perform a write?
Cassandra performs a write in two steps:
- The write is first appended to the commit log on disk and then applied to an in-memory structure known as the memtable.
- When both steps succeed, the write is considered successful.
- When the memtable fills up, its contents are flushed to disk as an SSTable (sorted string table).
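The write path can be sketched as a toy model in Python. The class `TinyStore`, its `memtable_limit` parameter, and the use of plain lists and dicts in place of on-disk files are all assumptions for illustration; a real node also recycles commit log segments after a flush, which this sketch omits.

```python
class TinyStore:
    """Toy model of the write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory writes, keyed by row key
        self.sstables = []     # flushed, immutable, key-sorted snapshots
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. durable append first
        self.memtable[key] = value            # 2. then the in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Sort by key and freeze the contents as an immutable "SSTable".
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = TinyStore(memtable_limit=2)
store.write("b", "2")
store.write("a", "1")  # memtable reaches its limit and is flushed
store.write("c", "3")  # lands in a fresh memtable
```

Appending to the log before touching memory is what allows the in-memory state to be rebuilt after a crash.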
What is a memtable?
A memtable is an in-memory, write-back cache that holds content in key and column format. Data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. It stores writes until it is full and is then flushed to disk.
What are the management tools in Cassandra?
DataStax OpsCenter: A web-based management and monitoring solution for Cassandra clusters and DataStax Enterprise. It is free to download, and an additional edition of OpsCenter is also available.
SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other Big Data platforms.
What is Cassandra-Cqlsh?
Cassandra-Cqlsh (cqlsh) is a command-line shell that lets users communicate with the database using the Cassandra Query Language (CQL). With cqlsh, you can do the following:
- Define a schema
- Insert data
- Execute a query
What are the main features of SPM in Cassandra?
The main features of SPM are:
- Correlation of events and metrics
- Distributed transaction tracing
- Creating real-time graphs with zooming
- Detection and heartbeat alerting
What is a cluster in Cassandra?
In Cassandra, a cluster is the outermost container for keyspaces. It arranges nodes in a ring and assigns data to them. The data on each node is replicated to other nodes, which take over if that node fails.
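A minimal Python sketch of the ring idea follows. The `Ring` class, the node names, and the use of MD5 to place keys are assumptions for illustration only; Cassandra's default partitioner and its virtual nodes work differently in detail.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Find the first node at or after the key's token, then walk clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3"], replication_factor=2)
owners = ring.replicas("user:42")
print(owners)  # two distinct nodes; which ones depends on the hash values
```

Because every key maps to more than one node, another replica can serve the data when the first owner is unavailable.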
What are the differences between a node, a cluster, and datacenter in Cassandra?
Node: A node is a single machine running Cassandra.
Cluster: A cluster is a collection of nodes that together hold related data.
Datacenter: A datacenter is a grouping of nodes within a cluster, useful when serving customers in different geographical areas. The nodes of one cluster can be grouped into different datacenters.
What is a Cassandra-CQL collection?
A Cassandra-CQL collection is used to store multiple values in a single column. CQL provides the following collection types:
List: A list is used when the order of the data must be maintained and a value may be stored multiple times (it can hold repeating elements).
SET: A set is used for a group of elements that are stored and returned in sorted order (it holds unique elements).
MAP: A map is a data type used to store key-value pairs of elements.
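The behavior of the three types is close to Python's built-in `list`, `set`, and `dict`, so a short Python analogy may help. The variable names and sample values are made up for illustration, and this is an analogy rather than CQL itself.

```python
# Python analogues of the three CQL collection types (illustration only).
emails = ["a@example.com", "b@example.com", "a@example.com"]  # list: ordered, duplicates kept
tags = {"db", "nosql", "db"}                                   # set: duplicates collapse
phones = {"home": "555-0100", "work": "555-0199"}              # map: key -> value

print(emails)
print(sorted(tags))    # CQL returns set elements in sorted order
print(phones["work"])
```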
What is the use of Bloom Filter in Cassandra?
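A Bloom filter is a space-efficient, probabilistic data structure used to check whether an SSTable may contain data for a particular row. It can report that a row is definitely absent or possibly present. Cassandra uses Bloom filters to save I/O during key lookups by skipping SSTables that cannot contain the key.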
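To make the idea concrete, here is a small Bloom filter in Python. The class, the bit-array size, the number of hashes, and the use of salted SHA-256 are assumptions chosen for readability; Cassandra's own implementation differs.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.add("row-1")
bf.add("row-2")
print(bf.might_contain("row-1"))
```

A `False` answer lets a read skip the SSTable entirely, which is where the I/O saving comes from; a `True` answer still requires checking the table, since false positives are possible.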
How does Cassandra delete data?
SSTables are immutable, so a row cannot be removed from them directly. When a row needs to be deleted, Cassandra writes a special marker called a tombstone in place of the column value. When the data is read, anything marked with a tombstone is treated as deleted.
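The following Python sketch shows how a delete can work without modifying existing tables. The `TOMBSTONE` sentinel, the `read` function, and the use of dicts as SSTables are hypothetical; real Cassandra reconciles versions by timestamp rather than by table order.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker


def read(key, sstables):
    """Return the newest value for key, or None if it is absent or deleted."""
    for table in reversed(sstables):  # newest table first
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None


# The older table still holds the data; a newer table records the delete.
sstables = [
    {"user:1": "alice", "user:2": "bob"},
    {"user:1": TOMBSTONE},
]

print(read("user:1", sstables))
print(read("user:2", sstables))
```

Note that the original value is still physically present in the older table; only the read path hides it.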
What does an SSTable consist of?
An SSTable consists mainly of two files:
- Index file (Bloom filter & Key offset pairs)
- Data file (Actual column data)
What is a SuperColumn in Cassandra?
In Cassandra, a SuperColumn is an element that groups a collection of related columns. It is a key-value pair whose value is a set of columns.
What is Replication Factor in Cassandra?
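The replication factor is the number of copies of each piece of data stored in the cluster. When authentication is enabled, it is important to increase the replication factor of the system_auth keyspace so that users can still log into the cluster if a node goes down.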
What is Thrift?
Thrift is a legacy RPC protocol and API combined with a code generation tool. Cassandra used Thrift to provide access to the database from many programming languages; it has since been superseded by CQL.
What is the difference between a Column and a Super Column?
Both elements work on the principle of a tuple with a name and a value. However, a Column's value is a string, while a Super Column's value is a map of Columns with different data types.
Unlike Columns, Super Columns do not carry the third component, the timestamp.