This tutorial covers how to use Group By in Apache Cassandra, a feature supported since Apache Cassandra 3.10.

1. Prerequisite

Apache Cassandra 3.10 or newer (see how to install Cassandra on Ubuntu 16.04).
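To confirm that a node meets this requirement, one simple check is to read the version from the node's local system table in cqlsh. `system.local` and its `release_version` column are part of the standard system keyspace in Cassandra 3.x.

```cql
-- Returns the Cassandra release version of the node cqlsh is connected to
SELECT release_version FROM system.local;
```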

2. Using Group By in Apache Cassandra

2.1. Syntax

Since Apache Cassandra 3.10, rows can be grouped either at the partition key level or at the clustering column level. The general syntax is as follows:
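The block below is a simplified sketch of the syntax, not the complete CQL grammar; the angle-bracket placeholders and square-bracketed optional parts are illustrative. According to the CQL documentation, GROUP BY accepts only primary key column names, in primary key order, and a primary key column that is restricted by an equality condition in the WHERE clause does not need to appear in GROUP BY. Aggregates such as `count`, `min`, `max`, `sum` and `avg` then produce one value per group.

```cql
SELECT <selectors>                    -- grouping columns plus aggregates, e.g. count(*), max(col)
FROM <keyspace>.<table>
[WHERE <conditions>]                  -- optional
GROUP BY <partition_key_columns> [, <clustering_columns>];  -- primary key order
```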

2.2. Group By Example

Firstly, let’s create a Cassandra keyspace for our example:
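A minimal keyspace for a single-node test setup might look like this. The keyspace name `weather_ks` is a hypothetical choice, and `SimpleStrategy` with a replication factor of 1 is only appropriate for local experiments, not for a multi-node production cluster.

```cql
-- Hypothetical keyspace for this example (single local node only)
CREATE KEYSPACE IF NOT EXISTS weather_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Make it the default keyspace for the rest of the cqlsh session
USE weather_ks;
```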

Secondly, let’s take a table from the time series data modeling examples that stores temperature data collected from weather stations.

The partition key is the composite (weatherstation_id, date), and the clustering column is event_time. With this design, each partition holds one day of readings for one weather station, ordered by event_time.
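Based on that description, the table could be defined roughly as follows. Only the key columns come from the text above; the table name `temperature_by_day` and the `temperature` column (declared as `int` here so that numeric aggregates apply) are assumptions for illustration.

```cql
-- Hypothetical table: one partition per weather station per day
CREATE TABLE IF NOT EXISTS temperature_by_day (
    weatherstation_id text,
    date text,               -- day of the reading, e.g. '2017-04-01'
    event_time timestamp,    -- clustering column: orders readings within a partition
    temperature int,         -- assumed numeric column for aggregation
    PRIMARY KEY ((weatherstation_id, date), event_time)
);
```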

Next, let’s insert some rows into the table:
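The rows below are made-up sample values for the hypothetical `temperature_by_day` table: two stations and two days, so that a grouped query has more than one group to report.

```cql
-- Station 1234ABCD, first day (sample values)
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2017-04-01', '2017-04-01 07:01:00+0000', 22);
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2017-04-01', '2017-04-01 07:02:00+0000', 24);
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2017-04-01', '2017-04-01 07:03:00+0000', 23);

-- Station 1234ABCD, second day (sample values)
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2017-04-02', '2017-04-02 07:01:00+0000', 19);
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2017-04-02', '2017-04-02 07:02:00+0000', 21);

-- Station 5678EFGH, first day (sample values)
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('5678EFGH', '2017-04-01', '2017-04-01 07:01:00+0000', 30);
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('5678EFGH', '2017-04-01', '2017-04-01 07:02:00+0000', 28);
```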

Lastly, let’s practice using Group By in Apache Cassandra by executing a query such as:
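Against the hypothetical `temperature_by_day` table, a partition-level query might look like this. It groups on the full partition key, so each result row summarizes one station on one day using the built-in `count`, `min` and `max` aggregates.

```cql
-- One result row per (weatherstation_id, date) partition
SELECT weatherstation_id, date,
       count(*)         AS readings,
       min(temperature) AS min_temp,
       max(temperature) AS max_temp
FROM temperature_by_day
GROUP BY weatherstation_id, date;
```

Because there is no WHERE clause, this reads every partition in the table, which is fine for a small demo but can be expensive on a large table.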

The output is as follows:

Figure: Group By in Apache Cassandra (Partition Key)

The query above groups by the partition key (weatherstation_id, date). We can also group by the partition key together with the clustering key. Let’s execute the following command in cqlsh:
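A clustering-level query against the hypothetical `temperature_by_day` table could look like the one below: the partition key columns come first, followed by the clustering column, matching the primary key order. Note that event_time is the only clustering column in this schema and is unique within a partition, so each (weatherstation_id, date, event_time) group contains exactly one row. Clustering-level grouping becomes more useful on tables with several clustering columns, where you group by a prefix of them.

```cql
-- Groups by the full primary key: partition key columns, then the clustering column
SELECT weatherstation_id, date, event_time,
       max(temperature) AS max_temp
FROM temperature_by_day
GROUP BY weatherstation_id, date, event_time;
```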

The output is as follows:

Figure: Group By in Apache Cassandra (Clustering Key)

3. Conclusion

This tutorial has illustrated how to use Group By in Apache Cassandra. As of Apache Cassandra 3.10, Group By supports grouping only by the partition key, or by the partition key together with clustering columns.
