Using Group By in Apache Cassandra

This tutorial is going to cover how to using Group By in Apache Cassandra. This feature is supported from Apache Cassandra 3.10.

1. Prerequisite

Apache Cassandra version 3.10 or newer (See how to install Cassandra on Ubuntu 16.04)

2. Using Group By in Apache Cassandra

2.1. Syntax

From Apache Cassandra 3.10, it is possible for us to group either at the partition level or at the clustering column level. The general syntax can be described as the following:

SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable 
                                             GROUP BY partitionKey, clustering0, clustering1;

SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;

SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable

GROUP BY partitionKey, clustering0, clustering1;

2.2. Group By Example

Firstly, let’s create a Cassandra keyspace for our example:

CREATE KEYSPACE smart_home
     WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

1 2	CREATE KEYSPACE smart_home WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Secondly, let’s see a table in the time series data model examples to store temperature data collected from a weather station.

CREATE TABLE temperature_by_day (
   weatherstation_id text,
   date text,
   event_time timestamp,
   temperature float,
   PRIMARY KEY ((weatherstation_id,date),event_time)
);

CREATE TABLE temperature_by_day (

weatherstation_id text,

date text,

event_time timestamp,

temperature float,

PRIMARY KEY ((weatherstation_id,date),event_time)

);

The partition key includes (weatherstation_id, date) and the clustering key is event_time. For this table, we want to store data in row per day, per weather station.

Next, let’s insert some rows into the table:

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:01:00',73);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:02:00',70);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:01:00',73);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:02:00',74);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)

VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:01:00',73);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)

VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:02:00',70);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)

VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:01:00',73);

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)

VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:02:00',74);

Lastly, let’s practice using Group By in Apache Cassandra by executing a query such as:

SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day 
                                 GROUP BY weatherstation_id, date;

1 2	SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day GROUP BY weatherstation_id, date;

The output is as following:

Group By in Apache Cassandra (Partition Key)

In the above query we have used group by with the partition key (weatherstation_id, date), we can use the group by in Cassandra with both partition key and clustering key. Let’s execute the following command on the CQLSH:

SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day 
                               GROUP BY weatherstation_id, date, event_time;

1 2	SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day GROUP BY weatherstation_id, date, event_time;

The output is as below:

Group By in Apache Cassandra (Clustering Key)

3. Conclusion

The tutorial has just illustrated about using Group By in Apache Cassandra. Currently, Apache Cassandra 3.10 only supports group by with Partition Key or Partition Key and Clustering Key.

3 2 votes

Article Rating

Using Group By in Apache Cassandra

1. Prerequisite

2. Using Group By in Apache Cassandra

2.1. Syntax

2.2. Group By Example

3. Conclusion

howtoprogram