This tutorial is going to cover how to using Group By in Apache Cassandra. This feature is supported from Apache Cassandra 3.10.
1. Prerequisite
Apache Cassandra version 3.10 or newer (See how to install Cassandra on Ubuntu 16.04)
2. Using Group By in Apache Cassandra
2.1. Syntax
From Apache Cassandra 3.10, it is possible for us to group either at the partition level or at the clustering column level. The general syntax can be described as the following:
1 2 3 |
SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1; |
2.2. Group By Example
Firstly, let’s create a Cassandra keyspace for our example:
1 2 |
CREATE KEYSPACE smart_home WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; |
Secondly, let’s see a table in the time series data model examples to store temperature data collected from a weather station.
1 2 3 4 5 6 7 |
CREATE TABLE temperature_by_day ( weatherstation_id text, date text, event_time timestamp, temperature float, PRIMARY KEY ((weatherstation_id,date),event_time) ); |
The partition key includes (weatherstation_id, date) and the clustering key is event_time. For this table, we want to store data in row per day, per weather station.
Next, let’s insert some rows into the table:
1 2 3 4 5 6 7 8 9 10 11 |
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:01:00',73); INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234WXYZ','2016-04-03','2016-04-03 07:02:00',70); INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:01:00',73); INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234WXYZ','2016-04-04','2016-04-04 07:02:00',74); |
Lastly, let’s practice using Group By in Apache Cassandra by executing a query such as:
1 2 |
SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day GROUP BY weatherstation_id, date; |
The output is as following:
In the above query we have used group by with the partition key (weatherstation_id, date), we can use the group by in Cassandra with both partition key and clustering key. Let’s execute the following command on the CQLSH:
1 2 |
SELECT weatherstation_id, date, MAX(temperature) FROM temperature_by_day GROUP BY weatherstation_id, date, event_time; |
The output is as below:
3. Conclusion
The tutorial has just illustrated about using Group By in Apache Cassandra. Currently, Apache Cassandra 3.10 only supports group by with Partition Key or Partition Key and Clustering Key.