Cassandra at scale and beyond

It was quite warm and comfy inside the belly of the beast

RDBMS operations are tricky at both small and large scales, especially long term. So let’s clear things up a bit for wide-column databases and their datasets.

Is it any good?

Well, yes. While there are simply no real long-term reliability guarantees, Cassandra can actually be pretty useful for some purposes. Let’s set aside some common biases and focus on the good stuff for now.

Good stuff

Actor persistence

Cassandra, as a wide-column store, is great for actor-model persistence. It can be a huge game-changer when used with Akka Projections and event-sourced behaviors under Cluster Sharding. In this setup, Cassandra acts as a journal for incoming events and a store for pending actor behavior snapshots.


Cassandra schema development is a bit tricky. It requires gathering all the application queries that will be performed beforehand, in order to decide how the clustering key columns should actually be sorted.

Chebotko diagram example
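To make the query-first modeling concrete, here’s a minimal Python sketch of why clustering order matters. The sensor-readings table and its `(day DESC, ts DESC)` clustering order are hypothetical examples, not from the article; the point is that rows inside a partition are kept sorted at write time, so a “latest N” query is a cheap prefix scan with no sort at read time:

```python
import bisect

# One partition, modeled as a list kept sorted on insert.
# Hypothetical table: PRIMARY KEY ((sensor_id), day, ts)
#                     WITH CLUSTERING ORDER BY (day DESC, ts DESC)
class Partition:
    def __init__(self):
        self._rows = []  # always sorted by clustering key

    def insert(self, day, ts, value):
        # Negate the components to emulate DESC clustering order.
        bisect.insort(self._rows, ((-day, -ts), value))

    def latest(self, n):
        # "Latest N readings" is just a prefix of the sorted partition.
        return [v for _, v in self._rows[:n]]

p = Partition()
p.insert(day=1, ts=10, value="a")
p.insert(day=2, ts=5, value="b")
p.insert(day=2, ts=7, value="c")
print(p.latest(2))  # ['c', 'b'] -- newest day first, newest ts within it
```

If the application later needs the same data sorted a different way, you generally need another table (or materialized view) with a different clustering key, which is exactly why all queries have to be known up front.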


AWS Keyspaces is a managed Cassandra service with good enough performance and low operational requirements. It’s a decent choice when you want to try Cassandra but don’t want to get into all the bugs, lags, corruption, sleepless nights, and other nasty things. Keyspaces is not a silver bullet: there are a lot of limitations. But you get it without the compaction, repair, and backup hassle, and it comes with pretty decent point-in-time recovery.

Plan B

Well, there’s always ScyllaDB, which is an exceptional replacement for Cassandra with a much better developed lifecycle; there’s even a decent k8s operator. After some struggle, you might also find that Scylla covers most of Cassandra’s design flaws and plain bugs. And there are a lot of bugs present.

Not great stuff


First and foremost, “schema-less” doesn’t mean that changing the schema will be free. An additional DB accessor abstraction can help mitigate the added operational costs during migrations.
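As one possible shape of that abstraction, here’s a minimal Python sketch. The `UserStore` class and the `name` → `full_name` column rename are invented for illustration, and a plain dict stands in for a real Cassandra session; the idea is that all reads funnel through one accessor, so application code never sees the migration while old and new rows coexist:

```python
# Hypothetical accessor hiding an in-flight column rename (name -> full_name).
class UserStore:
    """Single read path: the migration fallback lives in one place only."""

    def __init__(self, rows):
        self._rows = rows  # dict standing in for a real session/driver

    def full_name(self, user_id):
        row = self._rows[user_id]
        # During the migration, fall back to the legacy column.
        return row.get("full_name") or row.get("name")

store = UserStore({
    1: {"name": "Ada"},            # pre-migration row
    2: {"full_name": "Grace H."},  # post-migration row
})
print(store.full_name(1))  # Ada
print(store.full_name(2))  # Grace H.
```

Once a backfill has rewritten every row, the fallback branch can be deleted in this one class instead of being hunted down across the codebase.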

Flaky Eventual consistency

You’ll need at least 5 replicas of your production data to keep it actually available for more or less decent operation. So when we’re talking about time-series workloads or just plain old analytical data, ask yourself: can your company or clients really afford keeping 5 copies of it?

Data kinds

Understanding what kinds of data are being stored and for what purpose they’re being indexed is essential — all your data has some form of deprecation, and managing data deprecation is crucial for ensuring database performance and proper resource utilization.


There are two major mechanisms for data backup in both Cassandra and ScyllaDB: incremental backups and snapshots. The difference is straightforward: incremental backups are taken between table flushes and can be deleted safely after the actual flush, while a snapshot is a full copy of the table. Snapshots are usually created before compaction, because compaction can fail, which essentially doubles the disk space requirements. Using both snapshots and incremental backups helps reduce the disk space requirements, but most of the time they’re simply ignored and deleted if there have been no major issues for the last ten days or so (the window during which hinted handoffs are not discarded).


Well, there are a few approaches to streaming data out of Cassandra, but none is actually reliable enough:
• trigger-based streaming — you have no control over the internal thread pools, and a trigger implementation may cause performance loss
• Change Data Capture (CDC) streaming — in simple words, you can stream a copy of all your flushed commit log entries, but there’s the added hassle of merging and compacting commit logs manually. If the consumer is too slow, Cassandra refuses to accept new writes.
• incremental backups streaming — it could work just great, except it streams data that is not quite up to date, and there’s potential data loss. That’s the approach Netflix has actually been using for quite a while.
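The third approach can be sketched with a watermark over the backup directory. The `BackupShipper` class and file names below are hypothetical; the sketch also shows where the staleness comes from: data still sitting in the memtable or commit log has not landed in any backup file yet, so the shipper simply cannot see it:

```python
import os
import tempfile

# Hypothetical incremental-backup shipper: remember which SSTable backup
# files were already sent (the watermark set) and stream only new ones.
class BackupShipper:
    def __init__(self, backup_dir):
        self.backup_dir = backup_dir
        self.shipped = set()

    def poll(self):
        """Return backup files that appeared since the last poll."""
        new = sorted(set(os.listdir(self.backup_dir)) - self.shipped)
        self.shipped.update(new)
        return new

with tempfile.TemporaryDirectory() as d:
    shipper = BackupShipper(d)
    open(os.path.join(d, "md-1-big-Data.db"), "w").close()
    print(shipper.poll())  # ['md-1-big-Data.db']
    print(shipper.poll())  # [] -- nothing new since last pass
    open(os.path.join(d, "md-2-big-Data.db"), "w").close()
    print(shipper.poll())  # ['md-2-big-Data.db']
```

Anything written after the last flush is invisible to this loop until the next flush, which is exactly the “not quite up to date” window, and a node lost before flushing loses that window entirely.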


As you might already know, Cassandra uses an append-only storage structure called a Log-Structured Merge tree (LSM-tree), plus an on-disk commit log.
All writes hit the commit log on disk first, before being stored and processed in memory. The downside is that the commit log should live on the drive with the fastest IOPS available, ideally a provisioned-IOPS device. The other caveat is that the LSM-tree is append-only, so it needs to be compacted after accumulating too many deletes.
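The whole write path fits in a toy Python sketch. This is a deliberately simplified model, not Cassandra’s actual implementation: deletes become tombstone markers, flushes turn the memtable into an immutable SSTable, and only compaction merges the SSTables and drops tombstones, which is why disk space is reclaimed so late:

```python
# Toy LSM write path: commit log -> memtable -> SSTables -> compaction.
TOMBSTONE = object()  # deletes are written as markers, not applied in place

class ToyLSM:
    def __init__(self):
        self.commit_log = []  # append-only; would live on the fastest disk
        self.memtable = {}
        self.sstables = []    # immutable flushed tables, newest last

    def put(self, k, v):
        self.commit_log.append((k, v))  # durability first
        self.memtable[k] = v            # then the in-memory view

    def delete(self, k):
        self.put(k, TOMBSTONE)

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed data now lives in an SSTable

    def get(self, k):
        # Newest data wins: memtable, then SSTables newest-to-oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if k in table:
                v = table[k]
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest to newest
            merged.update(table)
        # Dropping tombstones here is where disk space is finally reclaimed.
        self.sstables = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]

db = ToyLSM()
db.put("a", 1)
db.put("b", 2)
db.flush()
db.delete("a")
db.flush()
print(db.get("a"))                       # None -- masked by the tombstone
print(sum(len(t) for t in db.sstables))  # 3 entries incl. the tombstone
db.compact()
print(sum(len(t) for t in db.sstables))  # 1 entry after compaction
```

Note that between the delete and the compaction the cluster stores three entries for two logical keys; at scale that gap is the reason for the 50% free-disk rule below.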


The rule of thumb: “In case of doubt and fear of consistency loss, run full repairs.” Repairs are fairly frequent and, in relational DB performance terms, are comparable to full cluster scans. Even though an incremental repair mechanism exists, running full cluster repairs node by node after restoring from a snapshot, or after bootstrapping a replacement node, is a MUST. Cassandra may lose its consistency by itself due to multiple, relatively unpredictable factors. If a full repair fails, it’s more convenient to recover the failed node from snapshots than to run repairs once again, due to the excessive cluster gossiping and network overhead.

To sum it up

This all essentially means that
1) You have to keep 50% of your disk space free, always
2) You have to keep 50% of your IOPS in reserve, always
3) You have to be able to double the networking bandwidth on demand
4) You have to be prepared that during repairs at least one node will become inaccessible, and you may lose hinted handoffs, which would require an additional repair run
5) You have to be prepared to allocate a mirror cluster from scratch in case of major schema migrations or Hadoop integration
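Translated into back-of-envelope capacity math, holding 50% of disk and IOPS in reserve means provisioning roughly double the steady-state need. The per-node numbers below are purely illustrative:

```python
# Capacity headroom math for the rules above: with a fraction of each
# resource held in reserve, provisioned = needed / (1 - reserve).
def provisioned(needed, reserve_fraction=0.5):
    """Capacity to buy so steady-state usage stays within (1 - reserve)."""
    return needed / (1 - reserve_fraction)

data_tb = 4          # illustrative live data per node, TB
steady_iops = 8000   # illustrative steady-state IOPS per node

print(provisioned(data_tb))      # 8.0  TB of disk per node
print(provisioned(steady_iops))  # 16000.0 IOPS per node
```

The same formula shows why the rules compound: a cluster sized for its data alone has no room left for a compaction, a repair, and a snapshot landing in the same week.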


Surely, you can “just get things done” with Cassandra, and it will work perfectly fine, provided you control the growth of your cluster and fine-tune most of the stock parameters.


