Databases are software, but they depend on hardware. Although the database itself is software code that has been developed and deployed to classify, corral, control and subsequently manage various types of data, the foundation of any database is hardware. This is because databases generally sit on servers, which are basically just pieces of hardware.
A database server sits in a box in an organization’s own on-premises data room or datacenter. Alternatively, it sits on a ‘blade’ in a datacenter’s larger-scale installation that customers tap into for on-demand cloud computing services. Optionally, the database straddles the on-premises and public cloud datacenter worlds, in a hybrid combination of the two.
The point is, whatever form, shape and type of database we choose to use, the software that drives it fundamentally depends upon a chunk of hardware to exist, wherever that may be.
How hardware waste happens
If we accept all the above home truths to be so, then we might also assume that database software should be intelligent enough to know what data to put where, when and why, right? Yes, obviously, database software is smart, that’s why there’s so much of it. But with so many tasks to perform… and with so much data to serve… and with so many different ways of building database query applications, not every database tidies its shoe cupboard (i.e. stores its data) and makes use of the hardware that it runs on as efficiently as others. Databases written in high-level languages (like Java) talk to the machine through a middle-man, so they necessarily leave a certain amount of server performance on the table – and that is arguably ‘wasted’ in the context of this discussion.
Israel-originated NoSQL database company ScyllaDB thinks that it has a way to build databases that can be tidier, faster, better managed and self-optimizing. The company advocates a close-to-the-metal approach for its Scylla database, but what does this mean in simple terms?
In the world of databases, close-to-the-metal refers to database software that has an intimate knowledge of the hardware it runs on (the hardware RAM addresses and wider ‘instruction set’, if you want to get technical). The ‘intimacy’ of close-to-the-metal means that the database can squeeze more power out of the server hardware that it runs on. The trade-off is less flexibility and a degree of ‘lock-in’ to the hardware in question, but this precision engineering does deliver greater speed. So how much speed?
Millions and billions
ScyllaDB CEO Dor Laor explains that the Scylla database can perform millions of Operations Per Second (OPS) on a single node (a node can mean a lot of things in computing, but in this case it’s a server). The company cites independent tests which show a cluster of Scylla database servers reading 1 billion Rows Per Second (RPS). The net result here is more speed and more power from each server. This technology proposition won’t be the most efficient path for every use case (a smaller data set can use a traditional relational database), but for big data database workloads with huge numbers of data points, Scylla fits.
“For 99.9% of applications, Scylla delivers all the power a customer will ever need, on workloads that other NoSQL databases can’t touch – and at a fraction of the cost of an in-memory solution.”
CEO Laor asserts that Scylla is a good choice for high-throughput software applications (i.e. ones that channel a lot of data activity) and for database scenarios where the Service Level Agreement (SLA) dictates that it has to perform on a low-latency basis (i.e. with very little delay). Scylla is also good with high-density nodes, i.e. servers that are tightly packed with a huge amount of data.
Node sprawl is when you’re using database instances in several places to segment and allocate different information workloads in different areas. Higher-density nodes mean that there is lots of data in one place, which is convenient, but it also makes for a bigger chunk to chew at any given moment in time.
The company is now looking to provide what it calls ‘high-density support’, meaning older parts of data that are fragmented across different temporary areas of storage are moved to long-term storage nodes where they can reside more comfortably, more accurately… and for longer. This means being able to buy less overall short-term storage.
Comcast tunes into Scylla
Phil Zimich, Comcast senior director of software development and engineering, explains his company’s move from the Cassandra database to Scylla. The company uses its X1 Platform to drive firmware to devices to upgrade them for the next television and voice services that it wants to deliver. Comcast manages 31 million devices in 15 million households, all managed at an individual account level. There are 21 different web services in the Comcast X1 Scheduler that deliver recordings to users when they want them played. Users also need to be able to cancel or change recordings and get reminders… this all takes 25 million account calculations per day, so this is the sort of use case that Scylla was created for. As a result of its migration to Scylla, Comcast has gone from 962 nodes to 78 today.
For the Scylla database to be able to do what it does and ‘waste’ less hardware, it takes maximum advantage of all the processing (CPU) power and RAM available to it in the given computing environment. One additional technique here is the use of what is known as incremental compaction. This is the process by which data is updated and deleted so that an organization can take advantage of the most efficient method of storage. Different compaction processes run on different compaction strategies. But ultimately, an inefficient compaction process might see the database reserving as much as 50% of its space for this process, rather than using it more efficiently and dedicating it to actual storage.
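To see why that reserved space matters for hardware counts, here is a rough back-of-the-envelope sketch in Python. It is not Scylla code: the function names, the 4,000 GB disk size, the 100,000 GB data set and the 20% comparison figure are all hypothetical values chosen for illustration. Only the 50% figure comes from the discussion above.

```python
import math


def usable_capacity_gb(disk_gb: float, reserved_fraction: float) -> float:
    """Space left for live data once compaction headroom is set aside."""
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in the range [0, 1)")
    return disk_gb * (1.0 - reserved_fraction)


def nodes_needed(dataset_gb: float, disk_gb: float, reserved_fraction: float) -> int:
    """Whole number of servers needed to hold dataset_gb of live data."""
    per_node = usable_capacity_gb(disk_gb, reserved_fraction)
    return math.ceil(dataset_gb / per_node)


if __name__ == "__main__":
    # Hypothetical estate: 100,000 GB of data on servers with 4,000 GB disks.
    for reserved in (0.5, 0.2):  # 0.5 is the article's figure; 0.2 is illustrative only
        count = nodes_needed(100_000, 4_000, reserved)
        print(f"{reserved:.0%} reserved for compaction: {count} nodes")
```

Under these assumed numbers, halving each disk for compaction means buying noticeably more servers to store the same data, which is the kind of ‘waste’ in question.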
“It’s also important to remember that the more data you write to a database, the more ‘data debt’ you create in terms of the data that has to go into the queue for analytics and/or ultimate storage. So if this scenario leads to too much data sitting in the queue, then the database itself suffers a loss of performance. We know that different workloads will exist for the database, so the database has to be able to prioritize processing of more mission critical real-time processes over other less critical ones,” said Scylla CEO Laor.
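The prioritization Laor describes can be pictured with a simple priority queue. The Python sketch below is an illustration of the general idea only, not how Scylla’s own scheduler works; the `WorkloadQueue` class, the two priority levels and the task names are all hypothetical.

```python
import heapq
import itertools

# Hypothetical priority levels: lower numbers are served first.
REALTIME = 0
BACKGROUND = 1


class WorkloadQueue:
    """Serves the most critical pending task first, oldest first within a level."""

    def __init__(self) -> None:
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, priority: int, task: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), task))

    def next_task(self) -> str:
        _priority, _order, task = heapq.heappop(self._heap)
        return task

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: WorkloadQueue) -> list:
    """Pop every task and return them in the order they would be processed."""
    order = []
    while len(queue):
        order.append(queue.next_task())
    return order


if __name__ == "__main__":
    q = WorkloadQueue()
    q.submit(BACKGROUND, "analytics batch")
    q.submit(REALTIME, "user read")
    q.submit(BACKGROUND, "compaction")
    q.submit(REALTIME, "user write")
    print(drain(q))
```

In this toy version the real-time reads and writes jump ahead of the background work, even though the background tasks arrived first.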
So have we really been ‘wasteful’ in terms of our software’s use of hardware in the past? In some cases yes… and these will now show themselves more prevalently in data consumption cases where the need to elevate for scale (sometimes massive scale) comes to the fore. Think about your smart home heating, air-conditioning or security system with its sensors and in-home cameras. These types of apps typically offer one day of video playback snapshots for free, but users have to pay for 30 days or more. When users start to sign up for these services in huge numbers, then the real scale-up challenge occurs.
“Servers will fail. So if your software system is distributed over 365 server nodes, then let’s say that there is a failure potential of one node per day. If your software system is distributed over 10 nodes in a massively more dense environment, then the mean time between failure of each server is logically less, because the server estate is smaller. Failures do still happen, but with modern software tools to assist in backup and recovery, the whole process can be managed in a more tightly controlled environment,” said Laor.
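The arithmetic behind that example can be sketched in a few lines of Python. The key assumption, which is an inference from the quote rather than a stated figure, is that each server fails about once a year on average; that is what turns 365 nodes into roughly one failure per day. The function names are hypothetical.

```python
def expected_failures_per_day(nodes: int, failures_per_node_per_year: float = 1.0) -> float:
    """Average number of server failures per day across the whole estate."""
    return nodes * failures_per_node_per_year / 365.0


def days_between_failures(nodes: int, failures_per_node_per_year: float = 1.0) -> float:
    """Average gap, in days, between failures somewhere in the estate."""
    return 1.0 / expected_failures_per_day(nodes, failures_per_node_per_year)


if __name__ == "__main__":
    # Assumed failure rate: one failure per server per year.
    for nodes in (365, 10):
        per_day = expected_failures_per_day(nodes)
        gap = days_between_failures(nodes)
        print(f"{nodes} nodes: {per_day:.3f} failures/day, one every {gap:.1f} days")
```

On that reading, the smaller estate does not make any single server more reliable; there are simply fewer servers to go wrong, so the operations team deals with a failure far less often.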
In the past, we didn’t always think about working to optimise systems because another processor would come along and deliver such a large increase in processing power that it wasn’t worth the human developer time. Now, with Moore’s Law on its deathbed, it’s more important to think about creating more efficient, less wasteful software architectures.
Wake-up call for ALL-efficiencies?
Could this drive be coming at the right time? After all, Generation Z is ‘woke’ enough to make sure that we address climate change and global waste in ways that we might not have considered a decade or so ago. Perhaps it’s only appropriate that we also think about software’s hardware consumption habit and make that more eco-friendly?
Even if it doesn’t quite cut down the carbon footprint in the same way as fewer airline flights and recycling plastic bags, more efficient use of any resource is ultimately still a good thing, surely.