AWS has announced the general availability of Amazon Aurora PostgreSQL Limitless Database, with the “limitless” factor here stemming from its serverless horizontal scaling (sharding) capability. This data service exists as a part of Amazon Aurora, the hyperscaler’s database with MySQL and PostgreSQL compatibility that pledges to cost one-tenth of other “commercial databases” with up to 99.99% availability.
As fond as AWS is of Amazon Aurora (it can scale to hundreds of thousands of transactions in a fraction of a second), Aurora PostgreSQL Limitless Database is a route to “scale beyond the existing Aurora limits” for write throughput and storage. How? The service distributes a database workload over multiple Aurora writer instances while still letting applications use it as a single database.
Writer and Reader Instances
A writer database instance in this case is responsible for handling write operations as data is laid down. Within an Amazon Aurora DB cluster, the primary (writer) instance supports both read and write operations and performs all of the data modifications to the cluster volume. An Aurora Replica is a reader DB instance: it handles read operations only, so read traffic can be balanced away from the writer.
As AWS author Channy Yun detailed at the initial preview of this technology one year ago, the compute and storage capacity that is used for Amazon Aurora PostgreSQL Limitless Database is in addition to and independent of the capacity of the writer and reader instances in the cluster.
Yun also noted that this database offering uses a two-layer architecture consisting of multiple database nodes in a database shard group. This way, either routers or shards can be scaled based on the workload.
Routers here are clearly not for WiFi connectivity; in the cloud database universe a router is a node (a data storage and management server, basically) that accepts SQL connections from clients, sends SQL commands to shards, maintains system-wide consistency and returns results to client applications. Shards are nodes that each store a subset of the sharded tables alongside full copies of the reference tables, and that accept queries from routers.
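To make the router and shard split concrete, here is a minimal toy sketch in Python. It is not how Aurora implements anything: the `Router` and `Shard` classes, the `count_where` method and the sample rows are all hypothetical names invented for illustration. It only shows the general pattern described above, where a client talks to a router, the router forwards the query to the shards, and the partial results are combined before being returned.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a shard node: holds a subset of the rows."""
    rows: list = field(default_factory=list)

    def count_where(self, predicate):
        # Each shard evaluates the query against its own rows only.
        return sum(1 for row in self.rows if predicate(row))


@dataclass
class Router:
    """Hypothetical stand-in for a router node: the client's single entry point."""
    shards: list

    def count_where(self, predicate):
        # Fan the query out to every shard, then combine the partial counts.
        return sum(shard.count_where(predicate) for shard in self.shards)


# Illustrative data: five order rows spread over three shards.
shards = [
    Shard(rows=[{"id": 1, "total": 40}, {"id": 4, "total": 15}]),
    Shard(rows=[{"id": 2, "total": 75}]),
    Shard(rows=[{"id": 3, "total": 60}, {"id": 5, "total": 90}]),
]
router = Router(shards)

# The client asks one question of one endpoint and never sees the shards.
large_orders = router.count_where(lambda row: row["total"] >= 60)
print(large_orders)  # 3
```

The client code never needs to know how many shards exist, which is the "single database" experience the service is aiming for.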
Three Types of Tables
There are three types of tables that can contain data here: Sharded, reference and standard. Sharded tables are (as they sound) distributed across multiple shards. Data is split among the shards based on the values of designated columns in the table, called shard keys.
Reference tables copy data in full on every shard so that “join queries” can work faster by eliminating unnecessary data movement. They are commonly used for infrequently modified reference data, such as product catalogs and zip codes. Standard tables are like regular Aurora PostgreSQL tables.
“Standard tables are all placed together on a single shard so join queries can work faster by eliminating unnecessary data movement. You can create sharded and reference tables from standard tables,” explained Yun, in an AWS blog. “Once you have created the DB shard group and your sharded and reference tables, you can load massive amounts of data into Aurora PostgreSQL Limitless Database and query data in those tables using standard PostgreSQL queries.”
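The placement rules for sharded and reference tables can be sketched with a small Python simulation. Everything here is an assumption made for illustration: the shard count, the `shard_for` function, the CRC32-modulo hashing rule and the sample `orders` and `products` data are all hypothetical, and the article does not describe the hashing scheme the service really uses. The point is only to show why a join between a sharded table and a reference table needs no data movement between shards.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key):
    # Toy placement rule: a stable hash of the shard key, modulo the shard count.
    # This is an assumption, not the service's actual hashing scheme.
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SHARDS


# One dict per shard, holding that shard's slice of each table.
shards = [{"orders": [], "products": {}} for _ in range(NUM_SHARDS)]

# Reference table: copied in full onto every shard.
products = {"p1": "kettle", "p2": "toaster"}
for shard in shards:
    shard["products"] = dict(products)

# Sharded table: each row lands on exactly one shard, chosen by its shard key
# (customer_id in this example).
orders = [
    {"customer_id": 101, "product_id": "p1"},
    {"customer_id": 102, "product_id": "p2"},
    {"customer_id": 103, "product_id": "p1"},
    {"customer_id": 101, "product_id": "p2"},
]
for order in orders:
    shards[shard_for(order["customer_id"])]["orders"].append(order)


def local_join(shard):
    # The join touches only this shard's data, because products is local everywhere.
    return [
        (order["customer_id"], shard["products"][order["product_id"]])
        for order in shard["orders"]
    ]


joined = [pair for shard in shards for pair in local_join(shard)]
print(sorted(joined))
```

A standard table would, in this toy model, simply live in one of the four dicts rather than being split or copied, which is why joins between standard tables also stay on a single node.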
Why Use a Cloud Database?
AWS’s Yun and team are community-spirited developer purists. As such, there’s no major attempt to sell the sizzle on this sausage; interested parties are meant to read the technology preview pages, access the Aurora Limitless Database reference in the Amazon Aurora user guide and make up their own minds.
If this division of AWS did want to leave a warm feeling with users approaching this technology, it might point to three things: the compute capacity that stems from the serverless nature of this offering; the storage girth, as each shard has a maximum capacity of 128 TiB (a tebibyte is 2^40 bytes, or nearly 1.1 terabytes); and the monitoring functionality, which can be performed using Amazon CloudWatch, Amazon CloudWatch Logs, or AWS Performance Insights.
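For anyone who wants to check the tebibyte arithmetic, the conversion is short enough to run in Python. The only input taken from the article is the 128 TiB per-shard figure; the variable names are arbitrary.

```python
TIB = 2**40   # bytes in one tebibyte (binary unit)
TB = 10**12   # bytes in one terabyte (decimal unit)

shard_limit_tib = 128  # per-shard maximum quoted above
shard_limit_bytes = shard_limit_tib * TIB

tb_per_tib = TIB / TB
shard_limit_tb = shard_limit_bytes / TB

print(f"1 TiB = {tb_per_tib:.4f} TB")        # 1 TiB = 1.0995 TB
print(f"128 TiB = {shard_limit_tb:.2f} TB")  # 128 TiB = 140.74 TB
```

In decimal terms, then, the per-shard ceiling works out to roughly 140.7 TB.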