Amazon Web Services - AWS

When people talk about the cloud, chances are they are referring to AWS. It is currently the largest hosting company, with an estimated revenue of $33B in 2019.¹

The “cloud” is a marketer’s term for shared hosting. Cloud providers make it relatively easy to set up and connect to virtual servers, making it possible to quickly scale up or down the number of server resources needed; however, this abstraction causes some bottlenecks that aren’t present on bare metal servers.

Elastic Compute Cloud (EC2)

EC2 is a service that allows people to provision virtual servers of various sizes, costing anywhere from a few dollars a month to several thousand, depending on resources such as CPU cores, memory, and network speed. These servers are ephemeral: if they are stopped and then started again, they may boot on a different physical host. When this happens, their public IP address changes, and any data on a local drive is gone.

Stopping an EC2 instance is more destructive than a simple reboot, where the virtual server generally remains on the same underlying host.

Amazon makes it possible to move a public IP to a new host by assigning an Elastic IP.
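As a rough sketch of what moving an Elastic IP looks like in code, the helper below re-points an already-allocated address at a replacement instance. The function name `move_elastic_ip` and the placeholder IDs are hypothetical; the only AWS call used is the EC2 `associate_address` operation as exposed by boto3, and the client is passed in so the helper makes no assumptions about credentials or region.

```python
def move_elastic_ip(ec2_client, allocation_id, instance_id):
    """Point an already-allocated Elastic IP at a (new) instance.

    ec2_client is expected to behave like boto3.client("ec2").
    Returns the association ID reported by AWS.
    """
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
        # Allow the address to be taken over from the instance that held it.
        AllowReassociation=True,
    )
    return response["AssociationId"]


# Hypothetical usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   move_elastic_ip(boto3.client("ec2"), "eipalloc-EXAMPLE", "i-EXAMPLE")
```

Passing the client in rather than creating it inside the function also makes the helper easy to exercise with a stub during testing.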

EC2 Instance Types

The list of different instance types² is ever-growing and changing; however, classes of instance types are worth discussing.

Since most queries scan over tables or indices and return the results, memory is the first bottleneck most small to medium-sized databases hit. Keeping most, if not all, data in memory can give a database a dramatic speed boost. Hard drives, especially EBS drives on the network, are far slower than RAM; therefore, the memory-optimized class of instances is an excellent choice, with the “R” class being the most common. These instances provide an increased amount of available RAM for the price.

Once the amount of data is too large to fit in memory and EBS bottlenecks are reached, the Storage Optimized classes become valuable. These provide ephemeral Local Storage drives, which are currently fast NVMe drives. When a VM crashes or is stopped and restarted, it boots on a new host with an empty local drive; therefore, proper safeguards must be in place to avoid data loss.

Databases that can cluster together, such as ElasticSearch, Dgraph, and Cassandra, can take advantage of these with relative safety. The databases themselves handle sharding and load balancing. For relational databases like Postgres or MySQL, set up at least one synchronous follower to minimize risk.

Another excellent use case is for databases that transform rather than store Source of Truth data. The data is temporary and therefore benefits significantly from this type of server.

Using these storage-optimized classes isn’t for everyone, but they are an excellent option for those who need every last IOPS available or GB/s throughput. Currently, I3en instances are cheaper than R5 instances with a large EBS drive.

Some databases use GPUs to process queries, such as OmniSci³. These enable high-speed table scans, making it possible to have databases that don’t require indices to speed up queries. While OmniSci has a community edition, the actual instance type can be relatively expensive.⁴

Elastic Block Store (EBS)

Because EC2 instances are ephemeral, people need a different way to persist data to disks that will survive a virtual host crash. Amazon’s solution is EBS, which is essentially a Storage Area Network (SAN): a large array of disks that live on the network, making it possible for people to create a virtual drive with however many GBs they want. The current upper limit is 16 TB per drive for general purpose drives and 64 TB for Provisioned IOPS.⁵

EBS currently has four types of drives available with different costs associated with them⁶. Cold HDD is the cheapest and slowest. On the other hand, provisioned IOPS is currently the most expensive SSD volume.

For databases, I almost always use general-purpose SSD (GP3) drives because you get the same uptime guarantees, and currently, you can receive 1000 MB/s throughput, up to 16k IOPS, and a size of up to 16 TB. If you need more than this, you can switch to Provisioned IOPS, but it is costly. So instead of using Provisioned IOPS, if the server allows it, I prefer to add a second GP3 EBS data drive. In the case of Postgres, where it only allows a single data drive, I’ll add another read replica before using Provisioned IOPS.

IOPS Limits

IOPS means Input/Output Operations Per Second. This limit becomes a bottleneck more often on transactional databases that are running many small queries.

With Provisioned, you specify how many IOPS you need and pay for the privilege. Currently, the maximum is 64,000.

With GP2, the limit depends on how big your drive is. Each GB gives you 3 IOPS, with a current minimum of 100 and a maximum of 16,000. So if your drive is 500 GB, your limit is 1500 IOPS. There is also the concept of “Burst Credits” for drives smaller than 1 TB: credits accumulate on your drive while you use less than your baseline. If you need more IOPS than your baseline, up to a max of 3000, you spend these credits until they run out, which is great for short bursts.
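The GP2 rules above are simple enough to express as arithmetic. The following is a minimal sketch using only the numbers quoted in the text (3 IOPS per GB, a floor of 100, a ceiling of 16,000, and bursting up to 3000); the function and constant names are my own and not part of any AWS API.

```python
GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_BURST_IOPS = 3_000


def gp2_baseline_iops(size_gb):
    """Baseline IOPS for a GP2 drive: 3 per GB, clamped to [100, 16000]."""
    return min(max(size_gb * GP2_IOPS_PER_GB, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp2_can_burst(size_gb):
    """Bursting only helps while the baseline is below the 3000 IOPS burst cap."""
    return gp2_baseline_iops(size_gb) < GP2_BURST_IOPS


print(gp2_baseline_iops(500))   # 1500
print(gp2_baseline_iops(20))    # 100 (the minimum applies)
print(gp2_baseline_iops(8000))  # 16000 (the maximum applies)
print(gp2_can_burst(500))       # True
print(gp2_can_burst(1000))      # False
```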

GP3 drives are a newer generation of the general-purpose SSD drive, which did away with burst credits and with the speed boost that came from a larger size. Instead, they combine how GP2 and Provisioned IOPS handle things. You get 3000 IOPS free, and you pay for any IOPS over this limit; however, the cost per GB is cheaper, often making a comparable GP3 drive cheaper than its GP2 counterpart.
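To make the GP2 versus GP3 comparison concrete, here is a hedged back-of-the-envelope sketch. Every price in it is an assumption chosen for illustration, not a figure from this chapter, and the 125 MB/s of included GP3 throughput is likewise an assumption on my part; check the current pricing page for your region before relying on any of it.

```python
# ASSUMED illustrative monthly prices in USD; real prices vary by region and over time.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08
GP3_PER_EXTRA_IOPS = 0.005
GP3_PER_EXTRA_MBPS = 0.04
GP3_FREE_IOPS = 3_000
GP3_FREE_MBPS = 125  # assumed included throughput


def gp2_monthly_cost(size_gb):
    """GP2 is billed on size alone; IOPS and throughput come with the size."""
    return size_gb * GP2_PER_GB


def gp3_monthly_cost(size_gb, iops, throughput_mbps):
    """GP3 bills size, plus IOPS and throughput above the included amounts."""
    extra_iops = max(iops - GP3_FREE_IOPS, 0)
    extra_mbps = max(throughput_mbps - GP3_FREE_MBPS, 0)
    return (size_gb * GP3_PER_GB
            + extra_iops * GP3_PER_EXTRA_IOPS
            + extra_mbps * GP3_PER_EXTRA_MBPS)


# A 2000 GB GP2 drive has a 6000 IOPS baseline and 250 MB/s of throughput,
# so the comparable GP3 drive is provisioned to match those figures.
print(round(gp2_monthly_cost(2000), 2))             # 200.0
print(round(gp3_monthly_cost(2000, 6000, 250), 2))  # 180.0
```

Under these assumed prices the GP3 drive comes out cheaper even after paying for the extra IOPS and throughput, which matches the pattern described above.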

Throughput Limits

The throughput limit becomes a bottleneck when running queries that scan tons of data on the hard disk, which is common when queries do large or recursive table scans, use large CTEs (Common Table Expressions), or large subqueries.

With GP2, the max speed is 250 MB/s⁷ with a drive of 334 GB or larger and an I/O size of 256 KB. Smaller drives have a lower maximum throughput, given by the formula: throughput in KiB/s = (Volume size in GiB) × (IOPS per GiB) × (I/O size in KiB). Since I/O is often smaller than 256 KB, the actual maximum will be somewhat less than this number, or the drive will need to be quite a bit bigger to reach it.
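The formula is easier to reason about with a few numbers plugged in. This is a minimal sketch that applies it directly and caps the result at the 250 MB/s maximum; the function name is hypothetical, MB and MiB are treated as interchangeable for simplicity, and the 100 IOPS minimum for very small drives is ignored.

```python
GP2_MAX_THROUGHPUT_MIB = 250.0


def gp2_max_throughput_mib(size_gib, io_size_kib=256, iops_per_gib=3):
    """Throughput ceiling in MiB/s: size x IOPS-per-GiB x I/O size, capped at 250."""
    raw_kib_per_s = size_gib * iops_per_gib * io_size_kib
    return min(raw_kib_per_s / 1024, GP2_MAX_THROUGHPUT_MIB)


print(gp2_max_throughput_mib(334))                  # 250.0 (hits the cap)
print(gp2_max_throughput_mib(100))                  # 75.0
print(gp2_max_throughput_mib(334, io_size_kib=16))  # 15.65625
```

The last line shows why small I/O sizes matter: the same 334 GiB drive that reaches the cap with 256 KiB operations gets nowhere near it with 16 KiB operations.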

With Provisioned, the top throughput limit can increase to 500 MB/s on drives with fewer than 32,000 IOPS, increasing to 1000 MB/s as it approaches 64,000 IOPS. In theory, based on the formula, it can achieve 500 MB/s at around 2,000 Provisioned IOPS if each I/O operation is 256 KB. However, operations are often smaller, so the actual IOPS would need to be higher to get close to the 500 MB/s limit.
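Turning the same relationship around gives the number of IOPS needed to sustain a target throughput at a given I/O size. The sketch below is plain arithmetic under the same simplification that MB and MiB are interchangeable; the function name is my own.

```python
import math


def iops_needed(target_mib_per_s, io_size_kib):
    """IOPS required to sustain a throughput target at a given I/O size."""
    return math.ceil(target_mib_per_s * 1024 / io_size_kib)


print(iops_needed(500, 256))  # 2000
print(iops_needed(500, 64))   # 8000
print(iops_needed(500, 16))   # 32000
```

At 16 KiB per operation, reaching 500 MB/s takes sixteen times the IOPS that the 256 KiB case suggests.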

Simple Storage Service (S3)

S3 is an object storage service priced on the storage you actually use rather than the storage you provision. It’s excellent for backup files and for storing blobs like images and videos. In addition, Amazon provides a service called Athena⁸, which makes it possible to run SQL over files in an S3 bucket. Athena charges based on the amount of data scanned and does not charge for servers. I haven’t needed to try it yet; however, it may be practical for running a few queries on archived data like logs or backup files.

Amazon Relational Database Service (RDS)

RDS⁹ is a good option for people who need a database but don’t want to manage it themselves. Amazon provides functionality to upgrade the database software, change instance types, fail over to a cold standby, and set up an asynchronous follower with a few clicks.

RDS backups are just a snapshot of the disk. When the data is large (e.g., a few TB), I’ve seen backups take more than 24 hours to complete and lock you out of editing any config because the next backup starts immediately after the first one finishes.

Instances are more expensive than their EC2 counterparts, and you can’t query a failover server, doubling the price for high availability without providing any load balancing. For load balancing, you need to set up an additional follower.

RDS does not allow SSH access, nor does it allow custom plugins. CloudWatch metrics are available, as with EC2 instances, along with some unique database-related metrics. RDS saves logs as many small files on S3. It’s up to you to compile and parse these for query log analysis.

RDS provides the ability to change some of the database configs, but not everything is editable.

The bottom line is that RDS is an excellent place to start to get up and running quickly, but when Milliseconds Matter, managing the server yourself is worth the effort.