When you’re weighing the options for a database solution for your site or app, there’s a lot to consider. How much data do you have, and how much do you anticipate it will grow? How structured is your data? What are your demands on that data? The answers will help you determine what you’ll need in the way of scalability, table structures, provisioning, and capacity.
For unstructured data that’s being gathered at high velocity, you want a reliable, low-latency database solution that you won’t easily outgrow. One database that’s able to promise that kind of uncompromising scalability is Amazon’s DynamoDB, a nonrelational (NoSQL) database originally built to handle the immense volume of queries Amazon.com was experiencing. It’s used by high-traffic sites like Capital One, Netflix, and Airbnb, and also happens to be the database behind some critical components of Upwork’s own infrastructure, including the backend of Time Tracker, the desktop app that lets freelancers track hours in their Work Diary.
Upwork’s database architects chose DynamoDB for its scalability features—so, is it the right database for your project? We spoke with Upwork Backend Developer Ilya Obshadko to hear more about DynamoDB’s design, when and why to choose it, and how it’s helped Upwork handle millions of new Time Tracker records every day.
First, let’s look at some DynamoDB basics to learn more about its design. (For a full rundown of its benefits and features, check out the AWS documentation.)
- Each table uses a document-oriented data structure. Basically, tables are a series of JSON maps where only the hash and range keys must be defined in advance. Other attributes may be added as you go. Learn more about its Core Components.
- DynamoDB has a strict universal requirement for table primary keys. What you gain in performance you trade in flexibility—a trade-off common to most data storage options. In DynamoDB, every table must have a mandatory primary key, which can consist of a hash (partition) key alone, or a combined hash and range key. “This is DynamoDB’s primary challenge, but also its main benefit,” Obshadko notes. “If you need to query your table by some criteria not included in the primary key, you’ll need to create secondary indexes, which in turn consume provisioned capacity (and cost you money). That means these have to be very carefully selected.” Global secondary indexes can be added or removed at any time; local secondary indexes must be defined when the table is created.
- Estimating requests will help you provision for tables—and optimize performance. Obshadko explains, “DynamoDB’s performance is based on the idea of provisioned capacity per table. Meaning, if you know you’ll need it to handle 1000 read requests per second to table A and 500 read requests per second to table B, you’ll provision 1000 capacity units for table A and 500 capacity units for table B, respectively, and you will be billed according to that capacity.”
- All data requests are HTTP-based RESTful requests—both reads and writes.
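To make the provisioning arithmetic concrete, here’s a rough sketch in Python. The function names are ours, not part of any AWS SDK; the sizing rules they encode (one read capacity unit covers a strongly consistent read per second of an item up to 4 KB, one write capacity unit covers a write per second of an item up to 1 KB) come from the DynamoDB documentation:

```python
import math

def read_capacity_units(reads_per_second: int, item_size_kb: float,
                        eventually_consistent: bool = False) -> int:
    """Estimate provisioned RCUs for a steady read workload."""
    # One RCU handles one strongly consistent read/sec of up to 4 KB,
    # so larger items consume multiple units per read.
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads cost half as much.
    if eventually_consistent:
        total = math.ceil(total / 2)
    return total

def write_capacity_units(writes_per_second: int, item_size_kb: float) -> int:
    """Estimate provisioned WCUs: one WCU per write/sec of up to 1 KB."""
    return writes_per_second * math.ceil(item_size_kb)

# The tables from the example above, assuming 3 KB items:
print(read_capacity_units(1000, 3))  # table A -> 1000 units
print(read_capacity_units(500, 3))   # table B -> 500 units
```

Estimates like these matter because, in the provisioned-capacity model, you pay for the units you reserve whether or not you use them.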
Why DynamoDB Is a Good Fit for Time Tracker
Obshadko explains why DynamoDB was a natural choice. “The main reason for choosing DynamoDB for the time tracker backend was virtually unlimited scalability.” The team migrated from a PostgreSQL-based server that relied on a highly sophisticated table partitioning scheme to handle its workload; by the end of its life, it held around 2.4 billion records.
“By the time we migrated, we had about 50K active contracts on the platform daily. The Time Tracker records data in 10-minute intervals, so if we assume that each freelancer works 5 hours per day, that would mean 6 * 5 * 50K, which would give us around 1.5M new records every day. This is a very rough estimate of course, but it provides some understanding of the magnitude.”
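The estimate above works out like this (all figures are the rough numbers quoted, not exact production metrics):

```python
# Back-of-the-envelope estimate of daily Time Tracker volume.
cells_per_hour = 60 // 10   # one tracking cell every 10 minutes -> 6/hour
hours_per_day = 5           # assumed average hours worked per freelancer
active_contracts = 50_000   # ~50K contracts active daily at migration time

new_records_per_day = cells_per_hour * hours_per_day * active_contracts
print(new_records_per_day)  # 1500000, i.e. ~1.5M new records every day
```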
On top of that, the primary key structure mentioned above happens to be a great fit for time tracking data. “Time cells fit DynamoDB’s concept of hash/range keys just perfectly. Each tracking cell contains a contract ID (hash key), a cell timestamp (range key), and various other time tracking data.”
Why Else Would You Choose DynamoDB?
It’s fully managed. With DynamoDB, you don’t have to worry about the maintenance or administration involved in operating and scaling a database. That means less effort spent on setup, configuration, hardware provisioning, cluster scaling, and more. Just create tables that store and retrieve your data, and you can scale up to whatever traffic requirements you have.
DynamoDB handles partitioning automatically. Obshadko says, “This means we don’t have to think about manually partitioning the database to be able to handle the traffic.”
The scalability of NoSQL. “With SQL databases, it’s very easy to change and refactor things, but they can’t scale horizontally,” Obshadko notes. “In fact, if you hit that ceiling, you’ll be limited to purchasing more powerful hardware as the only upscaling option.” With DynamoDB, you’re locked into your data structures once they’re in production, but performance is unlimited, “provided you have designed your structures right.”
It’s built right into the AWS console. It’s easy to get started by creating a table, then add keys for how you’ll sort the data.
Built-in security. Have a mission-critical workload? DynamoDB offers encryption at rest and guarantees reliability with a service level agreement (SLA) and support for VPN-encrypted connections. This, plus on-demand backup and point-in-time recovery, can help with meeting regulatory compliance requirements.
DynamoDB is great for:
- Serverless apps, with an AWS Lambda integration
- Mobile projects
- Gaming apps