This has been discussed many times on many forums. Databases grow in the number of instances, in size, in the variety of flavors, and in the number of places where they run. They also get more expensive to run. Let’s review each of these problems.
Like kids, a database starts small and, in the blink of an eye, grows until it is unmanageable. In our experience, the optimal size for a MySQL instance is about a terabyte. Beyond that, problems start piling up.
Performance. An ever smaller fraction of the data fits in memory, and InnoDB resorts to serving queries from disk. Updating a bigger B+ tree requires disproportionately more writes. A bigger database increases the chances of lock contention. Besides, InnoDB has limits that manifest themselves when an index gets too big.
Operational problems. Managing a multi-terabyte instance is a nightmare. Taking a backup is difficult, restore time goes through the roof, and replication falls hopelessly behind.
If it’s so bad, why don’t companies shard their data early? In my opinion, it’s to save time and get to market sooner, which matters a lot these days. Also, managing ten times as many instances is a challenge in itself, and one that is often underestimated. (Disclaimer: do not shard your children!)
So we split the database, and now there are many smaller instances. Will that make our lives easier? We wish! It just shifts the focus. Now we need to keep schemas and MySQL users consistent across instances. Password rotation is difficult. Even measuring availability is difficult. Think about it: 99.99% availability allows roughly 52 minutes of downtime per year, about 4.4 minutes per month, or under 9 seconds a day. That means we need to ping every database instance about every ten seconds to measure with the accuracy required to talk about four nines. Now imagine there are 10,000 instances. It’s impossible to measure that from one host. A distributed system to do it would be a whole project on its own.
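To put numbers on the availability arithmetic above, here is a minimal Python sketch. It assumes a 365-day year, an average month of one twelfth of that, and one probe per instance every 10 seconds; the function names are illustrative only and are not part of any RevDB tooling.

```python
# Downtime budget for a given availability target, plus the probe rate
# needed to check a whole fleet at a fixed interval.

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # assumption: ignores leap years
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # assumption: average month length


def downtime_budget(availability: float) -> dict[str, float]:
    """Allowed downtime in seconds per year, month and day."""
    unavailable = 1.0 - availability
    return {
        "year": unavailable * SECONDS_PER_YEAR,
        "month": unavailable * SECONDS_PER_MONTH,
        "day": unavailable * SECONDS_PER_DAY,
    }


def probes_per_second(instances: int, probe_interval_s: float) -> float:
    """Checks per second a monitor must sustain to probe every instance once per interval."""
    return instances / probe_interval_s


if __name__ == "__main__":
    budget = downtime_budget(0.9999)
    print(f"per year:  {budget['year'] / 60:.1f} min")   # 52.6 min
    print(f"per month: {budget['month'] / 60:.1f} min")  # 4.4 min
    print(f"per day:   {budget['day']:.1f} s")           # 8.6 s
    print(f"probes/s for 10,000 instances every 10 s: "
          f"{probes_per_second(10_000, 10):.0f}")        # 1000
```

With 10,000 instances, a 10-second interval already means about 1,000 checks every second, before retries, timeouts, or redundant vantage points are taken into account.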
Each service or microservice tends to have its own database. And that’s the right thing to do. Yet it adds to the number of instances. Multiply that by the number of environments: prod, development, staging, testing, QA, UAT.
What about security? Yet again, security is an afterthought. We need to ensure that MySQL user accounts are not reused between services or environments, and that production data is sanitized before it’s copied to other environments. We need to make sure that only authorized people can change the database infrastructure and, when they do, we need to know why the change was made and who approved it. Yes, a number of certifications require not only an audit trail (who did what) but also technical enforcement (an unauthorized person cannot do it).
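One way to enforce the account rule technically rather than by convention is a check that runs in CI and fails when a MySQL account is assigned to more than one service or environment. The sketch below is hypothetical: the `ACCOUNTS` inventory format and the `find_reused_accounts` helper are assumptions for illustration, not the RevDB implementation.

```python
from collections import defaultdict

# Hypothetical inventory: (service, environment) -> MySQL user name.
ACCOUNTS = {
    ("billing", "prod"): "billing_prod",
    ("billing", "staging"): "billing_staging",
    ("orders", "prod"): "orders_prod",
    ("orders", "staging"): "billing_staging",  # re-used account, should be flagged
}


def find_reused_accounts(
    accounts: dict[tuple[str, str], str],
) -> dict[str, list[tuple[str, str]]]:
    """Return every MySQL user assigned to more than one (service, environment) pair."""
    owners: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for owner, user in accounts.items():
        owners[user].append(owner)
    return {user: sorted(pairs) for user, pairs in owners.items() if len(pairs) > 1}


if __name__ == "__main__":
    for user, pairs in find_reused_accounts(ACCOUNTS).items():
        print(f"{user} is shared by {pairs}")
```

In a pipeline, a non-empty result would block the pull request, which turns the policy into the kind of technical enforcement that certifications ask for.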
To keep this post a reasonable length, I won’t write about multiple clouds or multiple database products. But I have to write about the cost of running databases. Infrastructure costs drag down companies’ P&L. And, you know, cloud providers aren’t exactly keen to help their users in that regard. As someone on the Internet said, you pay not for what you use, but for what you forgot to turn off.
We believe the database shouldn’t be a hindrance. We started the RevDB project to let people create awesome products and services. Our duty is to support development teams’ creativity by operating their databases.
Infrastructure as code for databases
We apply the Infrastructure as Code principle to MySQL databases. The beauty of it is not only that it simplifies infrastructure management at scale, but also that it provides many perks out of the box: versioned documentation, better database security, and compliance with the requirements of many certifications built into the process.
In this blog we will write in detail about each aspect of our solution. I hope it will be helpful for fellow DBAs, and I hope you will share your feedback and ideas with us.
Since it’s infrastructure as code, the central piece of the solution is GitHub. It stores the code and enforces code reviews. It’s the place to go for documentation. Together with a CI/CD component, it drives changes in the infrastructure. We manage GitHub itself with Terraform, which unlocks possibilities that were unimaginable before. For example, I can securely initiate changes in GitHub without being an organization admin!
Besides GitHub, we manage other services with Terraform too. AWS resources are the huge and obvious one. There are also Datadog for monitoring, PagerDuty for alerts, and even JIRA.
Our provisioning consists of three layers. Let’s say we need to deploy a MySQL replication cluster. First, Terraform creates the AWS resources for it: an autoscaling group, networks, security groups, buckets, IAM, etc. After the autoscaling group creates an EC2 instance, Chef kicks in and provisions OS-level components: packages, configs, services, etc. Finally, the third provisioning layer is our Python software. It chooses roles for MySQL replication, configures replicas, updates service discovery, maintains MySQL users, changes schemas, takes backups, and so on.
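Maintaining MySQL users fits the same infrastructure-as-code idea as the other layers: compare the desired state stored in the repository with what the server actually has, and apply only the difference. Here is a minimal Python sketch of that planning step. It assumes user names and hosts come from reviewed code and that system accounts (such as `root` or `mysql.sys`) are already filtered out of the actual state; `MySQLUser` and `plan_user_changes` are hypothetical names, and credentials and grants are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MySQLUser:
    """A MySQL account, identified by user name and host pattern."""
    name: str
    host: str = "%"


def plan_user_changes(desired: set[MySQLUser], actual: set[MySQLUser]) -> list[str]:
    """Return SQL statements that bring the server's users in line with the desired state.

    Assumes `actual` excludes system accounts, and that names/hosts come from
    reviewed code (quoting here is naive and does no escaping).
    Passwords and GRANTs are intentionally omitted from this sketch.
    """
    statements = []
    for user in sorted(desired - actual):  # missing on the server: create
        statements.append(f"CREATE USER '{user.name}'@'{user.host}'")
    for user in sorted(actual - desired):  # not declared in code: drop
        statements.append(f"DROP USER '{user.name}'@'{user.host}'")
    return statements


if __name__ == "__main__":
    desired = {MySQLUser("orders_app", "10.0.%"), MySQLUser("backup")}
    actual = {MySQLUser("backup"), MySQLUser("old_app")}
    for statement in plan_user_changes(desired, actual):
        print(statement)
```

Running the plan again after the statements are applied yields an empty list, which is what makes this kind of reconciliation safe to repeat on every deploy.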
Our unique proposition is that we picked the best tools for the job and made them work together. I’m very happy to see how it’s shaping up. It’s a solution that will make everyone happy: developers, because they will get a performant database they can rely on; DBAs, because it will be a pleasure to work with; managers, because projects will become more predictable and easier to plan; and finally shareholders, because they, not the cloud provider, will control the bill.
Stay tuned: there is more technical material to come. We plan to document and share many of the problems we ran into and how we tackled them.
We are also looking for customers who feel the pain of running databases at scale and like our vision. Drop us a message, and together we will build something amazing, so that SpaceX isn’t the only one inspiring awe.