Abstract
Database Management Systems (DBMS) have traditionally been monolithic structures with their own dedicated hardware, storage arrays, and consoles. Amazon Web Services (AWS) realized that while each company can use unique methods of collecting and using data, the actual processes of building the management infrastructure are almost always the same. AWS addresses the problems of setting up, maintaining, and scaling a DBMS with its Amazon Relational Database Service (Amazon RDS).
Introduction
Database Management Systems (DBMS) are an integral part of almost every large-scale software system. DBMS are large, complex software suites that not only store and retrieve data, but also secure the data, allow backups of the data, replicate the data across multiple systems for greater reliability, and cache the data for faster access.
Such large database systems are difficult to set up, maintain, and expand. They are also challenging to clone, which is a critical step that allows Development (Dev) and Quality Assurance (QA) departments to use the same environment for developing and testing an organization's products.
Meanwhile, companies are finding that the data they capture is becoming more and more valuable to their business, either as a way of measuring and improving their own operations, or as a product that can be sold. Extracting that value becomes complicated, however: as the data becomes more valuable, the DBMS infrastructure demands more care, and yet the people who run and maintain the DBMS are the ones who can best help the business exploit the data.
Amazon Relational Database Service (Amazon RDS) was created by Amazon Web Services (AWS) out of its own experience with these DBMS complications. Amazon RDS provides cost-effective DBMS deployment, quick and efficient scaling, and easy support of development and QA.
This paper describes how you can set up Amazon RDS and use it as a drop-in replacement for traditional DBMS, with all the cost, scaling, and agility advantages of deploying software in the cloud.
Problems with Traditional DBMS
Because traditional DBMS are so large and complex, you can run into problems with them at any stage of the software lifecycle: Production, Dev, and QA. This section first describes some of these problems in the order in which you're most likely to encounter them (Production, then in the QA-Dev cycle), and then discusses how Amazon RDS can help mitigate them.
Problems with Production DBMS
Databases, particularly large-scale databases, are at the heart of successful web applications and critical enterprise software. Any company that uses a database, or provides database services, finds itself depending on the database either for critical support of its operations or for its competitive business advantage. But there is usually a long and involved process in setting up the DBMS that support these databases, particularly if the databases need to scale across hundreds, thousands, or even millions of users, and/or across continents or time zones, all while guaranteeing reliability.
Traditionally, DBMS have been monolithic structures with their own dedicated hardware, storage arrays, and consoles. Proper sizing of the hardware infrastructure has been a capital-intensive, but largely opaque, process. Initial configurations are based on educated guesses about system load and user needs, and lock in thousands of dollars of capital budget. Of course, these guesses almost always get at least one aspect of the deployment wrong, which means more design time and further expenditures on additional memory, bigger CPUs, and/or more disk space.
In growing its own infrastructure and then developing AWS, Amazon realized that while each company can use unique methods of collecting and using data, the actual processes of building the management infrastructure (and the problems inherent in growing that infrastructure) are almost always the same. Werner Vogels, the CTO of AWS, calls these processes "undifferentiated heavy lifting": tasks that every organization must perform but that impart no business advantage.
These processes encompass the whole spectrum of database management: you must choose the right software and hardware to run, install the latest version of software, configure the right security levels to access the hardware and software, get the entire system onto the network, and finally, make the system accessible as a data store.
Once the DBMS is running, the operational challenges start: you must make sure the database is backed up, run read replicas of the database to increase access speed, and increase capacity as the system grows. If you decide that you need high availability, there are additional complications: you must replicate the data onto separate hardware platforms, detect the failure of the main server, and re-route traffic to the replicated server.
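To make the high-availability steps above more concrete, the following Python sketch shows one possible shape for the "detect the failure and re-route traffic" logic that a team would otherwise build and maintain itself. It is an illustration only, not how Amazon RDS or any particular DBMS implements failover. The class name FailoverRouter, the endpoint names, the health probe, and the threshold of three consecutive failures are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverRouter:
    """Sends traffic to the primary until it fails repeated health checks.

    All names here are hypothetical; a real deployment would also need
    data replication, fencing of the failed server, and failback handling.
    """

    primary: str                        # hypothetical main server endpoint
    standby: str                        # hypothetical replicated server endpoint
    is_healthy: Callable[[str], bool]   # caller-supplied health probe
    max_failures: int = 3               # consecutive failures before failover
    _failures: int = 0
    _active: str = ""

    def __post_init__(self) -> None:
        # Traffic starts on the primary.
        self._active = self.primary

    def route(self) -> str:
        """Probe the primary (while it is active) and return the endpoint to use."""
        if self._active == self.primary:
            if self.is_healthy(self.primary):
                self._failures = 0          # a good probe resets the counter
            else:
                self._failures += 1
                if self._failures >= self.max_failures:
                    self._active = self.standby   # re-route to the replica
        # This sketch never fails back automatically once on the standby.
        return self._active


if __name__ == "__main__":
    down = {"db-primary"}  # simulate a primary that is already unreachable
    router = FailoverRouter("db-primary", "db-standby",
                            lambda host: host not in down)
    for _ in range(4):
        print(router.route())
```

Requiring several consecutive failed probes before switching is a common way to avoid failing over on a single transient network error, at the cost of a slightly longer outage before traffic moves. Even this small sketch leaves out the harder parts (keeping the replica current and preventing the old primary from accepting writes), which is part of why these tasks are so easy to get wrong.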
These processes are not everyday tasks; in fact, setting up a new database on a new machine is a relatively rare occurrence for most organizations, as is setting up replication, and so on. Because these processes happen rarely, they cause a different sort of problem: when it is time to perform them again, you or your team must re-learn them and/or update your skill set to the latest release. This means that critical configuration tasks, which can cause their own problems (sometimes ones that do not manifest themselves until well down the road), are being done by relative rookies, every time.