Java and Technology weblog
I spent the afternoon of Day3 at OSCON attending two interesting database scalability talks. The first was on Database Scalability Patterns; The second on Database Sharding (and Spider for MySQL).
All my notes are below…
Database Scalability Patterns
Database scalability patterns are part design patterns and part application life cycle.
Typically, most databases go through the following phases:
2. Vertical partitioning
3. Vertical scaling
4. Read slaves
5. Horizontal partitioning
MyFirstDatabase is just a term to refer to a simple, unsophisticated database setup.
2. Vertical Partitioning
You now have much more data and more transactions.
The Wikipedia definition of Vertical partitioning is creating tables with fewer columns and using additional tables to store the remaining columns.
3. Vertical Scaling
Adding more RAM, more disks. Basically, adding more hardware.
This is not necessarily a one time deal as you can do multiple iterations.
4. Read Slaves
“Read Slaves” or ‘Master – Slave”
I think the idea here is that you have a single destination for database writes, but multiple sources for database reads. These ‘slaves’ read database are replicated form the single write source.
This approach however does not work for write bottle necks! It only works for read heavy loads.
- full copy of data on each node
- Partial copy
- Perhaps don’t use a RDBMS
Also note that the Master/Slave setup typically requires changes to your application code. But overall, the Master/Slave approach is fairly easy
Note that scaling large number of database writes can be much more difficult. One approach is to use multiple Masters, but while there are many ways to implement multiple masters, there are few that really work in a production environment! You may be able to reduce CPU, but it is difficult to reduce the I/O. Really, the master slave approach is is a fail-over solution, not a scalability solution.
5. Horizontal partitioning
According to Wikipedia, Horizontal partitioning is putting different rows into different tables. It is sometimes referred to as Sharding.
- Move each piece to own server
- Duplicate some data as needed
- When splitting the data, you must separate dependencies in the app code first!
Note that each each node is a new instance of vertical scaling
“Universal truths” of scaling databases
1) Vertical scalability is helpful for every pattern
Even in a horizontally scaled, fully distributed database, the number of nodes needed is affected by vertical scalability
2) New nodes are never free
-Adds points of failure
-Add management costs
-Add complexity to architecture
-Add complexity to your app code
- Plan for layered data sources
- Read/write connections in code i.e. have separate read vs write connections in the code to start with. Then you can have planned outages where you can still read, but not do updates.
- Use schemas to separate services (think about what pieces of data need to talk/be aware of each other, and what do not)
The second database scaling talk was on “Sharding for the Masses” by Giuseppe Maxia (MySQL @ Oracle)
He simply described sharding as “breaking a database into pieces”.
It is used simply as a way of dealing with too much data & traffic.
One approach to scaling is replication.
Client sends a write to master and the reads to a load balancer.
The write master and the read load balancer distribute to slaves.
I think this is simply ’4. Read Slaves’ from the database patterns talk above. Again, the speaker pointed out that this approach doesn’t scale well when you have too many writes because the he ‘Write master’ becomes the single point of failure.
So, another alternative is to use homemade sharding.
‘Homemade’ because your app contains logic of how/where to put data.
However there are 2 problems with this homemade approach.
The first is that your application logic can become very complex.
The second is that if the sharding logic is in your application, the approach doesn’t work if data is accessed via a different application!
Sharding via Spider for MySQL
So, instead of using a homemade approach, the speaker suggested using Spider for MySQL.