Rian's Really Good Technoblog!

More Backing Up MySQL: Know Your Cat: Storage Engines

by rian on Nov.02, 2009, under Databases, Disaster Recovery, Infrastructure

Last time, I wrote about options for backing up MySQL, but I talked about them as though there is one kind of database, the MySQL database.  However, that’s not at all true.  There are two main types of storage engine used in MySQL– MyISAM and InnoDB.  In fact, there are many more, but these are the two most commonly used without special requirements. This is about the point where those of you not interested in the more, well, “technical” aspects of databases might want to go outside for a walk.

In reality, you could use MySQL for years and really know little or nothing about these database engines.  I did.  After all, if you’re setting up a database to manage, say, your office equipment in a small business, you’ll maybe have a couple of hundred rows?  You may only access the database once a week.  Who cares what the engine is?  MySQL will default to MyISAM (usually), and that will work fine.  You’ll never notice, nor care.

However, when you begin to approach almost any kind of consistency, performance, or resource constraint, you’ll want to think about which engine you’re using.  This, in turn, will affect your backup process.

The 10,000 foot comparison of MyISAM and InnoDB

I always find that it’s nice to look at these things in a table:

MyISAM InnoDB
ACID (i.e., transactional) No Yes
Foreign Keys No Yes
Full-text Searches Yes No
Locking Table Row
Memory Use Less More
Speed Winner Possibly in light load Usually
Backup mysqldump, mysqlhotcopy, replication, files mysqldump, replication, files*, ibbackup**

* you technically can do a files-based backup with InnoDB (in fact, I do), but it’s a bit more involved.  See below for a discussion.

** ibbackup is a commercial product sold by the developers of the innodb engine, now a subsidiary of, gulp, Oracle.  The price? According to their site: “…please contact innodb_sales_ww at oracle.com…” i.e., “how much ya got?”

Let’s hit these in order:

ACID: atomicity, consistency, uh… isohedron, um… danger?  Whatever.  It’s an acronym that means that the engine has mechanisms in place to ensure that “transactions” are performed reliably.  The classic example is the bank transfer.  You need to know that if you take the money out of account A, it gets into account B.  If not, the whole thing is called off.  MyISAM does not implement these mechanisms.  InnoDB does.

Foreign Keys (FK): Similarly to ACID, FK assures that your data makes sense in relation to your model.  For example, if you’re looking at an 8-track tape collection (I’m super old), you can’t have a recording without the recording’s artist.  So, before the recording table gets “They Only Come Out at Night” inserted, the artist table must contain Edgar Winter.  If you delete Edgar from the artist table, you’d better be sure that “TOCOAN” is also out of the recording table.  So, unlike ACID, which enforces proper operation of the database, FK enforces proper/logical USE of the database.  MyISAM does not support FK.  InnoDB does.

Full-text Searches: In short, if you have a large text-type field, MyISAM supports special functions to index that field so that it can be searched against keywords with natural language and keyword expansion extensions if desired.  InnoDB doesn’t.

Locking: MyISAM does not support row-level locking.  That means that if you want to lock a field to prevent it from changing during an operation, you need to lock the entire table that it’s in.  That means all write operations stop until you’re done.  In a heavy load situation, that’s, as they say, “bad”.  This is one of the main reasons that InnoDB whoops MyISAM in its ability to support concurrent access, particularly at heavy loads.

Memory Use: All those fancy features of InnoDB aren’t free.  One cost is additional memory use.  For instance, InnoDB maintains “clustered” indexes and includes the primary key in any secondary indexes.  This speeds data access and improves CPU and IO efficiency, but at a cost of storing all that information in memory.  MyISAM stores  data and indexes in separate files.  Less efficient, but easier on the RAM.

Speed Winner: Oh boy, I hate to even get into this one, but most of the benchmarks I’ve seen show InnoDB to win speed contests in most cases.  This becomes particularly true at high load levels.  This makes a lot of sense if you consider the concurrency and efficiency characteristics.  Or, if you have some anecdotal evidence to the contrary, let’s just agree that you’re right, and I’m wrong.  (But I’m not.)

BACKUP!: Finally.  OK, so the big difference is that it’s not quite as easy to just copy the InnoDB files and expect anything good to happen.  With MyISAM, you can.  This means, according to the MySQL peeps, that you also cannot use the mysqlhotcopy utility with InnoDB since it uses the file mechanism to make the backup. You can, however, use the mysqldump utility with either.  Both are easier to do with a logical volume snap or replicated slave server to shutdown during the backup.  I also included InnoBase’s ibbackup program in case you’ve got some money burning a hole in your pocket.  Given the available options, I’m not sure I see the point in such a thing, but if you need some of the features it provides and don’t want to roll your own, it might be worth looking into.

So, that brings us back to my solution.  Given that I personally prefer to default my storage to InnoDB unless there’s a reason not to, I went with the replication model– create a slave server and calmly shut that down, make the backup, and restart the slave.  The thing I like about this approach is that it requires a minimum of fuss, causes no downtime, provides a running backup server, and does nothing that precludes me from using whatever type of storage engine I choose or mixing them together.

Two Pronged Approach to Backups:

Firstly, I turn on binary logging on the slave.  Then, when I shut the database down, I copy the relevant files (ibdata, .ibd, ib_logfile, .frm, and the my.cnf) to the backup staging area.  After that I can start the database back up, and it’ll sync itself back to the master.  There’s definitely some exposure to Murphy’s Law in there (e.g., the master takes on more data while the slave is shut down and then crashes before it syncs) but the probability of that is pretty minuscule.

Then, on a less frequent basis, I take a mysqldump of the database.  It takes a lot longer, but it’s got the advantages of not being binary, so it’s readable and less likely to become completely corrupted.  It’s sort of a backup-backup in case I need to fall back from a corrupt binary backup.

OK, maybe that’s enough on that topic.  Can hardly wait to see what I write about next…

:, , , , , ,

Leave a Reply

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!

Visit our friends!

A few highly recommended friends...