So some code that made it through QA without a hitch blew chunks in Production. Fingers were pointed in various directions, but fundamentally, the SQL queries were incredibly simple, doing a simple retrieval based on matching a unique index, like so:
select field1 from MYTABLE where phone = "12223334444"
Production insisted that the moment the new software was deployed, the dedicated slave DB was being pounded into the ground, and the culprit was large numbers of full table scans. But all of the queries we knew about were exactly like the one above: retrieve part of a single record based on an exact match of a primary key.
Their DB was bigger than ours, so I loaded up a few hundred thousand phony records and tried again. As I expected, my thrash script barely raised the load. Four copies of it spewing these queries as fast as they could barely raised the load. Then I turned on log_queries_not_using_indexes to see if I was getting the volume of full table scans they were, and of course I wasn’t.
Then a real query came in from the new software, and went right into the slow-query log as a full table scan. Why? Because it looked like this:
select field1 from MYTABLE where phone = 12223334444
Fuck, fuck, fuck, fuck, fuck. MySQL silently converts the integer to a string and returns the right answer when you do this, but doesn’t use the index.