How do you tune SQL queries to improve performance?
Tuning your SQL queries can have a significantly positive impact on performance. And understanding how your specific RDBMS works can help tremendously as well. But here we will go over some tips on how to tune SQL queries in general, non-RDBMS specific terms.
What is a query execution plan?
Understanding query execution plans is one of the first steps to properly tuning SQL queries. So, what is a query execution plan – also known as an explain plan? Well, a query execution plan lists all the details of how that particular RDBMS plans on processing a particular query. Inside this plan are details on how the index will be used, how joins will be performed (and their associated logic), and also an estimate of the resource cost. Understanding the explain plan utility for your particular RDBMS is critical if you want to successfully tune SQL queries.
Here are some non-RDBMS specific things to keep in mind when tuning your SQL queries to improve performance:
Reduce the rows that are returned by a query
This is fairly obvious – if a query returns less rows, then clearly the query will be more efficient.
Get rid of unnecessary columns and tables
Even though this isn’t a change made to your actual query, the less unused space in your database, the more efficient your queries will be. This is another pretty obvious one.
GROUP BY may be better to use than DISTINCT. In some DBMS’s GROUP BY is a more efficient way of retrieving unique rows than DISTINCT. This is because GROUP BY performs the sort that finds duplicates earlier in the processing of a query than DISTINCT. The DISTINCT clause will perform the sort at the very last step, and will do this against the final result set. Why does it matter when the sort is performed? Well, if duplicate rows are eliminated earlier on in the processing of a query, then it means that the rest of the processing of the query will be more efficient because there will presumably be less rows to perform the rest of processing on, since the duplicates have already been eliminated. For your particular RDBMS, you should look at the explain plans for running a query with GROUP BY or DISTINCT to see how they compare.
Hints might help you tune your SQL queries. What is a hint? A hint is special syntax that you can put inside your SQL. What does a hint do? Well, it tells the query optimizer to perform a certain action, like if you want to tell the optimizer to use a certain method to join tables, or if you want to tell the optimizer to use a certain index.
Understand your optimizer to help you tune SQL queries
Knowing how the query optimizer for your particular RDBMS works can be a big help. This is because every optimizer does things differently. Let’s go through some things that you should keep in mind when dealing with query optimizers:
Not enough database statistics
Suppose there are not enough statistics about the database. Since cost based optimizers rely on those statistics to perform their analysis, some optimizers may have to use a rule based optimizer instead in that case. And other databases may decide not to use an index at all and just do a full table scan instead.
Are order of predicates taken into account?
You should know whether or not your optimizer takes the order of the predicates in a WHERE clause into account, and whether that order has any effect on the order in which the predicates are actually evaluated. What does that mean in plain English? Well, a predicate is the comparison portion of the WHERE clause. So, for example, if we have some SQL that says “WHERE website_name = “ProgrammerInterview.com”, then in that particular SQL query there is just one predicate comparing the website_name column to the text “ProgrammerInterview.com”. But, if we have some SQL that says “WHERE website_name = “ProgrammerInterview.com AND website_subject=”technical””, then we have two different predicates – one that checks for the website name and another that checks for the website category.
Now that we’ve cleared up what we mean by predicates, let’s get back to the original topic. So, we said that you should know if your optimizer takes the order of the predicates in a WHERE clause into account, and if that order affects the order in which the predicates are evaluated. But, why should the order of the predicates matter? Well, if you optimizer does take the order into account, then you would want the predicate that eliminates the higher number of rows to be evaluated first by the optimizer. So, for example, let’s say that we have a table called Websites which has columns for the website_active and the website_subject. The website_active column is just a “yes” or a “no” entry, and let’s assume that most of the rows (something like 90%) in the table have a “yes” value for website_active. But, let’s also say that there are three possible subjects – like “technical”, “self help”, “cooking”, etc. And, the subjects are evenly distributed amongst the rows – so 1/3rd of the rows are technical, 1/3rd are self help, etc.
Now, let’s say we want to run a query with a where clause like this “WHERE website_subject = ‘cooking’ AND website_active = ‘NO’. Which predicate of the WHERE clause should be executed first – the website_active = ‘NO’ or the website_subject = ‘cooking’? Well, think about that on your own for a second. Wouldn’t it make more sense to run the predicate which eliminates more rows first? That way, the second predicate has less rows to process. With that in mind, let’s ask ourself which predicate will eliminate more rows? Wouldn’t it be the check to see “WHERE website_active = ‘NO’? Because, that check will eliminate 90% of the rows in the table. But the check for “website_subject = ‘cooking'” would only eliminate 67% of the rows, so clearly 90% is better which means that the predicate that checks website_active = ‘NO’ should be run first.
Are order of table names being taken into account?
Just as the order of the predicates being used in the WHERE clause can have a big effect on the efficiency of the query, so can the order of the table names in the JOIN or FROM clause. This is especially true with rule based optimizers. Of course, the best thing for the RDBMS to do is to choose the most selective table first. This way the most number of rows will be removed from the result set, which means that less rows will have to be processed and the query can run more efficiently. You should check to see what your particular optimizer does to see if you need to tune your SQL queries accordingly.
Are queries being rewritten?
You should also check to see if your optimizer rewrites queries into more efficient formats. One example of this is when optimizers will rewrite subqueries into their equivalent joins, and that will make the processing that must follow much simpler. For some DBMS’s, there are certain options that have to be enabled so that the optimizer can actually rewrite queries.
Tune SQL queries by ensuring that your indexes perform well
One very important thing that can help your SQL queries run better is making sure that your database indexes also perform well. Read here for more details on that subject: Improving Index performance in SQL.
Summary of tips on how to to tune your SQL queries
In general, you should know the options available to you that may help you in tuning your SQL queries. Of course, not everything we presented above will help make your SQL perform better – because everyone’s particular situation is different. But, knowing your options is critical – as the saying goes “To the man with a hammer, every problem looks like a nail.” Make sure that you have more than just a hammer in your toolbox!