MS SQLServer - Database Performance Tuning
We can define performance tuning as an ongoing process of achieving maximum throughput from all components of a database system. Unfortunately, a database management system (DBMS, or alternatively, RDBMS) does not run in isolation; therefore systems other than the database engine can influence the performance of the DBMS. These include the operating system running the database, applications that interface with the database, and the network over which the database system and applications communicate. An oft-confronted reality of the database developer is end-user discontent with the performance of the database application. These applications may be functionally correct, but poor response time or performance frustrates end-users, creating a sense of dissatisfaction that may result in the abandonment of the application.
The concept of database tuning is critical to the success of most database-centric applications. An organization that establishes a database application with the aim of reducing errors, improving consistency, and increasing employee effectiveness must first convince its employees of the application's benefits. Slow performance can negatively impact employee perceptions, which could lead to a resistance in using the applications, despite the application's correct and enhanced functionality. In addition, when customers access an application, their overall impressions of the organization can be influenced by the ease and speed associated with the application. For example, an organization that offers bill payment and claim service through the Web must address not only the ease of navigating their Web pages, but also the speed at which transactions occur, since both can influence the overall level of customer satisfaction with the organization. Therefore, database applications must function not only correctly but also quickly to meet user expectations.
When performance issues arise, we must know how to address the system's performance and response time deficiencies.
Database tuning involves many factors beyond the DBMS. First, let's indicate that operational inadequacy and degradation most often come from performance bottlenecks. A performance bottleneck occurs when a database component is assigned an excessive amount of work at a given time, which it is incapable of satisfying adequately. Also, it is possible that these bottlenecks can extend beyond the DBMS and encompass other external components, such as the client application, network communication, operating systems, or some combination thereof. For this reason, a successful database administrator must understand each of these components and how they interact. A sufficient understanding of this architecture will assist the database administrator in isolating a performance bottleneck and then identifying its causes and candidate solutions.
Tuning SQL Queries
Disk Input/Output (I/O) Bottlenecks
I/O operations require read/write disk drive heads to physically move across the drive platters, potentially incurring a significant time penalty in the process. The amount of time each i/o operation requires depends mostly on the disk's revolutions per minute (rpm) and whether the operation is sequential or non-sequential. In non-sequential drive reads, the drive heads must scan the disk to look for the next logically sequential block, increasing the total I/O operation time.
Another way of avoiding I/O bottlenecks during parallel processing or concurrent query access, all tables accessed by parallel operations should be striped. Striping divides the data of a large table into small portions and stores them on separate datafiles on separate disks.
You should design different Filegroups for large tables and indexes. You must also spread the data files across controllers, I/O channels, and internal buses. To make striping effective, you must make sure that enough controllers and other I/O components are available to support the bandwidth of parallel data movement into and out of the distributed filegroup. You can use RAID systems or you can perform striping manually through careful data file allocation to Filegroups.
Central Processing Unit (CPU) Bottlenecks
CPU bottlenecks occur when too many resources compete for computer processing time at once. In this case, the database administrator may consider a combined solution of adding physical resources as well as system restructuring to alleviate the bottleneck.
Random Access Memory (RAM) Bottlenecks
RAM, like the CPU, is a physical resource of the server itself, so any other processes running on the server will take away from the amount of RAM available to the database system. In addition, if the server running the DBMS system is also being used as a network domain controller, more RAM will be taken away from the database system for that process. Therefore, the database server should be a member server of the network and not acting as a primary or a backup domain controller system.
Relational table design can have a huge effect on performance. Most popular RDBMS’ have manuals or white papers that cover best practices for designing efficient tables. Here are some general guidelines that apply to all SQL implementations:
· CHAR vs. VARCHAR : For character columns that are ﬁve characters or less, the CHAR data type is most appropriate, even if the data length varies. The VARCHAR data type is better for variable length columns that are usually longer than ﬁve characters in size. For columns that always contain data of the same length, CHAR is an obvious choice. The reason for using CHAR for short columns is that VARCHAR columns include 1 to 3 bytes to hold the length of the column data, and there is additional processing overhead for calculating the length when the column data changes.
· Numeric column data types : Use the smallest data type in which the data will ﬁt. A lot of space is wasted if, for example, the BIGINT data type is used when the data would always ﬁt in INT or SMALLINT.
· Identical data types for primary and foreign key columns : Joins work much more efficiently if the data types of the columns used in the join predicate (usually the primary key in one table and a foreign key in the other table) have identical data types. When the data types are different, the DBMS has to convert one of them so that the data values can be compared correctly, and such conversion amounts to unnecessary overhead.
· Beware of triggers: Triggers placed on database tables are sometimes convenient and/or necessary ways to solve speciﬁc problems such as enforcing complicated constraints. However, the work that the trigger does is part of the unit of work of the SQL statement that modiﬁes the table (the statement that caused the trigger to ﬁre), and therefore, it slows down that SQL statement.
Indexes can greatly improve data access times. However, always keep in mind that indexes take up storage space and they have to be maintained. Here are some considerations related to the use of indexes to improve query performance:
· Avoid indexes on frequently updated columns. Creating an index on a column that is frequently updated doubles up the amount of writes required when the column is updated. Always remember that when column data is updated, the DBMS must also update any indexes that include that column.
· Create only selective indexes. Index selectivity is a ratio of the number of distinct values a column has divided by the number of rows in a table. For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. However, a column such as gender that only has two distinct values (M and F) has very poor selectivity (.002 in this case). Unique indexes always have a selectivity ratio of 1.0, which is the best possible. A good rule of thumb is to avoid indexes with a selectivity of less than 0.33 unless they are indexes especially designed for low selectivity such as bit-map indexes.
· Foreign key indexes improve joins. With most optimizers, an index on a foreign key column greatly improves join performance, and it can enable additional join methods for the optimizer to use.
· Index columns frequently used in predicates. For large tables, every query should contain a WHERE predicate that references an indexed column. Therefore, it is best to ﬁnd the columns that are most frequently referenced by predicates and to index them.
· Don't overindex. As a rule of thumb, don't create more than three or four indexes for any table. As already stated, indexes take up storage and must be maintained. Too many indexes on a table can cripple the performance of an INSERT or UPDATE issued against that table.
· Avoid overlapping indexes. Nearly every RDBMS can use an index even when the WHERE predicate references only the ﬁrst column of the index. Therefore, overlapping indexes (those that have the same leading column) are redundant and unnecessary.
· Consider unique indexes. With some RDBMS’, unique indexes are so superior that DBAs often add otherwise unnecessary columns to an index just to make it unique.
· Drop indexes for bulk loads. For mass loads of data into a table, consider dropping some of the indexes and re-creating them after the load is complete. This can save a substantial amount of time in some DBMSs.
Tuning SQL Queries
About 80 percent of database query performance problems can be solved by adjusting the SQL statement. However, you must understand how the particular DBMS being used processes SQL statements in order to know what to tweak.
A query execution plan is a description of how an RDBMS will process a particular query, including index usage, join logic, and estimated resource cost. It is important to learn how to use the execution plan in your DBMS, because it will show you exactly how the DBMS will process the SQL statement you are attempting to tune.
Avoid using a WHERE clause in your SELECT statement to narrow the number of rows returned. If you don't use a WHERE clause, then SQL Server will perform a table scan of your table and return all of the rows. In some case you may want to return all rows, and not using a WHERE clause is appropriate in this case. But if you don't need all the rows returned, use a WHERE clause to limit the number of rows returned.
By returning data you don't need, you are causing SQL Server to perform I/O it doesn't need to perform, wasting SQL Server resources. In addition, it increases network traffic, which can also lead to reduced performance. And if the table is very large, a table scan will lock the table during the time-consuming scan, preventing other users from accessing it, hurting concurrency.
Another negative aspect of a table scan is that it will tend to flush out data pages from the cache with useless data, which reduces SQL Server's ability to reuse useful data in the cache, which increases disk I/O and hurts performance.
To help identify long running queries, use the SQL Server Profiler Create Trace Wizard to run the "TSQL By Duration" trace. You can specify the length of the long running queries you are trying to identify (such as over 1000 milliseconds), and then have these recorded in a log for you to later investigate.
When using the UNION statement, keep in mind that, by default, it performs the equivalent of a SELECT DISTINCT on the final result set. In other words, UNION takes the results of two like recordsets, combines them, and then performs a SELECT DISTINCT in order to eliminate any duplicate rows. This process occurs even if there are no duplicate records in the final recordset. If you know that there are duplicate records, and this presents a problem for your application, and then by all means use the UNION statement to eliminate the duplicate rows.
On the other hand, if you know that there will never be any duplicate rows, or if there are, and this presents no problem to your application, then you should use the UNION ALL statement instead of the UNION statement. The advantage of the UNION ALL is that is does not perform the SELECT DISTINCT function, which saves a lot of unnecessary SQL Server resources from being using.
Sometimes you might want to merge two or more sets of data resulting from two or more queries using UNION. For example:
SELECT column_name1, column_name2FROM table_name1WHERE column_name1 = some_valueUNIONSELECT column_name1, column_name2FROM table_name1WHERE column_name2 = some_value
This same query can be rewritten, like the following example, and when doing so, performance will be boosted:
SELECT DISTINCT column_name1, column_name2FROM table_name1WHERE column_name1 = some_value OR column_name2 = some_value
And if you can assume that neither criteria will return duplicate rows, you can even further boost the performance of this query by removing the DISTINCT clause.
Carefully evaluate whether your SELECT query needs the DISTINCT clause or not. Some developers automatically add this clause to every one of their SELECT statements, even when it is not necessary. This is not a good and should be avoided.
The DISTINCT clause should only be used in SELECT statements if you know that duplicate returned rows are a possibility, and that having duplicate rows in the result set would cause problems with your application.
The DISTINCT clause creates a lot of extra work for SQL Server, and reduces the physical resources that other SQL statements have at their disposal. Because of this, only use the DISTINCT clause if it is necessary.
In your queries, don't return column data you don't need. For example, you should not use SELECT * to return all the columns from a table if you don't need all the data from each column. In addition, using SELECT * prevents the use of covered indexes, further potentially hurting query performance.
If your application allows users to run queries, but you are unable in your application to easily prevent users from returning hundreds, even thousands of unnecessary rows of data they don't need, consider using the TOP operator within the SELECT statement. This way, you can limit how may rows are returned, even if the user doesn't enter any criteria to help reduce the number or rows returned to the client. For example, the statement:
SELECT TOP 100 fname, lname FROM customersWHERE state = 'AP'
limits the results to the first 100 rows returned, even if 10,000 rows actually meet the criteria of the WHERE clause. When the specified number of rows is reached, all processing on the query stops, potentially saving SQL Server overhead, and boosting performance.
The TOP operator works by allowing you to specify a specific number of rows to be returned, like the example above, or by specifying a percentage value, like this:
SELECT TOP 10 PERCENT fname, lname FROM customersWHERE state = 'AP’
In the above example, only 10 percent of the available rows would be returned. Keep in mind that using this option may prevent the user from getting the data they need. For example, the data they are looking for may be in record 101, but they only get to see the first 100 records. Because of this, use this option with discretion.
You may have heard of a SET command called SET ROWCOUNT. Like the TOP operator, it is designed to limit how many rows are returned from a SELECT statement. In effect, the SET ROWCOUNT and the TOP operator perform the same function.
While is most cases, using either option works equally efficiently, there are some instances (such as rows returned from an unsorted heap) where the TOP operator is more efficient than using SET ROWCOUNT. Because of this, using the TOP operator is preferable to using SET ROWCOUNT to limit the number of rows returned by a query.
In a WHERE clause, the various operators used directly affect how fast a query is run. This is because some operators lend themselves to speed over other operators. Of course, you may not have any choice of which operator you use in your WHERE clauses, but sometimes you do.
Here are the key operators used in the WHERE clause, ordered by their performance. Those operators at the top will produce results faster than those listed at the bottom.
· >, >=, <, <= · LIKE · <>
This lesson here is to use = as much as possible, and <> as least as possible.
In a WHERE clause, the various operands used directly affect how fast a query is run. This is because some operands lend themselves to speed over other operands. Of course, you may not have any choice of which operand you use in your WHERE clauses, but sometimes you do.
Here are the key operands used in the WHERE clause, ordered by their performance. Those operands at the top will produce results faster than those listed at the bottom.
· A single literal used by itself on one side of an operator
· A single column name used by itself on one side of an operator, a single parameter used by itself on one side of an operator
· A multi-operand expression on one side of an operator
· A single exact number on one side of an operator
· Other numeric number (other than exact), date and time
· Character data, NULLs
The simpler the operand, and using exact numbers, provides the best overall performance.
If a WHERE clause includes multiple expressions, there is generally no performance benefit gained by ordering the various expressions in any particular order. This is because the SQL Server Query Optimizer does this for you, saving you the effort. There are a few exceptions to this, which are discussed on this web site
Don't include code that doesn't do anything. This may sound obvious, but it has been done in some off-the-shelf SQL Server-based applications. For example, you may see code like this:
SELECT column_name FROM table_nameWHERE 1 = 0
When this query is run, no rows will be returned. Obviously, this is a simple example, a query like this (or part of a larger query) like this doesn't perform anything useful, and shouldn't be run. It is just wasting SQL Server resources. In addition, it has been observed in more than one case where such dead code actually causes SQL Server to through errors, preventing the code from even running.
By default, some developers, especially those who have not worked with SQL Server before, routinely include code similar to this in their WHERE clauses when they make string comparisons:
SELECT column_name FROM table_name WHERE LOWER(column_name) = 'name'
In other words, these developers are making the assumption that the data in SQL Server is case-sensitive, which it generally is not. If your SQL Server database is not configured to be case sensitive, you don't need to use LOWER or UPPER to force the case of text to be equal for a comparison to be performed. Just leave these functions out of your code. This will speed up the performance of your query, as any use of text functions in a WHERE clause hurts performance.
But what if your database has been configured to be case-sensitive? Should you then use the LOWER and UPPER functions to ensure that comparisons are properly compared? No. The above example is still poor coding. If you have to deal with ensuring case is consistent for proper comparisons, use the technique described below, along with appropriate indexes on the column in question:
SELECT column_name FROM table_nameWHERE column_name = 'NAME' or column_name = 'name'This code will run much faster than the first example.
Try to avoid WHERE clauses that are non-sargable. The term "sargable" (which is in effect a made-up word) comes from the pseudo-acronym "SARG", which stands for "Search ARGument," which refers to a WHERE clause that compares a column to a constant value. If a WHERE clause is sargable, this means that it can take advantage of an index (assuming one is available) to speed completion of the query. If a WHERE clause is non-sargable, this means that the WHERE clause (or at least part of it) cannot take advantage of an index, instead performing a table/index scan, which may cause the query's performance to suffer.
Non-sargable search arguments in the WHERE clause, such as "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'" generally prevents (but not always) the query optimizer from using an index to perform a search. In addition, expressions that includes a function on a column, expressions that have the same column on both sides of the operator, or comparisons against a column (not a constant), are not sargable. But not every WHERE clause that has a non-sargable expression in it is doomed to a table/index scan. If the WHERE clause includes both sargable and non-sargable clauses, then at least the sargable clauses can use an index (if one exists) to help access the data quickly. In many cases, if there is a covering index on the table, which includes all of the columns in the SELECT, JOIN, and WHERE clauses in a query, then the covering index can be used instead of a table/index scan to return a query's data, even if it has a non-sargable WHERE clause. But keep in mind that covering indexes have their own drawbacks, such as producing very wide indexes that increase disk I/O when they are read. In some cases, it may be possible to rewrite a non-sargable WHERE clause into one that is sargable. For example, the clause: WHERE SUBSTRING(firstname,1,1) = 'm' can be rewritten like this: WHERE firstname like 'm% Both of these WHERE clauses produce the same result, but the first one is non-sargable (it uses a function) and will run slow, while the second one is sargable, and will run much faster. WHERE clauses that perform some function on a column are non-sargable. On the other hand, if you can rewrite the WHERE clause so that the column and function are separate, then the query can use an available index, greatly boosting performance. for example: Function Acts Directly on Column, and Index Cannot Be Used: SELECT member_number, first_name, last_nameFROM membersWHERE DATEDIFF(yy,datofbirth,GETDATE()) > 21 Function Has Been Separated From Column, and an Index Can Be Used: SELECT member_number, first_name, last_nameFROM membersWHERE dateofbirth <> 5
WHERE column_name <= 5 Each of the above clauses produce the same results, but the second one is sargable. If you don't know if a particular WHERE clause is sargable or non-sargable, check out the query's execution plan in Query Analyzer. Doing this, you can very quickly see if the query will be using index lookups or table/index scans to return your results. With some careful analysis, and some clever thought, many non-sargable queries can be written so that they are sargable. Your goal for best performance (assuming it is possible) is to get the left side of a search condition to be a single column name, and the right side an easy to look up value. If you run into a situation where a WHERE clause is not sargable because of the use of a function on the right side of an equality sign (and there is no other way to rewrite the WHERE clause), consider creating an index on a computed column instead. This way, you avoid the non-sargable WHERE clause altogether, using the results of the function in your WHERE clause instead. Because of the additional overhead required for indexes on computed columns, you will only want to do this if you need to run this same query over and over in your application, thereby justifying the overhead of the indexed computed column. If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, instead try to use one of the following options instead, all of which offer better performance: · Use EXISTS or NOT EXISTS · Use IN · Perform a LEFT OUTER JOIN and check for a NULL condition When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will generally want to use the EXISTS clause, as it is usually more efficient and performs faster. If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index. For example SELECT * FROM tblTaskProcesses WHERE nextprocess = 1 AND processid IN (8,32,45) takes about 3 seconds, while: SELECT * FROM tblTaskProcesses (INDEX = IX_ProcessID) WHERE nextprocess = 1 AND processid IN (8,32,45) returns in under a second. If you use LIKE in your WHERE clause, try to use one or more leading character in the clause, if at all possible. For example, use: LIKE 'm%' not: LIKE '%m' If you use a leading character in your LIKE clause, then the Query Optimizer has the ability to potentially use an index to perform the query, speeding performance and reducing the load on SQL Server. But if the leading character in a LIKE clause is a wildcard, the Query Optimizer will not be able to use an index, and a table scan must be run, reducing performance and taking more time. The more leading characters you can use in the LIKE clause, the more likely the Query Optimizer will find and use a suitable index. If your application needs to retrieve summary data often, but you don't want to have the overhead of calculating it every time it is needed, consider using a trigger that updates summary values after each transaction into a summary table. While the trigger has some overhead, overall, it may be less that having to calculate the data every time the summary data is needed. You may have to experiment to see which methods are fastest for your environment. If your application needs to insert a large binary value into an image data column, perform this task using a stored procedure, not using an INSERT statement embedded in your application. The reason for this is because the application must first convert the binary value into a character string (which doubles its size, thus increasing network traffic and taking more time) before it can be sent to the server. And when the server receives the character string, it then has to convert it back to the binary format (taking even more time). Using a stored procedure avoids all this because all the activity occurs on the SQL Server, and little data is transmitted over the network. When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example: SELECT customer_number, customer_nameFROM customerWHERE customer_number in (1000, 1001, 1002, 1003, 1004) is much less efficient than this: SELECT customer_number, customer_nameFROM customerWHERE customer_number BETWEEN 1000 and 1004 Assuming there is a useful index on customer_number, the Query Optimizer can locate a range of numbers much faster (using BETWEEN) than it can find a series of numbers using the IN clause (which is really just another form of the OR clause). If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching for, then a table scan is performed. If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead, for better performance. Instead of doing this: WHERE SUBSTRING(column_name,1,1) = 'b' Try using this instead: WHERE column_name LIKE 'b%' If you decide to make this choice, keep in mind that you will want your LIKE condition to be sargable, which means that you cannot place a wildcard in the first position. Where possible, avoid string concatenation in your Transact-SQL code, as it is not a fast process, contributing to overall slower performance of your application. Generally, avoid using optimizer hints in your queries. This is because it is generally very hard to outguess the Query Optimizer. Optimizer hints are special keywords that you include with your query to force how the Query Optimizer runs. If you decide to include a hint in a query, this forces the Query Optimizer to become static, preventing the Query Optimizer from dynamically adapting to the current environment for the given query. More often than not, this hurts, not helps performance. If you think that a hint might be necessary to optimize your query, be sure you first do the entire following first: · Update the statistics on the relevant tables. · If the problem query is inside a stored procedure, recompile it. · Review the search arguments to see if they are sargable, and if not, try to rewrite them so that they are sargable. · Review the current indexes, and make changes if necessary. If you have done all of the above, and the query is not running as you expect, then you may want to consider using an appropriate optimizer hint. If you decided on usingsome hints, keep in mind that as your data changes, and as the Query Optimizer changes (through service packs and new releases of SQL Server), your hard-coded hints may no longer offer the benefits they once did. So if you use hints, you need to periodically review them to see if they are still performing as expected. If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parenthesis have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND: · Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time. · If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression. You may want to consider using Query Analyzer to look at the execution plans of your queries to see which is best for your situation. If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following: · Of the search criterions in the WHERE clause, at least one of them should be based on a highly selective column that has an index. · If at least one of the search criterions in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause. · If none of the column in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query. The Query Optimizer will perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or does not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has a useful index. A query with one or more OR clauses can sometimes be rewritten as a series of queries that are combined with a UNION ALL statement, in order to boost the performance of the query. For example, let's take a look at the following query: SELECT employeeID, firstname, lastname FROM names WHERE dept = 'prod' or city = 'Orlando' or division = 'food' This query has three separate conditions in the WHERE clause. In order for this query to use an index, then there must be an index on all three columns found in the WHERE clause. This same query can be written using UNION ALL instead of OR, like this example: SELECT employeeID, firstname, lastname FROM names WHERE dept = 'prod' UNION ALLSELECT employeeID, firstname, lastname FROM names WHERE city ='Orlando' UNION ALLSELECT employeeID, firstname, lastname FROM names WHERE division ='food' Each of these queries will produce the same results. If there is only an index on dept, but not the other columns in the WHERE clause, then the first version will not use any index and a table scan must be performed. But in the second version of the query will use the index for part of the query, but not for the entire query. Admittedly, this is a very simple example, but even so, it does demonstrate how rewriting a query can affect whether or not an index is used or not. If this query was much more complex, then the approach of using UNION ALL might be must more efficient, as it allows you to tune each part of the index individually, something that cannot be done if you use only ORs in your query. Note, that we are using UNION ALL instead of UNION. The reason for this is to prevent the UNION statement from trying to sort the data and remove duplicates, which hurts performance. Of course, if there is the possibility of duplicates, and you want to remove them, then of course you can use just UNION. If you have a query that uses ORs and it not making the best use of indexes, consider rewriting it as a UNION ALL, and then testing performance. Only through testing can you be sure that one version of your query will be faster than another. Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server. In other cases, perhaps the client doesn't even need sorted data to achieve its goal. The key here is to remember that you shouldn't automatically sort data, unless you know it is necessary. Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed: · ORDER BY · GROUP BY · SELECT DISTINCT · UNION · CREATE INDEX (generally not as critical as happens much less often) In many cases, these commands cannot be avoided. On the other hand, there are few ways that sorting overhead can be reduced. These include: · Keep the number of rows to be sorted to a minimum. Do this by only returning those rows that absolutely need to be sorted. · Keep the number of columns to be sorted to the minimum. In other words, don't sort more columns that required. · Keep the width (physical size) of the columns to be sorted to a minimum. · Sort column with number datatypes instead of character datatypes.When using any of the above Transact-SQL commands, try to keep the above performance-boosting suggestions in mind. If you have to sort by a particular column often, consider making that column a clustered index. This is because the data is already presorted for you and SQL Server is smart enough not to resort the data. If your SELECT statement includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found values are placed at the first of the list, and the less frequently found values are placed at the end of the list. This can speed performance because the IN option returns true as soon as any of the values in the list produce a match. The sooner the match is made, the faster the query completes. If you need to use the SELECT INTO option, keep in mind that it can lock system tables, preventing others users from accessing the data they need. If you do need to use SELECT INTO, try to schedule it when your SQL Server is less busy, and try to keep the amount of data inserted to a minimum. If your SELECT statement contains a HAVING clause, write your query so that the WHERE clause does most of the work (removing undesired rows) instead of the HAVING clause do the work of removing undesired rows. Using the WHERE clause appropriately can eliminate unnecessary rows before they get to the GROUP BY and HAVING clause, saving some unnecessary work, and boosting performance. For example, in a SELECT statement with WHERE, GROUP BY, and HAVING clauses, here's what happens. First, the WHERE clause is used to select the appropriate rows that need to be grouped. Next, the GROUP BY clause divides the rows into sets of grouped rows, and then aggregates their values. And last, the HAVING clause then eliminates undesired aggregated groups. If the WHERE clause is used to eliminate as many of the undesired rows as possible, this means the GROUP BY and the HAVING clauses will have less work to do, boosting the overall performance of the query. If your application performs many wildcard (LIKE %) text searches on CHAR or VARCHAR columns, consider using SQL Server's full-text search option. The Search Service can significantly speed up wildcard searches of text stored in a database. The GROUP BY clause can be used with or without an aggregate function. But if you want optimum performance, don't use the GROUP BY clause without an aggregate function. This is because you can accomplish the same end result by using the DISTINCT option instead, and it is faster. For example, you could write your query two different ways: USE NorthwindSELECT OrderIDFROM [Order Details]WHERE UnitPrice > 10GROUP BY OrderID
USE NorthwindSELECT DISTINCT OrderIDFROM [Order Details]WHERE UnitPrice > 10
Both of the above queries produce the same results, but the second one will use fewer resources and perform faster.
The GROUP BY clause can be sped up if you follow these suggestions:
· Keep the number of rows returned by the query as small as possible.
· Keep the number of groupings as few as possible.
· Don't group redundant columns.
· If there is a JOIN in the same SELECT statement that has a GROUP BY, try to rewrite the query to use a subquery instead of using a JOIN. If this is possible, performance will be faster. If you have to use a JOIN, try to make the GROUP BY column from the same table as the column or columns on which the set function is used.
Consider adding an ORDER BY clause to the SELECT statement that orders by the same column as the GROUP BY. This may cause the GROUP BY to perform faster. Test this to see if is true in your particular situation.
Sometimes perception is more important that reality. For example, which of the following two queries is the fastest?
· A query that takes 30 seconds to run, and then displays all of the required results.
· A query that takes 60 seconds to run, but displays the first screen full of records in less than 1 second.
Most DBAs would choose the first option as it takes less server resources and performs faster. But from many user's point-of-view, the second one may be more palatable. By getting immediate feedback, the user gets the impression that the application is fast, even though in the background, it is not.
If you run into situations where perception is more important than raw performance, consider using the FAST query hint. The FAST query hint is used with the SELECT statement using this form:
where number_of_rows is the number of rows that are to be displayed as fast as possible.When this hint is added to a SELECT statement, it tells the Query Optimizer to return the specified number of rows as fast as possible, without regard to how long it will take to perform the overall query. Before rolling out an application using this hint, it is best to test it thoroughly to see that it performs as you expect. You may find out that the query may take about the same amount of time whether the hint is used or not. If this is the case, then don't use the hint.
Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, we can reduce I/O and boost our application's performance.
SQL Server 2000 offers a new data type called "table." Its main purpose is for the temporary storage of a set of rows. A variable, of type "table," behaves as if it is a local variable. And like local variables, it has a limited scope, which is within the batch, function, or stored procedure in which it was declared. In most cases, a table variable can be used like a normal table. SELECTs, INSERTs, UPDATEs, and DELETEs can all be made against a table variable.
For best performance, if you need a temporary table in your Transact-SQL code, try to use a table variable instead of creating a conventional temporary table instead. Table variables are created and manipulated in memory instead of the tempdb database, making them much faster. In addition, table variables found in stored procedures result in fewer compilations (than when using temporary tables), and transactions using table variables only last as long as the duration of an update on the table variable, requiring less locking and logging resources.
It is fairly common request to write a Transact-SQL query to to compare a parent table and a child table and find out if there are any parent records that don't have a match in the child table. Generally, there are three ways this can be done:
Using a NOT EXISTS
SELECT a.hdr_keyFROM hdr_tbl aWHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key)
Using a Left Join
SELECT a.hdr_keyFROM hdr_tbl aLEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_keyWHERE b.hdr_key IS NULL
Using a NOT IN
SELECT hdr_keyFROM hdr_tblWHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)
In each case, the above query will return identical results. But, which of these three variations of the same query produces the best performance? Assuming everything else is equal the best performing version through the worst performing version will be from top to bottom, as displayed above. In other words, the NOT EXISTS variation of this query is generally the most efficient.
Generally, because the indexes found on the tables, along with the number of rows in each table, can influence the results. If you are not sure which variation to try yourself, you can try them all and see which produces the best results in your particular circumstances.
Be careful when using OR in your WHERE clause, it is fairly simple to accidentally retrieve much more data than you need, which hurts performance. For example, take a look at the query below:
SELECT companyid, plantid, formulaidFROM batchrecordsWHERE companyid = '0001' and plantid = '0202' and formulaid = '39988773'
ORcompanyid = '0001' and plantid = '0202'
As you can see from this query, the WHERE clause is redundant, as:
companyid = '0001' and plantid = '0202' and formulaid = '39988773'
is a subset of:
companyid = '0001' and plantid = '0202'
In other words, this query is redundant. Unfortuantely, the SQL Server Query Optimizer isn't smart enough to know this, and will do exactly what you tell it to. What will happen is that SQL Server will have to retrieve all the data you have requested, then in effect do a SELECT DISTINCT to remove redundant rows it unnecessarily finds.
In this case, if you drop this code from the query:
companyid = '0001' and plantid = '0202'
then run the query, you will receive the same results, but with much faster performance.
If you need to verify the existence of a record in a table, don't use SELECT COUNT(*) in your Transact-SQL code to identify it, which is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXITS to determine if the record in question exits, which is much more efficient.
Here's how you might use COUNT(*):
IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx')
Here's a faster way, using IF EXISTS:
IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')
The reason IF EXISTS is faster than COUNT(*) is because the query can end immediately when the text is proven true, while COUNT(*) must count go through every record, whether there is only one, or thousands, before it can be found to be true.
Let's say that you often need to INSERT the same value into a column. For example, perhaps you have to perform 100,000 INSERTs a day into a particular table, and that 90% of the time the data INSERTed into one of the columns of the table is the same value.
If this the case, you can reduce network traffic (along with some SQL Server overhead) by creating this particular column with a default value of the most common value. This way, when you INSERT your data, and the data is the default value, you don't INSERT any data into this column, instead allowing the default value to automatically be filled in for you. But when the value needs to be different, you will of course INSERT that value
Into the column.
Performing UPDATES takes extra resources for SQL Server to perform. When performing an UPDATE, try to do as many of the following recommendations as you can in order to reduce the amount of resources required to perform an UPDATE. The more of the following suggestions you can do, the faster the UPDATE will perform.
· If you are Updating a column of a row that has an unique index, try to only update one row at a time.
· Try not to change the value of a column that is also the primary key.
· When updating VARCHAR columns, try to replace the contents with contents of the same length.
· Try to minimize the Updating of tables that have UPDATE triggers.
· Try to avoid Updating columns that will be replicated to other databases.
· Try to avoid Updating heavily indexed columns.
· Try to avoid Updating a column that has a reference in the WHERE clause to the column being updated.
Of course, you may have very little choice when Updating your data, but at least give the above suggestions a thought.
If you have created a complex transaction that includes several parts, one part of which has a higher probability of rolling back the transaction than the others, better performance will be provided if you locate the most likely to fail part of the transaction at the front of the greater transaction. This way, if this more-likely-to-fail transaction has to roll back because of a failure, there have been no resources wasted on the other less-likely-to-fail transactions.
Database Performance Tuning requires skill and a lot of patience. Resolving a performance issue may take several weeks and eventually may be resolved by something as simple as changing a where clause, configuration setting or a faulty hardware component or adding or removing an index. The best way to troubleshoot a performance issue therefore is to follow the basics guidelines and make changes in a phase wise manner and record the impact of those changes at each stage before finally arriving at an acceptable performance level.