Friday, March 2, 2012

Performance : SQL Server Performance Tips

Checklist: SQL Server Performance

This checklist is a companion to Chapter 14, "Improving SQL Server Performance"

SQL: Scale Up vs. Scale Out

  • Optimize the application before scaling up or scaling out.
  • Address historical and reporting data.
  • Scale up for most applications.
  • Scale out when scaling up does not suffice or is cost-prohibitive.

Schema

  • Devote the appropriate resources to schema design.
  • Separate online analytical processing (OLAP) and online transaction processing (OLTP) workloads.
  • Normalize first, denormalize later for performance.
  • Define all primary keys and foreign key relationships.
  • Define all unique constraints and check constraints.
  • Choose the most appropriate data type.
  • Use indexed views for denormalization.
  • Partition tables vertically and horizontally.

Queries

  • Know the performance and scalability characteristics of queries.
  • Write correctly formed queries.
  • Return only the rows and columns needed.
  • Avoid expensive operators such as NOT LIKE.
  • Avoid explicit or implicit functions in WHERE clauses.
  • Use locking and isolation level hints to minimize locking.
  • Use stored procedures or parameterized queries.
  • Minimize cursor use.
  • Avoid long actions in triggers.
  • Use temporary tables and table variables appropriately.
  • Limit query and index hint use.
  • Fully qualify database objects.

Indexes

  • Create indexes based on use.
  • Keep clustered index keys as small as possible.
  • Consider range data for clustered indexes.
  • Create an index on all foreign keys.
  • Create highly selective indexes.
  • Create a covering index for often-used, high-impact queries.
  • Use multiple narrow indexes rather than a few wide indexes.
  • Create composite indexes with the most restrictive column first.
  • Consider indexes on columns used in WHERE, ORDER BY, GROUP BY, and DISTINCT clauses.
  • Remove unused indexes.
  • Use the Index Tuning Wizard.

Transactions

  • Avoid long-running transactions.
  • Avoid transactions that require user input to commit.
  • Access heavily used data at the end of the transaction.
  • Try to access resources in the same order.
  • Use isolation level hints to minimize locking.
  • Ensure that explicit transactions commit or roll back.

Stored Procedures

  • Use SET NOCOUNT ON in stored procedures.
  • Do not use the sp_ prefix for custom stored procedures.

Execution Plans

  • Evaluate the query execution plan.
  • Avoid table and index scans.
  • Evaluate hash joins.
  • Evaluate bookmarks.
  • Evaluate sorts and filters.
  • Compare actual versus estimated rows and executions.

Execution Plan Recompiles

  • Use stored procedures or parameterized queries.
  • Use sp_executesql for dynamic code.
  • Avoid interleaving data definition language (DDL) and data manipulation language (DML) in stored procedures, including the tempdb database DDL.
  • Avoid cursors over temporary tables.

SQL XML

  • Avoid OPENXML over large XML documents.
  • Avoid large numbers of concurrent OPENXML statements over XML documents.

Tuning

  • Use SQL Profiler to identify long-running queries.
  • Take note of small queries called often.
  • Use sp_lock and sp_who2 to evaluate locking and blocking.
  • Evaluate waittype and waittime in master..sysprocesses.
  • Use DBCC OPENTRAN to locate long-running transactions.

Testing

  • Ensure that your transaction logs do not fill up.
  • Budget your database growth.
  • Use tools to populate data.
  • Use existing production data.
  • Use common user scenarios, with appropriate balances between reads and writes.
  • Use testing tools to perform stress and load tests on the system.

Monitoring

  • Keep statistics up to date.
  • Use SQL Profiler to tune long-running queries.
  • Use SQL Profiler to monitor table and index scans.
  • Use Performance Monitor to monitor high resource usage.
  • Set up an operations and development feedback loop.

Deployment Considerations

  • Use default server configuration settings for most applications.
  • Locate logs and the tempdb database on separate devices from the data.
  • Provide separate devices for heavily accessed tables and indexes.
  • Use the correct RAID configuration.
  • Use multiple disk controllers.
  • Pre-grow databases and logs to avoid the performance impact of automatic growth and fragmentation.
  • Maximize available memory.
  • Manage index fragmentation.
  • Keep database administrator tasks in mind.
patterns & practices Developer Center

 



How to Optimize Your Transact-SQL Code
Transact-SQL, just like any programming language, offers more than one way to perform many tasks, and as you might imagine, some techniques offer better performance than others. In this section you will learn some of the "tricks of the trade" for writing high-performing Transact-SQL code.

Choose the Appropriate Data Types
While you might think that this topic should be under database design, I have decided to discuss it here because Transact-SQL is used to create the physical tables that were designed during the earlier database design stage.
Choosing the appropriate data types can affect how quickly SQL Server can SELECT, INSERT, UPDATE, and DELETE data, and the optimal data type is not always obvious. Here are some suggestions you should follow when creating physical SQL Server tables to help ensure optimum performance.
  • Always choose the smallest data type that can hold the data you need to store in a column. For example, if all you are going to store in a column are the numbers 1 through 10, then the TINYINT data type is more appropriate than the INT data type. The same goes for CHAR and VARCHAR data types: don't specify more characters for character columns than you need. This allows SQL Server to store more rows in its data and index pages, reducing the amount of I/O needed to read them. It also reduces the amount of data moved from the server to the client, reducing network traffic and latency.
  • If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. Although the VARCHAR data type has slightly more overhead than the CHAR data type, the amount of space saved by using VARCHAR over CHAR on variable length columns can reduce I/O, improving overall SQL Server performance.
  • Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O overhead.
  • If you need to store large strings of data, and they are less than 8,000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drags down performance.
  • If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than a character data type does. This helps to reduce the size of the columns, and can boost performance when the column is searched (WHERE clause) or joined to another column.
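As a sketch of these suggestions, a table definition might choose its types like this (the table and column names are invented for the example):

```sql
-- Hypothetical table illustrating compact data type choices.
CREATE TABLE dbo.OrderStatus
(
    StatusID    TINYINT      NOT NULL PRIMARY KEY, -- values 1-10 fit in one byte; INT would waste three
    StatusCode  CHAR(2)      NOT NULL,             -- short, fixed-length code: CHAR is fine here
    Description VARCHAR(100) NOT NULL,             -- variable-length text: VARCHAR saves space over CHAR(100)
    SortOrder   SMALLINT     NOT NULL              -- numbers belong in a numeric type, not CHAR
);
```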

Use Triggers Cautiously
Triggers can be a powerful tool in Transact-SQL, but since they execute every time a table is the target of an INSERT, UPDATE, or DELETE (depending on how the trigger is created), they can produce a lot of overhead. Here are some tips on how to optimize trigger performance.
  • Keep the code in your triggers to the very minimum to reduce overhead. The more code that runs in the trigger, the slower each INSERT, UPDATE, and DELETE that fires it will be.
  • Don't use triggers to perform tasks that can be performed using more efficient techniques. For example, don't use a trigger to enforce referential integrity if SQL Server's built-in referential integrity is available to accomplish your goal. The same goes if you have a choice between a trigger and a CHECK constraint to enforce rules or defaults: you will generally want to choose the CHECK constraint, as it is faster than a trigger performing the same task.
  • Try to avoid triggers that roll back transactions, because of the overhead involved. Instead of letting the trigger find a problem and roll back the transaction, catch the error before it can get to the trigger (if possible based on your code). Catching an error early, before the trigger fires, consumes fewer server resources than letting the trigger roll back.
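For example, a simple rule that could be written as a trigger is usually better expressed as a CHECK constraint (the table and column names below are hypothetical):

```sql
-- Instead of an AFTER trigger that inspects inserted rows and rolls the
-- transaction back when Quantity is invalid, reject the bad value up front:
ALTER TABLE dbo.OrderDetails
ADD CONSTRAINT CK_OrderDetails_Quantity CHECK (Quantity > 0);
```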
Don't Access More Data Than You Need
While this suggestion may sound obvious, it must not be, because this is a common performance-related issue I find over and over again in many SQL Server-based applications. Here are some ideas on how to minimize the amount of data that is returned to the client.
  • Don't return more columns or rows of data to the client than absolutely necessary; this just increases disk I/O on the server and network traffic, both of which hurt performance. In SELECT statements, don't use SELECT * to return rows; always specify in your SELECT statement exactly which columns need to be returned for this particular query, and not a column more. In most cases, be sure to include a WHERE clause to reduce the rows sent to only those the client needs to perform the task immediately at hand.
  • If your application allows users to run queries, but you cannot easily prevent users from returning hundreds or even thousands of unnecessary rows of data they don't need, consider using the TOP operator within the SELECT statement. This way, you can limit how many rows are returned, even if the user doesn't enter any criteria to help reduce the number of rows returned to the client.
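For instance, a sketch of capping an open-ended user query with TOP (the table and columns are hypothetical):

```sql
-- Return at most 100 rows even if the user supplies no filtering criteria.
SELECT TOP 100 OrderID, OrderDate, CustomerID
FROM dbo.Orders
ORDER BY OrderDate DESC;
```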
Avoid Using Cursors
Transact-SQL is designed to work best on result sets, not on individual records. That's where cursors come into play. They allow you to process individual records. The only problem with individual record processing is that it is slow. Ideally, for high-performing SQL Server-based applications, cursors should be avoided.
If you need to perform row-by-row operations, try to find another method to perform the task. Some options are to perform row-by-row tasks at the client instead of the server, using tempdb tables at the server, or using a correlated sub-query.
Unfortunately, these alternatives are not always possible, and sometimes you have to use a cursor. If you find it impossible to avoid cursors in your applications, then perhaps one of these suggestions will help.
  • SQL Server offers you several different types of cursors, each with its different performance characteristics. Always select the cursor with the least amount of overhead that has the features you need to accomplish your goals. The most efficient cursor you can choose is the fast forward-only cursor.
  • When using a server-side cursor, always try to fetch as small a result set as possible. This includes fetching only those rows and columns the client needs immediately. The smaller the cursor, no matter what type of server-side cursor it is, the fewer resources it will use, and performance will benefit.
  • When you are done using a cursor, don't just CLOSE it; you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don't DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released.
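Putting these suggestions together, a minimal cursor skeleton might look like this (the table and column names are invented for the example):

```sql
DECLARE @OrderID INT;

-- The fast forward-only cursor has the least overhead, and the SELECT
-- fetches only the rows and columns actually needed.
DECLARE order_cursor CURSOR FAST_FORWARD FOR
    SELECT OrderID FROM dbo.Orders WHERE Shipped = 0;

OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderID;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- row-by-row work goes here
    FETCH NEXT FROM order_cursor INTO @OrderID;
END;

CLOSE order_cursor;      -- releases the locks
DEALLOCATE order_cursor; -- releases the cursor's resources
```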
Use Joins Appropriately
Table joins can be a big contributor of performance problems, especially if the joins include more than two tables, or if the tables are very large. Unfortunately, joins are a fact of life in relational databases. Because they are so common, you will need to take extra time to help ensure that your joins are as optimal as possible. Here are some tips to help.
  • If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then consider adding compact surrogate keys to the tables in order to reduce the size of the keys, decreasing read I/O during the join process and increasing overall performance. You will learn more about indexing in the next section of this article.
  • For best performance, the columns used in joins should be of the same data type, and if possible, they should be numeric data types rather than character types.
  • Avoid joining tables based on columns with few unique values. If columns used for joining aren't mostly unique, then the SQL Server optimizer will perform a table scan for the join, even if an index exists on the columns. For best performance, joins should be done on columns that have unique indexes.
  • If you have to regularly join four or more tables to get the recordset you need, consider denormalizing the tables so that the number of joined tables is reduced. Often, by adding one or two columns from one table to another, joins can be reduced.
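As an illustration of the first two tips, the join columns below are matching INT surrogate keys, and the foreign key side carries an index (all names are hypothetical):

```sql
-- Index the foreign key column used in the join.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID);

-- Both join columns are compact, identically typed integers.
SELECT c.CompanyName, o.OrderDate
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;
```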
Encapsulate Your Code in Stored Procedures
Virtually all of the Transact-SQL used in your SQL Server-based applications should be encapsulated in stored procedures, not run as dynamic SQL or scripts. This not only reduces network traffic (only the EXECUTE or CALL is issued over the network between the client and SQL Server), but it also speeds up the Transact-SQL, because the stored procedure's query plan is compiled once and then cached on the server. Here are a couple of things to keep in mind when writing stored procedures for optimal performance.
When a stored procedure is first executed (and it does not have the WITH RECOMPILE option specified), it is optimized and a query plan is compiled and cached in SQL Server's memory. If the same stored procedure is called again, it will use the cached query plan instead of creating a new one, saving time and boosting performance. This may or may not be what you want. If the query in the stored procedure is the same each time, then this is a good thing. But if the query is dynamic (the WHERE clause changes substantially from one execution of the stored procedure to the next), then this is a bad thing, as the query will not be optimized when it is run, and the performance of the query can suffer.
If you know that your query will vary each time it is run from the stored procedure, you will want to add the WITH RECOMPILE option when you create the stored procedure. This will force the stored procedure to be re-compiled each time it is run, ensuring the query is optimized each time it is run.
Always include the statement "SET NOCOUNT ON" in your stored procedures. If you don't turn this option on, then every time a SQL statement is executed, SQL Server will send the client a message indicating the number of rows affected by the statement. It is rare that the client will ever need this information, so suppressing it helps reduce the traffic between the server and the client.
Deadlocking can occur within a stored procedure when two user processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process has. When this happens, SQL Server ends the deadlock by automatically choosing one and aborting the process, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process.
To help avoid deadlocking in your SQL Server application, try to design your application using these suggestions:
  • Have the application access server objects in the same order each time.
  • Don't allow any user input during transactions; collect it before the transaction begins.
  • Keep transactions short and within a single batch.
  • If appropriate, use as low an isolation level as possible for the user connection running the transaction.
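A stored procedure sketch pulling these points together (the procedure, table, and column names are invented for the example):

```sql
CREATE PROCEDURE dbo.ShipOrder  -- no sp_ prefix for a user procedure
    @OrderID INT
AS
BEGIN
    SET NOCOUNT ON;  -- don't send "rows affected" messages to the client

    -- Short transaction, no user input inside it, explicit commit.
    BEGIN TRANSACTION;
        UPDATE dbo.Orders
        SET Shipped = 1,
            ShippedDate = GETDATE()
        WHERE OrderID = @OrderID;
    COMMIT TRANSACTION;
END;
```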
How to Select Indexes for Optimal Database Performance
Index selection is a mystery for many SQL Server DBAs and developers. Sure, we know what they do and how they boost performance. The problem often is how to select the ideal type of index (clustered vs. non-clustered), the number of columns to index (do I need multi-column indexes?), and which columns should be indexed.
In this section we will take a brief look at how to answer the above questions. Unfortunately, there is no absolute answer for every occasion. Like much of SQL Server performance tuning and optimization, you may have to do some experimenting to find the ideal indexes. So let's begin by looking at some general index creation guidelines, and then we will take a more detailed look at selecting clustered and non-clustered indexes.
Is There Such a Thing as Too Many Indexes?
Yes. Some people think that all you have to do is index everything, and then all of your performance issues will go away. It doesn't work that way. Just as an index can speed data access, it can also degrade access if it is inappropriately selected. The problem with extra indexes is that SQL Server must maintain them every time a record is INSERTED, UPDATED, or DELETED in a table. While maintaining one or two indexes on a table is not too much overhead for SQL Server to deal with, if you have four, five, or more indexes, they can be a large performance burden on tables. Ideally, you want to have as few indexes as you can. It is often a balancing act to select the ideal number of indexes for a table in order to find optimal performance.
As a general rule of thumb, don't automatically add indexes to a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table. If you don't know what queries will be run against your table, then don't add any indexes until you know for sure. It is too easy to make a guess on what queries will be run, create indexes, and then later find out your guesses were wrong. You must know the type of queries that will be run against your data, and then these need to be analyzed to determine the most appropriate indexes, and then the indexes must be created and tested to see if they really help or not.
The problem of selecting optimal indexes is often difficult for OLTP applications because they tend to experience high levels of INSERT, UPDATE, and DELETE activity. While you need good indexes to quickly locate records that need to be SELECTED, UPDATED, or DELETED, you don't want every INSERT, UPDATE, or DELETE to result in too much overhead because you have too many indexes. On the other hand, if you have an OLAP application that is virtually read-only, then adding as many indexes as you need is not a problem because you don't have to worry about INSERT, UPDATE, or DELETE activity. As you can see, how your application is used makes a large difference in your indexing strategy.
Another thing to think about when selecting indexes is that the SQL Server Query Optimizer may not use the indexes you select. If the Query Optimizer chooses not to use your indexes, then they are a burden on SQL Server and should be deleted. So how come the SQL Server Query Optimizer won't always use an index if one is available?
This is too large a question to answer in detail here, but suffice it to say, sometimes it is faster for SQL Server to perform a table scan than it is to use an available index to access data in the table. Two reasons this may happen are that the table is small (not many rows), or that the column that was indexed isn't at least 95% unique. How do you know if SQL Server won't use the indexes you create? We will answer this question when we take a look at how to use the SQL Server Query Analyzer later in this article.
Tips for Selecting a Clustered Index
Since you can only create one clustered index per table, take extra time to carefully consider how it will be used. Consider the type of queries that will be used against the table, and make an educated guess as to which query is the most critical, and if this query will benefit from having a clustered index.
In general, use these rules of thumb when selecting a column for a possible clustered index.
  • The primary key you select for your table should not always be a clustered index. If you create the primary key and don't specify otherwise, then SQL Server automatically makes the primary key a clustered index. Only make the primary key a clustered index if it meets one of the following recommendations.
  • Clustered indexes are ideal for queries that select by a range of values or where you need sorted results. This is because the data is already presorted in the index for you. Examples of this include when you are using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in your queries.
  • Clustered indexes are good for queries that look up a record with a unique value (such as an employee number) and when you need to retrieve most or all of the data in the record. This is because the query is covered by the index.
  • Clustered indexes are good for queries that access columns with a limited number of distinct values, such as a column that holds country or state data. But if the column data has little distinctiveness, such as a yes/no or male/female column, then the column should not be indexed at all.
  • Clustered indexes are good for queries that use the JOIN or GROUP BY clauses.
  • Clustered indexes are good for queries that return a lot of rows, not just a few. This is because the data is in the index and does not have to be looked up elsewhere.
  • Avoid putting a clustered index on columns that increment, such as an identity, date, or similarly incrementing columns, if your table is subject to a high level of INSERTS. Since clustered indexes force the data to be physically ordered, a clustered index on an incrementing column forces new data to be inserted at the same page in the table, creating a table hot spot, which can create disk I/O bottlenecks. Ideally, find another column or columns to become your clustered index.
What can be frustrating about the above advice is that there might be more than one column that should be clustered. But as we know, we can only have one clustered index per table. What you have to do is evaluate all the possibilities (assuming more than one column is a good candidate for a clustered index) and then select the one that provides the best overall benefit.
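For example, on a hypothetical high-INSERT Orders table, one option along these lines is to keep the identity column as a non-clustered primary key and cluster instead on a column that serves range queries:

```sql
-- The identity key stays unique but non-clustered...
ALTER TABLE dbo.Orders
ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID);

-- ...while the clustered index serves range queries on CustomerID.
CREATE CLUSTERED INDEX CIX_Orders_CustomerID
ON dbo.Orders (CustomerID);
```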

Tips for Selecting Non-Clustered Indexes
Selecting non-clustered indexes is somewhat easier than selecting clustered indexes because you can create as many as are appropriate for your table. Here are some tips for selecting which columns in your tables might be helped by adding non-clustered indexes.
  • Non-clustered indexes are best for queries that return few rows (including just one row) and where the index has good selectivity (above 95%).
  • If a column in a table is not at least 95% unique, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column. Because of this, don't add non-clustered indexes to columns that aren't at least 95% unique. For example, a column with "yes" or "no" as the data won't be at least 95% unique.
  • Keep the "width" of your indexes as narrow as possible, especially when creating composite (multi-column) indexes. This reduces the size of the index and reduces the number of reads required to read the index, boosting performance.
  • If possible, try to create indexes on columns that have integer values instead of characters. Integer values have less overhead than character values.
  • If you know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table. A covering index includes all of the columns referenced in the query. Because of this, the index contains the data you are looking for and SQL Server doesn't have to look up the actual data in the table, reducing logical and/or physical I/O. On the other hand, if the index gets too big (too many columns), this can increase I/O and degrade performance.
  • An index is only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as "City, State", then a query such as "WHERE City = 'Houston'" will use the index, but the query "WHERE STATE = 'TX'" will not use the index.
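The leftmost-column rule from the last tip, sketched on a hypothetical Customers table:

```sql
CREATE NONCLUSTERED INDEX IX_Customers_City_State
ON dbo.Customers (City, State);

-- Can use the index: City is the leftmost indexed column.
SELECT CustomerID FROM dbo.Customers WHERE City = 'Houston';

-- Cannot seek on this index: State alone is not leftmost.
SELECT CustomerID FROM dbo.Customers WHERE State = 'TX';
```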
Generally, if a table needs only one index, make it a clustered index. If a table needs more than one index, then you have no choice but to use non-clustered indexes. By following the above recommendations, you will be well on your way to selecting the optimum indexes for your tables.
