Monthly Archives: August 2011

Stored Procedures in the procedure cache

I’ve written about auto-parameterization, forced parameterization, the ‘optimize for ad-hoc workloads’ option and sp_exectutesql.   Before pulling all of these ideas together to discuss query plan reuse there is another very important component that I haven’t talked about. That is stored procedures.  Like parameterized queries whose execution plans get stored in the plan cache and can be reused by similar queries, the execution plans for stored procedures also get stored in the plan cache.  From a plan cache use perspective stored procedures provide some benefits over parameterized queries.  They also however are different in some very important and nuanced ways.  The point of this post isn’t to examine all of the pros and cons of stored procedures but to look at how they and the individual statements in them interact with the plan cache.

First I need to create a table with some example data.

IF object_id(‘dbo.Skew’) IS NOT NULL
DROP TABLE dbo.Skew
CREATE TABLE dbo.Skew
(
Col1 int,
Col2 char(2)
)
GO

DECLARE @c int
SET @c = 0
WHILE @c < 1000
BEGIN
INSERT INTO Skew (Col1,Col2) VALUES (@c,’aa’)
SET @c = @c + 1
END
GO

INSERT INTO Skew (Col1,Col2) VALUES (1000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (2000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (3000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (4000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (5000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (6000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (7000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (8000,’aa’)
GO 1000

INSERT INTO Skew (Col1,Col2) VALUES (9000,’aa’)
GO 1000

CREATE INDEX SkewIndex ON dbo.Skew (Col1)

This data is skewed so different search arguments will produce different query plans. The purpose of this is to show how, like other parameterized queries, a stored procedure can produce a query plan that may not be the best for subsequent queries with different search arguments.
The query I will be using for most of my examples:

CREATE PROC ExampleProc (@Col1 int, @Col2 nchar(2))
AS
SELECT Col2 FROM Skew where Col1 = @Col1
GO

And this can be used to look at the procedure cache:

select t.text, cp.bucketid, cp.usecounts, cp.size_in_bytes, cp.cacheobjtype, cp.objtype, qp.query_plan
from sys.dm_exec_cached_plans cp
outer apply sys.dm_exec_query_plan(cp.plan_handle) qp
outer apply sys.dm_exec_sql_text(cp.plan_handle) t
where (t.dbid < 32767 or t.dbid is null)
order by objtype,cacheobjtype,t.text

First I want to look at what happens when this query is run outside of a stored procedure, i.e. as an ad hoc query.
SELECT Col2 FROM Skew where Col1 = @Col1

Substituting @Col1 with 1 and running the query produces a plan that looks like this:

But running the query substituting @Col1 with 1000 produces an execution plan that looks like this:

If you recall from reading the post about auto and forced parameterization a parameterized query will create a plan that will continue to be used for future queries that only differ by search arguments.  In this instance the query was not auto-parameterized nor is forced parameterization in effect.  Because of this the optimizer is free to produce a plan that is best suited for the specific search arguments used in each execution.  The upside of this is that the execution plan produced is crafted for the best performance.  The downside is that a lot of queries with different search arguments will produce a lot of different plans and all of these plans will be cached.  This can be tested by changing the search argument to any number and examining the buildup of plans in the plan cache.

When this query is included in a stored procedure it operates much like a parameterized query.
The stored procedure is compiled and, because of parameter sniffing, the execution plan produced is optimized for the search arguments used on the initial compilation.  If the procedure is called again with values that would perform faster with a different execution plan SQL Server will not take advantage of a new plan without specific instructions from the author.
While a stored procedure can protect the plan cache from being filled by a lot of plans it can also cause SQL Server to use a sub-standard plan.
If I call the stored procedure with these parameters

EXEC ExampleProc 1,’aa’

The execution plan looks like this.

This plan is similar to the plan created when the query in the stored procedure (this one: SELECT Col2 FROM Skew where Col1 = 1) was run as an ad-hoc query.
If I change the parameter from 1 to 1000

EXEC ExampleProc 1000,’aa’

the stored procedure will keep the plan used during the initial compliation rather then using a table scan, which we know from running the ad hoc query earily would be the quicker way to retrive the data for this specific search argument. While table scan, which would be the most efficient way of getting the data back, SQL Server is stuck using the cached execution plan. While normally I would paste a picture of the query plan after this as proof of what is happing that would be pretty dumb because the query plan in the plan cache is the same exact plan I just posted.  However, one can see that the same cached plan is being used by looking at the usecounts column.  The number increments by one each time the plan is used.  I executed the stored procedure 5 times and the usecounts value is 5.  Not a coincidence.

There are a few ways to mitigate the problem of a statement in a procedure using an substandard query plan.

1. One could create the procedure with the WITH RECOMPILE option
CREATE PROC ExampleProc (@Col1 int, @Col2 nchar(2))
WITH RECOMPILE
AS
SELECT Col2 FROM Skew where Col1 = @Col1
GO

WITH RECOMPILE is a misnomer. It doesn’t actually force the query to recompile, technically. When running a procedure that was created with WITH RECOMPILE the procedure is compiled, the procedure is executed and then the plan that was created is discarded.  A better description of WITH RECOMPILE would be “WITH DON’T SAVE IN THE PLAN CACHE” but I guess Microsoft didn’t want us to have to type all of that.

It is impossible to query the plan cache to see what plan is used (cause it isn’t cached for us to look at) so instead I will enable viewing the execution plan through the “Include Actual Execution Plan” button on the SSMS toolbar.

Clearing the procedure cache using DBCC FREEPROCCACHE, then executing the stored procedure with the first parameter being 1 then this plan is created.

And this is what is in the proc cache:

The SET STATISTICS_XML commands are associated with enabling and disabling the showing of the execution plan in SSMS and can be ignored.  The more optimal plan was created and the plan wasn’t saved in the plan cache.

Changing the first parameter of the stored proc to 1000 the new plan looks like this:

Looking at the procedure cache shows that the plan was not cached. Unlike the previous examples the plan changed as the search argument changed.

2. Use WITH RECOMPILE when executing the stored procedure.
Adding WITH RECOMPILE to the EXEC command will cause that specific execution of the procedure to compile from scratch and to not be cached.  If the procedure had been executed previously without the WITH RECOMPILE option and cached that instance in the plan cache will be ignored by future executions of the procedure that use WITH RECOMPILE.  Freeing the procedure cache and running the query a few times can show it in the plan cache. First I have to remove WITH RECOMPILE from the procedure itself and recreate it so it will get saved in the cache.

I ran the query five times with the first parameter equal to 1 to show that the usecount is now 5 and that the plan includes a bookmark lookup.  The cached query plan looks like this:

Now running the procedure again with the WITH RECOMPILE option, the 1st parameter set to 1000 and Include Actual Execution Plan turned on:

EXEC ExampleProc 1000,’aa’ WITH RECOMPILE

The plan cache shows that the use count is still 5 and the execution plan produced shows that a table scan is used instead of the bookmark lookup that is in the cached plan.

Executing the procedure again without the WITH RECOMPILE option and with the 1st parameter as 1000 causes the usecount to be incremented by one and the plan used is the cached sub-optimal plan.

3. Use OPTION (RECOMPILE) on an individual query in the stored procedure.

CREATE PROC ExampleProc (@Col1 int, @Col2 nchar(2))
AS
SELECT Col2 FROM Skew where Col1 = @Col1 OPTION (RECOMPILE)
GO

Watching the usecounts of this procedure in the plan cache shows that the procedure itself is being reused but looking at the query plan each time also shows that this specific statement is being recompiled with each use and that the optimal query plan is being created depending on the value being passed into the first parameter.  Adding a second statement in the procedure allows one to see the difference between how different statements are handled by SQL Server.

CREATE PROC ExampleProc (@Col1 int, @Col2 nchar(2))
AS
SELECT Col2 FROM Skew where Col1 = @Col1
SELECT Col2 FROM Skew where Col1 = @Col1 OPTION (RECOMPILE)
GO

This query shows each of the SQL Statements in the batch:

select
case
when sql_handle is not null then
case
when qs.statement_end_offset <> -1 then
substring(t.text,statement_start_offset/2+1,(qs.statement_end_offset/2-qs.statement_start_offset/2)+1)
else substring(t.text,statement_start_offset/2+1,(len(t.text)*2-qs.statement_start_offset)/2+1)
end
else null
end [query_text],
cp.usecounts,qs.execution_count,qs.plan_generation_num,cp.cacheobjtype,cp.objtype,qp.query_plan,qs.*
from sys.dm_exec_cached_plans cp
outer apply sys.dm_exec_query_plan(cp.plan_handle) qp
outer apply sys.dm_exec_sql_text(cp.plan_handle) t
left join sys.dm_exec_query_stats qs
on qs.plan_handle = cp.plan_handle
where (t.dbid < 32767 or t.dbid is null)
order by objtype,cacheobjtype,t.text

Clearing the procedure cache, running this 5 times and looking at the plan cache shows us what is going on:

EXEC ExampleProc 1000,’aa’

The usecounts column is related to the stored procedure itself.  The execution _count column is for the SQL Statements.  In this example the select without the RECOMPILE option has executed the same number of times the procedure has been used.  The select with the RECOMPILE option has been executed once but the plan generation number shows that it has been compiled 6 times, once for the initial compilation and 5 times for the actual executions of the procedure.

The statement level recompile options is great for those times when a small number of statements in a procedure perform best when the plan for that query is optimized each instance it is run.

 

Choosing which queries to parameterize (or don’t work so hard, SQL Server)

Auto-parameterization, i.e. simple parameterization, and forced parameterization will parameterized queries for you while you sit back sipping on a beer.  We don’t really have a choice with auto-parameterization (though we can force individual queries to not be stored in the plan cache if we choose) and forced parameterization is like using a Sledgehammer to remove coffee stains from a tea cup.  Forced parameterization works against many ad-hoc queries. It might parameterize some queries you would rather have compiled each time they run and it might fail to parameterize some queries you want to keep in the plan cache.  The question is how do we get more granular in deciding which queries to parameterize and which queries to not parameterize?

SQL Server gives us two basic options.  We can turn on forced parameterization and choose individual queries to be compiled at each use or we can leave forced parameterization off and choose specific queries to parameterize (and cache) each time they are run.

One of the ways of choosing the queries to parameterize is the system stored procedure sp_executesql. Sp_executesql allows us to selectively parameterize queries and cache the plans for those queries for easy reuse.  Using sp_executesql, forced parameterization can be disabled and individual queries written to take advantage of SQL Server ability to parameterize.

A confusing things about sp_executesql is the number of times the word parameter is used as part of understanding the syntax. First there are the parameters that are passed into sp_executesql. Then there are the parameters that are part of the sql statement, the parameter list, which is a single string that contains a list of parameters. And then there are the parameters themselves which correspond to the parameters in the parameters list which correspond to the parameters in the sql statement.  That last sentence makes me tired. And in some cases the parameters are part of a single string variable while in another they are individuals.  It can get confusing for someone using sp_executesql for the first time.

I use these definitions to help clarify which parameters are which.
Procedure parameters: these are the parameters of sp_executesql. The number of parameters is variable.  It includes the SQL Statement, and can include the string that contains the list of parameters for the SQL Statment plus a parameter for each parameter in the SQL statement.

Statement parameters: the parameters that are part of the SQL statement to be executed. The SQL statement and the included parameters are all part of the same string variable.

List Parameters: This is a list of the parameters of the SQL statement. The items in this list will correspond with the statement parameters. It is a single string of all the parameters, not individually declared variables.

Value Parameters: These are the parameters that hold the values that are eventually passed to the SQL Statement.  They are actually parameters of sp_executesql so they are also procedure parameters.

An example:
First we need to create the SQL statement which will include the statement parameters
SET @stmt = N’SELECT Col1 FROM Test WHERE Col1 = @1 AND Col2 = @2′

Then create the list parameters.  These are actually part of a single string
SET @listparams = N’@1 int,@2 nchar(2)’

Notice that the parameters in the list parameters string match the statement parameters in the SQL statement.
Then some variables that will hold the values that will eventually be passed to the SQL Statement.  These are also the value parameters.
SET @val1 =1
SET @val2 = ‘aa’

Put them all together and this is what you get.
DECLARE @stmt nvarchar(200)
DECLARE @listparams nvarchar(200)
DECLARE @val1 int
DECLARE @val2 nchar(2)
SET @stmt = N’SELECT Col1 FROM Test WHERE Col1 = @1 AND Col2 = @2′

SET @listparams = N’@1 int,@2 nchar(2)’
SET @val1 = 1
SET @val2 = N’aa’

EXEC sp_executesql @stmt,@listparams,@val1,@val2

The variables aren’t technically needed.   It could have also been written like this:
EXEC sp_executesql N’SELECT Col1 FROM Test WHERE Col1 = @1 AND Col2 = @2′
,N’@1 int,@2 nchar(2)’,1,N’aa’

When executing a query using sp_executesql the query is parameterized.  This can be seen by looking at sys.dm_exec_query_cache along with sys.dm_exec_sql_text.

One thing to notice is that unlike when using auto and forced parameterization there is no shell execution plan that contains a pointer to the full plan.  There is only the prepared query.  This is great if you have a procedure cache littered with shell execution plans that all point to a small set of cached plans.  While the shell query plans are typically smaller than the full query plan they still take up memory which may be better used for caching other plans or, since the procedure cache memory is stolen from the buffer pool, better used for caching data and index pages.

Using sp_executesql is a great way to control what gets into the procedure cache. It provides the ability to parameterize queries that won’t be auto-parameterized or to control the auto-parameterization process.

sp_executesql has other benefits as well which I hope to blog about in a future post.

Spy vs. Spy (or Auto-Parameterization Vs. Forced Parameterization)

An interesting thing about auto-parameterization (simple parameterization) is that when SQL Server auto-parameterizes a query it chooses the data type of the input variable based on the size of the value being passed in.  For example, this query:
SELECT Col1 FROM Test WHERE Col1 = 1
produces a query plan with a input parameter of type tinyint.

And these two queries:
SELECT Col1 FROM Test WHERE Col1 = 1000000
SELECT Col1 FROM Test WHERE Col1 = 7745
produce query plans with parameters of type int and smallint.

Forced parameterization doesn’t do this.  With forced parameterization the parameter type is consistent.  The auto-parameterized query:
SELECT Col1 FROM Test WHERE Col1 = 1
that produced a tinyint data type in auto-parameterization produced an int data type parameter in forced parameterization.

You can make SQL Server be consistent for auto-parameterization by casting the literal to a consistent data type. Switching back to simple parameterization and running these queries:
SELECT Col1 FROM Test WHERE Col1 = cast(1000000 as int)
SELECT Col1 FROM Test WHERE Col1 = cast(7745 as int)
produce a parameterized query using an int parameter data type.  We can even tell from the use count that both queries are taking advantage of the same cached parameterized query plan.

Initially I suspected that forced parameterization was using the data type of the column being queried against to determine the data type of the parameter.  However when I switched the data type of the column to a bigint the parameter remained an int.
Is any of this useful?  I don’t know.  It is unclear to me the degree that auto-parameterization will create plans with different input parameter data types and for which data types it does this.

Optimize for Ad hoc Workloads (the not so good news)

This new feature in SQL 2008 has received a lot of attention so I’m only going to note one of the peculiarities I’ve discovered working with it. Mainly, having this option enabled, along with Forced Parameterization, could in fact accentuate the problem it is meant to solve and cause greater plan cache bloat.

First, here is a quick introduction to Optimize for Ad hoc Workloads for the uninitiated.
In short, this configuration option helps fight plan bloat by not caching single use plans.  With this turned on the plan does not get cached the first time a query is run, only the second.  This is pretty easy to see by running a simple test.  This test uses the database and tables I set up in a previous post on auto-parameterization.  I cleared the procedure cache (DBCC FREEPROCCACHE) and ran this query against that database:

SELECT Col2 FROM Test WHERE Col1 = 7

Looking at the plan cache using this query I see some interesting things:

SELECT p.size_in_bytes,p.cacheobjtype,p.objtype,qp.query_plan,t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_query_plan(p.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t

The cacheobjtype is Compiled Plan Stub.  Notice that the query_plan is null but that the text is not.  We can see the text of the query that was run but no query plan was cached.  Also note that the size_in_bytes of the query is a measly 336.  Running the SELECT again and looking at the plan cache shows that the Compiled Plan Stub is gone replaced with a Compiled Plan. Now the size is 24576 bytes and the query plan is now available to view.

Now on to the interesting part. If your query is auto-parameterized then the auto-parameterized query plan will still get cached, even on the very first run of the query! Again, using the database and queries from the auto-parameterization blog post, I run a query I know will be auto-parameterized.

SELECT Col1 FROM Test WHERE Col1 = 1

A glance at the plan cache shows that not only has the Compiled Plan Stub been created but that a full parameterized plan has also been created which is 40960 bytes.

What happens if Forced Parameterization is enabled along with Optimize for Ad hoc Workloads?

With Optimize for Ad Hoc Workloads and Forced Parameterization turned on I am running a query from my Forced Parameterization blog post which I know will not be auto-parameterized but will be force-parameterized.

SELECT Col2 FROM Test WHERE Col1 = 1
Here is the cached plan:

The plan cache shows that this produced a compiled plan stub along with a compiled plan of 232 bytes and 57344 bytes respectively.  In other words optimizing for ad hoc workloads would normally not cache the query plan on the first run but only create a compiled plan stub.  It did create a compiled plan stub but it created one for what would have been the shell execution plan or the pointer to the parameterized plan.  But on top of that it also created a fully cached parameterized plan.

If this is a single use plan, then we have essentially blunted the blade of the Optimize for Ad hoc Workload knife.  The purpose of using Optimize for Adhoc workloads is to keep these single use plans from being cached,  yet under forced parameterization (and auto-parameterization too), plan caching is exactly what is happening.

Using Optimize for Ad hoc Workloads under the simple parameterization model this query would have produced a compiled plan stub of only 232 bytes and never created the compiled plan that is 57344 bytes.  A fantastic memory savings for a plan that will never be used again.

So, what would happen if there were thousands of single use plans created because forced parameterization is enabled?  I’m going to scale this down to two queries for a simple example and look at the results under six different circumstances.

Simple Parameterization with Optimize for Ad hoc disabled
Forced Parameterization with Optimize for Ad hoc disabled
Simple Parameterization with Optimize for Ad hoc enabled (1st run)
Simple Parameterization with Optimize for Ad hoc enabled (2nd run)
Forced Parameterization with Optimize for Ad hoc enabled (1st run)
Forced Parameterization with Optimize for Ad hoc enabled (2nd run)

Simple Parameterization with Optimize for Ad hoc disabled:

This is a straight forward out of the box example.  These two plans use 106496 bytes in the plan cache.  For a query to reuse these query plans the exact text of the query must be run again, otherwise a new plan will be created.

Forced Parameterization with Optimize for Ad hoc disabled:

In this case forced parameterization puts two plans for each query in the plan cache, the parameterized plan and the shell plan that points to the parameterized plan for a total of 188416 bytes.

Simple Parameterization with Optimize for Ad hoc enabled (1st run):

Only 672 bytes.  This is fantastic. If you never run the queries again then you’ve saved a lot of space in the plan cache.

Simple Parameterization with Optimize for Ad hoc enabled (2nd run):

106496 bytes.  Exactly the same as the first example.  So if you need to run the query again then you have to burn a little bit of CPU to compile the plan and now it is cached in case you need it a third time or more.

Forced Parameterization with Optimize for Ad hoc enabled (1st run):

139728 bytes.  This is expensive for a plan that may never be used again.  Especially compared to the 672 bytes from example 1.
Forced Parameterization with Optimize for Ad hoc enabled (2nd run):

188416 bytes.  This uses the most memory of all and if you never use these plans again you have just wasted this space in your plan cache.  The gamble, of course, is that you may have a lot of different shell plans but that these will all point to a core group of parameterized plans and ultimately save memory and compile time.

This example highlights the importance of understanding what it going on under the hood. If you have mostly single use plans and turn on Optimize for Ad hoc Workloads you could save yourself a lot of plan cache memory.  If you think turning on Forced Parameterization on top of that may help save plan cache you could be shooting yourself in the foot and not even realize it.

On the other hand if your database is characterized by mostly multi-use plans then having Forced Parameterization turned on may help you.  If you turn on optimize for Ad-Hoc workloads also then you are likely to burn more CPU because you have to recompile plans on the second time they are run.

The problem is that the query plan cache is very unlikely to be characterized by single use plans only or by multi use plans only.   This is why it is important to understand your data and how you access your data. Often we hear or read about a feature in SQL Server that helped another DBA and are too quick to enable that feature for ourselves without understanding what it really does. This is an example of why it is bad to do that.

UPDATE 6/27/2013:
Based on some of what I’ve been reading online and communication with fellow SQL Server guys I thought it important to come back here and state that this is an edge case situation. I think this may have been misconstrued as a recommendation to not have both Optimize for Ad hoc Workloads and Forced Parameterization on at the same time. That is not my recommendation. My recommendation is that if you do happen to have both on check to make sure you aren’t encountering the characteristics described in this blog post.

Forced Parameterization

In the previous post I investigated what was going on when SQL Server decides to auto-parameterize your query. What I didn’t mention is that this is also called simple parameterization. There is another category of parameterization called “Forced”.

In forced parameterization the rules are a bit different. In the previous post I mentioned the two conceptual stages of auto-parameterization. In the first stage the query is auto-parameterized by the LPE. In the second stage the query optimizer looks to see if the choice of the literal value could result in different plans being chosen. If not, the query is deemed “safe” and when run the parameterized query will be used and cached and a shell query plan will be created for the text literal query.

Forced parameterization rules allow for the parameterization of a larger range of queries than simple-parameterization. Some queries that would not be considered for parameterization in the first stage can be considered for forced parameterization. Also, in the second stage, potential for different plans will not result in the query optimizer deeming the plan unsafe. Conceptually, no longer should one think of parameterization occurring in two stages. The query is either parameterized or it is not. What happens inside the storage engine is a mystery to me but perfmon gives us a clue. In the SQL Server:SQL Server Statistics object there are counters to track parameterization information. Auto-Param Attemps/sec, Failed Auto-Params/sec, Forced Parameterizations/sec, Safe Auto-Params/sec and Unsafe Auto-Params/sec are the counters of interest to us.

When operating under simple-parameterization Auto-Param counters can keep track of the attempts to auto-parameterize, failures to auto-parameterized and if those plans are deemed safe or not, revealing the two stage concept I talked about in my last post. However when using forced Parameterization the Auto-Param counters are quiet.

To investigate forced parameterization I’ve created a schema with three tables, two normal tables and one intermediate table. In these tables I’ve inserted data that is skewed so that different plans will be produced with a query that differs only by the literal constant. It can be reproduced by using this:
USE AutoParam
GO
–DROP TABLE Parent
CREATE TABLE Parent
(ID int IDENTITY (1,1), Col1 int, Col2 char(2))
GO
–DROP TABLE ParentChild
CREATE TABLE ParentChild
(ParentID int, ChildID int)
GO
–DROP TABLE Child
CREATE TABLE Child
(ID int IDENTITY (1,1), Col1 int, Col2 char(2))
GO
–insert data into Parent
DECLARE @c int
SET @c = 0
WHILE @c < 1000
BEGIN
INSERT INTO Parent (Col1,Col2) VALUES (@c,’aa’)
SET @c = @c + 1
END
GO
INSERT INTO Parent (Col1,Col2) VALUES (1000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (2000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (3000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (4000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (5000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (6000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (7000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (8000,’aa’)
GO 1000
INSERT INTO Parent (Col1,Col2) VALUES (9000,’aa’)
GO 1000
CREATE INDEX ParentID ON Parent(ID)
CREATE INDEX ParentChildParentID ON ParentChild(ParentID)
CREATE INDEX ParentChildChildID ON ParentChild(ChildID)
CREATE INDEX ChildID ON Child(ID)
CREATE INDEX ParentCol1 ON Parent(Col1)
CREATE INDEX ChildCol1 ON Child(Col1)
GO
–insert data into child
DECLARE @c int
SET @c = 0
WHILE @c < 10000
BEGIN
INSERT INTO Child (Col1,Col2) VALUES (@c,’cc’)
SET @c = @c + 1
END
GO
–insert data into the intermediate table
INSERT INTO ParentChild (ParentID,ChildID)
SELECT P.ID,C.ID FROM Parent P
INNER JOIN Child C
ON C.Col1 = P.Col1

With forced parameterization turned off (simple is on) I want to see if the queries have been auto-parameterized by the LPE. Recall that in the auto-parameterized plan the StmtSimple element has an attribute that looks something like this:
ParameterizedText=”(@1 tinyint)SELECT [Col1] FROM [Test] WHERE [Col1]=@1″

To see the estimated execution plans turn on SHOWPLAN_XML and run Query A and Query B:
SET SHOWPLAN_XML ON
GO

Query A:
SELECT P.Col2 FROM Child C
LEFT JOIN ParentChild PC
ON C.ID = PC.ChildID
LEFT JOIN Parent P
ON P.ID = PC.ParentID
WHERE P.Col1 = 1

Query B
SELECT P.Col2 FROM Child C
LEFT JOIN ParentChild PC
ON C.ID = PC.ChildID
LEFT JOIN Parent P
ON P.ID = PC.ParentID
WHERE P.Col1 = 1000

The ParameterizedText attribute does not show up in either estimated execution plan, so these queries have not been auto-parameterized by the LPE. Monitoring perfmon counters for this query show that there was an auto-parameterization attempt but that the attempt failed. The safe and unsafe counters didn’t register.

In another query window turn on forced parameterization (I like to use a separate window so that I don’t have to keep setting SHOWPLAN_XML on and off.)
ALTER DATABASE AutoParam SET PARAMETERIZATION FORCED

With forced parameterization on (and SHOWPLAN_XML still on) run Query A and Query B as before and look at the estimated execution plans. There is a difference in the process here though. After running each select statement clear the procedure cache again. This is important because under forced parameterization the second estimated plan will be the same as the first, even though under simple parameterization the statements result in two different plans. Forced parameterization is already taking effect when you are looking at the estimated plans because the second one you look at will have used the plan of the first. Freeing the procedure cache allows the second one to generate a plan without being influenced by the first.

Query A
Query B

When you look at each of these plans the ParameterizedText attribute should exist in the StmtSimple element. Recall that under simple-parameterization these plans didn’t have a ParemeterizedText attribute and the perfmon counters showed a failed attempt at auto-parameterization. Now the auto-param perfmon counters are quiet and the forced parameterization counter registers a hit.

The final step is to turn off SHOWPLAN_XML, clear the procedure cache and run the two queries again to see if they have been deemed safe. This time don’t free the procedure cache between running the queries.

SET SHOWPLAN_XML OFF
GO
DBCC FREEPROCCACHE
GO

Run each of the queries above independently so they are separate batches. Once that is done look at the plan cache. If you are running this on SQL Server 2005 see the previous post on how to fix this query to work on that version.

SELECT t.text,p.objtype,p.cacheobjtype,p.plan_handle,qp.query_plan,s.query_hash,s.query_plan_hash
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_query_plan(p.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
LEFT JOIN sys.dm_exec_query_stats s
ON s.plan_handle = p.plan_handle
ORDER BY s.query_plan_hash

This shows that both queries have resulted in shell queries (these are the queries on lines 2 and 3. Looking at the execution plans of these shell queries each has a pointer to the real execution plan:

Query A:
ParameterizedPlanHandle=”0x06000A00FF175F3140217686010000000000000000000000″
Query B: ParameterizedPlanHandle=”0x06000A00FF175F3140217686010000000000000000000000″

For all you know I simply pasted the same ParameterizedPlanHandle twice :), but I didn’t. I copied these directly from the XML of the query plans. These plan handles match the plan_handle of the 4th query in the list. This is the parameterized query.

This is an example of what forced parameterization can do. It can take a query that fails auto-parameterization in simple mode and parameterize it. Again, what happens in the guts of SQL Server is hard to know but I think this shows pretty well that the two stage concept of auto-parameterization does not work for forced parameterization.

Forced parameterization has some definite pitfalls to be aware of. In the above select statements, it is easy to see how a query that would perform better under one execution plan could easily get saddled by one that is already in cache. The first query uses nested loops and bookmark lookups. It also only returned one record. If the second query (which returns 1000 records) gets stuck using the plan from the first query the nested loops will have more records to process and the bookmark lookups will have be exercised for all 1000 records. However, the plan for the second query uses table and index scans which is appropriate when the number of records being returned justify table scans. If the first query has to use the second plan then a number of table scans will have to be performed to find just one record. Not a wise use of resources.

Next blog entry will look at an optimization feature that also helps keep the plan cache from becoming bloated: Optimize for Adhoc Workloads.

How To Tell If Your Query Has Been Auto-Parameterized (and why wasn’t it?)

SQL Server has the ability to auto-parameterize certain queries.  What this means is that constant literal values are replaced with variables before the query is compiled.  This allows SQL Server to reuse query plans and helps speed up query execution by skipping the compilation step, which can be a quite expensive relative to the time it actually takes to get the data from the storage engine.  Future queries that differ in the value of the literal constants can reuse a query plan that is already compiled reducing the round trip time between  a client requesting data and SQL Server returning data to the client.  Aside from reducing the total execution time it has the added benefit of helping to reduce plan cache bloat and processor usage.

Plan cache bloat occurs when the plan cache becomes filled with query plans that are unlikely to be reused.  The plan cache is a limited resource, especially on 32-bit editions, and having the plan cache filled with single-use plans has an adverse effect on SQL Server.  Plan cache pages are stolen from the buffer pool so a swollen plan cache can reduce the amount of memory available to SQL Server to cache data pages.  Less buffer pool means more disk I/O. More disk I/O means a slower system.

As new plans are being generated, plans that haven’t been used in a while are discarded to make room for new plans.  In the event one of the single use plans may have actually been able to be used again it may no longer be available in the plan cache and require compilation again.  Plan compilation takes CPU and a constantly churning plan cache uses more CPU than a stable plan cache with highly reusable plans.

Auto-parameterization takes a query that looks like this:
SELECT Col1 FROM Test WHERE Col1 = 1

and compiles it as this:
(@1 tinyint)SELECT [Col2] FROM [Test] WHERE [Col1] = @1

The @1 is the parameter.  A future query of
SELECT Col1 FROM Test WHERE Col1 = 2
could possibly reuse the query plan generated for the first query if the first query has be auto-parameterized.  It can be difficult to tell when a query has been auto-parameterized. Hopefully this post will help someone one day.

The process of auto-parameterization happens in two stages (conceptually for the purposes of this post anyway).  First when the SQL statement is parsed by the Language Processing and Execution (LPE) component of the storage engine it can be auto-parameterized. The LPE passes the query to the Query Processor for optimization.  While the Query Processor is optimizing the query it can deem a query as “safe”.  If a query has been deemed safe by the Query Processor the query runs and the query plan for the parameterized query is cached.  If the Query Processor deems the query as “unsafe” the query still executes but the plan that is cached is for that specific query only.  A similar “unsafe” query that differs only by the constant literals will have its own plan generated while a similar query for a “safe” plan will use the parameterized plan that already exists in the plan cache skipping the compilation step.

The key aspects to take away from this are the two stages of auto-parameterization.  Just because a query is auto-parameterizable doesn’t mean that it will be auto-parameterized.  The query processor can “veto” auto-parameterization.  Generally the query processor will veto auto-parameterization if it detects that the choice of constant literals could affect the plan choice.  In the follow test I’m going to show how you can look at what happened in the two stages to discover if a query is auto-parameterizable and if the query processor has allowed auto-parameterization by deeming the query safe or has vetoed auto-parameterizing by deeming the query as unsafe.  I’m going to do this with two examples. In both examples cases the queries can be auto-parameterized but in one example the query and the indexes could lead to different plan choices depending on the values of the constant literals while in the second it is unlikely that SQL Server will choose a different plan regardless of the constant literals used.

First I will create a test database and test table.
USE MASTER
GO

–DROP DATABASE AutoParam
–GO
CREATE DATABASE AutoParam
GO
USE AutoParam
GO

IF object_id(‘dbo.Test’) IS NOT NULL
DROP TABLE dbo.Test
CREATE TABLE dbo.Test
(
Col1 int,
Col2 char(2)
)
GO

Now let’s fill the test table with some data and create an index.

DECLARE @c int
SET @c = 0
WHILE @c < 10000
BEGIN
INSERT INTO Test (Col1,Col2) VALUES (@c,’aa’)
SET @c = @c + 1
END
GO
CREATE INDEX TestIndex ON dbo.Test (Col1)
GO

The test data is 10000 rows with the values of Col1 being even distributed from 0 to 9999 and the values of Col2 being constant ‘aa’ throughout.
Before actually running the query we want to take a look at the Estimated Execution Plan to see if the query itself is auto-parameterizable.
SET SHOWPLAN_XML ON
GO
SELECT Col1 FROM Test WHERE Col1 = 1
GO

Running this query should produce a single row, single column of data that contains the XML of estimated execution plan.  Click on the XML link and the execution plan should appear.  Now right click on the window and select Show Execution Plan as XML.  This should cause the XML of the plan to show.  In the StmtSimple element of the XML there is an attribute called ParameterizedText. This should contain the parameterized text of the query.  It should look like this:
ParameterizedText=”(@1 tinyint)SELECT [Col1] FROM [Test] WHERE [Col1]=@1″

Now let’s run the query for real.  Close the XML window and the Execution Plan window.  Now in query windows run this:
SET SHOWPLAN_XML OFF
GO
DBCC FREEPROCCACHE
GO
SELECT Col1 FROM Test WHERE Col1 = 1
GO
First we are turning off the feature which returns the estimated execution plan so the query will run live.  Second we are clearing the plan cache so that when we investigate it later it will be very easy to see.
After running this query we can investigate the plan cache by joining a couple of DMVs together to see what is there.

SELECT t.text, p.objtype, p.cacheobjtype, p.plan_handle, qp.query_plan, s.query_hash, s.query_plan_hash
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_query_plan(p.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
LEFT JOIN sys.dm_exec_query_stats s
ON s.plan_handle = p.plan_handle
ORDER BY s.query_plan_hash

Explaining these DMVs is out of the scope of this post but I put in enough info to see what is going on.  The first column has the text of the query.  The second, the type of plan, the third is the identifier of the in-memory plan (it goes away when the plan goes out of cache), the forth is the XML of the query plan itself.  The query_hash column and query_plan_hash column are a dash of information from sys.dm_exec_query_stats.  I included this because there will be no query stats available for the literal text of the query that has been parameterized and it makes it easier in this example to pick out the parameterized query.  Query_hash and query_plan_has are only available on SQL 2008 and higher so if you run this on SQL 2005 use any other columns from the sys.dm_exec_query_stats DMV (by the way this example is for SQL Server 2008 and possibly 2005, not for 2000).

(Compressing the pictures for the page makes them hard to read. Click on any picture to see a larger, readable version of it.)

 

The first line of these results is the query to show the plan cache and can be ignored.  The second line is a shell query that contains the literal text of the query that was run.  If you look at the plan for the shell query you will see a pointer to the parameterized query on line 3.   Click on the query plan in line two and look at the plans XML.

Notice that in the XML element StmtSimiple the value for the attribute ParameterizedPlanHandle contains the plan handle for the parameterized query on line 3.
So this query has not only been auto-parameterized, it has been deemed safe.
Next lets run several queries that can use this cached plan.

DBCC FREEPROCCACHE
GO
SELECT Col1 FROM Test WHERE Col1 = 1
GO
SELECT Col1 FROM Test WHERE Col1 = 7
GO
SELECT Col1 FROM Test WHERE Col1 = 1000
GO
SELECT Col1 FROM Test WHERE Col1 = 7745
GO
SELECT Col1 FROM Test WHERE Col1 = 10000
GO
After running these queries run the DMV query from earlier and you should get this:

As you can see these five queries point to two parameterized queries. You can use the ParameterizedPlanHandle attribute in each query plan to see which parameterized query each points to.

Now let’s look at a query that isn’t deemed safe.   First we want to look at the Estimated Execution Plan
SET SHOWPLAN_XML ON
GO
SELECT Col2 FROM Test WHERE Col1 = 1
GO

Unlike the previous query this query isn’t covered by the index we created.  So in this case the query processor has two choices to return Col2.  It can scan the table or it can do an index seek on Col1 and then a bookmark lookup on the table for Col2.
Looking at the XML of the plan we see that this query has been auto-parameterized.  But is it safe.
ParameterizedText=”(@1 tinyint)SELECT [Col2] FROM [Test] WHERE [Col1]=@1″
Running this:
SET SHOWPLAN_XML OFF
GO
DBCC FREEPROCCACHE
GO
SELECT Col2 FROM Test WHERE Col1 = 1
GO

 

Taking a glance at the query plan also will show that there is no longer a shell plan, thus no auto-parameterized query plan.
Finally by running five queries that differ only by their literal constants shows that all five queries have not been deemed safe.
DBCC FREEPROCCACHE
GO
SELECT Col2 FROM Test WHERE Col1 = 1
GO
SELECT Col2 FROM Test WHERE Col1 = 7
GO
SELECT Col2 FROM Test WHERE Col1 = 1000
GO
SELECT Col2 FROM Test WHERE Col1 = 7745
GO
SELECT Col2 FROM Test WHERE Col1 = 10000
GO

In this case the first stage investigation shows that the query can be auto-parameterized but the second stage investigation shows that it was deemed unsafe to be auto-parameterized, and thus wasn’t.

Why was the first example successfuly auto-parameterized while the second example was not? First there are the conditions necessary for any query to be auto-parameterized.  These are available in Appendix A of the Plan Caching in SQL Server 2008 white paper.  That is not what is at play here though as I intentionlly used queries that met those conditions. These conditions are enforced at stage one. However another condition of auto-parameterization is that “only those SQL statements for which parameter values do not affect query plan selection are auto-parameterized.”  This condition is enforced by the query processor, or in what I am calling stage two.  It is this stage that the query is deemed safe to parameterize or not.

Hopefully someone out there will find this information useful. The next post delves into this topic a bit further

Other resources:
SQL Programmability & API Development Team Blog
Plan Caching in SQL Server 2008 White Paper