Dude, where’s my plan (part 2).

In part 1 I looked at the general makeup of the plan cache and how SQL Server decided where in the plan cache to save a plan.  Now I will look at how SQL Server finds a plan in the plan cache so it can be reused instead of precious resources being wasted for plan recompilation.  When SQL Server needs to look up a plan it uses the same hashing algorithm it used to decide where to save the plan.  This algorithm relies on the object id, the database ID and the size of the cache in question. This algorithm produces a bucket number and it is in this bucket of the specific cache store where SQL Server looks for a saved plan.  This sounds vague but part 1 has a lot more detail. I’m trying not to rehash that here.

For objects (like views, triggers, etc) the object id used in the hash is simply the object id of the object in question. These object ids can be seen by querying sys.objects. Ad hoc queries though don’t have an object id.  For these the batch is hashed to produce an object id.  Once you know the algorithm that SQL Server uses to cache a plan it now makes sense why ad hoc queries which don’t text match cannot be reused by a logically similar but physically different query.  The object id produced by each query will be different and if that object id is different the bucket used to store that plan will be different.

For example these two select statements, though logically similar and producing practically the same plan, get stored in different buckets.  The difference in these queries is that one of them has an extra space between the equal sign and the number one.

Once the proper bucket is discovered the plans stored in that bucket have to be examined to see if one matches the query being executed.  The question then becomes, what is SQL Server looking at to determine if a query being executed matches an execution plan that is saved in the plan cache?

When a query is being run there are attributes that can affect the execution of the query.  For example, a session with one specific SET option may return different results than a session with different SET options.  Collectively these attributes make up what is known as the cache key.  You can see which attributes comprise the cache key for a plan by querying sys.dm_exec_plan_attributes and looking at the is_cache_key column.

select st.text,pa.*
sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
cross apply sys.dm_exec_plan_attributes(cp.plan_handle) pa
where st.text = 'SELECT * FROM dbo.Test WHERE Col1 = 1'

If any of the attribute values for the query being executed don’t match a plan already stored in the bucket then the cached plan will not be reused, even if the text of the query matches and the results of the query will be the same.

To show this I have two examples.

Example 1: I created a small test database and opened two sessions in SSMS.  In one of the sessions I set the DATEFORMAT to dmy. In the second session I left the DATEFORMAT as the default, which is mdy on my instance.

Then I ran this query in each one:

SELECT * FROM dbo.Test WHERE Col1 = 1

Looking at the plan cache I see two cached plans with same bucketid. I know they are different plans because they have different plan handles.

To see which cache key values don’t match I compare the cache key attributes side by side.

The values for the date_format attribute are not the same. Even though the plans are stored in the same hash bucket two different plans will be stored.

Example 2: In the above plan attribute comparison notice that one of the attributes is user_id.  For this attribute Books Online says

Value of -2 indicates that the batch submitted does not depend on implicit name resolution and can be shared among different users. This is the preferred method. Any other value represents the user ID of the user submitting the query in the database.

If the object names in the query aren’t qualified by a schema name the search for the object will start with the users default schema.  If each user happened to have a table or view named Test in their database this is the object that would be found and the plans generated would actually be querying completely different tables.

For this second test I still use two session but I am executing the query as a different user in each session. Each of these users has a different default schema. I am also removing the schema qualification from the query: SELECT * FROM Test WHERE Col1 = 1

A comparison of the plan attributes for the different plans produced shows that now the user_id attribute is different.

In this particular database “Test” resolves to the Test table in the dbo schema.  But that can’t be guaranteed without qualifying the table name with the schema that owns it. This is one of the reasons it is a good idea to qualify objects with the schema the object lives in.

I can envision a poorly written application that uses many ad hoc queries and has database users all defaulting to their own schema.  A common query executed by each client application (such as querying some lookup tables) could each produce a query that will be cached in the same bucket as every other client running the same query.  Since their cache keys won’t match each instance of the application will produce another plan to be put in the same bucket, potentially producing a very long hash chain that must be traversed every time that query is run.  If the lookup tables are used often this could result in an application with reduced performance.

Is this scenario likely? No, but something similar could happen with any of the attributes designated as cache keys.