Parameters vs. Variables or One of These Things Is Not Like the Other

Take a look at @test in each of these code snippets.

@test in this procedure

CREATE PROCEDURE GetTestData
    (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test

Looks strikingly similar to @test in this code snippet

DECLARE @test char(1)
SET @test = 'A'
SELECT * FROM ParamTest WHERE TestData = @test

They look the same. They both use the same data type. The SELECT statements use them in exactly the same way. So how can they be different?

SQL Server treats them differently because of how they are used. During optimization, the SELECT statement using the parameter in the procedure will take advantage (disadvantage?) of a SQL Server feature known as parameter sniffing, and can be optimized in ways that the statement using the variable cannot.

To show this I need some test data, so I will create a simple table:

CREATE TABLE ParamTest
(
    ParamTestID int identity(1,1)
    ,TestData char(1)
)

In this table I will load 10,000 rows – 9999 of them with TestData set to A and 1 with TestData set to Z.

DECLARE @i int
SET @i = 1
WHILE @i < 10000
BEGIN
    INSERT INTO ParamTest (TestData) SELECT 'A'
    SET @i = @i + 1
END

INSERT INTO ParamTest (TestData) SELECT 'Z'
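If you want to confirm the skew before moving on, a quick aggregate (my addition, not part of the original load script) shows the distribution:

```sql
-- Expect 9999 rows of 'A' and 1 row of 'Z'
SELECT TestData, COUNT(*) AS RowCnt
FROM ParamTest
GROUP BY TestData
```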

I then will put a non-clustered index on TestData so the optimizer has multiple options for fulfilling a query which is filtered on TestData.

CREATE INDEX IX_ParamTest ON ParamTest(TestData)

Depending on the WHERE clause in your query, the optimizer will choose either a table scan or an index seek/bookmark lookup to fulfill it.
First, let's constrain our query by looking for TestData set to A:

SELECT * FROM ParamTest WHERE TestData = 'A'

We can see that with this criterion the optimizer chose a table scan.

And by looking at the properties on the SELECT we see that the optimizer estimated 9999 rows would be returned.

Now let's constrain our query by looking for TestData set to Z:

SELECT * FROM ParamTest WHERE TestData = 'Z'

In this case the optimizer chose an index seek/bookmark lookup to fulfill the query.

And the properties show an estimate of 1 row returned.

So depending on the data distribution we can get a different plan. In the first case a table scan was deemed to be the quicker way to fulfill the query and in the second case an index seek/bookmark lookup.

What happens if, instead of filtering by A or Z directly, we use a variable to filter?

DECLARE @test char(1)
SET @test = 'Z'
SELECT * FROM ParamTest WHERE TestData = @test

The execution plan with properties:

Notice that even though @test is set to Z, and previously Z gave us an index seek with a bookmark lookup, this plan now uses a table scan. Also notice that the Estimated Number of Rows is now 5000 instead of 1. This is because when optimizing the query, the optimizer does not look at the value the variable is set to. Without a specific value to optimize for, the optimizer falls back on the total density of the column. Since there are only two distinct values, A and Z, in TestData, the total density is .5, and half of 10,000 rows is 5000. If the TestData column had three distinct values, the total density would be .3333333 and it would estimate about 3333.33 rows.

This is fine if you are looking for the value A because a table scan would be used anyway. If you are looking for the value Z though the table scan would be inefficient. The optimizer has no way of knowing what you are looking for so it must make some assumptions. The assumption is that the data is evenly distributed.
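You can see the density figure the optimizer falls back on by inspecting the statistics on the index. This assumes the statistics object carries the index name IX_ParamTest, which is the default when the index is created:

```sql
-- The density vector's "All density" value for TestData should be 0.5
-- (two distinct values: 'A' and 'Z')
DBCC SHOW_STATISTICS ('ParamTest', 'IX_ParamTest') WITH DENSITY_VECTOR
```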

Things are different when inside of a stored procedure.

CREATE PROCEDURE GetTestData
    (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test

If I execute this stored procedure with the value of Z:
EXEC GetTestData 'Z'

*Note: stored procedures that concatenate their parameters into dynamic SQL can open you up to SQL injection. Use caution.

Notice that even though the WHERE clause uses @test and @test was set to Z, unlike the previous example where @test was a variable, the optimizer now uses the value of @test during optimization. This is parameter sniffing: SQL Server can use the parameters of a stored procedure when optimizing the statements that reference them.

This is both good and bad. The bad part is that once the plan for the stored procedure is compiled it will continue to be reused. As long as this execution plan is in the plan cache, if you execute GetTestData with a parameter of A, the index seek and bookmark lookup will be used instead of the more optimal way of fulfilling the query, a table scan.
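A sketch of how to observe the plan reuse (sp_recompile is my addition here; it marks the procedure so its next execution compiles a fresh plan):

```sql
EXEC GetTestData 'Z'  -- compiles and caches a plan for Z: index seek + bookmark lookup
EXEC GetTestData 'A'  -- reuses the cached seek plan, even though a table scan would be cheaper

-- Force a fresh compilation on the next call (handy when testing)
EXEC sp_recompile 'GetTestData'
EXEC GetTestData 'A'  -- now compiled for A: table scan
```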

This can be a problem if the data in your table is widely skewed so that an execution plan for one data point isn’t optimal for another and vice versa.

There are several ways around this.
1. You can use the WITH RECOMPILE option on the stored procedure definition. This causes the procedure's query plan to never be cached, so every execution is compiled from scratch. That has performance ramifications: compilation costs CPU, and if compiling the plan uses a lot of memory you can bottleneck on the query-compile memory semaphore (the RESOURCE_SEMAPHORE_QUERY_COMPILE wait), which limits how many compilations SQL Server can run at once. This is not recommended.
2. You can add the OPTION (RECOMPILE) hint to individual statements in the procedure. Instead of the entire procedure being compiled on each execution, only the statements carrying the RECOMPILE hint get compiled each time. This may not have the same level of performance ramifications as using WITH RECOMPILE on the procedure definition.
3. If you have skewed data and know that a query plan built for Z performs well for Z but horribly for A, while a plan built for A performs great for A and only so-so for Z, it may be worth always optimizing for the value that performs best overall. You may not get the best plan on every execution, but you get the best overall performance. You can do this with the OPTIMIZE FOR query hint at the statement level.
4. OPTIMIZE FOR has a companion option called OPTIMIZE FOR UNKNOWN. It doesn't optimize for any specific value but builds the plan from the overall density. If I modify the GetTestData procedure to use OPTIMIZE FOR UNKNOWN, I get a plan that looks exactly like the one created when using the variable in the ad hoc query. In fact, before OPTIMIZE FOR UNKNOWN was added to the product, the way to get this behavior from the optimizer was to assign the parameter's value to a local variable and use the variable in the SQL statement rather than the parameter. This was often done deliberately to circumvent parameter sniffing.
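The four options above can be sketched against GetTestData like this. The hint syntax is standard T-SQL, but treat these as illustrations against this article's test table rather than drop-in fixes; the procedure names are mine:

```sql
-- 1. WITH RECOMPILE on the definition: no plan is ever cached
CREATE PROCEDURE GetTestData_ProcRecompile (@test char(1))
WITH RECOMPILE
AS
SELECT * FROM ParamTest WHERE TestData = @test
GO

-- 2. OPTION (RECOMPILE) on the statement: only this statement recompiles
CREATE PROCEDURE GetTestData_StmtRecompile (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test OPTION (RECOMPILE)
GO

-- 3. Always build the plan as if the parameter were 'A'
CREATE PROCEDURE GetTestData_ForA (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test
    OPTION (OPTIMIZE FOR (@test = 'A'))
GO

-- 4. Ignore the sniffed value and build the plan from density
CREATE PROCEDURE GetTestData_Unknown (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test
    OPTION (OPTIMIZE FOR UNKNOWN)
GO

-- The pre-OPTIMIZE FOR UNKNOWN trick: copy the parameter into a
-- local variable so the optimizer cannot sniff its value
CREATE PROCEDURE GetTestData_LocalVar (@test char(1))
AS
DECLARE @local char(1)
SET @local = @test
SELECT * FROM ParamTest WHERE TestData = @local
```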

Sometimes when I’m troubleshooting a stored procedure I will paste its contents into another window and run them ad hoc, DECLARE-ing some variables at the start of the batch to simulate the stored procedure and satisfy the statements that reference them. As shown above, this can lead to statements in the ad hoc batch getting different execution plans than the same statements in the stored procedure. Depending on where the bottleneck is, the ad hoc version can therefore be a misleading reproduction of the problem.

The moral of this story is that things aren’t always how they look. Just because a statement uses an @value doesn’t mean SQL Server will always treat it the same way.