No More MDI in SSMS?

After installing SQL Server 2012 I was disappointed to discover that the MDI option has been removed, leaving me with having to use the tabbed interface. I hate the tabs. Technically the tabbed interface is MDI because you can undock a tab and separate it from the application. Once you do that, though, you can't minimize it (unless you minimize the entire application), and if you have a lot of windows undocked you have a window management nightmare. The undock option is useful but doesn't replace what I've been forced to give up.

My main problem with the tabs is that they interrupt how I work. My screen is wide enough to show only 4 or 5 tabs at a time. The rest of the tabs have to be accessed via a pull-down menu. When a tab is activated it becomes one of the shown tabs and a previously shown tab is relegated to the pull-down menu. If you go find the tab and surface it again, it gets put back on the main header but in a different place than before. If I had become used to the order they were in, I now find myself clicking a bunch of tabs looking for the right one. And if I have to look in the pull-down for the right tab then I end up doing a scrambled search which reorganizes all of my tabs and compounds the problem when looking for future tabs (which are no longer where I left them).

In the 2008 R2 version, when a query window is minimized it is put onto the main desktop of SSMS (assuming you have the MDI option set). I could then move the minimized windows anywhere on the screen I wanted. I could group these icons by the project I was working on. When I minimized a query window it always got put back exactly where it was before. I could easily switch between projects without getting lost. If I had to do a mad scramble for the right window, maximizing and minimizing them wouldn't completely rearrange the working order I had set, and since I had the icons grouped by project I usually only had two or three windows to search. I was very efficient with the windows working like this.

After I installed SQL Server 2012 I found out that option was gone. Back to the mad scramble, I suppose. I'm not digging the new SSMS interface at all.

Parameters vs. Variables or One of These Things Is Not Like the Other

Take a look at @test in each of these code snippets.

@test in this procedure

CREATE PROCEDURE GetTestData
(@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test

Looks strikingly similar to @test in this code snippet

DECLARE @test char(1)
SET @test = 'A'
SELECT * FROM ParamTest WHERE TestData = @test

They look the same. They both use the same data types. The select statements use them the exact same way. So how can they be different?

SQL Server treats them differently because of how they are used. During optimization, the SELECT statement using the parameter in the procedure will take advantage (disadvantage?) of a SQL Server feature known as parameter sniffing, and the optimizer can optimize the batch in ways that it cannot when using the variable.

To show this I need some test data so I will create a simple table:

CREATE TABLE ParamTest
(
    ParamTestID int identity(1,1)
    ,TestData char(1)
)

In this table I will load 10,000 rows – 9999 of them with TestData set to A and 1 with TestData set to Z.

DECLARE @i int
SET @i = 1
WHILE @i < 10000
BEGIN
    INSERT INTO ParamTest (TestData) SELECT 'A'
    SET @i = @i + 1
END

INSERT INTO ParamTest (TestData) SELECT 'Z'

I then will put a non-clustered index on TestData so the optimizer has multiple options for fulfilling a query which is filtered on TestData.

CREATE INDEX IX_ParamTest ON ParamTest(TestData)

Depending on the WHERE clause in your search the optimizer will choose either a table scan or an index seek/bookmark lookup to fulfill the query.
First let's constrain our query by looking for TestData that is set to A:

SELECT * FROM ParamTest WHERE TestData = 'A'

We can see that with this criterion the optimizer chose a table scan.

And by looking at the properties on the select we see that optimizer estimated 9999 rows would be returned.

Now let's constrain our query by looking for TestData that is set to Z:

SELECT * FROM ParamTest WHERE TestData = 'Z'

In this case the optimizer chose an index seek/bookmark lookup to fulfill the query

And the properties show an estimate that 1 row would be returned.

So depending on the data distribution we can get a different plan. In the first case a table scan was deemed to be the quicker way to fulfill the query and in the second case an index seek/bookmark lookup.

What happens if, instead of filtering by A or Z directly, we use a variable to filter?

DECLARE @test char(1)
SET @test = 'Z'
SELECT * FROM ParamTest WHERE TestData = @test

The execution plan with properties:

Notice that even though @test is set to Z, and previously Z gave us an index seek with a bookmark lookup, this plan now uses a table scan. Also notice that the Estimated Number of Rows is now 5000 instead of 1. This is because the value the variable is set to isn't used by the optimizer when optimizing the query. Without knowing specifically what to optimize for, the optimizer looks at the overall density of the column. Since there are only two values, A and Z, in TestData the density is 0.5. Half of 10,000 rows is 5000. If the TestData column had 3 distinct values the density would be 0.333333 and it would estimate roughly 3,333 rows.
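If you want to see the density value the optimizer is working from, it lives in the statistics for the index we created earlier. A quick way to look at it (a sketch, assuming the default dbo schema) is:

-- The density vector shows the "All density" value for TestData.
-- With two distinct values (A and Z) it should come back as 0.5.
DBCC SHOW_STATISTICS ('ParamTest', 'IX_ParamTest') WITH DENSITY_VECTOR;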

This is fine if you are looking for the value A because a table scan would be used anyway. If you are looking for the value Z though the table scan would be inefficient. The optimizer has no way of knowing what you are looking for so it must make some assumptions. The assumption is that the data is evenly distributed.

Things are different when inside of a stored procedure.

CREATE PROCEDURE GetTestData
(@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test

If I execute this stored procedure with the value of Z:
EXEC GetTestData 'Z'

*Note: stored procs like this can open you up to SQL Injection. Use caution.

Notice that even though the WHERE clause was using @test and @test was set to Z, unlike the previous example where @test was a variable, the optimizer is now using the value of @test in the optimization process. This is parameter sniffing. The parameters of a stored procedure can be used by SQL Server when optimizing statements that use them.

This is both good and bad. The bad part is that once the plan for the stored procedure is compiled it will continue to use that same plan. As long as this execution plan is in the plan cache, if you execute GetTestData with a parameter of A, an index seek and bookmark lookup will be used instead of the more optimal way of fulfilling that query, a table scan.
See more about plan caching here.
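You can see the plan reuse by running the two executions back to back. A small sketch (the order matters, because whichever value is sniffed first determines the cached plan):

DBCC FREEPROCCACHE;      -- clear the plan cache for demo purposes only; don't do this on a production box
EXEC GetTestData 'Z';    -- compiles a plan sniffed for 'Z': index seek + bookmark lookup
EXEC GetTestData 'A';    -- reuses the cached 'Z' plan, seeking and looking up 9,999 rows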

This can be a problem if the data in your table is widely skewed so that an execution plan for one data point isn’t optimal for another and vice versa.

There are several ways around this (rough code sketches of each follow the list).
1. You can use the WITH RECOMPILE option on the stored procedure definition. This will cause the query plan of the stored procedure to never be cached, which means every time it runs it will be compiled from scratch. This has performance ramifications. It will require CPU to compile, and if the compilation of the plan uses a lot of memory you can also get bottlenecked on the compile memory semaphore, which limits how often SQL Server can execute the procedure. This is not recommended.
2. You can add the RECOMPILE option to individual statements in the procedure. This means that instead of the entire procedure being compiled on each execution, only the statement with the RECOMPILE option gets compiled each time. This may not have the same level of performance ramifications that using RECOMPILE on the procedure definition does.
3. If you have skewed data and know that a query plan built for Z will perform well for Z and horribly for A, but that a plan built for A will perform great for A and only so-so for Z, then it might be worth always optimizing for the value that performs the best overall. You may not get the best performance on every execution of the procedure, but you get the best overall performance. You can do this by putting the OPTIMIZE FOR hint at the statement level.
4. With the OPTIMIZE FOR hint you get another option called OPTIMIZE FOR UNKNOWN. This option doesn't try to optimize for any specific value but creates a plan using the overall density. If I modify the GetTestData procedure to use the OPTIMIZE FOR UNKNOWN option I will get a plan that looks exactly like the plan created when using the variable in the ad hoc query. In fact, before the OPTIMIZE FOR UNKNOWN option was added to the product, the way to get this behavior from the optimizer was to assign the value of the parameter to a variable and use the variable in the SQL statement rather than the parameter. This was often used to circumvent parameter sniffing.
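Here is roughly what each of the four options looks like in code. These aren't tuned, production-ready procedures, just the shape of the syntax applied to the GetTestData example (the procedure names are made up so they don't collide with the original):

-- 1. WITH RECOMPILE on the procedure definition: the plan is never cached
CREATE PROCEDURE GetTestData_Recompile (@test char(1))
WITH RECOMPILE
AS
SELECT * FROM ParamTest WHERE TestData = @test;
GO

-- 2. RECOMPILE on the individual statement only
CREATE PROCEDURE GetTestData_StmtRecompile (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test OPTION (RECOMPILE);
GO

-- 3. Always optimize for the value that performs best overall
CREATE PROCEDURE GetTestData_OptForA (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test OPTION (OPTIMIZE FOR (@test = 'A'));
GO

-- 4. Optimize using the overall density instead of a sniffed value
CREATE PROCEDURE GetTestData_OptUnknown (@test char(1))
AS
SELECT * FROM ParamTest WHERE TestData = @test OPTION (OPTIMIZE FOR UNKNOWN);
GO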

Sometimes when I'm troubleshooting a stored procedure I will paste the contents into another window and run it ad hoc. I will DECLARE some variables at the start of the ad hoc batch to simulate the stored procedure's parameters, because they are required by statements in the batch. This can lead to the execution plans of statements in the ad hoc batch being different from the execution plans of the same statements in the stored procedure. Depending on where the bottleneck is, that difference can send the troubleshooting down the wrong path because the plans I'm looking at aren't the plans the procedure actually uses.
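One way to sidestep that difference when testing ad hoc is to run the statement through sp_executesql with a real parameter instead of a local variable; parameters passed this way are sniffed just like stored procedure parameters. A sketch:

-- The parameterized statement gets a plan sniffed for 'Z',
-- matching what the stored procedure would produce
EXEC sp_executesql
    N'SELECT * FROM ParamTest WHERE TestData = @test',
    N'@test char(1)',
    @test = 'Z';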

The moral of this story is that things aren't always how they look. Just because a statement has an @value in it doesn't mean SQL Server will always treat it the same way.

SQL Injection Info

A list of resources to look at to help understand SQL Injection.

I’ve not seen anything in blog land that describes SQL injection the way I have done so in my original post on the topic. There may be something out there but I don’t know of it.

SQL Injection falls under the rubric of security. Program Manager for the SQL Server Engine team, Il-Sung Lee, gives a talk about developing secure applications. He discusses SQL injection and other interesting things.

Understanding SQL injection requires understanding how the query processor parses a batch. Conor Cunningham gave a talk at SQLBits X about how the query processor works. It isn't specifically about SQL injection, but he does talk about the parser and the query tree.

Paul White talks specifically about the parse tree and how you can see what the initial tree looks like. Great for inspecting how a query was parsed.

Truncate Table or "So what would you say you do here?"

I’ve heard many times (and said it myself) that TRUNCATE TABLE doesn’t delete data like the DELETE statement does. It is transactional but instead of deleting the data it removes the allocations. I decided I wanted to know what it really does behind the scenes so I set up a small sample database to test and track it.

So what am I testing? I’m testing to see what happens to the pages in a SQL Server file that manage allocations when a table is truncated. The pages in question are the PFS page, the GAM page, the SGAM page and the IAM page/chain for the table. To learn more about these pages I recommend reading Paul Randal’s blog posts about them.
PFS, GAM and SGAM pages

You will also want to read up about extents.
Extents

My database, TruncateTest, has one table, dbo.mytest. This table has one column, a varchar(8000). If I use the entire 8000 characters of the varchar, each record takes up a full page. This makes it easy to set up the database pages I need for the test.

I have inserted 16 records into dbo.mytest. The first 8 pages should get allocated to mixed extents and the second 8 pages should be given a dedicated extent. Using DBCC IND and DBCC PAGE I have been able to see what the GAM, SGAM, PFS and IAM allocations are for these records. I'm going to be looking at a lot of database pages and it may get confusing, so I summarize this data at the end in a neat little chart to make it easier to grasp. The data I show comes straight out of SQL Server, so you know I'm not making it up.
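I haven't shown the setup script, but it would look roughly like this (the column name is my own invention; the post only says the table has a single varchar(8000) column). Note that DBCC PAGE sends its output to the error log unless trace flag 3604 is on:

CREATE DATABASE TruncateTest;
GO
USE TruncateTest;
GO
CREATE TABLE dbo.mytest (TestCol varchar(8000));   -- column name is an assumption
GO
-- 16 rows that each fill a page: the first 8 should land on mixed extents,
-- the next 8 should get a dedicated extent.
-- GO 16 is an SSMS trick that runs the preceding batch 16 times.
INSERT INTO dbo.mytest (TestCol) SELECT REPLICATE('A', 8000);
GO 16
-- Needed so DBCC PAGE output shows up in the query window
DBCC TRACEON (3604);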

First I want to look at the IAM page to see what extents and pages my database is using.
To find out what database page the IAM page is on I will use DBCC IND

DBCC IND ('TruncateTest','mytest',1)

IAM page

The IAM page is the page from the previous results that doesn’t have an IAM file ID (IAMFID) and an IAM page ID (IAMPID) listed. It is page 1:154. Once I have that I am going to use DBCC PAGE to look at the IAM to see which of my pages are allocated to mixed extents and which are allocated to a dedicated extent.

DBCC PAGE ('TruncateTest',1,154,3)

IAM: Single Page Allocations @0x00000000131EA08E

Slot 0 = (1:153) Slot 1 = (1:155) Slot 2 = (1:156)
Slot 3 = (1:157) Slot 4 = (1:158) Slot 5 = (1:159)
Slot 6 = (1:168) Slot 7 = (1:169)

IAM: Extent Alloc Status Slot 1 @0x00000000131EA0C2

(1:0) – (1:168) = NOT ALLOCATED
(1:176) – = ALLOCATED
(1:184) – (1:2552) = NOT ALLOCATED

The pages dbo.mytest is using (you can assume all of these start with 1:):
153, 155, 156, 157, 158, 159, 168, 169, 176, 177, 178, 179, 180, 181, 182, 183.

We can see that 153, 155, 156, 157, 158, 159, 168 and 169 are single page allocations while the extent that ranges from 1:176-1:183 is part of a dedicated allocation.

We can confirm this by looking at the PFS page (which is page 1 of file 1)

DBCC PAGE ('TruncateTest',1,1,3)

(1:152) – = ALLOCATED 0_PCT_FULL Mixed Ext
(1:153) – = ALLOCATED 100_PCT_FULL Mixed Ext
(1:154) – = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:155) – (1:159) = ALLOCATED 100_PCT_FULL Mixed Ext
(1:160) – (1:162) = ALLOCATED 0_PCT_FULL
(1:163) – (1:167) = NOT ALLOCATED 0_PCT_FULL
(1:168) – (1:170) = ALLOCATED 100_PCT_FULL Mixed Ext
(1:171) – = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:172) – (1:173) = ALLOCATED 100_PCT_FULL Mixed Ext
(1:174) – = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:175) – = ALLOCATED 100_PCT_FULL Mixed Ext
(1:176) – (1:183) = ALLOCATED 100_PCT_FULL

The pages the IAM page reports as being in mixed extents are also listed here as being in mixed extents. The pages the IAM page reports as being in a dedicated extent are also listed here as being in a dedicated extent, those in extent 1:176 – 1:183.

So we have the results of looking at the IAM page for dbo.mytest and at the PFS page. There are two other pages I want to look at as well: the GAM and SGAM pages.

The GAM page is page 2 of the database file

DBCC PAGE ('TruncateTest',1,2,3)

(1:0) – (1:192) = ALLOCATED
(1:200) – (1:2552) = NOT ALLOCATED

This shows that the extents in the page range from 1:0 to 1:192 are all allocated. The minimum and maximum page numbers for dbo.mytest are 1:153 and 1:183, so all of my pages are in this allocated range. In my chart below I indicate this with a 0, because an unallocated extent's bit in the GAM bitmap is 1; when an extent gets allocated the bit gets set to 0.

Now the SGAM page

DBCC PAGE ('TruncateTest',1,3,3)

(1:0) – (1:184) = NOT ALLOCATED
(1:192) – = ALLOCATED
(1:200) – (1:2552) = NOT ALLOCATED

The SGAM page is different from the GAM page. An unallocated extent in the SGAM bitmap is 0, and when an extent is marked as allocated the bit is set to 1. A bit set to 1 means that the extent is a mixed extent with a page available (go read Paul's post if this doesn't make sense). In our case none of the extents are allocated. This means that each extent is either a mixed extent with no pages available for use or an extent dedicated to a single object. From the previous look at the PFS and IAM pages we know that we have both mixed and dedicated extents. I've summarized it below in a nice little chart.

Extent        Mixed  Page   GAM  SGAM  PFS        IAM
1:152-1:159   Yes    1:153  0    0     Allocated  Single Page
                     1:154             Allocated  IAM page
                     1:155             Allocated  Single Page
                     1:156             Allocated  Single Page
                     1:157             Allocated  Single Page
                     1:158             Allocated  Single Page
                     1:159             Allocated  Single Page
1:168-1:175   Yes    1:168  0    0     Allocated  Single Page
                     1:169             Allocated  Single Page
1:176-1:183   No     1:176  0    0     Allocated  Extent
                     1:177             Allocated
                     1:178             Allocated
                     1:179             Allocated
                     1:180             Allocated
                     1:181             Allocated
                     1:182             Allocated
                     1:183             Allocated

So, now the question is, what happens to all of these when I truncate the table? Let's find out.

TRUNCATE TABLE dbo.mytest

First I will use DBCC IND to see what pages are allocated to my table:

DBCC IND ('TruncateTest','mytest',1)

The blank above this line represents the big fat nothing that the query returned. There are apparently no pages allocated to my table. I will use DBCC PAGE to take a look at what used to be the IAM page for this table:

DBCC PAGE ('TruncateTest',1,154,3)

Slot 0 = (0:0) Slot 1 = (0:0) Slot 2 = (0:0)
Slot 3 = (0:0) Slot 4 = (0:0) Slot 5 = (0:0)
Slot 6 = (0:0) Slot 7 = (0:0)

IAM: Extent Alloc Status Slot 1 @0x00000000131EA0C2

(1:0) – (1:2552) = NOT ALLOCATED

There are no pages allocated by this IAM page anymore.

The PFS page:
(1:152) – = ALLOCATED 0_PCT_FULL Mixed Ext
(1:153) – = NOT ALLOCATED 100_PCT_FULL Mixed Ext
(1:154) – = NOT ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:155) – (1:159) = NOT ALLOCATED 100_PCT_FULL Mixed Ext
(1:160) – (1:162) = ALLOCATED 0_PCT_FULL
(1:163) – (1:167) = NOT ALLOCATED 0_PCT_FULL
(1:168) – (1:169) = NOT ALLOCATED 100_PCT_FULL Mixed Ext
(1:170) – = ALLOCATED 100_PCT_FULL Mixed Ext
(1:171) – = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:172) – (1:173) = ALLOCATED 100_PCT_FULL Mixed Ext
(1:174) – = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:175) – = ALLOCATED 100_PCT_FULL Mixed Ext
(1:176) – (1:183) = NOT ALLOCATED 100_PCT_FULL

The pages in dbo.mytest that used to show as allocated in the PFS page are now all unallocated. Page 1:154, which was the IAM page, is also unallocated.

The GAM page which shows the allocation of each extent:
(1:0) – (1:168) = ALLOCATED
(1:176) – = NOT ALLOCATED
(1:184) – (1:192) = ALLOCATED
(1:200) – (1:2552) = NOT ALLOCATED

The extents in the range from 1:0 to 1:168 still show as allocated. This is because they are mixed extents with some pages that are still being used by objects other than dbo.mytest, so they remain allocated. However, extent 1:176 is not allocated. This extent was dedicated to dbo.mytest and it is now unallocated.

Finally the SGAM page:
(1:0) – (1:144) = NOT ALLOCATED
(1:152) – = ALLOCATED
(1:160) – = NOT ALLOCATED
(1:168) – = ALLOCATED
(1:176) – (1:184) = NOT ALLOCATED
(1:192) – = ALLOCATED
(1:200) – (1:2552) = NOT ALLOCATED

The mixed extents at 1:152 and 1:168 that held some of the pages from dbo.mytest are now showing as allocated whereas before they were not. This is because the pages related to dbo.mytest have been unallocated and those pages in the mixed extent are now available to be used by SQL Server to store pages for other objects.

An updated table of the allocations.

Extent        Mixed  Page   GAM  SGAM  PFS            IAM
1:152-1:159   Yes    1:153  0    1     Not Allocated  None
                     1:154             Not Allocated  None
                     1:155             Not Allocated  None
                     1:156             Not Allocated  None
                     1:157             Not Allocated  None
                     1:158             Not Allocated  None
                     1:159             Not Allocated  None
1:168-1:175   Yes    1:168  0    1     Not Allocated  None
                     1:169             Not Allocated  None
1:176-1:183   No     1:176  1    0     Not Allocated  None
                     1:177             Not Allocated
                     1:178             Not Allocated
                     1:179             Not Allocated
                     1:180             Not Allocated
                     1:181             Not Allocated
                     1:182             Not Allocated
                     1:183             Not Allocated

When I take before and after shots of the data pages in dbo.mytest they all still contain the data as if they were active records. The bit patterns on the data pages are the shape of what used to be a valid record. They are essentially abandoned in the wild to fend for themselves.

The advantage of a truncate is that each page and record doesn't have to be touched. This means that each page doesn't have to be loaded into the buffer pool, which saves disk I/O and memory. It also means that the transaction doesn't have to contain the information needed to roll back every deleted row (which, for a DELETE, means even more I/O). The transaction for a truncate only has to hold enough information to roll back changes to a few pages: the GAM, SGAM, PFS and IAM pages.

So, it appears that a TRUNCATE TABLE:

  • Removes the allocations from the IAM page. Even though the data pages don't have their data removed, the IAM page has its allocations cleared.
  • Removes the allocations from the GAM page if the extent is dedicated to the object.
  • Adds an allocation to the SGAM page to show that the mixed extent has pages available.
  • Removes the allocations from the PFS page.
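Because only those few allocation pages are logged, rolling a truncate back is cheap and easy to demonstrate. A quick sketch, run inside the TruncateTest database:

BEGIN TRANSACTION;

TRUNCATE TABLE dbo.mytest;
SELECT COUNT(*) AS RowsAfterTruncate FROM dbo.mytest;   -- 0

ROLLBACK TRANSACTION;
SELECT COUNT(*) AS RowsAfterRollback FROM dbo.mytest;   -- 16: the allocations come right back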

Log File Shrinkage (or “I was in the pool!”)

I was having lunch with some friends the other day and one of them mentioned putting in a maintenance plan to regularly shrink the transaction log. I responded with a very quick, "You shouldn't do that," but didn't have a chance to explain why. So, here is the explanation. I realize there are already tons of articles and blog posts on this, but if I just referenced one of those I wouldn't get to write about it.

The transaction log is necessary for the changes to your database to be atomic and durable. If some aspect of a transaction fails, the transaction log is used to roll back all parts of that transaction. Also, because changes to your database are recorded in the transaction log first, they don't have to be immediately written to disk in your data file. This gives you big gains in performance since changes to the data file can be written to disk during a single large checkpoint operation rather than bunches of little ones. If SQL Server fails before the checkpoint, during recovery the completed transactions saved in the log can be replayed and applied to the database.

The log file is the most important part of the database, in my opinion, and a SQL Server database cannot function without one. Therefore, log file performance is very important to the performance of the database. Regularly shrinking the transaction log can introduce unnecessary overhead and performance problems to your SQL Server.

During the daily workload your transaction log is going to grow to a size that can handle whatever that workload is. The size of the log is affected by many things: the quantity of transactions, the time it takes to complete some transactions, how often you back up the log, and database mirroring are just a few. In the natural course of a day's work the transaction log will grow to a size that can accommodate everything it needs to do. You can't stop it from doing this. It needs to do it. It will do it or your SQL Server will stop functioning.

During this workload there will be times when only a portion of the transaction log is active. There will be other times when almost all of it is active. If you decide to shrink the transaction log when almost all of it is active, or when the portion at the end of the log file is active, there will be very little change in the size of the file. If you decide to shrink the log file when the beginning of the log file is active but the end of the log file is not, then the inactive portion at the end of the file will be removed.

We now have a smaller log file. What happens, though, when a daily workload occurs that requires the same amount of log file that you needed yesterday? It wouldn't have grown to that size unnecessarily, so at some point it's going to need to grow to that size again. This is where the unnecessary overhead comes into play. Unlike the data file, which can (can, not will) grow in size without needing to initialize the space it uses, the log file must zero-initialize any new space it grabs. This means SQL Server writes zeros over every bit of the new log space as the file grows. Since all transactions in the database must be logged in the transaction log to be durable (that is, able to be replayed during recovery), those transactions must wait for the new log space to be initialized before they can be recorded in the log as complete, which means some application out there is waiting longer than it needs to before it can continue. On a busy server this can put a lot of database activity on hold while the log file grows. If you shrink the log file on a regular basis you will encounter this extra overhead on a regular basis. The solution is to not shrink the log file. It isn't like you can use the space saved for anything else; if you do, and the transaction log can't grow, then your SQL Server will stop functioning.

Yes there are occasions when you have to do something abnormal and the log file will be grossly expanded. You might want to consider how abnormal this is. If you have to do this every month then it may be worth it to keep the log file larger. If you have to do this every year…well, maybe not, but you still have to have enough disk space for the log to expand to when you do need to perform this abnormal activity. If you know what this log expanding action is, how often you are going to need to perform it and generally how much disk you are going to need to do it then I can see shrinking the log file in the meantime. However, if you don’t know why your log file is as big as it is it probably isn’t a good idea to assume it’s in an abnormal state.

But before you decide to shrink the file between abnormal activities that cause abnormal growth there is something else about log files to be aware of – the number of virtual log files in the transaction log.

Another problem with shrinking the log file on a regular basis is that it can introduce an unhealthy number of virtual log files (or VLFs) into your log file. Internally a log file is broken up into separate portions called virtual log files. Too many (or too few) VLFs can cause problems. The number of VLFs added during each log growth depends on the size of that growth. A single large growth will produce far fewer VLFs than multiple small growths. This means that if you shrink your transaction log and let it grow back again in small chunks, which produce a large number of VLFs, you might be introducing a condition that causes performance degradation. It is better to shrink your log file once, then manually grow it in chunks so it will be the right size and have an appropriate number of VLFs. Even if you are aware of this and decide to shrink your log file on a regular basis, you will still have to manually grow it every time in order to put the right number of VLFs in it. How to do that properly is beyond the scope of this simple blog post, but the sketch below shows how to check how many VLFs you currently have.
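A rough sketch of checking the VLF count, and of what growing the log deliberately looks like (the database and logical file names here are made up):

-- One row per virtual log file; the row count is your VLF count
DBCC LOGINFO ('MyDatabase');

-- Growing the log in one deliberate chunk rather than many small autogrowths
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_log, SIZE = 8192MB);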

So, if the log is being shrunk on a regular basis, then on a regular basis your database will encounter extra overhead that it didn't need to encounter. You might also introduce another condition that causes problems: too many virtual log files. And even if you are aware of the potential problems related to having too many or too few VLFs, you have introduced another ongoing maintenance task, and who wants that?

Here is a good article by Gail Shaw about how to manage the transaction log.

There are other good articles and blog posts out there. You can Google search for them.

Column Aliases (or Who Is That Masked Crusader?)

Aaron Bertrand talks about the various methods for column aliasing and which he prefers. My favorite is a variation on #4 in his list (I don't think he has them in order of preference; he numbered them to make them easier to discuss), which is surprising because, of the options listed, plain #4 is my least favorite. In the past I used = whenever I aliased a column in a select list, but over the years I've slowly moved to another method.

I will repeat his list here because it’s easier to discuss it without having to constantly refer back to the original post.

SELECT 1 AS x;     -- #1
SELECT x = 1;      -- #2
SELECT 'x' = 1;    -- #3
SELECT 1 x;        -- #4
SELECT 1 AS 'x';   -- #5
SELECT 1 'x';      -- #6

I find #4 to be the least readable, especially when the select list isn’t numbers and the items selected aren’t on different lines. I think this example from AdventureWorks demonstrates the problem.

select ContactID CID,Gender MorF, EmployeeID EID
from HumanResources.Employee

At a glance, with a long select list it takes more time to interpret and understand. The variation I prefer is similar but uses brackets to identify the alias.

select ContactID [CID],Gender [MorF], EmployeeID [EID]
from HumanResources.Employee

One of the reasons I like this method is that I don't have to rename every column to be consistent. The real column names can also be surrounded by brackets. This makes it easier (though not as easy as Aaron's preference) to look at a select list and see what column names will be used in the result set. For example:

select
 ContactID [CID]
,[NationalIDNumber]
,Gender [MorF]
,EmployeeID [EID]
,[BirthDate]
from HumanResources.Employee

However, I've found that the most readable way (for me anyway, for reasons I explain at the bottom of this post) to use this method is to always use [ ] when aliasing a column and never use it when I'm not.

This makes the select list look like this

select
 ContactID [CID]
,NationalIDNumber
,Gender [MorF]
,EmployeeID [EID]
,BirthDate
from HumanResources.Employee

When I see this I know that if I see brackets the column is aliased, and if not, it's not. I suppose there could be an occasion where the brackets are necessary for the real column name in the select list, which would result in two bracketed names in one selection. A table like this:

create table [1test] ([1ID] int)

would require a select like this:

select [1ID] from [1test]

If I wanted to alias 1ID using my convention I’d have to do:

select [1ID] [ID]
From [1test]

I can't think of an occasion when this has happened, though, since I never use a column or table name that requires the brackets in any database that I design.

I appreciate Aaron's use of = to alias columns. It makes for very readable code, though I think the manner in which the aliases are vertically lined up focuses the attention on what the aliases are. For some reason my focus is more often on what columns are being selected from the underlying table (or more specifically, how much of the select list is the result of a function or some other calculation). Lining those up vertically on the left-hand side helps me quickly recognize that. Yes, it's easy to look at the = example and see which returned columns are not directly from the underlying table, but it is easier for me to scan down the items selected and do a quick mental calculation of the percentage of items that come from calculations versus those selected directly from the underlying table. I'm not sure why this is important to me but it is.

Often when I need to find a specific item in the select list to edit or examine I will open the proc in SSMS and search for that alias and identify the right row immediately. So for me, the method I use (a variation on Aaron’s option #4) provides me with an easy way of knowing what the column names of the return list are but balances that with some other things I want to know at first glance. But I might be a little weird. That is the funny thing about preferences. There are usually some underlying reasons for them that we don’t get until we think about it.

All that being said, if I started using = I could probably adjust pretty quickly and after a while it would be second nature. I’m also not so stuck on my method that I would balk if I worked somewhere that had a coding standard requiring = or AS.

It's always interesting to see what preferences other SQL Server developers use and why.

Getting Started with SQL on SQL Server (Part 2 – Sets, Rows and Columns)

The number one thing to understand about using SQL to get data from a database is how it operates against data and what it outputs when you are done.  To explain this I must first explain what sets, rows and columns are.  Once you know this you can start doing operations on those sets, rows and columns. You should read my first post if you have not already.  It explains the format I am taking, why I’m writing this, and some of the liberties I am taking in my terminology.

What is a set?

I find it is easier to understand sets if you have a picture to look at because it can give us an instinctual understanding of rows and columns.

An example of a set

This is a graphical representation of a set. Anyone who has worked with spreadsheets has seen this kind of graphical representation before.

A set is a collection of entities.  So then what is an entity?  An entity is a representation of a “real” thing or item.  I put real in quotes because real doesn’t have to be a physical object or an actual object that really exists though it often is.  In this set the people are conjured up to be examples.  They aren’t real people though they could be.  They represent the idea of a real person but not an actual person.  Another example is a sale.  You may record all of your sales in a database but a sale isn’t a tangible thing.  Your company can sell a tangible thing, take tangible money for that thing and transfer that thing to a tangible person.  You can even record the sale on a tangible receipt.  However the sale itself isn’t tangible even though it did take place and can be recorded in a database.

Each entity should have some piece of information that uniquely identifies it and makes it distinct from the other entities.  In this limited example first and last name handle this duty.  While in a real database you wouldn’t want to use just first and last name for this (because it is pretty easy to find two people with the same first and last name), for understanding the concept and in this limited example using first and last name is fine.

Each entity can also have attributes that describe it. In our example each person has an email address and a phone number.  So in short, an entity is a representation of a “real” thing that can be distinctly identified by some piece of information and has, associated to it, attributes which describe the entity.

I will note here that some people use the term entity to refer to a collection of things and use the term instance when talking about an individual thing in that collection. I think that is fine, but it is often counterintuitive to someone just starting out. At this point we don't want to get bogged down in a war about terminology; the point is to understand the concept enough to understand how SQL works. Just know that when I use the term entity I am talking about a single thing, not a collection. I use the term set for a collection of entities.

A set is a collection of zero or more entities and this collection is organized into rows.

Rows

Every row in this example set is about a single entity. In this case the thing that the row is about is a person. Each row identifies a specific person and has information about that person.  The first row is a woman named Cassidy Griffin, whose email address is casidy22@adventure-works.com and whose phone number is 999-555-0198.  Beside her name there is the number 6046.  This is a unique semi-arbitrary number assigned specifically to her.  You can ignore this for now.  I will explain what that is for later.
The fourth row is Marcus A. Powell, and the email address and phone number in this row belong to Marcus. In general you can consider each row as a statement of fact. There is a person named Cassidy Griffin and this row is info about her. And there is a person named Marcus A. Powell and this row is info about him.

One of the important things to recognize is that each row represents a single entity.  Information about Cassidy doesn’t bleed over into the row that holds information about Marcus and vice-versa.  You can separate the row about Cassidy from the row about Marcus and you don’t lose any information about either.  This is important when you start talking about operations against sets.

Columns

A column is a cross section of a set for a specific attribute.  Looking at the email address column from the example above we can find all of the email addresses for every entity (or person) in our set. Recall that I said that each row is a statement of fact about something in the world.  If that is true then the columns of the set determine what information makes up that statement of fact.  As you add columns to a set you are adding more statements of fact about each entity in the set.   I could add an Address column to the set above and we could then record as a statement of fact, the address of each person.

The columns determine what type of information is going to be stored for each row and each column is type specific.  A phone number isn’t stored in a FirstName column, nor is a FirstName stored in an EmailAddress column.  Phone number columns store phone numbers only.  For each row and each column there is only one value stored.  In other words, if Cassidy has two email addresses you wouldn’t put both of them into the EmailAddress column.  If she has two phone numbers you don’t put two phone numbers in the Phone column.  Each column for each row should hold a single value. A caveat for this is that the value stored can be a value of a complex type, such as an XML string.  An XML string has a lot of different values but as a whole the XML string can be considered a single value.

The Shape of a Set

So, sets are collections of rows that all have the same columns and the columns all hold the same type of information.  I’m going to call this the shape of the set.   In our example above the set has the columns ContactID, FirstName, MiddleName, LastName, EmailAddress and Phone.   If you have two sets with the same columns of the same types then these sets have the same shape.  If you add a column to a set then you have changed the shape of the set.

A set can have zero rows or a set can have an infinite number of rows. The set in the picture above has 10 rows.  This set has 0 rows.

Set with zero rows

The number of rows that the set has doesn’t affect its shape.

What can you do with sets?

With sets you can do some very interesting things.
We can add sets together.
This set:

added to this set:

results in this set:

An example of a set

We can subtract one set from another.
This set:

An example of a set

minus this set:

results in this set:

Remember when I said that the information about one person is contained in one row and that information doesn’t bleed into another row?  The containment of that information in one row allows us to subtract sets like this.   I can subtract the rows of one set from another and not affect the remaining rows.

You can also look for where sets intersect.
The intersection of this set:

and this set:

result in this set:

The result is the two rows the sets have in common.

In all of the previous examples it is important to understand that when adding, subtracting or showing the intersection of two sets the comparison is based on all columns for a single row.  Let’s look at the intersection example again.  It will look like the previous intersection example but in the second set David Williams has a middle name of Ray.

The intersection of this set:

and this set:

results in this set:

Why does this intersection result in only a single row when the previous example resulted in two rows remaining, even though the two sets being checked for intersection look the same? Because when adding, subtracting or intersecting sets, all of the columns for each row are used to determine if a row in one set is the same as a row in another set. In this example only Riley Brooks is common between the two sets being compared. The David Williams row in each set is no longer the same because the middle name is different.

You can join sets.

This set:

An example of a set

joined to this set on ContactID:

gives you this set:

In this example I took each row from the first set, compared it with each row from the second set on a specific column, and if the values matched, the combination of matching rows became part of a new set. For example, ContactID 6046 in the first set is the same as ContactID 6046 in the second set. Since these match I put these rows together end to end as one big row. I do this for every row in each set until I have all the rows that match.

This is where an understanding of the shape of a set comes into play. When doing addition, subtraction and intersection, all of the sets involved have to have the same shape, i.e. the same columns holding the same data types. Addition, subtraction and intersection operations produce a new set, but that set has the same shape as the old sets. However, with joins none of the sets are required to have the same shape. The result of a join is often a set with a completely different shape than the old sets. The shape of the set in this join example has been simplified to contain all of the columns of each set; however, the shape of a resulting set can be dictated.
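As a preview of how these ideas map onto T-SQL (the table names here are made up, but UNION, EXCEPT, INTERSECT and JOIN are the real operators), a sketch might look like this:

-- "Adding" two sets with the same shape (duplicate rows collapse into one)
SELECT FirstName, LastName FROM PersonSetA
UNION
SELECT FirstName, LastName FROM PersonSetB;

-- "Subtracting" one set from another
SELECT FirstName, LastName FROM PersonSetA
EXCEPT
SELECT FirstName, LastName FROM PersonSetB;

-- The rows two sets have in common
SELECT FirstName, LastName FROM PersonSetA
INTERSECT
SELECT FirstName, LastName FROM PersonSetB;

-- Joining two sets of different shapes on ContactID;
-- the SELECT list dictates the shape of the result
SELECT a.ContactID, a.FirstName, a.LastName, b.EmailAddress
FROM PersonSetA AS a
JOIN ContactInfoB AS b ON b.ContactID = a.ContactID;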

Next post I will talk about T-SQL commands that help us dictate the shape and content of a result set, SELECT and WHERE.

 

Getting Started with SQL on SQL Server (Part 1)

This is a blog post that I've tried to write three times now.  My intent is to provide someone new to SQL (and SQL Server) enough background and theory about SQL and the relational model to make their transition to using SQL on the job easier than mine was, but not so much theory that they get overloaded and fall into a state of analysis paralysis.  There are a couple of reasons I have found this difficult to write.

  1. There is a lot of theory behind SQL that has to be glossed over in a guide that is intended to help someone get their brain around SQL.  What should and should not be covered is, for me, a difficult decision to make.  Too much theory and it can't be applied immediately.  Not enough theory and lessons are immediately applicable but it is harder to know how to expand on them for different but related uses.
  2. SQL and the relational model aren’t the same.  They use different terminology and SQL will allow you to do things that aren’t preferred in the relational model.
  3. To go from practically zero knowledge about SQL to being able to query a database and reliably get the data back that you want requires quite a bit of upfront knowledge.
  4. In the relational database world many of the words used: entity, column, attribute, row, record, set, etc. have a nuanced definition that often depend on who you talk to.  Some are used interchangeably or synonymously by some people while others insist on strict usage of certain words for certain phenomena.  A surprising number of flame wars revolve around something so innocuous.  It is easy to get bogged down by these nuances in an attempt to please all parties.  But in doing that what was intended to be a short series of blog posts becomes a book.

To write this I had to make some decisions about what terminology to use and what theory to reveal.  In general, I have fallen to using the terminology that is used in SQL Server, table instead of relation, row or record instead of tuple, column instead of attribute, etc. In a few places I throw in my own unique vernacular as well.  It is probably things like this that got me banned from speaking at PASS.  I kid, they didn’t actually ban me.  My invitation got “lost” in the mail. C’est la vie.

The book Pro SQL Server 2008 Relational Database Design and Implementation is an excellent resource to get a better grasp on the nuances of relational vocabulary and other relational theory.  CJ Date’s book SQL and Relational Theory: How to Write Accurate SQL Code is another good source of info about relational theory and he often expounds on when relational theory and SQL differ.

However we aren’t designing a database but learning how to query one and those nuanced distinctions, while good to know, aren’t necessary or possible for a blog post sized guide.  In practice I’ve heard renowned members of the SQL Server community use many of these terms interchangeably and have yet to meet someone on the job who cares if I call a row a record or a column an attribute.  If you are a newbie to SQL Server and you run into one of the few pedantic souls who cares, don’t worry about it.  If you work for one of them you can always quit. It can also be fun to intentionally use the “wrong” word just to tweak them. You can almost always tell from the context what someone means even if they don’t use the “correct” word.

Why I am writing this.

There are some things that I wish I had been told when I first got into databases that would have made the job much easier.  For people who have been working with SQL Server for years this information will be elementary, and good to read if it is late and you are out of sheep (uh…for counting, get your mind out of the gutter).  It's the kind of thing you will learn by necessity as you, by trial and error, try to extract data from the database using SQL.  But for those who haven't been using SQL for years I hope this little introduction will make the move into querying a SQL Server database a bit easier.

So this will go into the category of “Things I wish someone had told me.”  It is very basic stuff but most of us don’t start out writing great SQL queries on our first day just like we don’t start reading Shakespeare when we are four.

My first real post in this series will be about sets, rows and columns.

Performance Based Management

Andy Leonard doesn’t care for Performance Based Management.  I tend to agree.  I find this approach to be passive aggressive.  If management has a problem with an employee they should confront the employee directly.  With PBM, management can slot people into categories and blame the results on the system.  “Sorry, I didn’t want to punish you but someone had to be in the bottom 20%. It’s out of my hands.”

And then there is the collateral damage, which is what I think Andy is talking about.  Even if all employees are doing a fine job, someone has to be in the bottom 20%, and you can be sure it won't be the head guys.  When you build a house someone has to pick up the shovel and dig.  You can't punish the guy with the shovel because he's not the engineer.  If everyone building the house is doing a good job but someone has to be labeled as the bottom 20%, you can bet it won't be the engineer or the foreman.