Monthly Archives: February 2012

Log File Shrinkage (or “I was in the pool!”)

I was having lunch with some friends the other day and one of them mentioned putting in a maintenance plan to regularly shrink the transaction log. I responded with a very quick, “You shouldn’t do that,” but didn’t have a chance to explain it. So, here is the explanation. I realize there are already tons of articles and blog posts on this, but if I referenced one of those I wouldn’t get to write about it.

The transaction log is necessary for the changes to your database to be atomic and durable. If some aspect of a transaction fails, the transaction log is used to roll back all parts of that transaction. Also, because changes to your database are recorded in the transaction log first, they don’t have to be immediately written to disk in your data file. This gives you big gains in performance, since changes to the data file can be written to disk during a single large checkpoint operation rather than bunches of little ones. If SQL Server fails before the checkpoint, the completed transactions saved in the log can be replayed during recovery and applied to the database.

The log file is the most important part of the database, in my opinion, and a SQL Server database cannot function without one. Therefore, log file performance is very important to the performance of the database. Regularly shrinking the transaction log can introduce unnecessary overhead and performance problems to your SQL Server.

During the daily workload your transaction log is going to grow to a size that can handle that workload. The size of the log is affected by many things: the quantity of transactions, how long some transactions take to complete, how often you back up the log, and whether database mirroring is in use are just a few. In the natural course of a day’s work the transaction log will grow to a size that can accommodate everything it needs to do. You can’t stop it from doing this. It needs to do it. It will do it, or your SQL Server will stop functioning.
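In the full recovery model, it is regular log backups that mark log space as reusable, which is why backup frequency affects log size. A minimal sketch (the database name and backup path are placeholders, not recommendations):

```sql
-- Back up the transaction log so inactive portions can be reused.
-- Without this, in full recovery, the log only ever grows.
BACKUP LOG AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks_log.trn';
```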

During this workload there will be times when only a portion of the transaction log is active. There will be other times when almost all of it is active. If you shrink the transaction log when almost all of it is active, or when the portion at the end of the log file is active, there will be very little change in the size of the file. If you shrink the log file when the beginning of the file is active but the end is not, the inactive portions at the end of the file will be removed.
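If you want to see how much of the log is currently in use, and what (if anything) is preventing the inactive portion from being reused, a couple of standard commands help (the database name here is just an example):

```sql
-- Percentage of each database's log file currently in use
DBCC SQLPERF(LOGSPACE);

-- Why the log can't be reused right now (LOG_BACKUP, ACTIVE_TRANSACTION, etc.)
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'AdventureWorks';
```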

We now have a smaller log file. What happens, though, when a daily workload occurs that requires the same amount of log file you needed yesterday? The log wouldn’t have grown to that size unnecessarily, so at some point it’s going to need to grow to that size again. This is where the unnecessary overhead comes into play. Unlike the data file, which can (can, not will) grow in size without needing to initialize the space it uses, the log file must zero-initialize any new space it grabs. This means SQL Server writes zeros over every byte of the new log space as the file grows. Since all transactions in the database must be logged in the transaction log to be durable (that is, able to be replayed during recovery), those transactions must wait for the log file to be initialized before they can be recorded in the log as complete, which means some application out there is waiting longer than it needs to before it can continue. On a busy server this can put a lot of database activity on hold while the log file grows. If you shrink the log file on a regular basis you will encounter this extra overhead on a regular basis. The solution is to not shrink the log file. It isn’t like you can use the space saved for anything else. If you do, and the transaction log can’t grow, then your SQL Server will stop functioning.
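Rather than letting the log grow (and zero-initialize) in small increments while applications wait, you can size it up front during a quiet period. A sketch, assuming the logical log file name is AdventureWorks_Log (check sys.database_files for the real name):

```sql
-- Grow the log once, ahead of the workload, to the size it actually needs.
-- The zero-initialization cost is paid now, not during peak activity.
ALTER DATABASE AdventureWorks
MODIFY FILE (NAME = AdventureWorks_Log, SIZE = 8192MB);
```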

Yes there are occasions when you have to do something abnormal and the log file will be grossly expanded. You might want to consider how abnormal this is. If you have to do this every month then it may be worth it to keep the log file larger. If you have to do this every year…well, maybe not, but you still have to have enough disk space for the log to expand to when you do need to perform this abnormal activity. If you know what this log expanding action is, how often you are going to need to perform it and generally how much disk you are going to need to do it then I can see shrinking the log file in the meantime. However, if you don’t know why your log file is as big as it is it probably isn’t a good idea to assume it’s in an abnormal state.

But before you decide to shrink the file between abnormal activities that cause abnormal growth there is something else about log files to be aware of – the number of virtual log files in the transaction log.

Another problem with shrinking the log file on a regular basis is that it can introduce an unhealthy number of virtual log files (or VLFs) into your log file. Internally, a log file is broken up into separate portions called virtual log files. Too many (or too few) VLFs can cause problems. The number of VLFs added during each log growth depends on the size of that growth. A single large growth will produce far fewer VLFs than multiple small growths. This means that if you shrink your transaction log and let it grow back again in small chunks, which produce a large number of VLFs, you might be introducing a condition that causes performance degradation. It is better to shrink your log file once, then manually grow it in chunks so it will be the right size and have an appropriate number of VLFs. Even if you are aware of this and decide to shrink your log file on a regular basis, you will still have to manually grow it every time in order to put the right number of VLFs in it. How to do that, though, is beyond the scope of this simple blog post.
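To get a feel for it: you can count VLFs, and if you do shrink, regrow in a few larger steps rather than many tiny autogrowths. The file name and sizes below are illustrations, not recommendations:

```sql
-- Returns one row per VLF; a count in the thousands is a warning sign
DBCC LOGINFO;

-- Shrink once, then grow back in larger deliberate chunks
DBCC SHRINKFILE (AdventureWorks_Log, 1);
ALTER DATABASE AdventureWorks MODIFY FILE (NAME = AdventureWorks_Log, SIZE = 4096MB);
ALTER DATABASE AdventureWorks MODIFY FILE (NAME = AdventureWorks_Log, SIZE = 8192MB);
```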

So, if the log is being shrunk on a regular basis, then on a regular basis your database will encounter extra overhead it didn’t need to encounter. You might also introduce another condition that causes problems: too many virtual log files. And even if you are aware of the potential problems related to having too many or too few VLFs, you have introduced another ongoing maintenance task, and who wants that?

Here is a good article by Gail Shaw about how to manage the transaction log.

There are other good articles and blog posts out there. You can Google search for them.

Column Aliases (or Who Is That Masked Crusader?)

Aaron Bertrand talks about the various methods for column aliasing and which he prefers. My favorite is a variation on #4 in his list (I don’t think he has them in order of preference, but numbered them to make them easier to discuss), which is surprising because, of the options listed, #4 is my least favorite. I used to use = whenever I aliased a column in a select list, but over the years I’ve slowly moved to another method.

I will repeat his list here because it’s easier to discuss it without having to constantly refer back to the original post.

SELECT 1 AS x;     -- #1
SELECT x = 1;      -- #2
SELECT 'x' = 1;    -- #3
SELECT 1 x;        -- #4
SELECT 1 AS 'x';   -- #5
SELECT 1 'x';      -- #6

I find #4 to be the least readable, especially when the select list isn’t just numbers and the items selected aren’t on separate lines. I think this example from AdventureWorks demonstrates the problem.

select ContactID CID,Gender MorF, EmployeeID EID
from HumanResources.Employee

At a glance, with a long select list, it takes more time to interpret and understand. The variation I prefer is similar but uses brackets to identify the alias.

select ContactID [CID],Gender [MorF], EmployeeID [EID]
from HumanResources.Employee

One of the reasons I like this method is that I don’t have to rename every column to be consistent. The real column names can also be surrounded by brackets. This makes it easier (though not as easy as Aaron’s preference) to look at a select list and see what column names will be used in the result set. For example:

select
 [ContactID] [CID]
,[Gender] [MorF]
,[EmployeeID] [EID]
from HumanResources.Employee

However, I’ve found that the most readable way (for me anyway, for reasons I explain at the bottom of this post) to use this method is to always use [ ] when aliasing a column and never use it when I’m not.

This makes the select list look like this:

select
 ContactID [CID]
,Gender [MorF]
,EmployeeID [EID]
from HumanResources.Employee

When I see this I know that if I see brackets the column is aliased and if not, it’s not. I suppose there could be an occasion where the brackets are necessary for the real column name in the select list which would result in two brackets in one selection. A table like this:

create table [1test] ([1ID] int)

would require a select like this:

select [1ID] from [1test]

If I wanted to alias 1ID using my convention I’d have to do:

select [1ID] [ID]
from [1test]

I can’t think of an occasion when this has happened, though, since I never use a column or table name that requires brackets in any database that I design.

I appreciate Aaron’s use of = to alias columns. It makes for very readable code. Though I think the manner in which the aliases are vertically lined up focuses the attention on what the aliases are. For some reason my focus is more often on what columns are being selected from the underlying table (or more specifically, how much of the select list is the result of a function or some other calculation). Lining those up vertically on the left-hand side helps me quickly recognize that. Yes, it’s easy to look at the = example and see which columns returned are not directly from the underlying table, but it is easier for me to scan down the items selected and do a quick mental calculation of the percentage of items that will come from calculations versus those selected directly from the underlying table. I’m not sure why this is important to me, but it is.
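To show what I mean, here is a variation on the earlier example with a calculated item mixed in (the datediff expression is just an illustration; HireDate is a real column in AdventureWorks):

```sql
select
 ContactID [CID]
,Gender [MorF]
,datediff(year, HireDate, getdate()) [YearsEmployed]
from HumanResources.Employee
```

Scanning down the left edge, the calculated item jumps out immediately from among the plain columns.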

Often when I need to find a specific item in the select list to edit or examine, I will open the proc in SSMS, search for that alias, and identify the right row immediately. So for me, the method I use (a variation on Aaron’s option #4) provides an easy way of knowing what the column names of the result set are, but balances that with some other things I want to know at first glance. But I might be a little weird. That is the funny thing about preferences. There are usually some underlying reasons for them that we don’t get until we think about them.

All that being said, if I started using = I could probably adjust pretty quickly and after a while it would be second nature. I’m also not so stuck on my method that I would balk if I worked somewhere that had a coding standard requiring = or AS.

It’s always interesting to see what preferences other SQL Server developers use and why.