
Temporal Tables – Part 1 – Introduction

SQL Server 2016 has a cool new feature called Temporal Tables. This is how Microsoft describes them:

SQL Server 2016 introduces support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.

In other words, the temporal table keeps track of your changes. If you query the temporal table normally you get the most current version of the data. You can query the temporal table with an AS OF indicator (or another type of indicator) and get what the data was as of the date you specify. I think this is way cool.

Here is a quick example:

CREATE TABLE dbo.Company
(
    CompanyID INT IDENTITY(1,1) NOT NULL,
    CompanyName NVARCHAR(100) NOT NULL,
    StateIncorporatedIn char(2) NOT NULL,
    FullAddress NVARCHAR(500) NOT NULL,
    Notes NVARCHAR(MAX) NULL,
    StartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    EndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (StartTime, EndTime),
    CONSTRAINT PK_CompanyTest Primary Key CLUSTERED
(
    CompanyID
)
) WITH (SYSTEM_VERSIONING = ON);

In this table definition you can see the items that make this a temporal table. The table definition contains StartTime and EndTime columns with GENERATED ALWAYS clauses to indicate these are system-generated times, a PERIOD FOR SYSTEM_TIME clause and a SYSTEM_VERSIONING clause. If you do all of these things you get a temporal table that holds a history of the changes made.

In addition to dbo.Company, this statement actually creates a second table behind the scenes. The history table can be discovered by looking at the history_table_id column of the sys.tables catalog view to get the object id of the history table.

SELECT
    history_table_id
FROM sys.tables
WHERE
    object_id = object_id('dbo.Company')

or

SELECT
    ht.name
FROM sys.tables t
INNER JOIN sys.tables ht
    ON t.history_table_id = ht.object_id
WHERE
    t.object_id = object_id('dbo.Company');

This shows a yucky system generated name of: MSSQL_TemporalHistoryFor_885578193

We can query the primary table for the current values and we can query the yucky table for the historical data. However, we can also query the primary table with a clause indicating that we want the values at a certain time, and SQL Server will use both tables to return the right data to you.

Here I will insert a record and then update it a moment later.

INSERT INTO dbo.Company
(
    CompanyName,
    StateIncorporatedIn,
    FullAddress,
    Notes
)
VALUES
(
    'Dale Company',
    'NC',
    '100 Dale Way',
    'Great Company'
);

UPDATE c SET
    Notes = 'Better Company'
FROM dbo.Company c
WHERE
    CompanyName = 'Dale Company';

Looking at the two tables we see the current record and the previous record.

SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company;

SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.MSSQL_TemporalHistoryFor_885578193;

These tables have StartTime and EndTime columns that indicate the time period during which each record was current. An EndTime of 9999-12-31 23:59:59.9999999 indicates the current record.

We can get the record at a point in time with FOR SYSTEM_TIME AS OF in the FROM clause.

SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company FOR SYSTEM_TIME AS OF '2017-03-07 00:08:44'

To get the historical record we need to specify an AS OF value that lies between the start time and end time for that record.

SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company FOR SYSTEM_TIME AS OF '2017-03-07 00:08:25'

You can also get rows from multiple time periods with the other subclauses: FROM ... TO, BETWEEN ... AND, CONTAINED IN and ALL. These have different levels of inclusiveness. In this example the query returns both my old and new test records.

SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company FOR SYSTEM_TIME FROM '2017-02-24 02:45:43' TO '2017-02-24 02:50:00'

Very cool.
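With the same caveat that the timestamps come from my test run, a sketch of the other two subclauses might look like this (CONTAINED IN returns only rows whose entire validity period falls within the range, so only closed historical versions qualify, while ALL returns every version of every row):

-- CONTAINED IN: only rows opened AND closed within the range.
SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company FOR SYSTEM_TIME CONTAINED IN ('2017-02-24 02:45:43', '2017-02-24 02:50:00');

-- ALL: every version of every row, current and historical.
SELECT
    CompanyID,
    CompanyName,
    Notes,
    StartTime,
    EndTime
FROM dbo.Company FOR SYSTEM_TIME ALL;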

If we don’t want to use the ugly name for the history table we can choose our own. I’ll drop the table from the first example. To drop a temporal table and its history table you have to turn off system versioning first.

ALTER TABLE dbo.Company SET (SYSTEM_VERSIONING = OFF)
DROP TABLE dbo.Company
DROP TABLE dbo.MSSQL_TemporalHistoryFor_885578193

We can add the HISTORY_TABLE clause to the SYSTEM_VERSIONING clause to specify the name of the table to use for history. I am also creating this table with the clustered index on a secondary filegroup and the table on a tertiary filegroup because I want to see where the history table gets created.

CREATE TABLE dbo.Company
(
    CompanyID INT IDENTITY(1,1) NOT NULL,
    CompanyName NVARCHAR(100) NOT NULL,
    StateIncorporatedIn char(2) NOT NULL,
    FullAddress NVARCHAR(500) NOT NULL,
    Notes NVARCHAR(MAX) NULL,
    StartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    EndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (StartTime , EndTime),
    CONSTRAINT PK_CompanyTest Primary Key CLUSTERED
(
    CompanyID
) ON [SecondaryFG]
) ON [TertiaryFG]
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CompanyHistory))

We can see from sys.tables that our tables are associated.

SELECT
    ht.name
FROM sys.tables t
INNER JOIN sys.tables ht
    on t.history_table_id = ht.object_id
WHERE
    t.object_id = object_id('dbo.Company')

What filegroup was the history table created on?

SELECT
    t.name,
    fg.name
FROM sys.tables t
INNER JOIN sys.indexes i
    ON t.object_id = i.object_id
INNER JOIN sys.filegroups fg
    on i.data_space_id = fg.data_space_id
WHERE
    t.name IN ('Company','CompanyHistory')
    AND i.index_id IN (1,0)

My history table got created on the tertiary filegroup, which is the filegroup I specified for the table, not the clustered index. I don’t see anything in the CREATE TABLE documentation that allows me to choose the filegroup of the history table in the SYSTEM_VERSIONING clause. It is possible I missed it, but creating the history table on a different filegroup than the table requires a different solution.

To do this I will create the history table independently of the SYSTEM_VERSIONING clause and use an ALTER TABLE statement to associate the primary table with the history table.

Before I drop the tables to start again I want to look at how SQL Server handled the primary key name (since these are unique) and the identity column in the history table.

SELECT
    c.name,
    object_name(c.parent_object_id),
    c.type_desc
FROM sys.key_constraints c
INNER JOIN sys.tables t
    ON c.parent_object_id = t.object_id
WHERE
    t.name IN ('Company','CompanyHistory')

SELECT
    t.name,
    c.is_identity
FROM sys.tables t
INNER JOIN sys.columns c
    ON t.object_id = c.object_id
WHERE
t.name IN ('Company','CompanyHistory')
and c.name IN ('CompanyID')

There is no primary key constraint or identity property on the history table, so I’ll leave those out of my history table definition. I’ll also strip the GENERATED and PERIOD clauses out of the column definitions because I discovered in testing that SQL Server won’t let me use a table with those defined as a history table.

ALTER TABLE dbo.Company SET (SYSTEM_VERSIONING = OFF)
DROP TABLE dbo.Company
DROP TABLE dbo.CompanyHistory

Create the primary table on the secondary filegroup.

CREATE TABLE dbo.Company
(
    CompanyID INT IDENTITY(1,1) NOT NULL,
    CompanyName NVARCHAR(100) NOT NULL,
    StateIncorporatedIn char(2) NOT NULL,
    FullAddress NVARCHAR(500) NOT NULL,
    Notes NVARCHAR(MAX) NULL,
    StartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    EndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (StartTime , EndTime),
    CONSTRAINT PK_CompanyTest Primary Key CLUSTERED
(
    CompanyID
) on [SecondaryFG]
) on [SecondaryFG]

Create the history table on the tertiary filegroup.

CREATE TABLE dbo.CompanyHistory
(
    CompanyID INT NOT NULL,
    CompanyName NVARCHAR(100) NOT NULL,
    StateIncorporatedIn char(2) NOT NULL,
    FullAddress NVARCHAR(500) NOT NULL,
    Notes NVARCHAR(MAX) NULL,
    StartTime datetime2 NOT NULL,
    EndTime datetime2 NOT NULL
) on [TertiaryFG]

I’ll use ALTER TABLE to start system versioning for the Company table.

ALTER TABLE dbo.Company SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CompanyHistory))

Now verify with the metadata:

SELECT
    t.name,
    fg.name,
    t.temporal_type,
    t.temporal_type_desc
FROM sys.tables t
INNER JOIN sys.indexes i
    ON t.object_id = i.object_id
INNER JOIN sys.filegroups fg
    ON i.data_space_id = fg.data_space_id
WHERE
    t.name IN ('Company','CompanyHistory')
    AND i.index_id IN (1,0)

Since I can create the history table on a different filegroup, I should be able to create it on a partition scheme as well. I’ll test that in my next post.

 

Anatomy of a hack (SQL Injection woes)

When I first heard about SQL injection attacks and learned how to prevent them I was confused. I didn’t understand how introducing a potentially destructive string into a batch could produce different results depending on how it was introduced. Include ';DROP TABLE ToDrop-- as a search condition in a WHERE clause and usually your query returns no data. Include it in the construction of a dynamic SQL string and you’ve lost not only your favorite spaghetti rum cookie recipe but everything that was in the table that just got dropped.

Eventually I learned. The reason the string fragment is sometimes dangerous and other times benign is that the string gets introduced to the query processor at different times and, depending on how and when it is introduced, it is treated differently.

The compilation process follows a basic pattern of parsing the T-SQL batch, binding your statements to real objects, optimizing the statements to produce a good-enough execution plan and, finally, executing the query.  Understanding the entire compilation pipeline isn’t necessary to understand SQL injection attacks though. The important thing to know is that during the compilation stage the T-SQL gets parsed by the Query Processor. If someone can introduce an errant string into your T-SQL code text before it is parsed they can modify how your T-SQL code is interpreted.  The parsing of the batch is the danger zone of SQL injection attacks.

A batch can contain multiple independent SQL statements. SQL injection attacks take advantage of this by inserting another SQL statement into the batch as it is getting compiled so that it will be executed as if it were supposed to be there. Executing a batch that contains two simple SQL statements can look something like this:

SELECT * FROM Test WHERE Col1 = 1;SELECT * FROM Test WHERE Col2 = 'aa'
GO

This produces an execution plan that looks like this:

What this shows us is that two independent SQL Statements were executed as part of a single batch. It also shows us that a carriage return isn’t necessary to separate the two SQL statements.   In T-SQL the semi-colon functions as a statement separator.

Now, imagine giving an external user access to SSMS and the ability to write the commands directly into a query window that has access to your database.  When we use EXEC and dynamic SQL we risk that exact scenario. The EXEC command creates a batch that is independent of the batch that launches it.  This batch follows the same basic compilation process of every other batch with parsing being the first step.  This query, plus a peek at the plan cache, demonstrates that this is the case.

DECLARE @stmt varchar(100)
SET @stmt = 'SELECT * FROM dbo.Test WHERE Col1 = 1'
SELECT * FROM dbo.Test WHERE Col1 = 1
EXEC(@stmt)

Line 3 is the batch created by the T-SQL shown above.  Line 2 is the independent batch created by the EXEC command in that T-SQL.

Knowing that when parsing a batch SQL Server treats the semi-colon as a statement separator and that the EXEC command will result in a new batch being created is the key to understanding how SQL injections attacks work. If we allow an outside individual to affect the string that gets fed to the EXEC command that individual can inject syntax that modifies how the batch is parsed and, ultimately, what gets executed. They can do this if we create a dynamic SQL statement that directly incorporates a string which is passed to us from some web form or win32 application.  This sample simulates this scenario.

DECLARE @inject varchar(100)
DECLARE @var char(1)
DECLARE @stmt varchar(100)
SET @var = '1'
SET @inject = ''''';DROP TABLE ToDrop--'
SET @stmt = 'SELECT * FROM dbo.Test WHERE Col2 = ' + @inject + ' AND Col1 = ' + @var
EXEC(@stmt)

In this sample I am declaring @inject and @var as variables but these could easily be input parameters of an encapsulating stored procedure.  When @inject and @var get substituted into @stmt the resulting string becomes:

SELECT * FROM dbo.Test WHERE Col2 = '';DROP TABLE ToDrop-- AND Col1 = 1

This creates a T-SQL batch that, when passed into EXEC, is parsed like every other batch submitted to SQL Server. The semi-colon causes what was intended to be a single command to be separated into two separate commands. The -- at the end of the injection causes the remainder of the command to be treated as a comment. This keeps the injected string from causing a syntax error, since there could easily be more T-SQL in the string whose syntax is no longer compatible with the newly formed T-SQL. I simulate this by adding another search condition. If this second statement is syntactically correct and binds to objects that really exist in the database, it will execute. The query plan produced by the sample above looks like this:

Notice -- AND Col1 = 1 at the end of Query 2.  If AND Col1 = 1 had been added to the end of DROP TABLE ToDrop there would have been a syntax error.

So how do we protect ourselves from SQL injection attacks?  If we must use dynamic SQL then using sp_executesql is the key. Rewriting the previous SQL statement using sp_executesql looks something like this:

DECLARE @inject nvarchar(100)
DECLARE @var int
DECLARE @stmt nvarchar(100)
DECLARE @paramdef nvarchar(500)
SET @var = 1
SET @inject = N''''';DROP TABLE ToDrop--'
SET @paramdef = N'@param1 varchar(100),@param2 int'
SET @stmt = N'SELECT * FROM dbo.Test WHERE Col2 = @param1 AND Col1 = @param2'
EXEC sp_executesql @stmt,@paramdef,@param1 = @inject,@param2 = @var

When the statement in @stmt is parsed during compilation @param1 and @param2 are practically ignored.  SQL Server understands that these are placeholders for values that will be included later and produces a parameterized query.

Notice there is no semi-colon separating two statements in the batch.  There is no drop table statement. And notice the execution plan is safe.

When this query is finally executed the values in @inject and @var are substituted for @param1 and @param2. The previously dangerous string is now introduced as a simple search condition that returns no results, because there is no record where Col2 = '';DROP TABLE ToDrop--

In the sp_executesql example there is no direct path from @inject to the batch. It is never included during the parsing of the SELECT statement and thus cannot be injected into the batch. This doesn’t mean that sp_executesql by itself is the panacea for all injection woes. It must be used correctly. It is possible to execute the unsafe statement from the first example with sp_executesql as well. To make it safe you must use parameters properly.
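To illustrate that point, here is a sketch of sp_executesql being misused: the input is still concatenated into the statement text before parsing, so the injection works exactly as it did with EXEC.

DECLARE @inject nvarchar(100)
DECLARE @stmt nvarchar(200)
SET @inject = N''''';DROP TABLE ToDrop--'
-- Unsafe: @inject becomes part of the batch text that gets parsed.
SET @stmt = N'SELECT * FROM dbo.Test WHERE Col2 = ' + @inject
EXEC sp_executesql @stmt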

If you don’t want to use parameterized queries but still want to be protected from SQL injection you can use QUOTENAME and REPLACE to sanitize your input. Examples of that are here.
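As a rough sketch of that approach (reusing the @inject string from the earlier examples), REPLACE doubles any embedded single quotes so the injected text stays inside the string literal when the batch is parsed:

DECLARE @inject nvarchar(100)
DECLARE @stmt nvarchar(200)
SET @inject = N''''';DROP TABLE ToDrop--'
-- Double up embedded quotes so the value cannot close the literal early.
-- QUOTENAME(@inject, '''') is an alternative for inputs of 128 characters or fewer.
SET @stmt = N'SELECT * FROM dbo.Test WHERE Col2 = ''' + REPLACE(@inject, '''', '''''') + N''''
EXEC(@stmt)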

Inner and Outer Joins (and Cartesian joins)

Joins are operations performed on two sets of data. Using a join we can pull data from one set, associate it with data pulled from another set and then present the resulting joined set to the user as a single data set. For example: suppose we have basic customer information in one table (name, marital status, etc.) and customer addresses in another table. We can pull the basic customer information from one table, pull the address information from the other table, join them together and present them to the user as a single result set that contains both basic customer info and addresses. At this point you might be asking yourself why we would store the basic customer information in a different table than the customer addresses. That is a good question but won’t be discussed here.

The example I’m using here is for an order tracking system. These tables are extremely scaled down for the purposes of this example but they should suffice in this case.

We have two tables: Orders and ShipMethods.    The Orders table contains, well, orders.

The OrderID is the primary key. Shipped is a bit column: 0 means the order hasn’t shipped, 1 means it has. ShipMethodID is a foreign key to the ShipMethods table. OrderDesc is self-explanatory.

The ShipMethods table contains the methods by which orders can have been shipped.

The ShipMethodID is the primary key. The ShipMethodID in the Orders table corresponds with a ShipMethodID in this table. We know Order1 was shipped via UPS and Order3 was shipped via Fedex. We also know that USPS and Bob’s Elephant Freight Shipping company, shortened here to Bob’s Freight (what, you’ve never needed to ship an elephant??), have not been used for shipping an order because ShipMethodIDs 3 and 4 do not appear in the Orders table’s ShipMethodID column.
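The tables themselves appear here only as screenshots; a minimal setup script consistent with the description might look like this (the exact column types and the rows for Order2 and Order4 are assumptions):

CREATE TABLE dbo.ShipMethods
(
    ShipMethodID INT NOT NULL PRIMARY KEY,
    ShipMethodDesc VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID INT NOT NULL PRIMARY KEY,
    Shipped BIT NOT NULL,
    ShipMethodID INT NULL FOREIGN KEY REFERENCES dbo.ShipMethods (ShipMethodID),
    OrderDesc VARCHAR(50) NOT NULL
);

INSERT INTO dbo.ShipMethods VALUES (1,'UPS'),(2,'Fedex'),(3,'USPS'),(4,'Bob''s Freight');

-- Order5 and Order6 have not shipped, so their ShipMethodID is NULL.
INSERT INTO dbo.Orders VALUES
    (1,1,1,'Order1'),(2,1,1,'Order2'),(3,1,2,'Order3'),
    (4,1,2,'Order4'),(5,0,NULL,'Order5'),(6,0,NULL,'Order6');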

Using these two tiny tables we can explain the difference between inner, outer and Cartesian joins.

Remember – on a basic level all a join does is pull data from two different data sets, associate them to one another and present them to the user as a single combined data set. How the data is associated is controlled by the join type.

A Cartesian join pulls each row of data from the first table and associates it with each row of data from the second table. In other words, every row from table one gets associated with every row from table two. Using the Orders and ShipMethods tables from above, this is the result of a Cartesian join:

As you can see by looking down the list, Order1 is associated with each and every ship method. As you can also see, UPS is associated with each and every order. The same is true for all Orders and all ShipMethods. This is also known as a cross join in SQL Server. In this case a Cartesian join doesn’t make much business sense. While the join itself is accurate (each row in table one is associated with each row in table two), the business use, in this case, is questionable. I put the Cartesian join example first because I think it helps one understand what a join is actually doing and makes it easier to understand inner and outer joins.
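That Cartesian result can be produced with a CROSS JOIN, written in the same style as the join queries used later in this post (a sketch against the example tables):

SELECT * FROM
Orders O CROSS JOIN ShipMethods S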

An inner join is different because it adds a constraint to how the data is joined together.  Rather than have every row in one table joined to every row in another table we can use it to constrain what rows in table one are joined to what rows in table two.

The Orders table contains a ShipMethodID column that contains a number which (not coincidentally) matches to a number in the ShipMethodID column from the ShipMethods table.

Using these two columns and an inner join we can pull data from Orders and associate it with data from ShipMethods, but only where the ShipMethodID values from each row in each table match. You can use the Cartesian join from above to predict what the outcome of the inner join will be by looking at the rows where the ShipMethodIDs are the same. There should be four rows in the result set.
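The inner join query looks like this (a sketch matching the example tables):

SELECT * FROM
Orders O INNER JOIN ShipMethods S
ON O.ShipMethodID = S.ShipMethodID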

And there they are.

An outer join still constrains how the two data sets are joined together but it is less restrictive. Looking at the results of the Cartesian join you will notice that Order5 and Order6 will never show up in an inner join of Orders with the ShipMethods table. Similarly, the ship methods USPS and Bob’s Elephant Freight Shipping would not show up either. This is because the ShipMethodIDs for Order5 and Order6 have not been specified and because no Orders specified USPS or Bob’s Elephant Freight Shipping as ship methods. An outer join lessens the constraining nature of the join and will allow those rows to show up under certain conditions.

There are three basic types of outer joins: left outer join, right outer join and full outer join.

A left outer join will take all of the rows from the left table but only the matching rows from the right table.

You’ll notice that NULL is returned for the rows in the left table that have no corresponding records in the right table.

So what is the left table?  The query I used to get this result set was:

SELECT * FROM
Orders O LEFT OUTER JOIN ShipMethods S
ON O.ShipMethodID = S.ShipMethodID

The left table is Orders. Unlike the inner join, which only showed Orders that had a ShipMethodID with a corresponding ShipMethodID in the ShipMethods table, this one shows all Orders, even ones without a corresponding ShipMethodID in the ShipMethods table.

Now this same query with a right outer join

SELECT * FROM
Orders O RIGHT OUTER JOIN ShipMethods S
ON O.ShipMethodID = S.ShipMethodID

Returns these results:

Notice this has returned all rows from the ShipMethods table plus those rows in the Orders table with a corresponding ShipMethodID value.

We could get the same results with a left outer join if we change the order that we specify the tables in the join.

SELECT * FROM
ShipMethods S LEFT OUTER JOIN Orders O
ON O.ShipMethodID = S.ShipMethodID

gives us this result:

The left table is now the ShipMethods table. The order the columns are displayed in is different but the data is the same.

Finally there is the full outer join. The full outer join is like having an inner join, left outer join and right outer join combined. You get all of the records from the two data sets with corresponding ShipMethodID values. You get all of the records from the left table even if there are no corresponding rows in the right table. And you get all of the rows from the right table even if there are no corresponding rows in the left table.
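The query follows the same pattern as the earlier outer join examples:

SELECT * FROM
Orders O FULL OUTER JOIN ShipMethods S
ON O.ShipMethodID = S.ShipMethodID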

The results look like this:

That is a not so brief explanation of inner and outer joins.

Temporary Table Existence and Scope

To find the existence of a temp table you can use the OBJECT_ID() function.

If the OBJECT_ID function returns a value then the table exists. This works for both local and global temporary tables. This example uses the AdventureWorks test database but any database will do.

--setup
--test to see if the temp tables already exist before trying to create them.
--creating a table that already exists raises an error.
IF object_id('TempDb..#Test1','U') IS NULL
CREATE TABLE #Test1 (ID int, Data varchar(10))

IF object_id('TempDb..##Test2','U') IS NULL
CREATE TABLE ##Test2 (ID int, Data varchar(10))

--test
--if they exist return the object_id of the table.
IF (object_id('TempDb..#Test1','U') IS NOT NULL)
SELECT 'Local Temp Table Object ID',(OBJECT_ID('TempDB..#Test1'))

IF (object_id('TempDb..##Test2','U') IS NOT NULL)
SELECT 'Global Temp Table Object ID',(OBJECT_ID('TempDB..##Test2'))

--cleanup
--make sure the tables exist before we drop them.
--dropping a non existent table raises an error.
IF (object_id('TempDb..#Test1','U') IS NOT NULL)
DROP TABLE #Test1

IF (object_id('TempDb..##Test2','U') IS NOT NULL)
DROP TABLE ##Test2

To test the scope, run the create and test portions of the script in one query window and then run just the test portion in a second query window. The global temp table is found but the temp table that is local to the other connection is not.

Though not required, the object_type parameter of the OBJECT_ID function will help you determine whether the temporary object is the type of object you think it is.

This also creates a temporary object called #TestProc:

CREATE PROCEDURE #TestProc
AS
SELECT 1

Test this temporary stored procedure with:

EXEC #TestProc

Using the OBJECT_ID function determines that this object does exist:

IF (object_id('TempDb..#TestProc') IS NOT NULL)
SELECT object_id('TempDb..#TestProc')

However, if you specify that you are looking for a temporary table with that name, the object type filter will make it appear that an object named #TestProc does not exist:

IF (object_id('TempDb..#TestProc','U') IS NOT NULL)
SELECT object_id('TempDb..#TestProc')

Whether you use the object_type parameter really depends on what you are trying to do. Since two temporary objects with the same name can’t exist in the same scope (even if they are different object types), finding out if an object with the same name exists, regardless of its type, might be important to you.
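In that case, the type-agnostic check is simply the OBJECT_ID call without the object_type argument, which does find the temporary stored procedure created above:

-- Finds any temporary object with this name, whatever its type.
IF object_id('TempDb..#TestProc') IS NOT NULL
    PRINT 'An object named #TestProc already exists'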