Written by Aaron Crouch
From a Business Intelligence and Data Warehousing point of view, the upcoming release of SQL Server 2008 presents us with several new tools designed to aid the B.I. developer with loading, maintaining and reporting on dimensional data warehouses. Though much of the functionality is new to the development world as a whole, many ideas were”borrowed” from other successful DBMS’s and B.I. entities.
I, myself, started out my development life in the land of Oracle where King Ellison and his subjects worked closely with some of the fathers of modern data warehousing to make tangible many of their great ideas. Quite honestly these ideas were leaps and bounds ahead of any other mainstream DBMS on the market, including Microsoft’s new SQL Server 7.
Technologies such as materialized views, table based partitioning, and Rollup and Cube aggregations directly embedded into the SQL worked great but weren’t the easiest technologies to implement thanks to Oracle’s apparent die-hard belief that all things technological must be incredibly unintuitive. Nevertheless, after making the jump into SQL Server (2000), I found myself longing for many of these great tools again and wondering when Microsoft would get on the “B.I. ball”.
When SQL Server 2005 was introduced, my excitement level with Microsoft DBMS’s sky-rocketed and many of the “missing” tools were now back in play for me. However, one technology that impressed me from the Oracle days, automated change data capture, was still missing. Enter SQL Server 2008, and yes, I have change data capture again! But wait… I still do most of my database development in a SQL Server 2005 environment… now what?
There are several shrink wrapped packages that can track changed data in a SQL 2005 environment, most of which rely on monitoring the transaction logs. These applications tend to be expensive, though, especially for smaller businesses. The benefit of this approach is that, for the most part, database performance is not impacted. But what happens when the data files are reorganized through truncation or shrinking? This is a fairly standard practice exercised by most DBA’s for the purpose of optimization. Yet, should the shrinkage take place before the CDC utility has a chance to scan the logs, much of the change data could be lost before it’s utilized.
Another alternative, and the one which I will demonstrate in this entry, relies on triggers.Triggers can be inherently dangerous. However, used correctly they can be of great aid.The same can be said of cursors and dynamic SQL, all of which, if used incorrectly, can cause optimization issues, but all of which I actually employ in the following technique.Are there better techniques? Depending on your circumstances, I imagine so, though I have found that this one works very well in practice and have used it multiple times with success.
First, we’ll want to identify which tables will need to be tracked. Obviously this approach introduces a certain amount of overhead so the fewer the better. These candidate tables will eventually have the logging triggers applied to them.
Next we’ll want to set up the table whose responsibility it is to log the changed data. The table will look like this:
CREATE TABLE [audit].[Audit](
[Audit_PK] [bigint] IDENTITY(1,1) NOT NULL,
[TableName] [varchar](50) NOT NULL,
[ColumnName] [varchar](50) NOT NULL,
[RecordPKey] [bigint] NOT NULL,
[RecordAction] [tinyint] NOT NULL,
[NewValue] [varchar](50) NOT NULL,
[OldValue] [varchar](50) NULL,
[DeletedRecord] [varchar](1000) NOT NULL,
[LastActionDate] [datetime] NOT NULL,
[LastActionUser] [varchar](50) NOT NULL
) ON [PRIMARY]
Now that our auditing data repository is created, we need to develop the trigger(s) that will reside on the table(s) in which we want to capture changed data. This trigger will need to handle Insert, Update, and Delete events as we want to track all 3.
CREATE TRIGGER [dbo].[tblMAIN_BillToAuditIUD] ON [dbo].[tblMAIN_BillTo] FORINSERT,UPDATE,DELETE
AS
– Declare Variables
DECLARE @ParentTable VARCHAR(50)
DECLARE @PriKey VARCHAR(50)
DECLARE @ColumnsUpdated VARBINARY(100)
DECLARE @ColUPD VARCHAR(50)
DECLARE @LastUpdateDate VARCHAR(20)
DECLARE @LastUpdateUser VARCHAR(50)
DECLARE @sql VARCHAR(1000)
DECLARE @EventType TINYINT
We will want to capture the name of the user as well as the date in which they fired off the DML against our tracked table. This information is really only useful for logging and auditing purposes so if we’re watching this table strictly for gathering CDC data, we can leave this bit of logic out.
SET @LastUpdateDate = CAST(GETDATE() AS VARCHAR(20))
SET @LastUpdateUser = SYSTEM_USER
Notice that we’re changing the output of the GETDATE() function to a string. This is, again, because we’ll need to incorporate it into a dynamic SQL statement which we will be building along the way.
Next up, we’ll want to identify which table fired off the trigger and figure out what column the Primary Key is. For the purpose of this example, we’ll assume that the Primary Key is a single identity field but this can be altered for different situations and needs.
SELECT
@ParentTable = object_name(parent_obj)
FROM
sysobjects
WHERE
id = @@procid
SELECT
@PriKey = kcu.Column_Name
FROM
Information_Schema.Table_Constraints tc
INNER JOIN Information_Schema.Key_Column_Usage kcu
ON tc.Constraint_Name = kcu.Constraint_Name
WHERE
tc.Constraint_Type = ‘PRIMARY KEY’
AND tc.Table_Name = @ParentTable
Now we need to know what type of DML just took place, whether an insert, update, or a delete action. To do this we’ll have to look at SQL’s Inserted and Deleted memory resident tables. To figure out which action took place we’ll check for the existence of records in both of these tables. If records exist in the Inserted table but not in Deleted, then we can surmise that an insert action took place. If there are records in the Deleted table but not the Inserted table, then a record deletion occurred. Following, if records are present in both tables, then we know that an update statement was issued.
IF EXISTS(SELECT * FROM Inserted)
IF EXISTS(SELECT * FROM Deleted)
SELECT @EventType = 0
ELSE
SELECT @EventType = 1
ELSE
SELECT @EventType = 2
So, if an insert took place, build an Insert statement to process our new changed data information into our audit table. In this case, we store the newly assigned Primary Key so that we can look it up in the source table if it becomes necessary.
IF @EventType = 1
BEGIN
– Pull contents of “Inserted” pseudo table into temp
– tables for use in dynamic SQL
SELECT * INTO #ins FROM Inserted
– Create and execute Audit table insert statement
SET @sql = ‘INSERT INTO dbo.audit SELECT ”’ + @ParentTable + ”’, ”-New Record-”, pt.’+ @PriKey +
‘, 1, ”-New Record-”, ”-New Record-”, ”N/A”, ”’ + @LastUpdateDate +
”’, ”’ + @LastUpdateUser + ”’ FROM ‘ + @ParentTable + ‘ pt INNER JOIN #ins2 ins ON pt.’+ @PriKey +
‘ = ins.’ + @Prikey
EXEC (@sql)
END
We must also check for a deletion action with the following code. Note that, for auditing purposes, we have captured the entire contents of the deleted record as XML. This allows us to use the same trigger logic on any table and to capture and store the entire deleted record without altering our CDC/audit repository table. This can then be queried with simple xquery syntax later on.
IF @EventType = 2
BEGIN
– Pull contents of “Deleted” pseudo table into temp
– tables for use in dynamic SQL
SELECT * INTO #dlt FROM DELETED
– Create and execute Audit table delete statement
SET @sql = ‘INSERT INTO dbo.audit SELECT ”’ + @ParentTable + ”’, ”-Record Deleted-”, dlt.’ + @PriKey +‘, 2, ”-Record Deleted-”, ”-Record Deleted-”, (SELECT * FROM #dlt2 Where #dlt2.’ +@PriKey +‘ = dlt.’ + @Prikey + ‘ FOR XML AUTO), ”’ + @LastUpdateDate + ”’, ”’ + @LastUpdateUser+
”’ FROM #dlt2 dlt’
EXEC (@sql)
END
Last we need to check for any columns that have been updated. This presents a bit more difficulty but thanks to a few not so well known T-SQL functions we can take care of it relatively easily. Once we identify that an update has indeed taken place, we have to figure out which column or columns were updated and then get the old and new values to place in our changed data/audit table.
The Columns_Updated function returns a bit map indicating which columns were updated.What that means is that it will return a single byte representing the 1st eight columns in a table… 1 bit per column. If a column is updated, its mapped bit is flipped to a 1. So, as an example, if I have a table with 4 columns and the Columns_Updated function returns “100100” we know that the first and fourth columns have been updated.
Next we’ll need to “decode” our bitmap. For this we’ll use the fn_IsBitSetInBitmask system function. By passing the generated bit mask in as well as the table name we’re looking at, it will return the column names. These column names are then fed into a cursor where we loop through them to create the insert statements for the audit/cdc table… 1 record for each updated column. We’ll capture the old and new values as well as the primary key for the updated record.
IF @EventType = 0
BEGIN
– Prepare the COLUMNS_UPDATED function
SET @ColumnsUpdated = COLUMNS_UPDATED()
– Pull contents of “Inserted” and “Deleted” pseudo tables into temp
– tables for use in dynamic SQL
SELECT * INTO #ins FROM Inserted
SELECT * INTO #dlt FROM Deleted
– Loop through each column getting updated
DECLARE cCols CURSOR READ_ONLY
FOR
SELECT
Column_Name
FROM
Information_Schema.Columns Field
WHERE
Table_Name = @ParentTable
AND sys.fn_IsBitSetInBitmask (@ColumnsUpdated,
COLUMNPROPERTY(OBJECT_ID(Table_Schema + ‘.’ + Table_Name),
Column_Name, ‘ColumnID’)) <> 0
OPEN cCols
FETCH NEXT FROM cCols
INTO @ColUPD
WHILE @@FETCH_STATUS = 0
BEGIN
– Create and execute Audit table insert statement
SET @sql = ‘INSERT INTO dbo.audit SELECT ”’ + @ParentTable + ”’, ”’ + @ColUPD +
”’, pt.’ + @PriKey + ‘, 0, ins.’ + @ColUPD + ‘, dlt.’ + @ColUPD + ‘, ”N/A”, ”’ +@LastUpdateDate +
”’, ”’ + @LastUpdateUser + ”’ FROM ‘ + @ParentTable + ‘ pt INNER JOIN #ins ins ON pt.’ +@PriKey +
‘ = ins.’ + @Prikey + ‘ INNER JOIN #dlt dlt ON pt.’ + @PriKey + ‘ = dlt.’ + @Prikey
EXEC (@sql)
END
FETCH NEXT FROM cCols
INTO @ColUPD
END
CLOSE cCols
DEALLOCATE cCols
END
We are now capturing any and all updates, deletes, and inserts against our tracked table.To use this approach for change data capture, we would simply need to carry the keys of the altered records into our star schema. We could then apply the DML here and finally truncate/archive our CDC table, getting it ready for the next round of changes.Alternatively if we’re using this approach for logging and auditing, we would simply continue to amass the information and present it via reports, etc.
An additional technique I’ve employed, in conjunction with the above, is to create a stored procedure which will build the triggers automatically when passed a table name.This is helpful when tracking changed data from many tables.