Loading Hierarchical Data Using a WHILE EXISTS

Last time, we discussed three methods for inserting data into a destination table that has a recursive relationship. In this article we’re going to cover how to insert data into that table without vi0lating the foreign key constraint by using a WHILE EXISTS loop.

If you want to follow along with the examples, you’re going to need my Staging table and my Destination table.

CREATE TABLE Staging (
    AccountID UNIQUEIDENTIFIER NOT NULL 
    , Name NVARCHAR(160) NOT NULL
    , ParentAccountId UNIQUEIDENTIFIER NULL
    , ParentName NVARCHAR(160) NULL
    , BatchID INT
    , ConversionStatus INT
)
--truncate table AccountBaseDestination
CREATE TABLE Destination (
    AccountID UNIQUEIDENTIFIER NOT NULL 
    , Name NVARCHAR(160) NOT NULL
    , ParentAccountId UNIQUEIDENTIFIER NULL
    , ParentName NVARCHAR(160) NULL
    , CONSTRAINT [cndx_PrimaryKey_Account] PRIMARY KEY CLUSTERED ([AccountId] ASC)
)
ALTER TABLE Destination  ADD  CONSTRAINT fk_Destination_Destination__ParentAccountID_AccountID
     FOREIGN KEY(ParentAccountId)
REFERENCES Destination ([AccountId])

Now let’s put in some simple hierarchical data into our staging table. Basically I insert a row, grab the GUID created, and use that for the parentAccountID for the next row. That way each child will have the right GUID assigned for it’s parent. Otherwise, the demo breaks!

--demo: several layers deep

--first: empty the staging table
TRUNCATE TABLE Staging
--we need temp storage for an inserted AccountID to use as a parent account id
DECLARE @ParentAccountID UNIQUEIDENTIFIER
--next we need a table variable to trap inserted uniqueidentifers
DECLARE @InsertedAccount TABLE (
	AccountID UNIQUEIDENTIFIER
)

--Let's insert a root node, and output the inserted Account ID
INSERT INTO Staging
(AccountID, Name)
OUTPUT inserted.AccountID INTO @InsertedAccount	
SELECT 
	NEWID() as accountID
	, 'Root Node' as Name

--I know we'll just have one for this demo, so TOP 1
SELECT TOP 1 
	@ParentAccountID = AccountID 
FROM @InsertedAccount	
--clear out the @InsertedAccount before the next insert
--otherwise you'll have Multiple ParentAccountIDs to deal with.
DELETE FROM	@InsertedAccount

INSERT INTO Staging
(AccountID, Name, ParentAccountID, parentName)	
OUTPUT inserted.AccountID INTO @InsertedAccount
SELECT 
	NEWID() as accountID
	, 'first child' as name
	, @ParentAccountID as ParentAccountID
	, 'Root Node' as parentName

--pull out the accountid again
SELECT TOP 1 
	@ParentAccountID = AccountID 
FROM @InsertedAccount	

DELETE FROM	@InsertedAccount

INSERT INTO Staging
(AccountID, Name, ParentAccountID, parentName)	
OUTPUT inserted.AccountID INTO @InsertedAccount
SELECT 
	NEWID() as accountID
	, 'grandchild' as name
	, @ParentAccountID as ParentAccountID
	, 'first child' as parentName

--pull out the accountid again
SELECT TOP 1 
	@ParentAccountID = AccountID 
FROM @InsertedAccount	

DELETE FROM	@InsertedAccount

INSERT INTO Staging
(AccountID, Name, ParentAccountID, parentName)	
OUTPUT inserted.AccountID INTO @InsertedAccount
SELECT 
	NEWID() as accountID
	, 'great grandchild' as name
	, @ParentAccountID as ParentAccountID
	, 'grandchild' as parentName

--don't have to clear the table variable this time, we're done.
SELECT * FROM Staging

In order to use a WHILE EXISTS loop, you have to be able to define the records you want to work on. For us we want to insert any records from Staging that either have their ParentAccountID value already in the Destination table in the AccountID column, or do not have a ParentAccountID value (NULL).

Here’s my query that shows me the records we could insert.

SELECT *
FROM Staging staging
LEFT JOIN Destination destInsertedCheck
	on staging.AccountID = destInsertedCheck.AccountID
LEFT JOIN Destination destParentInsertedCheck
	on staging.parentAccountID = destParentInsertedCheck.AccountID
WHERE 
	destInsertedCheck.accountID IS NULL --not already inserted
	and (destParentInsertedCheck.AccountID IS NOT NULL 
             OR  
	     staging.parentAccountID IS NULL)--parent is defined or NULL

Notice only our Root Node record is shown. The next thing we need for a WHILE EXISTS loop is the query to do the work, in our case we need to insert these records into the Destination table.

INSERT INTO Destination
(AccountID, Name, ParentAccountID, ParentName)
SELECT staging.*
FROM Staging staging
LEFT JOIN Destination destInsertedCheck
     on staging.AccountID = destInsertedCheck.AccountID
LEFT JOIN Destination destParentInsertedCheck
     on staging.parentAccountID = destParentInsertedCheck.AccountID
WHERE 
     destInsertedCheck.accountID IS NULL --not already inserted
     and (destParentInsertedCheck.AccountID IS NOT NULL 
          OR  
          staging.parentAccountID IS NULL) --parent is defined or NULL

Now, we just need to put it all together. I would suggest changing the SELECT * to SELECT somecolumn in the WHILE EXISTS check. There’s no need to pull back the whole row, one non-nullable column would be fine. Even SELECT 1 would be fine.

Take a look at the full code now.


WHILE EXISTS (
	SELECT staging.AccountID
	FROM Staging staging
	LEFT JOIN Destination destInsertedCheck
		on staging.AccountID = destInsertedCheck.AccountID
	LEFT JOIN Destination destParentInsertedCheck
		on staging.parentAccountID = destParentInsertedCheck.AccountID
	WHERE 
		destInsertedCheck.accountID IS NULL --not already inserted
		and (destParentInsertedCheck.AccountID IS NOT NULL 
			 OR  
			 staging.parentAccountID IS NULL)--parent is defined or NULL
		
	)
BEGIN
	INSERT INTO Destination
	(AccountID, Name, ParentAccountID, ParentName)
	SELECT staging.*
	FROM Staging staging
	LEFT JOIN Destination destInsertedCheck
		on staging.AccountID = destInsertedCheck.AccountID
	LEFT JOIN Destination destParentInsertedCheck
		on staging.parentAccountID = destParentInsertedCheck.AccountID
	WHERE 
		destInsertedCheck.accountID IS NULL --not already inserted
		and (destParentInsertedCheck.AccountID IS NOT NULL 
			 OR  
			 staging.parentAccountID IS NULL) --parent is defined or NULL
END

So now, if you take a look at your Source and Destination tables, you’ll see all the records from your source are in the destination, and you didn’t get any foreign key contraint violations like we saw in the last article. Pretty easy, right? Next time, I’ll show you how you can use a CTE to define what “level” a record is on, and process them from root level to leaf level. Until then, if you have any questions, let me know! I’m here to help.

Leave a Reply Cancel reply