We’ve spent the last few posts working on getting relational data out of our tables and into XML format. Most of the time, that’s what I’ve used XML in SQL to do. But you’ve got to learn to build on what you know to learn. Today, we’re going to begin storing XML data in tables. Starting with 2008 (the first one), you can do this in it’s own data type, rather than stuffing it into a VARCHAR(MAX) field.
The new data type can be used to store either a XML document, or just a fragment. The difference is one is a complete standalone entity that would render in any XML viewer, the other is only a part of a XML document. It wouldn’t render in an XML viewer without throwing an error.
Now that SQL has an actual data type, you need to know a couple things about it.
- The data is actually stored in a VARBINARY(MAX) form. That way it’s capable of storing binary representations of data. I have seen images encoded inside XML documents… it’s nasty, but it is valid.
- Data in the XML Data Type can be typed, or un-typed. Basically the data stored can have a schema collection defined. With that, the Server can validate the data you try to store. Without it… well, the west wasn’t just called wild… it was wild. No rules, whatever you could get away with was legal. Un-typed XML is pretty much the same way.
To type an XML data object, you first have to define a SCHEMA Collection. I’ll give you a simple one to start with, but if you want more details, check out Books On Line.
1: USE AdventureWorks
2: GO
3: IF EXISTS (SELECT 1
4: FROM sys.xml_schema_collections
5: WHERE name='VideoGame')
6: BEGIN
7: DROP XML SCHEMA COLLECTION VideoGames
8: END
9:
10: CREATE XML SCHEMA COLLECTION VideoGames AS
11: '<?xml version="1.0" encoding="utf-8"?>
12: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
13: <xsd:element name="VideoGames">
14: <xsd:complexType>
15: <xsd:attribute name="VideoGameID" type="xsd:integer" />
16: <xsd:attribute name="Name" type="xsd:string" />
17: <xsd:attribute name="Description" type="xsd:string" />
18: </xsd:complexType>
19: </xsd:element>
20: </xsd:schema>'
This schema will let us work with XML like the following.
1: <VideoGame VideoGameID="1" Name="Halo: Reach" Description="Awesome!" />
Really simple. If you want more complex, check out the BOL… They get really advanced, really quickly.
Now that we have the schema, we can create a table that will store that will store XML in that format.
1: CREATE TABLE VideoGames (
2: VideoGameID INT NULL
3: , Name VARCHAR(255) NULL
4: , [Description] VARCHAR(255) NULL
5: , XMLVersion XML(DOCUMENT VideoGame) NULL
6: )
Let’s insert some data. First, here’s an example that works, then one that fails.
1: --works
2: INSERT INTO VideoGames
3: VALUES (1,'Halo:Reach','Awesome!','<VideoGame VideoGameID="1" Name="Halo: Reach" Description="Awesome!" />')
4: --fails
5: INSERT INTO VideoGames
6: VALUES (1,'Halo:Reach','Awesome!','<VideoGame VideoGameID="ONE" Name="Halo: Reach" Description="Awesome!" />')
Notice that when you tried the second query, it failed, telling you that ONE is not valid for VideoGameID. The schema identifies that attribute as an INTEGER. That’s the great thing about using typed XML data types.
Now for the negative news, if you have to make a change to your schema, you’re going to have a long day in front of you. First you’ll have to alter every column that uses that schema, changing them to un-typed. Then, you have to drop and recreate the schema. Then you have to make sure that every column that is going to use the new schema has data that fits the new schema.
If you have any data that doesn’t fit that new schema, you’ll have to update it. After that you have to alter the columns again to use the new schema.
If you have a server that’s pushing capacity as it is… don’t alter the schema during production hours. Since columns that use typed xml actually store the data in the data types you define (text is stored as VARCHAR, numbers as INTEGER or DECIMAL). When you apply schema changes, and you change a type. Your server is going to have to compute every conversion. That’s going cost you a lot of CPU.
Just saying.
Even with this extra cost, I’m still a big fan of typed data columns, since you can’t trust your users to send you data in the right format, knowing they’ve sent bad data as soon as possible saves you a lot of time when you have to figure out what went wrong.
Any questions about setting up XML type columns? If so, send them in. You’ll need to understand this before you can move on to the next topic… Working with the XML data in your tables.