Partitioning was a feature added with Microsoft SQL 2005 to allow users to split up large tables across multiple storage locations. Partitioning can be applied to tables, indexes and indexed views. By partitioning the data across multiple locations you can speed up query times, reduce contention between queries, and improve overall performance under certain conditions.
Partition Functions
In order to partition your object, you must first define how you want to split the data. A partition function will allow you to define the boundary points for the different partitions of data. Consider the following table:
ItemID | ManufacturerID | Name | … |
1 | 10 | Arkanian Heavy Pistol | … |
42 | 12 | Casus Fett’s Heavy Blaster | … |
13 | 31 | Genoharadran Blaster | … |
Let’s say your table held all the items from Knights of the Old Republic, and you were having speed issues with look-ups. Your solution: partition the table by ManufacturerID, that way each manufacturer would be in a separate file group, and you should speed look-ups dramatically (for now this is an assumption, we can look at the actual performance differences in a later article).
An example of a partition function we could use for this table would be:
CREATE PARTITION FUNCTION myPartitionFunction(int) AS RANGE LEFT|RIGHT FOR VALUES (10, 20, 30, 40, 50)
Every partition function you will define requires a name, a data type, and at least one boundary point. I’d like to point out one thing: you can’t define a partition function on a text, ntext, image, varbinary(max), timestamp, xml, varchar(max), or user-defined datatypes. Oh, and if you use a computed column, that column will need to be persisted in order to partition on it. Since the computed column has to be persisted, that means it has to be deterministic too (no variable values allowed).
Think about it, if you’re not storing the values, how can you partition on them?
Notice the Range can be LEFT or RIGHT, Basically this identifies which partition holds the partitioning value, IE which partition would hold 10, 20, etc. If you want the lesser partition to hold the value, then you choose RANGE LEFT, otherwise you choose RANGE RIGHT. The difference will come into play when you try tweaking the performance of your partitioned object, and how many records are stored in each partition.
In our example, by choosing RANGE LEFT, each of the rows you see in the table will appear in a separate partition. If we had chosen RANGE RIGHT Arkanian Heavy Pistol and Casus Fett’s Heavy Blaster would both appear in Partition 2.
There are three points I’d like to make about Partition Functions now:
- Notice the partition function doesn’t have a clause that identifies a specific database object. That means you can reuse this partition function across multiple database objects. That can be really useful if you have multiple tables that could benefit from partitioning (if you have one that could benefit, chances are there are more!)
- You can only have 1000 partitions for a single object. What are the odds you’d need more than that? If you do, check into SQL 2008 Parallel Data Warehouse… you may want to read up on that.
- As a result of having only 1000 partitions, you can only have 999 boundary points defined.
Next time we visit Partitions, I’m going to make sure you are familiar with creating a database with multiple filegroups, and then move on to Creating a Partition Scheme.
Do you have any questions so far? If so, send them in… I’m here to help!