Extract, Transform, and Load
What is ETL?
ETL stands for Extract, Transform, and Load: a process that collects data from various sources, transforms it according to business rules, and loads it into a destination database. The need for ETL arises because business data typically resides in multiple locations and in many incompatible formats. For example, business data might be stored on the file system in various formats (spreadsheets, plain text, etc.), or kept in various database servers such as MS SQL Server, Oracle, and MySQL. Handling all this information efficiently is a significant challenge, and ETL plays an important role in solving it.
Extract – The first step in the ETL process is extracting the data from the various sources. Each source system may store its data in a completely different format from the rest. The sources are usually flat files or relational databases (RDBMS), but almost any data store can serve as a source for an ETL process.
Transform – The next step is transforming the data according to a set of business rules. The transformation may include operations such as filtering, sorting, aggregating, joining, and cleaning data, generating calculated values from existing ones, and validating data.
Load – The final step is loading the transformed data into the destination, which might be a database or a data warehouse.
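The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the sample orders, the table names (orders, customer_totals), and the business rule (total the order amounts per customer, skipping rows with a missing amount) are all assumptions made up for the example, and in-memory SQLite databases stand in for the real source and destination systems.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: CSV text held in memory for this sketch.
CSV_SOURCE = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.50
"""


def extract_csv(text):
    """Extract: read rows from a flat-file source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_db(conn):
    """Extract: read rows from a relational source."""
    cur = conn.execute("SELECT order_id, customer, amount FROM orders")
    return [{"order_id": r[0], "customer": r[1], "amount": r[2]} for r in cur]


def transform(rows):
    """Transform: validate, clean, and aggregate amounts per customer."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # validation: skip rows with a missing or invalid amount
        customer = str(row["customer"]).strip().title()  # cleaning
        totals[customer] = totals.get(customer, 0.0) + amount  # aggregating
    return sorted(totals.items())  # sorting


def load(conn, totals):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", totals)
    conn.commit()


def run_etl():
    # Hypothetical relational source, populated here so the sketch is runnable.
    source_db = sqlite3.connect(":memory:")
    source_db.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    source_db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(4, "Bob", 20.0), (5, " carol ", 7.25)],
    )

    # Hypothetical destination database.
    target_db = sqlite3.connect(":memory:")

    rows = extract_csv(CSV_SOURCE) + extract_db(source_db)
    load(target_db, transform(rows))
    return target_db.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    for customer, total in run_etl():
        print(customer, total)
```

Keeping extract, transform, and load as separate functions mirrors the separation of the three steps: a new source only needs another extract function that yields rows in the same shape, and the business rules can change without touching the code that reads or writes data.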
Many of the largest software vendors produce ETL tools, including IBM (IBM InfoSphere DataStage), Oracle (Oracle Warehouse Builder), and Microsoft, whose SQL Server Integration Services (SSIS) is included in certain editions of Microsoft SQL Server 2005 and 2008.