Extract-load-transform integration can offer performance and cost advantages over ETL. Here’s how to pick the right approach.
By Sreedhar Kajeepeta
Can the heavy lifting of enterprise data integration be optimized just by a clever act of jumbling? Or, to ask it another way, is the age-old Extract, Transform, and Load (ETL) approach going to give way to the Extract, Load, and Transform (ELT) alternative for data movement and massaging needs?
This debate is not new, but ELT may finally be coming of age, offering a viable, mainstream alternative to the stored ETL procedures data integrators have been writing for so many years. Mind you, we’re only talking about certain scenarios and data warehouse/business intelligence (DW/BI) workloads, but this shift may be a game changer nonetheless.
How ETL and ELT Differ
A key appeal of ELT is loading performance, an advantage it holds because it operates at the infrastructure level, using SQL and related procedural features of the relational database (see diagram below). In contrast, ETL works at the integration server level, using dedicated graphical application development facilities including transformation rules.
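To make the contrast concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for the warehouse database. The table names, columns, and sample rows are hypothetical and chosen only for illustration. A real ETL tool or warehouse RDBMS would look quite different, but the placement of the transformation step is the point: in the ETL function it runs in the integration layer before loading, while in the ELT function the raw rows are loaded first and reshaped by SQL inside the database.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, country code)
SOURCE_ROWS = [
    (1, "19.90", "us"),
    (2, "5.00", "de"),
    (3, "12.50", "us"),
]


def etl(conn, rows):
    """ETL: transform in the integration layer (here, plain Python), then load."""
    transformed = [
        (order_id, float(amount), country.upper())
        for order_id, amount, country in rows
    ]
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", transformed)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform with SQL inside the database."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


def run_both():
    """Run both pipelines against one in-memory database and return the results."""
    conn = sqlite3.connect(":memory:")
    etl(conn, SOURCE_ROWS)
    elt(conn, SOURCE_ROWS)
    etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
    elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
    conn.close()
    return etl_rows, elt_rows
```

Both paths end with the same cleaned rows. What changes is where the processing cycles are spent, which is why ELT leans on the capacity of the database server rather than on a separate integration server.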
ELT is gaining cost-of-performance advantages over ETL in part because software licensing and development costs in DW/BI initiatives now far outweigh the costs of infrastructure. That balance is getting further skewed with the availability of cheaper multi-core (database) servers and related appliances. Add virtualization and cloud computing to the mix, and the numbers stack up in favor of infrastructure, in terms of both direct costs (lower and pay-as-you-go) and time savings (in procurement and development, along with reduced skill-set requirements). Hence the interest in ELT, which takes advantage of the processing capacity built into the data warehousing infrastructure.
Given the continued growth of the DW/BI market, the ETL vs. ELT debate has only intensified. Conventional BI environments are the biggest consumers of ETL tools and technologies thanks to complex requirements for disparate data sources and data transformation needs. But even for this camp, the new dynamics related to cost and performance could lead to a bifurcation or blending of ETL and ELT approaches. What’s more, according to an October 2009 forecast from Forrester Research, traditional BI will account for only 40% of a projected $14 billion market for a broader set of BI products expected by 2014. New workloads tied to business activity monitoring (BAM), business analytics, text analytics, and complex event processing have their own unique load and transformation requirements that would be matched against the options available for data integration.
As an enabler of this bigger BI market, the field of data integration tools is expected to grow 17% annually to reach $3 billion by 2012, according to Gartner estimates. It is in this market that we are likely to see ETL and ELT battling it out for dominance. With performance and low latency increasingly in demand, you can guess ELT will get a lot of attention.
When to Pick Which Approach
How can we be objective about choosing between ETL and ELT, or about striking the right balance between the two (a third, blended option sometimes referred to as ETLT)? Begin by carefully understanding the requirements related to functionality (data mapping/parsing), volumes, performance, and the economics of the DW/BI initiative. You’ll also need to be sensitive to overarching needs such as centralized data governance (including privacy and security) and network bandwidth/traffic. In considering overall data transport, keep in mind that data is in flight more often with ETL.
As for skills and training, ETL requires investments outside of the DBMS, whereas ELT assumes the availability of strong DBA skills. When it comes to data connectivity, ETL is more evolved, with plenty of options for connecting to geographic information systems and other less-than-mainstream sources. With ELT, you can rely on the hardware infrastructure for linear scalability, but the RDBMS must be ready to scale up to support very large databases (VLDBs). And when it comes to the openness of the technology, keep in mind that ELT poses a greater danger of vendor lock-in.
As a rule of thumb, it’s generally accepted that you should stick with ETL for DW/BI projects when there are ten or more source systems or when the source databases are a terabyte or larger in size. If your project doesn’t fit this description, it’s time to consider ELT and to match tools against your requirements. With this in mind, here’s a quick rundown on the ELT roadmaps of leading data integration vendors:
Oracle has been firmly on the side of ELT and is upping the ante with appliances like Oracle/Sun Exadata. Oracle Data Integrator Enterprise Edition (ODI-EE) is a unified solution comprising Oracle Warehouse Builder and Oracle Data Integrator (ODI). ODI came to Oracle with its acquisition of Sunopsis (an ELT vendor that used to work closely with Teradata). Oracle and Teradata have a synergistic relationship around ELT, and Teradata is a reseller of ODI-EE.
Informatica, the pure-play data integrator and a traditional ETL player, now offers the option of deploying the transformation logic to either the Informatica engine (ETL) or the DBMS (ELT), or a hybrid (ETLT).
IBM’s InfoSphere DataStage is an ETL tool that can also be deployed in an ETLT style in conjunction with DB2 or like RDBMSes from Oracle, Teradata, etc. In fact, IBM positions InfoSphere DataStage as a flexible answer to support different data-integration topologies (or permutations/combinations of ETL and ELT) including TELT, TETLT, etc., as required.
Ab Initio’s Graphical Development Environment (GDE) seems well-entrenched in its role as a high-performance ETL technology.
Appliance vendors Netezza and Teradata, with their massively parallel database architectures, are well placed to benefit from the ELT approach.
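Before matching any of these tools against requirements, the rule of thumb given earlier (ten or more source systems, or source databases of a terabyte or more) can be written down as a first-pass check. The sketch below is only an illustration. The function name and return strings are hypothetical, a terabyte is assumed to mean 10^12 bytes, and the size threshold is assumed to apply to any single source database rather than to the combined total.

```python
TERABYTE = 10**12  # assumption: decimal terabyte, expressed in bytes


def suggest_approach(source_sizes_bytes):
    """Apply the rule of thumb to a list of source database sizes (in bytes).

    Assumption: "a terabyte or larger" is read as applying to any single
    source database, not to the sum of all sources.
    """
    many_sources = len(source_sizes_bytes) >= 10
    any_large = any(size >= TERABYTE for size in source_sizes_bytes)
    if many_sources or any_large:
        return "ETL"
    return "consider ELT"
```

For example, three sources of a few gigabytes each would come back as "consider ELT", while a single two-terabyte source would come back as "ETL". A check like this is a starting point only. Governance, connectivity, skills, and lock-in still need to be weighed separately.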
With advancements in infrastructure and new delivery models such as infrastructure as a service (IaaS), ELT may play a larger role as the boundaries of the BI market expand. By leveraging investments in infrastructure (processors and DBMSs), ELT may help companies reduce costs related to ETL software and development. ELT is not suitable for every project, but it increasingly looks like the first-stop alternative and blending option for developing a broader enterprise data integration strategy.
Sreedhar Kajeepeta is Global VP & CTO of Technology Consulting Practices for the Global Business Solutions & Services division of CSC. He previously worked for Covansys (now merged with CSC), Cambridge Technology Partners, and Tata Consultancy Services. Write him at firstname.lastname@example.org.