You probably know ETL, and perhaps ELT, as data integration strategies, each with its own strengths and weaknesses. But what if you could adopt a hybrid approach that gives you the best of both worlds? That’s exactly where ETLT comes in.
So what is ETLT exactly, and how does ETLT work? What are the benefits and use cases of ETLT? We’ll go over all of that and more in this comprehensive guide to ETLT.
ETL vs. ELT
Before we get into the question of ETLT, let’s give a quick refresher on ETL vs. ELT:
- ETL (extract, transform, load) is a data integration technique in which information is first transformed before being loaded into the target location (usually a centralized data warehouse or data lake).
- ELT (extract, load, transform) flips the order of the transform and load operations: the loaded data is transformed in-place inside the data warehouse or data lake.
ETL has been the traditional approach to data integration, largely due to cost and technology limitations. However, with the rise of modern, high-performance cloud data warehouses that can run transformations directly on massive datasets, ELT is gaining ground.
The difference between ETL and ELT may seem subtle, but it has far-reaching implications. Because data transformations in ELT occur within the data warehouse or data lake itself, there’s usually no need for a staging area for intermediate data processing, as there is with ETL. In addition, many organizations prefer ELT because it offers access to data and insights more quickly, without having to wait for the transformation stage to finish. With ELT, both raw and transformed data often sit next to each other within the target data repository.
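To make the ordering difference concrete, here is a minimal sketch that runs the same cleanup both ways. It assumes an in-memory SQLite database as a stand-in for a data warehouse; the table names, sample rows, and function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical (name, score) rows as they might arrive from a source system.
RAW_ROWS = [("  Alice ", "42"), ("BOB", "17")]

def run_etl(rows):
    """ETL: transform in application code, then load only the clean rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    clean = [(name.strip().lower(), int(score)) for name, score in rows]
    conn.executemany("INSERT INTO scores VALUES (?, ?)", clean)
    return conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall()

def run_elt(rows):
    """ELT: load raw rows as-is, then transform with SQL inside the target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_scores (name TEXT, score TEXT)")
    conn.executemany("INSERT INTO raw_scores VALUES (?, ?)", rows)
    # The raw table is kept, so raw and transformed data sit side by side.
    conn.execute("""
        CREATE TABLE scores AS
        SELECT LOWER(TRIM(name)) AS name, CAST(score AS INTEGER) AS score
        FROM raw_scores
    """)
    return conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall()
```

Both functions end up with the same cleaned table. The difference is where the work happens: in the pipeline code for ETL, and inside the target for ELT, which also keeps the raw table available for later queries.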
However, in certain cases, performing data transformation before loading, as with ETL, is beneficial if not essential. For example, data security and compliance regulations may require personally identifiable information (PII) and other sensitive and confidential data to be transformed (masked, encrypted, or otherwise concealed) before you load it into the target location. Transforming before loading is also necessary if your target data warehouse has a strict relational schema that requires your data to be in a given format.
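As an illustration of concealing sensitive fields before loading, the sketch below masks an email address and pseudonymizes an identifier with a keyed hash. The field names, the `SECRET_KEY` value, and the helper functions are assumptions made for this example; a real pipeline would pull the key from a secrets manager and follow whatever masking rules its regulations actually require.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code a real key.
SECRET_KEY = b"example-key-not-for-production"

def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

def protect_record(record: dict) -> dict:
    """Return a copy of the record with its PII fields concealed."""
    safe = dict(record)
    safe["email"] = mask_email(record["email"])
    safe["ssn"] = pseudonymize(record["ssn"])
    return safe
```

Because the keyed hash is deterministic, the same input always maps to the same pseudonym, so records can still be matched on that column after loading without exposing the original value.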
What is ETLT? How Does ETLT Work?
As mentioned above, both ETL and ELT have their pros and cons, which can make it difficult to choose between them. But what if you could get the advantages of ETL and ELT from just a single data integration workflow?
Enter ETLT (extract, transform, load, transform). As the name suggests, ETLT combines ETL and ELT into a unified, coherent data integration strategy. ETLT is a hybrid approach that performs data transformations both before and after loading your information into the target location.
The four stages of ETLT are as follows:
- Extract: Data is extracted from one or more source locations (e.g. websites, flat files, relational and non-relational databases, SaaS applications, etc.).
- Transform: The first transformation stage in ETLT applies relatively quick and minor changes to the data. As discussed above, this often means data masking, encryption, and pseudonymization for PII and sensitive data. It may also include other operations such as data formatting and data cleansing. Importantly, the first transformation stage in ETLT does not join or integrate multiple data sources together; each dataset is processed independently.
- Load: The lightly transformed data is loaded into the destination, usually a data warehouse or data lake.
- Transform: The second transformation stage in ETLT focuses on more complex, heavy-duty operations, including joins, aggregations, and integrations of multiple data sources. These transformations are often done for the purpose of big data analytics, distilling raw information into valuable insights.
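The four stages above can be sketched end to end in a few dozen lines. This is a minimal illustration, not a production pipeline: it assumes an in-memory SQLite database in place of a data warehouse, two hypothetical in-memory source datasets, and an unsalted SHA-256 hash as a stand-in for proper pseudonymization. All function and table names are invented for the example.

```python
import hashlib
import sqlite3

# Hypothetical rows standing in for two separate source systems.
CUSTOMERS = [
    {"customer_id": 1, "email": " Alice@Example.com ", "country": "us"},
    {"customer_id": 2, "email": "bob@example.com", "country": "DE"},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1, "amount": 40.0},
    {"order_id": 11, "customer_id": 1, "amount": 60.0},
    {"order_id": 12, "customer_id": 2, "amount": 25.0},
]

def extract():
    """Extract: return raw rows from each source."""
    return CUSTOMERS, ORDERS

def light_transform(customers):
    """First transform: per-dataset cleanup and hashing, with no joins."""
    out = []
    for row in customers:
        email = row["email"].strip().lower()
        out.append({
            "customer_id": row["customer_id"],
            # Unsalted hash for brevity; a keyed hash is safer in practice.
            "email_hash": hashlib.sha256(email.encode("utf-8")).hexdigest(),
            "country": row["country"].upper(),
        })
    return out

def load(conn, customers, orders):
    """Load: write the lightly transformed rows into the target."""
    conn.execute("CREATE TABLE customers (customer_id INTEGER, email_hash TEXT, country TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (:customer_id, :email_hash, :country)", customers)
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer_id, :amount)", orders)

def heavy_transform(conn):
    """Second transform: join and aggregate inside the target."""
    conn.execute("""
        CREATE TABLE revenue_by_country AS
        SELECT c.country, SUM(o.amount) AS revenue
        FROM orders o JOIN customers c ON c.customer_id = o.customer_id
        GROUP BY c.country
    """)
    return conn.execute(
        "SELECT country, revenue FROM revenue_by_country ORDER BY country").fetchall()

def run_etlt():
    conn = sqlite3.connect(":memory:")
    customers, orders = extract()
    load(conn, light_transform(customers), orders)
    return heavy_transform(conn)
```

Note that the raw email address never reaches the target: only its hash is loaded. The join across customers and orders happens only in the second transformation, after both datasets are already in place.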
ETLT: Benefits and Use Cases
The two transformation stages in ETLT might at first seem like overkill, but as discussed above, they each have their role to play. The first ETLT transformation stage is used for enforcing data security and improving data quality, while the second stage is used for in-depth operations and analytical queries.
When compared to ETL and ELT, ETLT offers a “best of both worlds” approach. Users can enjoy the benefits of ELT, such as flexibility and faster access to data, while still ensuring that they follow the applicable data privacy and security standards. By masking, encrypting, or deleting sensitive information during the first transformation stage, you can comply with regulations such as GDPR, CCPA, and HIPAA, while saving the more in-depth transformations for later.
To sum up, the benefits of ETLT (as compared with ETL or ELT alone) are as follows:
- Faster data integration speeds: Like ELT, ETLT gives users faster access to data versus ETL, which front-loads all of the transformations in a single stage. Rather than having to wait for these transformations to complete, ETLT offers shorter time-to-insight by only performing lightweight and essential transformations up front.
- Greater flexibility: Another benefit of ETLT, which it shares with ELT, is the ability to shake off the rigidity of ETL. With ETL, all of the transformations must take place before loading into the target location, which may limit the kinds of analyses you can perform. On the other hand, ELT and ETLT allow you to transform data on the fly to support many different kinds of ad hoc queries.
- Increased data security: The main selling point for ETLT over ELT is that it allows for compliance with data security and privacy standards. These laws and regulations may require sensitive or confidential information to be masked, encrypted, or otherwise concealed before it is taken up by analytics workflows. Because this extra transformation stage keeps raw sensitive values out of the target location, ETLT limits how much sensitive data is exposed if a breach occurs and promotes trust and goodwill among your customers.
- Lower costs and higher data quality: Cloud data warehouses typically charge users based, at least in part, on the amount of information they store. Because ELT loads all of your information without discriminating, it can be more expensive than ETL, which typically performs data cleansing and formatting during the transformation stage. With ETLT, you can include these cleansing and formatting operations during the first transformation stage as you would with ETL, which allows you to improve the quality of your data and save on costs for information that you don’t need.
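As a rough illustration of the cleansing point, the sketch below shows a first-stage filter that drops incomplete rows, duplicates, and fields that are not needed downstream, so that less data is loaded and stored. The `REQUIRED_FIELDS` schema and the `cleanse` function are hypothetical; real cleansing rules depend on your own data.

```python
# Hypothetical set of fields the downstream analytics actually need.
REQUIRED_FIELDS = ("order_id", "customer_id", "amount")

def cleanse(rows):
    """Keep only required fields; drop duplicates and rows with missing values."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows that are missing any required value.
        if any(row.get(field) is None for field in REQUIRED_FIELDS):
            continue
        # Skip rows whose order_id has already been kept.
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({field: row[field] for field in REQUIRED_FIELDS})
    return cleaned
```

Each dataset is still handled on its own here, which keeps this step within the scope of the first transformation stage.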
With so many advantages of ETLT (over both ETL and ELT), how can you adopt ETLT for your own data integration workflows? Because ETLT is still a niche strategy for data integration, it can be challenging to find an ETLT tool that fits your needs. Many data integration platforms offer an ETL or ELT approach, but not both. If you want to bring ETLT into your data integration workflows, look for a powerful, feature-rich tool that offers both ETL and ELT capabilities, and the ability to combine them in a single data pipeline.