The Community Blog for Business Analysts

Limor Wainstein
Limor Wainstein

Is ETL Still Relevant?

The title of this article poses a pertinent question for modern enterprises that increasingly make use of powerful high-end analytic data engines, such as Hadoop clusters and cloud-based data warehouses (see this article by Forbes, which deals with similar questions).

The challenge for enterprises that use analytic engines is one of data movement—what is the best way to get data from operational systems into data warehouses and other data platforms so that it can be queried for reporting and analytics purposes?

ETL (Extract, Transform, Load) is one such method for moving data, and it is one of the oldest data movement methods. However, debate continues as to how relevant ETL still is.

Below you'll find out exactly what ETL entails, some use cases for ETL tools, and whether ETL tools are still relevant in a world where enterprises can leverage powerful modern data platforms to transform data inside those systems to be ready for use with their business-critical applications, instead of requiring a separate transformation engine.


What is ETL?

ETL (Extract, Transform, Load) is a process of moving data from source systems, transforming data in a transformation engine for use with target applications by applying a set of rules to the data (standardizing, aggregations), and finally, loading the transformed data into the target systems, which are usually data warehouses or other data repositories. The end goal of using ETL is to enable businesses to make data-driven business decisions.

Some ETL use cases and examples are:

  • Pulling data from different transactional systems into ODS (operational data stores) for operational reporting and decision making. 
  • Retrieving data from transactional databases, transforming it for data warehouse use, and loading the transformed data into the data warehouse.
  • Extracting data from XML files, structuring it, and loading to a data mart to serve a particular community of knowledge workers within organizations.

What Are ETL Tools Used For?

ETL tools are used to manage the flow of data between source systems and target systems with ETL processes. Organizations can create their own ETL tools using scripts—this approach is suitable for a small number of data sources with similar types of data.

For most use cases, ETL tools created by other companies offer more functionality by implementing data transformations easily and consistently across various data sources, such as XML files and relational databases. ETL tools use transformation engines to sort, join, merge, reformat, and aggregate data.

Enterprises and organizations can avail of either commercially available ETL tools or open-source tools.

Useful ETL Tools

There are several different ETL tools on the market—here are some examples of the most useful ETL tools currently available:

  • Apatar—this open source ETL tool comes with a practical user interface that can reduce R&D costs. Apatar is written in Java, and organizations can use the tool to easily populate data warehouses and data marts.
  • Scriptella is another open source ETL tool written in Java that can use SQL or any other scripting language to perform data transformations. You can work with multiple data sources ETL file in Scriptella, and the tool integrates with any integrates with any JDBC/ODBC-compliant driver.
  • Stitch is a commercial self-servicing data pipeline that loads data into data warehouses using ETL processes. Stitch requires no API maintenance or scripting, and it can handle both bulk and incremental data updates.
  • Fivetran is a commercially available fully managed data pipeline solution that integrates data from all your cloud services and databases into a single data warehouse using ETL processes.
  • Camel is Apache's open source data integration framework built using Java. Camel has ETL functionality through its available libraries, which can be used to build programs that perform ETL operations.

Wait a Minute. Is ETL Still Relevant?

Enterprises are increasingly leveraging cloud-based solutions for their data needs, in particular, for BI and analytics. This 2017 report found that cloud BI adoption increased in respondent companies from 29% to 43% from 2013 to 2016, and that 78 percent of organizations plan to increase their cloud use for BI and data management through 2017.

With powerful modern solutions such as cloud-based data warehouses, it's becoming more common and more practical to take an ELT approach to data movement. ELT (Extract, Load, Transform) is an alternative way of moving data from source systems to centralized data repositories without transforming the data before it's loaded into the target systems.

With ELT, all extracted raw data resides in the data warehouse, where powerful architectures can transform the data as needed to serve business needs. In other words, the transformation is performed when the analytic queries are run.

The major upside to ELT is that there is no waiting—all data is accessible at all times. This is in contrast to the traditional ETL approach to data movement, in which analysts and BI users must wait for the full ETL process to complete before accessing data.

ETL, therefore, is outdated for most use cases. Cloud infrastructure makes data more accessible than ever before, without the need to maintain complex scripts that transform the data for analytic use, as in ETL.

However, ETL still has a place in legacy data warehouses used by companies or organizations that don't plan to transition to the cloud. With the adoption of cloud-based data solutions skyrocketing, it won't be long before ETL completely loses its relevance.

Closing Thoughts

  • All companies or organizations that want to analyse their data to make business decisions are faced with a common challenge—how to move data from source systems, such as transactional databases, to the target systems used for data analysis and BI.
  • ETL provides one way of moving data by pulling it from source systems, shaping it to be ready for use with analytic applications, and finally loading the data to the target systems.
  • There are several open source and commercial ETL tools available that can make the ETL process more functional and practical.
  • However, the transition to cloud-based data solutions makes ETL processes less relevant, because there is no need to transform data before it moves to the warehouse or other analytic repository—cloud infrastructures have the resources to efficiently transform raw data as it is needed.
  • ETL is outdated, but it's still useful for legacy data warehouses used by companies that don't plan to move to the cloud or don't foresee much future data growth.

 

This entry was published on Nov 29, 2017 / Limor Wainstein. Posted in Technical Topics. Bookmark the Permalink or E-mail it to a friend.
Like this article:
  0 members liked this article

Related Articles

COMMENTS

Only registered users may post comments.


Blog Information

» What is the Community Blog and what are the Benefits of Contributing?

» Review our Blog Posting Guidelines.

» I am looking for the original Modern Analyst blog posts.



Modern Analyst Blog Latests

Jarett Hailes
Jarett Hailes
As we start a new year many of us will take the time to reflect on our accomplishments from 2012 and plan our goals for 2013. We can set small or large goals. goals that will be accomplished quickly or could take several years. For 2013, I think Business Analysts should look to go beyond our traditional boundaries and set audacious goals. Merriam-...
2 Responses
Howard Podeswa
Howard Podeswa
Recently, I was asked by the IIBA to present a talk at one of their chapter meetings. I am reprinting here my response to that invitation in the hope that it will begin a conversation with fellow EEPs and BAs about an area of great concern to the profession. Hi xx …. Regarding the IIBA talk, there is another issue that I am considering. It's p...
12 Responses
Adrian M.
Adrian M.
Continuing the ABC series for Business Analysts, Howard Podeswa created the next installment titled "BA ABCs: “C” is for Class Diagram" as an article rather than a blog post. You can find the article here: BA ABCs: “C” is for Class Diagram Here are the previous two posts: BA ABCs: “A” is for Activity Diagram BA ABCs: “B” is for BPMN
1 Responses
Featured Digital Library Resources 
Copyright 2006-2018 by Modern Analyst Media LLC