By Elesh Mistry, Senior Solution Architect, Rivery
If you have not had the privilege of reading the book “Touching the Void”, I highly recommend it. It’s a story about climbers facing decisions of unprecedented magnitude and consequences. You’re probably wondering why I’m telling you about a story of survival when you’re here to read about data. Let me explain.
While creating coherent data warehouses (DWH) doesn’t compare to the life-or-death situations described in the book, there are times when you need to make decisions quickly that you can’t back down from. Especially as a leader, you have to ride the wave and be able to explain why a momentary decision creates a ripple effect of good and bad outcomes. It’s not the most comfortable topic to touch on, and one that can make or break data teams.
When things don’t go as expected, it can take time and resources to amend bad decisions. A single source of truth, provided by the likes of an extract, transform, and load (ETL) tool, can meet all your reporting needs and support data-backed decisions, but building one is no easy task. It can take months to get your data into a state where you can accurately report on day-to-day activities. Moreover, throwing new data sources into the mix, rectifying changes, and dealing with source APIs can be detrimental to essential reports.
Data interpretation calls for more
One way organisations have tried to alleviate the pain of creating a single source of truth is through a modern data stack. A modern extract, load and transform (ELT) platform allows you to deliver data into your warehouse of choice quickly. Unlike the ETL method, ELT doesn’t require data transformation before the loading process. Some platforms even deal with source schema drift, and the good ones also allow you to transform the data once in the warehouse with push-down SQL.
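The push-down pattern described above can be sketched in a few lines. This is a minimal, illustrative example only: an in-memory SQLite database stands in for a cloud warehouse, and the table and column names are assumptions, not part of any particular platform. The key idea is that the raw data is loaded first and the transformation runs as SQL inside the warehouse engine, not in the pipeline beforehand.

```python
import sqlite3

# SQLite stands in for a cloud warehouse; raw data is loaded as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 120.0, "paid"), (2, 80.0, "refunded"), (3, 45.5, "paid")],
)

# The "push-down" step: the transformation is a SQL statement the
# warehouse engine executes itself, producing a report-ready table.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT id, ROUND(amount, 2) AS amount
    FROM raw_orders
    WHERE status = 'paid'
    """
)

rows = conn.execute("SELECT id, amount FROM orders_clean ORDER BY id").fetchall()
print(rows)  # [(1, 120.0), (3, 45.5)]
```

The design point is that no data leaves the warehouse during transformation; only the SQL travels, which is what lets these platforms stay fast as volumes grow.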
The great ones even have Python transformation capabilities, which extend the platform’s agility. Yet even with all of this managed data delivery, there is much more to be done. It’s one thing to manage the transfer of data; being able to interpret it once it’s in the DWH is another. It is not straightforward and can be overwhelming. Interpreting the data becomes your biggest pain, so in the end, all you’ve accomplished is having all of the data in one place.
Rather than solving the problem, your data has just moved from one spot to another, and you might find that you’re facing the same challenges you started with, except now they live in your DWH. This is the moment when you’ve touched the void. It’s the moment you realise that everything you have done so far leaves you two choices: you can give up now and accept that your data is just bouncing around, or you can rethink your stack to give yourself a chance of succeeding and leveraging your data to its fullest.
Every company wants to be more data-driven, but not everyone knows how. Getting the insights is only one step; translating those insights into actions is what gets the ball rolling.
The role of reverse ETL in sorting through your data warehouse
Many providers depend on customers adding extra tools to interpret, move, and activate the data before any further action can be taken. This can leave decision-makers feeling helpless and frustrated: they’re piling one tool on top of another and scraping around for more budget.
This is where reverse ETL comes in. The DWH becomes the source, rather than the target, allowing you to land the data, build your warehouse, gain insights, and then act on that data. The data in your warehouse now becomes the ‘Master Record’. It can be sent back to your original sources to make sure the data is all aligned. By pushing the data back into third-party systems such as business applications, reverse ETL operationalises data throughout the organisation.
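At its core, this pattern is just the pipeline run in the other direction: read from the warehouse, reshape the records, and post them to a business application. The sketch below is a hedged illustration only; the warehouse columns, payload fields, and CRM endpoint are invented for the example and do not correspond to any real API.

```python
import json
import urllib.request

def rows_to_crm_payloads(rows):
    """Map warehouse rows of (id, email, lifetime_value) into
    update payloads for a hypothetical CRM batch endpoint."""
    return [
        {"external_id": r[0], "email": r[1], "properties": {"ltv": r[2]}}
        for r in rows
    ]

def push_to_crm(payloads, endpoint="https://crm.example.com/contacts/batch"):
    """POST the payloads back to the business application.
    (Illustrative endpoint; a real integration would add auth.)"""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payloads).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Rows as they might come back from a SELECT against the warehouse.
warehouse_rows = [(101, "a@example.com", 540.0), (102, "b@example.com", 75.0)]
payloads = rows_to_crm_payloads(warehouse_rows)
print(len(payloads))  # 2
```

The point of the shape here is that the warehouse query, not the operational system, decides what the ‘Master Record’ looks like; the business application only receives the result.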
Whether the team works in sales, marketing, or product, the right people can access the accurate and timely data they need, within the systems they already use. The need to hop between apps is eliminated, giving users more precise data and the confidence to act on it. Any anomalies or imperfections found in your single source of truth can be pushed back to your sources for correction.
Once your data has landed in your warehouse, it can be monitored and used to trigger further transformations, call external APIs to initiate downstream processing, or even request a data refresh to start the journey again. This functionality can also be used to build ML use cases and interpret their results to feed follow-on use cases.
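The monitor-then-act step described above amounts to a small decision rule run after each load. The sketch below is a simplified assumption of what such a rule might look like; the threshold and the action names are invented for illustration and would map onto whatever orchestration your platform provides.

```python
def next_action(row_count, expected_min=100):
    """Decide the follow-up step after a load completes, based on how
    many rows landed. Thresholds and action names are illustrative."""
    if row_count == 0:
        return "request_refresh"      # nothing landed: re-run the extract
    if row_count < expected_min:
        return "alert_and_hold"       # suspiciously small load: flag it
    return "run_transformations"      # healthy load: continue the pipeline

print(next_action(0))      # request_refresh
print(next_action(42))     # alert_and_hold
print(next_action(5000))   # run_transformations
```

Keeping the rule this explicit is what makes the loop closable: the same mechanism that notices a bad load can kick off the refresh that starts the journey again.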
Dissolving the data silos
Data warehouses were introduced to eliminate data silos, but for many companies they turned into silos of their own. Reverse ETL dissolves the barrier between the data warehouse and the rest of the company, closing the loop of travelling data and preventing you from touching the void.
Teams can operationalise data in the systems and processes they are comfortable with, and act on it to drive results. Running a data team often requires making difficult decisions that can set teams up for failure or success, but putting data in the hands and workflows that need it, when they need it, helps reduce risk, maximise resource efficiency, and minimise the cost of moving data as it scales.