As you consider your data ingestion needs, you’ll inevitably run into common strategies to solve your requirements. Often, these processes are identified via acronym.
That’s good and bad. Acronyms save time if you’re immersed in the world of data ingestion, but they essentially add a language barrier if you’re not.
In this article, let’s unpack some of the jargon.
We’ll look at three common terms: Data ingestion, ETL, and ELT. Understanding each of these concepts will be helpful as you construct and navigate your organization’s data ingestion solutions.
Let’s start with the basics.
Data Ingestion: A Product of the Big Data Age
Data ingestion is increasingly becoming standard parlance, but it’s actually a relatively new piece of jargon, especially in comparison to terms like ETL.
As we’ve written elsewhere, data ingestion is the compilation of data from assorted sources into a storage medium where it can be accessed for use. Today, this process is ubiquitous in enterprise organizations due to the sheer amount of data that’s generated.
For instance, any organization with internet-of-things (IoT) devices almost certainly has a data ingestion pipeline set up so that those devices can be tracked and analyzed. Likewise, any organization with multiple customer data sources likely employs data ingestion to bring those sources together for use.
Again, this is standard practice now. But only a few decades ago, data ingestion was much less common, because much less data was created. (Data ingestion is a consequence of “big data” – which itself wasn’t coined as a term until 2005.)
ETL: A Classic Approach to Database Management
ETL, on the other hand, originates from database processing protocols that were standardized in the 1970s. Back then, storing and using data required extraction, transformation, and loading – or, in acronym form, ETL.
Extract denotes the collection of data.
Transform denotes processes that are performed on that data to configure it for usefulness or storage; for example, data might be sorted, or filtered, or combined with another source. In a traditional ETL process, transformation happens in a staging area outside of the data warehouse.
Finally, load denotes the amount of data sent to a destination (usually a data warehouse or data lake) where it can be used.
The interesting note here is that, technically, ETL is data ingestion – or at least the process can be used as a method of data ingestion.
The confusion around the term stems from the fact that, more recently, cloud computing has made it possible to flip the final two stages so that loading happens before transformation. This is known as ELT.
ELT: A Cloud-Powered Approach to Data Ingestion
Again, ETL required transformation to happen before data entered storage (i.e. a data warehouse), because storage destinations weren’t equipped to process data.
Today, though, cloud computing has enabled many data warehousing solutions to perform transformation in the destination database. In other words, it’s no longer always necessary to have data transformed before it’s loaded – some solutions now flip the script. Enter ELT.
In ELT, transformation happens in the data warehouse itself, usually when the data is queried. When the query comes in, the data is transformed for usability according to that query and then served appropriately.
This eliminates the need for a staging area outside of the data warehouse, and it also makes the process of loading data quicker, because transformation doesn’t have to happen before data can be loaded. However, it requires a good deal of processing power in the destination area, and it can make queries themselves take a bit longer (although this lag is steadily being reduced).
A Summary of the Terms
So, to recap: data ingestion is a (relatively new) general term denoting the compilation of data for usage.
ETL is a traditional method of data processing that can be used for data ingestion. It involves transforming data for use before loading it into its destination.
ELT is a newer method of data processing that can also be used for data ingestion. It involves loading the data to the destination and requires that any transformation of data happens there.
Both ETL and ELT are still viable methods of data ingestion, depending on your business needs, but implementing the right approach requires technical expertise.
If you need a well-defined data workflow, ETL may make more sense, although it’s worth noting that databases such as SQL Server may have different locking policies that can interfere with ETL processing (especially when using real-time replication). Using database settings such as SQL Server READ_COMMITTED_SNAPSHOT is ON will avoid unnecessary locking and poor query performance.
If you have huge amounts of unstructured data that may be queried in many different ways, you may consider ELT.
Ready to Get Started with Data Ingestion?
Hopefully, the information above has helped you to improve your understanding of data ingestion and the processes that power it. If you’re looking for data ingestion services to put data to use in your business context, let’s talk.
At StarQuest, we’re experts at data ingestion. Our powerful SQDR software can be deployed for replication and ingestion from an extensive range of data sources. And, importantly, our customer service team is regarded as some of the best in the business, with clients calling us “The best vendor support I have ever encountered.”
If you’re looking for data ingestion for migration, data warehousing, application development, auditing, disaster recovery, or another use case – we can help.
Get in touch with us to discuss your data ingestion needs. We can set you up with a no-cost trial of our software using your DBMS of choice, and help you take the first step toward a solution that will benefit your business.