Guest Column | May 15, 2020

Diving Into How To Digest Your Data

By Bullett Manale, Idera


A range of data infrastructures has supported organizations over the past few years as they strive to become more data-driven. For IT teams and company leaders, it can be difficult to know when a data mart is a better fit than a data vault, and when to use a data lake.

At their core, data marts, vaults, lakes, and warehouses all exist to manage data effectively so organizations can draw more value from it. Below, we dive into what these four approaches are, how they relate, and their specific use cases.

What Is A Data Warehouse?

A data warehouse is a curated repository of data. Warehouses provide business users with access to the right information in a usable format – and can include both current and historical information. As data enters the data warehouse environment, it is cleansed, transformed, categorized, and tagged – making it easier to manage, use, and monitor from a compliance perspective. Doing this at scale is where automation comes in.
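To make the cleanse, transform, categorize, and tag steps concrete, here is a minimal Python sketch. The field names, the size threshold, and the tag value are hypothetical and chosen only for illustration; a real warehouse pipeline would be driven by its own schema and compliance rules.

```python
from datetime import date

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [
    {"customer": "  Acme Corp ", "amount": "1200.50", "order_date": "2020-05-01"},
    {"customer": "acme corp", "amount": "n/a", "order_date": "2020-05-02"},
    {"customer": "Globex", "amount": "80", "order_date": "2020-05-03"},
]


def cleanse(row):
    """Return a tidied, typed copy of the row, or None if the amount is unusable."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {
        "customer": row["customer"].strip().title(),
        "amount": amount,
        "order_date": date.fromisoformat(row["order_date"]),
    }


def categorize(row):
    """Attach an illustrative size category and a compliance-style tag."""
    tagged = dict(row)
    tagged["size"] = "large" if tagged["amount"] >= 1000 else "small"
    tagged["tags"] = ["contains_customer_name"]  # assumed tag name
    return tagged


def ingest(raw_rows):
    """Cleanse every row, drop the unusable ones, then categorize and tag the rest."""
    cleansed = (cleanse(r) for r in raw_rows)
    return [categorize(r) for r in cleansed if r is not None]


WAREHOUSE_ROWS = ingest(RAW_ROWS)
```

In this sketch the row with an unparseable amount is rejected at the door, which is exactly the kind of up-front decision a warehouse makes and a lake or vault defers.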

The volume and velocity of data that businesses handle today make it unfeasible to manually ingest the data, process it, and ensure it is stored and accessible in a data warehouse in a way that meets compliance requirements. However, with businesses constantly looking to data as the source of both reports and forecasts, a data warehouse is invaluable. Data lakes mustn't subsume the role of a more structured data infrastructure just because of the perceived effort of ingestion. Automation can speed up ingestion and processing, shortening the time to value for data-driven decision making in a data warehouse.

What Is A Data Mart?

A data mart is often used for curated data on one specific subject area that needs to be accessible quickly. Because of its narrow scope, it is often quicker and cheaper to build than a data warehouse. The one downside is that a data mart on its own cannot provide the full business picture needed to inform decisions.
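As a rough illustration of that trade-off, the sketch below builds a small sales-only mart with Python's built-in sqlite3 module. It assumes, purely for the example, that the mart is fed from a broader table; the table names, columns, and figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A broader, hypothetical table covering several subject areas.
conn.execute("CREATE TABLE warehouse_facts (subject TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "EMEA", 100.0),
        ("sales", "APAC", 250.0),
        ("hr", "EMEA", 40.0),
        ("sales", "EMEA", 50.0),
    ],
)

# The mart keeps one subject area, already aggregated for quick access.
conn.execute(
    """
    CREATE TABLE sales_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM warehouse_facts
    WHERE subject = 'sales'
    GROUP BY region
    """
)

# Each result row is a (region, total_amount) pair, so dict() can consume it.
SALES_MART = dict(conn.execute("SELECT region, total_amount FROM sales_mart"))
```

The mart answers sales questions quickly, but the HR rows never reach it, so it cannot speak for the business as a whole.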

What Is A Data Vault?

Data vault modeling is an approach to data warehousing that looks to address some of the challenges posed by transforming data as part of the data warehousing process. One of the great advantages of a data vault is that it does not judge which data is "valuable" and which isn't, whereas by the time data has been processed and cleansed into a warehouse environment, that decision has typically been made. Data vaults have the flexibility to defer this judgment and to accommodate changing sources of data, which is why the data vault approach is credited with providing a "single version of the facts" rather than a "single version of the truth."

For enterprises with large, growing, and disparate datasets, a data vault approach to data warehousing can help tame the beast of Big Data into a manageable, business-centric solution, but it can take time to set up. Data vault automation is critical to ensuring organizations can deliver and maintain data vaults that adhere to the stringent requirements of the Data Vault 2.0 methodology, and can do so in a practical, cost-effective, and timely manner.
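The "single version of the facts" idea can be sketched in a few lines of Python. Data vault models are typically built from hubs (business keys), links (relationships), and satellites (descriptive attributes over time); the simplified example below shows only a hub and a satellite, and its class names, fields, and sample values are hypothetical. It also appends every load, whereas real implementations usually detect changes before inserting.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HubCustomer:
    customer_key: str      # business key
    load_ts: datetime      # when the key was first seen
    record_source: str     # where it was first seen


@dataclass(frozen=True)
class SatCustomerDetails:
    customer_key: str
    load_ts: datetime
    record_source: str
    email: str             # descriptive attribute, stored as received


class CustomerVault:
    """Insert-only store: nothing is overwritten or judged at load time."""

    def __init__(self):
        self.hub = {}
        self.satellite = []

    def load(self, customer_key, email, record_source, load_ts):
        # Register the business key once, then append the new fact.
        if customer_key not in self.hub:
            self.hub[customer_key] = HubCustomer(customer_key, load_ts, record_source)
        self.satellite.append(
            SatCustomerDetails(customer_key, load_ts, record_source, email)
        )

    def facts_for(self, customer_key):
        return [s for s in self.satellite if s.customer_key == customer_key]


vault = CustomerVault()
vault.load("C-1", "a@example.com", "crm", datetime(2020, 5, 1))
vault.load("C-1", "b@example.com", "billing", datetime(2020, 5, 2))
FACTS = vault.facts_for("C-1")
```

Both email addresses are retained along with their source and load time. Deciding which one is "true" is left to downstream business rules rather than to the load process.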

What Is A Data Lake?

Data lakes are huge collections of data, ranging from raw data that has not been organized or processed through to varying levels of curated data sets. One of their benefits from an analytics perspective is that different types of consumers can each access the data appropriate to their needs. This makes data lakes well suited to some of the newer use cases such as data science, AI, and machine learning, which are viewed by many companies as the future of analytics work. A data lake is a great way to store masses of raw data on scalable storage solutions without attempting traditional ETL (extract, transform, load) or ELT (extract, load, transform), which can be expensive at this volume. However, for more traditional analytics, this type of data environment can be unwieldy and confusing – which is why organizations turn to other solutions to manage essential data in more structured environments.

In terms of positioning within a data infrastructure, data lakes sit upstream of other data infrastructure. They can be used as a staging area for a more structured approach such as a data warehouse, and they also support data exploration and data science.
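A small Python sketch shows what "upstream" can look like in practice: records land in the lake unchanged, and only those that fit a structured schema are staged for the warehouse. The file layout, function names, and the order schema are assumptions made for illustration, with a temporary directory standing in for scalable storage.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir, name, records):
    """Write records to the lake unchanged, one JSON object per line."""
    path = Path(lake_dir) / f"{name}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return path


def read_raw(path):
    """Read everything back, as an exploration or data science job might."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def stage_for_warehouse(raw_records):
    """Keep only records that fit the (hypothetical) order schema, with typed fields."""
    staged = []
    for rec in raw_records:
        if "order_id" in rec and "amount" in rec:
            staged.append(
                {"order_id": str(rec["order_id"]), "amount": float(rec["amount"])}
            )
    return staged


with tempfile.TemporaryDirectory() as lake:
    events = [
        {"order_id": 1, "amount": "19.99", "clickstream": ["home", "cart"]},
        {"sensor": "t-7", "reading": 21.5},
        {"order_id": 2, "amount": 5},
    ]
    raw_path = land_raw(lake, "events", events)
    RAW = read_raw(raw_path)            # all three records, untouched
    STAGED = stage_for_warehouse(RAW)   # only the two order records
```

The sensor reading stays available in the lake for exploration even though it never reaches the warehouse.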

Each approach has its role to play in deriving value from data across a company. With a broad understanding of how they fit together, IT managers and business leaders can choose the right combination for their organization. Automation can help establish and manage these environments, reducing the time to value and helping a business make the decisions that create a competitive edge.

About The Author

Bullett Manale is the vice president of sales engineering at Idera, which recently acquired data automation company WhereScape.