What’s Damming Data Lakes? The State of Data Warehousing and Top Challenges

A data lake is a way of storing a vast amount of data in its natural format.

While data warehouses store data in hierarchies, data lakes have a flat architecture. In some ways, data lakes replace data warehouses but are most often used in addition to governed data storage.

This 2016 trend divides experts: some are conflicted about it, while others are excited to explore its potential.


The initial challenge is that data lakes run the risk of becoming “data swamps,” a term coined by 2014 Turing Award winner Michael Stonebraker. Data lakes combine unstructured and structured data, and when this diverse and voluminous data has not been properly curated, it’s nearly impossible to derive insights.

When the data is curated, commonly through the use of metadata, professionals are able to apply schema on read. In the past, data warehousing required defining how the data would be read before the warehouse was built.

Instead, with schema on read, data experts don’t have the same limitations for deriving insights because they can easily run tests and play with the data in imaginative ways. Data lakes are also described as “sandboxes” that can spur innovation.
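To make schema on read concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample records, the `read_with_schema` helper, and the schema dictionaries are all hypothetical and are not taken from any vendor's tool. The raw lines are kept exactly as they arrived, and a schema is chosen only at the moment the data is read.

```python
import json

# Hypothetical raw records, stored in the lake exactly as they arrived.
RAW_LINES = [
    '{"user": "ann", "amount": "19.99", "ts": "2016-03-01"}',
    '{"user": "bob", "amount": "5", "note": "refund pending"}',
    'not valid json',
]

# A schema is just field name -> converter, decided at read time (assumption).
SALES_SCHEMA = {"user": str, "amount": float}
ACTIVITY_SCHEMA = {"user": str, "ts": str}


def read_with_schema(lines, schema):
    """Yield one dict per raw line that parses and fits the given schema."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON at all: skip it, the raw line stays in the lake
        if not isinstance(record, dict):
            continue  # valid JSON but not an object: skip it
        try:
            yield {name: convert(record[name]) for name, convert in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # missing field or failed conversion: skip for this schema


# The same raw data answers two different questions with two schemas.
sales = list(read_with_schema(RAW_LINES, SALES_SCHEMA))
activity = list(read_with_schema(RAW_LINES, ACTIVITY_SCHEMA))
total_sales = sum(row["amount"] for row in sales)
```

Nothing is rejected when the data is stored. A record that does not fit one schema is skipped for that read only and may still fit another, which is what lets analysts experiment without redesigning the storage first.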

Data warehousing automation tools have adapted to schema on read practices through late-binding and machine learning techniques. Modern tools used to curate and consolidate data have quickly advanced, and it’s important to remember these practices are entirely new.

So, What’s Damming Data Lakes?

While it may sound like a no-brainer and the natural next step for data warehousing, there are also many obstacles.

Here, we explain the key differences between data lakes and data warehouses through a fishing metaphor, and include details about the professionals in charge of data storage, such as the languages they must know and their key responsibilities.

Experts from Tableau, Qlik and Logi Analytics offered us their advice and predictions. We’ve also included a comment from Gartner on this trend.

Feel free to share this page and the image below to help others understand the next generation of data storage.

[Infographic: data lake vs. data warehouse, by Better Buys]


Comments

  1. Very helpful post with interesting infographic 🙂

  2. No, no, no!!! Data lakes DO NOT replace data warehouses!

    A data lake is just that – a place where data is pooled together, usually in its original form.

    A data warehouse is an ARCHITECTURE formulated using one of several methodologies – Third Normal Form, Data Vault or Star Schema.

    You still need to cleanse and structure the data in order to perform any meaningful analysis from it. You can’t just chuck your data in a data lake and hope for the best.

    Building a proper data warehouse is not a trivial task, and ETL tools are to be avoided. Now there are Data Warehouse Automation tools, and these should be considered before embarking on any Business Intelligence project.

    • Julia Scavicchio says:

      Thanks for your comment, Ian! I completely agree. The best way I’ve had this described is as a “sandbox” to explore relations in addition to having a structured warehouse.

  3. It’s a serious asset that data lakes enable both structured and unstructured data to be stored in the same place. While this is looked at as a positive thing, the technology is still fairly new, which means there are some kinks to iron out. It might be best to stick with what has been known to work until the technology is more reliable.

    • Julia Scavicchio says:

      On that thought, I think Brinkman said it best: “A good analogy is the state of data warehousing and business intelligence circa 1996.”
