Data lakehouses exist to resolve the challenges of data warehouses and data lakes and to bring their benefits under one data architecture.
For instance, data warehouses are more performant than data lakes at both storing and transforming enterprise data. However, data warehousing requires strict schemas (typically the star schema or the snowflake schema).
Therefore, data warehouses don’t work well with unstructured or semi-structured data, which are critical for artificial intelligence (AI) and machine learning (ML) use cases. They are also limited in their ability to scale.
Data lakes, on the other hand, allow organizations to aggregate all data types (structured, semi-structured and unstructured) from diverse data sources in one location. They enable more scalable and affordable data storage, but do not have built-in data processing tools.
Data lakehouses merge aspects of data warehouses and data lakes. They use cloud object storage to store data in any format at a low cost. On top of that storage sits a warehouse-style analytics infrastructure, which supports high-performance queries, near real-time analytics and business intelligence (BI) efforts.
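The layering described above can be illustrated with a toy sketch. It is not how any particular lakehouse product works: a temporary local directory stands in for cloud object storage, SQLite stands in for the analytics layer, and the file names, the `orders` table and the functions are all hypothetical. A real lakehouse engine would typically query the files in place rather than copy them into a database, so treat the loading step as a simplification.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def write_raw_files(root: Path) -> None:
    """Drop files of different formats into one location (stand-in for object storage)."""
    # Structured data: a CSV file with a fixed set of columns.
    with open(root / "orders.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["emea", "120.0"], ["apac", "80.0"]])
    # Semi-structured data: JSON lines, where some records carry extra fields.
    with open(root / "orders.jsonl", "w") as f:
        f.write(json.dumps({"region": "emea", "amount": 30.0, "note": "rush"}) + "\n")
        f.write(json.dumps({"region": "amer", "amount": 50.0}) + "\n")


def read_rows(root: Path):
    """Apply a schema at read time: project every file onto (region, amount)."""
    for path in sorted(root.iterdir()):
        if path.suffix == ".csv":
            with open(path, newline="") as f:
                for rec in csv.DictReader(f):
                    yield rec["region"], float(rec["amount"])
        elif path.suffix == ".jsonl":
            with open(path) as f:
                for line in f:
                    rec = json.loads(line)
                    # Extra fields such as "note" are ignored by this schema.
                    yield rec["region"], float(rec["amount"])


def revenue_by_region(root: Path) -> dict:
    """Run a warehouse-style SQL aggregate over the files in storage."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    # Simplification: rows are copied into SQLite instead of being queried in place.
    conn.executemany("INSERT INTO orders VALUES (?, ?)", read_rows(root))
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


def demo() -> dict:
    """Write the sample files to a temporary directory and query them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_raw_files(root)
        return revenue_by_region(root)


if __name__ == "__main__":
    print(demo())  # {'amer': 50.0, 'apac': 80.0, 'emea': 150.0}
```

The point of the sketch is the separation of concerns: the storage layer accepts any file format cheaply, and structure is imposed only by the query layer that sits on top of it.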