The idea of a data lake is to keep a single store of all the data in an enterprise, from raw data to transformed data, that serves tasks such as reporting, visualization, analytics, and machine learning.

A data lake holds structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and even binary data (images, audio, video). The result is a centralized store that accommodates all forms of data.
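
To make the categories above concrete, here is a minimal sketch of how incoming files might be sorted by data type before landing in the lake. The extension mapping, the `classify` and `lake_path` helpers, and the `raw/<category>/<file>` layout are all illustrative assumptions, not a standard. Structured data is left out on the assumption that it arrives through database extracts rather than as loose files.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the data categories named above.
# A real lake would likely inspect content or metadata, not just extensions.
CATEGORY_BY_EXTENSION = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".jpg": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(filename: str) -> str:
    """Return the data category for a file name, or "unknown" if unmapped."""
    suffix = PurePosixPath(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


def lake_path(filename: str, zone: str = "raw") -> str:
    """Build an assumed storage key of the form <zone>/<category>/<file name>."""
    category = classify(filename)
    return f"{zone}/{category}/{PurePosixPath(filename).name}"


if __name__ == "__main__":
    for name in ["events.JSON", "scans/invoice.pdf", "clip.mp4", "data.bin"]:
        print(name, "->", lake_path(name))
```

Unmapped files are kept under an `unknown` category rather than rejected, which fits the idea that the lake accepts all forms of data and defers interpretation to later processing.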