
Databricks Unveils LakeFlow for Streamlined Data Pipelines

June 13, 2024

Databricks, known for its data analytics platform, has unveiled LakeFlow, an integrated data engineering product that handles data ingestion, transformation, and orchestration. The announcement, made at the company's annual Data + AI Summit, marks a significant shift for Databricks, which has traditionally relied on partners such as Fivetran, RudderStack, and dbt for these functions.

With LakeFlow, Databricks users will soon be able to build data pipelines that ingest data from databases such as MySQL, Postgres, SQL Server, and Oracle, as well as from enterprise applications such as Salesforce, Dynamics, SharePoint, Workday, NetSuite, and Google Analytics. The goal is to simplify data management by removing the need for separate third-party ingestion tools.

Databricks co-founder and CEO Ali Ghodsi explained the reasoning behind this strategic pivot. At the Databricks CIO Forum two years ago, Ghodsi expected requests for advanced machine learning features but instead heard strong demand for better data ingestion. “Everybody in the audience said: we just want to be able to get data in from all these SaaS applications and databases into Databricks,” Ghodsi recounted. Although he initially pointed customers to the company’s existing partners, it became clear that many of them were building custom ingestion solutions to meet their specific needs.

This revelation spurred Databricks to explore creating its own data engineering solution, leading to the acquisition of real-time data replication service Arcion last November. Ghodsi emphasized that while Databricks will continue to support its partner ecosystem, there is a significant market segment that prefers an integrated service built directly into the Databricks platform. “They just want that data to be in Databricks,” he explained.

LakeFlow is intended as an end-to-end solution for enterprises, allowing them to ingest data from diverse sources, transform it in near real time, and build production-ready applications on top of it. The system consists of three core components:

1. LakeFlow Connect: This component provides the connectors between data sources and Databricks. It integrates with Unity Catalog for data governance and uses technology from Arcion to support large-scale workloads. Currently supported sources include SQL Server, Salesforce, Workday, ServiceNow, and Google Analytics, with MySQL and Postgres support coming soon.

2. LakeFlow Pipelines: Built on Databricks’ Delta Live Tables framework, this component handles data transformation and ETL using SQL or Python. It offers low-latency data delivery and incremental processing, so each run processes only the changes to the source data rather than the full dataset.

3. LakeFlow Jobs: This engine provides automated orchestration of data workflows and ensures data health and delivery. It lets users trigger further actions within Databricks, such as updating a dashboard or training a machine learning model on the ingested data.
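Incremental processing, as described for LakeFlow Pipelines, means each run handles only what changed in the source since the previous run. The toy Python sketch below illustrates that general idea using a high-water mark on a change timestamp. It is not LakeFlow's or Delta Live Tables' API; `Row`, `incremental_sync`, and the `updated_at` field are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    value: str
    updated_at: int  # hypothetical change timestamp that increases with every update


def incremental_sync(source: list[Row], target: dict[int, str], watermark: int) -> tuple[int, int]:
    """Upsert only rows changed since `watermark` (hypothetical helper).

    Returns the advanced watermark and the number of rows processed.
    """
    changed = [r for r in source if r.updated_at > watermark]
    for r in changed:
        target[r.id] = r.value  # upsert: insert new keys, overwrite existing ones
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return new_watermark, len(changed)


if __name__ == "__main__":
    source = [Row(1, "a", 1), Row(2, "b", 2)]
    target: dict[int, str] = {}
    wm, n = incremental_sync(source, target, watermark=0)
    print(wm, n, target)  # 2 2 {1: 'a', 2: 'b'}

    # Row 2 changes and row 3 arrives; row 1 is untouched and is skipped.
    source = [Row(1, "a", 1), Row(2, "b2", 3), Row(3, "c", 4)]
    wm, n = incremental_sync(source, target, watermark=wm)
    print(wm, n, target)  # 4 2 {1: 'a', 2: 'b2', 3: 'c'}
```

The second run touches only two of the three rows. The sketch ignores deletes and late-arriving changes, which real change-data-capture systems have to handle.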

Ghodsi noted that many Databricks customers are looking to reduce costs and consolidate services, a trend seen across enterprises recently. By offering an integrated data ingestion and transformation service, Databricks aims to address this need.

LakeFlow will roll out in phases, starting with LakeFlow Connect, which is expected to be available in preview soon. The integrated approach is meant to streamline data engineering for Databricks users and strengthen the company’s position in the data analytics market.
