Databricks announces $400M round on $6.2B valuation as analytics platform continues to grow

Databricks is a SaaS business built on top of a bunch of open source tools, and apparently it’s been going pretty well on the business side of things. In fact, the company claims to be one of the fastest growing enterprise cloud companies ever. Today the company announced a massive $400 million Series F funding round on a hefty $6.2 billion valuation. Today’s funding brings the total raised to almost $900 million.

Andreessen Horowitz’s Late Stage Venture Fund led the round with new investors BlackRock, Inc., T. Rowe Price Associates, Inc. and Tiger Global Management also participating. The institutional investors are particularly interesting here because as a late stage startup, Databricks likely has its eye on a future IPO, and having those investors on board already could give them a head start.

CEO Ali Ghodsi was coy when it came to the IPO, but it sure sounded like that’s a direction he wants to go. “We are one of the fastest growing cloud enterprise software companies on record, which means we have a lot of access to capital as this fundraise shows. The revenue is growing gangbusters, and the brand is also really well known. So an IPO is not something that we’re optimizing for, but it’s something that’s definitely going to happen down the line in the not-too-distant future,” Ghodsi told TechCrunch.

The company announced that as of Q3 it’s on a $200 million run rate, and it has a platform that consists of four products, all built on foundational open source: Delta Lake, an open source data lake product; MLflow, an open source project that helps data teams operationalize machine learning; Koalas, which creates a single machine framework for Spark and Pandas, greatly simplifying working with the two tools; and finally, Spark, the open source analytics engine.

You can download the open source version of all of these tools for free, but they are not easy to use or manage. The way that Databricks makes money is by offering each of these tools in the form of Software as a Service. It handles all of the management headaches associated with using these tools and charges you a subscription price.

It’s a model that seems to be working as the company is growing like crazy. It raised $250 million just last February on a $2.75 billion valuation. Apparently the investors saw room for a lot more growth in the intervening six months, as today’s $6.2 billion valuation shows.


By Ron Miller

Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and under an open governance model. The company announced the launch of Delta Lake earlier this year and even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data, putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

Spark 3.0, which is launching today, enables more of these use cases and speeds them up significantly, in addition to the launch of a new feature that enables you to add a pluggable data catalog to Spark.

Delta Lake, Ghodsi said, is essentially the data layer of the Lake House pattern. It brings support for ACID transactions to data lakes, scalable metadata handling, and data versioning, for example. All the data is stored in the Apache Parquet format and users can enforce schemas (and change them with relative ease if necessary).

It’s interesting to see Databricks choose the Linux Foundation for this project, given that its roots are in the Apache Foundation. “We’re super excited to partner with them,” Ghodsi said about why the company chose the Linux Foundation. “They run the biggest projects on the planet, including the Linux project but also a lot of cloud projects. The cloud-native stuff is all in the Linux Foundation.”

“Bringing Delta Lake under the neutral home of the Linux Foundation will help the open source community dependent on the project develop the technology addressing how big data is stored and processed, both on-prem and in the cloud,” said Michael Dolan, VP of Strategic Programs at the Linux Foundation. “The Linux Foundation helps open source communities leverage an open governance model to enable broad industry contribution and consensus building, which will improve the state of the art for data storage and reliability.”


By Frederic Lardinois

Databricks raises $250M at a $2.75B valuation for its analytics platform

Databricks, the company behind the Apache Spark big data analytics engine, today announced that it has raised a $250 million Series E round led by Andreessen Horowitz. Coatue Management, Microsoft and NEA also participated in this round, which brings the company’s total funding to $498.5 million. Microsoft’s involvement here is probably a bit of a surprise, but it’s worth noting that it also worked with Databricks on the launch of Azure Databricks as a first-party service on the platform, something that’s still a rarity in the Azure cloud.

As Databricks also today announced, its annual recurring revenue now exceeds $100 million. The company didn’t share whether it’s cash flow-positive at this point, but Databricks CEO and co-founder Ali Ghodsi shared that the company’s valuation is now $2.75 billion.

Current customers, which the company says number around 2,000, include the likes of Nielsen, Hotels.com, Overstock, Bechtel, Shell and HP.

While Databricks is obviously known for its contributions to Apache Spark, the company itself monetizes that work by offering its Unified Analytics platform on top of it. This platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers shared notebooks and tools for building, managing and monitoring data pipelines, and then uses that data to build machine learning models, for example. Indeed, training and deploying these models is one of the company’s focus areas these days, which makes sense, given that this is one of the main use cases for big data, after all.

On top of that, Databricks also offers a fully managed service for hosting all of these tools.

“Databricks is the clear winner in the big data platform race,” said Ben Horowitz, co-founder and general partner at Andreessen Horowitz, in today’s announcement. “In addition, they have created a new category atop their world-beating Apache Spark platform called Unified Analytics that is growing even faster. As a result, we are thrilled to invest in this round.”

Ghodsi told me that Horowitz was also instrumental in getting the company to re-focus on growth. The company was already growing fast, of course, but Horowitz asked him why Databricks wasn’t growing faster. Unsurprisingly, given that it’s an enterprise company, that means aggressively hiring a larger sales force — and that’s costly. Hence the company’s need to raise at this point.

As Ghodsi told me, one of the areas the company wants to focus on is the Asia Pacific region, where overall cloud usage is growing fast. The other area the company is focusing on is support for more verticals like mass media and entertainment, federal agencies and fintech firms, which also comes with its own cost, given that the experts there don’t come cheap.

Ghodsi likes to call this “boring AI,” since it’s not as exciting as self-driving cars. In his view, though, the enterprise companies that don’t start using machine learning now will inevitably be left behind in the long run. “If you don’t get there, there’ll be no place for you in the next 20 years,” he said.

Engineering, of course, will also get a chunk of this new funding, with an emphasis on relatively new products like MLflow and Delta, two tools that Databricks recently developed and that make it easier to manage the life cycle of machine learning models and build the necessary data pipelines to feed them.


By Frederic Lardinois