Databricks raises $1.6B at $38B valuation as it blasts past $600M ARR

Databricks this morning confirmed earlier reports that it was raising new capital at a higher valuation. The data- and AI-focused company has secured a $1.6 billion round at a $38 billion valuation, it said. Bloomberg first reported last week that Databricks was pursuing new capital at that price.

The Series H was led by Counterpoint Global, a Morgan Stanley fund. Other new investors included Baillie Gifford, UC Investments and ClearBridge. A grip of prior investors also kicked in cash to the round.

The new funding brings Databricks’ total private funding raised to $3.5 billion. Notably, its latest raise comes just seven months after the late-stage startup raised $1 billion on a $28 billion valuation. Its new valuation represents paper value creation in excess of $1 billion per month.

The company, which makes open source and commercial products for processing structured and unstructured data in one location, views its market as a new technology category. Databricks calls the technology a data “lakehouse,” a mashup of data lake and data warehouse.

Databricks CEO and co-founder Ali Ghodsi believes that its new capital will help his company secure market leadership.

For context, since the 1980s, large companies have stored massive amounts of structured data in data warehouses. More recently, companies like Snowflake and Databricks have provided a similar solution for unstructured data called a data lake.

In Ghodsi’s view, combining structured and unstructured data in a single place with the ability for customers to execute data science and business-intelligence work without moving the underlying data is a critical change in the larger data market.

“[Data lakehouses are] a new category, and we think there’s going to be lots of vendors in this data category. So it’s a land grab. We want to quickly race to build it and complete the picture,” he said in an interview with TechCrunch.

Ghodsi also pointed out that he is going up against well-capitalized competitors and that he wants the funds to compete hard with them.

“And you know, it’s not like we’re up against some tiny startups that are getting seed funding to build this. It’s all kinds of [large, established] vendors,” he said. That includes Snowflake, Amazon, Google and others who want to secure a piece of the new market category that Databricks sees emerging.

The company’s performance indicates that it’s onto something.

Growth

Databricks has reached the $600 million annual recurring revenue (ARR) milestone, it disclosed as part of its funding announcement. It closed 2020 at $425 million ARR, to better illustrate how quickly it is growing at scale.

Per the company, its new ARR figure represents 75% growth, measured on a year-over-year basis.

That’s quick for a company of its size; per the Bessemer Cloud Index, top-quartile public software companies are growing at around 44% year over year. Those companies are worth around 22x their forward revenues.

At its new valuation, Databricks is worth 63x its current ARR. So Databricks isn’t cheap, but at its current pace should be able to grow to a size that makes its most recent private valuation easily tenable when it does go public, provided that it doesn’t set a new, higher bar for its future performance by raising again before going public.

Ghodsi declined to share timing around a possible IPO, and it isn’t clear whether the company will pursue a traditional IPO or if it will continue to raise private funds so that it can direct list when it chooses to float. Regardless, Databricks is now sufficiently valuable that it can only exit to one of a handful of mega-cap technology giants or go public.

Why hasn’t the company gone public? Ghodsi is enjoying a rare position in the startup market: He has access to unlimited capital. Databricks had to open another $100 million in its latest round, which was originally set to close at just $1.5 billion. It doesn’t lack for investor interest, allowing its CEO to bring aboard the sort of shareholder he wants for his company’s post-IPO life — while enjoying limited dilution.

This also enables him to hire aggressively, possibly buy some smaller companies to fill in holes in Databricks’ product roadmap, and grow outside of the glare of Wall Street expectations from a position of capital advantage. It’s the startup equivalent of having one’s cake and eating it too.

But staying private longer isn’t without risks. If the larger market for software companies was rapidly devalued, Databricks could find itself too expensive to go public at its final private valuation. However, given the long bull market that we’ve seen in recent years for software shares, and the confidence Ghodsi has in his potential market, that doesn’t seem likely.

There’s still much about Databricks’ financial position that we don’t yet know — its gross margin profile, for example. TechCrunch is also incredibly curious what all its fundraising and ensuing spending have done to near-term Databricks operating cash flow results, as well as how long its gross-margin adjusted CAC payback has evolved since the onset of COVID-19. If we ever get an S-1, we might find out.

For now, winsome private markets are giving Ghodsi and crew space to operate an effectively public company without the annoyances that come with actually being public. Want the same thing for your company? Easy: Just reach $600 million ARR while growing 75% year over year.


By Alex Wilhelm

Databricks launches SQL Analytics

AI and data analytics company Databricks today announced the launch of SQL Analytics, a new service that makes it easier for data analysts to run their standard SQL queries directly on data lakes. And with that, enterprises can now easily connect their business intelligence tools like Tableau and Microsoft’s Power BI to these data repositories as well.

SQL Analytics will be available in public preview on November 18.

In many ways, SQL Analytics is the product Databricks has long been looking to build and that brings its concept of a ‘lake house’ to life. It combines the performance of a data warehouse, where you store data after it has already been transformed and cleaned, with a data lake, where you store all of your data in its raw form. The data in the data lake, a concept that Databrick’s co-founder and CEO Ali Ghodsi has long championed, is typically only transformed when it gets used. That makes data lakes cheaper, but also a bit harder to handle for users.

Image Credits: Databricks

“We’ve been saying Unified Data Analytics, which means unify the data with the analytics. So data processing and analytics, those two should be merged. But no one picked that up,” Ghodsi told me. But ‘lake house’ caught on as a term.

“Databricks has always offered data science, machine learning. We’ve talked about that for years. And with Spark, we provide the data processing capability. You can do [extract, transform, load]. That has always been possible. SQL Analytics enables you to now do the data warehousing workloads directly, and concretely, the business intelligence and reporting workloads, directly on the data lake.”

The general idea here is that with just one copy of the data, you can enable both traditional data analyst use cases (think BI) and the data science workloads (think AI) Databricks was already known for. Ideally, that makes both use cases cheaper and simpler.

The service sits on top of an optimized version of Databricks’ open-source Delta Lake storage layer to enable the service to quickly complete queries. In addition, Delta Lake also provides auto-scaling endpoints to keep the query latency consistent, even under high loads.

While data analysts can query these data sets directly, using standard SQL, the company also built a set of connectors to BI tools. Its BI partners include Tableau, Qlik, Looker and Thoughtspot, as well as ingest partners like Fivetran, Fishtown Analytics, Talend and Matillion.

Image Credits: Databricks

“Now more than ever, organizations need a data strategy that enables speed and agility to be adaptable,” said Francois Ajenstat, Chief Product Officer at Tableau. “As organizations are rapidly moving their data to the cloud, we’re seeing growing interest in doing analytics on the data lake. The introduction of SQL Analytics delivers an entirely new experience for customers to tap into insights from massive volumes of data with the performance, reliability and scale they need.”

In a demo, Ghodsi showed me what the new SQL Analytics workspace looks like. It’s essentially a stripped-down version of the standard code-heavy experience that Databricks users are familiar with. Unsurprisingly, SQL Analytics provides a more graphical experience that focuses more on visualizations and not Python code.

While there are already some data analysts on the Databricks platform, this obviously opens up a large new market for the company — something that would surely bolster its plans for an IPO next year.


By Frederic Lardinois

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.

Screen Shot 2019 10 31 at 10.11.48 AM

Microsoft also highlights Synapse’s integration with Power BI, its easy to use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.


By Frederic Lardinois

Snowflake co-founder and president of product Benoit Dageville is coming to TC Sessions: Enterprise

When it comes to a cloud success story, Snowflake checks all the boxes. It’s a SaaS product going after industry giants. It has raised bushels of cash and grown extremely rapidly — and the story is continuing to develop for the cloud data lake company.

In September, Snowflake’s co-founder and president of product Benoit Dageville will join us at our inaugural TechCrunch Sessions: Enterprise event on September 5 in San Francisco.

Dageville founded the company in 2012 with Marcin Zukowski and Thierry Cruanes with a mission to bring the database, a market that had been dominated for decades by Oracle, to the cloud. Later, the company began focusing on data lakes or data warehouses, massive collections of data, which had been previously stored on premises. The idea of moving these elements to the cloud was a pretty radical notion in 2012.

It began by supporting its products on AWS, and more recently expanded to include support for Microsoft Azure and Google Cloud.

The company started raising money shortly after its founding, modestly at first, then much, much faster in huge chunks. Investors included a Silicon Valley who’s who such as Sutter Hill, Redpoint, Altimeter, Iconiq Capital and Sequoia Capital .

Snowflake fund raising by round. Chart: Crunchbase

Snowflake fund raising by round. Chart: Crunchbase

The most recent rounds came last year, starting with a massive $263 million investment in January. The company went back for more in October with an even larger $450 million round.

It brought on industry veteran Bob Muglia in 2014 to lead it through its initial growth spurt. Muglia left the company earlier this year and was replaced by former ServiceNow chairman and CEO Frank Slootman.

TC Sessions: Enterprise (September 5 at San Francisco’s Yerba Buena Center) will take on the big challenges and promise facing enterprise companies today. TechCrunch’s editors will bring to the stage founders and leaders from established and emerging companies to address rising questions, like the promised revolution from machine learning and AI, intelligent marketing automation and the inevitability of the cloud, as well as the outer reaches of technology, like quantum computing and blockchain.

Tickets are now available for purchase on our website at the early-bird rate of $395.

Student tickets are just $245 – grab them here.

We have a limited number of Startup Demo Packages available for $2,000, which includes four tickets to attend the event.

For each ticket purchased for TC Sessions: Enterprise, you will also be registered for a complimentary Expo Only pass to TechCrunch Disrupt SF on October 2-4.


By Ron Miller

MongoDB gets a data lake, new security features and more

MongoDB is hosting its developer conference today and unsurprisingly, the company has quite a few announcements to make. Some are straightforward, like the launch of MongoDB 4.2 with some important new security features, while others, like the launch of the company’s Atlas Data Lake, point the company beyond its core database product.

“Our new offerings radically expand the ways developers can use MongoDB to better work with data,” said Dev Ittycheria, the CEO and President of MongoDB. “We strive to help developers be more productive and remove infrastructure headaches — with additional features along with adjunct capabilities like full-text search and data lake. IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. It’s our mission to give developers better ways to work with data wherever it resides, including in public and private clouds.”

The highlight of today’s set of announcements is probably the launch of MongoDB Atlas Data Lake. Atlas Data Lake allows users to query data, using the MongoDB Query Language, on AWS S3, no matter their format, including JSON, BSON, CSV, TSV, Parquet and Avro. To get started, users only need to point the service at their existing S3 buckets. They don’t have to manage servers or other infrastructure. Support for Data Lake on Google Cloud Storage and Azure Storage is in the works and will launch in the future.

Also new is Full-Text Search, which gives users access to advanced text search features based on the open-source Apache Lucene 8.

In addition, MongoDB is also now starting to bring together Realm, the mobile database product it acquired earlier this year, and the rest of its product lineup. Using the Realm brand, Mongo is merging its serverless platform, MongoDB Stitch, and Realm’s mobile database and synchronization platform. Realm’s synchronization protocol will now connect to MongoDB Atlas’ cloud database, while Realm Sync will allow developers to bring this data to their applications. 

“By combining Realm’s wildly popular mobile database and synchronization platform with the strengths of Stitch, we will eliminate a lot of work for developers by making it natural and easy to work with data at every layer of the stack, and to seamlessly move data between devices at the edge to the core backend,”  explained Eliot Horowitz, the CTO and co-founder of MongoDB.

As for the latest release of MongoDB, the highlight of the release is a set of new security features. With this release, Mongo is implementing client-side Field Level Encryption. Traditionally, database security has always relied on server-side trust. This typically leaves the data accessible to administrators, even if they don’t have client access. If an attacker breaches the server, that’s almost automatically a catastrophic event.

With this new security model, Mongo is shifting access to the client and to the local drivers. It provides multiple encryptions options and for developers to make use of this, they will use a new ‘encrypt’ JSON scheme attribute.

This ensures that all application code can generally run unmodified and even the admins won’t get access to the database or its logs and backups unless they get client access rights themselves. Since the logic resides in the drivers, the encryption is also handled totally separate from the actual database.

Other new features in MongoDB 4.2 include support for distributed transactions and the ability to manage MongoDB deployments from a single Kubernetes control plane.


By Frederic Lardinois

Databricks open-sources Delta Lake to make data lakes more reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these vast data repositories.

Delta Lake, which has long been a proprietary part of Databrick’s offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.

“Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into ‘Delta Lakes’,” said Ali Ghodsi, co-founder and CEO at Databricks.

What’s important to note here is that Delta lake runs on top of existing data lakes and is compatible with the Apache spark APIs.

The company is still looking at how the project will be governed in the future. “We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead,” Ghodsi said. “One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses.”

To invite this community, Databricks plans to take outside contributions, just like the Spark project.

“We want Delta Lake technology to be used everywhere on-prem and in the cloud by small and large enterprises,” said Ghodsi. “This approach is the fastest way to build something that can become a standard by having the community provide direction and contribute to the development efforts.” That’s also why the company decided against a Commons Clause licenses that some open-source companies now use to prevent others (and especially large clouds) from using their open source tools in their own commercial SaaS offerings. “We believe the Commons Clause license is restrictive and will discourage adoption. Our primary goal with Delta Lake is to drive adoption on-prem as well as in the cloud.”


By Frederic Lardinois

Why Daimler moved its big data platform to the cloud

Like virtually every big enterprise company, a few years ago, the German auto giant Daimler decided to invest in its own on-premises data centers. And while those aren’t going away anytime soon, the company today announced that it has successfully moved its on-premises big data platform to Microsoft’s Azure cloud. This new platform, which the company calls eXtollo, is Daimler’s first major service to run outside of its own data centers, though it’ll probably not be the last.

As Daimler’s head of its corporate center of excellence for advanced analytics and big data Guido Vetter told me, that the company started getting interested in big data about five years ago. “We invested in technology — the classical way, on-premise — and got a couple of people on it. And we were investigating what we could do with data because data is transforming our whole business as well,” he said.

By 2016, the size of the organization had grown to the point where a more formal structure was needed to enable the company to handle its data at a global scale. At the time, the buzzword was ‘data lakes’ and the company started building its own in order to build out its analytics capacities.

Electric Line-Up, Daimler AG

“Sooner or later, we hit the limits as it’s not our core business to run these big environments,” Vetter said. “Flexibility and scalability are what you need for AI and advanced analytics and our whole operations are not set up for that. Our backend operations are set up for keeping a plant running and keeping everything safe and secure.” But in this new world of enterprise IT, companies need to be able to be flexible and experiment — and, if necessary, throw out failed experiments quickly.

So about a year and a half ago, Vetter’s team started the eXtollo project to bring all the company’s activities around advanced analytics, big data and artificial intelligence into the Azure Cloud and just over two weeks ago, the team shut down its last on-premises servers after slowly turning on its solutions in Microsoft’s data centers in Europe, the U.S. and Asia. All in all, the actual transition between the on-premises data centers and the Azure cloud took about nine months. That may not seem fast, but for an enterprise project like this, that’s about as fast as it gets (and for a while, it fed all new data into both its on-premises data lake and Azure).

If you work for a startup, then all of this probably doesn’t seem like a big deal, but for a more traditional enterprise like Daimler, even just giving up control over the physical hardware where your data resides was a major culture change and something that took quite a bit of convincing. In the end, the solution came down to encryption.

“We needed the means to secure the data in the Microsoft data center with our own means that ensure that only we have access to the raw data and work with the data,” explained Vetter. In the end, the company decided to use thethAzure Key Vault to manage and rotate its encryption keys. Indeed, Vetter noted that knowing that the company had full control over its own data was what allowed this project to move forward.

Vetter tells me that the company obviously looked at Microsoft’s competitors as well, but he noted that his team didn’t find a compelling offer from other vendors in terms of functionality and the security features that it needed.

Today, Daimler’s big data unit uses tools like HD Insights and Azure Databricks, which covers more than 90 percents of the company’s current use cases. In the future, Vetter also wants to make it easier for less experienced users to use self-service tools to launch AI and analytics services.

While cost is often a factor that counts against the cloud since renting server capacity isn’t cheap, Vetter argues that this move will actually save the company money and that storage cost, especially, are going to be cheaper in the cloud than in its on-premises data center (and chances are that Daimler, given its size and prestige as a customer, isn’t exactly paying the same rack rate that others are paying for the Azure services).

As with so many big data AI projects, predictions are the focus of much of what Daimler is doing. That may mean looking at a car’s data and error code and helping the technician diagnose an issue or doing predictive maintenance on a commercial vehicle. Interestingly, the company isn’t currently bringing any of its own IoT data from its plants to the cloud. That’s all managed in the company’s on-premises data centers because it wants to avoid the risk of having to shut down a plant because its tools lost the connection to a data center, for example.


By Frederic Lardinois

Microsoft Azure sets its sights on more analytics workloads

Enterprises now amass huge amounts of data, both from their own tools and applications, as well as from the SaaS applications they use. For a long time, that data was basically exhaust. Maybe it was stored for a while to fulfill some legal requirements, but then it was discarded. Now, data is what drives machine learning models, and the more data you have, the better. It’s maybe no surprise, then, that the big cloud vendors started investing in data warehouses and lakes early on. But that’s just a first step. After that, you also need the analytics tools to make all of this data useful.

Today, it’s Microsoft turn to shine the spotlight on its data analytics services. The actual news here is pretty straightforward. Two of these are services that are moving into general availability: the second generation of Azure Data Lake Storage for big data analytics workloads and Azure Data Explorer, a managed service that makes easier ad-hoc analysis of massive data volumes. Microsoft is also previewing a new feature in Azure Data Factory, its graphical no-code service for building data transformation. Data Factory now features the ability to map data flows.

Those individual news pieces are interesting if you are a user or are considering Azure for your big data workloads, but what’s maybe more important here is that Microsoft is trying to offer a comprehensive set of tools for managing and storing this data — and then using it for building analytics and AI services.

(Photo credit:Josh Edelson/AFP/Getty Images)

“AI is a top priority for every company around the globe,” Julia White, Microsoft’s corporate VP for Azure, told me. “And as we are working with our customers on AI, it becomes clear that their analytics often aren’t good enough for building an AI platform.” These companies are generating plenty of data, which then has to be pulled into analytics systems. She stressed that she couldn’t remember a customer conversation in recent months that didn’t focus on AI. “There is urgency to get to the AI dream,” White said, but the growth and variety of data presents a major challenge for many enterprises. “They thought this was a technology that was separate from their core systems. Now it’s expected for both customer-facing and line-of-business applications.”

Data Lake Storage helps with managing this variety of data since it can handle both structured and unstructured data (and is optimized for the Spark and Hadoop analytics engines). The service can ingest any kind of data — yet Microsoft still promises that it will be very fast. “The world of analytics tended to be defined by having to decide upfront and then building rigid structures around it to get the performance you wanted,” explained White. Data Lake Storage, on the other hand, wants to offer the best of both worlds.

Likewise, White argued that while many enterprises used to keep these services on their on-premises servers, many of them are still appliance-based. But she believes the cloud has now reached the point where the price/performance calculations are in its favor. It took a while to get to this point, though, and to convince enterprises. White noted that for the longest time, enterprises that looked at their analytics projects thought $300 million projects took forever, tied up lots of people and were frankly a bit scary. “But also, what we had to offer in the cloud hasn’t been amazing until some of the recent work,” she said. “We’ve been on a journey — as well as the other cloud vendors — and the price performance is now compelling.” And it sure helps that if enterprises want to meet their AI goals, they’ll now have to tackle these workloads, too.


By Frederic Lardinois

AWS Lake Formation makes setting up data lakes easier

The concept of data lakes has been around for a long time, but being able to set up one of these systems, which store vast amounts of raw data in its native formats, was never easy. AWS wants to change this with the launch of AWS Lake Formation. At its core, this new service, which is available today, allows developers to create a secure data lake within a few days.

While “a few days” may still sound like a long time in this age of instant gratification, it’s nothing in the world of enterprise software.

“Everybody is excited about data lakes,” said AWS CEO Andy Jassy in today’s AWS re:Invent keynote. “People realize that there is significant value in moving all that disparate data that lives in your company in different silos and make it much easier by consolidating it in a data lake.”

Setting up a data lake today means you have to, among other things, configure your storage and (on AWS) S3 buckets, move your data, add metadata and add that to a catalog. And then you have to clean up that data and set up the right security policies for the data lake. “This is a lot of work and for most companies, it takes them several months to set up a data lake. It’s frustrating,” said Jassy.

Lake Formation is meant to handle all of these complications with just a few clicks. It sets up the right tags and cleans up and dedupes the data automatically. And it provides admins with a list of security policies to help secure that data.

“This is a step-level change for how easy it is to set up data lakes,” said Jassy.

more AWS re:Invent 2018 coverage


By Frederic Lardinois