“Developers, as you know, do not like to pay for things”

In the previous part of this EC-1, we looked at the technical details of CockroachDB and how it provides accurate data instantaneously anywhere on the planet. In this installment, we’re going to take a look at the product side of Cockroach, with a particular focus on developer relations.

As a business, Cockroach Labs has many things going for it. The company’s approach to distributed database technology is novel. And, as more companies operate on a global level, CockroachDB has the potential to gain significant market share internationally. The company is seven years into a typical 10-year maturity model for databases, has raised $355 million and holds a $2 billion valuation. It’s considered a double unicorn. Few database companies can say this.

The company is now aggressively expanding into the database-as-a-service space, offering its own technology in a fully managed package, expanding the spectrum of clients who can take immediate advantage of its products.

But its growth depends upon securing the love of developers while also making its product easier to use for new customers. To that end, I’m going to analyze the company’s pivot to the cloud as well as its extensive outreach to developers as it works to set itself up for long-term, sustainable success.

Cockroach Labs looks to the cloud

These days, just about any company of consequence provides services via the internet, and a growing number of those services are powered by products and services from native cloud providers. Gartner forecast in 2019 that cloud services would grow at an annual rate of 17.5%, and there’s no sign that growth has abated at all.

Its founders’ history with Google back in the mid-2000s has meant that Cockroach Labs has always been aware of the impact of cloud services on the commercial web. Unsurprisingly, CockroachDB could run cloud native right from its first release, given that its architecture presupposes the cloud in its operation — as we saw in part 2 of this EC-1.


By Danny Crichton

The CockroachDB EC-1

Every application is a palimpsest of technologies, each layer forming a base that enables the next layer to function. Web front ends rely on JavaScript and browser DOM, which rely on back-end APIs, which themselves rely on databases.

As one goes deeper down the stack, engineering decisions become ever more conservative — changing the location of a button in a web app is an inconvenience; changing a database engine can radically upend an entire project.

It’s little surprise then that database technologies are among the longest-lasting engineering projects in the modern software developer toolkit. MySQL, which remains one of the most popular database engines in the world, was first released in the mid-1990s, and Oracle Database, launched more than four decades ago, is still widely used in high-performance corporate environments.

Database technology can change the world, but the world in these parts changes very, very slowly. That’s made building a startup in the sector a tough equation: Sales cycles can be painfully slow, even when new features can dramatically expand a developer’s capabilities. Competition is stiff and comes from some of the largest and most entrenched tech companies in the world. Exits have also been few and far between.

That challenge — and opportunity — is what makes studying Cockroach Labs so interesting. The company behind CockroachDB attempts to solve a long-standing problem in large-scale, distributed database architecture: how to make data created in one place on the planet always available for consumption by applications that are thousands of miles away, immediately and accurately. That might sound like a simple use case, but in reality it’s a herculean task. Cockroach Labs’ story is one of an uphill struggle, but one that saw it turn into a next-generation, $2-billion-valued database contender.

The lead writer of this EC-1 is Bob Reselman. Reselman has been writing about the enterprise software market for more than two decades, with a particular emphasis on teaching and educating engineers on technology. The lead editor for this package was Danny Crichton, the assistant editor was Ram Iyer, the copy editor was Richard Dal Porto, figures were designed by Bob Reselman and stylized by Bryce Durbin, and illustrations were drawn by Nigel Sussman.

Cockroach Labs had no say in the content of this analysis and did not get advance access to it. Reselman has no financial ties to Cockroach Labs or other conflicts of interest to disclose.

The CockroachDB EC-1 comprises four main articles totaling 9,100 words and about 37 minutes of reading time. Here’s what we’ll be crawling over:

We’re always iterating on the EC-1 format. If you have questions, comments or ideas, please send an email to TechCrunch Managing Editor Danny Crichton at [email protected].


By Danny Crichton

CockroachDB, the database that just won’t die

There is an art to engineering, and sometimes engineering can transform art. For Spencer Kimball and Peter Mattis, those two worlds collided when they created the wildly successful open-source graphics program GIMP as college students at Berkeley.

That project was so successful that when the two joined Google in 2002, Sergey Brin and Larry Page personally stopped by to tell the new hires how much they liked it and explained how they used the program to create the first Google logo.

Cockroach Labs was started by developers and stays true to its roots to this day.

In terms of good fortune in the corporate hierarchy, when you get this type of recognition in a company such as Google, there’s only one way you can go — up. They went from rising stars to stars at Google, becoming the go-to guys on the Infrastructure Team. They could easily have looked forward to a lifetime of lucrative employment.

But Kimball, Mattis and another Google employee, Ben Darnell, wanted more — a company of their own. To realize their ambitions, they created Cockroach Labs, the business entity behind their ambitious open-source database CockroachDB. Can some of the smartest former engineers in Google’s arsenal upend the world of databases in a market spotted with the gravesites of storage dreams past? That’s what we are here to find out.

Berkeley software distribution

Mattis and Kimball were roommates at Berkeley majoring in computer science in the early-to-mid-1990s. In addition to their usual studies, they also became involved with the eXperimental Computing Facility (XCF), an organization of undergraduates who have a keen, almost obsessive interest in CS.


By Danny Crichton

PingCAP, the open-source developer behind TiDB, closes $270 million Series D

PingCAP, the open-source software developer best known for NewSQL database TiDB, has raised a $270 million Series D. TiDB handles hybrid transactional and analytical processing (HTAP), and is aimed at high-growth companies, including payment and e-commerce services, that need to handle increasingly large amounts of data.

The round’s lead investors were GGV Capital, Access Technology Ventures, Anatole Investment, Jeneration Capital and 5Y Capital (formerly known as Morningside Venture Capital). It also included participation from Coatue, Bertelsmann Asia Investment Fund, FutureX Capital, Kunlun Capital, Trustbridge Partners, and returning investors Matrix Partners China and Yunqi Partners.

The funding brings PingCAP’s total raised so far to $341.6 million. Its last round, a Series C of $50 million, was announced back in September 2018.

PingCAP says TiDB has been adopted by about 1,500 companies across the world. Some examples include Square; Japanese mobile payments company PayPay; e-commerce app Shopee; video-sharing platform Dailymotion; and ticketing platform BookMyShow. TiDB handles online transactional processing (OLTP) and online analytical processing (OLAP) in the same database, which PingCAP says results in faster real-time analytics than other distributed databases.

In June, PingCAP launched TiDB Cloud, which it describes as fully-managed “TiDB as a Service,” on Amazon Web Services and Google Cloud. The company plans to add more platforms, and part of the funding will be used to increase TiDB Cloud’s global user base.


By Catherine Shu

Microsoft now lets you bring your own data types to Excel

Over the course of the last few years, Microsoft started adding the concept of ‘data types’ to Excel: the ability to pull in geography and real-time stock data from the cloud, for example. Thanks to its partnership with Wolfram, Excel now features over 100 of these data types that can flow into a spreadsheet. But you won’t be limited to only these pre-built data types for long. Soon, Excel will also let you bring in your own data types.

That means you can have a ‘customer’ data type, for example, that can bring rich customer data from a third-party service into Excel. The conduit for this is either Power BI, which now allows Excel to pull in any data you previously published there, or Microsoft’s Power Query feature in Excel that lets you connect to a wide variety of data sources, including common databases like SQL Server, MySQL and PostgreSQL, as well as third-party services like Teradata and Facebook.

“Up to this point, the Excel grid has been flat… it’s two dimensional,” Microsoft’s head of product for Excel, Brian Jones, writes in today’s announcement. “You can lay out numbers, text, and formulas across the flexible grid, and people have built amazing things with those capabilities. Not all data is flat though and forcing data into that 2D structure has its limits. With Data Types we’ve added a 3rd dimension to what you can build with Excel. Any cell can now contain a rich set of structured data… in just a single cell.”

The promise here is that this will make Excel more flexible, and I’m sure a lot of enterprises will adopt these capabilities. These companies aren’t likely to move to Airtable or similar Excel-like tools anytime soon, but they have data analysis needs that are only increasing now that every company gathers more data than it knows what to do with. This is also a feature that none of Excel’s competitors currently offer, including Google Sheets.


By Frederic Lardinois

Hasura raises $25 million Series B and adds MySQL support to its GraphQL service

Hasura, a service whose open-source engine gives developers a GraphQL API for accessing their databases, today announced that it has raised a $25 million Series B round led by Lightspeed Venture Partners. Previous investors Vertex Ventures US, Nexus Venture Partners, Strive VC and SAP.iO Fund also participated in this round.
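To make that concrete, here is a minimal sketch in Python of what querying a Hasura-style GraphQL endpoint over a database table might look like. The endpoint URL, the “customers” table and the admin-secret header are hypothetical placeholders for illustration, not details from Hasura’s announcement.

```python
import requests

# Hypothetical endpoint and credentials -- adjust for a real deployment.
HASURA_URL = "https://my-hasura-instance.example.com/v1/graphql"
HEADERS = {"x-hasura-admin-secret": "my-secret"}  # assumption: admin-secret auth

# Engines like Hasura expose database tables as GraphQL root fields;
# here we assume a "customers" table with id, name and created_at columns.
query = """
query RecentCustomers($limit: Int!) {
  customers(limit: $limit, order_by: {created_at: desc}) {
    id
    name
  }
}
"""

resp = requests.post(
    HASURA_URL,
    json={"query": query, "variables": {"limit": 5}},
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["customers"])
```

The appeal for front-end teams is that the query shape mirrors the data they want back, while the engine handles translating it into SQL against the underlying database.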

The new round, which the team raised after the COVID-19 pandemic had already started, comes only six months after the company announced its $9.9 million Series A round. In total, Hasura has now raised $36.5 million.

“We’ve been seeing rapid enterprise traction in 2020. We’ve wanted to accelerate our efforts investing in the Hasura community and our cloud product that we recently launched and to ensure the success of our enterprise customers. Given the VC inbound interest, a fundraise made sense to help us step on the gas pedal and give us room to grow comfortably,” Hasura co-founder and CEO Tanmai Gopal told me.

In addition to the new funding, Hasura also today announced that it has added support for MySQL databases to its service. Until now, the company’s service only worked with PostgreSQL databases.

Rajoshi Ghosh, co-founder and COO (left) and Tanmai Gopal, co-founder and CEO (right).


As the company’s CEO and co-founder Tanmai Gopal told me, MySQL support has long been at the top of the most requested features by the service’s users. Many of these users — who are often in the health care and financial services industries — are also working with legacy systems they are trying to connect to modern applications, and MySQL plays an important role there, given how long it has been around.

In addition to adding MySQL support, Hasura is also adding support for SQL Server to its line-up, but for now, that’s in early access.

“For MySQL and SQL Server, we’ve seen a lot of demand from our healthcare and financial services / fin-tech users,” Gopal said. “They have a lot of existing online data, especially in these two databases, that they want to activate to build new capabilities and use while modernizing their applications.”

Today’s announcement also comes only a few months after the company launched a fully managed cloud version of its service, which complements its existing paid Pro service for enterprises.

“We’re very impressed by how developers have taken to Hasura and embraced the GraphQL approach to building applications,” said Gaurav Gupta, partner at Lightspeed Venture Partners and Hasura board member. “Particularly for front-end developers using technologies like React, Hasura makes it easy to connect applications to existing databases where all the data is without compromising on security and performance. Hasura provides a lovely bridge for re-platforming applications to cloud-native approaches, so we see this approach being embraced by enterprise developers as well as front-end developers more and more.”

The company plans to use the new funding to add support for more databases and to tackle some of the harder technical challenges around cross-database joins and the company’s application-level data caching system. “We’re also investing deeply in company building so that we can grow our GTM and engineering in tandem and making some senior hires across these functions,” said Gopal.


By Frederic Lardinois

Google Cloud launches its Business Application Platform based on Apigee and AppSheet

Unlike some of its competitors, Google Cloud has recently started emphasizing how its large lineup of different services can be combined to solve common business problems. Instead of trying to sell individual services, Google is focusing on solutions and the latest effort here is what it calls its Business Application Platform, which combines the API management capabilities of Apigee with the no-code application development platform of AppSheet, which Google acquired earlier this year.

As part of this process, Google is also launching a number of new features for both services today. The company is launching the beta of a new API Gateway, built on top of the open-source Envoy project, for example. This is a fully managed service that is meant to make it easier for developers to secure and manage their APIs across Google’s cloud computing services and serverless offerings like Cloud Functions and Cloud Run. The new gateway, which has been in alpha for a while now, offers all the standard features you’d expect, including authentication, key validation and rate limiting.

As for its low-code service AppSheet, the Google Cloud team is now making it easier to bring in data from third-party applications thanks to the general availability of Apigee as a data source for the service. AppSheet already supported standard sources like MySQL, Salesforce and G Suite, but this new feature adds a lot of flexibility to the service.

With more data comes more complexity, so AppSheet is also launching new tools for automating processes inside the service today, thanks to the early access launch of AppSheet Automation. Like the rest of AppSheet, the promise here is that developers won’t have to write any code. Instead, AppSheet Automation provides a visual interface that, according to Google, “provides contextual suggestions based on natural language inputs.”

“We are confident the new category of business application platforms will help empower both technical and line of business developers with the core ability to create and extend applications, build and automate workflows, and connect and modernize applications,” Google notes in today’s announcement. And indeed, this looks like a smart way to combine the no-code environment of AppSheet with the power of Apigee.


By Frederic Lardinois

Microsoft launches Azure Synapse Link to help enterprises get faster insights from their data

At its Build developer conference, Microsoft today announced Azure Synapse Link, a new enterprise service that allows businesses to analyze their data faster and more efficiently, using an approach that’s generally called ‘hybrid transaction/analytical processing’ (HTAP). That’s a mouthful, but it essentially means enterprises can run analytical and transactional workloads against a single database system. Traditionally, enterprises had to choose between building a single system for both, which was often highly over-provisioned, or maintaining separate systems for transactional and analytics workloads.

Last year, at its Ignite conference, Microsoft announced Azure Synapse Analytics, an analytics service that combines analytics and data warehousing to create what the company calls “the next evolution of Azure SQL Data Warehouse.” Synapse Analytics brings together data from Microsoft’s services and those from its partners and makes it easier to analyze.

“One of the key things, as we work with our customers on their digital transformation journey, there is an aspect of being data-driven, of being insights-driven as a culture, and a key part of that really is that once you decide there is some amount of information or insights that you need, how quickly are you able to get to that? For us, time to insight and a secondary element, which is the cost it takes, the effort it takes to build these pipelines and maintain them with an end-to-end analytics solution, was a key metric we have been observing for multiple years from our largest enterprise customers,” said Rohan Kumar, Microsoft’s corporate VP for Azure Data.

Synapse Link takes the work Microsoft did on Synapse Analytics a step further by removing the barriers between Azure’s operational databases and Synapse Analytics, so enterprises can immediately get value from the data in those databases without going through a data warehouse first.

“What we are announcing with Synapse Link is the next major step in the same vision that we had around reducing the time to insight,” explained Kumar. “And in this particular case, a long-standing barrier that exists today between operational databases and analytics systems is these complex ETL (extract, transform, load) pipelines that need to be set up just so you can do basic operational reporting or where, in a very transactionally consistent way, you need to move data from your operational system to the analytics system, because you don’t want to impact the performance of the operational system in any way because that’s typically dealing with, depending on the system, millions of transactions per second.”

ETL pipelines, Kumar argued, are typically expensive and hard to build and maintain, yet enterprises are now building new apps — and maybe even line-of-business mobile apps — where any action a consumer takes that is registered in the operational database should be immediately available for predictive analytics, for example.

From the user perspective, enabling this takes a single click to link the two, and it removes the need to manage additional data pipelines or database resources. That, Kumar said, was always the main goal for Synapse Link. “With a single click, you should be able to enable real-time analytics on your operational data in ways that don’t have any impact on your operational systems, so you’re not using the compute part of your operational system to do the query. You actually have to transform the data into a columnar format, which is more adaptable for analytics, and that’s really what we achieved with Synapse Link.”

Because traditional HTAP systems on-premises typically share their compute resources with the operational database, those systems never quite took off, Kumar argued. In the cloud, with Synapse Link, though, that impact doesn’t exist because you’re dealing with two separate systems. Now, once a transaction gets committed to the operational database, the Synapse Link system transforms the data into a columnar format that is more optimized for the analytics system — and it does so in real time.
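To illustrate the general row-to-columnar idea behind this kind of HTAP pipeline, here is a toy Python sketch; it is not Microsoft’s actual implementation, and the table and field names are hypothetical.

```python
from collections import defaultdict

# Row-oriented records, as an operational (OLTP) store would commit them.
transactions = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 42.0},
]

def to_columnar(rows):
    """Pivot row-oriented records into per-column arrays, the layout analytics engines scan most efficiently."""
    columns = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)

columnar = to_columnar(transactions)
# An analytical query can now scan only the columns it needs, e.g. summing amounts:
print(sum(columnar["amount"]))  # 237.5
```

The point of doing this continuously, on a separate system, is that the analytics side always sees fresh data without the operational database spending any of its own compute on the transformation.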

For now, Synapse Link is only available in conjunction with Microsoft’s Cosmos DB database. As Kumar told me, that’s because that’s where the company saw the highest demand for this kind of service, but you can expect the company to add support for Azure SQL, Azure Database for PostgreSQL and Azure Database for MySQL in the future.


By Frederic Lardinois

They scaled YouTube. Now they’ll shard everyone with PlanetScale

When the former CTOs of YouTube, Facebook, and Dropbox seed fund a database startup, you know there’s something special going on under the hood. Jiten Vaidya and Sugu Sougoumarane saved YouTube from a scalability nightmare by inventing and open sourcing Vitess, a brilliant relational data storage system. But in the decade since working there, the pair have been inundated with requests from tech companies desperate for help building the operational scaffolding needed to actually integrate Vitess.

So today the pair are revealing their new startup PlanetScale, which makes it easy to build multi-cloud databases that handle enormous amounts of information without locking customers into Amazon, Google or Microsoft’s infrastructure. Battle-tested at YouTube, the technology could allow startups to fret less about their backend and focus more on their unique value proposition. “Now they don’t have to reinvent the wheel,” Vaidya tells me. “A lot of companies facing this scaling problem end up solving it badly in-house, and now there’s a way to solve that problem by using us to help.”

PlanetScale quietly raised a $3 million seed round in April, led by SignalFire and joined by a who’s who of engineering luminaries. They include YouTube co-founder and CTO Steve Chen, Quora CEO and former Facebook CTO Adam D’Angelo, former Dropbox CTO Aditya Agarwal, PayPal and Affirm co-founder Max Levchin, MuleSoft co-founder and CTO Ross Mason, Google director of engineering Parisa Tabriz, and Facebook’s first female engineer and South Park Commons founder Ruchi Sanghvi. If anyone could foresee the need for Vitess implementation services, it’s these leaders, who’ve dealt with scaling headaches at tech’s top companies.

But how can a scrappy startup challenge the tech juggernauts for cloud supremacy? First, by actually working with them. The PlanetScale beta that’s now launching lets companies spin up Vitess clusters on its database-as-a-service, on their own infrastructure through a licensing deal, or on AWS, with Google Cloud and Microsoft Azure support coming shortly. Once these integrations with the tech giants are established, PlanetScale clients can use it as an interface for a multi-cloud setup, where they could keep their master data copies on AWS US-West with replicas on Google Cloud in Ireland and elsewhere. That protects companies from becoming dependent on one provider and then getting stuck with price hikes or service problems.

PlanetScale also promises to uphold the principles that undergirded Vitess. “It’s our value that we will keep everything in the query pack completely open source so none of our customers ever have to worry about lock-in,” Vaidya says.

PlanetScale co-founders (from left): Jiten Vaidya and Sugu Sougoumarane

Battletested, YouTube Approved

He and Sougoumarane met 25 years ago while at the Indian Institute of Technology Bombay. Back in 1993, they worked together at pioneering database company Informix before it flamed out. Sougoumarane was eventually hired by Elon Musk as an early engineer for X.com before it got acquired by PayPal, and then left for YouTube. Vaidya was working at Google, and the pair were reunited when it bought YouTube and Sougoumarane pulled him onto the team.

“YouTube was growing really quickly and the relational database they were using, MySQL, was sort of falling apart at the seams,” Vaidya recalls. Adding more CPU and memory to the database infrastructure wasn’t cutting it, so the team created Vitess. The horizontally scaling sharding middleware for MySQL lets users segment their database to reduce memory usage while still being able to run operations rapidly. YouTube has smoothly ridden that infrastructure to 1.8 billion users ever since.
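To illustrate the general idea that sharding middleware builds on, here is a toy Python sketch of hashing a sharding key to decide which MySQL shard should serve a query. This is not Vitess’s actual routing logic, and the shard names are hypothetical; real middleware also handles rebalancing, cross-shard queries and connection management.

```python
import hashlib

# Hypothetical shard map: in a real deployment each entry would be a MySQL
# connection (host, port, schema); here they are just labels.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(user_id: int) -> str:
    """Pick a shard by hashing the sharding key, so all rows for one user land on the same shard."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

def route_query(user_id: int, sql: str) -> tuple[str, str]:
    """Return (shard, sql); the middleware would execute sql on that shard's MySQL instance."""
    return shard_for(user_id), sql

print(route_query(42, "SELECT * FROM videos WHERE user_id = 42"))
```

Because each shard holds only a slice of the data, adding shards lets the overall system keep growing horizontally instead of piling ever more CPU and memory onto a single MySQL instance.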

“Sugu and Mike Solomon invented and made Vitess open source right from the beginning since 2010 because they knew the scaling problem wasn’t just for YouTube, and they’ll be at other companies 5 or 10 years later trying to solve the same problem,” Vaidya explains. That proved true, and now top apps like Square and HubSpot run entirely on Vitess, with Slack now 30 percent onboard.

Vaidya left YouTube in 2012 and became the lead engineer at Endorse, which got acquired by Dropbox, where he worked for four years. But in the meantime, the engineering community strayed toward MongoDB-style key-value store databases, which Vaidya considers inferior. He sees indexing issues and says that if the system hiccups during an operation, data can become inconsistent — a big problem for banking and commerce apps. “We think horizontally scaled relational databases are more elegant and are something enterprises really need.”

Database Legends Reunite

Fed up with the engineering heresy, a year ago Vaidya committed to creating PlanetScale. It’s composed of four core offerings: professional training in Vitess, on-demand support for open-source Vitess users, a Vitess database-as-a-service on PlanetScale’s servers, and software licensing for clients that want to run Vitess on premises or through other cloud providers. It lets companies re-shard their databases on the fly to relocate user data to comply with regulations like GDPR, safely migrate from other systems without major codebase changes, make on-demand changes, and run on Kubernetes.

The PlanetScale team

PlanetScale’s customers now include Indonesian e-commerce giant Bukalapak, and it’s helping Booking.com, GitHub and New Relic migrate to open-source Vitess. Growth is suddenly ramping up due to inbound inquiries. Last month, around the time Square Cash became the number one app, its engineering team published a blog post extolling the virtues of Vitess. Now everyone’s seeking help with Vitess sharding, and PlanetScale is waiting with open arms. “Jiten and Sugu are legends and know firsthand what companies require to be successful in this booming data landscape,” says Ilya Kirnos, founding partner and CTO of SignalFire.

The big cloud providers are trying to adapt to the relational database trend, with Google’s Cloud Spanner and Cloud SQL, and Amazon’s RDS and Aurora. Their huge networks and marketing war chests could pose a threat. But Vaidya insists that while it might be easy to get data into these systems, it can be a pain to get it out. PlanetScale is designed to give customers freedom of optionality through its multi-cloud functionality so their eggs aren’t all in one basket.

Finding product market fit is tough enough. Trying to suddenly scale a popular app while also dealing with all the other challenges of growing a company can drive founders crazy. But if it’s good enough for YouTube, startups can trust PlanetScale to make databases one less thing they have to worry about.


By Josh Constine

MariaDB acquires Clustrix

MariaDB, the company behind the eponymous MySQL drop-in replacement database, today announced that it has acquired Clustrix, which itself is a MySQL drop-in replacement database, but with a focus on scalability. MariaDB will integrate Clustrix’s technology into its own database, which will allow it to offer its users a more scalable database service in the long run.

That by itself would be an interesting development for the popular open-source database company. But there’s another angle to this story, too. In addition to the acquisition, MariaDB also today announced that cloud computing company ServiceNow is investing in MariaDB, an investment that helped it get to today’s acquisition. ServiceNow doesn’t typically make investments, though it has made a few acquisitions. It is a very large MariaDB user, though, and it’s exactly the kind of customer that will benefit from the Clustrix acquisition.

MariaDB CEO Michael Howard tells me that ServiceNow currently supports about 80,000 instances of MariaDB. With this investment (which is actually an add-on to MariaDB’s 2017 Series C round), ServiceNow’s SVP of Development and Operations Pat Casey will join MariaDB’s board.

Why would MariaDB acquire a company like Clustrix, though? When I asked Howard about the motivation, he noted that he’s now seeing more companies like ServiceNow that are looking for a more scalable way to run MariaDB. Howard noted that it would take years to build a new database engine from the ground up.

“You can hire a lot of smart people individually, but not necessarily have that experience built into their profile,” he said. “So that was important and then to have a jumpstart in relation to this market opportunity — this mandate from our market. It typically takes about nine years to get a brand new, thorough database technology off the ground. It’s not like a SaaS application where you can get a front-end going in about a year or so.”

Howard also stressed that the fact that the teams at Clustrix and MariaDB share the same vocabulary, given that they both work on similar problems and aim to be compatible with MySQL, made this a good fit.

While integrating the Clustrix database technology into MariaDB won’t be trivial, Howard stressed that the database was always built to accommodate external database storage engines. MariaDB will have to make some changes to its APIs to be ready for the clustering features of Clustrix. “It’s not going to be a 1-2-3 effort,” he said. “It’s going to be a heavy-duty effort for us to do this right. But everyone on the team wants to do it because it’s good for the company and our customers.”

MariaDB did not disclose the price of the acquisition. Since it was founded in 2006, though, the Y Combinator-incubated Clustrix had raised just under $72 million. MariaDB has raised just under $100 million so far, so it’s probably a fair guess that Clustrix didn’t necessarily sell for a large multiple of that.

 


By Frederic Lardinois