How engineers fought the CAP theorem in the global war on latency

CockroachDB was intended to be a global database from the beginning. The founders of Cockroach Labs wanted to ensure that data written in one location would be viewable immediately in another location 10,000 miles away. The use case was simple, but the work needed to make it happen was herculean.

The company is betting the farm that it can solve one of the biggest challenges for web-scale applications. Its approach is clever, but it’s a bit complicated, particularly for the non-technical reader. Given its history and engineering talent, the company is well on its way to pulling it off and making a big impact on the database market, which makes its technology well worth understanding. In short, there’s value in digging into the details.

Using CockroachDB’s multiregion feature to segment data according to geographic proximity fulfills Cockroach Labs’ primary directive: To get data as close to the user as possible.

In part 1 of this EC-1, I provided a general overview and a look at the origins of Cockroach Labs. In this installment, I’m going to cover the technical details of the technology with an eye to the non-technical reader, describing CockroachDB through three questions:

  1. What makes reading and writing data over a global geography so hard?
  2. How does CockroachDB address the problem?
  3. What does it all mean for those using CockroachDB?

What makes reading and writing data over a global geography so hard?

Spencer Kimball, CEO and co-founder of Cockroach Labs, describes the situation this way:

There’s lots of other stuff you need to consider when building global applications, particularly around data management. Take, for example, the question and answer website Quora. Let’s say you live in Australia. You have an account and you store the particulars of your Quora user identity on a database partition in Australia.

But when you post a question, you actually don’t want that data to just be posted in Australia. You want that data to be posted everywhere so that all the answers to all the questions are the same for everybody, anywhere. You don’t want to have a situation where you answer a question in Sydney and then you can see it in Hong Kong, but you can’t see it in the EU. When that’s the case, you end up getting different answers depending where you are. That’s a huge problem.
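
To make that split concrete, here’s a rough sketch of how it might be expressed with the multiregion feature mentioned above, using Python and the Postgres-compatible psycopg2 driver (CockroachDB speaks the PostgreSQL wire protocol). The database name, region names and table layout are hypothetical, chosen only to mirror the Quora example; the locality statements reflect the multiregion SQL available in recent CockroachDB releases.

    # A minimal sketch: pin user-identity rows to the user's home region, while
    # question-and-answer content is replicated for low-latency reads everywhere.
    # Assumes a running CockroachDB cluster whose nodes were started with the
    # region names below, plus a hypothetical "qna" database with these tables.
    import psycopg2

    conn = psycopg2.connect("postgresql://root@localhost:26257/qna?sslmode=disable")
    conn.autocommit = True  # run each DDL statement on its own
    with conn.cursor() as cur:
        # Declare which cloud regions the database spans.
        cur.execute('ALTER DATABASE qna SET PRIMARY REGION "australia-southeast1"')
        cur.execute('ALTER DATABASE qna ADD REGION "europe-west1"')
        cur.execute('ALTER DATABASE qna ADD REGION "us-east1"')
        # User identities live near their owners: fast local reads and writes.
        cur.execute("ALTER TABLE users SET LOCALITY REGIONAL BY ROW")
        # Questions and answers are readable quickly from every region.
        cur.execute("ALTER TABLE questions SET LOCALITY GLOBAL")
    conn.close()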

Reading and writing data over a global geography is challenging for pretty much the same reason that it’s faster to get a pizza delivered from across the street than from across the city. The essential constraints of time and space apply. Whether it’s digital data or a pepperoni pizza, the further away you are from the source, the longer stuff takes to get to you.
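
A quick back-of-the-envelope calculation shows how much the speed of light alone costs before a database does any work at all. Light in optical fiber travels at roughly two-thirds of its speed in a vacuum, and every request needs a round trip; the figures below are approximations that ignore routing detours and processing time.

    # Rough lower bound on network round-trip time imposed by distance alone.
    SPEED_OF_LIGHT_KM_PER_S = 300_000     # in a vacuum, approximately
    FIBER_FRACTION = 2 / 3                # light in fiber is roughly 2/3 as fast

    def min_round_trip_ms(distance_km: float) -> float:
        one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_FRACTION)
        return one_way_seconds * 2 * 1_000  # there and back, in milliseconds

    print(min_round_trip_ms(1))        # across the street: effectively 0 ms
    print(min_round_trip_ms(16_000))   # ~10,000 miles away: roughly 160 ms

That 160 milliseconds is a physical floor, before queries, locks or consensus rounds add their own delays.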


By Danny Crichton

“Developers, as you know, do not like to pay for things”

In the previous part of this EC-1, we looked at the technical details of CockroachDB and how it provides accurate data instantaneously anywhere on the planet. In this installment, we’re going to take a look at the product side of Cockroach, with a particular focus on developer relations.

As a business, Cockroach Labs has many things going for it. The company’s approach to distributed database technology is novel. And, as more companies operate on a global level, CockroachDB has the potential to gain some significant market share internationally. The company is seven years into a typical 10-year maturity model for databases, has raised $355 million, and holds a $2 billion market value. It’s considered a double unicorn. Few database companies can say this.

The company is now aggressively expanding into the database-as-a-service space, offering its own technology in a fully managed package, expanding the spectrum of clients who can take immediate advantage of its products.

But its growth depends upon securing the love of developers while also making its product easier to use for new customers. To that end, I’m going to analyze the company’s pivot to the cloud as well as its extensive outreach to developers as it works to set itself up for long-term, sustainable success.

Cockroach Labs looks to the cloud

These days, just about any company of consequence provides services via the internet, and a growing number of these services are powered by products and services from native cloud providers. Gartner forecast in 2019 that cloud services would grow at an annual rate of 17.5%, and there’s no sign that the growth has abated at all.

Its founders’ history with Google back in the mid-2000s has meant that Cockroach Labs has always been aware of the impact of cloud services on the commercial web. Unsurprisingly, CockroachDB could run cloud native right from its first release, given that its architecture presupposes the cloud in its operation — as we saw in part 2 of this EC-1.


By Danny Crichton

The CockroachDB EC-1

Every application is a palimpsest of technologies, each layer forming a base that enables the next layer to function. Web front ends rely on JavaScript and browser DOM, which rely on back-end APIs, which themselves rely on databases.

As one goes deeper down the stack, engineering decisions become ever more conservative — changing the location of a button in a web app is an inconvenience; changing a database engine can radically upend an entire project.

It’s little surprise then that database technologies are among the longest-lasting engineering projects in the modern software developer toolkit. MySQL, which remains one of the most popular database engines in the world, was first released in the mid-1990s, and Oracle Database, launched more than four decades ago, is still widely used in high-performance corporate environments.

Database technology can change the world, but the world in these parts changes very, very slowly. That’s made building a startup in the sector a tough equation: Sales cycles can be painfully slow, even when new features can dramatically expand a developer’s capabilities. Competition is stiff and comes from some of the largest and most entrenched tech companies in the world. Exits have also been few and far between.

That challenge — and opportunity — is what makes studying Cockroach Labs so interesting. The company behind CockroachDB attempts to solve a long-standing problem in large-scale, distributed database architecture: How to make it so that data created in one place on the planet is always available for consumption by applications that are thousands of miles away, immediately and accurately. Making global data always available immediately and accurately might sound like a simple use case, but in reality it’s quite the herculean task. Cockroach Labs’ story is one of an uphill struggle, but one that saw it turn into a next-generation, $2-billion-valued database contender.

The lead writer of this EC-1 is Bob Reselman. Reselman has been writing about the enterprise software market for more than two decades, with a particular emphasis on teaching and educating engineers on technology. The lead editor for this package was Danny Crichton, the assistant editor was Ram Iyer, the copy editor was Richard Dal Porto, figures were designed by Bob Reselman and stylized by Bryce Durbin, and illustrations were drawn by Nigel Sussman.

Cockroach Labs had no say in the content of this analysis and did not get advance access to it. Reselman has no financial ties to Cockroach Labs or other conflicts of interest to disclose.

The CockroachDB EC-1 comprises four main articles numbering 9,100 words and a reading time of 37 minutes. Here’s what we’ll be crawling over:

We’re always iterating on the EC-1 format. If you have questions, comments or ideas, please send an email to TechCrunch Managing Editor Danny Crichton at [email protected].


By Danny Crichton

CockroachDB, the database that just won’t die

There is an art to engineering, and sometimes engineering can transform art. For Spencer Kimball and Peter Mattis, those two worlds collided when they created the widely successful open-source graphics program, GIMP, as college students at Berkeley.

That project was so successful that when the two joined Google in 2002, Sergey Brin and Larry Page personally stopped by to tell the new hires how much they liked it and explained how they used the program to create the first Google logo.

Cockroach Labs was started by developers and stays true to its roots to this day.

In terms of good fortune in the corporate hierarchy, when you get this type of recognition in a company such as Google, there’s only one way you can go — up. They went from rising stars to stars at Google, becoming the go-to guys on the Infrastructure Team. They could easily have looked forward to a lifetime of lucrative employment.

But Kimball, Mattis and another Google employee, Ben Darnell, wanted more — a company of their own. To realize their ambitions, they created Cockroach Labs, the business entity behind their ambitious open-source database CockroachDB. Can some of the smartest former engineers in Google’s arsenal upend the world of databases in a market spotted with the gravesites of storage dreams past? That’s what we are here to find out.

Berkeley software distribution

Mattis and Kimball were roommates at Berkeley majoring in computer science in the early-to-mid-1990s. In addition to their usual studies, they also became involved with the eXperimental Computing Facility (XCF), an organization of undergraduates who have a keen, almost obsessive interest in CS.


By Danny Crichton

Cockroach Labs scores $86.6M Series D as scalable database resonates

Cockroach Labs, the NYC enterprise database company, announced an $86.6 million Series D funding round today. The company was in no mood to talk valuations, but was happy to have a big chunk of money to help build on its recent success and ride out the current economic malaise.

Altimeter Capital and Bond co-led the round with participation from Benchmark, GV, Index Ventures, Redpoint Ventures, Sequoia Capital and Tiger Capital. Today’s funding comes on top of a $55 million Series C last August, and brings the total raised to $195 million, according to the company.

Cockroach has a tough job. It’s battling both traditional databases like Oracle and modern ones from the likes of Amazon, but investors see a company with a lot of market potential building an open source, on-prem and cloud database product. In particular, the open source product provides a way to attract users and turn some percentage of those into potential customers, an approach investors tend to favor.

CEO and co-founder Spencer Kimball says that the company had been growing fast before the pandemic hit. “I think the biggest change between now and last year has just been our go to market which is seeing pretty explosive growth. By number of customers, we’ve grown by almost 300%,” Kimball told TechCrunch.

He says having that three-pronged approach of open source, cloud and on-prem products has really helped fuel that growth. The company launched the cloud service in 2018 and it has helped expand its market. Whereas the on-prem version was mostly aimed at larger customers, the managed service puts Cockroach in reach of individual developers and teams, who might not want to deal with all of the overhead of managing a complex database on their own.

Kimball says it’s really too soon to say what impact the pandemic will have on his business. He recognizes that certain verticals like travel, hospitality and some retail businesses are probably going to suffer, but other businesses that are accelerating in the crisis could make use of a highly scalable database like CockroachDB.

“Obviously it’s a new world right now. I think there are going to be some losers and some winners, but on balance I think [our] momentum will continue to grow for something that really does represent a best in class solution for businesses, whether they are startups or big enterprises, as they’re trying to figure out how to build for a cloud native future,” Kimball said.

The company intends to keep hiring through this, but it is evaluating its needs much more carefully than it might have prior to the crisis, and with a much more open mind toward remote work.

Kimball certainly recognizes that it’s not an easy time to be raising this kind of cash and he is grateful to have the confidence of investors to keep growing his company, come what may.


By Ron Miller

Pulumi brings support for more languages to its infrastructure-as-code platform

Seattle-based Pulumi has quickly made a name for itself as a modern platform that lets developers specify their infrastructure through writing code in their preferred programming language — and not YAML. With the launch of Pulumi 2.0, those languages now include JavaScript, TypeScript, Go and .NET, in addition to its original support for Python. It’s also now extending its reach beyond its core infrastructure features to include deeper support for policy enforcement, testing and more.
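
For readers who haven’t seen infrastructure-as-code in a general-purpose language, here’s a minimal sketch of what a Pulumi program can look like in Python. The bucket name is arbitrary, and the snippet assumes the pulumi and pulumi_aws packages are installed and AWS credentials are configured; treat it as an illustration rather than a production setup.

    # Declare a piece of cloud infrastructure (an S3 bucket) as ordinary Python.
    # Run with `pulumi up` inside a Pulumi project.
    import pulumi
    import pulumi_aws as aws

    # Resources are plain objects, so loops, functions and tests all apply.
    bucket = aws.s3.Bucket("app-assets", acl="private")

    # Exported values show up as stack outputs after deployment.
    pulumi.export("bucket_name", bucket.id)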

As the company also today announced, it now has over 10,000 users and more than 100 paying customers. With that, it’s seeing a 10x increase in its year-over-year annual run rate, though without knowing the exact figures, it’s hard to say what to make of that number. Current customers include the likes of Cockroach Labs, Mercedes-Benz and Tableau.

When the company first launched, its messaging was very much around containers and serverless. But as Pulumi founder and CEO Joe Duffy told me, today the company is often directly engaging with infrastructure teams that are building the platforms for the engineers in their respective companies.

As for Pulumi 2.0, Duffy says that “this is really taking the original Pulumi vision of infrastructure as code — using your favorite language — and augmenting it with what we’re calling superpowers.” That means expanding the product’s overall capabilities from infrastructure provisioning into adjacent problem spaces such as continuous delivery and policy-as-code, which extends the original Pulumi vision beyond infrastructure and lets developers encapsulate their various infrastructure policies as code, as well.

Another area is testing. Because Pulumi allows developers to use “real” programming languages, they can also use the same testing techniques they are used to from the application development world to test the code they use to build their underlying infrastructure and catch mistakes before they go into production. And with all of that, developers can use the same familiar tools to write the code that defines the infrastructure their applications will then run on.

“The underlying philosophy is taking our heritage of using the best of what we know and love about programming languages — and really applying that to the entire spectrum of challenges people face when it comes to cloud infrastructure, from development to infrastructure teams to security engineers, really helping the entire organization be more productive working together,” said Duffy. “I think that’s the key: moving from infrastructure provisioning to something that works for the whole organization.”

Duffy also highlighted that many of the company’s larger enterprise users are relying on Pulumi to encode their own internal architectures as code and then roll them out across the company.

“We still embrace what makes each of the clouds special. AWS, Azure, Google Cloud and Kubernetes,” Duffy said. “We’re not trying to be a PaaS that abstracts over all. We’re just helping to be the consistent workflow across the entire team to help people adopt the modern approaches.”


By Frederic Lardinois

Cockroach Labs announces $55M Series C to battle industry giants

Cockroach Labs, makers of CockroachDB, sits in a tough position in the database market. On one side, it has traditional database vendors like Oracle, and on the other there’s AWS and its family of databases. It takes some good technology and serious dollars to compete with those companies. Cockroach took care of the latter with a $55 million Series C round today.

The round was led by Altimeter Capital and Tiger Global along with existing investor GV. Other existing investors including Benchmark, Index Ventures, Redpoint Ventures, FirstMark Capital and Work-Bench also participated. Today’s investment brings the total raised to over $110 million, according to the company.

Spencer Kimball, co-founder and CEO, says the company is building a modern database to compete with these industry giants. “CockroachDB is architected from the ground up as a cloud native database. Fundamentally, what that means is that it’s distributed, not just across nodes in a single data center, which is really table stakes as the database gets bigger, but also across data centers to be resilient. It’s also distributed potentially across the planet in order to give a global customer base what feels like a local experience to keep the data near them,” Kimball explained.

Even as it offers a cloud product hosted on AWS, Cockroach also competes with several AWS database products, including Amazon Aurora, Redshift and DynamoDB. Much like MongoDB, which changed its open source licensing structure last year, Cockroach did as well, for many of the same reasons: both believed bigger players were taking advantage of the open source nature of their products to undermine their markets.

“If you’re trying to build a business around an open source product, you have to be careful that a much bigger player doesn’t come along and extract too much of the value out of the open source product that you’ve been building and maintaining,” Kimball explained.

As the company deals with all of these competitive pressures, it takes a fair bit of money to keep building technology that can beat much deeper-pocketed rivals. So far the company has been doing well, with Q1 revenue this year doubling all of last year’s. Kimball indicated that Q2 could double Q1, but he wants to keep that going, and that takes money.

“We need to accelerate that sales momentum and that’s usually what the Series C is about. Fundamentally, we have, I think, the most advanced capabilities in the market right now. Certainly we do if you look at the differentiator around just global capability. We nevertheless are competing with Oracle on one side, and Amazon on the other side. So a lot of this money is going towards product development too,” he said.

Cockroach Labs was founded in 2015, and is based in New York City.


By Ron Miller

Cockroach Labs launches CockroachDB as managed service

Cockroach Labs’ open source SQL database, CockroachDB, has been making inroads since it launched last year, but as any open source technology matures, in order to move deeper into markets it has to move beyond technical early adopters to a more generalized audience. To help achieve that, the company announced a new CockroachDB managed service today.

The service has been designed to be cloud-agnostic, and for starters it’s going to be available on Amazon Web Services and Google Cloud Platform. Cockroach, which launched in 2015, has always positioned itself as a modern cloud alternative to the likes of Oracle or even Amazon’s Aurora database.

As company co-founder and CEO Spencer Kimball told me in an interview in May, those companies involve too much vendor lock-in for his taste. His company launched as an open alternative to all of that. “You can migrate a Cockroach cluster from one cloud to another with no down time,” Kimball told TechCrunch in May.

He believes having that kind of flexibility is a huge advantage over what other vendors are offering, and today’s announcement carries that a step further. Instead of doing all the heavy lifting of setting up and managing a database and the related infrastructure, Cockroach is now offering CockroachDB as a service to handle all of that for you.

Kimball certainly recognizes that by offering his company’s product in this format, it will help grow his market. “We’ve been seeing significant migration activity away from Oracle, AWS Aurora, and Cassandra, and we’re now able to get our customers to market faster with Managed CockroachDB,” Kimball said in a statement.

The database itself offers the advantage of being ultra-resilient, meaning it stays up and running under most circumstances, and that’s a huge value proposition for any database product. It achieves that uptime through replication, so if one replica goes down, another can take over.
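
The toy sketch below illustrates the general idea in a few lines of Python: a write only counts once a majority of replicas acknowledge it, so losing a minority of replicas costs neither data nor availability. It is a deliberate simplification for illustration, not CockroachDB’s actual consensus machinery.

    # Toy model of majority-based replication; purely illustrative.
    class ReplicatedValue:
        def __init__(self, replica_count=3):
            self.replicas = [{"up": True, "value": None} for _ in range(replica_count)]
            self.majority = replica_count // 2 + 1

        def write(self, value):
            acks = 0
            for replica in self.replicas:
                if replica["up"]:
                    replica["value"] = value
                    acks += 1
            if acks < self.majority:
                raise RuntimeError("not enough live replicas to commit the write")

        def read(self):
            for replica in self.replicas:
                if replica["up"]:
                    return replica["value"]
            raise RuntimeError("no live replicas")

    store = ReplicatedValue()
    store.write("order #42")
    store.replicas[0]["up"] = False   # one replica fails...
    print(store.read())               # ...and reads still return "order #42"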

As an open source tool, it has been making money up until now by offering an enterprise version, which includes backup, support and other premium pieces. With today’s announcement, the company can get a more direct revenue stream from customers subscribing to the database service.

A year ago, the company announced version 1.0 of CockroachDB and $27 million in Series B financing, which was led by Redpoint with participation from Benchmark, GV, Index Ventures and FirstMark. They’ve obviously been putting that money to good use developing this new managed service.


By Ron Miller

Timescale is leading the next wave of NYC database tech

Data is the lifeblood of the modern corporation, yet acquiring, storing, processing, and analyzing it remains a remarkably challenging and expensive project. Every time data infrastructure finally catches up with the streams of information pouring in, another source and more demanding decision-making make the existing technology obsolete.

Few cities rely on data the same way as New York City, nor has any other city so shaped the technology that underpins our data infrastructure. Back in the 1960s, banks and accounting firms helped to drive much of the original computation industry with their massive finance applications. Today, that industry has been supplanted by finance and advertising, both of which need to make microsecond decisions based on petabyte datasets and complex statistical models.

Unsurprisingly, the city’s hunger for data has led to waves of database companies finding their home in the city.

As web applications became increasingly popular in the mid-aughts, SQL databases came under increasing strain to scale, while also proving to be inflexible in terms of their data schemas for the fast-moving startups they served. That problem spawned Manhattan-based MongoDB, whose flexible “NoSQL” schemas and horizontal scaling capabilities made it the default choice for a generation of startups. The company would go on to raise $311 million according to Crunchbase, and debuted late last year on NASDAQ, trading today with a market cap of $2 billion.

At the same time that the NoSQL movement was hitting its stride, academic researchers and entrepreneurs were exploring how to evolve SQL to scale like its NoSQL competitors, while retaining the kinds of features (joining tables, transactions) that make SQL so convenient for developers.

One leading company in this next generation of database tech is New York-based Cockroach Labs, which was founded in 2015 by a trio of former Square, Viewfinder, and Google engineers. The company has gone on to raise more than $50 million according to Crunchbase from a luminary list of investors including Peter Fenton at Benchmark, Mike Volpi at Index, and Satish Dharmaraj at Redpoint, along with GV and Sequoia.

While web applications have their own peculiar data needs, the rise of the internet of things (IoT) created a whole new set of data challenges. How can streams of data from potentially millions of devices be stored in an easily analyzable manner? How could companies build real-time systems to respond to that data?

Mike Freedman and Ajay Kulkarni saw that problem increasingly manifesting itself in 2015. The two had been roommates at MIT in the late 90s, and then went on separate paths into academia and industry respectively. Freedman went to Stanford for a PhD in computer science, and nearly joined the spinout of Nicira, which sold to VMware in 2012 for $1.26 billion. Kulkarni joked that “Mike made the financially wise decision of not joining them,” and Freedman eventually went to Princeton as an assistant professor, and was awarded tenure in 2013. Kulkarni founded and worked at a variety of startups including GroupMe, as well as receiving an MBA from MIT.

The two had startup dreams, and tried building an IoT platform. As they started building it though, they realized they would need a real-time database to process the data streams coming in from devices. “There are a lot of time series databases, [so] let’s grab one off the shelf, and then we evaluated a few,” Kulkarni explained. They realized what they needed was a hybrid of SQL and NoSQL, and nothing they could find offered the feature set they required to power their platform. That challenge became the problem to be solved, and Timescale was born.

In many ways, Timescale is how you build a database in 2018. Rather than starting de novo, the team decided to build on top of Postgres, a popular open-source SQL database. “By building on top of Postgres, we became the more reliable option,” Kulkarni said of their thinking. In addition, the company opted to make the database fully open source. “In this day and age, in order to get wide adoption, you have to be an open source database company,” he said.
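
To give a flavor of what building on top of Postgres means in practice, here’s a brief sketch using Python’s psycopg2 driver. The table schema and connection details are hypothetical; the hypertable call reflects TimescaleDB’s documented approach of turning a regular Postgres table into one partitioned automatically by time.

    # Sketch: TimescaleDB ships as a Postgres extension, so standard Postgres
    # tooling applies. The schema and connection string here are made up.
    import psycopg2

    conn = psycopg2.connect("postgresql://postgres@localhost:5432/metrics")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS sensor_readings (
                time        TIMESTAMPTZ      NOT NULL,
                device_id   TEXT             NOT NULL,
                temperature DOUBLE PRECISION
            )
        """)
        # Convert the plain table into a hypertable partitioned by time.
        cur.execute(
            "SELECT create_hypertable('sensor_readings', 'time', if_not_exists => TRUE)"
        )
    conn.close()

Because it is still Postgres underneath, everything else keeps working, from joins to the driver itself, which is the reliability argument Kulkarni makes above.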

Since the project’s first public git commit on October 18, 2016, the company’s database has received nearly 4,500 stars on GitHub, and it has raised $16.1 million from Benchmark and NEA.

Far more important though are their customers, who are definitely not the typical tech startup roster and include companies from oil and gas, mining, and telecommunications. “You don’t think of them as early adopters, but they have a need, and because we built it on top of Postgres, it integrates into an ecosystem that they know,” Freedman explained. Kulkarni continued, “And the problem they have is that they have all of this time series data, and it isn’t sitting in the corner, it is integrated with their core service.”

New York has been a strong home for the two founders. Freedman continues to be a professor at Princeton, where he has built a pipeline of potential grads for the company. More widely, Kulkarni said, “Some of the most experienced people in databases are in the financial industry, and that’s here.” That’s evident in one of their investors, hedge fund Two Sigma. “Two Sigma had been the only venture firm that we talked to that already had built out their own time series database,” Kulkarni noted.

The two also benefit from paying customers. “I think the Bay Area is great for open source adoption, but a lot of Bay Area companies, they develop their own database tech, or they use an open source project and never pay for it,” Kulkarni said. Being in New York has meant closer collaboration with customers, and ultimately more revenues.

Open source plus revenues. It’s the database way, and the next wave of innovation in the NYC enterprise infrastructure ecosystem.