Temporal raises $18.75M for its microservices orchestration platform

Temporal, a Seattle-based startup that is building an open-source, stateful microservices orchestration platform, today announced that it has raised an $18.75 million Series A round led by Sequoia Ventures. Existing investors Addition Ventures and Amplify Partners also joined, together with new investor Madrona Venture Group. With this, the company has now raised a total of $25.5 million.

Founded by Maxim Fateev (CEO) and Samar Abbas (CTO), who created the open-source Cadence orchestration engine during their time at Uber, Temporal aims to make it easier for developers and operators to run microservices in production. Current users include the likes of Box and Snap.

“Before microservices, coding applications was much simpler,” Temporal’s Fateev told me. “Resources were always located in the same place — the monolith server with a single DB — which meant developers didn’t have to codify a bunch of guessing about where things were. Microservices, on the other hand, are highly distributed, which means developers need to coordinate changes across a number of servers in different physical locations.”

Those servers could go down at any time, so engineers often spend a lot of time building custom reliability code to make calls to these services. As Fateev argues, that’s table stakes and doesn’t help these developers create something that builds real business value. Temporal gives these developers access to a set of what the team calls ‘reliability primitives’ that handle these use cases. “This means developers spend far more time writing differentiated code for their business and end up with a more reliable application than they could have built themselves,” said Fateev.

Temporal’s target use is virtually any developer who works with microservices — and wants them to be reliable. Because of this, the company’s tool — despite offering a read-only web-based user interface for administering and monitoring the system — isn’t the main focus here. The company also doesn’t have any plans to create a no-code/low-code workflow builder, Fateev tells me. However, since it is open-source, quite a few Temporal users build their own solutions on top of it.

The company itself plans to offer a cloud-based Temporal-as-a-Service offering soon. Interestingly, Fateev tells me that the team isn’t looking at offering enterprise support or licensing in the near future, though. “After spending a lot of time thinking it over, we decided a hosted offering was best for the open-source community and long term growth of the business,” he said.

Unsurprisingly, the company plans to use the new funding to improve its existing tool and build out this cloud service, with plans to launch it into general availability next year. At the same time, the team plans to say true to its open-source roots and host events and provide more resources to its community.

“Temporal enables Snapchat to focus on building the business logic of a robust asynchronous API system without requiring a complex state management infrastructure,” said Steven Sun, Snap Tech Lead, Staff Software Engineer. “This has improved the efficiency of launching our services for the Snapchat community.”


By Frederic Lardinois

Kong launches Kong Konnect, its cloud-native connectivity platform

At its (virtual) Kong Summit 2020, API platform Kong today announced the launch of Kong Konnect, its managed end-to-end cloud-native connectivity platform. The idea here is to give businesses a single service that allows them to manage the connectivity between their APIs and microservices and help developers and operators manage their workflows across Kong’s API Gateway, Kubernetes Ingress and King Service Mesh runtimes.

“It’s a universal control plane delivery cloud that’s consumption-based, where you can manage and orchestrate API gateway runtime, service mesh runtime, and Kubernetes Ingress controller runtime — and even Insomnia for design — all from one platform,” Kong CEO and co-founder Augusto ‘Aghi’ Marietti told me.

The new service is now in private beta and will become generally available in early 2021.

Image Credits: Kong

At the core of the platform is Kong’s new so-called ServiceHub, which provides that single pane of glass for managing a company’s services across the organization (and make them accessible across teams, too).

As Marietti noted, organizations can choose which runtime they want to use and purchase only those capabilities of the service that they currently need. The platform also includes built-in monitoring tools and supports any cloud, Kubernetes provider or on-premises environment, as long as they are Kubernetes-based.

The idea here, too, is to make all these tools accessible to developers and not just architects and operators. “I think that’s a key advantage, too,” Marietti said. “We are lowering the barrier by making a connectivity technology easier to be used by the 50 million developers — not just by the architects that were doing big grand plans at a large company.”

To do this, Konnect will be available as a self-service platform, reducing the friction of adopting the service.

Image Credits: Kong

This is also part of the company’s grander plan to go beyond its core API management services. Those services aren’t going away, but they are now part of the larger Kong platform. With its open-source Kong API Gateway, the company built the pathway to get to this point, but that’s a stable product now and it’s now clearly expanding beyond that with this cloud connectivity play that takes the company’s existing runtimes and combines them to provide a more comprehensive service.

“We have upgraded the vision of really becoming an end-to-end cloud connectivity company,” Marietti said. “Whether that’s API management or Kubernetes Ingress, […] or Kuma Service Mesh. It’s about connectivity problems. And so the company uplifted that solution to the enterprise.”

 


By Frederic Lardinois

Google makes converting VMs to containers easier with the GA of Migrate for Anthos

At its Cloud Next event in London, Google today announced a number of product updates around its managed Anthos platform, as well as Apigee and its Cloud Code tools for building modern applications that can then be deployed to Google Cloud or any Kubernetes cluster.

Anthos is one of the most important recent launches for Google, as it expands the company’s reach outside of Google Cloud and into its customers’ data centers and, increasingly, edge deployments. At today’s event, the company announced that it is taking Anthos Migrate out of beta and into general availability. The overall idea behind Migrate is that it allows enterprises to take their existing, VM-based workloads and convert them into containers. Those machines could come from on-prem environments, AWS, Azure or Google’s Compute Engine, and — once converted — can then run in Anthos GKE, the Kubernetes service that’s part of the platform.

“That really helps customers think about a leapfrog strategy, where they can maintain the existing VMs but benefit from the operational model of Kubernetes,” Google VP of product management Jennifer Lin told me. “So even though you may not get all of the benefits of a cloud-native container day one, what you do get is consistency in the operational paradigm.”

As for Anthos itself, Lin tells me that Google is seeing some good momentum. The company is highlighting a number of customers at today’s event, including Germany’s Kaeser Kompressoren and Turkey’s Denizbank.

Lin noted that a lot of financial institutions are interested in Anthos. “A lot of the need to do data-driven applications, that’s where Kubernetes has really hit that sweet spot because now you have a number of distributed datasets and you need to put a web or mobile front end on [them],” she explained. “You can’t do it as a monolithic app, you really do need to tap into a number of datasets — you need to do real-time analytics and then present it through a web or mobile front end. This really is a sweet spot for us.”

Also new today is the general availability of Cloud Code, Google’s set of extensions for IDEs like Visual Studio Code and IntelliJ that helps developers build, deploy and debug their cloud-native applications more quickly. The idea, here, of course, is to remove friction from building containers and deploying them to Kubernetes.

In addition, Apigee hybrid is now also generally available. This tool makes it easier for developers and operators to manage their APIs across hybrid and multi-cloud environments, a challenge that is becoming increasingly common for enterprises. This makes it easier to deploy Apigee’s API runtimes in hybrid environments and still get the benefits of Apigees monitoring and analytics tools in the cloud. Apigee hybrid, of course, can also be deployed to Anthos.


By Frederic Lardinois

Zoho launches Catalyst, a new developer platform with a focus on microservices

Zoho may be one of the most underrated tech companies. The 23-year-old company, which at this point offers more than 45 products, has never taken outside funding and has no ambition to go public, yet it’s highly profitable and runs its own data centers around the world. And today, it’s launching Catalyst, a cloud-based developer platform with a focus on microservices that it hopes can challenge those of many of its larger competitors.

The company already offered a low-code tool for building business apps. But Catalyst is different. Zoho isn’t following in the footsteps of Google or Amazon here and offering a relatively unopinionated platform for running virtual machines and containers. Indeed, it does nothing of the sort. The company is 100% betting on serverless as the next major technology for building enterprise apps and the whole platform has been tuned for this purpose.

Catalyst Zia AI

“Historically, when you look at cloud computing, when you look at any public clouds, they pretty much range from virtualizing your servers and renting our virtual servers all the way up the stack,” Raju Vegesna, Zoho’s chief evangelist, said when I asked him about this decision to bet on serverless. “But when you look at it from a developer’s point of view, you still have to deal with a lot of baggage. You still have to figure out the operating system, you still have to figure out the database. And then you have to scale and manage the updates. All of that has to be done at the application infrastructure level.” In recent years, though, said Vegesna, the focus has shifted to the app logic side, with databases and file servers being abstracted away. And that’s the trend Zoho is hoping to capitalize on with Catalyst.

What Catalyst does do is give advanced developers a platform to build, run and manage event-driven microservice-based applications that can, among other things, also tap into many of the tools that Zoho built for running its own applications, like a grammar checker for Zoho Writer, document previews for Zoho Drive or access to its Zia AI tools for OCR, sentiment analysis and predictions. The platform gives developers tools to orchestrate the various microservices, which obviously means it’ll make it easy to scale applications as needed, too. It integrates with existing CI/CD pipelines and IDEs.

Catalyst Functions

Catalyst also complies with the SOC Type II and ISO 27001 certifications, as well as GDPR. It also offers developers the ability to access data from Zoho’s own applications, as well as third-party tools, all backed by Zoho’s Unified Data Model, a relational datastore for server-side and client deployment.

“The infrastructure that we built over the last several years is now being exposed,” said Vegesna. He also stressed that Zoho is launching the complete platform in one go (though it will obviously add to it over time). “We are bringing everything together so that you can develop a mobile or web app from a single interface,” he said. “We are not just throwing 50 different disparate services out there.” At the same time, though, the company is also opting for a very deliberate approach here with its focus on serverless. That, Vegesna believes, will allow Zoho Catalyst to compete with its larger competitors.

It’s also worth noting that Zoho knows that it’s playing the long-game here, something it is familiar with, given that it launched its first product, Zoho Writer, back in 2005 before Google had launched its productivity suite.

Catalyst Homepage

 


By Frederic Lardinois

Kong acquires Insomnia, launches Kong Studio for API development

API and microservices platform Kong today announced that it has acquired Insomnia, a popular open-source tool for debugging APIs. The company, which also recently announced that it had raised a $43 million Series C round, has already put this acquisition to work by using it to build Kong Studio, a tool for designing, building and maintaining APIs for both REST and GraphQL endpoints.

As Kong CEO and co-founder Augusto Marietti told me, the company wants to expand its platform to cover the full service life cycle. So far, it has mostly focused on the runtime, but now it wants to enable developers to also design and test their services. “We looked at the space and Insomnia is the number one open source API testing platform,” he told me. “And we thought that by having Insomnia in our portfolio, we will get the pre-production part of things and on top of that, we’ll be able to build Kong Studio, which is kind of the other side of Insomnia that allows you to design APIs.”

For Oct. 2 Kong News Kong Service Control Platform

Insomnia launched in 2015, as a side project of its sole developer, Greg Schier. Schier quit his job in 2016 to focus on Insomnia full-time and then open-sourced it in 2017. Today, the project has 100 contributors and the tool is used by “hundreds of thousands of developers,” according to Schier.

Marietti says both the open-source project and the paid Insomnia Plus service will continue to operate as before.

In addition to Kong Studio and the Insomnia acquisition, the company also today launched the latest version of its Enterprise service, the aptly named Kong Enterprise 2020. New features here include support for REST, Kafka Streams and GraphQL. Kong also launched Kong Gateway 2.0 with additional GraphQL support and the ability to write plugins in Go.


By Frederic Lardinois

Microsoft makes a push for service mesh interoperability

Services meshes. They are the hot new thing in the cloud native computing world. At Kubecon, the bi-annual festival of all things cloud native, Microsoft today announced that it is teaming up with a number of companies in this space to create a generic service mesh interface. This will make it easier for developers to adopt the concept without locking them into a specific technology.

In a world where the number of network endpoints continues to increase as developers launch new micro-services, containers and other systems at a rapid clip, they are making the network smarter again by handling encryption, traffic management and other functions so that the actual applications don’t have to worry about that. With a number of competing service mesh technologies, though, including the likes of Istio and Linkerd, developers currently have to chose which one of these to support.

“I’m really thrilled to see that we were able to pull together a pretty broad consortium of folks from across the industry to help us drive some interoperability in the service mesh space,” Gabe Monroy, Microsoft’s lead product manager for containers and the former CTO of Deis, told me. “This is obviously hot technology — and for good reasons. The cloud-native ecosystem is driving the need for smarter networks and smarter pipes and service mesh technology provides answers.”

The partners here include Buoyant, HashiCorp, Solo.io, Red Hat, AspenMesh, Weaveworks, Docker, Rancher, Pivotal, Kinvolk and VMWare. That’s a pretty broad coalition, though it notably doesn’t include cloud heavyweights like Google, the company behind Istio, and AWS.

“In a rapidly evolving ecosystem, having a set of common standards is critical to preserving the best possible end-user experience,” said Idit Levine, founder and CEO of Solo.io. “This was the vision behind SuperGloo – to create an abstraction layer for consistency across different meshes, which led us to the release of Service Mesh Hub last week. We are excited to see service mesh adoption evolve into an industry level initiative with the SMI specification.”

For the time being, the interoperability features focus on traffic policy, telemetry and traffic management. Monroy argues that these are the most pressing problems right now. He also stressed that this common interface still allows the different service mesh tools to innovate and that developers can always work directly with their APIs when needed. He also stressed that the Service Mesh Interface (SMI), as this new specification is called, does not provide any of its own implementations of these features. It only defines a common set of APIs.

Currently, the most well-known service mesh is probably Istio, which Google, IBM and Lyft launched about two years ago. SMI may just bring a bit more competition to this market since it will allow developers to bet on the overall idea of a service mesh instead of a specific implementation.

In addition to SMI, Microsoft also today announced a couple of other updates around its cloud-native and Kubernetes services. It announced the first alpha of the Helm 3 package manager, for example, as well as the 1.0 release of its Kubernetes extension for Visual Studio Code and the general availability of its AKS virtual nodes, using the open source Virtual Kubelet project.

 


By Frederic Lardinois

Solo.io wants to bring order to service meshes with centralized management hub

As containers and microservices have proliferated, a new kind of tool called the service mesh has developed to help manage and understand interactions between services. While Kubernetes has emerged as the clear container orchestration tool of choice, there is much less certainty in the service mesh market. Solo.io announced a new open source tool called Service Mesh Hub today, designed to help companies manage multiple service meshes in a single interface.

It is early days for the service mesh concept, but there are already multiple offerings including Istio, Linkerd (pronounced Linker-Dee) and Convoy. While the market sorts itself it out, it requires a new set of tools, a management layer, so that developers and operations can monitor and understand what’s happening inside the various service meshes they are running.

Idit Levine, founder and CEO at Solo, say she formed the company because she saw an opportunity to develop a set of tooling for a nascent market. Since founding the company in 2017, it has developed several open source tools to fill that service mesh tool vacuum.

Levine says that she recognized that companies would be using multiple service meshes for multiple situations and that not every company would have the technical capabilities to manage this. That is where the idea for the Service Mesh Hub was born.

It’s a centralized place for companies to add the different service mesh tools they are using, understand the interactions happening within the mesh and add extensions to each one from a kind of extension app store. Solo wants to make adding these tools a simple matter of pointing and clicking. While it obviously still requires a certain level of knowledge about how these tools work, it removes some of the complexity around managing them.

Solo.io Service Mesh Hub

Solo.io Service Mesh Hub. Screenshot: Solo.io

“The reason we created this is because we believe service mesh is something big, and we want people to use it, and we feel it’s hard to adopt right now. We believe by creating that kind of framework or platform, it will make it easier for people to actually use it,” Levine told TechCrunch.

The vision is that eventually companies will be able to add extensions to the store for free, or even at some point for a fee, and it is through these paid extensions that the company will be able to make money. She recognized that some companies will be creating extensions for internal use only, and in those cases, they can add them to the hub and mark them as private and only that company can see them.

For every abstraction it seems, there is a new set of problems to solve. The service mesh is a response to the problem of managing multiple services. It solves three key issues, according to Levine. It allows a company to route the microservices, have visibility into them to see logs and metrics of the mesh and to provide security to manage which services can talk to each other.

Levine’s company is a response to the issues that have developed around understanding and managing the service meshes themselves. She says she doesn’t worry about a big company coming in and undermining her mission because she says that they are too focused on their own tools to create a set of uber-management tool like these (but that doesn’t mean the company wouldn’t be an attractive acquisition target).

So far, the company has taken over $13 million in funding, according to Crunchbase data.


By Ron Miller

Instana raises $30M for its application performance monitoring service

Instana, an application performance monitoring (APM) service with a focus on modern containerized services, today announced that it has raised a $30 million Series C funding round. The round was led by Meritech Capital, with participation from existing investor Accel. This brings Instana’s total funding to $57 million.

The company, which counts the likes of Audi, Edmunds.com, Yahoo Japan and Franklin American Mortgage as its customers, considers itself an APM 3.0 player. It argues that its solution is far lighter than those of older players like New Relic and AppDynamics (which sold to Cisco hours before it was supposed to go public). Those solutions, the company says, weren’t built for modern software organizations (though I’m sure they would dispute that).

What really makes Instana stand out is its ability to automatically discover and monitor the ever-changing infrastructure that makes up a modern application, especially when it comes to running containerized microservices. The service automatically catalogs all of the endpoints that make up a service’s infrastructure, and then monitors them. It’s also worth noting that the company says that it can offer far more granular metrics that its competitors.

Instana says that its annual sales grew 600 percent over the course of the last year, something that surely attracted this new investment.

“Monitoring containerized microservice applications has become a critical requirement for today’s digital enterprises,” said Meritech Capital’s Alex Kurland. “Instana is packed with industry veterans who understand the APM industry, as well as the paradigm shifts now occurring in agile software development. Meritech is excited to partner with Instana as they continue to disrupt one of the largest and most important markets with their automated APM experience.”

The company plans to use the new funding to fulfill the demand for its service and expand its product line.


By Frederic Lardinois

The Istio service mesh hits version 1.0

Istio, the service mesh for microservices from Google, IBM, Lyft, Red Hat and many other players in the open source community, launched version 1.0 of its tools today.

If you’re not into service meshes, that’s understandable. Few people are. But Istio is probably one of the most important new open source projects out there right now. It sits at the intersection of a number of industry trends like containers, microservices and serverless computing and makes it easier for enterprises to embrace them. Istio now has over 200 contributors and the code has seen over 4,000 check-ins since the launch of version 0.1.

Istio, at its core, handles the routing, load balancing, flow control and security needs of microservices. It sits on top of existing distributed applications and basically helps them talk to each other securely, while also providing logging, telemetry and the necessarypolicies that keep things under control (and secure). It also features support for canary releases, which allow developers to test updates with a few users before launching them to a wider audience, something that Google and other webscale companies have long done internally.

“In the area of microservices, things are moving so quickly,” Google product manager Jennifer Lin told me. “Andwith the successofKubernetesandthe abstraction aroundcontainer orchestration, Istio wasformed as an open source project to really take the next step in terms of a substrate formicroservice developmentas well as a path for VM-based workloads to move into more ofaservice management layer. So it’s really focused around the right level of abstractionsfor services and creatingaconsistent environment for managing that.”

Even before the 1.0 release, a number of companies already adopted Istio in production, including the likes of eBay and Auto Trader UK. Lin argues that this is a sign that Istio solves a problem that a lot of businesses are facing today as they adopt microservices. “A number of more sophisticated customers tried to build their own service management layer and while we hadn’t yet declared1.0, we hard a number of customers — including a surprising number of large enterprise customer– say, ‘you know, even though you’re not 1.0, I’m very comfortable putting this in production because what I’m comparing it to is much more raw.’”

IBM Fellow and VP of Cloud Jason McGee agrees with this and notes that “our mission sinceIstio’s launch has been to enable everyone to succeed with microservices, especially in the enterprise. This is why we’ve focused the community around improving security and scale, and heavily leaned our contributions on what we’ve learned from building agile cloud architectures for companies of all sizes.”

A lot of the large cloud players now support Istio directly, too. IBM supports it on top of its Kubernetes Service, for example, and Google even announced a managed Istio service for its Google Cloud users, as well as some additional open source tooling for serverless applications built on top of Kubernetes and Istio.

Two names missing from today’s party are Microsoft and Amazon. I think that’ll change over time, though, assuming the project keeps its momentum.

Istioalso isn’t part of any major open source foundation yet. The Cloud Native Computing Foundation (CNCF), the home of Kubernetes, is backing linkerd, a project that isn’t all that dissimilar from Istio. Once a 1.0 release of these kinds of projects rolls around, the maintainers often start looking for a foundation that can shepherd the development of the project over time. I’m guessing its only a matter of time before we hear more about where Istio will land.


By Frederic Lardinois

Google Cloud goes all-in on hybrid with its new Cloud Services Platform

The cloud isn’t right for every business, be that because of latency constraints at the edge, regulatory requirements or because it’s simply cheaper to own and operate their own data centers for their specific workloads. Given this, it’s maybe no surprise that the vast majority of enterprises today use both public and private clouds in parallel. That’s something Microsoft has long been betting on as part of its strategy for its Azure cloud, and Google, too, is now taking a number of steps in this direction.

With the open-source Kubernetes project, Google launched one of the fundamental building blocks that make running and managing applications in hybrid environments easier for large enterprises. What Google hadn’t done until today, though, is launch a comprehensive solution that includes all of the necessary parts for this kind of deployment. With its new Cloud Services Platform, though, the company is now offering businesses an integrated set of cloud services that can be deployed on both the Google Cloud Platform and in on-premise environments.

As Google Cloud engineering director Chen Goldberg noted in a press briefing ahead of today’s announcement, many businesses also simply want to be able to manage their own workloads on-premise but still be able to access new machine learning tools in the cloud, for example. “Today, to achieve this, use cases involve a compromise between cost, consistency, control and flexibility,” she said. “And this all negatively impacts the desired result.”

Goldberg stressed that the idea behind the Cloud Services Platform is to meet businesses where they are and then allow them to modernize their stack at their own pace. But she also noted that businesses want more than just the ability to move workloads between environments. “Portability isn’t enough,” she said. “Users want consistent experiences so that they can train their team once and run anywhere — and have a single playbook for all environments.”

The two services at the core of this new offering are the Kubernetes container orchestration tool and Istio, a relatively new but quickly growing tool for connecting, managing and securing microservices. Istio is about to hit its 1.0 release.

We’re not simply talking about a collection of open-source tools here. The core of the Cloud Services Platform, Goldberg noted, is “custom configured and battle-tested for enterprises by Google.” In addition, it is deeply integrated with other services in the Google Cloud, including the company’s machine learning tools.

GKE On-Prem

Among these new custom-configured tools are a number of new offerings, which are all part of the larger platform. Maybe the most interesting of these is GKE On-Prem. GKE, the Google Kubernetes Engine, is the core Google Cloud service for managing containers in the cloud. And now Google is essentially bringing this service to the enterprise data center, too.

The service includes access to all of the usual features of GKE in the cloud, including the ability to register and manage clusters and monitor them with Stackdriver, as well as identity and access management. It also includes a direct line to the GCP Marketplace, which recently launched support for Kubernetes-based applications.

Using the GCP Console, enterprises can manage both their on-premise and GKE clusters without having to switch between different environments. GKE on-prem connects seamlessly to a Google Cloud Platform environment and looks and behaves exactly like the cloud version.

Enterprise users also can get access to professional services and enterprise-grade support for help with managing the service.

“Google Cloud is the first and only major cloud vendor to deliver managed Kubernetes on-prem,” Goldberg argued.

GKE Policy Management

Related to this, Google also today announced GKE Policy Management, which is meant to provide Kubernetes administrators with a single tool for managing all of their security policies across clusters. It’s agnostic as to where the Kubernetes cluster is running, but you can use it to port your existing Google Cloud identity-based policies to these clusters. This new feature will soon launch in alpha.

Managed Istio

The other major new service Google is launching is Managed Istio (together with Apigee API Management for Istio) to help businesses manage and secure their microservices. The open source Istio service mesh gives admins and operators the tools to manage these services and, with this new managed offering, Google is taking the core of Istio and making it available as a managed service for GKE users.

With this, users get access to Istio’s service discovery mechanisms and its traffic management tools for load balancing and routing traffic to containers and VMs, as well as its tools for getting telemetry back from the workloads that run on these clusters.

In addition to these three main new services, Google is also launching a couple of auxiliary tools around GKE and the serverless computing paradigm today. The first of these is the GKE serverless add-on, which makes it easy to run serverless workloads on GKE with a single-step deploy process. This, Google says, will allow developers to go from source code to container “instantaneously.” This tool is currently available as a preview and Google is making parts of this technology available under the umbrella of its new native open source components. These are the same components that make the serverless add-on possible.

And to wrap it all up, Google also today mentioned a new fully managed continuous integration and delivery service, Google Cloud Build, though the details around this service remain under wraps.

So there you have it. By themselves, all of those announcements may seem a bit esoteric. As a whole, though, they show how Google’s bet on Kubernetes is starting to pay off. As businesses opt for containers to deploy and run their new workloads (and maybe even bring older applications into the cloud), GKE has put Google Cloud on the map to run them in a hosted environment. Now, it makes sense for Google to extend this to its users’ data centers, too. With managed Kubernetes from large and small companies like SUSE, Platform 9, containership is starting to become a big business. It’s no surprise the company that started it all wants to get a piece of this pie, too.


By Frederic Lardinois

How Yelp (mostly) shut down its own data centers and moved to AWS

Back in 2013, Yelp was a 9-year old company built on a set of internal systems. It was coming to the realization that running its own data centers might not be the most efficient way to run a business that was continuing to scale rapidly. At the same time, the company understood that the tech world had changed dramatically from 2004 when it launched and it needed to transform the underlying technology to a more modern approach.

That’s a lot to take on in one bite, but it wasn’t something that happened willy-nilly or overnight says Jason Yellen, SVP of engineering at Yelp . The vast majority of the company’s data was being processed in a massive Python repository that was getting bigger all the time. The conversation about shifting to a microservices architecture began in 2012.

The company was also running the massive Yelp application inside its own datacenters, and as it grew it was increasingly becoming limited by long lead times required to procure and get new hardware online. It saw this was an unsustainable situation over the long-term and began a process of transforming from running a huge monolithic application on-premises to one built on microservices running in the cloud. It was a quite a journey.

The data center conundrum

Yellen described the classic scenario of a company that could benefit from a shift to the cloud. Yelp had a small operations team dedicated to setting up new machines. When engineering anticipated a new resource requirement, they had to give the operations team sufficient lead time to order new servers and get them up and running, certainly not the most efficient way to deal with a resource problem, and one that would have been easily solved by the cloud.

“We kept running into a bottleneck, I was running a chunk of the search team [at the time] and I had to project capacity out to 6-9 months. Then it would take a few months to order machines and another few months to set them up,” Yellen explained. He emphasized that the team charged with getting these machines going was working hard, but there were too few people and too many demands and something had to give.

“We were on this cusp. We could have scaled up that team dramatically and gotten [better] at building data centers and buying servers and doing that really fast, but we were hearing a lot of AWS and the advantages there,” Yellen explained.

To the cloud!

They looked at the cloud market landscape in 2013 and AWS was the clear leader technologically. That meant moving some part of their operations to EC2. Unfortunately, that exposed a new problem: how to manage this new infrastructure in the cloud. This was before the notion of cloud-native computing even existed. There was no Kubernetes. Sure, Google was operating in a cloud-native fashion in-house, but it was not really an option for most companies without a huge team of engineers.

Yelp needed to explore new ways of managing operations in a hybrid cloud environment where some of the applications and data lived in the cloud and some lived in their data center. It was not an easy problem to solve in 2013 and Yelp had to be creative to make it work.

That meant remaining with one foot in the public cloud and the other in a private data center. One tool that helped ease the transition was AWS Direct Connect, which was released the prior year and enabled Yelp to directly connect from their data center to the cloud.

Laying the groundwork

About this time, as they were figuring out how AWS works, another revolutionary technological change was occurring when Docker emerged and began mainstreaming the notion of containerization. “That’s another thing that’s been revolutionary. We could suddenly decouple the context of the running program from the machine it’s running on. Docker gives you this container, and is much lighter weight than virtualization and running full operating systems on a machine,” Yellen explained.

Another thing that was happening was the emergence of the open source data center operating system called Mesos, which offered a way to treat the data center as a single pool of resources. They could apply this notion to wherever the data and applications lived. Mesos also offered a container orchestration tool called Marathon in the days before Kubernetes emerged as a popular way of dealing with this same issue.

“We liked Mesos as a resource allocation framework. It abstracted away the fleet of machines. Mesos abstracts many machines and controls programs across them. Marathon holds guarantees about what containers are running where. We could stitch it all together into this clear opinionated interface,” he said.

Pulling it all together

While all this was happening, Yelp began exploring how to move to the cloud and use a Platform as a Service approach to the software layer. The problem was at the time they started, there wasn’t really any viable way to do this. In the buy versus build decision making that goes on in large transformations like this one, they felt they had little choice but to build that platform layer themselves.

In late 2013 they began to pull together the idea of building this platform on top of Mesos and Docker, giving it the name PaaSTA, an internal joke that stood for Platform as a Service, Totally Awesome. It became simply known as Pasta.

Photo: David Silverman/Getty Images

The project had the ambitious goal of making their infrastructure work as a single fabric, in a cloud-native fashion before most anyone outside of Google was using that term. Pasta developed slowly with the first developer piece coming online in August 2014 and the first  production service later that year in December. The company actually open sourced the technology the following year.

“Pasta gave us the interface between the applications and development teams. Operations had to make sure Pasta is up and running, while Development was responsible for implementing containers that implemented the interface,” Yellen said.

Moving to deeper into the public cloud

While Yelp was busy building these internal systems, AWS wasn’t sitting still. It was also improving its offerings with new instance types, new functionality and better APIs and tooling. Yellen reports this helped immensely as Yelp began a more complete move to the cloud.

He says there were a couple of tipping points as they moved more and more of the application to AWS — including eventually, the master database. This all happened in more recent years as they understood better how to use Pasta to control the processes wherever they lived. What’s more, he said that adoption of other AWS services was now possible due to tighter integration between the in-house data centers and AWS.

Photo: erhui1979/Getty Images

The first tipping point came around 2016 as all new services were configured for the cloud. He said they began to get much better at managing applications and infrastructure in AWS and their thinking shifted from how to migrate to AWS to how to operate and manage it.

Perhaps the biggest step in this years-long transformation came last summer when Yelp moved its master database from its own data center to AWS. “This was the last thing we needed to move over. Otherwise it’s clean up. As of 2018, we are serving zero production traffic through physical data centers,” he said. While they still have two data centers, they are getting to the point, they have the minimum hardware required to run the network backbone.

Yellen said they went from two weeks to a month to get a service up and running before this was all in place to just a couple of minutes. He says any loss of control by moving to the cloud has been easily offset by the convenience of using cloud infrastructure. “We get to focus on the things where we add value,” he said — and that’s the goal of every company.


By Ron Miller