Couchbase raises $105M Series G funding round

Couchbase, the Santa Clara-based company behind the eponymous NoSQL cloud database service, today announced that it has raised a $105 million all-equity Series G round “to expand product development and global go-to-market capabilities.”

The oversubscribed round was led by GPI Capital, with participation from existing investors Accel, Sorenson Capital, North Bridge Venture Partners, Glynn Capital, Adams Street Partners and Mayfield. With this, the company has now raised a total of $251 million, according to Crunchbase.

Back in 2016, Couchbase raised a $30 million down round, which at the time was meant to be the company’s last round before an IPO. That IPO hasn’t materialized, but the company continues to grow, with 30 percent of the Fortune 100 now using its database. Couchbase also today announced that, over the course of the last fiscal year, it saw 70 percent total contract value growth, more than 50 percent new business growth and over 35 percent growth in average subscription deal size. In total, Couchbase said today, it is now seeing almost $100 million in committed annual recurring revenue.

“To be competitive today, enterprises must transform digitally, and use technology to get closer to their customers and improve the productivity of their workforces,” said Couchbase President and CEO Matt Cain in today’s announcement. “To do so, they require a cloud-native database built specifically to support modern web, mobile and IoT applications.  Application developers and enterprise architects rely on Couchbase to enable agile application development on a platform that performs at scale, from the public cloud to the edge, and provides operational simplicity and reliability. More and more, the largest companies in the world truly run their businesses on Couchbase, architecting their most business-critical applications on our platform.”

The company is playing in a large but competitive market, with the likes of MongoDB, DataStax and all the major cloud vendors vying for similar customers in the NoSQL space. One feature that has always made Couchbase stand out is Couchbase Mobile, which extends the service to the edge. Like some of its competitors, the company has also recently placed its bets on the Kubernetes container orchestration system, with the launch of its Autonomous Operator for Kubernetes 2.0, for example. More importantly, though, the company also introduced its fully managed Couchbase Cloud Database-as-a-Service in February, which allows businesses to run the database within their own virtual private cloud on public clouds like AWS and Microsoft Azure.

“We are excited to partner with Couchbase and view Couchbase Server’s highly performant, distributed architecture as purpose-built to support mission-critical use cases at scale,” said Alex Migon, a partner at GPI Capital and a new member of the company’s board of directors. “Couchbase has developed a truly enterprise-grade product, with leading support for cutting-edge application development and deployment needs.  We are thrilled to contribute to the next stage of the company’s growth.”

The company tells me that it plans to use the new funding to continue its “accelerated trajectory with investment in each of their three core pillars: sustained differentiation, profitable growth, and world class teams.” Of course, Couchbase will also continue to build new features for its NoSQL server, mobile platform and Couchbase Cloud — and in addition, the company will continue to expand geographically to serve its global customer operations.


By Frederic Lardinois

Microsoft launches Azure Synapse Link to help enterprises get faster insights from their data

At its Build developer conference, Microsoft today announced Azure Synapse Link, a new enterprise service that allows businesses to analyze their data faster and more efficiently, using an approach that’s generally called ‘hybrid transaction/analytical processing’ (HTAP). That’s a mouthful, but it essentially means enterprises can run analytical and transactional workloads against a single system. Traditionally, enterprises had to choose between building a single system for both, which was often highly over-provisioned, and maintaining separate systems for transactional and analytics workloads.

Last year, at its Ignite conference, Microsoft announced Azure Synapse Analytics, an analytics service that combines analytics and data warehousing to create what the company calls “the next evolution of Azure SQL Data Warehouse.” Synapse Analytics brings together data from Microsoft’s services and those from its partners and makes it easier to analyze.

“One of the key things, as we work with our customers on their digital transformation journey, there is an aspect of being data-driven, of being insights-driven as a culture, and a key part of that really is that once you decide there is some amount of information or insights that you need, how quickly are you able to get to that? For us, time to insight and a secondary element, which is the cost it takes, the effort it takes to build these pipelines and maintain them with an end-to-end analytics solution, was a key metric we have been observing for multiple years from our largest enterprise customers,” said Rohan Kumar, Microsoft’s corporate VP for Azure Data.

Synapse Link takes the work Microsoft did on Synapse Analytics a step further by removing the barriers between Azure’s operational databases and Synapse Analytics, so enterprises can immediately get value from the data in those databases without going through a data warehouse first.

“What we are announcing with Synapse Link is the next major step in the same vision that we had around reducing the time to insight,” explained Kumar. “And in this particular case, a long-standing barrier that exists today between operational databases and analytics systems is these complex ETL (extract, transform, load) pipelines that need to be set up just so you can do basic operational reporting or where, in a very transactionally consistent way, you need to move data from your operational system to the analytics system, because you don’t want to impact the performance of the operational system in any way because that’s typically dealing with, depending on the system, millions of transactions per second.”

ETL pipelines, Kumar argued, are typically expensive and hard to build and maintain, yet enterprises are now building new apps — and maybe even line of business mobile apps — where any action that consumers take and that is registered in the operational database is immediately available for predictive analytics, for example.

From the user’s perspective, enabling this takes only a single click to link the two, and it removes the need to manage additional data pipelines or database resources. That, Kumar said, was always the main goal for Synapse Link. “With a single click, you should be able to enable real-time analytics on your operational data in ways that don’t have any impact on your operational systems, so you’re not using the compute part of your operational system to do the query, you actually have to transform the data into a columnar format, which is more adaptable for analytics, and that’s really what we achieved with Synapse Link.”

Because traditional HTAP systems on-premises typically share their compute resources with the operational database, those systems never quite took off, Kumar argued. In the cloud, with Synapse Link, though, that impact doesn’t exist because you’re dealing with two separate systems. Now, once a transaction gets committed to the operational database, the Synapse Link system transforms the data into a columnar format that is more optimized for the analytics system — and it does so in real time.

For now, Synapse Link is only available in conjunction with Microsoft’s Cosmos DB database. As Kumar told me, that’s because that’s where the company saw the highest demand for this kind of service, but you can expect the company to add support for Azure SQL, Azure Database for PostgreSQL and Azure Database for MySQL in the future.
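To make that concrete, here is a minimal sketch of what enabling the columnar analytical store on a Cosmos DB container looks like from Python. The endpoint, key and names are placeholders, and the analytical_storage_ttl keyword is an assumption about the azure-cosmos SDK rather than anything Microsoft shared for this article.

```python
# Hedged sketch: turn on Cosmos DB's analytical (columnar) store for a
# container, which is what Synapse Link reads from. Endpoint, key and
# names are placeholders; the analytical_storage_ttl keyword is assumed
# to exist in the azure-cosmos SDK version you have installed.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-account-key>")
db = client.create_database_if_not_exists("retail")

# -1 keeps the columnar copy of every item indefinitely; transactional
# writes keep hitting the row store, so OLTP compute is left alone.
orders = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    analytical_storage_ttl=-1,
)
```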


By Frederic Lardinois

Microsoft partners with Redis Labs to improve its Azure Cache for Redis

For a few years now, Microsoft has offered Azure Cache for Redis, a fully managed caching solution built on top of the open-source Redis project. Today, it is expanding this service by adding Redis Enterprise, Redis Labs’ commercial offering, to its platform. It’s doing so in partnership with Redis Labs, and while Microsoft will offer some basic support for the service, Redis Labs will handle most of the software support itself.

Julia Liuson, Microsoft’s corporate VP of its developer tools division, told me that the company wants to be seen as a partner to open-source companies like Redis Labs, which was among the first companies to change its license to prevent cloud vendors from commercializing and repackaging its free code without contributing back to the community. Last year, Redis Labs partnered with Google Cloud to bring its own fully managed service to Google’s platform, so maybe it’s no surprise that we are now seeing Microsoft make a similar move.

Liuson tells me that with this new tier for Azure Cache for Redis, users will get a single bill and native Azure management, as well as the option to deploy natively on SSD flash storage. The native Azure integration should also make it easier for developers on Azure to integrate Redis Enterprise into their applications.

It’s also worth noting that Microsoft will support Redis Labs’ own Redis modules, including RediSearch, a Redis-powered search engine, as well as RedisBloom and RedisTimeSeries, which provide support for new datatypes in Redis.
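As a rough illustration of what those modules add, the sketch below uses the standard redis-py client to call RedisTimeSeries commands. The hostname and key are placeholders, and it assumes the cache endpoint you connect to actually has the module loaded, as the Enterprise tier described here would.

```python
# Hedged sketch: exercising the RedisTimeSeries module via plain redis-py.
# Host, port and password are placeholders for an Azure Cache for Redis
# endpoint that has RedisTimeSeries enabled.
import redis

r = redis.Redis(host="<your-cache>.redis.cache.windows.net",
                port=6380, password="<access-key>", ssl=True)

# TS.ADD appends a sample (creating the series on first use); "*" tells
# the server to use its own current timestamp.
r.execute_command("TS.ADD", "temps:store-42", "*", 21.5)

# TS.RANGE "-" "+" reads every sample back.
samples = r.execute_command("TS.RANGE", "temps:store-42", "-", "+")
print(samples)  # e.g. [[<timestamp-ms>, b'21.5']]
```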

“For years, developers have utilized the speed and throughput of Redis to produce unbeatable responsiveness and scale in their applications,” says Liuson. “We’ve seen tremendous adoption of Azure Cache for Redis, our managed solution built on open source Redis, as Azure customers have leveraged Redis performance as a distributed cache, session store, and message broker. The incorporation of the Redis Labs Redis Enterprise technology extends the range of use cases in which developers can utilize Redis, while providing enhanced operational resiliency and security.”


By Frederic Lardinois

Fishtown Analytics raises $12.9M Series A for its open-source analytics engineering tool

Philadelphia-based Fishtown Analytics, the company behind the popular open-source data engineering tool dbt, today announced that it has raised a $12.9 million Series A round led by Andreessen Horowitz, with the firm’s general partner Martin Casado joining the company’s board.

“I wrote this blog post in early 2016, essentially saying that analysts needed to work in a fundamentally different way,” Fishtown founder and CEO Tristan Handy told me, when I asked him about how the product came to be. “They needed to work in a way that much more closely mirrored the way the software engineers work and software engineers have been figuring this shit out for years and data analysts are still like sending each other Microsoft Excel docs over email.”

The dbt open-source project forms the basis of this. It allows anyone who can write SQL queries to transform data and then load it into their preferred analytics tools. As such, it sits in-between data warehouses and the tools that load data into them on one end, and specialized analytics tools on the other.

As Casado noted when I talked to him about the investment, data warehouses have now made it affordable for businesses to store all of their data before it is transformed. So what was traditionally “extract, transform, load” (ETL) has now become “extract, load, transform” (ELT). Andreessen Horowitz is already invested in Fivetran, which helps businesses move their data into their warehouses, so it makes sense for the firm to also tackle the other side of this business.

“Dbt is, as far as we can tell, the leading community for transformation and it’s a company we’ve been tracking for at least a year,” Casado said. He also argued that data analysts — unlike data scientists — are not really catered to as a group.

Before this round, Fishtown hadn’t raised much money, aside from a small SAFE round from Amplify, even though it has been around for a few years now.

But Handy argued that the company needed this time to prove that it was on to something and build a community. That community now consists of more than 1,700 companies that use the dbt project in some form and over 5,000 people in the dbt Slack community. Fishtown also now has over 250 dbt Cloud customers and the company signed up a number of big enterprise clients earlier this year. With that, the company needed to raise money to expand and also better service its current list of customers.

“We live in Philadelphia. The cost of living is low here and none of us really care to make a quadro-billion dollars, but we do want to answer the question of how do we best serve the community,” Handy said. “And for the first time, in the early part of the year, we were like, holy shit, we can’t keep up with all of the stuff that people need from us.”

The company plans to expand the team from 25 to 50 employees in 2020, and with those new hires, it plans to improve and expand the product, especially its IDE for data analysts, which Handy admitted could use a bit more polish.


By Frederic Lardinois

Collibra nabs another $112.5M at a $2.3B valuation for its big data management platform

GDPR and other data protection and privacy regulations — as well as a significant (and growing) number of data breaches and exposés of companies’ privacy policies — have put a spotlight not just on the vast troves of data that businesses and other organizations hold on us, but also on how they handle it. Today, one of the companies helping them cope with that data trove in a better and legal way is announcing a huge round of funding to continue that work. Collibra, which provides tools to manage, warehouse, store and analyse data troves, is today announcing that it has raised $112.5 million in funding, at a post-money valuation of $2.3 billion.

The funding — a Series F from the looks of it — represents a big bump for the startup, which last year raised $100 million at a valuation of just over $1 billion. This latest round was co-led by ICONIQ Capital, Index Ventures, and Durable Capital Partners LP, with previous investors CapitalG (Google’s growth fund), Battery Ventures, and Dawn Capital also participating.

Collibra, originally a spin-out from Vrije Universiteit in Brussels, Belgium, today works with some 450 enterprises and other large organizations — customers include Adobe, Verizon (which owns TechCrunch), insurer AXA, and a number of healthcare providers. Its products cover a range of services focused around company data, including tools to help customers comply with local data protection policies, store it securely, and run analytics and more.

These are all tools that have long had a place in enterprise big data IT, but they have become increasingly used and in demand, both as data policies have expanded and as the prospects of what can be discovered through big data analytics have become more advanced. With that growth, many companies have realised that they are not in a position to use and store their data in the best possible way, and that is where companies like Collibra step in.

“Most large organizations are in data chaos,” Felix Van de Maele, co-founder and CEO, previously told us. “We help them understand what data they have, where they store it and [understand] whether they are allowed to use it.”

As you would expect with a big IT trend, Collibra is not the only company chasing this opportunity. Competitors include Informatica, IBM, Talend and Egnyte, among a number of others, but the market position of Collibra, and its advanced technology, is what has continued to impress investors.

“Durable Capital Partners invests in innovative companies that have significant potential to shape growing industries and build larger companies,” said Henry Ellenbogen, founder and chief investment officer for Durable Capital Partners LP, in a statement (Ellenbogen is formerly an investment manager at T. Rowe Price, and this is his first investment in Collibra under Durable). “We believe Collibra is a leader in the Data Intelligence category, a space that could have a tremendous impact on global business operations and a space that we expect will continue to grow as data becomes an increasingly critical asset.”

“We have a high degree of conviction in Collibra and the importance of the company’s mission to help organizations benefit from their data,” added Matt Jacobson, general partner at ICONIQ Capital and Collibra board member, in his own statement. “There is an increasing urgency for enterprises to harness their data for strategic business decisions. Collibra empowers organizations to use their data to make critical business decisions, especially in uncertain business environments.”


By Ingrid Lunden

Datastax acquires The Last Pickle

Data management company Datastax, one of the largest contributors to the Apache Cassandra project, today announced that it has acquired The Last Pickle (and no, I don’t know what’s up with that name either), a New Zealand-based Cassandra consulting and services firm that’s behind a number of popular open-source tools for the distributed NoSQL database.

As Datastax Chief Strategy Officer Sam Ramji, who you may remember from his recent tenure at Apigee, the Cloud Foundry Foundation, Google and Autodesk, told me, The Last Pickle is one of the premier Apache Cassandra consulting and services companies. The team there has been building Cassandra-based open source solutions for the likes of Spotify, T-Mobile and AT&T since it was founded back in 2012. And while The Last Pickle is based in New Zealand, the company has engineers all over the world who do the heavy lifting and help these companies successfully implement the Cassandra database technology.

It’s worth mentioning that Last Pickle CEO Aaron Morton first discovered Cassandra when he worked for WETA Digital on the special effects for Avatar, where the team used Cassandra to allow the VFX artists to store their data.

“There’s two parts to what they do,” Ramji explained. “One is the very visible consulting, which has led them to become world experts in the operation of Cassandra. So as we automate Cassandra and as we improve the operability of the project with enterprises, their embodied wisdom about how to operate and scale Apache Cassandra is as good as it gets — the best in the world.” And The Last Pickle’s experience in building systems with tens of thousands of nodes — and the challenges that its customers face — is something Datastax can then offer to its customers as well.

And Datastax, of course, also plans to productize The Last Pickle’s open-source tools like the automated repair tool Reaper and the Medusa backup and restore system.

As both Ramji and Datastax VP of Engineering Josh McKenzie stressed, Cassandra has seen a lot of commercial development in recent years, with the likes of AWS now offering a managed Cassandra service, for example, but there wasn’t all that much hype around the project anymore. But they argue that’s a good thing. Now that it is over ten years old, Cassandra has been battle-hardened. For the last ten years, Ramji argues, the industry tried to figure out what the de facto standard for scale-out computing should be. By 2019, it became clear that Kubernetes was the answer to that.

“This next decade is about what is the de facto standard for scale-out data? We think that’s got certain affordances, certain structural needs and we think that the decades that Cassandra has spent getting hardened puts it in a position to be data for that wave.”

McKenzie also noted that Cassandra’s built-in features, like support for multiple data centers and geo-replication, rolling updates and live scaling, as well as wide support across programming languages, give it a number of advantages over competing databases.
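To illustrate the multi-datacenter point, here is a small sketch using the open-source Python cassandra-driver. The contact point and datacenter names are placeholders; in a real cluster, the datacenter names in a NetworkTopologyStrategy have to match whatever the cluster is actually configured with.

```python
# Sketch: geo-replication in Cassandra is declared per keyspace.
# The contact point and datacenter names below are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])   # any reachable node in the cluster
session = cluster.connect()

# NetworkTopologyStrategy keeps three replicas in each named datacenter,
# which is what lets the cluster stay available if a whole DC goes dark.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS orders
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_east': 3,
        'eu_west': 3
    }
""")
```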

“It’s easy to forget how much Cassandra gives you for free just based on its architecture,” he said. “Losing the power in an entire datacenter, upgrading the version of the database, hardware failing every day? No problem. The cluster is 100 percent always still up and available. The tooling and expertise of The Last Pickle really help bring all this distributed and resilient power into the hands of the masses.”

The two companies did not disclose the price of the acquisition.


By Frederic Lardinois

Satori Cyber raises $5.25M to help businesses protect their data flows

The amount of data that most companies now store — and the places they store it — continues to increase rapidly. With that, the risk of the wrong people managing to get access to this data also increases, so it’s no surprise that we’re now seeing a number of startups that focus on protecting this data and how it flows between clouds and on-premises servers. Satori Cyber, which focuses on data protection and governance, today announced that it has raised a $5.25 million seed round led by YL Ventures.

“We believe in the transformative power of data to drive innovation and competitive advantage for businesses,” the company says. “We are also aware of the security, privacy and operational challenges data-driven organizations face in their journey to enable broad and optimized data access for their teams, partners and customers. This is especially true for companies leveraging cloud data technologies.”

Satori is officially coming out of stealth mode today and launching its first product, the Satori Cyber Secure Data Access Cloud. This service gives enterprises the tools to set access controls for their data, but maybe just as importantly, it also offers these companies and their security teams visibility into their data flows across cloud and hybrid environments. The company argues that data is “a moving target” because it’s often hard to know how exactly it moves between services and who actually has access to it. With most companies now splitting their data between lots of different data stores, that problem only becomes more prevalent over time and continuous visibility becomes harder to come by.

“Until now, security teams have relied on a combination of highly segregated and restrictive data access and one-off technology-specific access controls within each data store, which has only slowed enterprises down,” said Satori Cyber CEO and Co-founder Eldad Chai. “The Satori Cyber platform streamlines this process, accelerates data access and provides a holistic view across all organizational data flows, data stores and access, as well as granular access controls, to accelerate an organization’s data strategy without those constraints.”

Both co-founders previously spent nine years building security solutions at Imperva and Incapsula (which Imperva acquired in 2014). Based on this experience, they understood that onboarding had to be as easy as possible and that operations would have to be transparent to the users. “We built Satori’s Secure Data Access Cloud with that in mind, and have designed the onboarding process to be just as quick, easy and painless. On-boarding Satori involves a simple host name change and does not require any changes in how your organizational data is accessed or used,” they explain.
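Taken at face value, that onboarding model looks roughly like the sketch below: the application keeps its usual driver, credentials and queries, and only the hostname it dials changes so traffic flows through the governed endpoint. Everything here, including the proxy hostname, is hypothetical and only meant to illustrate the quote above.

```python
# Hypothetical illustration of "onboarding is a host name change":
# the client keeps its normal Postgres driver and credentials, and only
# the host it connects to is swapped for a data-access proxy endpoint.
# Both hostnames below are made up for this example.
import psycopg2

# Before: connecting straight to the warehouse.
# conn = psycopg2.connect(host="warehouse.internal.example.com",
#                         dbname="analytics", user="app", password="...")

# After: same queries, but routed through the governed endpoint, which
# can observe and control the data flow in between.
conn = psycopg2.connect(host="warehouse.data-access-proxy.example.com",
                        dbname="analytics", user="app", password="...")
cur = conn.cursor()
cur.execute("SELECT count(*) FROM customers")
print(cur.fetchone())
```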



By Frederic Lardinois

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.

Microsoft also highlights Synapse’s integration with Power BI, its easy-to-use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.


By Frederic Lardinois

Microsoft acquires data privacy and governance service BlueTalon

Microsoft today announced that it has acquired BlueTalon, a data privacy and governance service that helps enterprises set policies for how their employees can access their data. The service then enforces those policies across most popular data environments and provides tools for auditing policies and access, too.

Neither Microsoft nor BlueTalon disclosed the financial details of the transaction. Ahead of today’s acquisition, BlueTalon had raised about $27.4 million, according to Crunchbase. Investors include Bloomberg Beta, Maverick Ventures, Signia Venture Partners and Stanford’s StartX fund.

[Image: BlueTalon Policy Engine, how it works]

“The IP and talent acquired through BlueTalon brings a unique expertise at the apex of big data, security and governance,” writes Rohan Kumar, Microsoft’s corporate VP for Azure Data. “This acquisition will enhance our ability to empower enterprises across industries to digitally transform while ensuring right use of data with centralized data governance at scale through Azure.”

Unsurprisingly, the BlueTalon team will become part of the Azure Data Governance group, where the team will work on enhancing Microsoft’s capabilities around data privacy and governance. Microsoft already offers access and governance control tools for Azure, of course. As virtually all businesses become more data-centric, though, the need for centralized access controls that work across systems is only going to increase and new data privacy laws aren’t making this process easier.

“As we began exploring partnership opportunities with various hyperscale cloud providers to better serve our customers, Microsoft deeply impressed us,” BlueTalon CEO Eric Tilenius, who has clearly read his share of “our incredible journey” blog posts, explains in today’s announcement. “The Azure Data team was uniquely thoughtful and visionary when it came to data governance. We found them to be the perfect fit for us in both mission and culture. So when Microsoft asked us to join forces, we jumped at the opportunity.”


By Frederic Lardinois

MongoDB gets a data lake, new security features and more

MongoDB is hosting its developer conference today and unsurprisingly, the company has quite a few announcements to make. Some are straightforward, like the launch of MongoDB 4.2 with some important new security features, while others, like the launch of the company’s Atlas Data Lake, point the company beyond its core database product.

“Our new offerings radically expand the ways developers can use MongoDB to better work with data,” said Dev Ittycheria, the CEO and President of MongoDB. “We strive to help developers be more productive and remove infrastructure headaches — with additional features along with adjunct capabilities like full-text search and data lake. IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. It’s our mission to give developers better ways to work with data wherever it resides, including in public and private clouds.”

The highlight of today’s set of announcements is probably the launch of MongoDB Atlas Data Lake. Atlas Data Lake allows users to query data on AWS S3 using the MongoDB Query Language, no matter the format, including JSON, BSON, CSV, TSV, Parquet and Avro. To get started, users only need to point the service at their existing S3 buckets. They don’t have to manage servers or other infrastructure. Support for Data Lake on Google Cloud Storage and Azure Storage is in the works and will launch in the future.
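In practice, “query S3 with the MongoDB Query Language” means pointing a standard driver at the Data Lake endpoint and issuing ordinary queries. The sketch below uses pymongo; the connection string, database and collection names are placeholders for a Data Lake configured over existing S3 buckets.

```python
# Sketch: querying Atlas Data Lake with a standard MongoDB driver.
# The connection string and names are placeholders; the "sales"
# collection stands in for JSON/CSV/Parquet files sitting in S3.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://<user>:<password>@<your-data-lake-endpoint>/?ssl=true")
sales = client["lake"]["sales"]

# Ordinary MQL aggregation, executed by the Data Lake service against S3.
pipeline = [
    {"$match": {"region": "EMEA"}},
    {"$group": {"_id": "$store", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
]
for row in sales.aggregate(pipeline):
    print(row)
```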

Also new is Full-Text Search, which gives users access to advanced text search features based on the open-source Apache Lucene 8.

In addition, MongoDB is also now starting to bring together Realm, the mobile database product it acquired earlier this year, and the rest of its product lineup. Using the Realm brand, Mongo is merging its serverless platform, MongoDB Stitch, and Realm’s mobile database and synchronization platform. Realm’s synchronization protocol will now connect to MongoDB Atlas’ cloud database, while Realm Sync will allow developers to bring this data to their applications. 

“By combining Realm’s wildly popular mobile database and synchronization platform with the strengths of Stitch, we will eliminate a lot of work for developers by making it natural and easy to work with data at every layer of the stack, and to seamlessly move data between devices at the edge to the core backend,”  explained Eliot Horowitz, the CTO and co-founder of MongoDB.

As for the latest release of MongoDB, the highlight is a set of new security features. With this release, Mongo is implementing client-side Field Level Encryption. Traditionally, database security has always relied on server-side trust. This typically leaves the data accessible to administrators, even if they don’t have client access. If an attacker breaches the server, that’s almost automatically a catastrophic event.

With this new security model, Mongo is shifting access to the client and to the local drivers. It provides multiple encryption options; to make use of them, developers will use a new ‘encrypt’ JSON schema attribute.

This ensures that all application code can generally run unmodified, and even the admins won’t get access to the database or its logs and backups unless they get client access rights themselves. Since the logic resides in the drivers, the encryption is also handled completely separately from the actual database.
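A rough sketch of what this looks like from the driver’s side: the client is configured with a schema map whose ‘encrypt’ entries tell the driver which fields to encrypt before they ever leave the application. The key material, namespaces and pymongo option names below are placeholders and assumptions for illustration, not MongoDB’s own example; a real setup also needs a data key referenced via a keyId and proper key management.

```python
# Hedged sketch of client-side field level encryption: the 'encrypt'
# JSON schema attribute marks which fields the *driver* encrypts before
# sending them to the server. Key material and namespaces are demo-only
# placeholders; a real deployment also supplies a "keyId" for a data key
# and proper KMS configuration.
import os
from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

kms_providers = {"local": {"key": os.urandom(96)}}   # throwaway local key

schema_map = {
    "hr.employees": {
        "bsonType": "object",
        "properties": {
            "ssn": {
                "encrypt": {   # the new 'encrypt' schema attribute
                    "bsonType": "string",
                    "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
                    # "keyId": [<data key UUID>]  # required in practice
                }
            }
        },
    }
}

client = MongoClient(
    "mongodb://localhost:27017",
    auto_encryption_opts=AutoEncryptionOpts(
        kms_providers, "encryption.__keyVault", schema_map=schema_map),
)
# Documents inserted through this client have `ssn` encrypted client-side;
# server admins only ever see ciphertext in the data, logs and backups.
```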

Other new features in MongoDB 4.2 include support for distributed transactions and the ability to manage MongoDB deployments from a single Kubernetes control plane.


By Frederic Lardinois

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook-incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while it still made sense a few years ago for REST APIs to talk directly to other services and databases, that approach doesn’t work anymore now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android. Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.” Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.
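The developer-facing payoff is that a client asks the graph for exactly the fields it needs instead of stitching several REST calls together. Here is a minimal, hypothetical sketch in Python; the endpoint URL and the schema fields (buyer, orders, recommendedListings) are invented for illustration and are not part of Apollo’s product.

```python
# Hypothetical sketch: one GraphQL query replaces several REST round trips.
# The endpoint URL and schema fields are made up for this example.
import requests

query = """
query BuyerDashboard($id: ID!) {
  buyer(id: $id) {
    name
    orders { id total }
    recommendedListings { id title price }
  }
}
"""

resp = requests.post(
    "https://graph.example.com/graphql",
    json={"query": query, "variables": {"id": "buyer-123"}},
)
resp.raise_for_status()
print(resp.json()["data"]["buyer"])
```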

“If you think about it, there’s a lot of analogies to what happened with relational databases in the 80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database and that syntax was called SQL.”

[Photo: Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO]

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, a dashboard to track performance, and integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.

Schmidt tells me that the company started to get some serious traction last year and by December, it was getting calls from VCs that heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology and to scale its field team to support the enterprises that bet on it, including the open-source technologies that power the service.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”


By Frederic Lardinois

Couchbase’s mobile database gets built-in ML and enhanced synchronization features

Couchbase, the company behind the eponymous NoSQL database, announced a major update to its mobile database today that brings some machine learning smarts, as well as improved synchronization features and enhanced stats and logging support to the software.

“We’ve led the innovation in data management at the edge since the release of our mobile database five years ago,” Couchbase’s VP of Engineering Wayne Carter told me. “And we’re excited that others are doing that now. We feel that it’s very, very important for businesses to be able to utilize these emerging technologies that do sit on the edge to drive their businesses forward, both making their employees more effective and their customer experience better.”

The latter part is what drove a lot of today’s updates, Carter noted. He also believes that the database is the right place to do some machine learning. So with this release, the company is adding predictive queries to its mobile database. This new API allows mobile apps to take pre-trained machine learning models and run predictive queries against the data that is stored locally. This would allow a retailer to create a tool that can use a phone’s camera to figure out what part a customer is looking for.

To support these predictive queries, Couchbase mobile is also getting support for predictive indexes. “Predictive indexes allow you to create an index on prediction, enabling correlation of real-time predictions with application data in milliseconds,” Carter said. In many ways, that’s also the unique value proposition for bringing machine learning into the database. “What you really need to do is you need to utilize the unique values of a database to be able to deliver the answer to those real-time questions within milliseconds,” explained Carter.

The other major new feature in this release is delta synchronization, which allows businesses to push far smaller updates to the databases on their employees’ mobile devices. That’s because the devices only have to receive the information that changed instead of a full updated database. Carter says this was a highly requested feature, but until now, the company always had to prioritize work on other components of Couchbase.

This is an especially useful feature for the company’s retail customers, a vertical where it has been quite successful. These users need to keep their catalogs up to date, and quite a few of them supply their employees with mobile devices to help shoppers. Rumor has it that Apple, too, is a Couchbase user.

The update also includes a few new features that will be more of interest to operators, including advanced stats reporting and enhanced logging support.


By Frederic Lardinois

Databricks open-sources Delta Lake to make data lakes more reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these vast data repositories.

Delta Lake, which has long been a proprietary part of Databricks’ offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.

“Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into ‘Delta Lakes’,” said Ali Ghodsi, co-founder and CEO at Databricks.

What’s important to note here is that Delta Lake runs on top of existing data lakes and is compatible with the Apache Spark APIs.
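Because Delta Lake exposes itself through the normal Spark APIs, using it mostly amounts to swapping the output format, as in the sketch below. The paths are placeholders, and it assumes a Spark session that has the Delta Lake package available to it.

```python
# Sketch: Delta Lake through the standard Spark DataFrame API.
# Paths are placeholders; assumes the Delta Lake package is available
# to the Spark session (e.g. via spark.jars.packages).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

# Read raw data already sitting in the lake.
events = spark.read.json("s3://my-bucket/raw/events/")

# Writing with format("delta") adds the transaction log that gives the
# table ACID guarantees and schema enforcement on later appends.
events.write.format("delta").mode("overwrite").save("s3://my-bucket/delta/events")

# Reads go through the same API; appends that don't match the schema
# fail instead of silently corrupting the table.
delta_events = spark.read.format("delta").load("s3://my-bucket/delta/events")
print(delta_events.count())
```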

The company is still looking at how the project will be governed in the future. “We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead,” Ghodsi said. “One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses.”

To foster this community, Databricks plans to take outside contributions, just like the Spark project.

“We want Delta Lake technology to be used everywhere on-prem and in the cloud by small and large enterprises,” said Ghodsi. “This approach is the fastest way to build something that can become a standard by having the community provide direction and contribute to the development efforts.” That’s also why the company decided against the Commons Clause license that some open-source companies now use to prevent others (and especially large clouds) from using their open source tools in their own commercial SaaS offerings. “We believe the Commons Clause license is restrictive and will discourage adoption. Our primary goal with Delta Lake is to drive adoption on-prem as well as in the cloud.”


By Frederic Lardinois

Google Cloud challenges AWS with new open-source integrations

Google today announced that it has partnered with a number of top open-source data management and analytics companies to integrate their products into its Google Cloud Platform and offer them as managed services operated by its partners. The partners here are Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j and Redis Labs.

The idea here, Google says, is to provide users with a seamless user experience and the ability to easily leverage these open-source technologies in Google’s cloud. But there is a lot more at play here, even though Google never quite says so. That’s because Google’s move here is clearly meant to contrast its approach to open-source ecosystems with Amazon’s. It’s no secret that Amazon’s AWS cloud computing platform has a reputation for taking some of the best open-source projects and then forking those and packaging them up under its own brand, often without giving back to the original project. There are some signs that this is changing, but a number of companies have recently taken action and changed their open-source licenses to explicitly prevent this from happening.

That’s where things get interesting, because those companies include Confluent, Elastic, MongoDB, Neo4j and Redis Labs — and those are all partnering with Google on this new project, though it’s worth noting that InfluxData is not taking this new licensing approach and that while DataStax uses lots of open-source technologies, its focus is very much on its enterprise edition.

“As you are aware, there has been a lot of debate in the industry about the best way of delivering these open-source technologies as services in the cloud,” Manvinder Singh, the head of infrastructure partnerships at Google Cloud, said in a press briefing. “Given Google’s DNA and the belief that we have in the open-source model, which is demonstrated by projects like Kubernetes, TensorFlow, Go and so forth, we believe the right way to solve this is to work closely together with companies that have invested their resources in developing these open-source technologies.”

So while AWS takes these projects and then makes them its own, Google has decided to partner with these companies. While Google and its partners declined to comment on the financial arrangements behind these deals, chances are we’re talking about some degree of profit-sharing here.

“Each of the major cloud players is trying to differentiate what it brings to the table for customers, and while we have a strong partnership with Microsoft and Amazon, it’s nice to see that Google has chosen to deepen its partnership with Atlas instead of launching an imitation service,” Sahir Azam, the senior VP of Cloud Products at MongoDB told me. “MongoDB and GCP have been working closely together for years, dating back to the development of Atlas on GCP in early 2017. Over the past two years running Atlas on GCP, our joint teams have developed a strong working relationship and support model for supporting our customers’ mission critical applications.”

As for the actual functionality, the core principle here is that Google will deeply integrate these services into its Cloud Console, similar to what Microsoft did with Databricks on Azure, for example. These will be managed services and Google Cloud will handle the invoicing, and the bills will count toward a user’s Google Cloud spending commitments. Support will also run through Google, so users can use a single service to manage and log tickets across all of these services.

Redis Labs CEO and co-founder Ofer Bengal echoed this. “Through this partnership, Redis Labs and Google Cloud are bringing these innovations to enterprise customers, while giving them the choice of where to run their workloads in the cloud,” he said. “Customers now have the flexibility to develop applications with Redis Enterprise using the fully integrated managed services on GCP. This will include the ability to manage Redis Enterprise from the GCP console, provisioning, billing, support, and other deep integrations with GCP.”


By Frederic Lardinois

How to handle dark data compliance risk at your company

Slack and other consumer-grade productivity tools have been taking off in workplaces large and small — and data governance hasn’t caught up.

Whether it’s litigation, compliance with regulations like GDPR, or concerns about data breaches, legal teams need to account for new types of employee communication. And that’s hard when work is happening across the latest messaging apps and SaaS products, which make data searchability and accessibility more complex.

Here’s a quick look at the problem, followed by our suggestions for best practices at your company.

Problems

The increasing frequency of reported data breaches and expanding jurisdiction of new privacy laws are prompting conversations about dark data and risks at companies of all sizes, even small startups. Data risk discussions necessarily include the risk of a data breach, as well as preservation of data. Just two weeks ago it was reported that Jared Kushner used WhatsApp for official communications and screenshots of those messages for preservation, which commentators say complies with recordkeeping laws but raises questions about potential admissibility as evidence.


By Arman Tabatabai