Stemma launches with $4.8M seed to build managed data catalogue

As companies increasingly rely on data to run their businesses, having accurate sources of data becomes paramount. Stemma, a new early stage startup, has come up with a solution, a managed data catalogue that acts as an organization’s source of truth.

Today the company announced a $4.8 million seed investment led by Sequoia with assorted individual tech luminaries also participating. The product is also available for the first time today.

Company co-founder and CEO Mark Grover says the product is actually built on top of the open source Amundsen data catalogue project that he helped launch at Lyft to manage its massive data requirements. The problem was that with so much data, employees had to kludge together systems to confirm the data validity. Ultimately manual processes like asking someone in Slack or even creating a Wiki failed under the weight of trying to keep up with the volume and velocity.

“I saw this problem first-hand at Lyft, which led me to create the open source Amundsen project with a team of talented engineers,” Grover said. That project has 750 users at Lyft using it every week. Since it was open sourced, 35 companies like Brex, Snap and Asana have been using it.

What Stemma offers is a managed version of Amundsen that adds additional functionality like using intelligence to show data that’s meaningful to the person who is searching in the catalogue. It also can add metadata automatically to data as it’s added to the catalogue, creating documentation about the data on the fly, among other features.

The company launched last fall when Grover and co-founder and CTO Dorian Johnson decided to join forces and create a commercial product on top of Amundsen. Grover points out that Lyft was supportive of the move.

Today the company has five employees, in addition to the founders and has plans to add several more this year. As he does that, he is cognizant of diversity and inclusion in the hiring process. “I think it’s super important that we continue to invest in diversity, and the two ways that I think are the most meaningful for us right now is to have early employees that are from diverse groups, and that is the case within the first five,” he said. Beyond that, he says that as the company grows he wants to improve the ratio, while also looking at diversity in investors, board members and executives.

The company, which launched during COVID is entirely remote right now and plans to remain that way for at least the short term. As the company grows, they will look at ways to build camaraderie like organizing a regular cadence of employee offsite events.


By Ron Miller

Microsoft Azure expands its NoSQL portfolio with Managed Instances for Apache Cassandra

At its Ignite conference today, Microsoft announced the launch of Azure Managed Instance for Apache Cassandra, its latest NoSQL database offering and a competitor to Cassandra-centric companies like Datastax. Microsoft describes the new service as a ‘semi-managed offering that will help companies bring more of their Cassandra-based workloads into its cloud.

“Customers can easily take on-prem Cassandra workloads and add limitless cloud scale while maintaining full compatibility with the latest version of Apache Cassandra,” Microsoft explains in its press materials. “Their deployments gain improved performance and availability, while benefiting from Azure’s security and compliance capabilities.”

Like its counterpart, Azure SQL Manages Instance, the idea here is to give users access to a scalable, cloud-based database service. To use Cassandra in Azure before, businesses had to either move to Cosmos DB, its highly scalable database service which supports the Cassandra, MongoDB, SQL and Gremlin APIs, or manage their own fleet of virtual machines or on-premises infrastructure.

Cassandra was originally developed at Facebook and then open-sourced in 2008. A year later, it joined the Apache Foundation and today it’s used widely across the industry, with companies like Apple and Netflix betting on it for some of their core services, for example. AWS launched a managed Cassandra-compatible service at its re:Invent conference in 2019 (it’s called Amazon Keyspaces today), Microsoft only launched the Cassandra API for Cosmos DB last November. With today’s announcement, though, the company can now offer a full range of Cassandra-based servicer for enterprises that want to move these workloads to its cloud.


By Frederic Lardinois

Datastax acquires Kesque as it gets into data streaming

Datastax, the company best known for commercializing the open-source Apache Cassandra database, is moving beyond databases. As the company announced today, it has acquired Kesque, a cloud messaging service.

The Kesque team built its service on top of the Apache Pulsar messaging and streaming project. Datastax has now taken that team’s knowledge in this area and, combined with its own expertise, is launching its own Pulsar-based streaming platform by the name of Datastax Luna Streaming, which is now generally available.

This move comes right as Datastax is also now, for the first time, announcing that it is cash-flow positive and profitable, as the company’s chief product officer, Ed Anuff, told me. “We are at over $150 million in [annual recurring revenue]. We are cash-flow positive and we are profitable,” he told me. This marks the first time the company is publically announcing this data. In addition, the company also today revealed that about 20 percent of its annual contract value is now for DataStax Astra, its managed multi-cloud Cassandra service and that the number of self-service Asta subscribers has more than doubled from Q3 to Q4.

The launch of Luna Streaming now gives the 10-year-old company a new area to expand into — and one that has some obvious adjacencies with its existing product portfolio.

“We looked at how a lot of developers are building on top of Cassandra,” Anuff, who joined Datastax after leaving Google Cloud last year, said. “What they’re doing is, they’re addressing what people call ‘data-in-motion’ use cases. They have huge amounts of data that are coming in, huge amounts of data that are going out — and they’re typically looking at doing something with streaming in conjunction with that. As we’ve gone in and asked, “What’s next for Datastax?,’ streaming is going to be a big part of that.”

Given Datastax’s open-source roots, it’s no surprise the team decided to build its service on another open-source project and acquire an open-source company to help it do so. Anuff noted that while there has been a lot of hype around streaming and Apache Kafka, a cloud-native solution like Pulsar seemed like the better solution for the company. Pulsar was originally developed at Yahoo! (which, full disclosure, belongs to the same Verizon Media Group family as TechCrunch) and even before acquiring Kesque, Datastax already used Pulsar to build its Astra platform. Other Pulsar users include Yahoo, Tencent, Nutanix and Splunk.

“What we saw was that when you go and look at doing streaming in a scale-out way, that Kafka isn’t the only approach. We looked at it, and we liked the Pulsar architecture, we like what’s going on, we like the community — and remember, we’re a company that grew up in the Apache open-source community — we said, ‘okay, we think that it’s got all the right underpinnings, let’s go and get involved in that,” Anuff said. And in the process of doing so, the team came across Kesque founder Chris Bartholomew and eventually decided to acquire his company.

The new Luna Streaming offering will be what Datastax calls a “subscription to success with Apache Pulsar.’ It will include a free, production-ready distribution of Pulsar and an optional, SLA-backed subscription tier with enterprise support.

Unsurprisingly, Datastax also plans to remain active in the Pulsar community. The team is already making code contributions, but Anuff also stressed that Datastax is helping out with scalability testing. “This is one of the things that we learned in our participation in the Apache Cassandra project,” Anuff said. “A lot of what these projects need is folks coming in doing testing, helping with deployments, supporting users. Our goal is to be a great participant in the community.”


By Frederic Lardinois

Databricks launches SQL Analytics

AI and data analytics company Databricks today announced the launch of SQL Analytics, a new service that makes it easier for data analysts to run their standard SQL queries directly on data lakes. And with that, enterprises can now easily connect their business intelligence tools like Tableau and Microsoft’s Power BI to these data repositories as well.

SQL Analytics will be available in public preview on November 18.

In many ways, SQL Analytics is the product Databricks has long been looking to build and that brings its concept of a ‘lake house’ to life. It combines the performance of a data warehouse, where you store data after it has already been transformed and cleaned, with a data lake, where you store all of your data in its raw form. The data in the data lake, a concept that Databrick’s co-founder and CEO Ali Ghodsi has long championed, is typically only transformed when it gets used. That makes data lakes cheaper, but also a bit harder to handle for users.

Image Credits: Databricks

“We’ve been saying Unified Data Analytics, which means unify the data with the analytics. So data processing and analytics, those two should be merged. But no one picked that up,” Ghodsi told me. But ‘lake house’ caught on as a term.

“Databricks has always offered data science, machine learning. We’ve talked about that for years. And with Spark, we provide the data processing capability. You can do [extract, transform, load]. That has always been possible. SQL Analytics enables you to now do the data warehousing workloads directly, and concretely, the business intelligence and reporting workloads, directly on the data lake.”

The general idea here is that with just one copy of the data, you can enable both traditional data analyst use cases (think BI) and the data science workloads (think AI) Databricks was already known for. Ideally, that makes both use cases cheaper and simpler.

The service sits on top of an optimized version of Databricks’ open-source Delta Lake storage layer to enable the service to quickly complete queries. In addition, Delta Lake also provides auto-scaling endpoints to keep the query latency consistent, even under high loads.

While data analysts can query these data sets directly, using standard SQL, the company also built a set of connectors to BI tools. Its BI partners include Tableau, Qlik, Looker and Thoughtspot, as well as ingest partners like Fivetran, Fishtown Analytics, Talend and Matillion.

Image Credits: Databricks

“Now more than ever, organizations need a data strategy that enables speed and agility to be adaptable,” said Francois Ajenstat, Chief Product Officer at Tableau. “As organizations are rapidly moving their data to the cloud, we’re seeing growing interest in doing analytics on the data lake. The introduction of SQL Analytics delivers an entirely new experience for customers to tap into insights from massive volumes of data with the performance, reliability and scale they need.”

In a demo, Ghodsi showed me what the new SQL Analytics workspace looks like. It’s essentially a stripped-down version of the standard code-heavy experience that Databricks users are familiar with. Unsurprisingly, SQL Analytics provides a more graphical experience that focuses more on visualizations and not Python code.

While there are already some data analysts on the Databricks platform, this obviously opens up a large new market for the company — something that would surely bolster its plans for an IPO next year.


By Frederic Lardinois

Data virtualization service Varada raises $12M

Varada, a Tel Aviv-based startup that focuses on making it easier for businesses to query data across services, today announced that it has raised a $12 million Series A round led by Israeli early-stage fund MizMaa Ventures, with participation by Gefen Capital.

“If you look at the storage aspect for big data, there’s always innovation, but we can put a lot of data in one place,” Varada CEO and co-founder Eran Vanounou told me. “But translating data into insight? It’s so hard. It’s costly. It’s slow. It’s complicated.”

That’s a lesson he learned during his time as CTO of LivePerson, which he described as a classic big data company. And just like at LivePerson, where the team had to reinvent the wheel to solve its data problems, again and again, every company — and not just the large enterprises — now struggles with managing their data and getting insights out of it, Vanounou argued.

Image Credits: Varada

The rest of the founding team, David Krakov, Roman Vainbrand and Tal Ben-Moshe, already had a lot of experience in dealing with these problems, too, with Ben-Moshe having served at the Chief Software Architect of Dell EMC’s XtremIO flash array unit, for example. They built the system for indexing big data that’s at the core of Varada’s platform (with the open-source Presto SQL query engine being one of the other cornerstones).

Image Credits: Varada

Essentially, Varada embraces the idea of data lakes and enriches that with its indexing capabilities. And those indexing capabilities is where Varada’s smarts can be found. As Vanounou explained, the company is using a machine learning system to understand when users tend to run certain workloads and then caches the data ahead of time, making the system far faster than its competitors.

“If you think about big organizations and think about the workloads and the queries, what happens during the morning time is different from evening time. What happened yesterday is not what happened today. What happened on a rainy day is not what happened on a shiny day. […] We listen to what’s going on and we optimize. We leverage the indexing technology. We index what is needed when it is needed.”

That helps speed up queries, but it also means less data has to be replicated, which also brings down the cost. AÅs Mizmaa’s Aaron Applebaum noted, since Varada is not a SaaS solution, the buyers still get all of the discounts from their cloud providers, too.

In addition, the system can allocate resources intelligently to that different users can tap into different amounts of bandwidth. You can tell it to give customers more bandwidth than your financial analysts, for example.

“Data is growing like crazy: in volume, in scale, in complexity, in who requires it and what the business intelligence uses are, what the API uses are,” Applebaum said when I asked him why he decided to invest. “And compute is getting slightly cheaper, but not really, and storage is getting cheaper. So if you can make the trade-off to store more stuff, and access things more intelligently, more quickly, more agile — that was the basis of our thesis, as long as you can do it without compromising performance.”

Varada, with its team of experienced executives, architects and engineers, ticked a lot of the company’s boxes in this regard, but he also noted that unlike some other Israeli startups, the team understood that it had to listen to customers and understand their needs, too.

“In Israel, you have a history — and it’s become less and less the case — but historically, there’s a joke that it’s ‘ready, fire, aim.’ You build a technology, you’ve got this beautiful thing and you’re like, ‘alright, we did it,’ but without listening to the needs of the customer,” he explained.

The Varada team is not afraid to compare itself to Snowflake, which at least at first glance seems to make similar promises. Vananou praised the company for opening up the data warehousing market and proving that people are willing to pay for good analytics. But he argues that Varada’s approach is fundamentally different.

“We embrace the data lake. So if you are Mr. Customer, your data is your data. We’re not going to take it, move it, copy it. This is your single source of truth,” he said. And in addition, the data can stay in the company’s virtual private cloud. He also argues that Varada isn’t so much focused on the business users but the technologists inside a company.

 


By Frederic Lardinois

VESoft raises $8M to meet China’s growing need for graph databases

Sherman Ye founded VESoft in 2018 when he saw a growing demand for graph databases in China. Its predecessors like Neo4j and TigerGraph had already been growing aggressively in the West for a few years, while China was just getting to know the technology that leverages graph structures to store data sets and depict their relationships, such as those used for social media analysis, e-commerce recommendations, and financial risk management.

VESoft is ready for further growth after closing an $8 million funding round led by Redpoint China Ventures, an investment firm launched by Silicon Valley-based Redpoint Ventures in 2005. Existing investor Matrix Partners China also participated in the Series pre-A round. The new capital will allow the startup to develop products and expand to markets in North America, Europe, and other parts of Asia.

The 30-people team is comprised of former employees from Alibaba, Facebook, Huawei, and IBM. It’s based in Hangzhou, a scenic city known for its rich history and housing Alibaba and its financial affiliate Ant Financial, where Ye previously worked as a senior engineer after his four-year stint with Facebook in California. From 2017 to 2018, the entrepreneur noticed that Ant Financial’s customers were increasingly interested in adopting graph databases as an alternative to relational databases, a model that had been popular since the 80s and normally organizes data into tables.

“While relational databases are capable of achieving many functions carried out by graph databases… they deteriorate in performance as the quantity of data grows,” Yu told TechCrunch during an interview. “We didn’t use to have so much data.”

Information explosion is one reason why Chinese companies are turning to graph databases, which can handle millions of transactions to discover patterns within scattered data. The technology’s rise is also a response to new forms of online businesses that depend more on relationships.

“Take recommendations for example. The old model recommends content based purely on user profiles, but the problem of relying on personal browsing history is it fails to recommend new things. That was fine for a long time as the Chinese [internet] market was big enough to accommodate many players. But as the industry becomes saturated and crowded… companies need to ponder how to retain existing users, lengthen their time spent, and win users from rivals.”

The key lies in serving people content and products they find appealing. Graph databases come in handy, suggested Yu, when services try to predict users’ interest or behavior as the model uncovers what their friends or people within their social circles like. “That’s a lot more effective than feeding them what’s trending.”

Neo4j compares relational and graph databases (Link)

The company has made its software open source, which the founder believed can help cultivate a community of graph database users and educate the market in China. It will also allow VESoft to reach more engineers in the English-speaking world who are well-acquainted with the open-source culture.

“There is no such thing as being ‘international’ or ‘domestic’ for a technology-driven company. There are no boundaries between countries in the open-source world,” reckoned Yu.

When it comes to generating income, the startup plans to launch a paid version for enterprises, which will come with customized plug-ins and host services.

The Nebula Graph, the brand of VESoft’s database product, is now serving 20 enterprise clients from areas across social media, e-commerce, and finance including big names like food delivery giant Meituan, popular social commerce app Xiaohongshu, and e-commerce leader JD.com. A number of overseas companies are also trialing Nebula.

The time is ripe for enterprise-facing startups with a technological moat in China as the market for consumers has been divided by incumbents like Tencent and Alibaba. This makes fundraising relatively easy for VESoft. The founder is confident that Chinese companies are rapidly catching up with their Western counterparts in the space, for the gargantuan amount of data and the myriad of ways data is used in the country “will propel the technology forward.”


By Rita Liao

Couchbase raises $105M Series G funding round

Couchbase. the Santa Clara-based company behind the eponymous NoSQL cloud database service, today announced that it has raised a $105 million all-equity Series G round “to expand product development and global go-to-market capabilities.”

The oversubscribed round was led by GPI Capital, with participation from existing investors Accel, Sorenson Capital, North Bridge Venture Partners, Glynn Capital, Adams Street Partners and Mayfield. With this, the company has now raised a total of $251 million, according to Crunchbase.

Back in 2016, Couchbase raised a $30 million down round, which at the time was meant to be the company’s last round before an IPO. That IPO hasn’t materialized, but the company continues to grow, with 30 percent of the Fortune 100 now using its database. Couchbase also today announced that, over the course of the last fiscal year, it saw 70 percent total contract value growth, more than 50 percent new business growth and over 35 percent growth in average subscription deal size. In total, Couchbase said today, it is now seeing almost $100 million in committed annual recurring revenue.

“To be competitive today, enterprises must transform digitally, and use technology to get closer to their customers and improve the productivity of their workforces,” said Couchbase President and CEO Matt Cain in today’s announcement. “To do so, they require a cloud-native database built specifically to support modern web, mobile and IoT applications.  Application developers and enterprise architects rely on Couchbase to enable agile application development on a platform that performs at scale, from the public cloud to the edge, and provides operational simplicity and reliability. More and more, the largest companies in the world truly run their businesses on Couchbase, architecting their most business-critical applications on our platform.”

The company is playing in a large but competitive market, with the likes of MongoDB, DataStax and all the major cloud vendors vying for similar customers in the NoSQL space. One feature that has always made Couchbase stand out is Couchbase Mobile, which extends the service to the cloud. Like some of its competitors, the company has also recently placed its bets on the Kubernetes container orchestration tools with, for example the launch of its Autonomous Operator for Kubernetes 2.0. More importantly, though, the company also introduced its fully-managed Couchbase Cloud Database-as-a-Service in February, which allows businesses to run the database within their own virtual private cloud on public clouds like AWS and Microsoft Azure.

“We are excited to partner with Couchbase and view Couchbase Server’s highly performant, distributed architecture as purpose-built to support mission-critical use cases at scale,” said Alex Migon, a partner at GPI Capital and a new member of the company’s board of directors. “Couchbase has developed a truly enterprise-grade product, with leading support for cutting-edge application development and deployment needs.  We are thrilled to contribute to the next stage of the company’s growth.”

The company tells me that it plans to use the new funding to continue its “accelerated trajectory with investment in each of their three core pillars: sustained differentiation, profitable growth, and world class teams.” Of course, Couchbase will also continue to build new features for its NoSQL server, mobile platform and Couchbase Cloud — and in addition, the company will continue to expand geographically to serve its global customer operations.


By Frederic Lardinois

Microsoft launches Azure Synapse Link to help enterprises get faster insights from their data

At its Build developer conference, Microsoft today announced Azure Synapse Link, a new enterprise service that allows businesses to analyze their data faster and more efficiently, using an approach that’s generally called ‘hybrid transaction/analytical processing’ (HTAP). That’s a mouthful, it essentially enables enterprises to use the same database system for analytical and transactional workloads on a single system. Traditionally, enterprises had to make some tradeoffs between either building a single system for both that was often highly over-provisioned or to maintain separate systems for transactional and analytics workloads.

Last year, at its Ignite conference, Microsoft announced Azure Synapse Analytics, an analytics service that combines analytics and data warehousing to create what the company calls “the next evolution of Azure SQL Data Warehouse.” Synapse Analytics brings together data from Microsoft’s services and those from its partners and makes it easier to analyze.

“One of the key things, as we work with our customers on their digital transformation journey, there is an aspect of being data-driven, of being insights-driven as a culture, and a key part of that really is that once you decide there is some amount of information or insights that you need, how quickly are you able to get to that? For us, time to insight and a secondary element, which is the cost it takes, the effort it takes to build these pipelines and maintain them with an end-to-end analytics solution, was a key metric we have been observing for multiple years from our largest enterprise customers,” said Rohan Kumar, Microsoft’s corporate VP for Azure Data.

Synapse Link takes the work Microsoft did on Synaps Analytics a step further by removing the barriers between Azure’s operational databases and Synapse Analytics, so enterprises can immediately get value from the data in those databases without going through a data warehouse first.

“What we are announcing with Synapse Link is the next major step in the same vision that we had around reducing the time to insight,” explained Kumar. “And in this particular case, a long-standing barrier that exists today between operational databases and analytics systems is these complex ETL (extract, transform, load) pipelines that need to be set up just so you can do basic operational reporting or where, in a very transactionally consistent way, you need to move data from your operational system to the analytics system, because you don’t want impact the performance of the operational system in any way because that’s typically dealing with, depending on the system, millions of transactions per second.”

ETL pipelines, Kumar argued, are typically expensive and hard to build and maintain, yet enterprises are now building new apps — and maybe even line of business mobile apps — where any action that consumers take and that is registered in the operational database is immediately available for predictive analytics, for example.

From the user perspective, enabling this only takes a single click to link the two, while it removes the need for managing additional data pipelines or database resources. That, Kumar said, was always the main goal for Synapse Link. “With a single click, you should be able to enable real-time analytics on you operational data in ways that don’t have any impact on your operational systems, so you’re not using the compute part of your operational system to do the query, you actually have to transform the data into a columnar format, which is more adaptable for analytics, and that’s really what we achieved with Synapse Link.”

Because traditional HTAP systems on-premises typically share their compute resources with the operational database, those systems never quite took off, Kumar argued. In the cloud, with Synapse Link, though, that impact doesn’t exist because you’re dealing with two separate systems. Now, once a transaction gets committed to the operational database, the Synapse Link system transforms the data into a columnar format that is more optimized for the analytics system — and it does so in real time.

For now, Synapse Link is only available in conjunction with Microsoft’s Cosmos DB database. As Kumar told me, that’s because that’s where the company saw the highest demand for this kind of service, but you can expect the company to add support for available in Azure SQL, Azure Database for PostgreSQL and Azure Database for MySQL in the future.


By Frederic Lardinois

Microsoft partners with Redis Labs to improve its Azure Cache for Redis

For a few years now, Microsoft has offered Azure Cache for Redis, a fully managed caching solution built on top of the open-source Redis project. Today, it is expanding this service by adding Redis Enterprise, Redis Lab’s commercial offering, to its platform. It’s doing so in partnership with Redis Labs and while Microsoft will offer some basic support for the service, Redis Labs will handle most of the software support itself.

Julia Liuson, Microsoft’s corporate VP of its developer tools division, told me that the company wants to be seen as a partner to open-source companies like Redis Labs, which was among the first companies to change its license to prevent cloud vendors from commercializing and repackaging their free code without contributing back to the community. Last year, Redis Labs partnered with Google Cloud to bring its own fully managed service to its platform and so maybe it’s no surprise that we are now seeing Microsoft make a similar move.

Liuson tells me that with this new tier for Azure Cache for Redis, users will get a single bill and native Azure management, as well as the option to deploy natively on SSD flash storage. The native Azure integration should also make it easier for developers on Azure to integrate Redis Enterprise into their applications.

It’s also worth noting that Microsoft will support Redis Labs’ own Redis modules, including RediSearch, a Redis-powered search engine, as well as RedisBloom and RedisTimeSeries, which provide support for new datatypes in Redis.

“For years, developers have utilized the speed and throughput of Redis to produce unbeatable responsiveness and scale in their applications,” says Liuson. “We’ve seen tremendous adoption of Azure Cache for Redis, our managed solution built on open source Redis, as Azure customers have leveraged Redis performance as a distributed cache, session store, and message broker. The incorporation of the Redis Labs Redis Enterprise technology extends the range of use cases in which developers can utilize Redis, while providing enhanced operational resiliency and security.”


By Frederic Lardinois

Fishtown Analytics raises $12.9M Series A for its open-source analytics engineering tool

Philadelphia-based Fishtown Analytics, the company behind the popular open-source data engineering tool dbt, today announced that it has raised a $12.9 million Series A round led by Andreessen Horowitz, with the firm’s general partner Martin Casada joining the company’s board.

“I wrote this blog post in early 2016, essentially saying that analysts needed to work in a fundamentally different way,” Fishtown founder and CEO Tristan Handy told me, when I asked him about how the product came to be. “They needed to work in a way that much more closely mirrored the way the software engineers work and software engineers have been figuring this shit out for years and data analysts are still like sending each other Microsoft Excel docs over email.”

The dbt open-source project forms the basis of this. It allows anyone who can write SQL queries to transform data and then load it into their preferred analytics tools. As such, it sits in-between data warehouses and the tools that load data into them on one end, and specialized analytics tools on the other.

As Casada noted when I talked to him about the investment, data warehouses have now made it affordable for businesses to store all of their data before it is transformed. So what was traditionally “extract, transform, load” (ETL) has now become “extract, load, transform” (ELT). Andreessen Horowitz is already invested in Fivetran, which helps businesses move their data into their warehouses, so it makes sense for the firm to also tackle the other side of this business.

“Dbt is, as far as we can tell, the leading community for transformation and it’s a company we’ve been tracking for at least a year,” Casada said. He also argued that data analysts — unlike data scientists — are not really catered to as a group.

Before this round, Fishtown hadn’t raised a lot of money, even though it has been around for a few years now, except for a small SAFE round from Amplify.

But Handy argued that the company needed this time to prove that it was on to something and build a community. That community now consists of more than 1,700 companies that use the dbt project in some form and over 5,000 people in the dbt Slack community. Fishtown also now has over 250 dbt Cloud customers and the company signed up a number of big enterprise clients earlier this year. With that, the company needed to raise money to expand and also better service its current list of customers.

“We live in Philadelpha. The cost of living is low here and none of us really care to make a quadro-billion dollars, but we do want to answer the question of how do we best serve the community,” Handy said. “And for the first time, in the early part of the year, we were like, holy shit, we can’t keep up with all of the stuff that people need from us.”

The company plans to expand the team from 25 to 50 employees in 2020 and with those, the team plans to improve and expand the product, especially its IDE for data analysts, which Handy admitted could use a bit more polish.


By Frederic Lardinois

Collibra nabs another $112.5M at a $2.3B valuation for its big data management platform

GDPR and other data protection and privacy regulations — as well as a significant (and growing) number of data breaches and exposées of companies’ privacy policies — have put a spotlight on not just on the vast troves of data that businesses and other organizations hold on us, but also how they handle it. Today, one of the companies helping them cope with that data trove in a better and legal way is announcing a huge round of funding to continue that work. Collibra, which provides tools to manage, warehouse, store and analyse data troves, is today announcing that it has raised $112.5 million in funding, at a post-money valuation of $2.3 billion.

The funding — a Series F from the looks of it — represents a big bump for the startup, which last year raised $100 million at a valuation of just over $1 billion. This latest round was co-led by ICONIQ Capital, Index Ventures, and Durable Capital Partners LIP, with previous investors CapitalG (Google’s growth fund), Battery Ventures, and Dawn Capital also participating.

Collibra, originally a spin-out from Vrije Universiteit in Brussels, Belgium, today works with some 450 enterprises and other large organizations — customers include Adobe, Verizon (which owns TechCrunch), insurers AXA, and a number of healthcare providers. Its products cover a range of services focused around company data, including tools to help customers comply with local data protection policies, store it securely, and to run analytics and more.

These are all tools that have long had a place in enterprise big data IT, but have become increasingly more used and in-demand both as data policies have expanded, and as the prospects of what can be discovered through big data analytics have become more advanced. With that growth, many companies have realised that they are not in a position to use and store their data in the best possible way, and that is where companies like Collibra step in.

“Most large organizations are in data chaos,” Felix Van de Maele, co-founder and CEO, previously told us. “We help them understand what data they have, where they store it and [understand] whether they are allowed to use it.”

As you would expect with a big IT trend, Collibra is not the only company chasing this opportunity. Competitors include Informatica, IBM, Talend, Egnyte, among a number of others, but the market position of Collibra, and its advanced technology, is what has continued to impress investors.

“Durable Capital Partners invests in innovative companies that have significant potential to shape growing industries and build larger companies,” said Henry Ellenbogen, founder and chief investment officer for Durable Capital Partners LP, in a statement (Ellenbogen is formerly an investment manager a T. Rowe Price, and this is his first investment in Collibra under Durable). “We believe Collibra is a leader in the Data Intelligence category, a space that could have a tremendous impact on global business operations and a space that we expect will continue to grow as data becomes an increasingly critical asset.”

“We have a high degree of conviction in Collibra and the importance of the company’s mission to help organizations benefit from their data,” added Matt Jacobson, general partner at ICONIQ Capital and Collibra board member, in his own statement. “There is an increasing urgency for enterprises to harness their data for strategic business decisions. Collibra empowers organizations to use their data to make critical business decisions, especially in uncertain business environments.”


By Ingrid Lunden

Datastax acquires The Last Pickle

Data management company Datastax, one of the largest contributors to the Apache Cassandra project, today announced that it has acquired The Last Pickle (and no, I don’t know what’s up with that name either), a New Zealand-based Cassandra consulting and services firm that’s behind a number of popular open-source tools for the distributed NoSQL database.

As Datastax Chief Strategy Officer Sam Ramji, who you may remember from his recent tenure at Apigee, the Cloud Foundry Foundation, Google and Autodesk, told me, The Last Pickle is one of the premier Apache Cassandra consulting and services companies. The team there has been building Cassandra-based open source solutions for the likes of Spotify, T Mobile and AT&T since it was founded back in 2012. And while The Last Pickle is based in New Zealand, the company has engineers all over the world that do the heavy lifting and help these companies successfully implement the Cassandra database technology.

It’s worth mentioning that Last Pickle CEO Aaron Morton first discovered Cassandra when he worked for WETA Digital on the special effects for Avatar, where the team used Cassandra to allow the VFX artists to store their data.

“There’s two parts to what they do,” Ramji explained. “One is the very visible consulting, which has led them to become world experts in the operation of Cassandra. So as we automate Cassandra and as we improve the operability of the project with enterprises, their embodied wisdom about how to operate and scale Apache Cassandra is as good as it gets — the best in the world.” And The Last Pickle’s experience in building systems with tens of thousands of nodes — and the challenges that its customers face — is something Datastax can then offer to its customers as well.

And Datastax, of course, also plans to productize The Last Pickle’s open-source tools like the automated repair tool Reaper and the Medusa backup and restore system.

As both Ramji and Datastax VP of Engineering Josh McKenzie stressed, Cassandra has seen a lot of commercial development in recent years, with the likes of AWS now offering a managed Cassandra service, for example, but there wasn’t all that much hype around the project anymore. But they argue that’s a good thing. Now that it is over ten years old, Cassandra has been battle-hardened. For the last ten years, Ramji argues, the industry tried to figure out what the de factor standard for scale-out computing should be. By 2019, it became clear that Kubernetes was the answer to that.

“This next decade is about what is the de facto standard for scale-out data? We think that’s got certain affordances, certain structural needs and we think that the decades that Cassandra has spent getting harden puts it in a position to be data for that wave.”

McKenzie also noted that Cassandra provides users with a number of built-in features like support for mutiple data centers and geo-replication, rolling updates and live scaling, as well as wide support across programming languages, give it a number of advantages over competing databases.

“It’s easy to forget how much Cassandra gives you for free just based on its architecture,” he said. “Losing the power in an entire datacenter, upgrading the version of the database, hardware failing every day? No problem. The cluster is 100 percent always still up and available. The tooling and expertise of The Last Pickle really help bring all this distributed and resilient power into the hands of the masses.”

The two companies did not disclose the price of the acquisition.


By Frederic Lardinois

Satori Cyber raises $5.25M to help businesses protect their data flows

The amount of data that most companies now store — and the places they store it — continues to increase rapidly. With that, the risk of the wrong people managing to get access to this data also increases, so it’s no surprise that we’re now seeing a number of startups that focus on protecting this data and how it flows between clouds and on-premises servers. Satori Cyber, which focuses on data protecting and governance, today announced that it has raised a $5.25 million seed round led by YL Ventures.

“We believe in the transformative power of data to drive innovation and competitive advantage for businesses,” the company says. “We are also aware of the security, privacy and operational challenges data-driven organizations face in their journey to enable broad and optimized data access for their teams, partners and customers. This is especially true for companies leveraging cloud data technologies.”

Satori is officially coming out of stealth mode today and launching its first product, the Satori Cyber Secure Data Access Cloud. This service provides enterprises with the tools to provide access controls for their data, but maybe just as importantly, it also offers these companies and their security teams visibility into their data flows across cloud and hybrid environments. The company argues that data is “a moving target” because it’s often hard to know how exactly it moves between services and who actually has access to it. With most companies now splitting their data between lots of different data stores, that problem only becomes more prevalent over time and continuous visibility becomes harder to come by.

“Until now, security teams have relied on a combination of highly segregated and restrictive data access and one-off technology-specific access controls within each data store, which has only slowed enterprises down,” said Satori Cyber CEO and Co-founder Eldad Chai. “The Satori Cyber platform streamlines this process, accelerates data access and provides a holistic view across all organizational data flows, data stores and access, as well as granular access controls, to accelerate an organization’s data strategy without those constraints.”

Both co-founders previously spent nine years building security solutions at Imperva and Incapsula (which acquired Imperva in 2014). Based on this experience, they understood that onboarding had to be as easy as possible and that operations would have to be transparent to the users. “We built Satori’s Secure Data Access Cloud with that in mind, and have designed the onboarding process to be just as quick, easy and painless. On-boarding Satori involves a simple host name change and does not require any changes in how your organizational data is accessed or used,” they explain.

 

 


By Frederic Lardinois

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.

Screen Shot 2019 10 31 at 10.11.48 AM

Microsoft also highlights Synapse’s integration with Power BI, its easy to use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.


By Frederic Lardinois

Microsoft acquires data privacy and governance service BlueTalon

Microsoft today announced that it has acquired BlueTalon, a data privacy and governance service that helps enterprises set policies for how their employees can access their data. The service then enforces those policies across most popular data environments and provides tools for auditing policies and access, too.

Neither Microsoft nor BlueTalon disclosed the financial details of the transaction. Ahead of today’s acquisition, BlueTalon had raised about $27.4 million, according to Crunchbase. Investors include Bloomberg Beta, Maverick Ventures, Signia Venture Partners and Stanford’s StartX fund.

BlueTalon Policy Engine How it works

“The IP and talent acquired through BlueTalon brings a unique expertise at the apex of big data, security and governance,” writes Rohan Kumar, Microsoft’s corporate VP for Azure Data. “This acquisition will enhance our ability to empower enterprises across industries to digitally transform while ensuring right use of data with centralized data governance at scale through Azure.”

Unsurprisingly, the BlueTalon team will become part of the Azure Data Governance group, where the team will work on enhancing Microsoft’s capabilities around data privacy and governance. Microsoft already offers access and governance control tools for Azure, of course. As virtually all businesses become more data-centric, though, the need for centralized access controls that work across systems is only going to increase and new data privacy laws aren’t making this process easier.

“As we began exploring partnership opportunities with various hyperscale cloud providers to better serve our customers, Microsoft deeply impressed us,” BlueTalon CEO Eric Tilenius, who has clearly read his share of “our incredible journey” blog posts, explains in today’s announcement. “The Azure Data team was uniquely thoughtful and visionary when it came to data governance. We found them to be the perfect fit for us in both mission and culture. So when Microsoft asked us to join forces, we jumped at the opportunity.”


By Frederic Lardinois