SEC filing indicates big data provider Palantir is raising $961M, $550M of it already secured

Palantir, the secretive big data and analytics provider that works with governments and other public and private organizations to power national security, health and a variety of other services, has reportedly been eyeing up a public listing this autumn. But in the meantime it’s also continuing to push ahead in the private markets.

The company has filed a Form D indicating that it is in the process of raising nearly $1 billion — $961,099,010, to be exact — with $549,727,437 of that already sold, and a further $411,371,573 remaining to be raised.

The filing appears to confirm a report from back in September 2019 that the company was seeking to raise between $1 billion and $3 billion, its first fundraising in four years. That report noted Palantir was targeting a $26 billion valuation, up from $20 billion four years ago. A Reuters article from June put its valuation on secondary market trades at between $10 billion and $14 billion.

The bigger story of that Reuters report was that Palantir confirmed two fundraises from strategic investors that both work with the company: $500 million in funding from Japanese insurance company Sompo Holdings, and $50 million from Fujitsu. Together, it seems like these might account for $550 million already sold on the Form D.

It’s not clear if this fundraise would essentially mean a delay to a public listing, or if it would complement it.

To date Palantir has raised $3.3 billion in funding, according to PitchBook data, with no fewer than 108 investors on its cap table. But if you dig into the PitchBook data (some of which is behind a paywall), it also seems that Palantir has raised a number of other rounds of undisclosed amounts. Confusingly (but perhaps aptly for a company famous for being secretive), some of that might also be part of this Form D amount.

We have reached out to Palantir to ask about the Form D and will update this post as we learn more.

While Palantir was valued at $20 billion when it last raised money four years ago, there are some data points that point to a bigger valuation today.

In April, according to a Bloomberg report, the company briefed investors with documents showing that it expects to make $1 billion in revenues this year, up 38% on 2019, and to break even for the first time since being founded 16 years ago by Peter Thiel, Nathan Gettings, Joe Lonsdale, Stephen Cohen, and current CEO, Alex Karp.

(The Bloomberg report didn’t explain why Palantir was briefing investors, whether for a potential public listing, or for the fundraise we’re reporting on here, or something else.)

On top of that, the company has been in the news a lot around the global novel coronavirus pandemic. Specifically, it’s been winning business, in the form of projects in major markets like the UK (where it’s part of a consortium of companies working with the NHS on a COVID-19 data trove) and the US (where it’s been working on a COVID-19 tracker for the federal government and a project with the CDC), and possibly others. Those projects will presumably need a lot of upfront capital to set up and run, possibly one reason it is raising money now.


By Ingrid Lunden

Collibra nabs another $112.5M at a $2.3B valuation for its big data management platform

GDPR and other data protection and privacy regulations — as well as a significant (and growing) number of data breaches and exposés of companies’ privacy policies — have put a spotlight not just on the vast troves of data that businesses and other organizations hold on us, but also on how they handle it. Today, one of the companies helping them cope with those data troves in a better and legally compliant way is announcing a huge round of funding to continue that work. Collibra, which provides tools to manage, warehouse, store and analyse data troves, is today announcing that it has raised $112.5 million in funding, at a post-money valuation of $2.3 billion.

The funding — a Series F from the looks of it — represents a big bump for the startup, which last year raised $100 million at a valuation of just over $1 billion. This latest round was co-led by ICONIQ Capital, Index Ventures, and Durable Capital Partners LP, with previous investors CapitalG (Google’s growth fund), Battery Ventures, and Dawn Capital also participating.

Collibra, originally a spin-out from Vrije Universiteit in Brussels, Belgium, today works with some 450 enterprises and other large organizations — customers include Adobe, Verizon (which owns TechCrunch), insurer AXA, and a number of healthcare providers. Its products cover a range of services focused on company data, including tools to help customers comply with local data protection policies, store data securely, and run analytics and more.

These are all tools that have long had a place in enterprise big data IT, but they have become increasingly used and in demand, both as data policies have expanded and as the prospects of what can be discovered through big data analytics have become more advanced. With that growth, many companies have realised that they are not in a position to use and store their data in the best possible way, and that is where companies like Collibra step in.

“Most large organizations are in data chaos,” Felix Van de Maele, co-founder and CEO, previously told us. “We help them understand what data they have, where they store it and [understand] whether they are allowed to use it.”

As you would expect with a big IT trend, Collibra is not the only company chasing this opportunity. Competitors include Informatica, IBM, Talend and Egnyte, among a number of others, but Collibra’s market position and its advanced technology are what have continued to impress investors.

“Durable Capital Partners invests in innovative companies that have significant potential to shape growing industries and build larger companies,” said Henry Ellenbogen, founder and chief investment officer for Durable Capital Partners LP, in a statement (Ellenbogen was formerly an investment manager at T. Rowe Price, and this is his first investment in Collibra through Durable). “We believe Collibra is a leader in the Data Intelligence category, a space that could have a tremendous impact on global business operations and a space that we expect will continue to grow as data becomes an increasingly critical asset.”

“We have a high degree of conviction in Collibra and the importance of the company’s mission to help organizations benefit from their data,” added Matt Jacobson, general partner at ICONIQ Capital and Collibra board member, in his own statement. “There is an increasing urgency for enterprises to harness their data for strategic business decisions. Collibra empowers organizations to use their data to make critical business decisions, especially in uncertain business environments.”


By Ingrid Lunden

BackboneAI scores $4.7M seed to bring order to intercompany data sharing

BackboneAI, an early-stage startup that wants to help companies dealing with lots of data, particularly coming from a variety of external sources, announced a $4.7 million seed investment today.

The round was led by Fika Ventures with participation from Boldstart Ventures, Dynamo Ventures, GGV Capital, MetaProp, Spider VC and several other unnamed investors.

Company founder Rob Bailey says he has spent a lot of time in his career watching how data flows in organizations. There are still a myriad of challenges related to moving data between organizations, and that’s what his company is trying to solve. “BackboneAI is an AI platform specifically built for automating data flows within and between companies,” he said.

This could involve any number of scenarios from keeping large, complex data catalogues up-to-date to coordinating the intricate flow of construction materials between companies or content rights management across an entertainment industry.

Bailey says that he spent 18 months talking to companies before he built the product. “What we found is that every company we talked to was, in some way or another, concerned about an absolute flood of data from all these different applications and from all the companies that they’re working with externally,” he explained.

The BackboneAI platform aims to solve a number of problems related to this. For starters, it automates the acquisition of this data, usually from third parties like suppliers, customers, regulatory agencies and so forth. Then it handles ingestion of the data, and finally it takes care of a lot of actual processing from external sources, while mapping it to internal systems like the company ERP system.

As an example, he cites an industrial supply company that may deal with a million SKUs across a couple of dozen divisions. Trying to track that with manual or even legacy systems is difficult. “They take all this product data in [from external suppliers], and then process the information in their own [internal] product catalog, and then finally present that data about those products to hundreds of thousands of customers. It’s an incredibly large and challenging data problem as you’re processing millions and millions of SKUs and orders, and you have to keep that data current on a regular basis,” he explained.
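For a purely illustrative picture of the kind of mapping problem Bailey describes, here is a minimal sketch in pandas of joining an external supplier feed to an internal catalog keyed by SKU; the column names and data are hypothetical, and this is not BackboneAI’s actual pipeline.

```python
# Illustrative only: map incoming supplier product data onto an internal
# catalog keyed by SKU. Column names and values are hypothetical.
import pandas as pd

supplier_feed = pd.DataFrame({
    "supplier_sku": ["A-100", "A-101"],
    "description": ["M6 hex bolt", "M8 hex bolt"],
    "unit_price": [0.12, 0.18],
})

internal_catalog = pd.DataFrame({
    "internal_sku": ["HW-0001", "HW-0002"],
    "supplier_sku": ["A-100", "A-101"],
})

# Join the external feed to internal SKUs so downstream systems (ERP,
# product catalog, storefront) see a single, current view of each product.
merged = internal_catalog.merge(supplier_feed, on="supplier_sku", how="left")
print(merged)
```

At the scale Bailey describes (millions of SKUs across dozens of divisions), the hard part is keeping such mappings current and automated, not the join itself.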

The company is just getting started. It spent 2019 incubating inside Boldstart Ventures. Today the company has close to 20 employees in New York City, and it has signed its first Fortune 500 customer. Bailey says they have 15 additional Fortune 500 companies in the pipeline. With the seed money, he hopes to build on this initial success.


By Ron Miller

Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and under an open governance model. The company announced the launch of Delta Lake earlier this year and even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data and putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

Spark 3.0, which is launching today, enables more of these use cases and speeds them up significantly, in addition to the launch of a new feature that enables you to add a pluggable data catalog to Spark.

Delta Lake, Ghodsi said, is essentially the data layer of the Lake House pattern. It brings support for ACID transactions to data lakes, scalable metadata handling, and data versioning, for example. All the data is stored in the Apache Parquet format and users can enforce schemas (and change them with relative ease if necessary).
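For readers who want to see what that looks like in practice, here is a minimal sketch of writing and reading a Delta table from PySpark. It assumes the open-source delta-spark package is installed and the Spark session is configured for Delta Lake; the table path is a placeholder.

```python
# Minimal Delta Lake sketch (assumes the delta-spark package is available).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-sketch")
    # Register Delta Lake's SQL extensions and catalog implementation.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Writing a DataFrame as a Delta table stores Parquet files plus a
# transaction log, which is what provides the ACID guarantees.
df = spark.createDataFrame([(1, "sensor-a"), (2, "sensor-b")], ["id", "source"])
df.write.format("delta").mode("overwrite").save("/tmp/events_delta")  # placeholder path

# Appends go through the same log; writes with a mismatched schema are
# rejected unless schema evolution is explicitly enabled (schema enforcement).
df.write.format("delta").mode("append").save("/tmp/events_delta")

# Data versioning ("time travel"): read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/events_delta")
v0.show()
```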

It’s interesting to see Databricks choose the Linux Foundation for this project, given that its roots are in the Apache Foundation. “We’re super excited to partner with them,” Ghodsi said about why the company chose the Linux Foundation. “They run the biggest projects on the planet, including the Linux project but also a lot of cloud projects. The cloud-native stuff is all in the Linux Foundation.”

“Bringing Delta Lake under the neutral home of the Linux Foundation will help the open source community dependent on the project develop the technology addressing how big data is stored and processed, both on-prem and in the cloud,” said Michael Dolan, VP of Strategic Programs at the Linux Foundation. “The Linux Foundation helps open source communities leverage an open governance model to enable broad industry contribution and consensus building, which will improve the state of the art for data storage and reliability.”


By Frederic Lardinois

Satya Nadella looks to the future with edge computing

Speaking today at the Microsoft Government Leaders Summit in Washington DC, Microsoft CEO Satya Nadella made the case for edge computing, even while pushing the Azure cloud as what he called “the world’s computer.”

While Amazon, Google and other competitors may have something to say about that, marketing hype aside, many companies are still in the midst of transitioning to the cloud. Nadella says the future of computing could actually be at the edge, where computing is done locally before data is transferred to the cloud for AI and machine learning purposes. What goes around, comes around.

But as Nadella sees it, this is not going to be about either edge or cloud. It’s going to be the two technologies working in tandem. “Now, all this is being driven by this new tech paradigm that we describe as the intelligent cloud and the intelligent edge,” he said today.

He said that to truly understand the impact the edge is going to have on computing, you have to look at research, which predicts there will be 50 billion connected devices in the world by 2030, a number even he finds astonishing. “I mean this is pretty stunning. We think about a billion Windows machines or a couple of billion smartphones. This is 50 billion [devices], and that’s the scope,” he said.

The key here is that these 50 billion devices, whether you call them edge devices or the Internet of Things, will be generating tons of data. That means you will have to develop entirely new ways of thinking about how all this flows together. “The capacity at the edge, that ubiquity is going to be transformative in how we think about computation in any business process of ours,” he said. As we generate ever-increasing amounts of data, whether we are talking about public sector use cases or any business need, it’s going to be the fuel for artificial intelligence, and he sees the sheer amount of that data driving new AI use cases.

“Of course when you have that rich computational fabric, one of the things that you can do is create this new asset, which is data and AI. There is not going to be a single application, a single experience that you are going to build, that is not going to be driven by AI, and that means you have to really have the ability to reason over large amounts of data to create that AI,” he said.

Nadella would be more than happy to have his audience take care of all that using Microsoft products, whether Azure compute, database, AI tools or edge computers like the Data Box Edge it introduced in 2018. While Nadella is probably right about the future of computing, all of this could apply to any cloud, not just Microsoft.

As computing shifts to the edge, it’s going to have a profound impact on the way we think about technology in general, but it’s probably not going to involve being tied to a single vendor, regardless of how comprehensive their offerings may be.


By Ron Miller

Data storage company Cloudian launches a new edge analytics subsidiary called Edgematrix

Cloudian, a company that enables businesses to store and manage massive amounts of data, announced today the launch of Edgematrix, a new unit focused on edge analytics for large data sets. Edgematrix, a majority-owned subsidiary of Cloudian, will first be available in Japan, where both companies are based. It has raised a $9 million Series A from strategic investors NTT Docomo, Shimizu Corporation and Japan Post Capital, as well as Cloudian co-founder and CEO Michael Tso and board director Jonathan Epstein. The funding will be used on product development, deployment and sales and marketing.

Cloudian itself has raised a total of $174 million, including a $94 million Series E round announced last year. Its products include the HyperStore platform, which allows businesses to store hundreds of petabytes of data on premises, and software for data analytics and machine learning. Edgematrix uses HyperStore for storing large-scale data sets and its own AI software and hardware for data processing at the “edge” of networks, closer to where data is collected from IoT devices like sensors.
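As an illustration of how such a storage platform is typically consumed, here is a sketch that assumes HyperStore exposes an S3-compatible API (an assumption, not something stated in this article), so a standard S3 client like boto3 can be pointed at a HyperStore endpoint; the endpoint, bucket and credentials below are hypothetical placeholders.

```python
# Illustrative only: talk to an S3-compatible object store (assumed here to
# be a HyperStore endpoint) with boto3. All names and credentials are fake.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Land a batch of raw sensor readings collected at the edge.
s3.put_object(
    Bucket="iot-raw",
    Key="sensors/2019-09-05/readings.json",
    Body=b'[{"sensor_id": 1, "value": 42.0}]',
)

# List what has been stored so far.
for obj in s3.list_objects_v2(Bucket="iot-raw").get("Contents", []):
    print(obj["Key"], obj["Size"])
```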

The company’s solutions were created for situations where real-time analytics is necessary. For example, it can be used to detect the make, model and year of cars on highways so targeted billboard ads can be displayed to their drivers.

Tso told TechCrunch in an email that Edgematrix was launched after Cloudian co-founder and president Hiroshi Ohta and a team spent two years working on technology to help Cloudian customers process and analyze their data more efficiently.

“With more and more data being created at the edge, including IoT data, there’s a growing need for being able to apply real-time data analysis and decision-making at or near the edge, minimizing the transmission costs and latencies involved in moving the data elsewhere,” said Tso. “Based on the initial success of a small Cloudian team developing AI software solutions and attracting a number of top-tier customers, we decided that the best way to build on this success was establishing a subsidiary with strategic investors.”

Edgematrix is launching in Japan first because spending on AI systems there is expected to grow faster than in any other market, at a compound annual growth rate of 45.3% from 2018 to 2023, according to IDC.

“Japan has been ahead of the curve as an early adopter of AI technology, with both the government and private sector viewing it as essential to boosting productivity,” said Tso. “Edgematrix will focus on the Japanese market for at least the next year, and assuming that all goes well, it would then expand to North America and Europe.”


By Catherine Shu

Microsoft acquires data privacy and governance service BlueTalon

Microsoft today announced that it has acquired BlueTalon, a data privacy and governance service that helps enterprises set policies for how their employees can access their data. The service then enforces those policies across most popular data environments and provides tools for auditing policies and access, too.

Neither Microsoft nor BlueTalon disclosed the financial details of the transaction. Ahead of today’s acquisition, BlueTalon had raised about $27.4 million, according to Crunchbase. Investors include Bloomberg Beta, Maverick Ventures, Signia Venture Partners and Stanford’s StartX fund.

[Image: BlueTalon Policy Engine and how it works]

“The IP and talent acquired through BlueTalon brings a unique expertise at the apex of big data, security and governance,” writes Rohan Kumar, Microsoft’s corporate VP for Azure Data. “This acquisition will enhance our ability to empower enterprises across industries to digitally transform while ensuring right use of data with centralized data governance at scale through Azure.”

Unsurprisingly, the BlueTalon team will become part of the Azure Data Governance group, where the team will work on enhancing Microsoft’s capabilities around data privacy and governance. Microsoft already offers access and governance control tools for Azure, of course. As virtually all businesses become more data-centric, though, the need for centralized access controls that work across systems is only going to increase and new data privacy laws aren’t making this process easier.

“As we began exploring partnership opportunities with various hyperscale cloud providers to better serve our customers, Microsoft deeply impressed us,” BlueTalon CEO Eric Tilenius, who has clearly read his share of “our incredible journey” blog posts, explains in today’s announcement. “The Azure Data team was uniquely thoughtful and visionary when it came to data governance. We found them to be the perfect fit for us in both mission and culture. So when Microsoft asked us to join forces, we jumped at the opportunity.”


By Frederic Lardinois

With Tableau and Mulesoft, Salesforce gains full view of enterprise data

Back in the 2010 timeframe, it was common to say that content was king, but after watching Google buy Looker for $2.6 billion last week and Salesforce nab Tableau for $15.7 billion this morning, it’s clear that data has ascended to the throne in a business context.

We have been hearing about Big Data for years, but we’ve probably reached a point in 2019 where the data onslaught is really having an impact on business. If you can find the key data nuggets in the big data pile, it can clearly be a competitive advantage, and companies like Google and Salesforce are pulling out their checkbooks to make sure they are in a position to help you out.

While Google, as a cloud infrastructure vendor, is trying to help companies on its platform and across the cloud understand and visualize all that data, Salesforce as a SaaS vendor might have a different reason — one that might surprise you — given that Salesforce was born in the cloud. But perhaps it recognizes something fundamental. If it truly wants to own the enterprise, it has to have a hybrid story, and with Mulesoft and Tableau, that’s precisely what it has — and why it was willing to spend around $23 billion to get it.

Making connections

Certainly, Salesforce chairman Marc Benioff has no trouble seeing the connections between his two big purchases over the last year. He sees the combination of Mulesoft connecting to the data sources and Tableau providing a way to visualize that data as a “beautiful thing.”


By Ron Miller

Tealium, a big data platform for structuring disparate customer information, raises $55M led by Silver Lake

The average enterprise today uses about 90 different software packages, with between 30 and 40 of them touching customers directly or indirectly. The data that comes out of those systems can prove to be very useful — to help other systems and employees work more intelligently, and to help companies make better business decisions — but only if it’s put in order. Now a startup called Tealium, which has built a system precisely to do that and works with the likes of Facebook and IBM to help manage their customer data, has raised a big round of funding to continue building out the services it provides.

Today, it is announcing a $55 million round of funding — a Series F from the looks of it — led by Silver Lake Waterman, the firm’s late-stage growth capital fund, with ABN AMRO, Bain Capital, Declaration Partners, Georgian Partners, Industry Ventures, Parkwood, and Presidio Ventures also participating.

Jeff Lunsford, Tealium’s CEO, said that the company is not disclosing valuation, but he did say that it was “substantially” higher than when the company was last priced three years ago. (For context, that valuation was $305 million in 2016, according to PitchBook — a figure Lunsford didn’t dispute when I spoke with him about it.)

He added that the company is close to profitability and is projected to make $100 million in revenues this year, and that this is being considered the company’s “final round” — presumably a sign that it will either no longer need external funding, or that if it does, the next step might be getting acquired or going public.

This brings the total raised by Tealium to $160 million.

The company’s rise over the last eight years has dovetailed with the rapid growth of big data. The movement of services to digital platforms has resulted in a sea of information. Much of that largely sits untapped, but those who are able to bring it to order can reap the rewards by gaining better insights into their organizations.

Tealium had its beginnings in amassing and ordering tags from internet traffic to help optimise marketing and so on — a business where it competes with the likes of Google and Adobe.

Over time, it has expanded to cover, and capitalise on, a much wider set of data sources that range well beyond web and commerce, and one use of the funding will be to continue expanding those data sources, and also how they are used, with an emphasis on using more AI, Lunsford said.

“There are new areas that touch customers like smart home and smart office hardware, and each requires a step up in integration for a company like us,” he said. “Then once you have it all centralised you could feed machine learning algorithms to have tighter predictions.”

That vast potential is one reason for the investor interest.

“Tealium enables enterprises to solve the customer data fragmentation problem by integrating and enriching data across sources, in real time, to create audiences while providing data governance and fidelity,” said Shawn O’Neill, managing director of Silver Lake Waterman, in a statement. “Jeff and his team have built a great platform and we are excited to support the company’s continued growth and investment in innovation.”

The rapid growth of digital services has already seen the company getting a big boost in terms of the data that is passing through its cloud-based platform: it has had a 300 percent year-over-year increase in visitor profiles created, with current tech customers including the likes of Facebook, IBM, Visa and others from across a variety of sectors, such as healthcare, finance and more.

“You’d be surprised how many big tech companies use Tealium,” Lunsford said. “Even they have a limited amount of bandwidth when it comes to developing their internal platforms.”

People like to say that “data is the new oil”, but these days that expression has taken on perhaps an unintended meaning: just like the overconsumption of oil and fossil fuels in general is viewed as detrimental to the long-term health of our planet, the overconsumption of data has also become a problematic spectre in our pervasive world of tech.

Governments — the European Union being one notable example — are taking up the challenge of that latter issue with new regulations, specifically GDPR. Interestingly, Lunsford says this has been a good thing rather than a bad thing for his company, as it gives a much clearer directive to companies about what they can use, and how it can be used.

“They want to follow the law,” he said of their clients, “and we give them the data freedom and control to do that.” It’s not the only company tackling the business opportunity of being a big-data repository at a time when data misuse is being scrutinised more than ever: InCountry, which launched weeks ago, is also banking on this gap in the market.

I’d argue that this could potentially be one more reason why Tealium is keen on expanding to areas like IoT and other sources of customer information: just like the sea, the pool of data that’s there for the tapping is nearly limitless.


By Ingrid Lunden

Why Daimler moved its big data platform to the cloud

Like virtually every big enterprise company, a few years ago, the German auto giant Daimler decided to invest in its own on-premises data centers. And while those aren’t going away anytime soon, the company today announced that it has successfully moved its on-premises big data platform to Microsoft’s Azure cloud. This new platform, which the company calls eXtollo, is Daimler’s first major service to run outside of its own data centers, though it’ll probably not be the last.

As Guido Vetter, the head of Daimler’s corporate center of excellence for advanced analytics and big data, told me, the company started getting interested in big data about five years ago. “We invested in technology — the classical way, on-premise — and got a couple of people on it. And we were investigating what we could do with data because data is transforming our whole business as well,” he said.

By 2016, the size of the organization had grown to the point where a more formal structure was needed to enable the company to handle its data at a global scale. At the time, the buzzword was ‘data lakes’ and the company started building its own in order to build out its analytics capacities.

[Image: Electric line-up, Daimler AG]

“Sooner or later, we hit the limits as it’s not our core business to run these big environments,” Vetter said. “Flexibility and scalability are what you need for AI and advanced analytics and our whole operations are not set up for that. Our backend operations are set up for keeping a plant running and keeping everything safe and secure.” But in this new world of enterprise IT, companies need to be able to be flexible and experiment — and, if necessary, throw out failed experiments quickly.

So about a year and a half ago, Vetter’s team started the eXtollo project to bring all the company’s activities around advanced analytics, big data and artificial intelligence into the Azure Cloud and just over two weeks ago, the team shut down its last on-premises servers after slowly turning on its solutions in Microsoft’s data centers in Europe, the U.S. and Asia. All in all, the actual transition between the on-premises data centers and the Azure cloud took about nine months. That may not seem fast, but for an enterprise project like this, that’s about as fast as it gets (and for a while, it fed all new data into both its on-premises data lake and Azure).

If you work for a startup, then all of this probably doesn’t seem like a big deal, but for a more traditional enterprise like Daimler, even just giving up control over the physical hardware where your data resides was a major culture change and something that took quite a bit of convincing. In the end, the solution came down to encryption.

“We needed the means to secure the data in the Microsoft data center with our own means that ensure that only we have access to the raw data and work with the data,” explained Vetter. In the end, the company decided to use the Azure Key Vault to manage and rotate its encryption keys. Indeed, Vetter noted that knowing that the company had full control over its own data was what allowed this project to move forward.
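For a sense of what customer-managed key handling in Azure Key Vault can look like, here is a minimal sketch using the azure-identity and azure-keyvault-keys Python SDKs. The vault URL and key name are hypothetical, and this is not a description of Daimler’s actual configuration.

```python
# Minimal sketch of key management with Azure Key Vault
# (pip install azure-identity azure-keyvault-keys). Names are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

credential = DefaultAzureCredential()
client = KeyClient(
    vault_url="https://example-vault.vault.azure.net",  # hypothetical vault
    credential=credential,
)

# Create an RSA key that services can use to wrap/unwrap data-encryption keys.
key = client.create_rsa_key("extollo-data-key", size=2048)
print(key.name, key.key_type)

# Rotation here means creating a new version of the same key; callers always
# resolve the latest version through the vault rather than caching material.
new_version = client.create_rsa_key("extollo-data-key", size=2048)
print(new_version.properties.version)
```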

Vetter tells me that the company obviously looked at Microsoft’s competitors as well, but he noted that his team didn’t find a compelling offer from other vendors in terms of functionality and the security features that it needed.

Today, Daimler’s big data unit uses tools like HDInsight and Azure Databricks, which cover more than 90 percent of the company’s current use cases. In the future, Vetter also wants to make it easier for less experienced users to use self-service tools to launch AI and analytics services.

While cost is often a factor that counts against the cloud, since renting server capacity isn’t cheap, Vetter argues that this move will actually save the company money and that storage costs, especially, are going to be lower in the cloud than in its on-premises data center (and chances are that Daimler, given its size and prestige as a customer, isn’t exactly paying the same rack rate that others are paying for the Azure services).

As with so many big data AI projects, predictions are the focus of much of what Daimler is doing. That may mean looking at a car’s data and error code and helping the technician diagnose an issue or doing predictive maintenance on a commercial vehicle. Interestingly, the company isn’t currently bringing any of its own IoT data from its plants to the cloud. That’s all managed in the company’s on-premises data centers because it wants to avoid the risk of having to shut down a plant because its tools lost the connection to a data center, for example.


By Frederic Lardinois

Microsoft Azure sets its sights on more analytics workloads

Enterprises now amass huge amounts of data, both from their own tools and applications, as well as from the SaaS applications they use. For a long time, that data was basically exhaust. Maybe it was stored for a while to fulfill some legal requirements, but then it was discarded. Now, data is what drives machine learning models, and the more data you have, the better. It’s maybe no surprise, then, that the big cloud vendors started investing in data warehouses and lakes early on. But that’s just a first step. After that, you also need the analytics tools to make all of this data useful.

Today, it’s Microsoft’s turn to shine the spotlight on its data analytics services. The actual news here is pretty straightforward. Two of these services are moving into general availability: the second generation of Azure Data Lake Storage for big data analytics workloads and Azure Data Explorer, a managed service that makes ad-hoc analysis of massive data volumes easier. Microsoft is also previewing a new feature in Azure Data Factory, its graphical no-code service for building data transformations. Data Factory now features the ability to map data flows.
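To make the storage piece concrete, here is a minimal sketch of landing a file in Azure Data Lake Storage Gen2 with the azure-storage-file-datalake Python SDK; the account, filesystem and path are hypothetical placeholders.

```python
# Minimal sketch of writing raw data into Azure Data Lake Storage Gen2
# (pip install azure-identity azure-storage-file-datalake). Names are fake.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://exampleaccount.dfs.core.windows.net",  # hypothetical
    credential=DefaultAzureCredential(),
)

# A "filesystem" is the Data Lake equivalent of a blob container.
fs = service.get_file_system_client("raw")
file_client = fs.get_file_client("events/2019-02-07/clicks.json")

data = b'[{"user": "u1", "page": "/pricing"}]'
file_client.create_file()
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))
```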

Those individual news pieces are interesting if you are a user or are considering Azure for your big data workloads, but what’s maybe more important here is that Microsoft is trying to offer a comprehensive set of tools for managing and storing this data — and then using it for building analytics and AI services.

(Photo credit: Josh Edelson/AFP/Getty Images)

“AI is a top priority for every company around the globe,” Julia White, Microsoft’s corporate VP for Azure, told me. “And as we are working with our customers on AI, it becomes clear that their analytics often aren’t good enough for building an AI platform.” These companies are generating plenty of data, which then has to be pulled into analytics systems. She stressed that she couldn’t remember a customer conversation in recent months that didn’t focus on AI. “There is urgency to get to the AI dream,” White said, but the growth and variety of data presents a major challenge for many enterprises. “They thought this was a technology that was separate from their core systems. Now it’s expected for both customer-facing and line-of-business applications.”

Data Lake Storage helps with managing this variety of data since it can handle both structured and unstructured data (and is optimized for the Spark and Hadoop analytics engines). The service can ingest any kind of data — yet Microsoft still promises that it will be very fast. “The world of analytics tended to be defined by having to decide upfront and then building rigid structures around it to get the performance you wanted,” explained White. Data Lake Storage, on the other hand, wants to offer the best of both worlds.

Likewise, White argued that while many enterprises used to keep these services on their on-premises servers, many of them are still appliance-based. But she believes the cloud has now reached the point where the price/performance calculations are in its favor. It took a while to get to this point, though, and to convince enterprises. White noted that for the longest time, enterprises that looked at their analytics projects thought $300 million projects took forever, tied up lots of people and were frankly a bit scary. “But also, what we had to offer in the cloud hasn’t been amazing until some of the recent work,” she said. “We’ve been on a journey — as well as the other cloud vendors — and the price performance is now compelling.” And it sure helps that if enterprises want to meet their AI goals, they’ll now have to tackle these workloads, too.


By Frederic Lardinois

Big companies are not becoming data-driven fast enough

I remember watching MIT professor Andrew McAfee years ago telling stories about the importance of data over gut feeling, whether it was predicting successful wines or making sound business decisions. We have been hearing about big data and data-driven decision making for so long, you would think it has become hardened into our largest organizations by now. As it turns out, new research by NewVantage Partners finds that most large companies are having problems implementing an organization-wide, data-driven strategy.

McAfee was fond of saying that before the data deluge we have today, the way most large organizations made decisions was via the HiPPO — the highest paid person’s opinion. Then he would chide the audience that this was not the proper way to run your business. Data, not gut feelings, even those based on experience, should drive important organizational decisions.

While companies haven’t failed to recognize McAfee’s advice, the NVP report suggests they are having problems implementing data-driven decision making across organizations. There are plenty of technological solutions out there today to help them, from startups all the way to the largest enterprise vendors, but the data (see, you always need to go back to the data) suggests that it’s not a technology problem, it’s a people problem.

Executives can have farsighted vision that their organizations need to be data-driven. They can acquire all of the latest solutions to bring data to the forefront, but unless they combine that with a broad cultural shift and a deep understanding of how to use that data inside business processes, they will continue to struggle.

The study’s authors, Randy Bean and Thomas H. Davenport, wrote about the people problem in their study’s executive summary. “We hear little about initiatives devoted to changing human attitudes and behaviors around data. Unless the focus shifts to these types of activities, we are likely to see the same problem areas in the future that we’ve observed year after year in this survey.”

The survey found that 72 percent of respondents have failed in this regard, reporting they haven’t been able to create a data-driven culture, whatever that means to individual respondents. Meanwhile, 69 percent reported they had failed to create a data-driven organization, although it would seem that these two metrics would be closely aligned.

Perhaps most discouraging of all is that the data is trending the wrong way. Over the last several years, the report’s authors say, the share of organizations calling themselves data-driven has actually dropped each year, from 37.1% in 2017 to 32.4% in 2018 to 31.0% in the latest survey.

This matters on so many levels, but consider that as companies shift to artificial intelligence and machine learning, these technologies rely on abundant amounts of data to work effectively. What’s more, every organization, regardless of its size, is generating vast amounts of data, simply as part of being a digital business in the 21st century. They need to find a way to control this data to make better decisions and understand their customers better. It’s essential.

There is so much talk about innovation and disruption, and understanding and affecting company culture, but so much of all this is linked. You need to be more agile. You need to be more digital. You need to be transformational. You need to be all of these things — and data is at the center of all of it.

Data has been called the new oil often enough to be cliche, but these results reveal that the lesson is failing to get through. Companies need to be data-driven now, this instant. This isn’t something to be working towards at this point. This is something you need to be doing, unless your ultimate goal is to become irrelevant.


By Ron Miller

Databricks raises $250M at a $2.75B valuation for its analytics platform

Databricks, the company behind the Apache Spark big data analytics engine, today announced that it has raised a $250 million Series E round led by Andreessen Horowitz. Coatue Management, Microsoft and NEA also participated in this round, which brings the company’s total funding to $498.5 million. Microsoft’s involvement here is probably a bit of a surprise, but it’s worth noting that it also worked with Databricks on the launch of Azure Databricks as a first-party service on the platform, something that’s still a rarity in the Azure cloud.

As Databricks also today announced, its annual recurring revenue now exceeds $100 million. The company didn’t share whether it’s cash flow-positive at this point, but Databricks CEO and co-founder Ali Ghodsi shared that the company’s valuation is now $2.75 billion.

Current customers, which the company says number around 2,000, include the likes of Nielsen, Hotels.com, Overstock, Bechtel, Shell and HP.

While Databricks is obviously known for its contributions to Apache Spark, the company itself monetizes that work by offering its Unified Analytics platform on top of it. This platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers shared notebooks and tools for building, managing and monitoring data pipelines, and then uses that data to build machine learning models, for example. Indeed, training and deploying these models is one of the company’s focus areas these days, which makes sense, given that this is one of the main use cases for big data, after all.

On top of that, Databricks also offers a fully managed service for hosting all of these tools.

“Databricks is the clear winner in the big data platform race,” said Ben Horowitz, co-founder and general partner at Andreessen Horowitz, in today’s announcement. “In addition, they have created a new category atop their world-beating Apache Spark platform called Unified Analytics that is growing even faster. As a result, we are thrilled to invest in this round.”

Ghodsi told me that Horowitz was also instrumental in getting the company to re-focus on growth. The company was already growing fast, of course, but Horowitz asked him why Databricks wasn’t growing faster. Unsurprisingly, given that it’s an enterprise company, that means aggressively hiring a larger sales force — and that’s costly. Hence the company’s need to raise at this point.

As Ghodsi told me, one of the areas the company wants to focus on is the Asia Pacific region, where overall cloud usage is growing fast. The other area the company is focusing on is support for more verticals like mass media and entertainment, federal agencies and fintech firms, which also comes with its own cost, given that the experts there don’t come cheap.

Ghodsi likes to call this “boring AI,” since it’s not as exciting as self-driving cars. In his view, though, the enterprise companies that don’t start using machine learning now will inevitably be left behind in the long run. “If you don’t get there, there’ll be no place for you in the next 20 years,” he said.

Engineering, of course, will also get a chunk of this new funding, with an emphasis on relatively new products like MLflow and Delta, two tools Databricks recently developed that make it easier to manage the life cycle of machine learning models and build the necessary data pipelines to feed them.
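For a sense of what MLflow handles in practice, here is a minimal sketch of experiment tracking with the open-source mlflow package; the experiment name, parameters and metric are illustrative only.

```python
# Minimal MLflow tracking sketch (pip install mlflow). Names and numbers
# below are illustrative, not from any real training run.
import mlflow

mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Record the configuration of this training run...
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_param("num_trees", 200)

    # ...train a model here, then record how it performed.
    mlflow.log_metric("auc", 0.91)

    # Arbitrary artifacts (plots, serialized models, feature lists) can be
    # attached to the run as well.
    with open("feature_list.txt", "w") as f:
        f.write("tenure,usage,plan_type\n")
    mlflow.log_artifact("feature_list.txt")
```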


By Frederic Lardinois

Has the fight over privacy changed at all in 2019?

Few issues divide the tech community quite like privacy. Much of Silicon Valley’s wealth has been built on data-driven advertising platforms, and yet, there remain constant concerns about the invasiveness of those platforms.

Such concerns have intensified in just the last few weeks as France’s privacy regulator placed a record fine on Google under Europe’s General Data Protection Regulation (GDPR) rules, which the company now plans to appeal. Yet with global platform usage and service sales continuing to tick up, we asked a panel of eight privacy experts: “Has anything fundamentally changed around privacy in tech in 2019? What is the state of privacy and has the outlook changed?”

This week’s participants include:

TechCrunch is experimenting with new content forms. Consider this a recurring venue for debate, where leading experts – with a diverse range of vantage points and opinions – provide us with thoughts on some of the biggest issues currently in tech, startups and venture. If you have any feedback, please reach out: [email protected].


Thoughts & Responses:


Albert Gidari

Albert Gidari is the Consulting Director of Privacy at the Stanford Center for Internet and Society. He was a partner for over 20 years at Perkins Coie LLP, achieving a top-ranking in privacy law by Chambers, before retiring to consult with CIS on its privacy program. He negotiated the first-ever “privacy by design” consent decree with the Federal Trade Commission. A recognized expert on electronic surveillance law, he brought the first public lawsuit before the Foreign Intelligence Surveillance Court, seeking the right of providers to disclose the volume of national security demands received and the number of affected user accounts, ultimately resulting in greater public disclosure of such requests.

There is no doubt that the privacy environment changed in 2018 with the passage of California’s Consumer Privacy Act (CCPA), implementation of the European Union’s General Data Protection Regulation (GDPR), and new privacy laws enacted around the globe.

“While privacy regulation seeks to make tech companies better stewards of the data they collect and their practices more transparent, in the end, it is a deception to think that users will have more “privacy.””

For one thing, large tech companies have grown huge privacy compliance organizations to meet their new regulatory obligations. For another, the major platforms now are lobbying for passage of a federal privacy law in the U.S. This is not surprising after a year of privacy miscues, breaches and negative privacy news. But does all of this mean a fundamental change is in store for privacy? I think not.

The fundamental model sustaining the Internet is based upon the exchange of user data for free service. As long as advertising dollars drive the growth of the Internet, regulation simply will tinker around the edges, setting sideboards to dictate the terms of the exchange. The tech companies may be more accountable for how they handle data and to whom they disclose it, but the fact is that data will continue to be collected from all manner of people, places and things.

Indeed, if the past year has shown anything it is that two rules are fundamental: (1) everything that can be connected to the Internet will be connected; and (2) everything that can be collected, will be collected, analyzed, used and monetized. It is inexorable.

While privacy regulation seeks to make tech companies better stewards of the data they collect and their practices more transparent, in the end, it is a deception to think that users will have more “privacy.” No one even knows what “more privacy” means. If it means that users will have more control over the data they share, that is laudable but not achievable in a world where people have no idea how many times or with whom they have shared their information already. Can you name all the places over your lifetime where you provided your SSN and other identifying information? And given that the largest data collector (and likely least secure) is government, what does control really mean?

All this is not to say that privacy regulation is futile. But it is to recognize that nothing proposed today will result in a fundamental shift in privacy policy or provide a panacea of consumer protection. Better privacy hygiene and more accountability on the part of tech companies is a good thing, but it doesn’t solve the privacy paradox that those same users who want more privacy broadly share their information with others who are less trustworthy on social media (ask Jeff Bezos), or that the government hoovers up data at a rate that makes tech companies look like pikers (visit a smart city near you).

Many years ago, I used to practice environmental law. I watched companies strive to comply with new laws intended to control pollution by creating compliance infrastructures and teams aimed at preventing, detecting and deterring violations. Today, I see the same thing at the large tech companies – hundreds of employees have been hired to do “privacy” compliance. The language is the same too: cradle to grave privacy documentation of data flows for a product or service; audits and assessments of privacy practices; data mapping; sustainable privacy practices. In short, privacy has become corporatized and industrialized.

True, we have cleaner air and cleaner water as a result of environmental law, but we also have made it lawful and built businesses around acceptable levels of pollution. Companies still lawfully dump arsenic in the water and belch volatile organic compounds in the air. And we still get environmental catastrophes. So don’t expect today’s “Clean Privacy Law” to eliminate data breaches or profiling or abuses.

The privacy world is complicated and few people truly understand the number and variety of companies involved in data collection and processing, and none of them are in Congress. The power to fundamentally change the privacy equation is in the hands of the people who use the technology (or choose not to) and in the hands of those who design it, and maybe that’s where it should be.


Gabriel Weinberg

Gabriel Weinberg is the Founder and CEO of privacy-focused search engine DuckDuckGo.

Coming into 2019, interest in privacy solutions is truly mainstream. There are signs of this everywhere (media, politics, books, etc.) and also in DuckDuckGo’s growth, which has never been faster. With solid majorities now seeking out private alternatives and other ways to be tracked less online, we expect governments to continue to step up their regulatory scrutiny and for privacy companies like DuckDuckGo to continue to help more people take back their privacy.

“Consumers don’t necessarily feel they have anything to hide – but they just don’t want corporations to profit off their personal information, or be manipulated, or unfairly treated through misuse of that information.”

We’re also seeing companies take action beyond mere regulatory compliance, reflecting this new majority will of the people and its tangible effect on the market. Just this month we’ve seen Apple’s Tim Cook call for stronger privacy regulation and the New York Times report strong ad revenue in Europe after stopping the use of ad exchanges and behavioral targeting.

At its core, this groundswell is driven by the negative effects that stem from the surveillance business model. The percentage of people who have noticed ads following them around the Internet, or who have had their data exposed in a breach, or who have had a family member or friend experience some kind of credit card fraud or identity theft issue, reached a boiling point in 2018. On top of that, people learned of the extent to which the big platforms like Google and Facebook that collect the most data are used to propagate misinformation, discrimination, and polarization. Consumers don’t necessarily feel they have anything to hide – but they just don’t want corporations to profit off their personal information, or be manipulated, or unfairly treated through misuse of that information. Fortunately, there are alternatives to the surveillance business model and more companies are setting a new standard of trust online by showcasing alternative models.


Melika Carroll

Melika Carroll is Senior Vice President, Global Government Affairs at Internet Association, which represents over 45 of the world’s leading internet companies, including Google, Facebook, Amazon, Twitter, Uber, Airbnb and others.

We support a modern, national privacy law that provides people meaningful control over the data they provide to companies so they can make the most informed choices about how that data is used, seen, and shared.

“Any national privacy framework should provide the same protections for people’s data across industries, regardless of whether it is gathered offline or online.”

Internet companies believe all Americans should have the ability to access, correct, delete, and download the data they provide to companies.

Americans will benefit most from a federal approach to privacy – as opposed to a patchwork of state laws – that protects their privacy regardless of where they live. If someone in New York is video chatting with their grandmother in Florida, they should both benefit from the same privacy protections.

It’s also important to consider that all companies – both online and offline – use and collect data. Any national privacy framework should provide the same protections for people’s data across industries, regardless of whether it is gathered offline or online.

Two other important pieces of any federal privacy law include user expectations and the context in which data is shared with third parties. Expectations may vary based on a person’s relationship with a company, the service they expect to receive, and the sensitivity of the data they’re sharing. For example, you expect a car rental company to be able to track the location of the rented vehicle that doesn’t get returned. You don’t expect the car rental company to track your real-time location and sell that data to the highest bidder. Additionally, the same piece of data can have different sensitivities depending on the context in which it’s used or shared. For example, your name on a business card may not be as sensitive as your name on the sign-in sheet at an addiction support group meeting.

This is a unique time in Washington as there is bipartisan support in both chambers of Congress as well as in the administration for a federal privacy law. Our industry is committed to working with policymakers and other stakeholders to find an American approach to privacy that protects individuals’ privacy and allows companies to innovate and develop products people love.


Johnny Ryan

Dr. Johnny Ryan FRHistS is Chief Policy & Industry Relations Officer at Brave. His previous roles include Head of Ecosystem at PageFair, and Chief Innovation Officer of The Irish Times. He has a PhD from the University of Cambridge, and is a Fellow of the Royal Historical Society.

Tech companies will probably have to adapt to two privacy trends.

“As lawmakers and regulators in Europe and in the United States start to think of “purpose specification” as a tool for anti-trust enforcement, tech giants should beware.”

First, the GDPR is emerging as a de facto international standard.

In the coming years, the application of GDPR-like laws for commercial use of consumers’ personal data in the EU, Britain (post-EU), Japan, India, Brazil, South Korea, Malaysia, Argentina, and China will bring more than half of global GDP under a similar standard.

Whether this emerging standard helps or harms United States firms will be determined by whether the United States enacts and actively enforces robust federal privacy laws. Unless there is a federal GDPR-like law in the United States, there may be a degree of friction and the potential of isolation for United States companies.

However, there is an opportunity in this trend. The United States can assume the global lead by doing two things. First, enact a federal law that borrows from the GDPR, including a comprehensive definition of “personal data”, and robust “purpose specification”. Second, invest in world-leading regulation that pursues test cases, and defines practical standards. Cutting edge enforcement of common principles-based standards is de facto leadership.

Second, privacy and antitrust law are moving closer to each other, and might squeeze big tech companies very tightly indeed.

Big tech companies “cross-use” user data from one part of their business to prop up others. The result is that a company can leverage all the personal information accumulated from its users in one line of business, and for one purpose, to dominate other lines of business too.

This is likely to have anti-competitive effects. Rather than competing on the merits, the company can enjoy the unfair advantage of massive network effects even though it may be starting from scratch in a new line of business. This stifles competition and hurts innovation and consumer choice.

Antitrust authorities in other jurisdictions have addressed this. In 2015, the Belgian National Lottery was fined for re-using personal information acquired through its monopoly for a different, and incompatible, line of business.

As lawmakers and regulators in Europe and in the United States start to think of “purpose specification” as a tool for anti-trust enforcement, tech giants should beware.


John Miller

John Miller is the VP for Global Policy and Law at the Information Technology Industry Council (ITI), a D.C.-based advocacy group for the high tech sector. Miller leads ITI’s work on cybersecurity, privacy, surveillance, and other technology and digital policy issues.

Data has long been the lifeblood of innovation. And protecting that data remains a priority for individuals, companies and governments alike. However, as times change and innovation progresses at a rapid rate, it’s clear the laws protecting consumers’ data and privacy must evolve as well.

“Data has long been the lifeblood of innovation. And protecting that data remains a priority for individuals, companies and governments alike.”

As the global regulatory landscape shifts, there is now widespread agreement among business, government, and consumers that we must modernize our privacy laws, and create an approach to protecting consumer privacy that works in today’s data-driven reality, while still delivering the innovations consumers and businesses demand.

More and more, lawmakers and stakeholders acknowledge that an effective privacy regime provides meaningful privacy protections for consumers regardless of where they live. Approaches like the framework ITI released last fall must offer an interoperable solution that can serve as a model for governments worldwide, providing an alternative to a patchwork of laws that could create confusion and uncertainty over what protections individuals have.

Companies are also increasingly aware of the critical role they play in protecting privacy. Looking ahead, the tech industry will continue to develop mechanisms to hold us accountable, including recommendations that any privacy law mandate companies identify, monitor, and document uses of known personal data, while ensuring the existence of meaningful enforcement mechanisms.


Nuala O’Connor

Nuala O’Connor is president and CEO of the Center for Democracy & Technology, a global nonprofit committed to the advancement of digital human rights and civil liberties, including privacy, freedom of expression, and human agency. O’Connor has served in a number of presidentially appointed positions, including as the first statutorily mandated chief privacy officer in U.S. federal government when she served at the U.S. Department of Homeland Security. O’Connor has held senior corporate leadership positions on privacy, data, and customer trust at Amazon, General Electric, and DoubleClick. She has practiced at several global law firms including Sidley Austin and Venable. She is an advocate for the use of data and internet-enabled technologies to improve equity and amplify marginalized voices.

For too long, Americans’ digital privacy has varied widely, depending on the technologies and services we use, the companies that provide those services, and our capacity to navigate confusing notices and settings.

“Americans deserve comprehensive protections for personal information – protections that can’t be signed, or check-boxed, away.”

We are burdened with trying to make informed choices that align with our personal privacy preferences on hundreds of devices and thousands of apps, and with reading and parsing as many different policies and settings. No individual has the time or capacity to manage their privacy in this way, nor is it a good use of time in our increasingly busy lives. These notices, choices and checkboxes have become privacy theater, not privacy reality.

In 2019, the legal landscape for data privacy is changing, and so is the public perception of how companies handle data. As more information comes to light about the effects of companies’ data practices and their myriad stewardship missteps, Americans are surprised and shocked by what they’re learning. They’re increasingly paying attention, and questioning why they are still overburdened and unprotected. And with intensifying scrutiny by the media, as well as state and local lawmakers, companies are recognizing the need for a clear and nationally consistent set of rules.

Personal privacy is the cornerstone of the digital future people want. Americans deserve comprehensive protections for personal information – protections that can’t be signed, or check-boxed, away. The Center for Democracy & Technology wants to help craft those legal principles to solidify Americans’ digital privacy rights for the first time.


Chris Baker

Chris Baker is Senior Vice President and General Manager of EMEA at Box.

Last year saw data privacy hit the headlines as businesses and consumers alike were forced to navigate the implementation of GDPR. But it’s far from over.

“…customers will have trust in a business when they are given more control over how their data is used and processed”

2019 will be the year that the rest of the world catches up to the legislative example set by Europe, as similar data regulations come to the forefront. Organizations must ensure they are compliant with regional data privacy regulations, and more GDPR-like policies will start to have an impact. This can present a headache when it comes to data management, especially if you’re operating internationally. However, customers will have trust in a business when they are given more control over how their data is used and processed, and customers can rest assured knowing that no matter where they are in the world, businesses must meet the highest bar possible when it comes to data security.

Starting with the U.S., 2019 will see larger corporations opt in to GDPR to support global business practices. At the same time, local data regulators will lift large sections of the EU legislative framework and implement those rules in their own countries. 2018 was the year of GDPR in Europe; 2019 will be the year of GDPR globally.


Christopher Wolf

Christopher Wolf is the Founder and Chair of the Future of Privacy Forum think tank, and is senior counsel at Hogan Lovells focusing on internet law, privacy and data protection policy.

With the EU GDPR in effect since last May (setting a standard other nations are emulating), with the adoption of a highly regulatory and broadly applicable state privacy law in California last summer (and similar laws adopted or proposed in other states), and with intense focus on the data collection and sharing practices of large tech companies, the time may have come when Congress will adopt a comprehensive federal privacy law.

“Regardless of the outcome of the debate over a new federal privacy law, the issue of the privacy and protection of personal data is unlikely to recede.”

Complicating the adoption of a federal law will be the issue of preemption of state laws and what to do with highly developed sectoral laws like HIPAA and Gramm-Leach-Bliley. Also to be determined is whether to expand the FTC’s regulatory powers. Regardless of the outcome of the debate over a new federal privacy law, the issue of the privacy and protection of personal data is unlikely to recede.


By Arman Tabatabai

Dataiku raises $101 million for its collaborative data science platform

Dataiku wants to turn buzzwords into an actual service. The company has been building data tools for many years, since before everybody started talking about big data, data science and machine learning.

And the company just raised $101 million in a round led by Iconiq Capital with Alven Capital, Battery Ventures, Dawn Capital and FirstMark Capital also participating.

If you’re generating a lot of data, Dataiku helps you find the meaning behind those data sets. First, you import your data by connecting Dataiku to your storage system. The platform supports dozens of database formats and sources — Hadoop, NoSQL, images, you name it.

You can then use Dataiku to visualize your data, clean your data set, run some algorithms on your data in order to build a machine learning model, deploy it and more. Dataiku has a visual coding tool, or you can use your own code.
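To make that workflow concrete, here is a minimal sketch in plain Python (using pandas and scikit-learn rather than Dataiku’s own tooling) of the import, clean, model and score loop the platform packages behind its visual interface. The file name, column names and churn example are hypothetical, chosen only to illustrate the steps described above.

```python
# A minimal sketch (not Dataiku's API) of the kind of workflow the platform
# streamlines: import a data set, clean it, train a model, and evaluate it.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Import: load customer records (hypothetical file and column names).
df = pd.read_csv("customers.csv")

# Clean: drop duplicates and fill missing usage values with the median.
df = df.drop_duplicates()
df["monthly_usage"] = df["monthly_usage"].fillna(df["monthly_usage"].median())

# Model: predict churn from a few numeric features.
features = ["monthly_usage", "tenure_months", "support_tickets"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate; a deployed version of this model could then score new customers.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

Each of those steps — connecting to the source, cleaning, training and evaluating — corresponds to a node Dataiku exposes either visually or as code.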

But Dataiku isn’t just a tool for data scientists. Even if you’re a business analyst, you can visualize and extract data from Dataiku directly. And because of its software-as-a-service approach, your entire team of data scientists and data analysts can collaborate on Dataiku.

Clients use it to track churn, detect fraud, forecast demand, optimize lifetime values and more. Customers include General Electric, Sephora, Unilever, KUKA, FOX and BNP Paribas.

With today’s funding round, the company plans to double its staff. Dataiku currently employs 200 people across New York, Paris and London, and plans to open offices in Singapore and Sydney as well.


By Romain Dillet