Microsoft Azure sets its sights on more analytics workloads

Enterprises now amass huge amounts of data, both from their own tools and applications, as well as from the SaaS applications they use. For a long time, that data was basically exhaust. Maybe it was stored for a while to fulfill some legal requirements, but then it was discarded. Now, data is what drives machine learning models, and the more data you have, the better. It’s maybe no surprise, then, that the big cloud vendors started investing in data warehouses and lakes early on. But that’s just a first step. After that, you also need the analytics tools to make all of this data useful.

Today, it’s Microsoft turn to shine the spotlight on its data analytics services. The actual news here is pretty straightforward. Two of these are services that are moving into general availability: the second generation of Azure Data Lake Storage for big data analytics workloads and Azure Data Explorer, a managed service that makes easier ad-hoc analysis of massive data volumes. Microsoft is also previewing a new feature in Azure Data Factory, its graphical no-code service for building data transformation. Data Factory now features the ability to map data flows.

Those individual news pieces are interesting if you are a user or are considering Azure for your big data workloads, but what’s maybe more important here is that Microsoft is trying to offer a comprehensive set of tools for managing and storing this data — and then using it for building analytics and AI services.

(Photo credit:Josh Edelson/AFP/Getty Images)

“AI is a top priority for every company around the globe,” Julia White, Microsoft’s corporate VP for Azure, told me. “And as we are working with our customers on AI, it becomes clear that their analytics often aren’t good enough for building an AI platform.” These companies are generating plenty of data, which then has to be pulled into analytics systems. She stressed that she couldn’t remember a customer conversation in recent months that didn’t focus on AI. “There is urgency to get to the AI dream,” White said, but the growth and variety of data presents a major challenge for many enterprises. “They thought this was a technology that was separate from their core systems. Now it’s expected for both customer-facing and line-of-business applications.”

Data Lake Storage helps with managing this variety of data since it can handle both structured and unstructured data (and is optimized for the Spark and Hadoop analytics engines). The service can ingest any kind of data — yet Microsoft still promises that it will be very fast. “The world of analytics tended to be defined by having to decide upfront and then building rigid structures around it to get the performance you wanted,” explained White. Data Lake Storage, on the other hand, wants to offer the best of both worlds.

Likewise, White argued that while many enterprises used to keep these services on their on-premises servers, many of them are still appliance-based. But she believes the cloud has now reached the point where the price/performance calculations are in its favor. It took a while to get to this point, though, and to convince enterprises. White noted that for the longest time, enterprises that looked at their analytics projects thought $300 million projects took forever, tied up lots of people and were frankly a bit scary. “But also, what we had to offer in the cloud hasn’t been amazing until some of the recent work,” she said. “We’ve been on a journey — as well as the other cloud vendors — and the price performance is now compelling.” And it sure helps that if enterprises want to meet their AI goals, they’ll now have to tackle these workloads, too.


By Frederic Lardinois

Cloudera and Hortonworks finalize their merger

Cloudera and Hortonworks, two of the biggest players in the Hadoop big data space, today announced that they have finalized their all-stock merger. The new company will use the Cloudera brand and will continue to trade under the CLDR symbol on the New York Stock Exchange.

“Today, we start an exciting new chapter for Cloudera as we become the leading enterprise data cloud provider,” said Tom Reilly, chief executive officer of Cloudera, in today’s announcement. “This combined team and technology portfolio establish the new Cloudera as a clear market leader with the scale and resources to drive continued innovation and growth. We will provide customers a comprehensive solution-set to bring the right data analytics to data anywhere the enterprise needs to work, from the Edge to AI, with the industry’s first Enterprise Data Cloud.”

The companies describe the deal as a “merger of equals,” though Cloudera stockholders will own about 60 percent of the equity in the company.

The combined company expects to generate over $720 million in revenue from its 2,500 customers who rely on it to help them manage the complexities of processing their data. While Hadoop itself is open source and freely available, Cloudera and Hortonworks abstract away most of the infrastructure. Both focused on slightly different markets, though, with Hortonworks going after a more technical user and a pure open source approach, while Cloudera also offered some proprietary tools.

“Together, we are well positioned to continue growing and competing in the streaming and IoT, data management, data warehousing, machine learning/AI and hybrid cloud markets,” said Hortonworks CEO Rob Bearden back when the deal was first announced. “Importantly, we will be able to offer a broader set of offerings that will enable our customers to capitalize on the value of their data.”


By Frederic Lardinois

Google’s Cloud Spanner database adds new features and regions

Cloud Spanner, Google’s globally distributed relational database service, is getting a bit more distributed today with the launch of a new region and new ways to set up multi-region configurations. The service is also getting a new feature that gives developers deeper insights into their most resource-consuming queries.

With this update, Google is adding to the Cloud Spanner lineup Hong Kong (asia-east2), its newest data center location. With this, Cloud Spanner is now available in 14 out of 18 Google Cloud Platform (GCP) regions, including seven the company added this year alone. The plan is to bring Cloud Spanner to every new GCP region as they come online.

The other new region-related news is the launch of two new configurations for multi-region coverage. One, called eur3, focuses on the European Union, and is obviously meant for users there who mostly serve a local customer base. The other is called nam6 and focuses on North America, with coverage across both costs and the middle of the country, using data centers in Oregon, Los Angeles, South Carolina and Iowa. Previously, the service only offered a North American configuration with three regions and a global configuration with three data centers spread across North America, Europe and Asia.

While Cloud Spanner is obviously meant for global deployments, these new configurations are great for users who only need to serve certain markets.

As far as the new query features are concerned, Cloud Spanner is now making it easier for developers to view, inspect and debug queries. The idea here is to give developers better visibility into their most frequent and expensive queries (and maybe make them less expensive in the process).

In addition to the Cloud Spanner news, Google Cloud today announced that its Cloud Dataproc Hadoop and Spark service now supports the R language, in addition to Python 3.7 support on App Engine.


By Frederic Lardinois