As it closes in on ARM, Nvidia announces UK supercomputer dedicated to medical research

As Nvidia continues to work through its deal to acquire ARM for $40 billion from SoftBank, the computing giant is making another big move to lay out its commitment to investing in UK technology. Today the company announced plans to develop Cambridge-1, a new AI supercomputer that will be used for health research in the country. It is the first supercomputer Nvidia has built specifically for external research access, the company said.

Nvidia said it is already working with GSK, AstraZeneca, London hospitals Guy’s and St Thomas’ NHS Foundation Trust, King’s College London and Oxford Nanopore to use the Cambridge-1. The supercomputer is due to come online by the end of the year and will be the company’s second supercomputer in the country. The first is already in development at the company’s AI Center of Excellence in Cambridge, and the plan is to add more supercomputers over time.

The growing role of AI has underscored an interesting crossroads in medical research. On one hand, leading researchers all acknowledge the role it will play in their work. On the other, none of them, nor their institutions, have the resources to meet that demand on their own. That is driving them to get involved much more deeply with big tech companies like Google, Microsoft and, in this case, Nvidia to carry out their work.

Alongside the supercomputer news, Nvidia is making a second announcement in the area of healthcare in the UK: it has inked a partnership with GSK, which has established an AI hub in London, to build AI-based computational processes that will be used in drug discovery and vaccine development. It is an especially timely piece of news, given that we are in a global pandemic and drug makers and researchers everywhere are on the hunt to understand more about, and build vaccines for, Covid-19.

The news coincides with Nvidia’s industry event, the GPU Technology Conference.

“Tackling the world’s most pressing challenges in healthcare requires massively powerful computing resources to harness the capabilities of AI,” Jensen Huang, founder and CEO of NVIDIA, is expected to say in his keynote at the event. “The Cambridge-1 supercomputer will serve as a hub of innovation for the U.K., and further the groundbreaking work being done by the nation’s researchers in critical healthcare and drug discovery.”

The company plans to dedicate Cambridge-1 resources to four areas, it said: industry research, in particular joint research on projects that exceed the resources of any single institution; university-granted compute time; health-focused AI startups; and education for future AI practitioners. It is already building specific applications, like the drug discovery work it is doing with GSK, that will run on the machine.

The Cambridge-1 will be built on Nvidia’s DGX SuperPOD system and will deliver 400 petaflops of AI performance and 8 petaflops of Linpack performance. Nvidia said that will rank it as the 29th fastest supercomputer in the world.

“Number 29” doesn’t sound very groundbreaking, but there are other reasons why the announcement is significant.

For starters, it underscores how the supercomputing market, while still not a mass-market enterprise, is increasingly developing more focus around specific areas of research and industries. In this case, it underscores how health research has become more complex, and how applications of artificial intelligence have both spurred that complexity and, by way of stronger computing power, provided a better route (some might say one of the only viable routes in the most complex cases) to medical breakthroughs and discoveries.

It’s also notable that the effort is being forged in the UK. Nvidia’s deal to buy ARM has seen some resistance in the market — with one group leading a campaign to stop the sale and take ARM independent — but this latest announcement underscores that the company is already involved pretty deeply in the UK market, bolstering Nvidia’s case to double down even further. (Yes, chip reference designs and building supercomputers are different enterprises, but the argument for Nvidia is one of commitment and presence.)

“AI and machine learning are like a new microscope that will help scientists to see things that they couldn’t see otherwise,” said Dr. Hal Barron, Chief Scientific Officer and President, R&D, GSK, in a statement. “NVIDIA’s investment in computing, combined with the power of deep learning, will enable solutions to some of the life sciences industry’s greatest challenges and help us continue to deliver transformational medicines and vaccines to patients. Together with GSK’s new AI lab in London, I am delighted that these advanced technologies will now be available to help the U.K.’s outstanding scientists.”

“The use of big data, supercomputing and artificial intelligence have the potential to transform research and development; from target identification through clinical research and all the way to the launch of new medicines,” added James Weatherall, PhD, Head of Data Science and AI, AstraZeneca, in a statement.

“Recent advances in AI have seen increasingly powerful models being used for complex tasks such as image recognition and natural language understanding,” said Sebastien Ourselin, Head, School of Biomedical Engineering & Imaging Sciences at King’s College London. “These models have achieved previously unimaginable performance by using an unprecedented scale of computational power, amassing millions of GPU hours per model. Through this partnership, for the first time, such a scale of computational power will be available to healthcare research – it will be truly transformational for patient health and treatment pathways.”

Dr. Ian Abbs, Chief Executive Officer & Chief Medical Director of Guy’s and St Thomas’ NHS Foundation Trust, said: “If AI is to be deployed at scale for patient care, then accuracy, robustness and safety are of paramount importance. We need to ensure AI researchers have access to the largest and most comprehensive datasets that the NHS has to offer, our clinical expertise, and the required computational infrastructure to make sense of the data. This approach is not only necessary, but also the only ethical way to deliver AI in healthcare – more advanced AI means better care for our patients.”

“Compact AI has enabled real-time sequencing in the palm of your hand, and AI supercomputers are enabling new scientific discoveries in large-scale genomic datasets,” added Gordon Sanghera, CEO, Oxford Nanopore Technologies. “These complementary innovations in data analysis support a wealth of impactful science in the UK, and critically, support our goal of bringing genomic analysis to anyone, anywhere.”



By Ingrid Lunden

Nvidia’s Ampere GPUs come to Google Cloud

Nvidia today announced that its new Ampere-based data center GPUs, the A100 Tensor Core GPUs, are now available in alpha on Google Cloud. As the name implies, these GPUs were designed for AI workloads, as well as data analytics and high-performance computing solutions.

The A100 promises a significant performance improvement over previous generations. Nvidia says the A100 can boost training and inference performance by over 20x compared to its predecessors (though in most benchmarks you will see improvements closer to 6x or 7x) and tops out at about 19.5 TFLOPs of single-precision performance and 156 TFLOPs for Tensor Float 32 workloads.
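For a sense of what that Tensor Float 32 mode looks like from a developer’s seat: recent PyTorch releases expose it as a pair of backend flags on Ampere hardware. Here is a minimal sketch, assuming a CUDA-enabled PyTorch build (1.7 or later) and an A100 to run it on:

```python
import torch

# On Ampere GPUs, matmuls can run on Tensor Cores in TF32 mode:
# FP32 dynamic range with a shorter mantissa, traded for throughput.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")
c = a @ b  # dispatched to TF32 Tensor Core kernels when enabled above
```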


“Google Cloud customers often look to us to provide the latest hardware and software services to help them drive innovation on AI and scientific computing workloads,” said Manish Sainani, Director of Product Management at Google Cloud, in today’s announcement. “With our new A2 VM family, we are proud to be the first major cloud provider to market Nvidia A100 GPUs, just as we were with Nvidia’s T4 GPUs. We are excited to see what our customers will do with these new capabilities.”

Google Cloud users can get access to instances with up to 16 of these A100 GPUs, for a total of 640GB of GPU memory and 1.3TB of system memory.
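As a rough illustration, here is the kind of sanity check one might run on such an instance to confirm what is attached; this assumes PyTorch with CUDA support is installed on the VM:

```python
import torch

# Count the attached GPUs and total their memory; on the largest A2
# shape this should report 16 devices and roughly 640GB combined.
n = torch.cuda.device_count()
total = sum(torch.cuda.get_device_properties(i).total_memory for i in range(n))
print(f"{n} x {torch.cuda.get_device_name(0)}, {total / 2**30:.0f} GiB of GPU memory")
```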


By Frederic Lardinois

Mirantis releases its first major update to Docker Enterprise

In a surprise move, Mirantis acquired Docker’s Enterprise platform business at the end of last year, and while Docker itself is refocusing on developers, Mirantis kept the Docker Enterprise name and product. Today, Mirantis is rolling out its first major update to Docker Enterprise with the release of version 3.1.

For the most part, these updates are in line with what’s been happening in the container ecosystem in recent months. There’s support for Kubernetes 1.17 and improved support for Kubernetes on Windows (something the Kubernetes community has worked on quite a bit in the last year or so). Also new is Nvidia GPU integration in Docker Enterprise through a pre-installed device plugin, as well as support for Istio Ingress for Kubernetes and a new command-line tool for deploying clusters with the Docker Engine.
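To make the GPU piece concrete: once a device plugin is installed, Kubernetes workloads request GPUs as an extended resource and the scheduler does the rest. A hedged sketch using the official Kubernetes Python client follows; the pod name and container image are illustrative, not specific to Docker Enterprise:

```python
from kubernetes import client, config

config.load_kube_config()  # use the current kubectl context

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvidia/cuda:10.2-base",  # illustrative CUDA base image
            command=["nvidia-smi"],
            # The device plugin advertises GPUs as "nvidia.com/gpu";
            # the scheduler places this pod on a node with one free.
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```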

In addition to the product updates, Mirantis is also launching three new support options that give customers 24×7 coverage for all support cases, for example, as well as enhanced SLAs for remote managed operations, designated customer success managers and proactive monitoring and alerting. With this, Mirantis is clearly building on its experience as a managed service provider.

What’s maybe more interesting, though, is how this acquisition is playing out at Mirantis itself. Mirantis, after all, went through its fair share of ups and downs in recent years, from being a high-flying OpenStack platform company to layoffs and everything in between.

“Why we do this in the first place and why at some point I absolutely felt that I wanted to do this is because I felt that this would be a more compelling and interesting company to build, despite maybe some of the short-term challenges along the way, and that very much turned out to be true. It’s been fantastic,” Mirantis CEO and co-founder Adrian Ionel told me. “What we’ve seen since the acquisition, first of all, is that the customer base has been dramatically more loyal than people had thought, including ourselves.”

Ionel admitted that he thought some users would defect because this is obviously a major change, at least from the customer’s point of view. “Of course we have done everything possible to have something for them that’s really compelling and we put out the new roadmap right away in December after the acquisition — and people bought into it at very large scale,” he said. With that, Mirantis retained more than 90 percent of the customer base and the vast majority of all of Docker Enterprise’s largest users.

Ionel, who almost seemed a bit surprised by this, noted that this helped the company turn in two “fantastic” quarters; the company was profitable in the last quarter, despite the COVID-19 pandemic.

“We wanted to go into this acquisition with a sober assessment of risks because we wanted to make it work, we wanted to make it successful because we were well aware that a lot of acquisitions fail,” he explained. “We didn’t want to go into it with a hyper-optimistic approach in any way — and we didn’t — and maybe that’s one of the reasons why we are positively surprised.”

He argues that the reason for the current success is that enterprises are doubling down on their container journeys and that they actually love the Docker Enterprise platform for things like its infrastructure independence, developer focus, security features and ease of use. One thing many large customers asked for was better support for multi-cluster management at scale, which today’s update delivers.

“Where we stand today, we have one product development team. We have one product roadmap. We are shipping a very big new release of Docker Enterprise. […] The field has been completely unified and operates as one salesforce, with record results. So things have been extremely busy, but good and exciting.”


By Frederic Lardinois

Nvidia acquires Cumulus Networks

Nvidia today announced its plans to acquire Cumulus Networks, an open-source centric company that specializes in helping enterprises optimize their data center networking stack. Cumulus offers both its own Linux distribution for network switches, as well as tools for managing network operations. With Cumulus Express, the company also offers a hardware solution in the form of its own data center switch.

The two companies did not announce the price of the acquisition, but chances are we are talking about a considerable amount, given that Cumulus had raised $134 million since it was founded in 2010.

Mountain View-based Cumulus already had a previous partnership with Mellanox, which Nvidia acquired for $6.9 billion. That acquisition closed only a few days ago. As Mellanox’s Amit Katz notes in today’s announcement, the two companies first met in 2013 and they formed a first official partnership in 2016. Cumulus, it’s worth noting, was also an early player in the OpenStack ecosystem.

Having both Cumulus and Mellanox in its stable will give Nvidia virtually all of the tools it needs to help enterprises and cloud providers build out their high-performance computing and AI workloads in their data centers. While you may mostly think about Nvidia because of its graphics cards, the company has a sizable data center group, which delivered close to $1 billion in revenue in the last quarter, up 43 percent from a year ago. In comparison, Nvidia’s revenue from gaming was just under $1.5 billion.

“With Cumulus, NVIDIA can innovate and optimize across the entire networking stack from chips and systems to software including analytics like Cumulus NetQ, delivering great performance and value to customers,” writes Katz. “This open networking platform is extensible and allows enterprise and cloud-scale data centers full control over their operations.”


By Frederic Lardinois

Nvidia acquires data storage and management platform SwiftStack

Nvidia today announced that it has acquired SwiftStack, a software-centric data storage and management platform that supports public cloud, on-premises and edge deployments.

The company’s recent launches focused on improving its support for AI, high-performance computing and accelerated computing workloads, which is surely what Nvidia is most interested in here.

“Building AI supercomputers is exciting to the entire SwiftStack team,” says the company’s co-founder and CPO Joe Arnold in today’s announcement. “We couldn’t be more thrilled to work with the talented folks at NVIDIA and look forward to contributing to its world-leading accelerated computing solutions.”

The two companies did not disclose the price of the acquisition, but SwiftStack had previously raised about $23.6 million in Series A and B rounds led by Mayfield Fund and OpenView Venture Partners. Other investors include Storm Ventures and UMC Capital.

SwiftStack, which was founded in 2011, placed an early bet on OpenStack, the massive open-source project that aimed to give enterprises an AWS-like management experience in their own data centers. The company was one of the largest contributors to OpenStack’s Swift object storage platform and offered a number of services around it, though it seems like in recent years it has downplayed the OpenStack relationship as that platform’s popularity has fizzled in many verticals.

SwiftStack lists the likes of PayPal, Rogers, data center provider DC Blox, Snapfish and Verizon (TechCrunch’s parent company) on its customer page. Nvidia, too, is a customer.

SwiftStack notes that its team will continue to maintain the existing set of open-source tools like Swift, ProxyFS, 1space and Controller.

“SwiftStack’s technology is already a key part of NVIDIA’s GPU-powered AI infrastructure, and this acquisition will strengthen what we do for you,” says Arnold.


By Frederic Lardinois

Nvidia and VMware team up to make GPU virtualization easier

Nvidia today announced that it has been working with VMware to bring its virtual GPU technology (vGPU) to VMware’s vSphere and VMware Cloud on AWS. The company’s core vGPU technology isn’t new, but it now supports server virtualization to enable enterprises to run their hardware-accelerated AI and data science workloads in environments like VMware’s vSphere, using its new vComputeServer technology.

Traditionally (as far as that’s a thing in AI training), GPU-accelerated workloads have tended to run on bare-metal servers, which were typically managed separately from the rest of a company’s servers.

“With vComputeServer, IT admins can better streamline management of GPU-accelerated virtualized servers while retaining existing workflows and lowering overall operational costs,” Nvidia explains in today’s announcement. This also means that businesses will reap the cost benefits of GPU sharing and aggregation, thanks to the improved utilization this technology promises.

vComputeServer works with VMware vSphere, vCenter and vMotion, as well as VMware Cloud. Indeed, the two companies are using the same vComputeServer technology to bring accelerated GPU services to VMware Cloud on AWS. This allows enterprises to take their containerized applications from their own data center to the cloud as needed, and then hook into AWS’s other cloud-based technologies.


“From operational intelligence to artificial intelligence, businesses rely on GPU-accelerated computing to make fast, accurate predictions that directly impact their bottom line,” said Nvidia founder and CEO Jensen Huang. “Together with VMware, we’re designing the most advanced and highest performing GPU-accelerated hybrid cloud infrastructure to foster innovation across the enterprise.”


By Frederic Lardinois

Unveiling its latest cohort, Alchemist announces $4 million in funding for its enterprise accelerator

The enterprise software and services-focused accelerator Alchemist has raised $4 million in fresh financing from investors BASF and the Qatar Development Bank, just in time for its latest demo day unveiling 20 new companies.

Qatar and BASF join previous investors, including the venture firms Mayfield, Khosla Ventures, Foundation Capital, DFJ and USVP, and corporate investors like Cisco, Siemens and Juniper Networks.

While the roster of successes from Alchemist’s fund isn’t as lengthy as Y Combinator’s, the accelerator program has launched the likes of the quantum computing upstart Rigetti, the soft-launch developer tool LaunchDarkly and drone startup Matternet.

Some (personal) highlights of the latest cohort include:

  • Bayware: Helmed by a former head of software-defined networking from Cisco, the company is pitching a tool that makes creating networks in multi-cloud environments as easy as copying and pasting.
  • MotorCortex.AI: Co-founded by a Stanford engineering professor and a Carnegie Mellon roboticist, the company is using computer vision, machine learning and robotics to create a fruit packer for packaging lines. Starting with avocados, the company is aiming to tackle the entire packaging side of pick and pack in logistics.
  • Resilio: With claims of a 96% effectiveness rate and $35,000 in annual recurring revenue with another $1 million in the pipeline, Resilio is already seeing companies embrace its mobile app, which uses a phone’s camera to track stress levels and offers app-based prompts on how to lower them, according to Alchemist.
  • Operant Networks: It’s a long-held belief (of mine) that if computing networks are already irrevocably compromised, the best thing that companies and individuals can do is just encrypt the hell out of their data. Apparently Operant agrees with me. The company is claiming 50% time savings with this approach, and has booked $1.9 million in 2019 as proof, according to Alchemist.
  • HPC Hub: HPC Hub wants to democratize access to supercomputers by overlaying a virtualization layer and pre-installed software on underutilized supercomputers to give more companies and researchers easier access to machines… and they’ve booked $92,000 worth of annual recurring revenue.
  • DinoPlusAI: This chip developer is designing a low latency chip for artificial intelligence applications, reducing latency by 12 times over a competing Nvidia chip, according to the company. DinoPlusAI sees applications for its tech in things like real-time AI markets and autonomous driving. Its team is led by a designer from Cadence and Broadcom and the company already has $8 million in letters of intent signed, according to Alchemist.
  • Aero Systems West: Co-founders from the Air Force’s Research Labs and MIT are aiming to take humans out of drone operations and maintenance. The company contends that for every hour of flight time, drones require seven hours of maintenance and check ups. Aero Systems aims to reduce that by using remote analytics, self-inspection, autonomous deployment and automated maintenance to take humans out of the drone business.

Watch a live stream of Alchemist’s demo day pitches, starting at 3PM, here.



By Jonathan Shieber

Former Facebook engineer picks up $15M for AI platform Spell

In 2016, Serkan Piantino packed up his desk at Facebook with hopes of moving on to something new. The former Director of Engineering for Facebook AI Research had every intention of continuing to work on AI, but quickly realized a huge issue.

Unless you’re under the umbrella of one of these big tech companies like Facebook, it can be very difficult and incredibly expensive to get your hands on the hardware necessary to run machine learning experiments.

So he built Spell, which today received $15 million in Series A funding led by Eclipse Ventures and Two Sigma Ventures.

Spell is a collaborative platform that lets anyone run machine learning experiments. The company connects clients with the best, newest hardware hosted by Google, AWS and Microsoft Azure and gives them the software interface they need to run, collaborate, and build with AI.

“We spent decades getting to a laptop powerful enough to develop a mobile app or a website, but we’re struggling with things we develop in AI that we haven’t struggled with since the 70s,” said Piantino. “Before PCs existed, the computers filled the whole room at a university or NASA and people used terminals to log into a single mainframe. It’s why Unix was invented, and that’s kind of what AI needs right now.”

In a meeting with Piantino this week, TechCrunch got a peek at the product. First, Piantino pulled out his MacBook and opened up Terminal. He began to run his own code against MNIST, which is a database of handwritten digits commonly used to train image detection algorithms.

He started the program and then moved over to the Spell platform. While the original program was just getting started, Spell’s cloud computing platform had completed the test in under a minute.
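For flavor, the sort of experiment in question looks something like the following minimal MNIST classifier; this is an illustrative PyTorch sketch, not Spell’s actual demo code:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

train = datasets.MNIST(".", train=True, download=True,
                       transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"  # the GPU is the speedup
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for images, labels in loader:  # one epoch over the handwritten digits
    images, labels = images.to(device), labels.to(device)
    loss = nn.functional.cross_entropy(model(images), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```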

The advantage here is obvious. Engineers who want to work on AI, either on their own or for a company, have a huge task in front of them. They essentially have to build their own computer, complete with the high-powered GPUs necessary to run their tests.

With Spell, the newest GPUs from NVIDIA and Google are available virtually for anyone to run their tests on.

Individual users can get on for free, specify the type of GPU they need to compute their experiment, and simply let it run. Corporate users, on the other hand, are able to view the runs taking place on Spell and compare experiments, allowing users to collaborate on their projects from within the platform.

Enterprise clients can set up their own cluster, and keep all of their programs private on the Spell platform, rather than running tests on the public cluster.

Spell also offers enterprise customers a ‘spell hyper’ command that offers built-in support for hyperparameter optimization. Folks can track their models and results and deploy them to Kubernetes/Kubeflow in a single click.

But, perhaps most importantly, Spell allows an organization to instantly transform their model into an API that can be used more broadly throughout the organization, or used directly within an app or website.

The implications here are huge. Small companies and startups looking to get into AI now have a much lower barrier to entry, whereas large traditional companies can build out their own proprietary machine learning algorithms for use within the organization without an outrageous upfront investment.

Individual users can get on the platform for free, whereas enterprise clients can get started for $99/month per host used over the course of a month. Piantino explains that Spell charges based on concurrent usage, so if a customer has 10 concurrent things running, the company considers that the ‘size’ of the Spell cluster and charges based on that.
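A toy illustration of that billing model as described; the $99 figure is Spell’s quoted price, while the usage numbers are invented for the example:

```python
PRICE_PER_HOST = 99  # dollars per month, per concurrent host

# Snapshots of how many runs were active at once during the month.
concurrent_runs = [3, 7, 10, 4]

cluster_size = max(concurrent_runs)  # peak concurrency defines the "size"
monthly_bill = cluster_size * PRICE_PER_HOST
print(f"cluster size {cluster_size} -> ${monthly_bill}/month")  # $990/month
```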

Piantino sees Spell’s model as the key to defensibility. Whereas many cloud platforms try to lock customers into their entire suite of products, Spell works with any language framework and lets users plug and play on the platforms of their choice by simply commodifying the hardware. In fact, Spell doesn’t even share with clients which cloud cluster (Microsoft Azure, Google or AWS) they’re on.

So, on the one hand the speed of the tests themselves goes up based on access to new hardware, but, because Spell is an agnostic platform, there is also a huge advantage in how quickly one can get set up and start working.

The company plans to use the funding to further grow the team and the product, and Piantino says he has his eye out for top-tier engineering talent as well as a designer.


By Jordan Crook

Nvidia’s T4 GPUs are now available in beta on Google Cloud

Google Cloud today announced that Nvidia’s Turing-based Tesla T4 data center GPUs are now available in beta in its data centers in Brazil, India, Netherlands, Singapore, Tokyo and the United States. Google first announced a private test of these cards in November, but that was a very limited alpha test. All developers can now take these new T4 GPUs for a spin through Google’s Compute Engine service.

The T4, which essentially uses the same processor architecture as Nvidia’s RTX cards for consumers, slots in between the existing Nvidia V100 and P4 GPUs on the Google Cloud Platform. While the V100 is optimized for machine learning, the T4, like its P4 predecessor, is more of a general-purpose GPU that also turns out to be great for training models and inferencing.

In terms of machine and deep learning performance, the 16GB T4 is significantly slower than the V100, though if you are mostly running inference on the cards, you may actually see a speed boost. Unsurprisingly, using the T4 is also cheaper than the V100, starting at $0.95 per hour compared to $2.48 per hour for the V100, with another discount for using preemptible VMs and Google’s usual sustained use discounts.
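Back-of-envelope, using the on-demand prices quoted above and ignoring the preemptible and sustained-use discounts, the gap adds up quickly over a long-running job:

```python
T4_RATE, V100_RATE = 0.95, 2.48  # dollars per GPU-hour, on demand

hours = 24 * 7  # a week-long single-GPU workload
print(f"T4:   ${hours * T4_RATE:,.2f}")    # $159.60
print(f"V100: ${hours * V100_RATE:,.2f}")  # $416.64
```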

Google says that the card’s 16GB of memory should easily handle large machine learning models, as well as allow running multiple smaller models at the same time. The standard PCI Express 3.0 card also comes with support for Nvidia’s Tensor Cores to accelerate deep learning and Nvidia’s new RTX ray-tracing cores. Performance tops out at 260 TOPS, and developers can connect up to four T4 GPUs to a virtual machine.

It’s worth stressing that this is also the first GPU in the Google Cloud lineup that supports Nvidia’s ray-tracing technology. There isn’t a lot of software on the market yet that actually makes use of this technique, which allows you to render more lifelike images in real time, but if you need a virtual workstation with a powerful next-generation graphics card, that’s now an option.

With today’s beta launch of the T4, Google Cloud now offers quite a variety of Nvidia GPUs, including the K80, P4, P100 and V100, all at different price points and with different performance characteristics.


By Frederic Lardinois

Nvidia launches Rapids to help bring GPU acceleration to data analytics

Nvidia, together with partners like IBM, HPE, Oracle, Databricks and others, is launching a new open-source platform for data science and machine learning today. Rapids, as the company is calling it, is all about making it easier for large businesses to use the power of GPUs to quickly analyze massive amounts of data and then use that to build machine learning models.

“Businesses are increasingly data-driven,” Nvidia’s VP of Accelerated Computing Ian Buck told me. “They sense the market and the environment and the behavior and operations of their business through the data they’ve collected. We’ve just come through a decade of big data and the output of that data is using analytics and AI. But most of it is still using traditional machine learning to recognize complex patterns, detect changes and make predictions that directly impact their bottom line.”

The idea behind Rapids then is to work with the existing popular open-source libraries and platforms that data scientists use today and accelerate them using GPUs. Rapids integrates with these libraries to provide accelerated analytics, machine learning and — in the future — visualization.

Rapids is based on Python, Buck noted; it has interfaces that are similar to Pandas and Scikit, two very popular machine learning and data analysis libraries, and it’s based on Apache Arrow for in-memory database processing. It can scale from a single GPU to multiple nodes, and IBM notes that the platform can achieve improvements of up to 50x for some specific use cases when compared to running the same algorithms on CPUs (though that’s not all that surprising, given what we’ve seen from other GPU-accelerated workloads in the past).
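The pandas-like surface is the point: in RAPIDS’ cuDF library, the code reads like everyday data-frame work while executing on the GPU. A brief sketch, with a made-up CSV file and column names:

```python
import cudf  # RAPIDS' GPU DataFrame library, with a pandas-style API

gdf = cudf.read_csv("transactions.csv")  # loaded straight into GPU memory
totals = gdf.groupby("customer_id")["amount"].sum()  # GPU-accelerated groupby
print(totals.sort_values(ascending=False).head())
```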

Buck noted that Rapids is the result of a multi-year effort to develop a rich enough set of libraries and algorithms, get them running well on GPUs and build the relationships with the open-source projects involved.

“It’s designed to accelerate data science end-to-end,” Buck explained. “From the data prep to machine learning and for those who want to take the next step, deep learning. Through Arrow, Spark users can easily move data into the Rapids platform for acceleration.”

Indeed, Spark is surely going to be one of the major use cases here, so it’s no wonder that Databricks, the company founded by the team behind Spark, is one of the early partners.

“We have multiple ongoing projects to integrate Spark better with native accelerators, including Apache Arrow support and GPU scheduling with Project Hydrogen,” said Spark founder Matei Zaharia in today’s announcement. “We believe that RAPIDS is an exciting new opportunity to scale our customers’ data science and AI workloads.”

Nvidia is also working with Anaconda, BlazingDB, PyData, Quansight and scikit-learn, as well as Wes McKinney, the head of Ursa Labs and the creator of Apache Arrow and Pandas.

Another partner is IBM, which plans to bring Rapids support to many of its services and platforms, including its PowerAI tools for running data science and AI workloads on GPU-accelerated Power9 servers, IBM Watson Studio and Watson Machine Learning and the IBM Cloud with its GPU-enabled machines. “At IBM, we’re very interested in anything that enables higher performance, better business outcomes for data science and machine learning — and we think Nvidia has something very unique here,” Rob Thomas, the GM of IBM Analytics, told me.

“The main benefit to the community is that through an entirely free and open-source set of libraries that are directly compatible with the existing algorithms and subroutines that they’re used to — they now get access to GPU-accelerated versions of them,” Buck said. He also stressed that Rapids isn’t trying to compete with existing machine learning solutions. “Part of the reason why Rapids is open source is so that you can easily incorporate those machine learning subroutines into their software and get the benefits of it.”


By Frederic Lardinois

Nvidia launches the Tesla T4, its fastest data center inferencing platform yet

Nvidia today announced its new GPU for machine learning and inferencing in the data center. The new Tesla T4 GPUs (where the ‘T’ stands for Nvidia’s new Turing architecture) are the successors to the current batch of P4 GPUs that virtually every major cloud computing provider now offers. Google, Nvidia said, will be among the first to bring the new T4 GPUs to its Cloud Platform.

Nvidia argues that the T4s are significantly faster than the P4s. For language inferencing, for example, the T4 is 34 times faster than using a CPU and more than 3.5 times faster than the P4. Peak performance for the T4 is 260 TOPS for 4-bit integer operations and 65 TOPS for floating point operations. The T4 sits on a standard low-profile 75 watt PCI-e card.

What’s most important, though, is that Nvidia designed these chips specifically for AI inferencing. “What makes Tesla T4 such an efficient GPU for inferencing is the new Turing tensor core,” said Ian Buck, Nvidia’s VP and GM of its Tesla data center business. “[Nvidia CEO] Jensen [Huang] already talked about the Tensor core and what it can do for gaming and rendering and for AI, but for inferencing — that’s what it’s designed for.” In total, the chip features 320 Turing Tensor cores and 2,560 CUDA cores.

In addition to the new chip, Nvidia is also launching a refresh of its TensorRT software for optimizing deep learning models. This new version also includes the TensorRT inference server, a fully containerized microservice for data center inferencing that plugs seamlessly into an existing Kubernetes infrastructure.


By Frederic Lardinois

Nvidia launches colossal HGX-2 cloud server to power HPC and AI

Nvidia launched a monster box yesterday called the HGX-2, and it’s the stuff that geek dreams are made of. It’s a cloud server purported to be so powerful that it combines high-performance computing with artificial intelligence requirements in one exceptionally compelling package.

You know you want to know the specs, so let’s get to it: It starts with 16x NVIDIA Tesla V100 GPUs. That’s good for 2 petaFLOPS for AI with low precision, 250 teraFLOPS for medium precision and 125 teraFLOPS for those times when you need the highest precision. It comes standard with half a terabyte of memory and 12 Nvidia NVSwitches, which enable GPU-to-GPU communication at 300 GB per second. That doubles the capacity of the HGX-1 released last year.


Paresh Kharya, group product marketing manager for Nvidia’s Tesla data center products, says this communication speed enables them to treat the GPUs essentially as one giant, single GPU. “And what that allows [developers] to do is not just access that massive compute power, but also access that half a terabyte of GPU memory as a single memory block in their programs,” he explained.
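From a program’s point of view, that shows up as every GPU being able to reach every other GPU’s memory directly. A rough sketch of what one could verify on such a box, assuming PyTorch is installed:

```python
import torch

n = torch.cuda.device_count()  # 16 on an HGX-2
for i in range(n):
    for j in range(n):
        if i != j:
            # NVSwitch provides all-to-all GPU links, so every pair
            # should report direct peer-to-peer memory access.
            assert torch.cuda.can_device_access_peer(i, j)
print(f"all {n} GPUs can address each other's memory directly")
```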


Unfortunately you won’t be able to buy one of these boxes. In fact, Nvidia is distributing them strictly to resellers, who will likely package these babies up and sell them to hyperscale data centers and cloud providers. The beauty of this approach for cloud resellers is that when they buy it, they have the entire range of precision in a single box, Kharya said.

“The benefit of the unified platform is as companies and cloud providers are building out their infrastructure, they can standardize on a single unified architecture that supports the entire range of high performance workloads. So whether it’s AI, or whether it’s high performance simulations, the entire range of workloads is now possible in just a single platform,” Kharya explained.

He points out this is particularly important in large scale datacenters. “In hyperscale companies or cloud providers, the main benefit that they’re providing is the economies of scale. If they can standardize on the fewest possible architectures, they can really maximize the operational efficiency. And what HGX allows them to do is to standardize on that single unified platform,” he added.

As for developers, they can write programs that take advantage of the underlying technologies and program in the exact level of precision they require from a single box.

The HGX-2 powered servers will be available later this year from partner resellers including Lenovo, QCT, Supermicro and Wiwynn.


By Ron Miller

Pure Storage teams with Nvidia on GPU-fueled Flash storage solution for AI

As companies gather increasing amounts of data, they face a choice over where the bottleneck will be: in the storage component or in the back-end compute system. Some companies have attacked the problem by using GPUs to streamline the back end or Flash storage to speed up storage. Pure Storage wants to give customers the best of both worlds.

Today it announced AIRI, a complete data storage solution for AI workloads in a box.

Under the hood, AIRI starts with a Pure Storage FlashBlade, a storage solution that Pure created specifically with AI and machine learning processing in mind. Nvidia contributes the raw power with four NVIDIA DGX-1 supercomputers, delivering four petaFLOPS of performance with NVIDIA Tesla V100 GPUs. Arista provides the networking hardware to make it all work together with Arista 100GbE switches. The software glue layer comes from the NVIDIA GPU Cloud deep learning stack and the Pure Storage AIRI Scaling Toolkit.


One interesting aspect of this deal is that the FlashBlade product operates as a separate product inside of the Pure Storage organization. The company has put together a team of engineers with AI and data pipeline expertise, with a focus on finding ways to move beyond the traditional storage market and anticipate where the market is going.

This approach certainly does that, but the question is do companies want to chase the on-prem hardware approach or take this kind of data to the cloud. Pure would argue that the data gravity of AI workloads would make this difficult to achieve with a cloud solution, but we are seeing increasingly large amounts of data moving to the cloud with the cloud vendors providing tools for data scientists to process that data.

If companies choose to go the hardware route over the cloud, each vendor in this equation — whether Nvidia, Pure Storage or Arista — should benefit from a multi-vendor sale. The idea ultimately is to provide customers with a one-stop solution they can install quickly inside a data center if that’s the approach they want to take.

The red-hot AI hardware space gets even hotter with $56M for a startup called SambaNova Systems

Another massive financing round for an AI chip company is coming in today, this time for SambaNova Systems — a startup founded by a pair of Stanford professors and a longtime chip company executive — to build out the next generation of hardware to supercharge AI-centric operations.

SambaNova joins an already quite large class of startups looking to attack the problem of making AI operations much more efficient and faster by rethinking the actual substrate where the computations happen. While the GPU has become increasingly popular among developers for its ability to handle, in very speedy fashion, the kinds of lightweight mathematics necessary for AI operations, startups like SambaNova look to create a new platform from scratch, all the way down to the hardware, that is optimized exactly for those operations. The hope is that by doing that, it will be able to outclass a GPU in terms of speed, power usage and even potentially the actual size of the chip. SambaNova today said it has raised a massive $56 million Series A financing round led by GV, with participation from Redline Capital and Atlantic Bridge Ventures.

SambaNova is the product of technology from Kunle Olukotun and Chris Ré, two professors at Stanford, and is led by former SVP of development Rodrigo Liang, who was also a VP at Sun for almost eight years. When looking at the landscape, the team at SambaNova worked their way backwards, first identifying what operations need to happen more efficiently and then figuring out what kind of hardware needs to be in place to make that happen. That boils down to a lot of calculations, stemming from a field of mathematics called linear algebra, done very, very quickly, but it’s something that existing CPUs aren’t exactly tuned to do. And a common criticism from most of the founders in this space is that Nvidia GPUs, while much more powerful than CPUs when it comes to these operations, are still ripe for disruption.
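To see why the workload reduces to linear algebra, note that a neural-network layer is essentially a matrix multiply followed by a simple nonlinearity, repeated millions of times during training. An illustrative NumPy sketch:

```python
import numpy as np

x = np.random.randn(64, 512)    # a batch of 64 input vectors
W = np.random.randn(512, 1024)  # a learned weight matrix
h = np.maximum(x @ W, 0.0)      # one dense layer: matmul plus ReLU
# Chips in this space are judged largely on how fast, and how
# efficiently, they can grind through dense operations like x @ W.
```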

“You’ve got these huge [computational] demands, but you have the slowing down of Moore’s law,” Olukotun said. “The question is, how do you meet these demands while Moore’s law slows. Fundamentally you have to develop computing that’s more efficient. If you look at the current approaches to improve these applications based on multiple big cores or many small, or even FPGA or GPU, we fundamentally don’t think you can get to the efficiencies you need. You need an approach that’s different in the algorithms you use and the underlying hardware that’s also required. You need a combination of the two in order to achieve the performance and flexibility levels you need in order to move forward.”

While a $56 million Series A funding round might sound massive, it’s becoming a pretty standard number for startups looking to attack this space, which has an opportunity to beat massive chipmakers and create a new generation of hardware that will be omnipresent in any device built around artificial intelligence — whether that’s a chip sitting on an autonomous vehicle doing rapid image processing or even a server within a healthcare organization training models for complex medical problems. Graphcore, another chip startup, got $50 million in funding from Sequoia Capital, while Cerebras Systems also received significant funding from Benchmark Capital.

Olukotun and Liang wouldn’t go into the specifics of the architecture, but they are looking to redo the operational hardware to optimize for the AI-centric frameworks that have become increasingly popular in fields like image and speech recognition. At its core, that involves a lot of rethinking of how interaction with memory occurs and what happens with heat dissipation for the hardware, among other complex problems. Apple, Google with its TPU, and reportedly Amazon have taken an intense interest in this space, designing their own hardware that’s optimized for products like Siri or Alexa, which makes sense because dropping that latency as close to zero as possible, with as much accuracy as possible, improves the user experience. A great user experience leads to more lock-in for those platforms, and while the larger players may end up making their own hardware, GV’s Dave Munichiello — who is joining the company’s board — says this is basically a validation that everyone else is going to need the technology soon enough.

“Large companies see a need for specialized hardware and infrastructure,” he said. “AI and large-scale data analytics are so essential to providing the services the largest companies provide that they’re willing to invest in their own infrastructure, and that tells us more investment is coming. What Amazon and Google and Microsoft and Apple are doing today will be what the rest of the Fortune 100 are investing in in five years. I think it just creates a really interesting market and an opportunity to sell a unique product. It just means the market is really large; if you believe in your company’s technical differentiation, you welcome competition.”

There is certainly going to be a lot of competition in this area, and not just from those startups. While SambaNova wants to create a true platform, there are a lot of different interpretations of where it should go — such as whether it should be two separate pieces of hardware that handle either inference or machine training. Intel, too, is betting on an array of products, as well as a technology called Field Programmable Gate Arrays (FPGAs), which allow for a more modular approach to building hardware specified for AI and are designed to be flexible and change over time. Both Munichiello’s and Olukotun’s arguments are that these require developers who have special expertise in FPGAs, which is a sort of niche-within-a-niche that most organizations will probably not have readily available.

Nvidia has been a massive beneficiary of the explosion of AI systems, but that explosion has clearly exposed a ton of interest in investing in a new breed of silicon. There’s certainly an argument for developer lock-in on Nvidia’s platforms like Cuda. But there are a lot of new frameworks, like TensorFlow, that create a layer of abstraction and are increasingly popular with developers. That, too, represents an opportunity for both SambaNova and other startups, who can just work to plug into those popular frameworks, Olukotun said. Cerebras Systems CEO Andrew Feldman actually also addressed some of this on stage at the Goldman Sachs Technology and Internet Conference last month.

“Nvidia has spent a long time building an ecosystem around their GPUs, and for the most part, with the combination of TensorFlow, Google has killed most of its value,” Feldman said at the conference. “What TensorFlow does is, it says to researchers and AI professionals, you don’t have to get into the guts of the hardware. You can write at the upper layers and you can write in Python, you can use scripts, you don’t have to worry about what’s happening underneath. Then you can compile it very simply and directly to a CPU, TPU, GPU, to many different hardwares, including ours. If in order to do work you have to be the type of engineer that can do hand-tuned assembly or can live deep in the guts of hardware there will be no adoption… We’ll just take in their TensorFlow, we don’t have to worry about anything else.”
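The abstraction Feldman describes is visible in a few lines of TensorFlow: the researcher writes the math once and the runtime places it on whatever device it finds. A minimal sketch using the public TensorFlow 2 API:

```python
import tensorflow as tf

# The same graph compiles to CPU, GPU, or TPU kernels underneath;
# the code does not change per device.
a = tf.random.normal([1024, 1024])
b = tf.random.normal([1024, 1024])
c = tf.matmul(a, b)
print(c.device)  # e.g. ".../device:GPU:0" if an accelerator is present
```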

(As an aside, I was once told that Cuda and those other lower-level platforms are really used by AI wonks like Yann LeCun building weird AI stuff in the corners of the Internet.)

There are, also, two big question marks for SambaNova: first, it’s very new, having started just in November, while many of these efforts from both startups and larger companies have been years in the making. Munichiello’s answer to this is that the development of those technologies did, indeed, begin a while ago — and that’s not a terrible thing, as SambaNova gets started in the current generation of AI needs. The second, among some in the valley, is that most of the industry just might not need hardware that does these operations in a blazing fast manner. The latter, you might argue, could just be alleviated by the fact that so many of these companies are getting so much funding, with some already reaching close to billion-dollar valuations.

But, in the end, you can now add SambaNova to the list of AI startups that have raised enormous rounds of funding — one that stretches out to include a myriad of companies around the world like Graphcore and Cerebras Systems, as well as a lot of reported activity out of China with companies like Cambricon Technology and Horizon Robotics. This effort does, indeed, require significant investment not only because it’s hardware at its base, but it has to actually convince customers to deploy that hardware and start tapping the platforms it creates, which supporting existing frameworks hopefully alleviates.

“The challenge you see is that the industry, over the last ten years, has underinvested in semiconductor design,” Liang said. “If you look at the innovations at the startup level all the way through big companies, we really haven’t pushed the envelope on semiconductor design. It was very expensive and the returns were not quite as good. Here we are, suddenly you have a need for semiconductor design, and to do low-power design requires a different skillset. If you look at this transition to intelligent software, it’s one of the biggest transitions we’ve seen in this industry in a long time. You’re not accelerating old software, you want to create that platform that’s flexible enough [to optimize these operations] — and you want to think about all the pieces. It’s not just about machine learning.”