Alibaba unveils Hanguang 800, an AI inference chip it says significantly increases the speed of machine learning tasks

Alibaba Group introduced its first AI inference chip today, a neural processing unit called Hanguang 800 that it says makes performing machine learning tasks dramatically faster and more energy-efficient. The chip, announced during Alibaba Cloud’s annual Apsara Computing Conference in Hangzhou, is already being used to power features on Alibaba’s e-commerce sites, including product search and personalized recommendations. It will be made available to Alibaba Cloud customers later.

As an example of what the chip can do, Alibaba said it usually takes Taobao an hour to categorize the one billion product images that are uploaded to the e-commerce platform each day by merchants and prepare them for search and personalized recommendations. Using Hanguang 800, Taobao was able to complete the task in only five minutes.
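Those figures imply roughly a 12x speedup. A quick back-of-the-envelope sketch of the throughput involved, using only the numbers Alibaba cited:

```python
# Back-of-the-envelope throughput math using only the figures Alibaba
# cited: one billion images in one hour versus five minutes.

IMAGES = 1_000_000_000          # daily product-image uploads to Taobao

baseline_seconds = 60 * 60      # one hour with the previous setup
hanguang_seconds = 5 * 60       # five minutes with Hanguang 800

baseline_rate = IMAGES / baseline_seconds   # images per second, before
hanguang_rate = IMAGES / hanguang_seconds   # images per second, after
speedup = baseline_seconds / hanguang_seconds

print(f"baseline:     {baseline_rate:,.0f} images/s")
print(f"Hanguang 800: {hanguang_rate:,.0f} images/s")
print(f"speedup:      {speedup:.0f}x")
```

That works out to sustaining several million image classifications per second, which is consistent with the chip being deployed against search and recommendation workloads at Taobao's scale.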

Alibaba is already using Hanguang 800 in many of its business operations that need machine learning processing. In addition to product search and recommendations, this includes automatic translation on its e-commerce sites, advertising and intelligent customer services.

Though Alibaba hasn’t revealed when the chip will be available to its cloud customers, the chip may help Chinese companies reduce their dependence on U.S. technology as the trade war makes business partnerships between Chinese and American tech companies more difficult. It can also help Alibaba Cloud grow in markets outside of China. Within China, it is the market leader, but in the Asia-Pacific region, Alibaba Cloud still ranks behind Amazon, Microsoft and Google, according to the Synergy Research Group.

Hanguang 800 was created by T-Head, the unit that leads the development of chips for cloud and edge computing within Alibaba DAMO Academy, the global research and development initiative that Alibaba is investing more than $15 billion in. T-Head developed the chip’s hardware and algorithms designed for business apps, including Alibaba’s retail and logistics apps.

In a statement, Alibaba Group CTO and president of Alibaba Cloud Intelligence Jeff Zhang said, “The launch of Hanguang 800 is an important step in our pursuit of next-generation technologies, boosting computing capabilities that will drive both our current and emerging businesses while improving energy-efficiency.”

He added, “In the near future, we plan to empower our clients by providing access through our cloud business to the advanced computing that is made possible by the chip, anytime and anywhere.”

T-Head’s other launches included the XuanTie 910 earlier this year, an IoT processor based on RISC-V, the open-source instruction set architecture that began as a project at U.C. Berkeley. XuanTie 910 was created for heavy-duty IoT applications, including edge servers, networking, gateways and autonomous vehicles.

Alibaba DAMO Academy collaborates with universities around the world, including U.C. Berkeley and Tel Aviv University. Researchers in the program focus on machine learning, network security, visual computing and natural language processing, with the goal of serving two billion customers and creating 100 million jobs by 2035.


By Catherine Shu

Google and Twitter are using AMD’s new EPYC Rome processors in their datacenters

AMD announced that Google and Twitter are among the companies now using EPYC Rome processors during a launch event for the 7nm chips today. The release of EPYC Rome marks a major step in AMD’s processor war with Intel, which said last month that its own 7nm chips won’t be available until 2021 (though it is expected to release Ice Lake, its 10nm node, this year).

Intel is still the biggest datacenter processor maker by far, however, and also counts Google and Twitter among its customers. But AMD’s latest releases and its strategy of undercutting competitors with lower pricing have quickly transformed it into a formidable rival.

Google has used other AMD chips before, including in its “Millionth Server,” built in 2008, and says it is now the first company to use second-generation EPYC chips in its datacenters. Later this year, Google will also make virtual machines that run on the chips available to Google Cloud customers.

In a press statement, Bart Sano, Google vice president of engineering, said “AMD 2nd Gen Epyc processors will help us continue to do what we do best in our datacenters: innovate. Its scalable compute, memory and I/O performance will expand our ability to drive innovation forward in our infrastructure and will give Google Cloud customers the flexibility to choose the best VM for their workloads.”

Twitter plans to begin using EPYC Rome in its datacenter infrastructure later this year. Its senior director of engineering, Jennifer Fraser, said the chips will reduce the energy consumption of its datacenters. “Using the AMD EPYC 7702 processor, we can scale out our compute clusters with more cores in less space using less power, which translates to 25% lower [total cost of ownership] for Twitter.”

In a comparison test between 2-socket Intel Xeon 6242 and AMD EPYC 7702P processors, AMD claimed that its chips were able to reduce total cost of ownership by up to 50% across “numerous workloads.” AMD EPYC Rome’s flagship, the 64-core, 128-thread 7742 chip, has a 2.25GHz base frequency, a 225W default TDP and 256MB of total cache, and starts at $6,950.


By Catherine Shu

AWS expands cloud infrastructure offerings with new AMD EPYC-powered T3a instances

Amazon is always looking for ways to increase the options it offers developers in AWS, and to that end, today it announced a bunch of new AMD EPYC-powered T3a instances. These were originally announced at the end of last year at re:Invent, AWS’s annual customer conference.

Today’s announcement is about making these chips generally available. They have been designed for a specific type of burstable workload, where you might not always need a sustained amount of compute power.

“These instances deliver burstable, cost-effective performance and are a great fit for workloads that do not need high sustained compute power but experience temporary spikes in usage. You get a generous and assured baseline amount of processing power and the ability to transparently scale up to full core performance when you need more processing power, for as long as necessary,” AWS’s Jeff Barr wrote in a blog post.
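The "assured baseline plus transparent bursts" model Barr describes is typically implemented with CPU credits: an instance earns credits at a fixed rate, then spends them whenever it runs above its baseline. A minimal sketch of that accounting follows; the specific rates and caps are illustrative placeholders, not the actual T3a figures.

```python
# Toy model of burstable-instance CPU credit accounting. The rates and
# the credit cap below are illustrative, not real T3a numbers.

class BurstableInstance:
    def __init__(self, baseline_util=0.10, credits_per_hour=12.0,
                 max_credits=288.0):
        self.baseline_util = baseline_util      # assured baseline (10% of a core)
        self.credits_per_hour = credits_per_hour
        self.max_credits = max_credits          # credit balance cap
        self.credits = 0.0

    def tick_hour(self, utilization):
        """Advance one hour at the given CPU utilization (0.0-1.0).

        One credit = one vCPU-minute at 100% utilization. Usage above
        the baseline drains credits; usage below it lets them accrue.
        """
        earned = self.credits_per_hour
        spent = max(0.0, utilization - self.baseline_util) * 60
        self.credits = min(self.max_credits, self.credits + earned - spent)
        return self.credits

inst = BurstableInstance()
inst.tick_hour(0.05)            # idle hour: balance grows
inst.tick_hour(0.05)
balance_after_idle = inst.credits
inst.tick_hour(0.50)            # bursting hour: balance drains
print(balance_after_idle, inst.credits)
```

The upshot for pricing is that a workload with occasional spikes pays for the cheap baseline, not for peak capacity held in reserve.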

These instances are built on the AWS Nitro System, Amazon’s custom networking interface hardware that the company has been working on for the last several years. The primary components of this system include the Nitro Card I/O Acceleration, Nitro Security Chip and the Nitro Hypervisor.

Today’s release comes on top of last year’s announcement that the company would be releasing EC2 instances powered by Arm-based AWS Graviton processors, another option for developers who are looking for a solution for scale-out workloads.

It also comes on the heels of last month’s announcement that it was releasing EC2 M5 and R5 instances, which use lower-cost AMD chips. These are also built on top of the Nitro System.

The EPYC processors are available starting today in seven sizes, in your choice of spot instances, reserved instances or on-demand. They are available in US East (Northern Virginia), US West (Oregon), Europe (Ireland), US East (Ohio) and Asia-Pacific (Singapore).


By Ron Miller

AWS announces new Inferentia machine learning chip

AWS is not content to cede any part of any market to any company. When it comes to machine learning chips, names like Nvidia or Google come to mind, but today at AWS re:Invent in Las Vegas, the company announced a new dedicated machine learning chip of its own called Inferentia.

“Inferentia will be a very high throughput low-latency, sustained performance very cost-effective processor,” AWS CEO Andy Jassy explained during the announcement.

Holger Mueller, an analyst with Constellation Research, says that while Amazon is far behind, this is a good step for the company as enterprises try to differentiate their machine learning approaches in the future.

“The speed and cost of running machine learning operations — ideally in deep learning — are a competitive differentiator for enterprises. Speed advantages will make or break success of enterprises (and nations when you think of warfare). That speed can only be achieved with custom hardware, and Inferentia is AWS’s first step to get in to this game,” Mueller told TechCrunch. As he pointed out, Google has a 2-3 year head start with its TPU infrastructure.

Inferentia supports popular data formats such as INT8 and FP16, as well as mixed precision. What’s more, it supports multiple machine learning frameworks, including TensorFlow, Caffe2 and ONNX.
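INT8 and FP16 are reduced-precision numeric formats: inference chips trade a sliver of accuracy for much higher throughput by running trained models in narrower types than the 32-bit floats they were trained in. A minimal numpy sketch of what INT8 quantization of FP32 weights looks like; the symmetric scheme and scale choice here are illustrative, not Inferentia's actual pipeline.

```python
import numpy as np

# Toy symmetric INT8 quantization of FP32 weights. This illustrates the
# reduced-precision formats named above; it is not AWS's actual scheme.

weights = np.array([0.5, -1.25, 0.031, 2.0], dtype=np.float32)

# Symmetric quantization: map [-max_abs, +max_abs] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to inspect the rounding error the narrower format introduced.
deq = q.astype(np.float32) * scale
error = np.abs(weights - deq).max()

print(q)        # the weights as int8 values
print(error)    # worst-case absolute error, small relative to the weights
```

The int8 array occupies a quarter of the memory of the float32 original, which is where the throughput and energy wins of dedicated inference silicon largely come from.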

Of course, being an Amazon product, it also supports data from popular AWS products such as EC2, SageMaker and the new Elastic Inference Engine announced today.

While the chip was announced today, AWS CEO Andy Jassy indicated it won’t actually be available until next year.



By Ron Miller

Intel acquires NetSpeed Systems to boost its system-on-a-chip business

Intel today is announcing another acquisition as it continues to pick up talent and IP to bolster its next generation of computing chips beyond legacy PCs. The company has acquired NetSpeed Systems, a startup that makes system-on-chip (SoC) design tools and interconnect fabric intellectual property (IP). The company will be joining Intel’s Silicon Engineering Group, and its co-founder and CEO, Sundari Mitra, herself an Intel vet, will be coming on as a VP at Intel where she will continue to lead her team.

Terms of the deal are not being disclosed, but for some context, during NetSpeed’s last fundraise in 2016 (a $10 million Series C) it had a post-money valuation of $60 million, according to data from PitchBook.

SoC is a central part of how newer connected devices are being made. Moving away from traditional motherboards to create all-in-one chips that include processing, memory, input/output and storage is an essential cornerstone of building ever-smaller and more efficient devices. This is an area where Intel is already active, but against others like Nvidia and Qualcomm many believe it has some catching up to do, so this acquisition is important in that context.

“Intel is designing more products with more specialized features than ever before, which is incredibly exciting for Intel architects and for our customers,” said Jim Keller, senior vice president and general manager of the Silicon Engineering Group at Intel, in a statement. “The challenge is synthesizing a broader set of IP blocks for optimal performance while reining in design time and cost. NetSpeed’s proven network-on-chip technology addresses this challenge, and we’re excited to now have their IP and expertise in-house.”

Intel has made a series of acquisitions to speed up development of newer chips to work in connected objects and smaller devices beyond the PCs that helped the company make its name. Another recent acquisition in the same vein is eASIC, a maker of IoT chipsets, which Intel acquired in July. Intel has also been acquiring startups in other areas where it hopes to make a bigger mark, such as deep learning (case in point: its acquisition of Movidius in August).

NetSpeed has been around since 2011 and Intel was one of its investors and customers.

“Intel has been a great customer of NetSpeed’s, and I’m thrilled to once again be joining the company,” said Mitra, in a statement. “Intel is world class at designing and optimizing the performance of custom silicon at scale. As part of Intel’s silicon engineering group, we’re excited to help invent new products that will be a foundation for computing’s future.”

Intel said it will honor NetSpeed’s existing customer contracts, but it also sounds like the company will not be seeking future business as Intel integrates NetSpeed into its bigger operation.


By Ingrid Lunden

SiFive gets $50.6M to help companies get their custom chip designs out the door

With the race to next-generation silicon in full swing, the waterfall of venture money flowing into custom silicon startups is already showing an enormous amount of potential for some more flexible hardware for an increasingly changing technology landscape — and Naveed Sherwani hopes to tap that for everyone else.

That’s the premise of SiFive, a startup designed to help entrepreneurs — or any company — come up with a custom-designed chip for their needs. But rather than having to raise tens of millions of dollars from a venture firm or have a massive production system in place, SiFive’s goal is to get that piece of silicon into the hands of the developer quickly so they can see if it actually works, based on a set of basic hardware and IP the company offers, and then figure out when and how to move it into full-scale production. The company starts by offering templates, then allows customers to make modifications to what eventually ends up as a piece of RISC-V silicon in their hands. SiFive today said it has raised $50.6 million in venture financing in a round led by Sutter Hill Ventures, Spark Capital and Osage University Partners.

“The way we view it is that we think we should not depend on people learning special languages and things of that nature to be able to modify the architecture and enhance the architecture,” Sherwani said. “What we believe is there could be a high-level interface, which is what we’re building, which will allow people to take existing cores, bring them into their design space, and then apply a configuration. Moving those configurations, you can modify the core, and then you can get the new modified core. That’s the approach we take; we don’t have to learn a special language or be an expert, it’s the way we present the core. We’d like to start with cores that are verified, and each of these modifications does not cause it to become non-verifiable.”

SiFive is based on a design framework for silicon called RISC-V. You could consider it a kind of open-source analog to designs by major chip firms, but the goal for RISC-V chips is to lean on the decades of experience since the original piece of silicon came out of Intel to develop something that is less messy while still getting the right tasks done. Sherwani says that RISC-V chips have more than 50 instructions, while common chips will have more than 1,000. By nature, they aren’t at the kind of scale of an Intel, so the kinds of efficiencies those firms might have don’t exist. But SiFive hopes to serve a wide array of varying needs rather than mass-producing a single style of silicon.

There are two flows for developers looking to build out silicon using SiFive. First is the prototype flow, where developers will get a chance to spec out their silicon and figure out their specific needs. The goal there is to get something into the hands of the developer they can use to showcase their ideas or technology, and SiFive works with IP vendors and other supply chain partners — during this time, developers aren’t paying for IP. Once the case is proved out (and the startup has, perhaps, raised money based on that idea) they can switch to a production flow with SiFive where they will start paying for IP and services. There’s also a potential marketplace element as more and more people come up with novel ideas for operational cores.

“For any segment in the market there will be a few templates available,” Sherwani said. “We’ll have some tools and methodologies there, and among all the various templates available, show what would be the best for [that customer]. We also have an app store — we are expecting people who have designed cores who are willing to share them, because they don’t need them to be proprietary. If anyone uses that template, then whatever price they can put on it, they can make some money doing that. This whole idea of marketplaces will get more people excited.”

As there is an intense rush to develop new customized silicon, it may be that services like the ones offered by SiFive become more and more necessary. But there’s another element to the bet behind SiFive: making the chip itself less ambiguous and trying to remove black boxes. That doesn’t necessarily make it wildly more secure than the one next to it, but at the very least, it means when there is a major security flaw like Intel’s Spectre problems, there may be a bit more tolerance from the developer community because there are fewer black boxes.

“All these complications are there and unless you have all this expertise, you can’t do a chip,” Sherwani said. “Our vision is that we deliver the entire chip experience to that platform and people can be able to log in. They don’t need a team, any tools, they don’t need FPGAs because all those will be available on the web. As a result the cost goes down because it’s a shared economy, they’re sharing tools, and that is how we think dramatically you can do chips at much lower cost.”

While there is a lot of venture money flowing into the AI chip space — with many different interpretations of what that hardware looks like — Sherwani said the benefit of working with SiFive is to be able to rapidly adapt an idea to a changing algorithm. Developers have already proven out a lot of different tools and frameworks, but once a piece of silicon is in production it’s not easy to change on the fly. Should those best practices or algorithms change, developers will have an opportunity to reassess and redesign the chip as quickly as possible.

The idea of that custom silicon is going to be a big theme going forward as more and more use cases emerge that could be easier with a customized piece of hardware. Already there are startups like Mythic and SambaNova Systems, which have raised tens of millions of dollars and specialize in the rapid-fire calculations for typical AI processes. But this kind of technology is now showing up in devices ranging from an autonomous vehicle to a fridge, and each use case may carry different needs. Intel and other chip design firms probably can’t hit every niche, and the one-size-fits-all (or even something more modular like an FPGA from Intel) might not hit each sweet spot. That, in theory, is the hole that a company like SiFive could fill.