It’s increasingly clear that the future of computing isn’t one chip for everything, but rather many chips for many things. Intel CEO Pat Gelsinger recently envisioned a “sea of accelerators” from which customers could select, mix and match for their specific needs.
Sounds pretty great, right? Unless you’re the software developer burdened with writing custom code for every new kind of chip, or even just to try one out.
This is where Paul Petersen, an Intel Fellow and software architect for oneAPI, comes in. His job is to let developers write one set of code that runs as fast as possible on all of those chips.
As straightforward as this sounds, it’s a task of breathtaking scope.
The ambition of oneAPI is to tackle two problems with one big swing: give developers choice when it comes to hardware, and make it easier to achieve high-performance code. Intel, Petersen says, has “the unique viewpoint that we are allowing heterogeneous hardware into the mix.”
In contrast with proprietary solutions — particularly the CUDA programming model, which targets Nvidia GPUs — oneAPI is built on a foundation of openness that enables maximum choice and cements standards into place.
Quick sidebar: oneAPI can refer to two things. The first is the oneAPI initiative, an open community working together to define and shape the oneAPI specification, which aims to give developers a common experience across different types of chips from different vendors. The second is Intel’s set of oneAPI toolkits, which are based on that open standard and work across Intel CPUs, GPUs and FPGAs. From here on, “oneAPI” refers to these Intel toolkits.
Portable Performance Across Many Kinds of Processors
Let’s say you are building a program to predict tomorrow’s weather or to model the interaction of molecules for medical therapies. The Intel oneAPI toolkits contain tools to migrate and compile existing code, along with suites of libraries whose functions are pre-tuned for several kinds of chips, or, in the case of 4th Gen Intel® Xeon® Scalable processors, for the many accelerators built into a single chip.
Instead of learning three or four different ways to perform a single function, Petersen says, developers “can basically just learn the one way of doing it and then we handle the diversity and take care of the mapping down to the hardware in an efficient way.”
Eating complexity for customers has always been a bedrock Intel business strategy. In this case, Petersen says, “we dramatically simplify the developer’s cost of how much they need to learn and how many different ecosystems they need to maintain in order to do their job.”
“We strive to be as close to the hardware as we can get while still being able to be portable and supporting multiple kinds of hardware,” he explains. Therein lies the benefit to developers: code that is easier to write, maintain and market. It’s also the central challenge of Petersen’s work.
This approach means taking on “a lot of challenges in terms of performance optimization simultaneously on multiple targets. Anytime you’re allowing choice, you have to make decisions, which can increase latency. We’re constantly fighting to keep our code paths as short as possible.”
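To make the latency point concrete, here is a minimal Python sketch of the general pattern: one public entry point, with the device-specific implementation chosen once up front rather than on every call. Everything in it (`make_vector_add`, the `"cpu"` and `"gpu"` backend names, and the stand-in implementations) is hypothetical and illustrative; it is not how the oneAPI toolkits are actually built, and real backends would be tuned kernels rather than list comprehensions.

```python
from typing import Callable, Dict, List

# Signature shared by every (hypothetical) backend: two equal-length vectors in, one out.
VecOp = Callable[[List[float], List[float]], List[float]]


def _check_lengths(a: List[float], b: List[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def _add_cpu(a: List[float], b: List[float]) -> List[float]:
    # Hypothetical CPU path: a plain element-wise loop.
    _check_lengths(a, b)
    return [x + y for x, y in zip(a, b)]


def _add_gpu(a: List[float], b: List[float]) -> List[float]:
    # Hypothetical GPU path: a real toolkit would launch a tuned kernel here;
    # this stand-in just computes the same result on the host.
    _check_lengths(a, b)
    return [x + y for x, y in zip(a, b)]


# Registry mapping a device name to its implementation (names are illustrative).
_BACKENDS: Dict[str, VecOp] = {"cpu": _add_cpu, "gpu": _add_gpu}


def make_vector_add(device: str) -> VecOp:
    """Pick the implementation for `device` once and return it.

    The lookup (the "decision") happens here, so later calls skip it entirely.
    """
    try:
        return _BACKENDS[device]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


vadd = make_vector_add("cpu")         # decide once
print(vadd([1.0, 2.0], [3.0, 4.0]))   # → [4.0, 6.0]
```

Because the backend is resolved inside `make_vector_add`, each later call to `vadd` goes straight to the chosen function with no per-call lookup. A design that re-checked the device on every call would pay that decision cost on every invocation, which is the kind of overhead the quote describes.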
So far, so good: “We can go head-to-head with the same piece of hardware and show users that they’re not losing anything by switching to an open solution,” Petersen says. Several recent academic studies bear this out.