For The Love of Multi-Core Pain

Multi-core is red hot. But most people in the IT industry don’t realize
how much of a pain it is to write applications that can handle the
increased performance that comes with multi-core processors.

Most application developers have just been trained on your classic
single-core scenario and lack the wherewithal to program for many cores.

But with the pressure mounting to contend with the enhanced perks of multi-core
chips for high-performance computing, or multi-core GPUs for video gaming,
it’s becoming more imperative for developers to get up to speed with how to
write new applications.

Enter RapidMind.

The startup recently launched the second version of its
application-development platform to help independent software vendors (ISVs)
write multi-core applications without having to relearn programming. We
recently caught up with RapidMind CEO Ray DePaul and Chief Scientist and
Co-Founder Michael McCool to discuss how the company plans to make an impact
among programmers with its new platform.

Q: What problem is RapidMind trying to solve?

DePaul: We are leading the way in solving what’s become known as the
multi-core challenge. In the last couple of decades, processors got faster
simply by increasing the clock speed. So you went from one to two to three
gigahertz, and software just went along for the ride and had a free lunch.

As of a couple of years ago, the processors couldn’t increase the clock speed
anymore because of power and heat dissipation. So we all realized we could
add multiple cores and claim a much higher theoretical performance.

You can look at Intel going from a dual core to a quad core recently, and
people are struggling to find out what to do with these additional cores.
The GPU vendors, ATI and Nvidia, have known about multi-core for years. In
fact, the latest GPUs have up to 128 cores, so they’re way ahead of the game
on this.

These are commodity processors that are shipped on video cards and
have tremendous performance over the CPU that’s in the machine. The Cell (the
Sony-Toshiba-IBM project for the PlayStation 3) is very similar; it is a
nine-core processor. So, there’s a wonderful opportunity to leverage these
processors. But frankly, the software industry has no idea how to take
advantage of them.

Whether it’s gaming or high-performance computing,
everybody is having difficulty getting to those performance levels the
processors theoretically have when you add all of the cores up.

Q: So while the processors are doing their job, the underlying software is
struggling to keep up?

DePaul: Right. With the clock speed not increasing, if an application does
nothing, it will likely just run on a single core. That core isn’t getting
any faster. Typically, an application has a higher and higher workload to deal
with, so the net effect of that is that the application is going to feel
slower.

McCool: The summary here is Moore’s Law still exists. We’re still going to
see performance doubling every 18 months, but it’s going to be doing that
with a doubling of cores as opposed to doubling of each core. This means not
only will you increase in performance, but the number of cores is going to
grow exponentially over time. It’s a very challenging problem.

DePaul: Think about the complications of Intel showing an 80-core TeraScale
prototype and everybody’s going “oooh” and “ahhh.” But the software industry is
going “What do I do with that? How do I take advantage of 80 cores?” That is
the gap that exists between what the hardware is capable of and what the
software industry is capable of.

We want developers to embrace these new technologies to achieve the
performance the processor vendors are promising and do it in a way that
doesn’t require a lot of retraining. Developers understand their domain and
their application, but it’s too much to ask developers to understand all of
the processor-specific issues and to track processor development.

There’s a whole generation of developers who haven’t been taught how to
develop for more than one core.

Q: What specific challenges must developers overcome when dealing with
multi-core programming?

McCool: When people try to program to multi-core processors today, they’ll
typically approach it from multi-threading. They might take your
application and chop it up into different chunks, running each chunk on a
different core and having them communicate. There are problems with this.

If the number of chunks is a constant, then when you get a new processor with
more cores, you’re not going to be able to take advantage of the additional
cores. You have to have a structure that lets you grow performance with
additional cores.
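
To make the point about fixed chunk counts concrete, here is a minimal sketch in plain Python. It is not RapidMind's API, and the names `split_evenly` and `parallel_sum_of_squares` are hypothetical. The number of chunks is taken from the worker count at run time instead of being hard-coded, so the same source spreads over more workers on a bigger machine. Threads are used only to keep the sketch short; in CPython, CPU-bound work would generally need a process pool to see a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def parallel_sum_of_squares(items, n_workers=None):
    """Sum x*x over items, using one chunk per worker."""
    # The chunk count follows the machine rather than a fixed constant.
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_evenly(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: sum(x * x for x in chunk), chunks)
        return sum(partials)
```

Calling `parallel_sum_of_squares(data)` with no worker count uses whatever `os.cpu_count()` reports, which is the "structure that grows with additional cores" idea in miniature.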

There are also problems with subtle bugs, synchronization, race
conditions and deadlocks, which are very hard to find. This makes programming
a real challenge.
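
As an illustration of the kind of bug McCool is describing, the following Python sketch shows a shared counter with an unsynchronized update next to a locked one. The `Counter` class and `run_threads` helper are hypothetical names chosen for this example, not part of any product discussed here.

```python
import threading


class Counter:
    """Shared counter; the lock makes the read-modify-write step atomic."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, add, write back: another thread can run between these
        # steps, so an update may be lost (a race condition).
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # Only one thread at a time may perform the update.
        with self._lock:
            self.value += 1


def run_threads(increment, n_threads=4, per_thread=10_000):
    """Call increment() per_thread times from each of n_threads threads."""
    def work():
        for _ in range(per_thread):
            increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With `safe_increment` the final count always equals `n_threads * per_thread`. With `unsafe_increment` it may come up short on some runs and not on others, which is exactly why such bugs are hard to find.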

Also, when you look at the Cell, the GPU, or even multi-core CPUs, to get
good performance out of these machines, you have to dive down and understand
the individual processors. You end up getting locked into a particular
processor for a high-performance application. This is a real problem, because
applications have a much longer lifespan than processors.

A given application is going to run on five or six processors over its
lifetime. If you sacrifice portability to get performance, you then have to
redo that work for each new processor, and that results in huge overhead. Not
only is it more complicated to go down to that level, but you have to do it
again and again for each processor.

Q: How does your revised RapidMind Platform v2.0 help alleviate the
multi-core programming pains?

DePaul: Our platform gets embedded in their
software. A typical use would be, somebody has an application on their
workstation. And it runs on their Windows and Intel CPU. But there is an
aspect to that application that is a bottleneck.

That software company would come to us and say they’d like to accelerate that
part of the application. They’ll use our APIs to support those functions, and
we will offload that work to a GPU or to a Cell. So that software vendor can
tell their customers that if they install a $600 GPU video card in that
workstation, that slow part of the application will get a 10x performance
improvement.

McCool: Our benefits can be summarized in programmability, performance and
portability. The conceptual model the programmer uses is
platform-independent. They can write code and then run that on a number of
processors. We can decouple the hardware it runs on from the logic of the
application.
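
The idea of separating application logic from the hardware it runs on can be sketched in a few lines of Python. This is only an analogy under stated assumptions: it does not use or imitate RapidMind's API, and `brighten`, `serial_backend`, `threaded_backend` and `apply_kernel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """Application logic: written once, with no knowledge of where it runs."""
    return min(pixel + 40, 255)


def serial_backend(kernel, data):
    """Run the kernel on one core, element by element."""
    return [kernel(x) for x in data]


def threaded_backend(kernel, data, n_workers=4):
    """Run the same kernel across a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(kernel, data))


def apply_kernel(kernel, data, backend=serial_backend):
    """Apply kernel to every element using whichever backend is supplied."""
    return backend(kernel, data)
```

Because `brighten` touches only its own input, swapping `serial_backend` for `threaded_backend` changes where the work runs without changing the result or the application code.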

We spent a lot of time trying to make this as simple and as
easy to use as possible and widely applicable. It is a single-source
application: you can use your existing compiler. It is the same model you use
for single-threaded development, so it’s very easy for users to adopt it.

It also avoids the problems of race conditions and deadlocks that plague
multithreading. You typically can’t write an application that has these
problems.

Q: What processors are you supporting with v2.0?

McCool: With Platform 2.0, we’ll be shipping with Nvidia and ATI GPU
support, and we also are going to have Cell/B.E. support with multiple form
factors, including the IBM Cell blade and the PS3. And we also have a
preliminary prototype for Intel and AMD multi-core CPUs.

Q: Any plans to support Sun’s Niagara or Rock processors?

DePaul: We’ve been to a few shows, where there’s been as many processor
vendors coming to our booth as there were customers. So, there’s tremendous
interest in adding support for a lot of those processor companies.

We’ll go where the market takes us and, given the modularity of our
architecture, there’s nothing stopping us from adding support for new
processors. In the end it doesn’t impact the developers at all, so we’ll do
what the developers and ISVs tell us.
