16

I am trying to get a grip on how people actually write parallel code today, given the immense importance of multicore and multiprocessing hardware. To me, it looks like the dominant paradigm is pthreads (POSIX threads), which is native on Linux and available on Windows. HPC people tend to use OpenMP or MPI, but there do not seem to be many HPC people here on StackOverflow. Or do you rely on Java threading, the Windows threading APIs, and so on, rather than the portable standards? What is, in your opinion, the recommended way to do parallel programming?

Or are you using more exotic things like Erlang, CUDA, RapidMind, CodePlay, Oz, or even dear old Occam?

Clarification: I am looking for solutions that are quite portable and applicable to platforms such as Linux and the various Unixes, on a variety of host architectures. Windows is a rare case that is nice to support. So C# and .NET are really too narrow here; the CLR is a cool piece of technology, but could they PLEASE release it for Linux hosts so that it would be as prevalent as, say, the JVM, Python, Erlang, or any other portable language.

C++ or JVM-based: probably C++, since JVMs tend to obscure performance.

MPI: I would agree that even the HPC people see it as a hard-to-use tool -- but for running on 128,000 processors, it is the only scalable solution for the problems where map/reduce does not apply. Message-passing has great elegance, though, as it is the only programming style that seems to scale really well across local-memory/AMP, shared-memory/SMP, and distributed run-time environments.

An interesting new contender is MCAPI, but I do not think anyone has had time to gain practical experience with it yet.

So overall, the situation seems to be that there are a lot of interesting Microsoft projects that I did not know about, and that the Windows API or pthreads are the most common implementations in practice.


  • maybe you should rephrase the question - it's a bit too opinion-based and so could be closed. - Carlos Heuberger
  • I guess it is... when I asked it nine years ago it seemed reasonable to ask it, but it definitely does not make sense as a hands-on question of the type that is now dominant on stackoverflow. - jakobengblom2

20 Answers


10

MPI isn't as hard as most make it seem. Nowadays I think a multi-paradigm approach is best suited for parallel and distributed applications: use MPI for your node-to-node communication and synchronization, and either OpenMP or pthreads for the finer-grained parallelization within a node. Think MPI for each machine, and OpenMP or pthreads for each core. For the near future, this seems likely to scale a bit better than spawning a new MPI process for each core.

Perhaps for dual- or quad-core machines right now, spawning a process for each core won't add that much overhead, but as core counts per machine keep growing while cache and on-die memory fail to scale at the same rate, a shared-memory model within each node becomes the more appropriate choice.
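A minimal sketch of that hybrid layout, assuming an MPI installation and an OpenMP-capable compiler (built with something like mpicc -fopenmp); the squared-sum loop body is just a placeholder for real per-element work:

    /* Hybrid sketch: one MPI process per node, one OpenMP thread per core. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each MPI process owns one contiguous slice of the problem. */
        int chunk = N / nprocs;
        double local_sum = 0.0;

        /* Within the process, OpenMP spreads the slice across the cores. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < chunk; i++) {
            double x = (double)(rank * chunk + i);
            local_sum += x * x;      /* stand-in for real work */
        }

        /* MPI handles the node-to-node step: combine the partial results. */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f (%d procs, up to %d threads each)\n",
                   global_sum, nprocs, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

Note that all MPI calls here stay outside the parallel region, so plain MPI_Init suffices; if threads were to make MPI calls themselves, you would need MPI_Init_thread with an appropriate thread-support level.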


  • Voted up because multi-paradigm is the way HPC is actually going now. - tgamblin
  • Note that to get decent memory performance with NUMA, your arrays have to be distributed across sockets with the same layout as the threads (OpenMP or otherwise) that access them. If you allocate a bunch of memory without explicitly using libnuma (expensive), then you have to be careful to fault it with threads that have the same affinity as the threads in your actual computation. This is pretty hard to guarantee with systems like OpenMP; MPI, in contrast, naturally sets affinity and gets you local memory. Scalability is often better with MPI than OpenMP, even on large shared-memory machines. - Jed

6

I'd recommend OpenMP. Microsoft has put it into the Visual C++ 2005 compiler, so it's well supported, and you don't need to do anything other than compile with the /openmp switch.

It's simple to use, though obviously it doesn't do everything for you, but then nothing does. I use it for running parallel for loops, generally without any hassle; for more complex things I tend to roll my own (e.g. I have code from ages ago that I cut, paste, and modify).
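The parallel-for case really does come down to a single pragma on otherwise unchanged serial code; a minimal sketch (compile with /openmp under MSVC or -fopenmp under GCC):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];   /* static keeps the big arrays off the stack */

        /* The pragma divides the iterations among all available cores;
         * the loop body itself is unchanged serial code. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }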

You could try Cilk++, which looks good and has an accompanying e-book, "How to Survive the Multicore Software Revolution".

Both these kinds of system try to parallelize serial code - i.e. take a for loop and run it on all the cores simultaneously in as easy a way as possible. They don't tend to be general-purpose thread libraries. (E.g. a research paper (PDF) described the performance of different types of thread pools implemented in OpenMP and suggested two new operations should be added to it - yield and sleep. I think they're missing the point of OpenMP a little there.)

As you mentioned OpenMP, I assume you're talking about native C++, not C# or .NET.

Also, if the HPC people (who I assume are experts in this domain) seem to be using OpenMP or MPI, then that is what you should be using, not whatever the readership of SO happens to use!


  • C# or .NET are kind of out, as they are not very portable -- in my world, any code has to port between Linux, Windows, Solaris, AIX, and run on all kinds of platforms. I often code for embedded Power Arch/Linux, for example. - jakobengblom2

4

We've started looking at the Parallel Extensions from Microsoft - they're not released yet, but they are certainly showing potential.


3

I've used ACE to allow developers to use POSIX- (or Windows-) style threading on any platform.


2

Parallel FX Library (PFX) - a managed concurrency library being developed by a collaboration between Microsoft Research and the CLR team at Microsoft for inclusion with a future revision of the .NET Framework. It is composed of two parts: Parallel LINQ (PLINQ) and Task Parallel Library (TPL). It also consists of a set of Coordination Data Structures (CDS) - a set of data structures used to synchronize and co-ordinate the execution of concurrent tasks. The library was released as a CTP on November 29, 2007 and refreshed again in December 2007 and June 2008.

Not very much experience though...


2

Please be aware that the answers here are not going to be a statistically representative picture of what people are "actually using". Already I see a number of "X is nice" answers.

I've personally used Windows threads on many a project. The other API I have seen in wide use is pthreads. On the HPC front, MPI is still taken seriously by the people using it. <subjective> I don't take it seriously - it combines all the elegance of C++ with the performance of JavaScript. It survives because there is no decent alternative. It will lose to tightly coupled NUMA machines on one side and Google-style map-reduce on the other. </subjective>


  • Voted down because MapReduce doesn't even solve the same problem that MPI does. There is a huge difference between data-intensive computing ala MapReduce and large-scale scientific computing ala MPI. NUMA will be a big factor, but it won't be the whole system. - tgamblin
  • Wait, performance is a reason to prefer MapReduce to MPI? That's an... unusual point of view. - Jonathan Dursi
  • @Jonathan: Depends. MapReduce does quite well on performance/$, albeit at a rather high latency. Core for core, it's hard to beat the performance of a NUMA system. That's why MPI will be squeezed out. It's too expensive to compete with MR and too slow to compete with NUMA machines. - MSalters

2

More Data Parallel Haskell would be nice, but even without it, GHC>6.6 has some impressive ability to parallelize algorithms easily, via Control.Parallel.Strategies.


1

For .NET I have used Retlang with great success. For the JVM, Scale is great.


1

How about OpenCL?


1

Very much depends on your environment.

For plain old C, nothing beats POSIX threads (see the sketch after these recommendations).

For C++ there is a very good threading library available for free from BOOST.ORG.

For Java, just use native Java threading.
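For the plain-C POSIX case, a minimal pthreads sketch (link with -lpthread); the worker body is a placeholder:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Each worker receives a pointer to its own slot in the ids array. */
    static void *worker(void *arg)
    {
        int id = *(int *)arg;
        printf("thread %d running\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int ids[NTHREADS];

        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&tid[i], NULL, worker, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);   /* wait for all workers to finish */
        return 0;
    }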

You may also look at ways to achieve parallelism other than threading, like dividing your application into client and server processes and using asynchronous messaging to communicate. Done properly, this can scale up to thousands of users on dozens of servers.
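A toy sketch of that process-split idea, using fork and a pipe; a real deployment would use sockets or a message queue between machines, but the pattern of isolated processes exchanging messages is the same:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) != 0)          /* fd[0] = read end, fd[1] = write end */
            return 1;

        pid_t pid = fork();
        if (pid == 0) {
            /* Child plays the "server": block for a request, handle it. */
            char buf[64];
            ssize_t n = read(fd[0], buf, sizeof buf - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("server got: %s\n", buf);
            }
            _exit(0);
        }

        /* Parent plays the "client": send a request, wait for the server. */
        const char *msg = "compute job 42";
        write(fd[1], msg, strlen(msg));
        close(fd[1]);
        waitpid(pid, NULL, 0);
        return 0;
    }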

It's also worth remembering that if you are using the Windows MFC, Gnome, or Qt windowing environments, you are automatically in a multithreaded environment. And if you are using Apache, IIS, or J2EE, your application is already running inside a multi-threaded, multi-process environment.


1

Most of the concurrent programs I have written were in Ada, which has full support for parallelism natively in the language. One of the nice benefits of this is that your parallel code is portable to any system with an Ada compiler. No special library required.


0

+1 for PLINQ

Win32 Threads, Threadpool and Fibers, Sync Objects


0

I maintain a concurrency link blog that has covered a bunch of these over time (and will continue to do so):

http://concurrency.tumblr.com


0

I only know Java so far; its multithreading support has worked well for me.


0

I use OpenMP a lot, mainly due to its simplicity, portability, and flexibility. It supports multiple languages, even the almighty C++/CLI :)


0

I use MPI and like it very much. It does force you to think about the memory hierarchy, but in my experience, thinking about such things is important for high performance anyway. In many cases, MPI can largely be hidden behind domain-specific parallel objects (e.g. PETSc for solving linear and nonlinear equations).
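A small sketch of the explicit data movement that forces this kind of thinking: every rank owns its data outright and must ship boundary values to its neighbors by hand (here a toy ring exchange; the payload is a placeholder):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank owns its local value; nothing is shared implicitly. */
        double mine = (double)rank;
        double from_left = 0.0;
        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        /* Explicitly ship the boundary value around a ring. MPI_Sendrecv
         * avoids the deadlock a naive blocking send/recv pair can cause. */
        MPI_Sendrecv(&mine, 1, MPI_DOUBLE, right, 0,
                     &from_left, 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %f from rank %d\n", rank, from_left, left);
        MPI_Finalize();
        return 0;
    }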


0

PyCUDA... nothing like 25,000 active threads :) [warp-scheduled with scoreboarding]. CUDA 2 has stream support, so I'm not sure what StreamIt would bring. The CUDA MATLAB extensions look neat, as do PLUTO and the coming PetaBricks from MIT.

As for the others: Python's threading is lacking; MPI and the like are complicated, and I don't have a cluster, but I suppose they achieve what they were built for. I stopped C# programming before I got to thread apartments (probably a good thing).


0

It's not parallel per se and does not have a distributed model, but you can write highly concurrent code on the JVM using Clojure, and you get the plethora of Java libraries available to you as well. You would have to implement your own parallel algorithms on top of Clojure, but that should be relatively easy. I repeat, though: it does not yet have a distributed model.


0

GThreads from the GLib library http://library.gnome.org/devel/glib/stable/glib-Threads.html compile down to pthreads, so you don't lose any performance. They also give you very powerful thread pools and message queues between threads. I have used them successfully several times and have been very happy with the available features.
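A minimal sketch of the thread-pool part, as I understand the GThreadPool API (compile against glib-2.0 via pkg-config); the work items are placeholders:

    #include <glib.h>
    #include <stdio.h>

    /* Pool worker: invoked once for every item pushed into the pool. */
    static void work_item(gpointer data, gpointer user_data)
    {
        printf("processing item %d\n", GPOINTER_TO_INT(data));
    }

    int main(void)
    {
        GError *error = NULL;

        /* A pool of at most 4 threads; FALSE = threads are shared with
         * other non-exclusive pools rather than reserved for this one. */
        GThreadPool *pool =
            g_thread_pool_new(work_item, NULL, 4, FALSE, &error);

        for (int i = 1; i <= 8; i++)
            g_thread_pool_push(pool, GINT_TO_POINTER(i), &error);

        /* FALSE, TRUE: don't drop queued work, wait for it to finish. */
        g_thread_pool_free(pool, FALSE, TRUE);
        return 0;
    }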


0

I use OpenCL. I think it is quite a bit easier to use than MPI. I have also used MPI before, as a requirement for my parallel and distributed computing course, but I think it makes you do too much manual labor. I am going to start working with CUDA in a few days. CUDA is very similar to OpenCL, but the issue is that CUDA works only on NVIDIA products.
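For a sense of the comparison, here is a minimal OpenCL vector-add sketch (error checking omitted for brevity; the kernel and sizes are illustrative):

    #include <CL/cl.h>
    #include <stdio.h>

    /* Kernel source, compiled at run time: one work-item per element. */
    static const char *src =
        "__kernel void vadd(__global const float *a,\n"
        "                   __global const float *b,\n"
        "                   __global float *c) {\n"
        "    int i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";

    int main(void)
    {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Boilerplate: platform -> device -> context -> command queue. */
        cl_platform_id plat;
        cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        /* Build the kernel from source. */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vadd", NULL);

        /* Copy inputs to the device, launch N work-items, read back. */
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);
        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);

        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

        printf("c[1] = %f\n", c[1]);   /* expect 3.0 */
        return 0;
    }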
