I'm trying to get a grip on how people are actually writing parallel code these days, given how important multicore and multiprocessor hardware has become. To me it looks like the dominant paradigm is pthreads (POSIX threads), which is native on Linux and available on Windows. HPC people tend to use OpenMP or MPI, but there do not seem to be many of them here on StackOverflow. Or do you rely on Java threading, the Windows threading APIs, etc. rather than the portable standards? What, in your opinion, is the recommended way to do parallel programming?
Or are you using more exotic things like Erlang, CUDA, RapidMind, CodePlay, Oz, or even dear old Occam?
Clarification: I am looking for solutions that are portable across Linux and the various Unixes, on a range of host architectures. Windows is a rare case that is nice to support. So C# and .NET are really too narrow here; the CLR is a cool piece of technology, but could they PLEASE release it for Linux so that it would be as prevalent as, say, the JVM, Python, Erlang, or any other portable language.
C++ or JVM-based: probably C++, since the JVM tends to hide performance details from you.
MPI: I would agree that even the HPC people see it as a hard-to-use tool -- but for running on 128,000 processors it is the only scalable solution for the problems where map/reduce does not apply. Message passing has great elegance, though, as it is the only programming style that seems to scale well across local memory/AMP, shared memory/SMP, and distributed run-time environments.
An interesting new contender is MCAPI, but I do not think anyone has had time to gain any practical experience with it yet.
So overall, the situation seems to be that there are a lot of interesting Microsoft projects that I did not know about, and that the Windows API or pthreads are the most common choices in practice.
MPI isn't as hard as most make it seem. Nowadays I think a multi-paradigm approach is best suited for parallel and distributed applications. Use MPI for your node-to-node communication and synchronization, and either OpenMP or pthreads for your finer-grained parallelization. Think MPI between machines, and OpenMP or pthreads within each machine. This seems to scale a bit better than spawning a new MPI process for each core, at least for the near future.
Perhaps for dual or quad core right now, spawning a process for each core on a machine won't have that much overhead, but as we get more and more cores per machine, where the cache and on-die memory aren't scaling as fast, it becomes more appropriate to use a shared-memory model within the node.
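To illustrate the layering, here is a minimal hybrid sketch, assuming an MPI implementation plus an OpenMP-capable compiler (built with something like mpicxx -fopenmp); the array and the reduction are just placeholders:

```cpp
// Hybrid sketch: one MPI process per machine, OpenMP threads per core.
// Each rank computes a partial sum in parallel, then ranks combine via MPI.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Pretend each rank owns a slice of a big array.
    const int n = 1000000;
    std::vector<double> local(n, 1.0);

    double local_sum = 0.0;
    // OpenMP handles the per-core parallelism inside the node.
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n; ++i)
        local_sum += local[i];

    // MPI handles the node-to-node communication.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global sum = %f (from %d ranks)\n", global_sum, nranks);

    MPI_Finalize();
    return 0;
}
```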
I'd recommend OpenMP. Microsoft put it into the Visual C++ 2005 compiler, so it's well supported, and you don't need to do anything other than compile with the /openmp switch.
It's simple to use, though obviously it doesn't do everything for you, but then nothing does. I use it for running parallel for loops, generally without any hassle; for more complex things I tend to roll my own (e.g. I have code from ages ago that I cut, paste and modify).
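For the parallel-for case, a minimal sketch (assumes /openmp on MSVC or -fopenmp on GCC; the loop body is just an illustration):

```cpp
// OpenMP parallel for: iterations of the loop are split across the cores.
#include <omp.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<double> data(n, 2.0);

    // Each available thread processes a chunk of the index range.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        data[i] = std::sqrt(data[i]) * 3.0;

    std::printf("data[0] = %f, threads available = %d\n",
                data[0], omp_get_max_threads());
    return 0;
}
```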
You could try Cilk++, which looks good and has an e-book, "How to Survive the Multicore Software Revolution".
Both these kinds of systems try to parallelize serial code - i.e. take a for loop and run it on all the cores simultaneously in as easy a way as possible. They don't tend to be general-purpose thread libraries. (E.g. a research paper (pdf) described the performance of different types of thread pools implemented in OpenMP and suggested two new operations should be added to it - yield and sleep. I think they're missing the point of OpenMP a little there.)
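For the Cilk++ side of that, a rough sketch using the Intel Cilk Plus keywords (which descend from Cilk++; compiler support such as -fcilkplus is assumed, and the fib example is just an illustration):

```cpp
// Cilk-style parallelism: cilk_spawn forks a call, cilk_sync joins,
// and cilk_for parallelizes a serial loop across worker threads.
#include <cilk/cilk.h>
#include <cstdio>

static long fib(int n) {
    if (n < 2) return n;
    long a = cilk_spawn fib(n - 1);  // may run in parallel with the next call
    long b = fib(n - 2);
    cilk_sync;                       // wait for the spawned call to finish
    return a + b;
}

int main() {
    long out[16];
    // cilk_for divides the iteration space among the workers.
    cilk_for (int i = 0; i < 16; ++i)
        out[i] = fib(i + 20);

    std::printf("fib(35) = %ld\n", out[15]);
    return 0;
}
```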
As you mentioned OpenMP, I assume you're talking about native C++, not C# or .NET.
Also, if the HPC people (who I assume are experts in this kind of domain) seem to be using OpenMP or MPI, then that is what you should be using, not whatever the readership of SO happens to prefer!
We've started looking at the Parallel Extensions from Microsoft - they're not released yet, but they certainly show potential.
Parallel FX Library (PFX) - a managed concurrency library being developed by a collaboration between Microsoft Research and the CLR team at Microsoft for inclusion in a future revision of the .NET Framework. It is composed of two parts: Parallel LINQ (PLINQ) and the Task Parallel Library (TPL). It also includes a set of Coordination Data Structures (CDS) - data structures used to synchronize and coordinate the execution of concurrent tasks. The library was released as a CTP on November 29, 2007 and refreshed again in December 2007 and June 2008.
Not very much experience though...
Please be aware that the answers here are not going to be a statistically representative picture of what people are "actually using". Already I see a number of "X is nice" answers.
I've personally used Windows Threads on many a project. The other API I have seen in wide use is pthreads. On the HPC front, MPI is still taken seriously by the people using it <subjective>
I don't - it combines all the elegance of C++ with the performance of JavaScript. It survives because there is no decent alternative. It will lose to tightly coupled NUMA machines on one side and Google-style map/reduce on the other. </subjective>
More Data Parallel Haskell would be nice, but even without it, GHC>6.6 has some impressive ability to parallelize algorithms easily, via Control.Parallel.Strategies.
Very much depends on your environment.
For plain old C, nothing beats POSIX threads.
For C++, there is the very good (and free) Boost.Thread library from boost.org (see the sketch after this answer).
For Java, just use native Java threading.
You may also look at ways to achieve parallelism other than threading, like dividing your application into client and server processes and using asynchronous messaging to communicate. Done properly, this can scale up to thousands of users on dozens of servers.
It's also worth remembering that if you are using the Windows MFC, GNOME or Qt windowing environments, you are automatically in a multithreaded environment. If you are using Apache, IIS or J2EE, your application is already running inside a multi-threaded, multi-process environment.
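Picking up the Boost.Thread recommendation above, a minimal sketch (assumes Boost.Thread is installed and linked, e.g. -lboost_thread; the worker function is just an illustration):

```cpp
// Boost.Thread sketch: two worker threads sharing a mutex-protected resource.
// Build with something like: g++ workers.cpp -lboost_thread -lboost_system
#include <boost/thread.hpp>
#include <cstdio>

boost::mutex io_mutex;   // protects the printf calls below

void worker(int id) {
    for (int i = 0; i < 3; ++i) {
        boost::mutex::scoped_lock lock(io_mutex);   // released at end of scope
        std::printf("worker %d, iteration %d\n", id, i);
    }
}

int main() {
    boost::thread t1(worker, 1);  // start two workers
    boost::thread t2(worker, 2);
    t1.join();                    // wait for both to finish
    t2.join();
    return 0;
}
```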
Most of the concurrent programs I have written were in Ada, which has full support for parallelism natively in the language. One of the nice benefits of this is that your parallel code is portable to any system with an Ada compiler. No special library required.
+1 for PLINQ
Win32 Threads, Threadpool and Fibers, Sync Objects
I maintain a concurrency link blog that has covered a bunch of these over time (and will continue to do so):
I only know Java so far; its multithreading support has worked well for me.
I used OpenMP a lot, mainly due to its simplicity, portability and flexibility. It supports multiple languages, even almighty C++/CLI :)
I use MPI and like it very much. It does force you to think about the memory hierarchy, but in my experience, thinking about such things is important for high performance anyway. In many cases, MPI can largely be hidden behind domain-specific parallel objects (e.g. PETSc for solving linear and nonlinear equations).
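For flavour, a minimal plain-MPI sketch where the data movement is completely explicit (assumes any MPI implementation; the ring exchange is just an illustration):

```cpp
// Plain MPI sketch: each rank passes a value around a ring, so the data
// movement (and therefore the memory hierarchy) is entirely explicit.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    int token = rank * 100;   // some per-rank payload
    int received = -1;

    // Send to the right neighbour and receive from the left in one call,
    // avoiding the deadlock you can get from naive blocking send/recv pairs.
    MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                 &received, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("rank %d got %d from rank %d\n", rank, received, left);

    MPI_Finalize();
    return 0;
}
```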
PyCUDA... nothing like 25,000 active threads :) [warp-scheduled with scoreboarding]. CUDA 2 has stream support, so I'm not sure what StreamIt would bring. The CUDA MATLAB extensions look neat, as do PLUTO and the upcoming PetaBricks from MIT.
As far as the others go: Python's threading is lacking; MPI etc. are complicated, and I don't have a cluster, but I suppose they achieve what they are built for; and I stopped C# programming before I got to thread apartments (probably a good thing).
It's not parallel per se and does not have a distributed model, but you can write highly concurrent code on the JVM using Clojure. On top of that you get the plethora of Java libraries available to you. You would have to implement your own parallel algorithms on top of Clojure, but that should be relatively easy. I repeat, it does not yet have a distributed model.
GThreads from the GLib library http://library.gnome.org/devel/glib/stable/glib-Threads.html are built on top of pthreads, so you don't lose any performance. They also give you very powerful thread pools and message queues between threads. I have used them successfully several times and have been very happy with the available features.
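A minimal sketch of the thread-pool plus message-queue combination (assumes a reasonably recent GLib, >= 2.32, so no explicit g_thread_init is needed; build against pkg-config --cflags --libs glib-2.0; the squaring job is just an illustration):

```cpp
// GLib thread pool sketch: push work items into a pool of worker threads
// and collect results through a GAsyncQueue (GLib's inter-thread queue).
#include <glib.h>
#include <cstdio>

static GAsyncQueue* results;   // message queue back to the main thread

// Called by the pool's worker threads for every pushed item.
static void do_work(gpointer data, gpointer /*user_data*/) {
    int n = GPOINTER_TO_INT(data);
    g_async_queue_push(results, GINT_TO_POINTER(n * n));
}

int main() {
    results = g_async_queue_new();

    GError* error = nullptr;
    // 4 worker threads, non-exclusive pool.
    GThreadPool* pool = g_thread_pool_new(do_work, nullptr, 4, FALSE, &error);

    for (int i = 1; i <= 10; ++i)
        g_thread_pool_push(pool, GINT_TO_POINTER(i), nullptr);

    // Pop exactly as many results as jobs were pushed.
    for (int i = 0; i < 10; ++i)
        std::printf("result: %d\n", GPOINTER_TO_INT(g_async_queue_pop(results)));

    g_thread_pool_free(pool, FALSE, TRUE);   // wait for remaining work, then free
    g_async_queue_unref(results);
    return 0;
}
```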
I use OpenCL. I think it's pretty easy to use compared to MPI. I have also used MPI before, as a requirement for my parallel and distributed computing course, but I think you have to do too much manual labor. I am going to start working with CUDA in a few days. CUDA is very similar to OpenCL, but the issue is that CUDA only runs on NVIDIA products.
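For comparison, a minimal OpenCL host-code sketch (error checking omitted; assumes an OpenCL SDK and linking with -lOpenCL; the square kernel is just an illustration):

```cpp
// Minimal OpenCL host sketch: squares an array on whatever device is found.
// Build with something like: g++ square.cpp -lOpenCL
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kSource =
    "__kernel void square(__global float* buf) {   \n"
    "    int i = get_global_id(0);                 \n"
    "    buf[i] = buf[i] * buf[i];                 \n"
    "}                                             \n";

int main() {
    const size_t n = 1024;
    std::vector<float> data(n, 3.0f);

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_int err = CL_SUCCESS;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    // Copy the host data into a device buffer.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), data.data(), &err);

    // Build the kernel from source and set its single argument.
    cl_program program = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, &err);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "square", &err);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    // One work-item per array element.
    size_t global = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr,
                           0, nullptr, nullptr);

    // Blocking read brings the results back to the host.
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                        data.data(), 0, nullptr, nullptr);
    std::printf("data[0] = %f (expected 9.0)\n", data[0]);

    // Release everything.
    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```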