
n-Body Galaxy Simulation using Compute Shaders on GPGPU via Unity 3D


Download complete Unity 3D project here

Following on from my prior article about GPGPU, I thought I’d try my hand at n-Body simulation.

Here’s my first attempt at an n-Body simulation of a galaxy, where n equals 10,000 stars, with the algorithm running as a compute shader on the GPU. Compute shaders are small kernels written in an HLSL-like syntax that run massively in parallel, unlike a regular CPU equivalent.

It’s important to remember that conventional graphics, including those in games, do not use GPGPU for star fields. Generally they either manipulate the star coordinates CPU-side and upload them to the GPU every frame or so, or use a small shader that renders a pre-uploaded buffer of stars that may or may not move over time. In any event, an n-Body simulation is unlikely to be attempted by conventional non-GPGPU means because it would be too slow.

My simulation entails n² gravity calculations per frame, or 10,000² = 100,000,000 calculations per frame at 60 frames per second, something that would be quite impossible if I attempted it CPU-side, even with the GPU still rendering the stars.
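To make the per-frame cost concrete, here is a minimal CPU-side sketch of a direct-sum gravity step in C++. It is not the compute kernel from this project: the `Star` layout, the `stepDirectSum` name, the `softening` term and the semi-implicit Euler integrator are all assumptions chosen for illustration. It skips the self-interaction, so it evaluates n(n − 1) pair interactions rather than exactly n².

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical star record; field names are illustrative, not the author's.
struct Star {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;
};

// One direct-sum step: every star accumulates the pull of every other star.
// Returns the number of pair interactions evaluated, which is n * (n - 1).
std::size_t stepDirectSum(std::vector<Star>& stars, float G, float softening, float dt) {
    const std::size_t n = stars.size();
    std::vector<float> ax(n, 0.0f), ay(n, 0.0f), az(n, 0.0f);
    std::size_t interactions = 0;

    // On a GPU, the outer loop is what gets spread across threads (one star each).
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;  // a star does not attract itself
            const float dx = stars[j].x - stars[i].x;
            const float dy = stars[j].y - stars[i].y;
            const float dz = stars[j].z - stars[i].z;
            // Softening (an assumption) avoids a division by zero at tiny distances.
            const float r2 = dx * dx + dy * dy + dz * dz + softening * softening;
            const float invR = 1.0f / std::sqrt(r2);
            const float s = G * stars[j].mass * invR * invR * invR;  // G * m / r^3
            ax[i] += dx * s;
            ay[i] += dy * s;
            az[i] += dz * s;
            ++interactions;
        }
    }

    // Semi-implicit Euler: update velocity first, then position.
    for (std::size_t i = 0; i < n; ++i) {
        stars[i].vx += ax[i] * dt;
        stars[i].vy += ay[i] * dt;
        stars[i].vz += az[i] * dt;
        stars[i].x += stars[i].vx * dt;
        stars[i].y += stars[i].vy * dt;
        stars[i].z += stars[i].vz * dt;
    }
    return interactions;
}
```

For 10,000 stars the inner body runs 99,990,000 times per step, which is the roughly 100,000,000 calculations per frame quoted above.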

Here’s the compute kernel (note I’m hosting the Gist on GitHub as a “c” file in order to display syntax highlighting. Readers should be aware the correct file extension is .compute):

Here’s my Unity MonoBehaviour controller that initialises and manages the simulation:

  1. 2014/06/26 at 10:39 pm

    That’s a cool demo! Just as a frame of reference, however, on a modern-day Haswell you can achieve around 200 GFLOPS (http://www.pugetsystems.com/blog/2013/08/26/Haswell-Floating-Point-Performance-493/). A 3D distance calculation takes about 8 flops, plus a div for the attraction constant, then 6 ops to scale and compute the delta, and another 6 to apply that to x and y, with 3 potentially reusable; so that’s 17-20 ops per pair (and due to symmetry, you only need slightly less than half the pairs). This suggests a 10,000-point starfield in 3D might just barely be within reach; you’d have a factor of 3-4 to spare, but since you’re dealing with tiny 3D vectors you’re unlikely to get as good efficiency as with Linpack.
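The estimate in this comment can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names `uniquePairs` and `requiredGflops` are made up here, and the inputs (17 to 20 flops per pair, 60 frames per second, a 200 GFLOPS peak) are taken from the comment rather than measured.

```cpp
#include <cstdint>

// Unique unordered pairs among n bodies: n * (n - 1) / 2.
// This is the "slightly less than half the pairs" that symmetry leaves.
constexpr std::uint64_t uniquePairs(std::uint64_t n) {
    return n * (n - 1) / 2;
}

// Sustained GFLOPS needed to evaluate every unique pair once per frame.
constexpr double requiredGflops(std::uint64_t n, double flopsPerPair, double framesPerSecond) {
    return static_cast<double>(uniquePairs(n)) * flopsPerPair * framesPerSecond / 1e9;
}
```

With n = 10,000 there are 49,995,000 unique pairs. At 17 to 20 flops per pair and 60 frames per second that needs roughly 51 to 60 GFLOPS, which is about 3.3 to 3.9 times below the quoted 200 GFLOPS peak and matches the "factor of 3-4 to spare".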


    • MickyD
      2014/06/26 at 11:07 pm

      Thanks for the kind words Eamon, and for the link! Yes, I tried nBody on the same rig but via the CPU (Intel Core i7 2600K @ 1.6 GHz) and could only manage 100 stars at 60 FPS. 😦


  2. 2014/06/27 at 4:34 pm

    Yeah, but I think the problem there was Mono/.NET. To get this kind of performance, you need to be sure you’re really only executing numeric ops, and not stalling the pipeline for hundreds of cycles in some virtual method lookup or whatever. Ideally, any function calls must be completely inlinable. Secondly, you simply need SSE at a minimum, and hopefully AVX. Only very recently has .NET started to support those, and in such a fashion that you won’t be using them unless you intentionally rewrite your code (i.e. kind of like compute shaders).

    It’s been years now, but back when I needed to do heavy numeric lifting I compared C++ to C#, and the results were fairly dramatic: http://eamon.nerbonne.org/2009/02/net-numeric-performance-disappointing.html.

    Even highly tuned C# with no function calls and using only raw arrays was 30-50% slower (and that code was longer than the equivalent C++). If you use any abstractions such as (say) a List, then you’re looking at 10 times slower; if you use interfaces it sometimes approached 100 times slower. Back then (and now still) Mono was even worse, although Mono now supports an LLVM mode in certain limited cases.

    And really, my testing was unfairly biased against C++; if you use a shorter vector (I used 100 dim vectors), overhead matters more, which would slow C++ down much less than C#. And if you used a library like Eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) the C++ code is much shorter *and* much faster because it auto-vectorizes. Oh, and finally, C++ was compiled with MSC, but my testing has consistently shown that GCC generates significantly better code than MSC for this kind of problem.

    So if you want fast numeric code on .NET, use C++/CLI; barring that be really careful about every single abstraction penalty (particularly every single method call, esp. virtual/interface/delegate calls) since C# doesn’t make it easy to see where the performance hits are.

    Of course, your solution is quite a bit cooler :-). If I might ask, what GFX card were you running on, and how many stars can you support @ 60fps?


    • MickyD
      2014/06/27 at 4:48 pm

      Hehe yes my .NET port was really just for amusement. For the GPU version I used an AMD Radeon HD 6950 SLI and can maintain 10,000 nBody stars at 60 FPS with VSYNC on.

      Sounds like you really know your stuff re SSE support in C/C++ compilers. 🙂 A lot of that stuff was coming in just as I was moving to .NET so I missed out on playing with it.


  3. 2015/06/09 at 7:44 pm

    I have a recently acquired interest in GPU stuff (you can tell I don’t know much about it when I use terms like ‘gpu stuff’) and your code has been very helpful in helping me understand things. Thanks for sharing! Everything I read so far refrained from actually showing something and providing some useful code to analyze. It becomes tiresome to read about buffers and threads without something to satiate your appetite.

    Everything is still a big mystery but slightly less so now. I hope you talk more about it! 🙂


    • MickyD
      2015/06/09 at 9:45 pm

      Thanks Diego. I’m glad to have helped in any way. 🙂 Yes, I found there to be scant information on GPGPU, specifically DirectCompute.


  4. 2017/06/14 at 3:57 pm

    Can you please upload the whole project? The above Dropbox link seems to have gone.
    Thanks !!


    • MickyD
      2017/06/24 at 10:20 am

      Try now. It’s for Unity 4.x which I’m not running at the moment.


  5. 2017/06/24 at 5:04 am

    The link to the source code is 404. I’d love to see this project in action!


    • MickyD
      2017/06/24 at 10:20 am

      Try now. It’s for Unity 4.x which I’m not running at the moment.

