n-Body Galaxy Simulation using Compute Shaders on GPGPU via Unity 3D

Galaxy

Following on from my prior article about GPGPU, I thought I try my hand at n-Body simulation.

Here’s my first attempt at n-Body simulation of a galaxy where n equals 10,000 stars with the algorithm running as a compute shader on the GPU.  Compute shaders are small HLSL-syntax-like kernels but can run in massive parallel compared to a regular CPU equivalent.

It’s important to remember that conventional graphics including those in games, do not use GPGPU for star fields.  Generally this is accomplished by either manipulating the star coordinates CPU-side and uploading to the GPU every frame or so; or utilise a small shader that renders a pre-uploaded buffer of stars that may or may not move over time.  In any event, it would be unlikely to perform n-Body simulation using conventional non-GPGPU means due to the slow performance.

My simulation entails n2 gravity calculations per frame or 10,0002 = 100,000,000 calculations per frame at 60 frames per second – something that is quite impossible if I attempted to do it from the CPU-side even if the GPU was still rendering the stars.

Here’s the compute kernel (note I’m hosting the Gist on GitHub as a “c” file in order to display syntax hi-lighting.  Readers should be aware the correct file extension is .compute):


// Copyright 2014 Michael A. R. Duncan
// You are free to do whatever you want with this source
//
// File: Galaxy1Compute.compute
// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel UpdateStars
#include "Galaxy.cginc"
// blackmagic
#define BLOCKSIZE 128
RWStructuredBuffer<Star> stars;
Texture2D HueTexture;
// refer to http://forum.unity3d.com/threads/163591-Compute-Shader-SamplerState-confusion
SamplerState samplerHueTexture;
// time ellapsed since last frame
float deltaTime;
const float Softening=3e4f;
#define Softening2 Softening * Softening
static float G = 6.67300e-11f;
static float DefaultMass = 1000000.0f;
// Do a pre-calculation assuming all the stars have the same mass
static float GMM = G*DefaultMass*DefaultMass;
[numthreads(BLOCKSIZE,1,1)]
void UpdateStars (uint3 id : SV_DispatchThreadID)
{
uint i = id.x;
uint numStars, stride;
stars.GetDimensions(numStars, stride);
float3 position = stars[i].position;
float3 velocity = stars[i].velocity;
float3 A=float3(0,0,0);
[loop]
for (uint j = 0; j < numStars; j++)
{
if (i != j)
{
float3 D = stars[j].position – stars[i].position;
float r = length(D);
float f = GMM / (r * r + Softening2);
A += f * normalize(D);
}
}
velocity += A * deltaTime;
position += velocity * deltaTime;
if (i < numStars)
{
stars[i].velocity = velocity;
stars[i].position = position;
stars[i].accelMagnitude = length(A);
}
}

view raw

GalaxyCompute.c

hosted with ❤ by GitHub

Here’s my Unity MonoBehaviour controller that initialises and manages the simulation:


#region Copyright
// Copyright 2014 Michael A. R. Duncan
// You are free to do whatever you want with this source
// File: Galaxy1Controller.cs
#endregion
#region
using Assets.MickyD.Scripts;
using UnityEngine;
#endregion
public class Galaxy1Controller : MonoBehaviour
{
private const int GroupSize = 128;
private const int QuadStride = 12;
#region Fields
/// <summary>
/// The galaxy radius
/// </summary>
/// <remarks>This will appear as a Property Drawer in Unity 4</remarks>
[Range(10, 1000)]
public float GalaxyRadius = 200;
public Texture2D HueTexture;
public int NumStars = 10000;
public ComputeShader StarCompute;
public Material StarMaterial;
private GameManager _manager;
private ComputeBuffer _quadPoints;
private Star[] _stars;
private ComputeBuffer _starsBuffer;
private int _updateParticlesKernel;
#endregion
// Use this for initialization
#region Properties
private Vector3 StartPointA
{
get { return new Vector3(GalaxyRadius, 0, 0); }
}
private Vector3 StartPointB
{
get { return new Vector3(-GalaxyRadius, 0, 0); }
}
#endregion
#region Methods
private void CreateStars(int offset, int count, Vector3 T, Vector3 V)
{
for (var i = offset; i < offset + count; i++)
{
var star = _stars[i];
star.color = Vector3.one; // white
star.position = Random.insideUnitSphere*GalaxyRadius + T;
star.velocity = V;
_stars[i] = star;
}
}
private void OnDestroy()
{
// must deallocate here
_starsBuffer.Release();
_quadPoints.Release();
}
private void OnDrawGizmos()
{
Gizmos.color = Color.yellow;
Gizmos.DrawWireSphere(transform.position, GalaxyRadius);
Gizmos.DrawWireSphere(transform.position + StartPointA, GalaxyRadius);
Gizmos.DrawWireSphere(transform.position + StartPointB, GalaxyRadius);
}
private void OnRenderObject()
{
if (!SystemInfo.supportsComputeShaders)
{
return;
}
// bind resources to material
StarMaterial.SetBuffer("stars", _starsBuffer);
StarMaterial.SetBuffer("quadPoints", _quadPoints);
// set the pass
StarMaterial.SetPass(0);
// draw
Graphics.DrawProcedural(MeshTopology.Triangles, 6, NumStars);
}
private void Start()
{
_updateParticlesKernel = StarCompute.FindKernel("UpdateStars");
if (_updateParticlesKernel == -1)
{
Debug.LogError("Failed to find UpdateStars kernel");
Application.Quit();
}
_starsBuffer = new ComputeBuffer(NumStars, Constants.StarsStride);
_stars = new Star[NumStars];
var n = NumStars/2;
var offset = 0;
CreateStars(offset, n, StartPointA, new Vector3(-10, 5, 0));
offset += n;
CreateStars(offset, n, StartPointB, new Vector3(10, -5, 0));
_starsBuffer.SetData(_stars);
_quadPoints = new ComputeBuffer(6, QuadStride);
_quadPoints.SetData(new[]
{
new Vector3(-0.5f, 0.5f),
new Vector3(0.5f, 0.5f),
new Vector3(0.5f, -0.5f),
new Vector3(0.5f, -0.5f),
new Vector3(-0.5f, -0.5f),
new Vector3(-0.5f, 0.5f),
});
_manager = FindObjectOfType<GameManager>();
}
private void Update()
{
// bind resources to compute shader
StarCompute.SetBuffer(_updateParticlesKernel, "stars", _starsBuffer);
StarCompute.SetFloat("deltaTime", Time.deltaTime*_manager.MasterSpeed);
StarCompute.SetTexture(_updateParticlesKernel, "hueTexture", HueTexture);
// dispatch, launch threads on GPU
var numberOfGroups = Mathf.CeilToInt((float) NumStars/GroupSize);
StarCompute.Dispatch(_updateParticlesKernel, numberOfGroups, 1, 1);
}
#endregion
}

12 thoughts on “n-Body Galaxy Simulation using Compute Shaders on GPGPU via Unity 3D

  1. That’s a cool demo! Just as a frame of reference, however, on a modern-day Haswell you can achieve around 200gflops (http://www.pugetsystems.com/blog/2013/08/26/Haswell-Floating-Point-Performance-493/) – a 3d distance calculation approaches just 8 flop , a div for the attraction constant, then 6 ops to scale+compute the delta, and another 6 to apply that to x and y, with 3 potentially reusable; so that’s 17-20 ops per pair (and due to symmetry, you only need slightly less than half the pairs). This suggests a 10000 point starfield in 3d might just barely be within reach; you’d have a factor 3-4 to spare, but since you’re dealing with tiny 3d vectors you’re likely not to get as good efficiency as with linpack.

    Like

    • Thanks for the kind words Eamon and for the link! Yes I tried nBody on my same rig but via the CPU (Intel Core i7 2600k @ 1.6 MHz) and I can only manage 100 stars at 60 FPS. 😦

      Like

  2. Yeah, but I think the problem there was mono/.net. To get this kind of performance, you need to be sure you’re really only executing numeric ops, and not stalling the pipeline for 100s of cycles in some virtual method lookup or whatever. Ideally, any function calls must be completely inlinable. Secondly, you simply need SSE at a minimum, and hopefully AVX. Only very recently has .NET started to support those, but in a fashion that you’re not going to be using them unless you rewrite your code intentionally (i.e. kind of like compute shaders).

    It’s been years now, but back when I needed to do heavy numeric lifting I compared C++ to C#, and the results were fairly dramatic: http://eamon.nerbonne.org/2009/02/net-numeric-performance-disappointing.html.

    Even highly tuned C# with no function calls and using only raw arrays was 30-50% slower (and that code was longer that the equivalent C++). If you use any abstractions such as (say) a List, then you’re looking at 10 times slower; if you use interfaces it sometimes approached 100 times slower. Back then (and now still) mono was even worse, although now mono supports and LLVM mode in certain limited cases.

    And really, my testing was unfairly biased against C++; if you use a shorter vector (I used 100 dim vectors), overhead matters more which would slow C++ down much less than C#. And if you used a library like Eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) the C++ code is much shorter *and* much faster because it auto-vectorizes. Oh, and finally, C++ was compiled with MSC, but my testing has consistently shown that GCC generates significantly better code that MSC for this kind of problem.

    So if you want fast numeric code on .NET, use C++/CLI; barring that be really careful about every single abstraction penalty (particularly every single method call, esp. virtual/interface/delegate calls) since C# doesn’t make it easy to see where the performance hits are.

    Of course, your solution is quite a bit cooler :-). If I might ask, what GFX card were you running on, and how many stars can you support @ 60fps?

    Like

    • Hehe yes my .NET port was really just for amusement. For the GPU version I used an AMD Radeon HD 6950 SLI and can maintain 10,000 nBody stars at 60 FPS with VSYNC on.

      Sounds like you really know your stuff re SSE support in c/c++compilers. 🙂 Alot of that stuff was coming in just as I was moving to .NET so I missed out on playing with it.

      Like

  3. I have a recently acquired interest in GPU stuff (you can tell I don’t know much about it when I use terms like ‘gpu stuff’) and your code has been very helpful in helping me understand things. Thanks for sharing! Everything I read so far refrained from actually showing something and providing some useful code to analyze. It becomes tiresome to read about buffers and threads without something to satiate your appetite.

    Everything is still a big mystery but slightly less so now. I hope you talk more about it! 🙂

    Like

    • Thanks Diego. I’m glad to have helped in anyway. 🙂 Yes I found there to be scant information on GPGPU specifically DirectCompute.

      Like

  4. Pingback: Unity3D: Tutoriais e Documentação de Shaders |

  5. Pingback: Ultimate list of unity blogs/links (Updated constantly) – LukaKostic's unity stuff

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.