OpenCL

Koshling · Jun 20, 2013

Anyone here (AIAndy, I'm probably looking at you) have any OpenCL experience? Just for fun and interest I was wondering about a variant of the property calculation and propagation algorithm that runs on OpenCL. I think it would map pretty well. Of course it's not really justified for us (property calculation is less than 10% of turn compute time), but it might be fun, and an interesting exercise to get into it as a technology (i.e. really for personal interest).

ls612 · Jun 20, 2013

Given the constraints of the game engine, would it even be practical to have the DLL use a graphics card for massive parallelization?

AIAndy · Jun 20, 2013

Koshling said:
Anyone here (AIAndy, I'm probably looking at you) have any OpenCL experience? Just for fun and interest I was wondering about a variant of the property calculation and propagation algorithm that runs on OpenCL. I think it would map pretty well. Of course it's not really justified for us (property calculation is less than 10% of turn compute time), but it might be fun, and an interesting exercise to get into it as a technology (i.e. really for personal interest).

I have only had some limited exposure to CUDA, and none to OpenCL.
That kind of algorithm would definitely map well except for three issues:

The properties are currently not stored in a way that would be easily transferable to graphics memory.
Property Manipulators can get a dependency on a lot of other game state information via expressions.
I am not sure how well OpenCL will get along with the Civ4 graphics engine while both attempt to use the graphics card.

Koshling · Jun 21, 2013

ls612 said:
Given the constraints of the game engine, would it even be practical to have the DLL use a graphics card for massive parallelization?

It would only be feasible for VERY constrained and localized parts, which is why I suggested property propagation. Frankly it was more a curiosity/learning-experience thing than something truly useful in C2C.

AIAndy said:
I have only had some limited exposure to CUDA, and none to OpenCL.
That kind of algorithm would definitely map well except for three issues:

The properties are currently not stored in a way that would be easily transferable to graphics memory.

Property Manipulators can get a dependency on a lot of other game state information via expressions.

I am not sure how well OpenCL will get along with the Civ4 graphics engine while both attempt to use the graphics card.

The expressions are probably a bit of a killer issue, except perhaps for handling some really common special cases (disease or crime adjacency tile propagation, for example). But the overhead of having to translate to/from a suitably simple representation (most likely a simple 2D array, or perhaps 3D with property type as the third dimension) to pass to the GPU would probably wipe out most of the benefit in the C2C context.
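To make the "simple 2D array" idea concrete, here is a minimal CPU-side sketch in plain C++ of what such a special case might look like. Everything in it is an assumption for illustration: `PropertyGrid`, `propagateStep`, the 4-neighbour rule and the `rate` parameter are hypothetical and not taken from the C2C code. The point is only that each output tile reads from an immutable input grid, so the inner loop body is the sort of thing that could become one work-item of a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout: one float per tile, row-major, for a single
// property (e.g. disease). None of these names come from the C2C source.
struct PropertyGrid {
    int width;
    int height;
    std::vector<float> values;  // size == width * height

    PropertyGrid(int w, int h)
        : width(w), height(h), values(static_cast<std::size_t>(w) * h, 0.0f) {
        assert(w > 0 && h > 0);
    }

    float at(int x, int y) const {
        return values[static_cast<std::size_t>(y) * width + x];
    }
    float& at(int x, int y) {
        return values[static_cast<std::size_t>(y) * width + x];
    }
};

// One propagation step (an assumed rule, not the game's): each tile moves a
// fraction `rate` of the way towards the average of its in-bounds
// 4-neighbours. Output depends only on `in`, so tiles are independent.
PropertyGrid propagateStep(const PropertyGrid& in, float rate) {
    PropertyGrid out(in.width, in.height);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            float sum = 0.0f;
            int count = 0;
            for (int k = 0; k < 4; ++k) {
                const int nx = x + dx[k];
                const int ny = y + dy[k];
                if (nx >= 0 && nx < in.width && ny >= 0 && ny < in.height) {
                    sum += in.at(nx, ny);
                    ++count;
                }
            }
            const float self = in.at(x, y);
            const float avg = count > 0 ? sum / count : self;
            out.at(x, y) = self + rate * (avg - self);
        }
    }
    return out;
}
```

Filling and reading back a `PropertyGrid` from the real per-plot property storage is exactly the translation overhead mentioned above, and it is not modelled here.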

As to the sharing of the GPU, it could be set to only use a secondary GPU. The thing that drove my interest was the growth of APUs. People with IVB, Haswell, or recent AMD processors likely have unused on-package GPUs, since most gamers (even Civ players) likely have a discrete graphics card as well (on desktops anyway).

This is actually more interesting in the context of AXXXE, where we talked about everything being a fairly simple (at base level semantics) entity-property system. It may well be that a small set of common property manipulators operating at that level could be quite CUDA/OpenCL friendly, and useful to compose higher level semantics.

Not really proposing anything concrete, just musing...

ls612 · Jun 21, 2013

Koshling said:
This is actually more interesting in the context of AXXXE, where we talked about everything being a fairly simple (at base level semantics) entity-property system. It may well be that a small set of common property manipulators operating at that level could be quite CUDA/OpenCL friendly, and useful to compose higher level semantics.

Not really proposing anything concrete, just musing...

Regarding AXXXXE, I've been doing a bit of research and screwing around with OGRE recently, and I'm beginning to think that it might be better to go native C++ for two reasons. The first is that the MOGRE managed wrapper seems to have ceased development in early 2012, meaning that it uses an older and less efficient version of the underlying engine. The second is that from what I've read the OGRE engine is primarily CPU limited, as it has some internal inefficiencies, so the overhead from using the .NET framework would be more noticeable. Just my two cents on the matter though.

Koshling · Jun 22, 2013

ls612 said:
Regarding AXXXXE, I've been doing a bit of research and screwing around with OGRE recently, and I'm beginning to think that it might be better to go native C++ for two reasons. The first is that the MOGRE managed wrapper seems to have ceased development in early 2012, meaning that it uses an older and less efficient version of the underlying engine. The second is that from what I've read the OGRE engine is primarily CPU limited, as it has some internal inefficiencies, so the overhead from using the .NET framework would be more noticeable. Just my two cents on the matter though.

If we make the render engine a separate process as proposed anyway, the choice for one won't really impact the other.

Yid · Jun 25, 2013

As it happens I'm learning OpenCL just now for my professional project. So here are my $0.02:

OpenCL is mostly useful for complicated calculations. Simple additions (like "disease = disease + offsetDisease") do not count. Checking many conditions also slows down the processing. (GPUs are great at number crunching, but poor at decision making...) Also, dynamic memory allocation (malloc, free) is not possible in OpenCL programs, which kind of limits your data.
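As a rough illustration of those constraints, the sketch below is ordinary C++ rather than OpenCL, written the way a kernel body would have to be: one output element per index, no allocation inside the per-element function, and clamping via min/max instead of a chain of `if`s. The names `updateTile` and `updateAll` and the `decay` and `cap` parameters are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-tile update in "kernel style": pure arithmetic on its
// arguments, no allocation, no data-dependent control flow.
// `decay` and `cap` are made-up tuning parameters.
inline float updateTile(float current, float offset, float decay, float cap) {
    const float next = (current + offset) * (1.0f - decay);
    // min/max can usually be compiled to select-style instructions rather
    // than jumps, which sidesteps divergent branches between work-items.
    return std::min(cap, std::max(0.0f, next));
}

// Host-side driver. All buffers are sized by the caller up front, mirroring
// the restriction that the kernel itself cannot call malloc/free.
void updateAll(const std::vector<float>& current,
               const std::vector<float>& offset,
               std::vector<float>& out,
               float decay, float cap) {
    assert(current.size() == offset.size());
    assert(out.size() == current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        out[i] = updateTile(current[i], offset[i], decay, cap);
    }
}
```

Note how little arithmetic there is per element. That is the "simple additions do not count" problem: the work per value is tiny compared with the cost of getting the value to the device at all.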

Also, if the data does not "live" on the graphics card, and the inputs and results have to be transferred back and forth, all the speedup is soon lost. I did some speed tests with the OpenCV library, testing the face detection algorithm with Haar-like features. In the end it turned out that the OpenCV implementation of Haar on my graphics card (AMD Radeon 6750M) took about the same time as on my CPU (Intel i7, 2.2 GHz). Of course you'd have a free CPU then. If you can occupy it with other tasks there might be a gain.

In other words: I think it might be a good idea to use it in another system, which is streamlined for it, but not in C2C.
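The transfer-cost argument can be written down as back-of-envelope arithmetic. The sketch below is only a simplified model under stated assumptions: a single fixed bus bandwidth, no latency or overlap of transfer with compute, and made-up names (`OffloadEstimate`, `estimateOffload`, `offloadIsFaster`). Any numbers fed into it would have to come from real measurements.

```cpp
#include <cassert>

// Rough cost model, all inputs hypothetical: time for one offloaded step
// when the data has to cross the bus in both directions every time.
struct OffloadEstimate {
    double uploadSeconds;
    double kernelSeconds;
    double downloadSeconds;

    double total() const {
        return uploadSeconds + kernelSeconds + downloadSeconds;
    }
};

// Assumes transfer time is simply bytes / bandwidth (ignores latency and
// any overlap between copying and computing).
OffloadEstimate estimateOffload(double bytesUp, double bytesDown,
                                double busBytesPerSecond,
                                double kernelSeconds) {
    assert(busBytesPerSecond > 0.0);
    return OffloadEstimate{bytesUp / busBytesPerSecond,
                           kernelSeconds,
                           bytesDown / busBytesPerSecond};
}

// Offloading only pays off if the whole round trip beats the CPU time.
bool offloadIsFaster(const OffloadEstimate& e, double cpuSeconds) {
    return e.total() < cpuSeconds;
}
```

With this model a kernel that is several times faster than the CPU can still lose overall once the two copies are added in, which matches the Haar result described above.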
