Paging System v2

I truly wish I had a clue how to do that... but it sounds like as long as we didn't get too crazy, this approach could enable us to break ALL boundaries and would certainly pave the way for Multimaps to become a reality. Jeez... this is truly an amazing proposal.

What are the chances we could get you back on the team to help us with this directly @AIAndy? You've seen how much progress Bill has made in making the development environment much more programmer-friendly, right? I know one of your complaints was how long one has to wait for the dll to compile with every adjustment. I think you'd really enjoy working with the current team.
These days I program all day at work so I tend to do something else in my free time.

Yeah overriding the allocation system in the exe was one of the two things I was considering, the other being just remapping all the objects between the exe and d3d, if the problem was there. However upon (much) further investigation it turns out the error is really just that we allocate too many lights! We hit the current limit of 1024 at one time when doing paging, but not when just showing everything at once.

I actually fixed the initial problem by intercepting d3d.dll light functions and rejecting the -1 light index, however it now crashes the exe on light *deallocation* of course! So investigations are ongoing...
Good to see that you are making progress.
So you think the problem is not actually with memory over 2G?
 
No it isn't. That was just a very preliminary guess based on a quick reading of the disassembly and noting that the issue seemed to occur as memory approached 2GB usage, and that it didn't happen if everything was just loaded from the start. I guessed that paging in and out combined with other ongoing allocations would push later allocations above 2GB.

I can look into the memory and see that the entire bit field array that indicates light usage is full at the point where it crashes, so the failure to allocate a new light index is genuine. I'm going to add logging break points in the asm and then try and work out what pattern of usage caused them to fill up only when doing paging, but not when just loading *everything* from the start. My guesses right now are some kind of delayed deallocation of the lights, or a leak.
 
It is possible that if some of the objects that reference the lights are above 2G, some deallocation does not work properly and the lights are leaked.
So a good solution is still to prevent the exe from getting any of the memory above 2G (e.g. by taking those virtual memory pages from the OS right at the start and only using them in the DLL).
 
It would be a solution looking for a problem at this point, as there isn't any evidence thus far that the address of the allocation is the problem (I was just wrong with my initial guess). I already tested by forcing all paged allocations above 2GB and it made no difference to how much paging I could do before it crashed.

What *does* make a difference is how aggressive the paging is. If I page out everything I can outside the view range, it is stable. If I allow lots of paged-in data outside the view range, it will crash. But it does not crash if I just disable the paging entirely and load ALL data. So it is some combination of a high amount of paged-in data and a constant stream of stuff being paged in and out that can cause it to run out of light indices.
This is why delayed freeing of the resources is my best current guess. Certainly the callstack where the crash happens on *deallocation* of a light with an invalid index does *not* contain any of our own code. i.e. the deallocation is NOT happening when I am telling the exe to destroy the graphical component. It happens some time later (pretty common behaviour for this kind of system). If that some time later is in fact *a long* time later, then paging in and out more total data than exists on the map could theoretically allocate more light indices than is required for the entire map before it gets around to freeing them again.
I might be able to find a function that can force a cleanup cycle (or maybe someone here knows of one), thus freeing the unused light indices.
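To make the delayed-free hypothesis above concrete, here is a minimal sketch of it as a toy model. Everything in it is an assumption for illustration: the function name, the idea that frees land a fixed number of frames after they are requested, and the numbers used in the usage note. It says nothing about how the exe actually schedules its cleanup.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Toy model (hypothetical, not the exe's logic): every frame we page out
// `churnPerFrame` lights and page in the same number, but a release only
// takes effect `freeDelayFrames` frames after it was requested.
// Returns the first frame on which an allocation would fail, or -1 if none
// fails within `frames` frames.
int firstFailureFrame(std::size_t capacity, std::size_t resident,
                      std::size_t churnPerFrame, std::size_t freeDelayFrames,
                      int frames) {
    std::size_t inUse = resident;          // indices held by paged-in data
    std::deque<std::size_t> pendingFrees;  // release batches waiting out the delay
    for (int frame = 0; frame < frames; ++frame) {
        // Page out: the release is requested, but the indices stay taken for now.
        pendingFrees.push_back(churnPerFrame);
        // A batch requested freeDelayFrames frames ago finally gets freed.
        if (pendingFrees.size() > freeDelayFrames) {
            inUse -= std::min(inUse, pendingFrees.front());
            pendingFrees.pop_front();
        }
        // Page in: new lights need free indices right away.
        if (inUse + churnPerFrame > capacity) return frame;
        inUse += churnPerFrame;
    }
    return -1;
}
```

With a 1024 index limit, 50 lights churned per frame and a 5 frame delay, this model stays stable with 200 resident lights but fails with 800 resident, and it never fails with 800 resident if the delay is zero. That is the same shape as the behaviour described above, though a plain leak would look similar from the outside.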
Regarding replacing the allocator: at some point it would be nice to take more control over the exe allocation system, e.g. so I can add tracking to it (using the knowledge I now have of what objects the exe is working with), but to solve the current crash it isn't relevant as far as I can see.
 
Have you looked at the properties of the lights which are added but not removed? Are they all the same?
 
I didn't confirm that any lights are added but not removed, only that the light allocation table (a 1024 bit array of flags indicating which indices are in use) is full up at the point it crashes. Certainly one possibility is that some aren't being freed at all, in which case I will investigate if there is some problem with them.
 
I added some very simple logging to Direct3DDevice9Ex::SetLight and when you move around the map you can see new lights being added up to the index 1023.
But after reaching 1023 I don't get a crash; the engine simply starts again at index 3.
 
I'm guessing index 0, 1 and 2 are reserved for global map lighting (and such light sources).
 
I added some very simple logging to Direct3DDevice9Ex::SetLight
By what method? I am looking for what the best graphics debugger is for dx9, I didn't try any with C2C yet though.

But after reaching 1023 I don't get a crash; the engine simply starts again at index 3.
Yeah because it has freed the ones at the lower end again. The light manager keeps track of the last index used and tries to allocate the next available one, wrapping at 1024. The crash happens when the other ones didn't get freed quickly enough and it can't find a free index. To reproduce you need to increase the max memory setting in the globals XML to around 2000000 or so and pan around on a large and fully revealed map.
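For anyone following along, the allocation behaviour described above can be sketched roughly like this. It is a reconstruction from the observed behaviour only, not the exe's code: the class and member names are made up, and treating indices 0 to 2 as permanently reserved is just the guess from earlier in the thread.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical model of the light manager: a 1024-bit in-use table plus a
// "last index used" cursor. Names and the reserved range are assumptions.
class LightIndexAllocator {
public:
    static constexpr std::size_t kMaxLights = 1024;  // limit quoted in the thread
    static constexpr std::size_t kReserved = 3;      // guess: 0..2 are global lights

    LightIndexAllocator() {
        for (std::size_t i = 0; i < kReserved; ++i) inUse_.set(i);
    }

    // Scan forward from the slot after the last one handed out, wrapping at
    // kMaxLights. Returns -1 when every slot is taken.
    int allocate() {
        for (std::size_t step = 1; step <= kMaxLights; ++step) {
            const std::size_t candidate = (last_ + step) % kMaxLights;
            if (!inUse_.test(candidate)) {
                inUse_.set(candidate);
                last_ = candidate;
                return static_cast<int>(candidate);
            }
        }
        return -1;  // table full: the failure case discussed above
    }

    // Mark an index as free again; reserved and out-of-range indices are ignored.
    void release(std::size_t index) {
        if (index >= kReserved && index < kMaxLights) inUse_.reset(index);
    }

    std::size_t liveCount() const { return inUse_.count(); }

private:
    std::bitset<kMaxLights> inUse_;
    std::size_t last_ = kReserved - 1;  // so the first allocation lands on 3
};
```

In this model the first light gets index 3, indices climb to 1023, and the next allocation wraps back to 3 only if 3 has been released in the meantime. If nothing has been released, `allocate()` returns -1.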
 
By what method? I am looking for what the best graphics debugger is for dx9, I didn't try any with C2C yet though.
Wrapper dll using the code from https://github.com/elishacloud/DirectX-Wrappers/tree/master/d3d9 with some logging code added.
Code:
HRESULT m_IDirect3DDevice9Ex::SetLight(DWORD Index, CONST D3DLIGHT9 *pLight)
{
    fmt::print(logf, "SetLight:{} x:{} y:{} z:{}\r\n", Index, pLight->Position.x, pLight->Position.y, pLight->Position.z);

    return ProxyInterface->SetLight(Index, pLight);
}
 
Oh okay same as I did then! As a side note: we can actually use that light pointer to back track through all the game objects :mischief:
 
I finally got an error but the ram usage was still below 2GB:
mem.PNG
 
Huh okay, will have to look at that. Good to know what version of gamebryo they are using as well.
edit: There it is: it's video memory it fails to allocate, not main RAM. How much VRAM do you have?
Most newer cards have 3GB+
 
My laptop has a 1050Ti with 4GB of VRAM.

The crash happens when the other ones didn't get freed quickly enough and it can't find a free index.
I always limit games like CivIV to 30fps when using the laptop because more fps aren't necessary for these games.
Your computer is faster than my laptop; that and the limited fps could make a difference here.
 
Huh okay, will have to look at that. Good to know what version of gamebryo they are using as well.
edit: There it is: it's video memory it fails to allocate, not main RAM. How much VRAM do you have?
Most newer cards have 3GB+
A reminder: the recommended video RAM for BtS is 128 MB with DirectX 8 support (pixel and vertex shaders). Straight from the game manual.
DirectX 9.0c (included in game CDs)
Recommended OS are: Win 2000 SP1 or higher, XP Home or Pro (plus SP1, or higher), and Vista.

You have more players with vid cards having less than 3GB of Vram than you have playing with 3+GB Vram.

I have a GTX 760 Ti OEM card with 2GB DDR5. But I only upgraded from the computer's original GTX 550 Ti with 1GB DDR5 within the past 2 years.

Whenever RAM usage in Task Manager hits 2.3-2.7GB I will have crashes.

Maybe this means nothing to you billw or alberts2, but I thought it expedient to remind you both.
 
Sure I was just using 3GB as an example of what to expect on new cards. However I don't think the min specs are relevant to us at all, C2C won't run on them regardless. Video RAM wise I would say 1GB is a fine spec though.

Whenever RAM usage in Task Manager hits 2.3-2.7GB I will have crashes.
Yeah seems about right, it will never be much better than this. Although the theoretical limit is 3GB, fragmentation means we can never actually get there. There is definitely still the bug with the light allocation when using paging, so hopefully I can fix that. Then we can be more stable when using heavy paging with high memory limits and usage, so we can look at other projects that need more memory.
 
After doing a 1.2GB dummy memory allocation I finally got the game to crash while panning the map.
But the crash was in the Visual C++ runtime dll, not inside a d3d dll. However, the d3d light log showed the same index problem you had.

To see the real memory usage I think it's necessary to look at the Commit size not the Working set size.
memusage.PNG
 
Yeah commit is the total amount of pages you have allocated, working set is those you actually tried to use "recently". I still didn't work out the implications.
Late last night I did manage to add logging breakpoints to light create and delete functions. I paged around the map until it crashed from light index issue and there were 350 lights deleted and 800 created. So I think it is simply a leak where some aren't deleted rather than caused by delayed deletion. I did notice that light delete happened pretty much as soon as I moved the camera, indicating that the delay is not big (probably on the same frame as the entity destroy is requested).
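A rough sketch of how the create/delete logging above could be turned into a list of leak suspects. The class is hypothetical bookkeeping, not anything in the exe or the DLL, and it assumes each light can be identified by its address and that the current frame number is available when the create breakpoint fires.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper fed from the light create/delete hooks.
class LightLeakTracker {
public:
    // Record a light creation, remembering the frame it happened on.
    void onCreate(std::uintptr_t light, std::uint64_t frame) {
        ++created_;
        live_[light] = frame;
    }

    // Record a deletion; deletes of lights we never saw created are counted apart.
    void onDelete(std::uintptr_t light) {
        if (live_.erase(light) > 0) ++deleted_;
        else ++unmatchedDeletes_;
    }

    // Lights created on or before `olderThanFrame` that are still alive,
    // sorted by address so the output is stable between runs.
    std::vector<std::uintptr_t> suspects(std::uint64_t olderThanFrame) const {
        std::vector<std::uintptr_t> out;
        for (const auto& entry : live_) {
            if (entry.second <= olderThanFrame) out.push_back(entry.first);
        }
        std::sort(out.begin(), out.end());
        return out;
    }

    std::size_t created() const { return created_; }
    std::size_t deleted() const { return deleted_; }
    std::size_t unmatchedDeletes() const { return unmatchedDeletes_; }
    std::size_t liveCount() const { return live_.size(); }

private:
    std::unordered_map<std::uintptr_t, std::uint64_t> live_;  // light -> creation frame
    std::size_t created_ = 0;
    std::size_t deleted_ = 0;
    std::size_t unmatchedDeletes_ = 0;
};
```

Since deletes seem to arrive almost immediately after the camera moves, any light that is still in the live set many frames after its entity was paged out would be a candidate to inspect. One caveat: an address can be reused by a later allocation, so a real version would probably want a stronger identity than the raw pointer.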
Next I am going to test with each of the different paging layers enabled separately, to see which one is generating lots of lights and not deleting them.
 