Multiplayer Woes

Thunderbrd · Oct 15, 2012

I'll have to ask her about the bug options then. If she automates she often has to set the risk tolerance on those automations.

Thunderbrd · Oct 16, 2012

Well... apparently we had differing bug settings in the automation portion.

However, aligning them didn't help much.

I have another 7 OOS reports to deliver later. They're happening about every 1-8 turns so when they do we often just play through another 20 so we can feel like we're getting somewhere.

I'm still working on trying to understand your OOS debugging tutorial. It helps some... enough to get an idea of the process conceptually. But I still have a lot to learn about how it all shows up identifiably in code and what one would change when one finds the problem.

AIAndy · Oct 17, 2012

Thunderbrd said:
Well... apparently we had differing bug settings in the automation portion.

However, aligning them didn't help much.

I have another 7 OOS reports to deliver later. They're happening about every 1-8 turns so when they do we often just play through another 20 so we can feel like we're getting somewhere.

I'm still working on trying to understand your OOS debugging tutorial. It helps some... enough to get an idea of the process conceptually. But I still have a lot to learn about how it all shows up identifiably in code and what one would change when one finds the problem.

As long as the commerce caching issue is not fixed, you will get OOS pretty much whenever the commerce output of a city changes.

Thunderbrd · Oct 17, 2012

So, in other words, for now, the OOS reports I have are pretty much meaningless unless they somehow relate to another problem entirely, right?

AIAndy · Oct 17, 2012

Thunderbrd said:
So, in other words, for now, the OOS reports I have are pretty much meaningless unless they somehow relate to another problem entirely, right?

Yeah, it is just too likely to get the commerce OOS.
But I am currently writing the fix.

Thunderbrd · Oct 17, 2012

You're a GOD! THANK YOU!

AIAndy · Oct 17, 2012

I have now submitted the fix. Hopefully it will work.

Thunderbrd · Oct 17, 2012

I'll test it later tonight then. (and I'll check out the coding change to see what you've done to see if I can understand it any better that way.) Thanks again!

Thunderbrd · Oct 18, 2012

Ok, trying to play tonight was sadly rougher than before (but I don't think it's for the same causes. In the first OOS there appears to be an issue in the AI Defend random... and perhaps that's why the animal spawns vary quite a bit in our logs too.) Additionally, it seems our research selections may have something to do with some of these. But if an AI random sends us out of synch, I can see that being a MAJOR cause of very frequent issues! lol.

Anyhow, here's the logs we collected tonight. It was pretty much clockwork this evening. Load, we're fine. End the turn, we're out of synch at the beginning of the next turn.

Logs Here

AIAndy · Oct 18, 2012

Now those are interesting OOS issues, but I think I know what the problem is:
There is a best unit AI cache for cities that is populated regardless of whether the method is called in a synced or an async context (it is called with a bool to tell if it is async). That means the cache can differ between players, because the calculation is not deterministic (it uses some random numbers).
Newer code also calls that from unit AI to compare some defensive strengths (it uses the best unit AI of the capital). So it can easily desync at that point when that cache is async.
To fix that, the calculated value may only be added to the cache in a synced context.
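To illustrate the idea, here is a minimal sketch of a cache that is only written from synced calls. It is not the actual C2C code: `BestUnitCache`, `getBestUnit`, `isCached` and the toy `evaluate` function are hypothetical names, and `std::mt19937` merely stands in for the game's random number source.

```cpp
#include <map>
#include <random>

// Hypothetical stand-in for the per-city best unit AI cache (not the real C2C class).
class BestUnitCache {
public:
    // bAsync == true means the caller is local-only code (for example UI),
    // so the result must not be stored where synced game logic will read it.
    int getBestUnit(int unitAI, bool bAsync, std::mt19937& rng) {
        auto it = m_cache.find(unitAI);
        if (it != m_cache.end()) {
            return it->second;          // reading a synced value is always safe
        }
        const int result = evaluate(unitAI, rng);
        if (!bAsync) {
            m_cache[unitAI] = result;   // only synced calls may populate the cache
        }
        return result;
    }

    bool isCached(int unitAI) const { return m_cache.count(unitAI) != 0; }

private:
    // Toy evaluation with a random component, mimicking a non-deterministic result.
    static int evaluate(int unitAI, std::mt19937& rng) {
        return unitAI * 100 + static_cast<int>(rng() % 10);
    }

    std::map<int, int> m_cache;
};
```

With this shape an async caller still gets an answer, but it can never leave behind a value that differs from what the other players computed.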

AIAndy · Oct 18, 2012

I have changed that accordingly, so if the guess that this caused the issue is correct, then there should be fewer or different OOS now.

Thunderbrd · Oct 18, 2012

Awesome! Did any of those reports point to any other possible problem?

AIAndy · Oct 18, 2012

Thunderbrd said:
Awesome! Did any of those reports point to any other possible problem?

It could all have been the same issue (most of them definitely were). Hard to say though.

Thunderbrd · Oct 18, 2012

We'll do some further OOS collecting tonight then. Hopefully this helped significantly!

Thanks again!

Koshling · Oct 18, 2012

AIAndy said:
I have changed that accordingly, so if the guess that this caused the issue is correct, then there should be fewer or different OOS now.

For the past few days I've been working on performance issues in Talin's very large save. I had it down from 645 seconds for the end turn to 380, and then had to merge with other recent changes. After the merge it's back up over 600 seconds, and the profile shows that the AI best unit routine for cities went from 650 milliseconds in the pre-merge version to 132 seconds afterwards. Since this is just the end turn, there should be no async activity at all, so I'm not yet sure why AIAndy's change caused this, but it's obviously the prime suspect. I'll try to find a fix that should be ok for OOS purposes and doesn't cripple the overall performance. Worst case I'll leave it optimized for single player, and OOS safe, with the performance hit, for multiplayer.

Thunderbrd · Oct 18, 2012

End turn performance is not nearly as important for multiplayer as it is for single player, since turns run simultaneously in MP, allowing the AI to process while you're taking your turn. So yeah, if an OOS fix causes a performance problem in AI evaluations on single player, it's perfectly ok to bypass the fix and run things differently when not running a multiplayer game. I wouldn't be sure how to isolate in code whether the game was running in MP or SP mode, but I'm just saying that the performance can be quite slow in MP and still be a much faster game to play.

On a side note, this is somewhat disturbing that AI is so heavy on evaluating units already. I have on my list of 'to dos' the task to add consideration for current projected free promotions the unit is going to receive if built in that city to be considered in these evals and surely that's not the lightest of processing tasks as it is. Do you think that could be done without causing much greater delay?

EDIT: Just took a look at AIAndy's change there and realized just up from that, we already ARE considering Free Promotions - from the base unit source and a few others anyhow. Will still need to consider Free Promotions from equipment sources though.

Thunderbrd · Oct 19, 2012

Ok. We're still getting OOS errors with every turn now. Here's more logs for ya!

AIAndy · Oct 19, 2012

Koshling said:
For the past few days I've been working on performance issues in Talin's very large save. I had it down from 645 seconds for the end turn to 380, and then had to merge with other recent changes. After the merge it's back up over 600 seconds, and the profile shows that the AI best unit routine for cities went from 650 milliseconds in the pre-merge version to 132 seconds afterwards. Since this is just the end turn, there should be no async activity at all, so I'm not yet sure why AIAndy's change caused this, but it's obviously the prime suspect. I'll try to find a fix that should be ok for OOS purposes and doesn't cripple the overall performance. Worst case I'll leave it optimized for single player, and OOS safe, with the performance hit, for multiplayer.

That is highly suspicious and probably means that there are synced functions that use the async version.
The same OOS issue as yesterday is still present in Best Unit AI in the new OOS logs.

So we need to find out which calling function causes that. Easiest would probably be to set a watch on async == true in that function, hit end turn and then check the call stack when it triggers.
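If a debugger watch is awkward to set up, roughly the same trap can be built into the code. The following is only a sketch under assumptions: `g_bInSyncedPhase`, `g_iSuspiciousAsyncCalls`, `SyncedPhaseGuard` and `bestUnitForCityTraced` are all made-up names, and the real function has a different signature.

```cpp
#include <cstdio>

// Hypothetical globals: the flag is true while end-of-turn (synced) processing runs.
static bool g_bInSyncedPhase = false;
static int g_iSuspiciousAsyncCalls = 0;

// RAII helper that marks the synced phase for the lifetime of the object.
struct SyncedPhaseGuard {
    SyncedPhaseGuard() : m_bPrevious(g_bInSyncedPhase) { g_bInSyncedPhase = true; }
    ~SyncedPhaseGuard() { g_bInSyncedPhase = m_bPrevious; }
    bool m_bPrevious;
};

// Stand-in for the best unit AI routine, with the trap added at the top.
int bestUnitForCityTraced(int unitAI, bool bAsync) {
    if (bAsync && g_bInSyncedPhase) {
        // Put a breakpoint on the next line and inspect the call stack
        // to see which synced caller asked for the async version.
        ++g_iSuspiciousAsyncCalls;
        std::fprintf(stderr, "async best unit call during synced phase (unitAI=%d)\n", unitAI);
    }
    return unitAI;  // placeholder result; the real evaluation is omitted
}
```

The end turn code would create a `SyncedPhaseGuard` at its start, so every async call made while it is alive gets logged or hits the breakpoint.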

Thunderbrd · Oct 19, 2012

What would we do when we find the calling function then? I suppose we'd have to look at that specific function at that point and see if there's a way to allow the function to operate while creating a workaround where it goes async? Am I anywhere in the ballpark here?

I looked at your first fix and... wow... I would've never been able to find and/or fix along those lines with what little I know of Dirty calls and such.

AIAndy · Oct 19, 2012

Thunderbrd said:
What would we do when we find the calling function then? I suppose we'd have to look at that specific function at that point and see if there's a way to allow the function to operate while creating a workaround where it goes async? Am I anywhere in the ballpark here?

Depends on the calling function. It might be a function that is always synced but calls the best unit AI function with the wrong parameters. In that case it would be an easy fix but it might also mean adding a bAsync parameter to some other functions.
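As a rough sketch of those two cases (all names here are invented for illustration, and `CallLog` exists only so the example can show which flag reached the evaluation):

```cpp
#include <vector>

// Records the bAsync flag of every evaluation so the effect is visible.
struct CallLog {
    std::vector<bool> asyncFlags;
};

// Stand-in for the best unit AI routine.
int evalBestUnit(int unitAI, bool bAsync, CallLog& log) {
    log.asyncFlags.push_back(bAsync);
    return unitAI * 10;  // placeholder value
}

// Case 1: an always-synced caller that passes the wrong parameter.
// The fix is simply to pass false instead of true.
int defenseStrengthWrong(int unitAI, CallLog& log) {
    return evalBestUnit(unitAI, true, log);
}

// Case 2: a helper that is reached from both synced and async code.
// It cannot know its context, so it gains its own bAsync parameter
// and forwards it; each of its callers must then state the context.
int defenseStrength(int unitAI, CallLog& log, bool bAsync = false) {
    return evalBestUnit(unitAI, bAsync, log);
}
```

Defaulting the new parameter to `false` is an assumption made here to keep existing synced callers unchanged; whether that default is safe depends on who actually calls the helper.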
