Multi Player bugs and crashes - After the 23rd of September 2013

Oh another unrelated (and honestly rather unimportant) bug:
When a hidden-nationality unit kills a human player's unit, its owner's identity is revealed. Example:
Toffer kills my Tracker with a Rogue. The message "Toffer has plundered 20 gold" shows up on my screen, telling me he owned the Rogue.
 
KHH speaks the truth.

Having looked at all the logs myself, I have to say that the random-loggers of OOS 2 are the most interesting ones.
Pursuit seems to happen on one computer but not on the other, upsetting the course of combat.

I read the thread and realized that I should have informed you that we played with simultaneous turns; if there is anything more we can do to help you figure out OOS problems we would be happy to try and to learn.

Edit: Here are some older (rev.6061) OOS reports from a different game that no one has looked at yet; OOS 4-12: http://forums.civfanatics.com/showthread.php?t=506621
The truly puzzling thing about this is:
1) Pursuit has been rolled numerous times before in that stream without going OOS.
2) Withdrawal randoms don't seem to be rolled at all in this segment.
3) Pursuit ROLLS only apply to combat limit applicable battles and... we shouldn't have any units with combat limits aside from some siege units. So did a catapult attack a tracker in this case???
4) It appears Pursuit has been rolled quite often in that stream which wouldn't happen in the original coding (I've tweaked some things) unless the attacker did have a combat limit. Cobras no longer have those issues sooooo... huh. Perhaps I'm getting twisted up in my understanding of some equation somewhere. Either way... what would make it possible for pursuit to be rolled normally on both computers until THIS particular event?


Banging my head against a brick wall here... ugh.
 
1) Is there a condition in pursuit roll that might be in local context, yield different results for different computers, or not run at all on every computer?

3) Prehistoric game, so no catapults or archer bombards whatsoever. Only animals and hunters were involved in the turns where we have gotten OOS...; my impression is that it is more frequent when the hunter reaches a high experience level, leading me to believe it might be promotion related.

We don't play with spontaneous battle promotions or infinite experience.
Perhaps something is weird about the infinite experience for animals and barbs promotions...

4) perhaps something went wrong earlier in the game, do you think we should post longer random number sets than we currently do?

I've seen that there is a difference in city yields between the different OOSLogs, which could indicate that the problem lies elsewhere than combat... combat and withdrawal happen every turn, meaning they are easy to suspect even when innocent...

Maybe the city yield difference stems from one computer calculating hunting kill yield while the other just moves the hunter without combat, or that an animal can spawn on one computer but not on the other. OOS problems can be seriously complicated and it might be wise to get help from the other team-members. I don't want you to burn yourself out on this issue.
 
1) Is there a condition in pursuit roll that might be in local context, yield different results for different computers, or not run at all on every computer?
Not anything I can think of, except perhaps one computer having defenders withdraw on and the other not. But not only do I think that's not supposed to be possible, the fact that pursuit has been rolled numerous times in this stream without causing an OOS suggests that a differing gamestate at the moment this roll is made would be a rare event. The code is NOT that complex there... very straightforward.

However, just IN CASE, I moved the point where the roll is generated to the same location as the generation of the basic attack rolls, so it should NOT have any reason to vary with the gamestate now.
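To illustrate why moving the roll can matter, here is a minimal sketch in Python (not the mod's C++ code). It assumes a single seeded random stream shared by both machines and a flag that might differ between them; CountingRng, both function names and the 50% threshold are made up for the example. If the pursuit roll is only consumed on one branch, the two machines consume a different number of randoms; if it is always drawn beside the attack roll, the count stays the same whatever the flag says.

```python
import random

class CountingRng:
    """Seeded RNG that counts how many rolls have been consumed."""
    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.draws = 0

    def roll(self, upper):
        self.draws += 1
        return self._rng.randrange(upper)

def combat_conditional(seed, defender_can_withdraw):
    """Pursuit is rolled only if a local flag says withdrawal is possible."""
    rng = CountingRng(seed)
    rng.roll(1000)                      # basic attack roll
    pursued = False
    if defender_can_withdraw:
        pursued = rng.roll(100) < 50    # consumed on this branch only
    return rng.draws, pursued

def combat_upfront(seed, defender_can_withdraw):
    """Pursuit is rolled next to the attack roll, whether or not it is used."""
    rng = CountingRng(seed)
    rng.roll(1000)                      # basic attack roll
    pursuit = rng.roll(100)             # always consumed
    pursued = defender_can_withdraw and pursuit < 50  # made-up threshold
    return rng.draws, pursued
```

With the conditional version the number of rolls consumed depends on the flag (2 versus 1), so every later random would be shifted on one machine. With the up-front version both machines consume 2 rolls, although the combat outcome can of course still differ if the flag itself is out of sync.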

3) Prehistoric game, so no catapults or archer bombards whatsoever. Only animals and hunters were involved in the turns where we have gotten OOS...; my impression is that it is more frequent when the hunter reaches a high experience level, leading me to believe it might be promotion related.
I was wondering that too, but both systems show the same promotions on all units. I suppose it might be good to check out the actual promotions it's listing for that hunter though... it COULD be possible that there is some OOS-causing Python attached to one of those promos.

We don't play with spontaneous battle promotions or infinite experience.
Perhaps something is weird about the infinite experience for animals and barbs promotions...
Possible but I doubt that's the issue as they wouldn't really ever be called for information during these battles. If that were true, infinite xp would be a safer option for MP BUT my wife and I have experienced this OOS error numerous times as well - being fairly sure it's coming from the battle somehow. So those particular promos aren't very suspect imo.

4) perhaps something went wrong earlier in the game, do you think we should post longer random number sets than we currently do?
While possible, the thing that bugs me is that the OOS logs don't show any other discrepancies and in fact, it's certainly during the middle of the battle somewhere that the randoms go off track somehow (and relatively rarely at that... you'd think this would be something that would be regularly triggered somehow!)

I've seen that there is a difference in city yields between the different OOSLogs, which could indicate that the problem lies elsewhere than combat... combat and withdrawal happen every turn, meaning they are easy to suspect even when innocent...
Yeah, I noticed those too and I'm sure there's something wrong elsewhere as well. If I can find this one perhaps I can start looking for those.

Maybe the city yield difference stems from one computer calculating hunting kill yield while the other just moves the hunter without combat, or that an animal can spawn on one computer but not on the other. OOS problems can be seriously complicated and it might be wise to get help from the other team-members. I don't want you to burn yourself out on this issue.
The problem on those, I'm pretty sure, is located either somewhere in the AI that determines what plots are worked OR in the segment where the other player is updated when a change to the plots worked is made. This is because the plots worked in a particular city are really all that differ.

Burn out? Yeah, this is scouring the brainpan for sure.

One thing I've noticed is that in the original withdrawal coding there's a CvEventReporter::getInstance().combatRetreat(this, pDefender); call that is missing in most other withdrawal event processing codes. I'm not entirely sure if this is to communicate from one system to the other what just happened or if it's just the trigger for the combat log report.

Since many battles take place without problem, it makes this one issue all the more confusing.

No chance we can figure out or recall which type of unit that tracker took out is there?
 
One thing I've noticed is that in the original withdrawal coding there's a CvEventReporter::getInstance().combatRetreat(this, pDefender); call that is missing in most other withdrawal event processing codes. I'm not entirely sure if this is to communicate from one system to the other what just happened or if it's just the trigger for the combat log report.

No chance we can figure out or recall which type of unit that tracker took out is there?

I would also guess it's combat log related because I don't see why any other code needs to evaluate if the unit withdrew or not, why it's not in the others beats me.

No chance of remembering, as we played with quick combat and minimize AI turns. It is hard to know because hunters defend 10 times per attack, hence most combat OOS will happen through defense.
The OOS often happens shortly after the player's turn becomes active; the weird thing is that about half of the animals attack before our turns are activated and the other half attack during our turn when playing simultaneous.

We will change these options and be on the lookout for more detailed information for our next report. Animal type, promotions involved, the whole shebang.
 
Revision 6439
Only had time for one OOS today but it seems to be quite unique.

The OOS logs show that an AI has three healers with data discrepancies.
Player 14, Unit ID: 639007, Healer
Player 14, Unit ID: 630814, Healer
Player 14, Unit ID: 638987, Healer
Player 14 would be Mehmed II, who recently started a war with Walesa.

-Both computers have "Unit ID: 630814", but its position differs by one tile.
-Player-0's comp. has it in pos. "X: 84, Y: 68"
-Player-1's comp. has it in pos. "X: 84, Y: 67"

-Only Player-0's comp. has "Unit ID: 638987", and it's in pos. "X: 84, Y: 67"
-Only Player-1's comp. has "Unit ID: 639007", and it's in pos. "X: 84, Y: 68"
-These two have different promotions, "Unit ID: 639007" would be considered stronger in a fight...

-"X: 84, Y: 67" Is the tile of their capital (Tribal guardian has that position).

-RandomLoggers are identical.
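The comparison above can be sketched as a small Python diff, which may help when going through larger logs by hand. The IDs and positions are the ones listed above; the dictionary layout and the diff_units helper are assumptions for illustration, since the actual OOS log format is not reproduced here.

```python
# Healer positions per machine, keyed by unit ID (values taken from the report).
player0 = {630814: (84, 68), 638987: (84, 67)}
player1 = {630814: (84, 67), 639007: (84, 68)}

def diff_units(a, b):
    """Return IDs only in a, IDs only in b, and shared IDs whose position differs."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    moved = {uid: (a[uid], b[uid])
             for uid in sorted(set(a) & set(b))
             if a[uid] != b[uid]}
    return only_a, only_b, moved

only_p0, only_p1, moved = diff_units(player0, player1)
```

Here only_p0 is [638987], only_p1 is [639007], and moved holds 630814 with the two positions (84, 68) and (84, 67).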

A wild guess on what happened assuming the logs were made at beginning of player turn showing the outcome of some combat involving the healers: A large stack of barbs attacked the capital and Mehmed had all healers in question stacked in the capital.
(Unit ID): (Player-0 Outcome) - (Player-1 Outcome)
"630814": withdrew - Survived
"639007": died - withdrew
"638987": survived - died

The withdrawal of "639007" is of course speculation as there is no way of knowing where it was to begin with and survived could mean no battle took place. The big question would then be... what could make this happen without discrepancies in the RandomLogger.

View attachment 364555

Please, do tell if my speculations are only considered a distraction.

Edit: there is no indication of that hypothetical battle, as the tribal guardian has full health.

EditEdit: could multiple production lead to different unit ID assignments if two AI units were produced in the same city on the same round, and could the difference in ID lead to the AI assigning different promotions for them?
Another thing, there is a Korean thief inside the capital that could have fought the healers around the city.
 
Your musings are certainly interesting... I'm assuming you are using my latest update? What you could be proving is that some part of the recalculation of a unit's strength between combat rounds is our culprit. I'd wondered but it's a hell of a labyrinth to sort that out. I figured if I changed the location of the random and it still took place then all the changed random location was doing was reacting to some other condition that had already thrown things out of sync.

So... yeah. Looks like it might be a very long journey through a lot of code to determine what may be allowing the computers on the network to get a different result during combat recalculations. ugh...

Either that or yeah, there was something interesting about the multi-threading as you suggested.
 
A suggestion for something to look at: stack attack. Specifically, the one that is enabled via the BUG options screen.

Regular attacks use CvUnit::updateCombat. The DCM stack attack option causes it to use CvUnit::updateStackCombat instead (updateCombat is still the function called, but right at the start it calls updateStackCombat and then returns if the option is on).

What happens if one player has stack attack on and the other doesn't? (Is that supposed to be possible? If it isn't, can it be done anyway?) This will cause problems if the mechanics of the actual combats across those two functions are not exactly the same - every random number has to be generated in the same order and there can't be any extras for the stack attack (like when making the evaluation for which unit it will attack with next, which must be done somewhere but I didn't look for it).

Perhaps not the problem, but it occurred to me since I know that stack attack uses a completely separate function to do the combat calculations. Two different functions to do essentially the same thing is a potential source of an OOS.
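A minimal Python sketch of that concern, not the actual DLL code: two functions resolve the same fight from the same seed, but the stack version makes one extra draw. The extra "pick attacker" draw is purely hypothetical (whether the real stack attack code does anything like it is exactly what would need checking), and the function names and the fixed three rounds are made up for the example.

```python
import random

def regular_combat(rng, log, rounds=3):
    """One attacker, a fixed number of combat rounds, one roll per round."""
    for _ in range(rounds):
        log.append(("combat round", rng.randrange(1000)))

def stack_combat(rng, log, attackers=1, rounds=3):
    """Same fight, but with a hypothetical extra roll before each attacker."""
    for _ in range(attackers):
        log.append(("pick attacker", rng.randrange(10)))  # assumed extra draw
        regular_combat(rng, log, rounds)

def labels(log):
    """Strip the rolled values and keep only what each roll was for."""
    return [name for name, _ in log]

seed = 2013
log_a, log_b = [], []
regular_combat(random.Random(seed), log_a)   # machine with the option off
stack_combat(random.Random(seed), log_b)     # machine with the option on
```

The two logs have different lengths and different roll labels, which is the kind of mismatch a RandomLogger comparison would show if the two machines ran different combat functions.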
 
@God-Emperor:
Stack attack might be a reasonable suspect for OOS issues but in our case it can't be as we do not play with any of the stack attack options turned on.

Edit: OOS report.
Rev. 6465.

I'm pretty sure OOS-1 is related to the fact that the player turn gets activated before the barb turn is deactivated, leading to animals and Neanderthals attacking players while the players are active. This only applies to simultaneous MP at times when there are crazy amounts of animals on the map (turns 200-1000 on eternity).
More on this in Readme file. This might also be the reason behind a lot of our earlier posted OOS reports.

View attachment 364698

Other problems that occur because the player turn is activated before the barbs finish theirs:
-Buggy UI: the unit cycle focuses on the first unit, animals attack it before the player can do anything, and after the battle animations all unit action buttons are gone; the player has to reselect the selected unit to get them back.
-Neanderthals spawn before the player's turn is activated and attack the unit they spawned beside right afterward, without giving that unit a chance to move to safety. This is probably because they get their movement points back when the player turn is activated, which is before the barb unit cycle has reached the Neanderthal.
I've also seen an elephant move towards my unit before player turn activation and attack before I could get the unit to safety during my turn; probably the same issue as above.
-Units level up one turn later than without "simultaneous turns" because animals attack after the turn has begun instead of before. Also because of this, animals subdued in defense battles will cost the player gold for one turn before they can be slaughtered, as is normally the case with animals subdued through offense battles (this can lead to huge costs, since most animals are subdued through defense rather than offense).

EditEdit: More logs, rev. 6486.
View attachment 364779 - View attachment 364780

Tracker in OOS-5 had promotions:
Hunter III, Woodsman III, Guerilla III, Arctic Combat II, Hunt Down II, Hit & Run, Sentry, Mountaineer, Barb. Hunter and Animal Hunter.
The opponent was a barb. "Stone Thrower" on terrain "Peat Bog/Marsh" 25%.

EditEditEdit:
We have four more logs that I won't upload for a while because of lack of attachment space. OOS - 11-14 _ Rev. 6493. (placeholder)
The RandomLoggers from all of these clearly show a new issue with multi-threading that did not exist in rev. 6486.
 
Another round of OOS logs from the same game as above. The ones Toffer didn't have space for.

edit: New OOS logs.
OOS - 15 - 20.7z
 

Attachments

  • OOS - 11.7z (8 MB)
  • OOS - 12-14.7z (7.6 MB)
[OOS-1, Lots of OOSLogs from an earlier game] (simultaneous turn specific OOS)
In the beginning of the game we had a lot of trouble from barb. activity during the human player turn in simultaneous games. This went away later, probably because there were a lot fewer animals after scouts and trackers started to roam the map.
We quickly figured out how to mostly avoid this by waiting a few seconds at the beginning of every turn, letting all animals and Neanderthals make their move before we made any of ours.

[OOS-2, OOS-3, OOS-7, OOS-8, OOS-12-13] (Note to Koshling: I suspect the changes you made to city governor early 2013 may be the source of this OOS.)
Throughout the game we had trouble with automated citizens when adding something that requires citizen recalculation to the top of a non-empty build queue, leading to city yield differences in the OOS logs.
We eventually figured out that this one was avoidable by turning off automated citizens before adding something to the top of the build list. Although it is avoidable, I would like this one to be prioritized: it is extremely easy to forget the workaround, which makes it more frustrating than other sorts of OOS when you do forget. This OOS is also clearly identified and always reproducible, and should therefore not be too hard to debug and fix.

[OOS-4, OOS-6]
These were quite rare because they only happened when giving long go-to orders to units with more than 1 movement point, where the unit stops before all movement is used up due to an animal blocking the way; the OOS happens if the original go-to order is issued again. We suspect there is a bug in the path-finding code (we play with UseAIPathing off). Again, this one is avoidable by moving the last steps tile by tile.

[OOS-5]
Combat related. Unavoidable.

[OOS-9-11, OOS-14-15, OOS-19]
Mystery.

[OOS-16-18]
We believe these were all related to an AI player's newly acquired Great Commander (Maybe the first GC in this game) since it is involved in all OOSLogs and because when I forced him away from the war front by killing off his guards with an ambusher, OOS of this kind completely stopped.

[OOS-20]
The automated border patrol function is believed to be to blame.

[OOS-11-20] (the scrambled order is not the cause of the OOS's)
The RandomLoggers have discrepancies all over the place in all of these.
Seems like a multi-threading issue as the only difference really is a scrambled sequence order. It is new; it did not exist in rev. 6486.
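One rough way to check the "only the order is scrambled" impression is to compare the two logs once as sequences and once as unordered collections. This is a Python sketch under assumptions: the log entries below are invented placeholders, not real RandomLogger lines, and classify is a hypothetical helper.

```python
from collections import Counter

def classify(log_a, log_b):
    """Tell a pure reordering apart from logs that really contain different entries."""
    if log_a == log_b:
        return "identical"
    if Counter(log_a) == Counter(log_b):
        return "same entries, different order"
    return "different entries"

# Invented placeholder entries standing in for RandomLogger lines.
machine_a = ["city 1 roll", "city 2 roll", "combat roll"]
machine_b = ["city 2 roll", "city 1 roll", "combat roll"]
verdict = classify(machine_a, machine_b)
```

For the placeholder logs above the verdict is "same entries, different order". If real logs came back as "different entries" instead, the problem would be more than a threading-related reordering.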

We will now take a break from OOS reporting, as the game got almost unplayable before we reached sedentary lifestyle.
 
Yeah, this is all very frustrating. I'd like to be able to play a clean multiplayer game myself. But these are proving incredibly difficult to track down. I feel like I could spend the next year looking and resolving nothing. Keep in mind though that you CAN turn off the multi-threading in A_New_Dawn_Global_Defines under NUM_CITY_PIPELINE_THREADS (set it to 1). This might help if you currently have it set higher than 1.
 
Will take a look when I can thanks!
We have some sort of known crash bug when reloading a game from within a running game and I think that's a bigger issue for hotseat. Something about not everything clearing out and the new load retaining some erroneous leftover info from the previous game. Are you having trouble simply loading that save from the main menu? If so... send me the save itself. Will probably be a lot more useful than the mini alone.
 