Easier iteration in C++?

Leoreth · Aug 10, 2020

I think anyone who has written any significant C++ change has been at the point where the problem boiled down to "iterate all instances of a type, maybe filter and then do something with them". Often this is centered around an enum ArbitraryTypes for which an NUM_ARBITRARY_TYPES or GC.getNumArbitraryTypes(). You then want to either access CvArbitraryInfo or call something like SomeObject::someMethod(ArbitraryTypes eType).

This usually produces code like this:

Code:

for (int iI = 0; iI < NUM_ARBITRARY_TYPES; iI++)
{
    CvArbitraryInfo* kArbitraryInfo = GC.getArbitraryInfo((ArbitraryTypes)iI);
    // do some stuff with your info class or whatever
}

This is not only very tedious to write, and can quickly become unreadable and unsafe, especially when multiple nested loops are involved. Which types were iI and iJ again? The compiler can't help you because you're casting anyway. Not to mention that unless properly refactored, those casts combined with meaningless variable names add a lot of needless business to the code.

Of course that could be somewhat improved with better naming conventions (hope is lost for Firaxis code though), but shouldn't a higher abstraction level of this be the preferred solution?

So I have no practical experience with C++ so I apologise for inaccurate terminology after this point, but how can we generify this problem? Is there a template or pattern I can use?

Some iterator generator that produces an iterator<ArbitraryTypes> of length NUM_ARBITRARY_TYPES would already help, but plain for looping over an iterator is still very tedious. Is there a template or macro that can help with this?

Some fantasy pseudocode:

Code:

MAGIC_TYPE_FOR(ArbitraryTypes, NUM_ARBITRARY_TYPES)
{
    // this is a loop body where I have eArbitraryType of type ArbitraryTypes available (fine if needed to be declared earlier) and will iterate through all NUM_ARBITRARY_TYPES values and then stop.
}

As I said, I don't have enough C++ experience to produce sufficiently generic C++ sugar like that but I am tired of typing so much boilerplate. I know there are people who are better than me and can maybe help out with what's possible, and where to start, or even point me to mods that have already improved something in that area.

f1rpo · Aug 10, 2020

In AdvCiv, I use

Code:

FOR_EACH_ENUM(Arbitrary)
{
    CvArbitraryInfo const& kLoopArbitrary = GC.getInfo(eLoopArbitrary);
    // ...
}

This macro is mostly @Nightinggale's idea:
https://github.com/We-the-People-civ4col-mod/Mod/issues/303
In the comments, him and I discussed the pros and cons of letting the macro set the variable name. In "We the People," the macro is based on code generated by a Perl script. In AdvCiv, I use only the preprocessor; most of the relevant code is in CvEnums.h and CvInfoEnums.h. Either way, the setup is a bit involved – but also enables Nightinggale's EnumMap class (and allows parts of CvGlobals to be generated as well). Perhaps doesn't have to be so involved either. The only thing that is really needed is a global function ("getEnumLength" in my code) that is overloaded for every enum type and returns the length of that enum. Edit: Here's a poor man's version I wrote for another mod - without an enum length function: Git commit | amendment

I've added this variant with a custom variable name:

Code:

FOR_EACH_ENUM2(Arbitrary, eMyArbitrary)
{
    CvArbitraryInfo const& kMyArbitrary = GC.getInfo(eMyArbitrary);
    // ...
}

The "2" is supposed to mean that there are two arguments. Clumsy, but can't overload macros.
I also have a macro to abbreviate the access to the CvInfo object:

Code:

FOR_EACH_ENUM(Arbitrary)
{
   _kLoop_(Arbitrary); // defines kLoopArbitrary
    // ...
}

but I'm not using it; too obscure.

Since references can't be reassigned and C++03 doesn't have range-based loops, there's no way to define a CvInfo& in a loop header. Another way how a reference could be declared through a pair of macros:

Code:

FOR_EACH_INFO_START(Building)
   // ...
END_FOR_EACH

The opening macro would include the opening curly brace of the loop body (before defining the CvInfo&), and the closing macro would close the loop body. Not worth it imo. With a CvInfo*, there should be a way to define everything in or before the loop header, and wrap that into a single macro (not tested beyond a syntax check):

Spoiler :

One can't define more than one variable in a loop header, so they have to be in the surrounding scope. Well, that's kind of how MSVC03 treats variables in loop headers too. Or they could be global "metavariables." However, I'm partial to using references when appropriate, and shortening getArbitraryInfo to an overloaded (and inlined) getInfo already makes it less painful to access the CvInfo objects, which some loops don't need to access anyway. Indeed, an id-only loop macro can also be used for enum types that don't correspond to a CvInfo class, e.g. FlavorTypes.

Edit (29 Sept): The above compiles, but isn't correct. Part of the increment has to happen in the termination check:

Spoiler :

As for an iterator, I'd see the main appeal in overloading the dereference operator:

Code:

for (InfoIter<BuildingTypes> it; it.hasNext(); ++it) // Could still wrap this into a macro for brevity
{
   GC.getGame().getNumFreeBonuses(*it);
   it->getNumFreeBonuses();
}

Of course dereferencing can only provide either the id or the CvInfo&. If it's the id, then the CvInfo& could be obtained through something like it.info(). However, I don't think an iterator could be 100% as fast as a for loop that directly enumerates the ids, at least not when the enum length is known at compile time.

Firaxis shouldn't have established the convention of passing objects around by id. If they hadn't done that, then ids would only be needed occasionally, mainly for serialization and Python calls. In C++11, one could then simply write:

for (auto const& kArbitrary : xml.arbitrary())

where xml.arbitrary would be an accessor for the vector containing the Arbitrary info objects.

Leoreth · Aug 10, 2020

Oh how great! I vaguely remembered you and Nightinggale talking about something like that, but I wasn't sure if it was the exact same problem. I think I will adapt this at some point, even if I won't refactor the entire codebase onto it.

How does it know the length of the enumeration? Does the enum need to be typed out? Many enums only exist for typing reasons but are XML backed, so obviously the length isn't known at compile time. Is there a mapping between types and their num functions defined somewhere (obviously I can go look later, so no need for an in depth explanation).

I wonder if there is also a solution for "cities of a player" and "units in a selection group" and "unit on a plot" type loops, which also seem very frequent and verbose.

And yeah, there are a number of dubious decisions in how Firaxis decided to structure their code. Even the whole decision to make everything typesafe (which I support, obviously) is undercut in certain points where a function that should return a type enum instead returns an int for basically no reason, leading to more casting. Always with the casting, what is this, Hogwarts? Although I can't say if the reason for this is lack of engineering lead and inexperienced programmers that didn't coordinate or just if that was just how things were done at the time. I've certainly seen worse codebases though.

But still, this is very cool. Can't wait to try it.

f1rpo · Aug 10, 2020

@MattCA is in the process of refactoring loops in C2C. Two examples: Git commit 1 2
The iterator and functor framework (based on Boost 1.55 rather than ~~1.35~~ edit: 1.32) was set up by billw at the end of last year. I think it covers all commonly used types of iterations except most enums. I might've adopted that framework if I didn't already have my own solutions in place. It also looks a bit alien; powerful though.

Leoreth said:
How does it know the length of the enumeration? Does the enum need to be typed out? Many enums only exist for typing reasons but are XML backed, so obviously the length isn't known at compile time. Is there a mapping between types and their num functions defined somewhere (obviously I can go look later, so no need for an in depth explanation).

The length is either

GC.getNum##MixedCase##Infos()

or

NUM_##UPPER_CASE##_TYPES

The preprocessor can't convert between upper case and mixed case. The enum type names are in mixed case, so, for the info enums (dynamic length), it's easy to write a for-each macro. For the static-length enums, one has to either write a separate macro that takes two parameters (one mixed case, one upper case), or write or generate a length function for each enum type. Or (Nightinggale's idea) add e.g.

#define NUM_Yield_TYPES NUM_YIELD_TYPES

after each (relevant) static enum.

I wonder if there is also a solution for "cities of a player" and "units in a selection group" and "unit on a plot" type loops, which also seem very frequent and verbose.

For the FFreeListThrashArrays (cities, units, selection groups of a player; all areas, deals), I also use macros:

Code:

bool CvPlayer::hasBonus(BonusTypes eBonus) const
{
   FOR_EACH_CITY(pLoopCity, *this)
   {
       if (pLoopCity->hasBonus(eBonus))
           return true;
   }
   return false;
}

Adopting these two files might be enough to enable them:
FreeListTraversal.h | cpp.hint
The AI versions shouldn't be needed. (I need them because I removed the virtual CvCity::AI_... etc. functions.) Maybe the hint file isn't needed either with a recent version of Visual Studio. It's just for IntelliSense.

For the CLinkList<IDInfo> (unit in group or plot), I don't have a good solution, and envy C2C. I haven't had to write many of those, probably because I don't work with the Unit AI much. I did replace while loops (which allow the list to change during traversal but are error-prone to write and modify) with for loops when it was obvious that the traversal didn't alter the list. And I've constified the CLLNode* when possible. However, that's not a proper protection against accidentally altering the list (e.g. through CvUnit::kill). C2C has added some clever assertion to catch those problems, but it seems that I haven't kept a link to the respective Git commit and I don't think it's easily portable anyway. I did bookmark this post by billw about his iterator for unit lists:
https://forums.civfanatics.com/threads/pointer-invalidation-issues.647875/#post-15529678

For more complex recurring loops, I've written iterators:
• PlayerIter, TeamIter, MemberIter (AgentIterator.h) – not really portable
• CityPlotIter (city plots), SquareIter (stepDist metric), PlotCircleIter (plotDist metric)

Not sure what a lightweight solution for players and teams could be.

For iterating over the whole map, FOR_EACH_ENUM can be used if an enum type (with a length function) for plot ids is added. (I've named mine "PlotNumTypes".)

Always with the casting, what is this, Hogwarts?

In the case of CvInfo functions that return and accept int ids, the reason may have been to allow CyInfoInterface*.cpp to link directly to the CvInfo getters. For return types, it actually seems to be fine to use enums; for enum parameters, an adapter function is needed – but is easy enough to generate in-line through a macro, e.g.: CvInfo_Building.h#L257

Whoever came up with the coding conventions and general structure (to the extent that it exists) of the DLL seems to have had more experience with C than C++. I get that impression from the (mis-)use of virtual functions (AI_...), disuse of the STL and variable declarations only at the start of functions (surely that must've been widely discouraged in C++ by 2003?). Well, at least most of the conventions were followed very consistently in the game rule and AI classes.

Leoreth · Aug 11, 2020

Really cool. I hope to find the time to port this soon.

f1rpo · Aug 12, 2020

Small addendum about the iterator approach to enums (probably still not worthwhile though):

f1rpo said:
As for an iterator, [...] of course dereferencing can only provide either the id or the CvInfo&. If it's the id, then the CvInfo& could be obtained through something like it.info(). However, I don't think an iterator could be 100% as fast as a for loop that directly enumerates the ids, at least not when the enum length is known at compile time.

Come to think of it: One actually could let operator-> return a CvInfo& and operator* an id, taking advantage of the fact that the id can't have members and that the CvInfo& is (in most cases) only needed for accessing members. An "x" as the Hungarian notation prefix could hint at the polymorphic nature of this iterator/ smart pointer. With a wrapper macro, it could look like this:

Code:

FOR_EACH_INFO(Building)
{
   if (GC.getGame().getNumFreeBonuses(*xBuilding) == xBuilding->getNumFreeBonuses())
   {
       // ...
   }
}

This would only be used for info enum types, and almost all of those have dynamic length (counterexample: GameOptionTypes), so performance might not be an issue after all. For non-info enum types - or when the info object isn't needed - FOR_EACH_ENUM could be used. I don't like the frequent use of the arrow and star operators; adds to the overall weirdness of this non-idiom.

Nightinggale · Sep 21, 2020

This thread gave me an idea. How about treating the info classes as linked lists?

PHP:

for (CvBuildingInfo *xBuilding = &GC.getBuildingInfo((BuildingTypes)0); xBuilding != NULL; xBuilding = xBuilding->next())

With a well written define, it can be shortened to

PHP:

FOR_EACH_BUILDING(xBuilding)

CvBuildingInfo then needs to gain the functions:

PHP:

CvBuildingInfo* next() const;
BuildingTypes getID() const;

Ideally this should be cached values returned in inline functions. The cache then has to be set after loading xml data, but prior to usage. readPass2() is a good candidate.

I think this is likely the fastest code we can get if you have to look up each info class anyway. If you only need the index, but not the info or only a minority of the info classes, then this one will be slower due to memory latency issues.

This is also nice when debugging because the GC.get*Info() is a number of jumps while stepping through some code while the cached version is just a single function call to a one line function.

Leoreth · Sep 28, 2020

Oh, yeah, that looks cool. Is that meant to be applied generically to all info types?

Even unrelated to iteration, I think getID() should be a member of info classes. I remember being at times frustrated that I have to carry the ID forward when I already have the info class to pass around.

Nightinggale · Sep 28, 2020

Leoreth said:
Even unrelated to iteration, I think getID() should be a member of info classes. I remember being at times frustrated that I have to carry the ID forward when I already have the info class to pass around.

I'm not entirely sure it's safe to add/remove/alter the base info class. The reason is memory layout. CvPlayer has virtual functions, which at compile time changes into function pointers. Those pointers are then class pointer+offset. Nothing special except the offset is hardcoded into the exe. This means adding or removing virtual functions will make the exe call incorrect functions. The exe is interested in the info classes too and possibly even child classes. The memory layout for a child class will be altered if the parent is 4 bytes bigger.

Adding a function is safe as long as it's not virtual. Adding a virtual function or member variables might not be safe. Just something to keep in mind.

Apart from that, I agree. getID() should be present in all instances of info classes.

One way to do this would be to make a function, which looks up the index of getType(). It's however slower than storing an int, but it will work without touching the memory layout. You can then add getID() to say building info where it is cached. This way it will always work, but if you know you frequently use it for a specific class (building, unit, etc), then you can add a shortcut in those for performance reasons and still be safe regarding the memory layout.

EDIT: I just realized you need to add getID() to each class. The base class can return an int, but you need class specific functions if you want to return BuildingTypes etc and you most likely want to do that. Being type specific is a good way to catch bugs, particularly BuildingTypes != BuildingClassTypes. I have more than once caught bugs caused by mixing UnitTypes and UnitClassTypes.

Speaking of memory layout, if you call get*Info(), you will get a reference. If you store it in a local variable without declaring that variable a reference, it will be freed when it goes out of scope and then it crashes the next time get*info() is called with the same index. The solution is to tell the compiler that you don't want to ever try to copy the info classes, which boost can do if you type the following:

PHP:

class CvInfoBase : private boost::noncopyable

This will then make the compiler create an error if you forget the & sign. It can save you from a whole lot of headache later.

Speaking of missing &, I added one to the code in my previous post (oh the joy of typing code directly in a browser). You need a pointer to the first info class, not the reference. Also I would argue that it should be a const pointer, but then you might need to add const to a lot of places in the info classes because not all const functions have the const keyword.

f1rpo · Sep 29, 2020

Nightinggale said:
This is also nice when debugging because the GC.get*Info() is a number of jumps while stepping through some code while the cached version is just a single function call to a one line function.

I normally allow (some) inlining in my debug builds (through /Ob1), and then the CvGlobals::getInstance calls disappear. The getInfo calls remain, I suppose, because inlining the array-bounds assertions would hurt branch prediction. It would be nice to avoid those assertions because they're unnecessary when going through all info objects. Your linked-list approach should accomplish that and should get fully inlined with /Ob1.

That said, so long as most functions in the code base take info objects by id and not by pointer or reference, ids are going to be needed frequently, so there'll be plenty of

pBuilding->getID()

calls, which seems not much better than frequent calls to

GC.getInfo(eBuilding).

Considering the effort of writing code that sets and returns the ids in a type-safe manner, and that id-based loops will still be needed, I think I'll stick to FOR_EACH_ENUM macros and to the convention of passing by id. I could easily let the preprocessor generate getInfoUnsafe functions that don't assert array bounds, but then I'd really need to wrap

CvBuilding const& kLoopBuilding = GC.getInfoUnsafe(eLoopBuilding);

into a macro that works only inside a FOR_EACH_ENUM loop. As I wrote earlier

Code:

FOR_EACH_ENUM(Building)
{
   _kLoop_(Building);
   // ...
}

looks a bit too strange. Maybe SET_LOOP_INFO(Building)? NEXT_INFO(Building)?

Nightinggale said:
Being type specific is a good way to catch bugs, particularly BuildingTypes != BuildingClassTypes. I have more than once caught bugs caused by mixing UnitTypes and UnitClassTypes.

On this note: I've added declarations of comparison operators for pairs of enum types that are easily confused, so that the linker will report such comparisons. GitHub.com/f1rpo/.../CvEnums.h#2279

On a different note, I've now written some macros (GitHub.com/f1rpo/.../LinkedListTraversal.h) for traversing the commonly used types of CLinkList, namely CvPlot::m_units, CvSelectionGroup::m_units and various CLinkList<TradeData> objects. Conveniently, CvSelectionGroup and CvPlot have the same wrapper functions headUnitNode, nextUnitNode for accessing the private member m_units, so a single FOR_EACH_UNIT_IN macro can take care of both. (I already have a FOR_EACH_UNIT macro for units owned by a given player – so that macro name is already taken.) For CLinkList<TradeData>, the list object needs to be passed to the macro FOR_EACH_TRADE_ITEM.
Examples:

Spoiler :

Easier iteration in C++?

Leoreth

Bofurin

f1rpo

plastics

Leoreth

Bofurin

f1rpo

plastics

Leoreth

Bofurin

f1rpo

plastics

Nightinggale

Deity

Leoreth

Bofurin

Nightinggale

Deity

f1rpo

plastics

Similar threads