
Class for fixed-point arithmetic

f1rpo

I've written a small class for fixed-point arithmetic that I wanted to share. At least as a basis for discussion – maybe we can collaboratively come up with something better.
GitHub.com/f1rpo/.../ScaledInt.h#L51

Edit (20 May): latest version -- pretty much(?) ready for inclusion in other mods

The class is meant for computations in which large rounding errors are problematic but small ones tolerable, and for computing fractional powers, e.g. sqrt(0.9) – both without risking OOS errors in multiplayer games. For background on that, see @Nightinggale's description of open issue #176 in the "We the People" (Colonization) mod, specifically under "Why we can't use floating point calculations."

(My own mod "AdvCiv" actually uses floating-point numbers extensively in AI code, and I've experienced no OOS issues – but synchronicity has only been tested on fourseven (edit: as of 20 May) different PCs. It might be that all recent Intel and AMD processors have the same floating-point behavior, perhaps due to their support for SSE2, but it's better not to rely on this poorly tested hypothesis.)

I've named the class "ScaledInt" (edit 20 May: now "ScaledNum") with one template parameter for the scale factor and another for the integer data type that stores the scaled number. It resembles the scaled_integer class in the Compositional Numeric Library (which isn't compatible with MSVC03).

Speaking of external libraries, we do have boost::rational at our disposal – but that's far less efficient because the denominator can't be resolved at compile time, and I don't think that's worth the greater precision and range. (Not to mention that boost::rational keeps the ratio in a normalized form at all times, which is really slow.)

In the hope that a single scale factor will work well enough in most situations, I've defined
typedef ScaledInt<1024,int> scaled_int;
typedef ScaledInt<1024,uint> scaled_uint;
This way, scaled_int has a range of ca. -2 million to +2 million (INT_MAX/1024). A much smaller range would be awkward in AI computations, but perhaps 2048 would be a better standard scale factor.
Edit (20 May): Now defined as
typedef ScaledNum<2048,int> scaled;
typedef ScaledNum<2048,uint> uscaled;

Performance: For addition and subtraction, the only overhead compared with regular integer math should be the conversion from (non-scaled) integers that aren't known at compile time to scaled_int (a bitshift) and from scaled_int back to integer (costlier than a bitshift when rounding to nearest). Fast multiplication is a challenge because intermediate results can overflow; with scaled_int, that happens already when squaring 45.5 (because 45.5*1024*45.5*1024 exceeds INT_MAX). As for division, there is a potential for saving some time when applying modifiers that are given as percentages in XML – provided that the CvInfo classes store those modifiers as scaled_int so that the conversion from scale 100 happens only once during XML loading. Also, any hardcoded fractional modifiers can be converted to scaled_int at compile time. Example:
Spoiler :
Optimized assembly for two lines of K-Mod code in CvPlayerAI::AI_yieldWeight:
Code:
  iWeight *= 275;
lea    eax,[esi*8]      # eax = iWeight * 8
sub    eax,esi          # eax -= iWeight
add    eax,eax          # eax *= 2
add    eax,eax          # eax *= 2
add    eax,eax          # eax *= 2
sub    eax,esi          # eax -= iWeight
lea    ecx,[eax+eax*4]  # ecx = eax + eax * 4
  iWeight /= 100;
mov    eax,51EB851Fh    # eax = magicConstant
imul   ecx              # eax *= ecx
sar    edx,5            # edx >>= 5
mov    esi,edx          # esi = edx
shr    esi,1Fh          # esi >>= 31
add    esi,edx          # esi += edx
If the weight variable is turned into a scaled_uint that doesn't check for overflow, the assembly becomes:
Code:
  rWeight *= per100(275);
lea    edx,[esi+esi*2]    # edx = 3 * rWeight.m_i
add    edx,edx            # edx *= 2
add    edx,edx            # edx *= 2
sub    edx,esi            # edx -= rWeight.m_i
shl    edx,8              # edx <<= 8
add    edx,200h           # edx += 512 (round to nearest)
shr    edx,0Ah            # edx >>= 10
mov    esi,edx            # rWeight.m_i = edx
If overflow is prevented through extension to 64 bits, we get:
Code:
  rWeight *= per100(275);
mov    eax,esi            # eax = rWeight.m_i
mov    ecx,0B00h          # ecx = 2816 (=2.75*1024)
mul    eax,ecx            # edx:eax = eax * ecx
add    eax,200h           # eax += 512
adc    edx,0              # edx += CF (carry flag)
mov    esi,eax            # rWeight.m_i = eax
shrd   esi,edx,0Ah        # rWeight.m_i >>= 10 (with edx shifted in from the left)
shr    edx,0Ah            # edx >>= 10
Measurements using the TSC register suggest that this is at best marginally faster than the BtS code: GitHub.com/f1rpo/.../ScaledIntTest.cpp#L260.
The fastest method I've found that avoids overflow in a signed rWeight variable (type scaled_int) is the MulDiv function from WinBase.h:
Code:
  rWeight *= per100(275);
push   400h                       # 1024
push   0B00h                      # 2816 (=2.75*1024)
push   esi                        # rWeight.m_i
call   dword ptr [__imp__MulDiv@12]
That is, the MulDiv call doesn't get inlined and will use div instead of a bitshift. This seems to be about three times slower than the BtS code.

One could introduce a third template parameter bUNSAFE_MULT that disables overflow handling in release builds and define
typedef ScaledInt<1024,uint,true> modifier_t;
specifically for fast multiplication of modifiers. Though I can't actually think of BtS/AdvCiv code where that would be important.
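For reference, here's a minimal C++ sketch of the 64-bit-widening multiplication that the second assembly listing above corresponds to. The names are made up for illustration; this is not the actual ScaledInt code.
Code:
// Sketch only: unsigned underlying type assumed.
// Widening to 64 bit keeps the intermediate product from overflowing;
// adding SCALE/2 before the division rounds to nearest.
template<unsigned int SCALE>
struct UScaledSketch
{
    unsigned int m_i; // the represented value, multiplied by SCALE

    UScaledSketch& operator*=(UScaledSketch rOther)
    {
        unsigned __int64 lNum = m_i;
        lNum *= rOther.m_i;   // now at SCALE*SCALE
        lNum += SCALE / 2;    // round to nearest
        lNum /= SCALE;        // compile-time constant; a shift for powers of 2
        m_i = static_cast<unsigned int>(lNum);
        return *this;
    }
};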

Public interface: Some examples from CvPlayerAI::AI_foundValue:
Spoiler :
Code:
int iValue = 1000;
-->
scaled_int rValue = 1000;
"scaled_int" is already a lot to type. Perhaps an even shorter type name would be more prudent. For Hungarian notation (to be consistent with BtS code), one can't really use 's' as that's normally for strings. 'x' for "fixed-point" might work; I like 'r' for "rational" best.
(NB: Since scaled_int has a scale factor built in, one should probably start with a smaller constant, e.g. rValue = 100.)

Code:
iValue += GC.getMap().maxStepDistance();
-->
rValue += GC.getMap().maxStepDistance();
Overloaded operators for assigning int to scaled_int.

Code:
iValue *= (1 + 4 * pPlot->getMinOriginalStartDist());
iValue /= (1 + 2 * GC.getMapINLINE().maxStepDistance());
-->
rValue.mulDiv(1 + 4 * pPlot->getMinOriginalStartDist(),
      1 + 2 * GC.getMap().maxStepDistance());
Multiply by a fraction that can't be resolved at compile time. Could also write:
rValue *= scaled_int(1 + 4 * pPlot->getMinOriginalStartDist(),
1 + 2 * GC.getMap().maxStepDistance());
But that wouldn't always be as precise as the above because the fraction gets rounded to the scale factor before the multiplication takes place.

Code:
int iMinDistanceFactor = MAX_INT;
-->
scaled_int rMinDistanceFactor = scaled_int::MAX;
Limits as static const members.

Code:
int iClosenessFactor = kOtherPlayer
      .startingPlotDistanceFactor(pPlot, getID(), iMinRange);
-->
scaled_int rClosenessFactor = per1000(kOtherPlayer
      .startingPlotDistanceFactor(pPlot, getID(), iMinRange));
The return value of startingPlotDistanceFactor is scaled by 1000. Ideally, it would be returned as a scaled_int, but let's assume that it isn't. The global per100, per1000, per10000 factory functions put the cart before the horse syntactically. Perhaps better to use the two-argument constructor instead:
scaled_int rClosenessFactor(kOtherPlayer
.startingPlotDistanceFactor(pPlot, getID(), iMinRange), 1000);

Code:
iMinDistanceFactor = std::min(iClosenessFactor, iMinDistanceFactor);
-->
rMinDistanceFactor.decreaseTo(rClosenessFactor);
Since we have a custom data type, we might as well add some conveniences.

Code:
iMinDistanceFactor = std::min(1500, iMinDistanceFactor);
-->
rMinDistanceFactor.decreaseTo(per100(150));
Alternatively, one could use a floating point expression:
rMinDistanceFactor.decreaseTo(fixp(1.5));
So long as the expression gets resolved at compile time, it's safe for multiplayer. The fixp macro ensures that by forwarding the expression as a template parameter:
Code:
#define fixp(dConstExpr) \
   ( /* ... overflow checks ... */ \
   scaled_int::fromRational<(int)( \
   (dConstExpr) * 10000 + ((dConstExpr) > 0 ? 0.5 : -0.5)), 10000>())
I guess I'd prefer "scaled_int(1.5)", but the macro needs to have a unique name.

Code:
return std::max(1, iValue);
-->
return scaled_int::max(rValue, 1).round();
Currently, one could also write return rValue.increasedTo(1).round(), but I think I'll remove increasedTo in order to avoid confusion with the (non-const) increaseTo function.
Edit (20 May): Removed increasedTo

As conversion to integer shouldn't be needed all that frequently, I think it's better not to allow implicit casts from (or assignment of) scaled_int to int. The round function rounds to the nearest integer. That results in a branch instruction because scaled_int can be negative. In other – potentially more frequently executed – arithmetic operations, I round intermediate results to the nearest integer only if the underlying integer type is unsigned.
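To illustrate that branch, here's a sketch of round-to-nearest for a signed underlying type (assuming the member is named m_i; the actual implementation may differ):
Code:
// Truncating division in C++ rounds toward zero, so the rounding offset
// has to match the sign of m_i - hence the branch.
int round() const
{
    if (m_i >= 0)
        return static_cast<int>((m_i + SCALE / 2) / SCALE);
    return static_cast<int>((m_i - SCALE / 2) / SCALE);
}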

Last example (from CvPlayerAI::AI_demandRebukedSneak): Coin flip function
Code:
if (kGame.getSorenRandNum(100, "AI Demand Rebuked")
      < GC.getInfo(getPersonalityType()).getDemandRebukedSneakProb())
-->
if (per100(GC.getInfo(getPersonalityType()).getDemandRebukedSneakProb())
      .bernoulliSuccess(kGame.getSorenRand(), "AI Demand Rebuked"))
Power function for fractional bases and exponents: Not used in BtS code – presumably because neither the standard library nor Boost include such a function. For AI code, I find it pretty indispensable. (In AdvCiv, I currently call std::pow in more than 50 places.) So I've written a function ScaledInt::pow(ScaledInt), which might be good enough for my purposes, but it uses an 8 KB lookup table, which I'm pretty sure is far from optimal performance-wise, and accuracy is also subpar.
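The integer-exponent case, at least, is cheap through exponentiation by squaring; only fractional exponents need the lookup table. A self-contained sketch on raw values at scale 1024 (hypothetical helper, not the actual ScaledInt::pow):
Code:
// Computes base^uiExp for a non-negative fixed-point base at scale 1024,
// rounding to nearest after each rescale.
inline int powiScaled1024(int iBase, unsigned int uiExp)
{
    __int64 lResult = 1024; // 1.0 at scale 1024
    __int64 lBase = iBase;
    while (uiExp > 0)
    {
        if (uiExp % 2 == 1)
            lResult = (lResult * lBase + 512) / 1024;
        lBase = (lBase * lBase + 512) / 1024;
        uiExp /= 2;
    }
    return static_cast<int>(lResult);
}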

To be done (perhaps): Edit (Mar 2): I won't be updating this list; see "tbd." comments throughout ScaledNum.h instead.
Spoiler :
* Better power function, perhaps based on this paper from 2013 (not free, but I have a copy): "A Division-Free Algorithm for Fixed-Point Power Exponential Function in Embedded System"
* Logarithm I use far less often, but a log function would still be nice to have. The accepted answer to this Stack Overflow question looks promising: link
* The code would look nicer if operators were inherited from boost/operators.hpp. Those functions take and return all parameters by reference though; that could make a difference for performance.
* My test assertions in ScaledIntTest.cpp aren't thorough enough (poor coverage).
* A Natvis file showing the ratio m_i/SCALE as a decimal fraction would be helpful for debugging. (I haven't written one because I still use VS2010; Natvis requires VS2012.)
 
This is great. We have talked about adding decimals to yield storage because 2 silver/turn with a -1% modifier becomes -50% due to rounding. For this reason I was planning to look into making something like this, but I haven't started yet.

without risking OOS errors in multiplayer games [...] I've experienced no OOS issues
It depends on precisely how you use floats, whether a player reconnects mid-game, etc. It also depends on whether the rounding error happens somewhere where minor differences are critical. That's a lot of unknowns, which means floats may or may not cause issues. However, there is another aspect to this. While a CPU can handle both ints and floats, it can't do so at the same time. It has two modes, and it takes around 3 cycles to switch. This means staying in int mode is faster than switching to float and back again, even if the int calculations end up taking a few cycles more. Take for instance multiplication. It divides by SCALE, which is set at compile time, meaning it can be compiled as a series of bitshifts and additions. Due to modern CPUs being able to handle multiple instructions per cycle, dividing by something other than 2^n can compete against the 3-cycle delay for switching to float.

In short it's quite possible that using ScaledInt is faster than float/double if the CPU is in int mode and needs to be in int mode for the following calculation. Depending on your code, it might not have a major performance impact, but from what I can tell from the source, it certainly shouldn't be slower. We would need profiling to be certain though.


I have some feedback (actually a lot) after reading the source code. It's not that I'm unhappy with the quality of the code. In fact it looks good, and it's likely more complete than what I would have made (time considerations). However, precisely because it is so good and complete, nitpicking about details becomes more important, because it is well on the way to becoming a community standard to be introduced in multiple mods. For this reason, feature completeness and correct handling of edge cases should be considered.

I'm not directly requesting or ordering everything I mention to be added (it could be a not insignificant amount of work), but it should all be considered.

Class instance memory size
str() should return a CvString as this will remove the need to have a fixed char array in ScaledInt. Removing the char array will bring the byte size down to 4, which in addition to saving a bit of memory will be really interesting if we need an EnumMap of ScaledInt. I'm not 100% sure it works out of the box, but if it doesn't, then EnumMap should be updated to support it. CvString::format() is likely what you need to do this.

PHP:
_snprintf(szBuf, 32, "%s%d", isInt() ? "" : "ca. ", getInt());
// becomes
szBuf.format("%s%d", isInt() ? "" : "ca. ", getInt());
Untested, but it's something like that. I often use code like (copy pasted, meaning it's actually tested)
PHP:
FAssertMsg(false, CvString::format("%s: failed to be looked up by string %s", iIterator->first.c_str(), szBuffer.GetCString()).GetCString());


Type correctness
Let's say we make a ScaledInt for movement points and one for combat strength. We then add something like
PHP:
if (pUnit->movesLeft() < pPlot->moveCost())
Seems simple enough, but what if somebody messes up and uses combat strength instead of movement? With your current code, it will work just fine technically, but obviously the game will be buggy. It would be nice if we can make incompatible types of ScaledInt where adding, comparing or whatever will cause compile time errors.

The only thing I can think of offhand would be to add another template parameter, more specifically int iTYPE. Each type is then given a unique number (think enum, though I recall something about an EnumMap issue with enum->int being compiled after template parameters. Can't remember the details). ScaledInt can then only accept another ScaledInt as argument for anything if iTYPE is the same. This adds the compile-time compatibility check without altering anything at runtime.

Another solution is adding a static assert, but then a failure would be at the line of the static assert (the function) rather than the line calling the function. This affects how useful the compile error message is.

SCALE incompatibility
You can't add a number from one SCALE to a variable using another SCALE. The same goes for INT. While the existing code is good (highly optimized), storing shorts in an array and using ints for the actual calculations should be valid code. This means doubling up on a number of functions.

I wonder if the FROM_SCALE stuff at line 110 can handle this automatically. I'm not completely sure and would need to test. If this is the case, then this part is a non-issue. It might not be the perfect solution, as it can introduce more rounding issues (particularly if one has SCALE 50 and the other 1024 or something else of that nature), but at least it meets the fundamental requirement that we can be certain all rounding errors are consistent. If doing math on two ScaledInts with different SCALEs, the least rounding error would come from converting both to SCALE_A*SCALE_B, doing the math and then dividing by SCALE_B, as this will leave the result in SCALE_A.
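A sketch of that scheme for addition, with hardcoded scales just to make the arithmetic concrete (illustrative, not proposed code):
Code:
// Bring both summands to SCALE_A*SCALE_B, add, then divide by SCALE_B,
// leaving the result at SCALE_A with a single rounding step.
static int const SCALE_A = 1024;
static int const SCALE_B = 100;

inline int addMixedScale(int iA, int iB) // iA at SCALE_A, iB at SCALE_B
{
    __int64 lSum = static_cast<__int64>(iA) * SCALE_B
            + static_cast<__int64>(iB) * SCALE_A; // at SCALE_A*SCALE_B
    return static_cast<int>((lSum + SCALE_B / 2) / SCALE_B); // back to SCALE_A
}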

Array optimized
Might need a #pragma pack(1) as this will reduce the size down to 1 or 2 bytes should the variable be small enough. -10 to 10 with a SCALE of 10 can be stored in a single byte. It's not farfetched if you know you need to store a whole lot of them in arrays. 2 byte ScaledInt has a whole lot more use cases and it's still half the size of ints, meaning the CPU cache can contain twice as much of the array.

32 bit optimized
*= relies on __int64, which is slower than int because this is a 32-bit application. Ideally, it should be int (or unsigned int) if sizeof(INT) < 4. Same with scaleForComparison.
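A C++03-style trait could pick the intermediate type at compile time, e.g. (sketch, not from the actual code):
Code:
// 32-bit intermediate when INT is smaller than 32 bit; __int64 otherwise.
template<bool bSMALL> struct Intermediate { typedef int type; };
template<> struct Intermediate<false> { typedef __int64 type; };

// Hypothetical usage inside operator*=:
// typename Intermediate<(sizeof(INT) < 4)>::type lNum = m_i;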

MIN/MAX
Two unused constants. Aren't they just the min/max of INT divided by SCALE? It could be useful to have those for assert checks. Speaking of which, there is no range check in multiplication. If we have 200 * 200 using unsigned bytes, then the game will happily overflow.

Don't compile signed and unsigned together
One way to get around that issue is to not compile signed and unsigned together. Instead each time bSIGNED is needed, make the function internal in the sense that the function takes bSIGNED as a template argument to the function. This would result in code like:
PHP:
__forceinline ScaledInt<SCALE,INT>& operator*=(ScaledInt<SCALE,INT> rOther)
{
    return multiply<bSIGNED>(rOther);
}
With two specialized private functions called multiply, this will result in code where unsigned ints will not compile the signed multiplication and vice versa. EnumMap uses this approach because there are cases where specialized code would cause compile errors if it were compiled for cases where it won't actually be used. It's also much cleaner code, as it's easier to see what happens in each case rather than mixing them together in one function.

Also, you should make rOther a const reference. Const for obvious reasons, and a reference is good practice as it's faster for variables larger than the register size (in our case 32 bit). With the current code you place szBuf on the stack in each call, something you can avoid by using a reference.

Savegame
We need to be able to save m_i and load it again. It doesn't look like there is a clean way to do it yet, but adding one shouldn't be difficult.

Readability
The class should start with function declarations followed by the current code. This way the user (aka future programmer adding this) will get an idea of the features without having to read the implementation of each function.

Also move bSIGNED to the top together with MIN and MAX. It's confusing not to have the constants at the top. Even more confusing that there are constants both at the top and the bottom.
 
This is very cool :goodjob:

However, I came across the following assertions during my AdvCiv autoplay this weekend:

Code:
1)
Assert Failed

File:  e:\civ\firaxis games\sid meier's civilization 4 complete\beyond the sword\mods\advciv\cvgamecoredll\ScaledInt.h
Line:  384
Func:  ScaledInt<1024,int>::operator`/='
Expression:  static_cast<int>(m_i) < static_cast<int>(MAX / SCALE + 1)
Message:  Index expected to be < 2097152. (value: 3551946)

----------------------------------------------------------

2)
Assert Failed

File:  ..\.\CitySiteEvaluator.cpp
Line:  466
Func:  AIFoundValue::evaluate
Expression:  static_cast<int>(iCultureModifier) >= static_cast<int>(0)
Message:  Index expected to be >= 0. (value: -376)

----------------------------------------------------------

3)
Assert Failed

File:  e:\civ\firaxis games\sid meier's civilization 4 complete\beyond the sword\mods\advciv\cvgamecoredll\ScaledInt.h
Line:  384
Func:  ScaledInt<1024,int>::operator`/='
Expression:  static_cast<int>(m_i) < static_cast<int>(MAX / SCALE + 1)
Message:  Index expected to be < 2097152. (value: 2224039)

----------------------------------------------------------

4)
Assert Failed

File:  ..\.\CitySiteEvaluator.cpp
Line:  466
Func:  AIFoundValue::evaluate
Expression:  static_cast<int>(iCultureModifier) >= static_cast<int>(0)
Message:  Index expected to be >= 0. (value: -867)

----------------------------------------------------------


1) and 3) may be the result of the fractional part overflowing \ spilling over due to lack of saturation (or I may be completely "off" here :p), while 2) and 4) are most likely "downstream effects".

PS: I hope this is not off-topic (perhaps I should have posted this in the AdvCiv thread)


Update:
My build's HEAD was this:
https://github.com/f1rpo/AdvCiv/commit/8f753f67c2a0faa5974fd3890c11979e7d7a06cb
so it's possible that this has been rectified already.
 
Also you should make the rOther a const reference. Const for obvious reasons and reference is good practice as it's faster for variables larger than the register size (in our case 32 bit). With the current code you place szBuf on the stack in each call, something you can avoid by using a reference.
Using a reference rather than a value will tend to inhibit optimizations due to the requirement of having a valid\unique address. In addition, you are taking your chances that the optimizer will decide that there is no aliasing, which is never certain. A 64-bit int could be passed in ECX:EDX, which is likely the most efficient calling convention available (assuming something like fastcall).
See https://en.wikipedia.org/wiki/X86_calling_conventions

Regarding MulDiv, we could replace it with integer SSE2 intrinsics to get rid of the function call overhead. Besides, saturation is "free" when using these instructions, which is a nice bonus.
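For instance, a rough sketch of the multiply-and-rescale part (untested; saturation via the packing instructions is left out):
Code:
#include <emmintrin.h> // SSE2 intrinsics

// Unsigned 32x32->64-bit multiply, round, shift back down to scale 1024 -
// all in registers, no call, no div. Overflow of the final 32-bit result
// is not checked here.
inline unsigned int mulScaled1024(unsigned int a, unsigned int b)
{
    __m128i va = _mm_cvtsi32_si128(static_cast<int>(a));
    __m128i vb = _mm_cvtsi32_si128(static_cast<int>(b));
    __m128i prod = _mm_mul_epu32(va, vb);               // 64-bit product in lane 0
    prod = _mm_add_epi64(prod, _mm_cvtsi32_si128(512)); // + SCALE/2 (rounding)
    prod = _mm_srli_epi64(prod, 10);                    // / 1024
    return static_cast<unsigned int>(_mm_cvtsi128_si32(prod));
}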
 
Using a reference rather than a value will tend to inhibit optimizations due to the requirement of of having a valid\unique address. In addition you are taking your chances that the optimizer will decide that there is no aliasing, which is never certain. A 64-bit int could be passed in ECX:EDX which is likely the most efficient calling convention available (assuming something like fastcall )
Fair enough. Do copy by value then. However, then we are back to copying szBuf whenever we call one of those functions. It makes it even more important to get rid of that string, as that will reduce the entire class instance to sizeof(INT), usually 32 bit. Once the instance fits in a single register, we gain the part of a reference's performance benefit that comes from not copying memory we won't use anyway.
 
Thank you all for putting up with my er... explications.
[...] I was planning to look into making something like this, but I haven't started yet. [...] I have some feedback (actually a lot) after reading the source code.
Great; that's what I was hoping too.
I'm not directly requesting or ordering everything I mention to be added (it could be a not insignificant amount of work), but it should all be considered.
I can act on most of your suggestions as you've stated them, so feel free to disregard my replies below. Of course, if I'm totally on the wrong track I'd appreciate it if you'd let me know. Thanks in any case (and as always) for the instructive feedback. Obviously, anyone should also feel free to fork from my code. And if a more experienced programmer than I (or even a less experienced one) wants to take charge of this little project, I've certainly no objections.

Class instance memory size
str() should return a CvString [...]
Fair enough. Do copy by value then. However then we are back to copying szBuf whenever we call one of those functions. [...]
The char array is a static member. I don't see how that could increase the size of a ScaledInt instance. That said, will a static buffer actually work correctly for something like this:
CvString szOut = r1.str() + L"+" + r2.str();
Well, I can't even write it that way because operator+ isn't overloaded for C strings. So I guess the return type should be CvString in any case. But will returning a const reference to a static CvString work in the example above? ...
Appears to work. (Though returning those small strings by value would also be fine with me.)
Removing the char array will bring the byte size down to 4, which in addition to saving a bit of memory will be really interesting if we need an EnumMap of ScaledInt.
Right, compatibility with EnumMap should be tested and, on a similar note, Read and Write functions for serialization should be added:
Savegame
We need to be able to save m_i and load it again. It doesn't look like there is a clean way to do it yet, but adding one shouldn't be difficult.

Type correctness
The only thing I can think of offhand would be to add another template parameter, more specifically int iTYPE. Each type is then given a unique number [...].
How about a parameter class EnumType?
Spoiler :
Then there could be a macro
Code:
#define TYPEDEF_SCALED_ENUM(iScale,IntType,TypeName) \
    enum TypeName##Types {}; /* Not really supposed to be used anywhere else */ \
    typedef ScaledInt<iScale,IntType,TypeName##Types> TypeName;
and:
Code:
TYPEDEF_SCALED_ENUM(128,short,MovementPts)
TYPEDEF_SCALED_ENUM(256,short,CombatStr)
Interesting example because movement points are already being scaled by MOVE_DENOMINATOR=60 and currCombatStrength by (hardcoded) 100. I suppose using different scales wouldn't make any visible difference apart from fewer rounding artifacts. A lot of code to change though. And, currently, MOVE_DENOMINATOR can be set in XML; ScaledInt would make that impossible. Well, no real loss.

CvUnit::movesLeft could then look like this:
Code:
MovementPts movesLeft() const
{
   //return std::max(0, maxMoves() - getMoves()); // BtS
   return MovementPts::max(0, maxMoves() - getMoves());
}
MovementPts maxMoves() const
{
   //return (baseMoves() * GC.getMOVE_DENOMINATOR()); // BtS
   return baseMoves(); // (implicit constructor call)
}
I haven't tested the above, but a simpler test looks promising:
Code:
MovementPts rMoves = fixp(1.5);
MovementPts rMaxMoves = 2;
CombatStr rCombat = 5;
if (rMoves <= rMaxMoves) // OK
   //if (rMoves > rCombat) // Doesn't compile
       rMoves++;
I'd then have a mind to
#define DECL_PARAMS int SCALE, class INT, class EnumType
and
#define PARAMS SCALE,INT,EnumType
in order to unclutter the code. (And #undef them in the end.) A fourth parameter may yet be needed to disable overflow checks for intermediate results.

SCALE incompatibility
You can't add a number from one SCALE to a variable using another SCALE. Same with INT. While the existing code is good (highly optimized), storing shorts in an array and using ints for the actual calculations should be valid code. This means double up on a number of functions.
Yes, my impression is also that different instantiations may end up getting used alongside each other routinely. Especially signed/unsigned, as unsigned can be multiplied faster.
I wonder if the FROM_SCALE stuff at line 110 can handle this automatically. I'm not completely sure and would need to test. If this is the case, then this part is a non-issue.
For comparisons and assignment operators: yes, not for operator+,-,*,/
It might not be the perfect solution as it can introduce more rounding issues [...]
True. So I guess all the operators with two ScaledInt operands will need additional template parameters, e.g. operator+:
Spoiler :
Code:
// Params to be inferred (EnumType has to be the same for both summands):
template<int SCALE, class INT, int OTHER_SCALE, class OTHER_INT, class EnumType>
__forceinline
// Return type: Pick the larger of the two scales
ScaledInt< (SCALE >= OTHER_SCALE ? SCALE : OTHER_SCALE),
// C++11. Will have to be replaced somehow.
typename std::common_type<INT,OTHER_INT>::type >
operator+(
   ScaledInt<SCALE,INT,EnumType> rLeft,
   ScaledInt<OTHER_SCALE,OTHER_INT,EnumType> rRight)
{
   // Again the return type
   ScaledInt< (SCALE >= OTHER_SCALE ? SCALE : OTHER_SCALE),
typename std::common_type<INT,OTHER_INT>::type > r(rLeft);
   r += rRight;
   return r;
}
The body scales one of the summands up – which is what we'd rather avoid. The alternative is, as you wrote, to temporarily scale up both to SCALE*OTHER_SCALE.
Array optimized
Might need a #pragma pack(1) [...]
Can a struct with a single non-static member really have padding? Hm, apparently. Anyway, I assume that types smaller than int would mostly be used in data structures, so optimizing those for arrays sounds good.
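One way to settle the padding question per compiler would be compile-time size checks on a packed instantiation, e.g. (illustrative stand-in, not the real class):
Code:
#include <boost/static_assert.hpp>

#pragma pack(push, 1)
template<int iSCALE, typename IntType>
struct PackedScaledSketch { IntType m_i; };
#pragma pack(pop)

// If these hold, byte- and short-based arrays are as dense as hoped:
BOOST_STATIC_ASSERT(sizeof(PackedScaledSketch<10,char>) == 1);
BOOST_STATIC_ASSERT(sizeof(PackedScaledSketch<1024,short>) == 2);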
32 bit optimized
*= relies on __int64, which is slower than int because it's a 32 bit application. Ideally speaking, it should be int (or unsigned int) if sizeof(INT) < 4. Same with scaleForComparison.
Good catch; I'll change that.
Readability
The class should start with function declarations followed by the current code. This way the user (aka future programmer adding this) will get an idea of the features without having to read the implementation of each function.
Not sure if that's really necessary for the one-liners. But some functions have grown well beyond that.
Also a matter of readability I think:
Don't compile signed and unsigned together
At least in the case of operator*=, this makes sense, but I'm not sure if that function will change again, perhaps using the same code regardless of sign. I'll wait a bit with this.
Also move bSIGNED to the top together with MIN and MAX. It's confusing not to have the constants at the top. Even more confusing that there are constants both at the top and the bottom.
MIN and MAX are at the top because they're public. I don't think bSIGNED should be public. Well, maybe moving implementations out of the class definition will help already. (Or I'll put a private section on top for bSIGNED.)
MIN/MAX
Two unused constants. Isn't it just assigning the min/max of INT and divide by SCALE? It could be useful to have those for assert checks.
Not sure what you're saying here (MIN and MAX aren't unused), but I think you're onto something. They're public for the sake of those ubiquitous argmax/argmin loops:
int iBestValue = MIN_INT; for(...
For that purpose, MAX would have to be
std::numeric_limits<INT>::max() / SCALE
-- so I guess that's currently incorrect. Internally, std::numeric_limits<INT>::max()/SCALE is also usually what's needed for overflow checks – but not always. Might need separate constants for the limits of INT (not divided by SCALE).
Speaking of which, there is no range check in multiplication. If we have 200 * 200 using unsigned bytes, then the game will happily make an overflow.
I think ScaledInt should only be responsible for preventing overflow in intermediate results. Which is already an encumbrance (having to use MulDiv). Other overflow can be caught by assertions, which is what ScaledInt::operator*=(ScaledInt) does. (Maybe I'm missing something here.)
Spoiler :
Code:
{
    unsigned __int64 lNum = m_i; // (sizeof check needed here - noted.)
    lNum *= rOther.m_i;
    lNum += SCALE / 2;
    lNum /= SCALE;
    FAssert(lNum >= MIN && lNum <= MAX);
    m_i = static_cast<INT>(lNum);
}
Edit: minor clarification
 
Regarding MulDiv, we could replace it with integer SSE2 intrinsics to get rid of the function call overhead. Besides, saturation is "free" when using these instructions, which is a nice bonus.
That sounds wonderful; I'll try to read up on that. Thanks also for weighing in on the question of ... "evaluation strategy" is the term apparently (call by value/reference).

[...] I came across the following assertions during my AdvCiv autoplay this weekend: [...] it's possible that this has been rectified already.
I doubt that it's fixed already. Doesn't sound like ScaledInt::MAX and MIN having the wrong values is the issue either. I'll investigate. Could be an issue with ScaledInt or the way I use it. I've converted a bit of code to ScaledInt in order to test the usability of the class. That conversion is fairly error-prone. (The current head may well contain other blatant bugs. For one thing, I've just noticed that I flipped a flag and AI explorers now shun goody huts.)
 
The char array is a static member. I don't see how that could increase the size of a ScaledInt instance.
:wallbash:
This is why it is preferred to have multiple eyes looking at the same code. I don't know how many times I read that line and each time missed that it said static.

You are right. As long as it's static, it won't be stored in the instance. However I do think it will generate a char array for each set of template parameters. Not a major issue, but not that clean either.

MIN and MAX are at the top because they're public. I don't think bSIGNED should be public.
I see no reason why compile time constants shouldn't be public. While I can't think of a good specific example of how it can be useful, at the same time I won't rule out that it could be and it's not like we lose any encapsulation or anything like that. After all it's not like the values can be altered by some outside bug.

How about a parameter class EnumType?
That looks like it's cleaner than assigning an int to each and it will accomplish the same result.

Interesting example because movement points are already being scaled by MOVE_DENOMINATOR=60 and currCombatStrength by (hardcoded) 100. I suppose using different scales wouldn't make any visible difference apart from fewer rounding artifacts. A lot of code to change though. And, currently, MOVE_DENOMINATOR can be set in XML; ScaledInt would make that impossible. Well, no real loss.
For the time being, ScaledInt should support changing the code. Actually doing it is another question, and it's not a bad idea to leave the code untouched unless you want to make other changes as well. Every time you change code, you need to test to see if you broke something, meaning that, from a time-usage perspective, keeping known good code untouched could be the best choice even if it's not the best code.

Realistically speaking, ScaledInt will be added to new code or code, which is being modified anyway. That doesn't change the fact that ScaledInt would be more versatile if it is planned to work on all existing code.

I would propose changing MOVE_DENOMINATOR to 2162160. The reason is rounding errors. Let's say we have roads which cost 1/1, 1/2 ... 1/30 of a move. How many of those will allow a unit with one movement point to travel the expected number of plots?
If we use 60 and the road should cost 1/7, we either end up with 7 plots costing 56 or 63. Rounding means we can't hit 60 precisely. In fact, the only numbers which work perfectly would be: 1-6, 10, 12, 15, 20, 30. Notice how it supports all values used in vanilla, but not the values between the vanilla values.

If we go with 2162160, it matches perfectly with: 1-16, 18, 20-22, 24, 26-28, 30.
In other words from 1 to 30, it only fails on 17, 19, 23, 25, 29.
It reduces the amount of movement points an int can contain, but it can still handle a max of 993 moves. That should still be enough for any realistic use case.
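The divisibility claims are easy to verify with a throwaway snippet (not mod code):
Code:
#include <cstdio>

// Prints the denominators from 1 to 30 that divide 2162160 evenly:
// 1-16, 18, 20-22, 24, 26-28, 30 (2162160 = 2^4 * 3^3 * 5 * 7 * 11 * 13).
int main()
{
    for (int n = 1; n <= 30; n++)
    {
        if (2162160 % n == 0)
            printf("%d ", n);
    }
    return 0;
}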

Making MOVE_DENOMINATOR a compile-time constant should allow the compiler to optimize the code better. Since pathfinding/movement-cost calculations dominate the time spent on the AI, this might not be a bad idea. I don't know how much we can gain, but it's not like it's hard to implement this type of optimization.

However MOVE_DENOMINATOR is off topic for ScaledInt.
 
I don't know how many times I read that line and each time missed that it said static.
Perhaps a side-effect of bSIGNED, the other private static member, being declared much later in the file. You're quite right that such (seemingly or actually) arbitrary decisions impair readability.
You are right. As long as it's static, it won't be stored in the instance. However I do think it will generate a char array for each set of template parameters. Not a major issue, but not that clean either.
I've moved it to an otherwise empty base class now.
I see no reason why compile time constants shouldn't be public. While I can't think of a good specific example of how it can be useful, at the same time I won't rule out that it could be and it's not like we lose any encapsulation or anything like that. After all it's not like the values can be altered by some outside bug.
For the moment, I've had to make it public along with the other private data members because I've changed the type of SCALE to IntType (new name for INT, which is already defined in WinDef.h) and, curiously, this has broken the friend declaration. Looks like MSVC can't handle a dependent argument in that context (StackOverflow.com/.../friending-a-dependent-argument-template). I haven't really thought it through, but using the same type for SCALE and m_i sounds like it should be advantageous; so I hope I can find a workaround for the friend issue.

I've also made some other changes previously discussed:
GitHub.com/f1rpo/AdvCiv/commit/11d0b2e
Probably fixes the bug that @devolution had reported: operator/= was lacking 64-bit code for avoiding overflow. All the scale conversion code really needs to go in one place; I'm working on that. That should also make it easier to improve on MulDiv. (I've taken a look at the list of available SSE2 intrinsics. :borg: Well, some combination of those nifty multiplication and shift functions ought to do the job ...)

Realistically speaking, ScaledInt will be added to new code or code, which is being modified anyway. That doesn't change the fact that ScaledInt would be more versatile if it is planned to work on all existing code.
Gotcha.
However MOVE_DENOMINATOR is off topic for ScaledInt.
Sure. Interesting angle though.

On topic: I wonder if "ScaledInt"/"scaled_int" is really the best name. I don't think it's bad, it's just a bit long, and the class doesn't necessarily "scale an int"; rather, it scales a fractional number in order to approximate it as an int. Just "scaled"? Could clash with variable names ... "scaled_t"? Not much shorter than scaled_int, and "uscaled_t" for an unsigned type looks a bit cryptic perhaps. Same for "fixp/ufixp" (and, if fixp is the class name, then what to name the floating-point conversion macro?).
 
Other languages like Python call fixed-point rational numbers decimals. Not sure if that clashes with an existing type though.
 
AFAIK, the decimal type is a base-10 float used for accounting\actuarial calculations when tax-compatible :p rounding modes are required. In other words, it is unrelated to integer fixed point calculation.
Edit: I think I misunderstood the statement a bit but I'll leave the comment just in case...
 
I've taken a look at the list of available SSE2 intrinsics.
Me too, and I had a facepalm moment. It looks like Apple's Velocity Engine instructions from the PowerPC days. I don't know why I didn't connect the dots on those two before, but at least I got the Velocity Engine back (well, not completely, but close enough). Now I wonder why EnumMap isn't using it. Add 128-bit padding and then do += and similar on 128 bits at a time rather than on each 8/16/32-bit variable for each iteration.

Speaking of EnumMap, I realized that adding support for storing ScaledInt (or whatever it will be called) isn't that tricky. Just store m_i and do math directly on that one, assuming it to be a regular IntType. If EnumMap just doesn't support any multiplication or division on it, then no new specialized functions would be needed. Multiplication and division can be added, but that requires specialized functions, and there's no need for them unless we think we'll actually use them.
 
I take it that devolution's comment is about decimal types in C++ libraries, perhaps in particular stddecimal. Whereas the Python decimal module is for exact representations of decimal numbers. I wasn't familiar with either type. "decimal" being associated with floating-point numbers in C++ rules that name out I think. But it's interesting to see that the Python decimal module has settings for rounding and overflow handling (through a Context object). Construction from string (e.g. Decimal("0.3")) is enviable – but unviable.

I'm tending toward "ScaledNum" as the template class name now ("ScaledFraction" seems a bit long) and "scaled" and "uscaled" for the default types. And keep "fixp" (at least it's brief and fairly unique) for conversion from constant double expressions, and perhaps replace the per100 factory function with constructor calls:
Code:
scaled r = per100(kSpeed.getGoldenAgePercent());
-->
scaled r(kSpeed.getGoldenAgePercent(), 100);
Though putting the 100 in the back makes it easier to miss. Doesn't hurt to allow both I guess.
To form an opinion on the name, one may have to write some code using the class. Hasn't really helped me decide; still, maybe it's best to table that decision and leave it at "ScaledInt" for now.

Blunt workaround for the friend issue (not yet committed):
Code:
template<int iSCALE, typename IntType = int, typename EnumType = int>
class ScaledInt : ScaledIntBase<void>
{
   static IntType const SCALE = static_cast<IntType>(iSCALE);
When trying to write arithmetic operators that perform scale conversion and return a ScaledInt with the larger of the two scales, I've run into another issue with dependent arguments in MSVC03. The minimal example below crashes the compiler. But it looks like I have a working – albeit elaborate – workaround for that as well (not included here).
Spoiler :
Code:
template<int i>
struct IntWrapper {};

template<int j>
IntWrapper<(true ? j : 0)>* foo() {
    return 0;
}

int main() {
   foo<0>();
}
[...] Now I wonder why EnumMap isn't using it. Add 128 bit padding and then do += and similar on 128 bit at a time rather than on each 8/16/32 variable for each iteration.
Sounds like quite a bit of work – to apply that only to those EnumMaps for which it makes sense.
More to the point of the thread, my attempt to use scaled_int in EnumMap has soon led me to realize (as you may have all along) that it can't be a member of the union. One way to encapsulate the conversions between ScaledInt and its IntType would be to leave EnumMap unchanged and write a wrapper class around it, something like "EnumToScaledMap".
 
Small update: I think I've got the mixed-scale operators working now without unnecessary loss of accuracy. (Though the function templates seem kind of fragile in terms of the compiler accepting them ...) This has turned out to be relevant for multiplication, division and (exact) comparisons. More specifically, operator* calls operator*= on the operand with the greater scale, and operator*= uses ScaledInt::mulDiv (via mulDivByScale defined below mulDiv) to multiply the numerators and divide the result by the smaller scale. operator< scales both operands up to the product of the two scale factors (unless they're the same), using 64 bit if necessary.

For addition, it's actually OK to convert both operands to the result scale (i.e. the greater of the two scales) before adding them. If, for maximal precision, a result on the product scale is desired, then one should probably use boost::rational instead.

I've had to use static assertions for the "scaled enum" functionality after all; at least as a temporary solution. Compilation errors resulting from enum type mismatches would be much easier to read if function templates were defined only for matching enum types – as I had originally intended – but I hadn't tested that approach sufficiently; it's not as easy as I'd hoped.
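For reference, a C++03-style type-match assertion of the kind I mean (sketch; the committed code may differ):
Code:
// Compiles only when the two enum types are identical.
template<bool bCOND> struct ScaledEnumsMatch;  // undefined for false
template<> struct ScaledEnumsMatch<true> { typedef void Check; };

template<class T, class U> struct IsSameType { static bool const value = false; };
template<class T> struct IsSameType<T,T> { static bool const value = true; };

// Hypothetical use inside a mixed-operand operator:
// typedef typename ScaledEnumsMatch<
//         IsSameType<EnumType,OTHER_ENUM>::value >::Check EnumTypesMustMatch;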

The default-precision types have been renamed to "scaled" and "uscaled" and their scale factor increased to 2048; the class template name is still unchanged, but I intend to rename it to "ScaledNum" eventually.

One new open issue: Rounding to the nearest integer after a division can lead to overflow, at least when it's done by adding SCALE/2 before dividing, and I don't think there's an equally fast method that can't overflow. For large SCALE factors (like the one Nightinggale suggested for movement points), this isn't only a hazard – rounding will routinely cause overflow.

For IntType=int, this isn't really a problem because MulDiv (WinBase.h) takes care of rounding to nearest. For short and char, rounding to nearest isn't currently implemented because it involves branching on the sign. That leaves unsigned types as an unresolved problem. Maybe it's better not to worry about this until the intrinsics are brought in. It might be that a template parameter for rounding will have to be introduced (with a default value based on SCALE and IntType).

Some other things remain to be done (see "tbd." comments scattered in ScaledInt.h), but I'll probably leave it as it is for some time – unless new problems or possibilities become apparent.
 
More to the point of the thread, my attempt to use scaled_int in EnumMap has soon lead me to realize (as you may have all along) that it can't be a member of the union. One way to encapsulate the conversions between ScaledInt and its IntType would be to leave EnumMap unchanged and write a wrapper class around it, something like "EnumToScaledMap".
Maybe the real answer to this issue is to remove T from the union in EnumMap. If I recall correctly, it is used if sizeof(T) >= 4 and the length is known to be 1 at compile time. Now how often does that happen? Removing it will not make it stop working, rather it will no longer have the variable inline, but rather rely on a regular array allocated elsewhere even if the length is just 1.

I added it for two reasons. One is that it was trivial at the time. The other is that it would actually be used in WTP at the time, but since then we changed xml and now I'm no longer sure it will be used. If it causes issues, then there is no real reason to keep it around.

One new open issue: Rounding to the nearest integer after a division can lead to overflow, at least when it's done by adding SCALE/2 before dividing, and I don't think there's an equally fast method that can't overflow. For large SCALE factors (like the one Nightinggale suggested for movement points), this isn't only a hazard – rounding will routinely cause overflow.
I don't really get this problem. If our SCALE allows a range from 0 to 199, then as long as we only use the scale up to 198, adding SCALE/2 shouldn't cause overflow. If adding that value causes overflow, then odds are that it is so close to overflowing anyway that it will happen regardless of rounding issues.
 
Maybe the real answer to this issue is to remove T from the union in EnumMap. If I recall correctly, it is used if sizeof(T) >= 4 and the length is known to be 1 at compile time. Now how often does that happen? Removing it will not make it stop working, rather it will no longer have the variable inline, but rather rely on a regular array allocated elsewhere even if the length is just 1.
Let's see. This is the offender
Code:
T m_InlineNative[NUM_NATIVE_BLOCKS];
and it gets used if
Code:
(SIZE == ENUMMAP_SIZE_NATIVE && (MAX_LENGTH * SIZE_OF_T) <= ENUMMAP_MAX_BYTES);
I would've thought that this also applies to char[4]. If I remove m_InlineNative and all code that depends on it, add a call SET_ARRAY_DEFAULT(scaled) and change the first test in TestEnumMap.cpp to EnumMap<RouteTypes,scaled> test, then at least it compiles and the tests pass, in particular:
Code:
test.set(var, 1);
FAssert(test.get(var) == 1);
Seems that this would be a possible avenue.

I don't really get this problem. If out SCALE allows a range from 0 to 199, as long as we only use the scale up to 198, adding SCALE/2 shouldn't cause overflow. If adding that value causes overflow, then odds are that it is so close to causing overflow anyway that it will happened regardless of rounding issues.
I must've gotten confused somehow. Looks like overflow actually can't occur at all in the case I had referred to as an "unresolved problem." Since the iSCALE parameter is a signed int (to work around a compiler problem), the worst case for unsigned multiplication of factors on different scales should be:
Code:
uscaled rAlmostSqrtOfTwo = scaled(2).sqrt() - fixp(0.001);
ScaledInt<MAX_INT,    uint> r1 = rAlmostSqrtOfTwo;
ScaledInt<MAX_INT - 1,uint> r2 = rAlmostSqrtOfTwo;
FAssert(r2.MAX() == 2);
r2 *= r1;
FAssert(r2 > fixp(1.99) && r2 < 2);
That only increases an unsigned __int64 variable to ca. 2^63. r2.round(), i.e. (r2.m_i + SCALE / 2u) / SCALE, will overflow, but, in that case, the rounding has been explicitly requested and triggers an assertion in the round function.

Another small realization I've had: When multiplying a ScaledInt on some custom scale by the fixp macro or the global per100 function, scale conversion (inevitably) happens before the actual multiplication. Example:
Code:
typedef ScaledInt<32*1024> custom_scaled;
custom_scaled r = 2;
// Note:          2 * 30% * (32*1024) = 19660.8
r *= per100(30);             // r.m_i = 19648
r = 2;
r *= custom_scaled(30, 100); // r.m_i = 19660
r = 2;
r.mulDiv(30, 100);           // r.m_i = 19661
So I guess one shouldn't use the global functions when working with a custom type, or at least not if high accuracy is important. For addition, r += custom_scaled(30, 100) should do.
 