I've written a small class for fixed-point arithmetic that I wanted to share. At least as a basis for discussion – maybe we can collaboratively come up with something better.
GitHub.com/f1rpo/.../ScaledInt.h#L51
Edit (20 May): latest version -- pretty much(?) ready for inclusion in other mods
For computations in which large rounding errors are problematic and small ones tolerable, and for computing fractional powers, e.g. sqrt(0.9), both without risking OOS errors in multiplayer games. For background on that, see @Nightinggale's description of open issue #176 in the "We the People" (Colonization) mod, specifically under "Why we can't use floating point calculations."
(My own mod "AdvCiv" actually uses floating-point numbers extensively in AI code, and I've experienced no OOS issues – but synchronicity has only been tested onfourseven (edit: as of 20 May) different PCs. It might be that all recent Intel and AMD processors have the same floating-point behavior, perhaps due to their support for SSE2, but it's better not to rely on this poorly tested hypothesis.)
I've named the class "ScaledInt" (edit 20 May: now "ScaledNum") with one template parameter for the scale factor and another for the integer data type that stores the scaled number. It resembles the scaled_integer class in the Compositional Numeric Library (which isn't compatible with MSVC03).
Speaking of external libraries, we do have boost::rational at our disposal – but that's far less efficient because the denominator can't be resolved at compile time, and I don't think that's worth the greater precision and range. (Not to mention that boost::rational keeps the ratio in a normalized form at all times, which is really slow.)
In the hope that a single scale factor will work well enough in most situations, I've defined
Edit (20 May): Now defined as
Performance: For addition and subtraction, the only overhead compared with regular integer math should be the conversion from (non-scaled) integers that aren't known at compile time to scaled_int (a bitshift) and from scaled_int back to integer (costlier than a bitshift when rounding to nearest). Fast multiplication is a challenge because intermediate results can overflow; with scaled_int, that happens already when squaring 45.5 (because 45.5*1024*45.5*1024 exceeds INT_MAX). As for division, there is a potential for saving some time when applying modifiers that are given as percentages in XML – provided that the CvInfo classes store those modifiers as scaled_int so that the conversion from scale 100 happens only once during XML loading. Also, any hardcoded fractional modifiers can be converted to scaled_int at compile time. Example:
Public interface: Some examples from CvPlayerAI::AI_foundValue:
"scaled_int" is already a lot to type. Perhaps an even shorter type name would be more prudent. For Hungarian notation (to be consistent with BtS code), one can't really use 's' as that's normally for strings. 'x' for "fixed-point" might work; I like 'r' for "rational" best.
(NB: Since scaled_int has a scale factor built in, one should probably start with a smaller constant, e.g. rValue = 100.)
Overloaded operators for assigning int to scaled_int.
Multiply by a fraction that can't be resolved at compile time. Could also write:
Limits as static const members.
The return value of startingPlotDistanceFactor is scaled by 1000. Ideally, it would be returned as a scaled_int, but let's assume that it isn't. The global per100, per1000, per10000 factory functions put the cart before the horse syntactically. Perhaps better to use the two-argument constructor instead:
Since we have a custom data type, we might as well add some conveniences.
Alternatively, one could use a floating point expression:
I guess I'd prefer "scaled_int(1.5)", but the macro needs to have a unique name.
Currently, one could also write return rValue.increasedTo(1).round(), but I think I'll remove increasedTo in order to avoid confusion with the (non-const) increaseTo function.
Edit (20 May): Removed increasedTo
As conversion to integer shouldn't be needed all that frequently, I think it's better not to allow implicit casts from (or assignment of) scaled_int to int. round rounds to the nearest integer. That results in a branch instruction because scaled_int can be negative. In other – potentially more frequently executed – arithmetic operations, I round intermediate results to the nearest integer only if the underlying integer type is unsigned.
Last example (from CvPlayerAI::AI_demandRebukedSneak): Coin flip function
Power function for fractional bases and exponents: Not used in BtS code – presumably because neither the standard library nor Boost include such a function. For AI code, I find it pretty indispensable. (In AdvCiv, I currently call std:: pow in more than 50 places.) So I've written a function ScaledInt:: pow(ScaledInt), which might be good enough for my purposes, but it uses an 8 KB lookup table, which I'm pretty sure is far from optimal performance-wise, and accuracy is also subpar.
To be done (perhaps): Edit (Mar 2): I won't be updating this list; see "tbd." comments throughout ScaledNum.h instead.
GitHub.com/f1rpo/.../ScaledInt.h#L51
Edit (20 May): latest version -- pretty much(?) ready for inclusion in other mods
For computations in which large rounding errors are problematic and small ones tolerable, and for computing fractional powers, e.g. sqrt(0.9), both without risking OOS errors in multiplayer games. For background on that, see @Nightinggale's description of open issue #176 in the "We the People" (Colonization) mod, specifically under "Why we can't use floating point calculations."
(My own mod "AdvCiv" actually uses floating-point numbers extensively in AI code, and I've experienced no OOS issues – but synchronicity has only been tested on
I've named the class "ScaledInt" (edit 20 May: now "ScaledNum") with one template parameter for the scale factor and another for the integer data type that stores the scaled number. It resembles the scaled_integer class in the Compositional Numeric Library (which isn't compatible with MSVC03).
Speaking of external libraries, we do have boost::rational at our disposal – but that's far less efficient because the denominator can't be resolved at compile time, and I don't think that's worth the greater precision and range. (Not to mention that boost::rational keeps the ratio in a normalized form at all times, which is really slow.)
In the hope that a single scale factor will work well enough in most situations, I've defined
typedef ScaledInt<1024,int> scaled_int;
typedef ScaledInt<1024,uint> scaled_uint;
This way, scaled_int has a range of ca. -2 mio. to +2 mio. A much smaller range would be awkward in AI computations, but perhaps 2048 would be a better standard scale factor.typedef ScaledInt<1024,uint> scaled_uint;
Edit (20 May): Now defined as
typedef ScaledNum<2048,int> scaled;
typedef ScaledNum<2048,uint> uscaled;
typedef ScaledNum<2048,uint> uscaled;
Performance: For addition and subtraction, the only overhead compared with regular integer math should be the conversion from (non-scaled) integers that aren't known at compile time to scaled_int (a bitshift) and from scaled_int back to integer (costlier than a bitshift when rounding to nearest). Fast multiplication is a challenge because intermediate results can overflow; with scaled_int, that happens already when squaring 45.5 (because 45.5*1024*45.5*1024 exceeds INT_MAX). As for division, there is a potential for saving some time when applying modifiers that are given as percentages in XML – provided that the CvInfo classes store those modifiers as scaled_int so that the conversion from scale 100 happens only once during XML loading. Also, any hardcoded fractional modifiers can be converted to scaled_int at compile time. Example:
Spoiler :
Optimized assembly for two lines of K-Mod code in CvPlayerAI::AI_yieldWeight:
If the weight variable is turned into a scaled_uint that doesn't check for overflow, the assembly becomes:
If overflow is prevented through extension to 64 bits, we get:
Measurements using the TSC register suggest that this is at best marginally faster than the BtS code: GitHub.com/f1rpo/.../ScaledIntTest.cpp#L260.
The fastest method I've found that avoids overflow in a signed rWeight variable (type scaled_int), is the MulDiv function from WinBase.h:
That is, the MulDiv call doesn't get inlined and will use div instead of a bitshift. This seems to be about three times slower than the BtS code.
One could introduce a third template parameter bUNSAFE_MULT that disables overflow handling in release builds and define
Code:
iWeight *= 275;
lea eax,[esi*8] # eax = iWeight * 8
sub eax,esi # eax -= iWeight
add eax,eax # eax *= 2
add eax,eax # eax *= 2
add eax,eax # eax *= 2
sub eax,esi # eax -= iWeight
lea ecx,[eax+eax*4] # ecx = eax + eax * 4
iWeight /= 100;
mov eax,51EB851Fh # eax = magicConstant
imul ecx # eax *= ecx
sar edx,5 # edx >>= 5
mov esi,edx # esi = edx
shr esi,1Fh # esi >>= 31
add esi,edx # esi += edx
Code:
rWeight *= per100(275);
lea edx,[esi+esi*2] # edx = 3 * rWeight.m_i
add edx,edx # edx *= 2
add edx,edx # edx *= 2
sub edx,esi # edx -= rWeight.m_i
shl edx,8 # edx <<= 8
add edx,200h # edx += 512 (round to nearest)
shr edx,0Ah # edx >>= 10
mov esi,edx # eWeight.m_i = edx
Code:
rWeight *= per100(275);
mov eax,esi # eax = rWeight.m_i
mov ecx,0B00h # ecx = 2816 (=2.75*1024)
mul eax,ecx # eax *= ecx
add eax,200h # eax += 512
adc edx,0 # edx += CF (carry flag)
mov esi,eax # rWeight.m_i = eax
shrd esi,edx,0Ah # rWeight.m_i >>= 10 (with edx shifted in from the left)
shr edx,0Ah # edx >>= 10
The fastest method I've found that avoids overflow in a signed rWeight variable (type scaled_int), is the MulDiv function from WinBase.h:
Code:
rWeight *= per100(275);
push 400h # 1024
push 0B00h # 2816 (=2.75*1024)
push esi # rWeight.m_i
call dword ptr [__imp__MulDiv@12]
One could introduce a third template parameter bUNSAFE_MULT that disables overflow handling in release builds and define
typedef ScaledInt<1024,uint,true> modifier_t;
specifically for fast multiplication of modifiers. Though I can't actually think of BtS/AdvCiv code where that would be important.
Public interface: Some examples from CvPlayerAI::AI_foundValue:
Spoiler :
Code:
int iValue = 1000;
-->
scaled_int rValue = 1000;
(NB: Since scaled_int has a scale factor built in, one should probably start with a smaller constant, e.g. rValue = 100.)
Code:
iValue += GC.getMap().maxStepDistance();
-->
rValue += GC.getMap().maxStepDistance();
Code:
iValue *= (1 + 4 * pPlot->getMinOriginalStartDist());
iValue /= (1 + 2 * GC.getMapINLINE().maxStepDistance());
-->
rValue.mulDiv(1 + 4 * pPlot->getMinOriginalStartDist(),
1 + 2 * GC.getMap().maxStepDistance());
rValue *= scaled_int(1 + 4 * pPlot->getMinOriginalStartDist(),
But that wouldn't always be as precise as the above.1 + 2 * GC.getMap().maxStepDistance());
Code:
int iMinDistanceFactor = MAX_INT;
-->
scaled_int rMinDistanceFactor = scaled_int::MAX;
Code:
int iClosenessFactor = kOtherPlayer
.startingPlotDistanceFactor(pPlot, getID(), iMinRange);
-->
scaled_int rClosenessFactor = per1000(kOtherPlayer
.startingPlotDistanceFactor(pPlot, getID(), iMinRange));
scaled_int rClosenessFactor(kOtherPlayer
.startingPlotDistanceFactor(pPlot, getID(), iMinRange), 1000);
Code:
iMinDistanceFactor = std::min(iClosenessFactor, iMinDistanceFactor);
-->
rMinDistanceFactor.decreaseTo(rClosenessFactor);
Code:
iMinDistanceFactor = std::min(1500, iMinDistanceFactor);
-->
rMinDistanceFactor.decreaseTo(per100(150));
rMinDistanceFactor.decreaseTo(fixp(1.5));
So long as the expression gets resolved at compile time, it's safe for multiplayer. The fixp macro ensures that by forwarding the expression as a template parameter:
Code:
#define fixp(dConstExpr) \
( // ... overflow checks ...
scaled_int::fromRational<(int)( \
(dConstExpr) * 10000 + ((dConstExpr) > 0 ? 0.5 : -0.5)), 10000>())
Code:
return std::max(1, iValue);
-->
return scaled_int::max(rValue, 1).round();
Edit (20 May): Removed increasedTo
As conversion to integer shouldn't be needed all that frequently, I think it's better not to allow implicit casts from (or assignment of) scaled_int to int. round rounds to the nearest integer. That results in a branch instruction because scaled_int can be negative. In other – potentially more frequently executed – arithmetic operations, I round intermediate results to the nearest integer only if the underlying integer type is unsigned.
Last example (from CvPlayerAI::AI_demandRebukedSneak): Coin flip function
Code:
if (kGame.getSorenRandNum(100, "AI Demand Rebuked")
< GC.getInfo(getPersonalityType()).getDemandRebukedSneakProb())
-->
if (per100(GC.getInfo(getPersonalityType()).getDemandRebukedSneakProb())
.bernoulliSuccess(kGame.getSorenRand(), "AI Demand Rebuked"))
To be done (perhaps): Edit (Mar 2): I won't be updating this list; see "tbd." comments throughout ScaledNum.h instead.
Spoiler :
* Better power function, perhaps based on this paper from 2013: (Not free, but I have a copy.)
A Division-Free Algorithm for Fixed-Point Power Exponential Function in Embedded System
* Logarithm I use far less often, but a log function would still be nice to have. The accepted answer to this Stack Overflow question looks promising: link
* The code would look nicer if operators were inherited from boost/operators.hpp. Those functions take and return all parameters by reference though; that could make a difference for performance.
* My test assertions in ScaledIntTest.cpp aren't thorough enough (poor coverage).
* A Natvis file showing the ratio m_i/SCALE as a decimal fraction would be helpful for debugging. (I haven't written one because I still use VS2010; Natvis requires VS2012.)
A Division-Free Algorithm for Fixed-Point Power Exponential Function in Embedded System
* Logarithm I use far less often, but a log function would still be nice to have. The accepted answer to this Stack Overflow question looks promising: link
* The code would look nicer if operators were inherited from boost/operators.hpp. Those functions take and return all parameters by reference though; that could make a difference for performance.
* My test assertions in ScaledIntTest.cpp aren't thorough enough (poor coverage).
* A Natvis file showing the ratio m_i/SCALE as a decimal fraction would be helpful for debugging. (I haven't written one because I still use VS2010; Natvis requires VS2012.)
Last edited: