Hasn't this article made the rounds here before? Or am I thinking of something else?
Anyway, it was the subject of a brief but intense discussion shortly after its publication. The caveats list at the end of the article addresses many of the objections in some detail. Something to keep in mind is that the article was explicitly framed as an "interesting thought experiment" rather than a scholarly contribution. That's fair! Let's judge it on those terms.
First, the concept. The author used a data set derived from Wikipedia's collection of battles and their recorded casualties. This is unquestionably the easiest set to collect and convert into usable data, but it is a deeply flawed one, as the author acknowledges later. It is extremely incomplete, with much of the incompleteness driven by geographical bias (engagements outside Europe and, to a lesser extent, modern America are hard to find). It also does not compare like with like, because Wikipedia's definition of a "battle" is highly mutable and inconsistent. This is mostly a problem with the modern data: starting around 1864, it becomes harder to tease out individual battles because of the changing nature of warfare. This is directly reflected in the results, as the author notes that his system seems to rate modern generals much lower; he does not, however, seem to understand or address why that is the case.
The limitations of the data are real and severe, but the underlying concept - rating tactical acumen on a simple up/down number for victory - is also problematic. Wikipedia's casualty figures are often wrong and could be improved through further research, but the fact that they were not incorporated at all is a severe flaw. (I would also suggest normalizing casualties against era-specific rates, because the changing nature of warfare has produced dramatically different average casualty figures over the last three millennia.) The up/down number for victory is also tricky, because what a victory actually meant is hard to define in many cases.
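To make the normalization suggestion concrete - this is my sketch, not anything in the article, and every number and era bucket below is invented for illustration:

```python
# Hypothetical sketch: express a battle's casualty rate relative to the
# average casualty rate of its era, so a 20% loss in an industrial-era
# meatgrinder and a 20% loss in an early-modern battle aren't treated
# as equally bloody. The era buckets and mean rates are invented.
ERA_MEAN_CASUALTY_RATE = {
    "ancient": 0.15,
    "early_modern": 0.10,
    "industrial": 0.25,
}

def normalized_casualty_rate(losses, army_size, era):
    """Raw casualty rate divided by the era's average rate."""
    raw_rate = losses / army_size
    return raw_rate / ERA_MEAN_CASUALTY_RATE[era]

# A 20% loss rate is slightly better than typical for an industrial-era
# battle under these invented averages...
print(normalized_casualty_rate(20_000, 100_000, "industrial"))    # 0.8
# ...but exceptionally bloody by early-modern standards.
print(normalized_casualty_rate(20_000, 100_000, "early_modern"))  # 2.0
```

The point is only that the same raw butcher's bill means very different things in different eras; any real implementation would need defensible era averages, which is its own research problem.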
After each general receives a rating for each battle, the author sums those ratings across the general's career. This is after the fashion of WAR (Wins Above Replacement, the baseball stat), and it is the most obviously flawed aspect of the whole thing. WAR is a counting stat: a long career at a reasonably high level generally beats a short career with an extremely high peak, because the total number of wins generated is higher. This makes sense for baseball, because "health is a skill" - and a quite relevant one, for managers wanting to rack up wins and make the postseason - but it does not translate over to warfare very well!
Generals don't have 162-battle seasons. Some of them have short careers by virtue of mostly leading during a time of peace. Some of them have short careers in the Wikipedia-derived data set because the data set isn't very good. Some of them have short careers because their style of warfare did not privilege set-piece battles. As we will see, this leads directly to some of the more counterintuitive and flawed results of the thought experiment.
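The counting-stat problem is easy to demonstrate with invented numbers (these ratings are mine, not the article's): a long, merely-decent career out-totals a short, brilliant one by construction.

```python
# Toy illustration of why a summed, WAR-style rating rewards career
# length. All per-battle ratings below are invented.
def career_total(per_battle_ratings):
    """A counting stat: just sum the per-battle ratings."""
    return sum(per_battle_ratings)

# A general who slightly outperforms expectations across 40 battles...
long_mediocre_career = [0.2] * 40   # totals to 8.0
# ...out-totals one who dominates in only 6 battles.
short_brilliant_career = [0.9] * 6  # totals to 5.4

print(career_total(long_mediocre_career) >
      career_total(short_brilliant_career))  # True
```

For a general whose career was short because of peacetime, bad data coverage, or a style of war that avoided set-piece battles, the total tells you almost nothing about skill.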
Finally, there is the concept of the "replacement level", which the author admits isn't very useful in the caveats after spending entirely too much time on it in the article. I would agree that it's not very useful. First of all, he doesn't really use the equivalent of a "replacement-level player" as in baseball, but rather the mean player skill, which is very much not the same thing. (Replacement-level players are significantly worse than the mean player in MLB, because they're basically the guys left over after the managers have already assembled their teams.) That's mostly just a nomenclature issue, though. The bigger issue is everything that the Wikipedia battle boxes don't actually cover: technological differences, differences in troop quality or training, differences in available firepower (usually), differences in terrain effects, and so on. The author admits these omissions and suggests them for further investigation - while conceding that they sometimes make the model's conclusions difficult to interpret!
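The nomenclature point is worth making concrete, since "above mean" and "above replacement" are different baselines. A sketch with an invented pool of skill ratings (none of this is from the article; I'm crudely taking the 20th percentile as "replacement", which is itself an assumption):

```python
import statistics

# Invented skill ratings for a pool of generals (higher = better).
pool = [0.30, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.95]

mean_level = statistics.mean(pool)
# A "replacement" is roughly the best talent freely available after the
# good candidates are already taken -- modeled here, crudely, as the
# 20th percentile of the pool.
replacement_level = statistics.quantiles(pool, n=5)[0]

# A merely-average general scores zero "above mean" but clearly
# positive "above replacement": the two baselines diverge.
print(mean_level - mean_level)         # 0.0
print(mean_level - replacement_level)  # positive with these numbers
```

Whatever baseline you pick, the deeper problem remains: the battle boxes don't contain the information (technology, training, terrain) you'd need to say what a "replacement" commander could actually have achieved.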
On to specific conclusions, then:
Unsurprisingly, Napoleon Bonaparte ranks at the top. There's an obvious reason for this. Napoleon's career coincided with an era in warfare that strongly favored fighting set-piece battles, and during which set-piece battles were very thoroughly recorded, meaning he has a very large data set; for counting stats like WAR, that rates him very highly indeed. Only tactical engagements are considered, meaning that Napoleon's most egregious military failures - in Spain, the Levant, and Russia - don't enter the equation. But there's nothing wrong with this conclusion. In the highly subjective discussion of the best generals, Napoleon is certainly a valid answer. There is a reason Clausewitz labeled him "the god of war". Napoleon possessed remarkable gambler's instincts and an excellent eye for terrain, and he was an aggressive troop leader at the tactical and operational levels. You could do worse than call him the greatest tactician of all time, even if his gambling did eventually catch up to him. The author touts Napoleon's extreme outlier status as a mark of his troop-leading ability, but I would suggest it was rather a combination of troop-leading ability and these other factors.
As you can see, the facts that a) we're using a counting stat and b) we're focusing on tactical engagements define the nature of the conclusions. Other warlords that rank highly by these criteria are Caesar and Alexander, both of whom fought a lot of battles and won a lot of them, and both of whom were famous enough, with careers well-recorded enough, to generate lots of Wikipedia battleboxes. No surprises there. They are also highly regarded commanders in military history, so again: nothing counterintuitive about that.
One of the conclusions that I have mixed feelings about is the model's extremely low rating of Robert E. Lee. The author suggests that, although Lee was "saddled" with severe disadvantages, his reputation as a tactician is "likely undeserved". I fully agree that there is a particular group of individuals who have inflated Lee's abilities beyond all reasonable levels. Many recent scholars of the war have chosen to focus on Lee's Virginia-centric military policy (not reflected in the ratings) and the high relative casualties his army absorbed (also not reflected in the ratings). But the notion of a "replacement general" here is problematic. Given those disadvantages, how well would a replacement-level general - the likes of Johnston or Beauregard, in this case - have done? Probably not very well! The Army of Northern Virginia was repeatedly saved from total disaster in the Overland Campaign by Lee's rapid responses and tactical maneuvers. He devised audacious plans that often redounded to his advantage. It is extremely hard to imagine another likely candidate in the same scenario, with the same army and the same opponents, succeeding even half as well as Lee did. This conclusion points up some of the goofier omissions of the model: casualties on both sides are not considered, and the concept of the "replacement level" is deeply flawed.
A conclusion that I believe to be totally unfounded is the model's take on Pyrrhos of Epeiros, the ancient general mentioned in the Hannibal quotation at the beginning. The author finds it difficult to understand why Pyrrhos was rated so highly by Hannibal and suggests that the "Pyrrhic victory" - absorbing so many casualties that even victory becomes a defeat - should mean he ought to be rated even lower.
This is because the data set for Pyrrhos is so disastrously incomplete! Only his three battles against the Romans are considered in the model - not his conquests of Macedonia, nor his war in Sicily, nor his battles against the various other Successors in southern Greece.
Of those three battles, one is correctly rated a victory for Pyrrhos (the Battle of Herakleia), one is correctly rated a defeat (the Battle of Beneventum), and one is inexplicably rated a defeat despite being a victory (the Battle of Asculum), presumably due to Roman propaganda that Pyrrhos' army absorbed too many casualties - despite that, uh, not being the case. Pyrrhos held the field at Asculum and inflicted significantly higher casualties on the Roman army than his own army absorbed, yet Asculum often goes down as a "Pyrrhic victory" (read: defeat) thanks to Roman efforts to argue that Pyrrhos took so many casualties that he was unable to accomplish his goals. Even if this were true (it isn't), it's an odd time to suddenly decide that operational and strategic concerns matter in this alleged discussion of tactical prowess and only tactical prowess.
The author touts Moshe Dayan as one of the greater modern military leaders, and conversely points out that George S. Patton Jr. was not particularly highly rated. This makes a lot of sense when you look at the author's data, in which Patton is credited with "Operation Torch" (as a single battle) and "Battle of the Bulge" (as a single battle?!)...and that is it. Meanwhile, Moshe Dayan is credited with...the 1948, 1956, 1967, and 1973 wars. In their entirety. The data set sucks. Even allowing for the changing nature of warfare - Napoleon's era privileging tactical engagements defined as "battles", while modern "battles" are usually more like entire campaigns - this is a terrible list. The author suggests that modern generals are underrated in the model because of the changing nature of warfare and the decreasing participation of generals in individual battles. The changing nature of command in an era of million-man armies is certainly a relevant consideration, but not even remotely the whole picture. I would also point to the changing nature of battle itself (the transition to more or less constant fighting over much longer periods on much larger fronts) - and to the fact that the data set apparently thinks George Patton didn't command troops in Sicily, Normandy, Lorraine, or southern Germany. Frankly, a rating system for generals based on discrete, tactical-only engagements falls apart after about 1880 or so, and arguably earlier (the Overland Campaign, for example).
Anyway. The article. The concept isn't that bad, but even as a thought experiment it isn't very useful because of a highly flawed data set and the problematic way that "replacement level" is described. There aren't really many valid conclusions here that military historians haven't already reached through other means.
Usually, authors attempt to employ data science and statistics to add rigor to a subject in which rigor has been severely lacking. That's a laudable goal! I agree that some aspects of military history could use more rigor and closer attention to the numbers (although only the worst pop-historians ever try to avoid statistics entirely). However, this attempt generally applied less rigor than is now standard in the field. Even on its own terms, I can't help but think that it wasn't a very good attempt.