Benford's law

Discussion in 'Science & Technology' started by Masquerouge, May 30, 2007.

  1. Masquerouge

    Masquerouge Chieftain

    Joined:
    Jun 3, 2002
    Messages:
    17,790
    Location:
    Mountain View, CA
    So I recently learned about Benford's law.

    I suggest this:
    http://en.wikipedia.org/wiki/Benford's_law

    and this:
    http://www.math.gatech.edu/~hill/publications/cv.dir/1st-dig.pdf

    for a detailed explanation, but the general idea is that the first digits of random numbers from random sets are not uniformly distributed between 1 and 9. On the contrary,
    Leading digit    Probability
    1                30.1%
    2                17.6%
    3                12.5%
    4                9.7%
    5                7.9%
    6                6.7%
    7                5.8%
    8                5.1%
    9                4.6%

    This means that if, for instance, you collected all the numbers on the front pages of various newspapers, thus ending up with random numbers from random sets (lottery numbers, temperatures, casualties, etc.), then on average 30.1% of these numbers would start with a 1, 17.6% would start with a 2, and so on...

    The thing I do not understand is why? I can explain what the law is about, but I don't understand why it works. Could someone please explain it to me?
     
  2. History_Buff

    History_Buff Knight of Cydonia

    Joined:
    Aug 12, 2001
    Messages:
    6,529
    Location:
    Calgary, Alberta
    Because not all sets go as high as the nineties?
     
  3. Ayatollah So

    Ayatollah So the spoof'll set you free

    Joined:
    Feb 20, 2002
    Messages:
    4,387
    Location:
    SE Michigan
    I think it's because the size of the measuring unit is arbitrarily chosen. Like, say, a foot (based on the human foot) versus, I dunno, the lengths of various animals. You should expect a quasi-uniform distribution of the logarithms of the values, over a range of many orders of magnitude.

    Suppose the animals' lengths in feet were a uniform linear distribution, instead. That would just be really weird. It would mean that most animals would have to be within an order of magnitude of the size of a blue whale.

    This has always made intuitive sense to me because I think in terms of geometric progressions, orders of magnitude, etc. rather than linear progressions. But I'm not sure I can explain why it makes sense. I read something interesting by a philosopher on this, once, I'll see if I can dig that up.
     
  4. angeleyes

    angeleyes mood indigo

    Joined:
    Oct 12, 2005
    Messages:
    2,300
    Location:
    The Netherlands
    Our counting system starts with 1, so it seems obvious that 1 will be used more often than 2, and so on. For example, if the house numbers on your street run up to #325, then the first digits are distributed like this:

    1 - 111x (1, 10-19, 100-199)
    2 - 111x (2, 20-29, 200-299)
    3 - 37x (3, 30-39, 300-325)
    4 - 11x (4, 40-49)
    5 - 11x
    6 - 11x
    7 - 11x
    8 - 11x

    etc
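
The house-number count above can be checked by brute force. This is only a minimal sketch: the helper names `leading_digit` and `leading_digit_counts` are made up for illustration, and 325 is the street length used in the example.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_counts(last_house: int) -> dict[int, int]:
    """Count the first digits of the house numbers 1..last_house."""
    counts = Counter(leading_digit(n) for n in range(1, last_house + 1))
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Street from the example: houses numbered 1 to 325.
street = leading_digit_counts(325)
for digit, count in street.items():
    print(digit, count)
```

Changing 325 to another length shows how strongly the result depends on where the street happens to end.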
     
  5. Masquerouge

    Masquerouge Chieftain

    Joined:
    Jun 3, 2002
    Messages:
    17,790
    Location:
    Mountain View, CA
    Okay, I think that's what they're saying, and that's exactly what I don't get. Why should I expect a quasi-uniform distribution of the logarithms, and why does a quasi-uniform distribution of the logarithms of the values mean that I will end up with 30% of the numbers starting with 1, 17% starting with 2, etc.?



    I understand your example, but I'm not sure that's what the explanation is - and that's too bad, because this one I understand :)
     
  6. Erik Mesoy

    Erik Mesoy Core Tester / Intern

    Joined:
    Mar 25, 2002
    Messages:
    10,949
    Location:
    Oslo, Norway
    Pick some random numbers of arbitrary size. List the integers going up from 1 to each of those numbers. You'll get sequences looking something like this:

    *1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
    *1, 2, 3
    *1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
    *1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

    (I picked 4 random numbers up to 100)

    Amount of numbers produced that begin with 1: 30.
    Amount of numbers produced that begin with 2: 24.
    Amount of numbers produced that begin with 3: 18.
    et cetera...
    Amount of numbers produced that begin with 9: 3.

    If you pick from a more lopsided range, such as "up to 199", the pattern is stronger. If a lottery has 20000 tickets (numbered 1 to 20000), more than half of them have a number beginning with a 1.

    Put another way, the amount of numbers between 1 and N inclusive that begin with the digit "1" is always equal to or greater than the amount of numbers between 1 and N inclusive that begin with another digit.
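
The tallies above, and the claim that "1" never trails any other leading digit in 1..N, can be reproduced with a short script. This is a sketch under the assumption that the four runs are exactly the ones listed (upper bounds 33, 3, 47 and 15); the function names are made up for illustration.

```python
from collections import Counter

def first_digit_counts(upper_bounds):
    """Tally first digits over the runs 1..N for each N in upper_bounds."""
    counts = Counter()
    for n_max in upper_bounds:
        for n in range(1, n_max + 1):
            counts[int(str(n)[0])] += 1
    return counts

# The four upper bounds used in the listed sequences.
counts = first_digit_counts([33, 3, 47, 15])
print([counts[d] for d in range(1, 10)])

def one_never_trails(n_max):
    """Check, for every N up to n_max, that digit 1 leads at least as often
    as any other digit among the integers 1..N."""
    running = Counter()
    for n in range(1, n_max + 1):
        running[int(str(n)[0])] += 1
        if any(running[1] < running[d] for d in range(2, 10)):
            return False
    return True

print(one_never_trails(20000))

# Lottery example: tickets numbered 1..20000 that begin with a 1.
lottery_ones = first_digit_counts([20000])[1]
print(lottery_ones)
```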

    Does that help?
     
  7. Masquerouge

    Masquerouge Chieftain

    Joined:
    Jun 3, 2002
    Messages:
    17,790
    Location:
    Mountain View, CA
    Yes, it does help, and that was my intuitive explanation, but it doesn't seem to be what the papers are saying.
    For instance, Wiki says:
    I don't understand why real-world measurements are distributed logarithmically, and I don't understand what it means to be distributed logarithmically. Is that a fancy way of saying what you just explained?

    And here I'm completely lost. I don't understand why a logarithmic distribution is scale invariant, and why being scale invariant means that the first digit will be 1 30% of the time.
     
  8. Erik Mesoy

    Erik Mesoy Core Tester / Intern

    Joined:
    Mar 25, 2002
    Messages:
    10,949
    Location:
    Oslo, Norway
    AFAIK, yes. The "why" of being distributed logarithmically is "because they're distributed in the above way, and that's logarithmic".



    Saying that a logarithmic distribution is scale invariant is a bit like saying that the slope of a pyramid is invariant under translation, but not under rotation. A pyramid's face keeps the same angle if you move up or down it; a logarithmic distribution keeps the same pattern if you multiply every value by some factor.
    The 30% figure (and the others) come from the following pattern, using base-10 logarithms:
    Log 2 = 0.301029996 (the share for leading digit 1, since Log 1 = 0)
    Log 3 = 0.477121255
    0.477121255 - 0.301029996 = 0.176091259 (the share for leading digit 2)
    Etc...
    Log 9 = 0.954242509
    1 - 0.954242509 = 0.045757491 (the share for leading digit 9, since Log 10 = 1)
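
The subtraction pattern above generalises to P(d) = log10(d + 1) - log10(d). The link to "distributed logarithmically" is this: if the fractional part of log10(x) is spread evenly over [0, 1), then x starts with digit d exactly when that fractional part lands between log10(d) and log10(d + 1), and the width of that interval is the probability. A minimal sketch, with `benford_probability` as a made-up helper name:

```python
import math

def benford_probability(d: int) -> float:
    """P(first digit = d) = log10(d + 1) - log10(d), for d in 1..9."""
    return math.log10(d + 1) - math.log10(d)

table = {d: benford_probability(d) for d in range(1, 10)}
for d, p in table.items():
    print(f"{d}: {p:.1%}")

# The nine intervals tile [0, 1), so the probabilities sum to 1.
print(sum(table.values()))
```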
     
  9. Masquerouge

    Masquerouge Chieftain

    Joined:
    Jun 3, 2002
    Messages:
    17,790
    Location:
    Mountain View, CA
    Thanks man :) I was frustrated because I understood very well how it worked, but not why - and when you explain it to someone else it helps :)



    :eek: that's awesome! Thanks again! :)
     
  10. Ayatollah So

    Ayatollah So the spoof'll set you free

    Joined:
    Feb 20, 2002
    Messages:
    4,387
    Location:
    SE Michigan
    Funny, that explanation you quoted sounds a lot like what that philosopher I mentioned was saying.

    I think Erik answered the 2nd half of your question. To answer the first, take several large arrays of numbers and play around with them - multiply each by a constant.
    Here are some examples of uniform logarithmic distributions:
    1 2 4 8 16 32 64 ....
    1 3 9 27 81 243 729 ...
    And an example of a uniform linear distribution:
    1 2 3 4 5 6 ...
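
The "multiply each by a constant" experiment can be sketched as follows. The sequence lengths (1000 terms) and the factor 7 are arbitrary choices for illustration, not anything from the thread; integers are used throughout so the first digit can be read straight off the decimal string.

```python
from collections import Counter

def first_digit_shares(numbers):
    """Share of each leading digit 1..9 among positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return [counts[d] / total for d in range(1, 10)]

geometric = [2 ** k for k in range(1000)]   # 1, 2, 4, 8, ...
linear = list(range(1, 1001))               # 1, 2, 3, 4, ...

results = {}
for name, seq in (("geometric", geometric), ("linear", linear)):
    for factor in (1, 7):
        shares = first_digit_shares([n * factor for n in seq])
        results[(name, factor)] = shares
        print(name, factor, [round(s, 3) for s in shares])
```

In this run the geometric sequence should keep roughly the same first-digit shares after multiplication, while the linear sequence's shares shift noticeably. That is the scale invariance being discussed.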
     
  11. macko

    macko Chieftain

    Joined:
    Mar 17, 2009
    Messages:
    1
    But if we switch from decimal to binary numbers, every binary number will start with 1! Thus, the first digit is 1 in 100% of cases!
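
This is consistent with the usual generalisation of Benford's law to other bases, where the first digit d in base b has probability log_b(1 + 1/d). In base 2 the only possible leading digit is 1, and the formula gives log_2(2) = 1. A minimal sketch, with `benford_in_base` as a made-up helper name:

```python
import math

def benford_in_base(base: int) -> dict[int, float]:
    """First-digit probabilities log_base(1 + 1/d) for d = 1..base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

for base in (2, 3, 10, 16):
    probs = benford_in_base(base)
    print(base, {d: round(p, 3) for d, p in probs.items()})
```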
     
  12. Mise

    Mise isle of lucy

    Joined:
    Apr 13, 2004
    Messages:
    28,319
    Location:
    London, UK
    The applications in Accounting fraud are really amazing! :wow: I shall have to find some way of using this to my advantage, perhaps at work :hmm:

    EDIT: Incidentally, I think the key to the "why" is in this paragraph:
    Having read [3] and skimmed [8] (it speaks in maths which I can't really understand), the above results in the logarithmic distribution of digits (the proof of this is [8]).
     
  13. illram

    illram Chieftain Super Moderator

    Joined:
    Dec 25, 2005
    Messages:
    9,217
    Location:
    San Francisco
    So can I use this to up my chances of winning the lottery?
     
  14. ParadigmShifter

    ParadigmShifter Random Nonsense Generator

    Joined:
    Apr 4, 2007
    Messages:
    21,810
    Location:
    Liverpool, home of Everton FC
    Yeah, logarithms answer the question.

    EDIT: That was directed to the OP rather than illram. Don't buy a lottery ticket :lol:
     
  15. Birdjaguar

    Birdjaguar Entangled Retired Moderator Supporter

    Joined:
    Dec 24, 2001
    Messages:
    29,753
    Location:
    Albuquerque, NM
    How do you see such an application?
     
  16. warpus

    warpus In pork I trust

    Joined:
    Aug 28, 2005
    Messages:
    44,572
    Location:
    Stamford Bridge
    The distribution is logarithmic because

    The probability that a number is between 100 and 1000 (logarithm between 2 and 3)
    = The probability that a number is between 10,000 and 100,000 (logarithm between 4 and 5)

    Why?

    Well, this is obviously not always true, but for many sets of numbers it is a reasonable assumption, especially for sets of numbers that grow exponentially, like incomes, and stock prices, and sets of numbers we encounter in daily life.

    Why?

    Because the systems we use to measure things are arbitrary. Take the distribution of all incomes, of all people who live in the U.S. You're going to get a whole bunch of things at the bottom ($0 - $10,000), then a smaller amount of things a bit higher up ($10,000 - $20,000), then an even smaller amount of things a bit higher ($20,000 - $30,000), and so on, and so on.

    But wait! What if you expressed all these incomes in Zimbabwean dollars? or Polish Zloty? Or Euros? Or yen? Well, you'd get the exact same type of distribution.

    I realize that it's not totally obvious why that makes things logarithmic, but that's how it makes sense to me.
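
Here is a rough simulation of that idea. It assumes, purely for illustration, that the logarithm of each value is uniform over exactly four decades (100 to 1,000,000); real incomes are not distributed like that, and the exchange rate 4.3 is invented.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: log10 of each value is uniform on [2, 6), i.e. 100 .. 1,000,000.
incomes = [10 ** random.uniform(2, 6) for _ in range(100_000)]

def share_between(values, low, high):
    """Fraction of values with low <= v < high."""
    return sum(low <= v < high for v in values) / len(values)

def first_digit_shares(values):
    """Share of each leading digit 1..9, read from scientific notation."""
    counts = Counter(int(f"{v:.15e}"[0]) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

# Equal probability per order of magnitude.
low_decade = share_between(incomes, 100, 1_000)
high_decade = share_between(incomes, 10_000, 100_000)
print(low_decade, high_decade)

# Re-express every value in another unit (made-up exchange rate).
converted = [v * 4.3 for v in incomes]
original_shares = first_digit_shares(incomes)
converted_shares = first_digit_shares(converted)
print([round(s, 3) for s in original_shares])
print([round(s, 3) for s in converted_shares])
```

Under that assumption the two decades should hold about the same share of values, and the first-digit shares should look much the same before and after the unit change.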
     
  17. Knight-Dragon

    Knight-Dragon Unhidden Dragon Retired Moderator

    Joined:
    Jun 25, 2001
    Messages:
    19,958
    Location:
    Singapore
    Moderator Action: Moved to S/T.
     
  18. ainwood

    ainwood Consultant. Administrator

    Joined:
    Oct 5, 2001
    Messages:
    29,973
    It is used in forensic auditing. For example, if someone is making up false invoices and then having the company pay them: people would tend to (say) make lots of fake invoices that are small enough not to arouse suspicion (e.g. less than $1000) but large enough to make it worthwhile (i.e. go for several hundred rather than one hundred).

    The "real" invoices will likely follow Benford's law, while the fake ones will distort it, because they are not 'random' (or even pseudo-random).
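
A toy version of such a check might compare the observed first digits against Benford's law with a chi-square statistic. Everything below is synthetic and invented for illustration: the "genuine" amounts are assumed to be log-uniform between 10 and 100,000, and the "fabricated" ones are drawn uniformly between 300 and 999 to mimic the behaviour described above. Real audits are considerably more careful than this.

```python
import math
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def benford_chi_square(amounts):
    """Chi-square statistic of observed first digits against Benford's law."""
    counts = Counter(int(f"{a:.15e}"[0]) for a in amounts)
    n = len(amounts)
    return sum((counts[d] - n * p) ** 2 / (n * p)
               for d, p in zip(range(1, 10), BENFORD))

# Synthetic data, not real invoices.
genuine = [10 ** random.uniform(1, 5) for _ in range(2_000)]
# Hypothetical fraudster: several hundred each, always under 1000.
fabricated = [random.uniform(300, 999) for _ in range(200)]

genuine_score = benford_chi_square(genuine)
fabricated_score = benford_chi_square(fabricated)
print(round(genuine_score, 1), round(fabricated_score, 1))
```

A larger statistic means a worse fit to Benford's law, so the fabricated batch should stand out clearly from the genuine one.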
     
  19. Mise

    Mise isle of lucy

    Joined:
    Apr 13, 2004
    Messages:
    28,319
    Location:
    London, UK
    For me, the interesting part isn't that "exponential things" have first digits distributed logarithmically (that's quite obvious when you're told it!), or that lots of everyday things are exponential. The interesting part, for me, is that, when you take a random number from a random distribution -- even ones that don't obey Benford's law, such as uniform distributions or normal distributions -- you end up with a distribution that obeys Benford's law. I find that quite incredible.

    EDIT: It's unfortunate that the wiki only dedicates a single paragraph to this fact, and spends much more time explaining the "exponential" and the "measurement" things, neither of which explains how taking disparate numbers from different newspapers will result in a Benford-distributed set of first digits.

    I can't follow the proof (not being well versed in Statistics, or even Maths anymore), so if anyone has a more "intuitive" description of the proof, I'd love to hear it!

    @Birdjaguar: Do you mean at my work or in Accounting fraud?
     
  20. warpus

    warpus In pork I trust

    Joined:
    Aug 28, 2005
    Messages:
    44,572
    Location:
    Stamford Bridge
    Mise, once you understand that the probabilities have a logarithmic distribution and why, what else is there to understand?
     
