7th January 2013
I was asked by a colleague of mine how many combinations are possible for the German tax id. The German tax id is built entirely from eleven digits. Its specification is given in Pruefziffernberechnung, see Steueridentifikationsnummer in the German Wikipedia. Every person reported to live in Germany is assigned one tax id, even babies and children. 20 years after the death of a person the number can be recycled.
The rules for constructing a tax id are as follows:
For example, 4895437120 and 5549267083 are valid tax ids, ignoring the 11th check digit. The first example tax id has 4 as repeating digit, the second has 5 as repeating digit. All other digits do not repeat.
Hint: As we have ten digits altogether, one of them being the repeating digit, so in effect we only have nine remaining digits to choose one of them to be the repeating digit. For example, 9x87054321, nine digits, excluding digit 6, we have nine digits to choose from, e.g., digit x=2. So, position 2 and position 9 now have both digit 2. For 98x7654321, we have only 8 possibilities, not nine, to choose our repeating digit, i.e., we can choose either the first digit (here 9), the fourth (here 7), the fifth (here 6), and so on. We cannot choose the second one (here 8), as this repeating digit was already covered in the case 9x87054321.
To deduce the number of possible combinations nine different cases have to be considered, depending on the repeating digit. First case:
Add to this the second case:
and so on up to the ninth case:
This is
That number is strikingly low, compared to the roughly 82 million population in Germany, see German population.
The chance of getting a valid German tax id (without check digit) out of a complete random number (all digits with equal probability) is therefore \(146,966,400 / 10^{10}\), i.e., 1.47%.
As the derivation of the number of combinations might be a little entangled, it is good to know, whether there is an alternative way to prove that the number above is indeed correct, or at least plausible. For this I have written a short simulation which calculates the probability that a number is indeed a German tax id, see taxidsim.c. The output looks something like this:
~/c/taxidsim -tn100000
...
*9 2 6 7 8 1 5 0 8 4 / 1110111121
*3 5 2 7 0 1 6 9 8 5 / 1111021111
*1 9 5 8 5 7 4 2 3 0 / 1111120111
*9 3 8 0 6 7 2 1 5 7 / 1111011211
Digit Occur. Perc.
0 100488 10.048800
1 100456 10.045600
2 100817 10.081700
3 100026 10.002600
4 99938 9.993800
5 99432 9.943200
6 99475 9.947500
7 99826 9.982600
8 99690 9.969000
9 99852 9.985200
tax ids: 1453 1.453000
Enlarging n on the commandline gives 1.470%, which is quite near to our theoretical value.
taxidsim.c contains a number of unrolled loops, see, e.g., Dongarra/Hinds (1979), thus helping Out-of-order execution. The trick to detect tax ids is, first to count the number of occurences of x[i], i=0(1)9 in r[i], i=0(1)9, and then to count occurences in r[] in rs[]. The test is then
x[0] != 0 && rs[0] == 1 && rs[1] == 8 && rs[2] == 1
i.e., first digit is non-zero, the number of no-occurrences is one, rs[0]=1, eight digits occur exactly one time, the double-occurence is 1, rs[2]=1.
Added 15-Jul-2018: Klaus Koch, via e-mail, pointed out the following method to get the number of combinations.
Five years later: Yes, I can confirm your results. But you can get it much easier: If you drop the restriction non-zero for the first digit, you get 90*10!/2=163.296.000.
(If you google for 163296000 you will get some hits.)
Multiply by 0.9 and you get your number.
Added 25-Jul-2018: Klaus Koch, via e-mail, added the following information:
Wie Sie vielleicht auch schon bemerkt haben, gibt es zwar viele, die über datenschutzrechliche Aspekte der Steuer-ID, aber offenbar nur äußerst wenige, die über die Anzahl möglicher Steuer-IDs nachdenken und sich dazu im Netz äußern. In der Tat teile ich Ihre Überraschung darüber, dass die Anzahl gültiger IDs so erstaunlich gering ist. ... in 2016 [wurde] beschlossen, künftig auch drei gleiche Ziffern zuzulassen, Wikipedia: https://de.wikipedia.org/wiki/Steuerliche_Identifikationsnummer.
Nachdem ich das gelesen hatte, mußte ich natürlich neu rechnen (lassen). Microsoft Excel hat die Binomialkoeffizienten, die jeder unter dem Namen n über k kennt, in der Funktion Kombinationen(n;k) versteckt, also =FAKULTÄT(10)*KOMBINATIONEN(10;A3)/FAKULTÄT(A3-1) wobei in A3 die Zahl gleicher Ziffern steht. Hier die ganze Tabelle:
gleiche Anzahl Ziffern Möglichkeiten 2 163296000 (ab 2007) 3 217728000 (ab 2016) 4 127008000 5 38102400 6 6350400 7 604800 8 32400 9 900 10 10
In der [ersten] Zeile werden Lösungen betrachtet, bei denen eine Ziffer gar nicht vorkommt, in der [zweiten] Zeile werden Lösungen betrachtet, bei denen zwei Ziffern gar nicht vorkommen, u.s.w. ... [Ich] habe meinen PC nachzählen lassen, er brauchte nur 10 Minuten für alle 10-stelligen Zahlen... Entwarnung: er kam zum gleichen Ergebnis. Wegen des Ausschlusses einer führenden Null müssen alle Resultate noch mit 0,9 multipliziert werden. Durch die Änderung in 2016 erhöht sich der Pool beträchtlich.
@Klaus Koch: Thanks for these thoughts and calculations!
Categories: C / C++, programming, tax, mathematics
Tags: , , , , ,
Author: Elmar Klausmeier