Only 17% of all 64-bit Integers are products of two 32-bit integers

> I find it interesting to consider that if you pick a value at random, it will usually fail! That is, most 64-bit integers cannot be written as the product of two 32-bit integers.

While I find the 17% number interesting to think about, "most" is far less interesting. Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63. That's a hair's breadth away from "most" already, and considering even a tiny amount of overlapping results gets you there.

What gets interesting is actually trying to quantify the overlapping results.

> You might be able to come up with a more efficient algorithm.

Challenge accepted. Suppose we want to know the answer to 3 decimal places (so we'd match the headline). And suppose I allow my algorithm to be wrong one in a thousand times ("probably approximately correct").

Then sample some constant number C of random 64-bit integers. Run the following algorithm, which sorts each random sample into one of three classes: Y (has 32-bit factors), N (does not have 32-bit factors), U (unknown).

Check if prime using probabilistic Miller-Rabin. (The error probability goes to zero exponentially fast.) If prime, return N. If it's not prime, then run T steps of Pollard's rho to determine whether the number has 32-bit factors; return Y, N, or U depending on the factors found up to step T.

The key observation is that T can be chosen to make the UNKNOWN class very small (with high probability), and so our estimate should rapidly converge to 17% Y, 83% N, ~0.001% U

For fixed error tolerance, this would run in roughly a constant number of iterations, independent of N.
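A rough sketch of that sampling idea, in case it helps. Everything here is my own guess at the details: the names `is_probable_prime`, `rho_factor`, `classify`, and `estimate`, the step budget of 200,000, and the choice to count small primes and 0 as Y (since p = 1 × p and 0 = 0 × 0) are assumptions, not something the comment specifies.

```python
import math
import random

LIMIT = 1 << 32  # both factors must be strictly below 2**32


def is_probable_prime(n, rounds=20):
    """Miller-Rabin test; a composite passes with probability <= 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # this base witnesses that n is composite
    return True


def rho_factor(n, budget):
    """Pollard's rho: a nontrivial factor of composite n, or None after `budget` steps."""
    if n % 2 == 0:
        return 2
    steps = 0
    while steps < budget:
        c = random.randrange(1, n)
        x = y = random.randrange(n)
        d = 1
        while d == 1 and steps < budget:
            x = (x * x + c) % n      # tortoise: one step
            y = (y * y + c) % n      # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(x - y, n)
            steps += 1
        if 1 < d < n:
            return d                 # d == n means a bad cycle: retry with a new c
    return None


def classify(n, budget=200_000):
    """Return 'Y', 'N', or 'U' (factoring budget exhausted)."""
    if n == 0:
        return "Y"  # 0 = 0 * 0
    primes, stack = [], [n]
    while stack:
        m = stack.pop()
        if m == 1:
            continue
        if is_probable_prime(m):
            if m >= LIMIT:
                return "N"  # a prime factor too big for either side
            primes.append(m)
            continue
        d = rho_factor(m, budget)
        if d is None:
            return "U"
        stack += [d, m // d]
    # Build every divisor below 2**32; the largest one gives the best split.
    divisors = {1}
    for p in primes:
        divisors |= {d * p for d in divisors if d * p < LIMIT}
    return "Y" if n // max(divisors) < LIMIT else "N"


def estimate(samples, budget=200_000):
    """Fraction of random 64-bit samples that land in each class."""
    counts = {"Y": 0, "N": 0, "U": 0}
    for _ in range(samples):
        counts[classify(random.getrandbits(64), budget)] += 1
    return {k: v / samples for k, v in counts.items()}
```

Calling `estimate(2000)` should give a Y fraction somewhere near the headline figure, with sampling noise of about a percentage point; pinning down three decimal places needs far more samples than that.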

I dream of a future where all 64-bit integers are products of 32-bit integers. Together, we can change math for the better.

There are about 4 billion 64 bit integers for each 32 bit integer.

The chance of a random 64 bit integer being a 32 bit integer is 0.0000000233 %

The chance of a random 64 bit integer being a product of two 32 bit integers is 17%

Nice

There is a cute argument (I think it is due to Erdős) that, asymptotically, 0% of the integers in [0,n^2] appear in the "n by n multiplication table":

By Erdős-Kac, almost all integers of size about n^2 have about log(log(n^2)) ~ log(log(n)) prime factors. However, almost all integers in the multiplication table have about 2*log(log(n)) prime factors.
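To get a feel for the gap at a tiny scale, here is a small sketch (my own illustration, not part of the original argument) that compares the average number of prime factors, counted with multiplicity, of all integers below 2^16 against the average for products a × b with a, b below 2^8. The helper `big_omega_table` and the choice of 8 bits are assumptions made for the demo, and the second average is taken over pairs (a, b), not over distinct products.

```python
def big_omega_table(limit):
    """omega[n] = number of prime factors of n, counted with multiplicity."""
    omega = [0] * limit
    for p in range(2, limit):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            power = p
            while power < limit:
                for multiple in range(power, limit, power):
                    omega[multiple] += 1  # one more factor of p
                power *= p
    return omega


BITS = 8
n = 1 << BITS
omega = big_omega_table(n * n)

# Average over every integer in [1, n^2).
mean_all = sum(omega[1:]) / (n * n - 1)

# A product a*b has omega(a) + omega(b) prime factors, so the average over
# pairs (a, b) in [1, n) x [1, n) is twice the average over [1, n).
mean_pairs = 2 * sum(omega[1:n]) / (n - 1)

print(f"typical integer below n^2: {mean_all:.2f} prime factors")
print(f"typical table entry a*b:   {mean_pairs:.2f} prime factors")
```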

Kevin Ford gets much more precise asymptotic estimates.

This just seems like an expansion of prime numbers to include factors in the 2^33+ range. Basically you're calculating if a number is prime but stopping the check when the factors go above 2^32.

> the proportion of all 2n-bit values that can be generated by the product of two n-bit values goes to zero as n becomes large. This means that if you have, say, 10000000-bit integers multiplying 10000000-bit integers, you’d expect relatively few 100000000000000-bit integers to be produced.

That should be "relatively few 20000000-bit integers", right?

This feels like an underlying property that contributes to Benford's Law[0]. That is, most numbers we measure and record are the results of various independent (addition) and dependent (multiplication) factors stacking together, and we observe this property in the distribution of them.

[0]: https://en.wikipedia.org/wiki/Benford%27s_law

This is something I had thought about some time back where I was thinking about the feasibility of somehow using the upper and lower registers inside a multiplier as general purpose storage for fun / seeing if you could make them more compact.

Anyway here is a fun pattern you get when you multiply 8 bit unsigned integers. Not all pairs of (upper bits, lower bits) are reachable, and it has a lot of distinct patterns.

https://i.imgur.com/Gb3HDR0.png

(Should I host the image on GitHub Gists so it doesn't vanish?)
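In case the image does vanish, a pattern of this kind can be regenerated in a few lines. This is only a sketch of how I would do it: the functions `reachable_halves` and `render` and the text rendering with `#` and `.` are my own choices, not how the linked picture was made.

```python
def reachable_halves(bits):
    """(high, low) halves of every full product of two `bits`-bit unsigned integers."""
    mask = (1 << bits) - 1
    size = 1 << bits
    return {((a * b) >> bits, (a * b) & mask)
            for a in range(size) for b in range(size)}


def render(bits):
    """Text grid: row = high half, column = low half, '#' = reachable."""
    pairs = reachable_halves(bits)
    size = 1 << bits
    return "\n".join(
        "".join("#" if (hi, lo) in pairs else "." for lo in range(size))
        for hi in range(size)
    )


# 4-bit inputs give a 16 x 16 grid that fits in a terminal;
# render(8) produces the full 256 x 256 pattern.
print(render(4))
```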

I must be missing something. Aren’t ~50% of 64-bit integers the product of the number 2 and another 32-bit integer?

Does it actually matter for hash uniformity, though?

Well, that is entirely not surprising. Pretty sure people writing not terrible hash functions figured it out decades ago

So you're better off using a 8x8->16 widening multiplication SIMD instruction or even just a multi register TBL/TBX instruction?

If this seems counterintuitive, consider that only about a third of the two-digit numbers ({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 24, 25, 27, 28, 30, 32, 35, 36, 40, 42, 45, 48, 49, 54, 56, 63, 64, 72, 81}) can be written as the product of two one-digit numbers.
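The list is easy to reproduce. A minimal check (the name `distinct_products` is just for illustration):

```python
def distinct_products(limit):
    """Sorted distinct values of a * b for 0 <= a, b < limit."""
    return sorted({a * b for a in range(limit) for b in range(limit)})


products = distinct_products(10)
print(products)
print(len(products), "of the 100 values from 0 to 99 are reachable")
```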

Ok so I think I understand your insight: the number of 64 bit numbers you can get from multiplying two 32 bit numbers is the number of distinct results. I guess it follows that, of those 64 bit integers that can be written as the product of 2 32 bit ints, on average they can be factored into 32 bit ints 6 different ways. The only ones that could possibly be written as such a product exactly one way are the perfect squares.

Yeah the number sounds a lot less impressive if you say that you only get 2^61.44 integers out of 2^64. In other words, a 4% entropy loss.

Information quantities are more meaningfully expressed in number of bits.
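For anyone who wants to redo the arithmetic, here is a quick sketch using the exact count quoted in the article (3,215,709,724,700,470,902). The variable names are mine. With the exact count rather than a rounded 17%, the exponent comes out slightly above 61.44.

```python
import math

COUNT = 3_215_709_724_700_470_902  # distinct products, as quoted in the article
TOTAL = 1 << 64                    # number of 64-bit values (and of ordered 32-bit pairs)

fraction = COUNT / TOTAL
bits = math.log2(COUNT)            # reachable values expressed as a power of two
loss_bits = 64 - bits
pairs_per_product = TOTAL / COUNT  # average number of ordered pairs per reachable value

print(f"fraction reachable:   {fraction:.4f}")
print(f"reachable values:     2^{bits:.2f}")
print(f"entropy loss:         {loss_bits:.2f} bits ({loss_bits / 64:.1%} of 64)")
print(f"ordered pairs/value:  {pairs_per_product:.2f}")
```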

All the primes above 2^32 are out, but that accounts for only two point something percent.

> Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63.

Not sure I understand.

Adding two 32 bit integers takes you to 33 bit integers. (1111 + 1111 = 11110).

Addition doesn't care about order, so you're instantly cutting 2^33 possibilities down to 2^32. Or so is your argument. But in reality you can reach nearly all of those 2^33 numbers.

Why does order matter?

Whether a 64-bit number can be written as the product of two 32-bit ones depends only on the prime factors of the 64-bit number - it's a property of the number itself, and apparently 17% of 64-bit numbers have this property.

A lot of the remaining is multiples of 4, which you can either get from having a 2 in both factors or a 4 in one (multiples of 9 are similar).

... or just considering the even numbers almost all of them are 2 x N where N>2^32 and that gets you to within a hair of "most" and if you add in the odd thirds for which the same is true you get a bound of 2/3 - epsilon.

> While I find the 17% number interesting to think about, "most" is far less interesting. Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63. That's a hair's breadth away from "most" already

It's much worse than that. It's difficult for a 64-bit product to have the high bit set if the multiplicands are both no larger than 32 bits.

Having a prime factor greater than 2^32 accounts for about 80% of the 64-bit integers that can’t be expressed as a product of 32-bit integers. But it’s not the only way; you can also have three prime factors in the range (2^16, 2^32), for instance.

Well, technically yes, but 'stopping the factors at 32 bits' is a plenty interesting constraint because it excludes all 64 bit composite numbers that have at least one factor above 2^32.

You have to redo the math to make the constraint work.

Even if you have 32-bit factors the number may not be the product of two 32-bit numbers. For example 2^62*3 cannot be split as either (2^32, 2^30*3) or (2^31, 2^31*3). In both cases one factor does not fit in 32 bits.
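A small sketch to confirm this example. The helper `splits` is hypothetical and takes the prime factorization as a dict, so it only works for numbers whose factorization you already know.

```python
LIMIT = 1 << 32


def splits(n, prime_powers):
    """All (d, n // d) with d <= n // d and both below 2**32.

    prime_powers maps each prime factor of n to its exponent.
    """
    divisors = [1]
    for p, e in prime_powers.items():
        divisors = [d * p**i for d in divisors for i in range(e + 1)]
    return [(d, n // d) for d in divisors if d <= n // d < LIMIT]


# 3 * 2^62: every prime factor is tiny, yet no split fits in 32 + 32 bits.
print(splits(3 * 2**62, {2: 62, 3: 1}))

# One fewer factor of 2 and a split appears: 2^31 times 3 * 2^30.
print(splits(3 * 2**61, {2: 61, 3: 1}))
```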

Indeed, but justice requires that we recursively continue all the way to the base case, until all 32-bit integers are products of 16-bit integers, all 16-bit integers are products of 8-bit integers, all 8-bit integers are products of 4-bit integers, all 4-bit integers are products of 2-bit integers, and all 2-bit integers are products of 1-bit integers. Only when we have reached all the way down that list to the very, very smallest of the numbers around us and brought justice to them will the future be able to arrive. I literally cannot wait for that day.

Cryptographers hate this trick

Why stop there? We can dream of a future where math is bent to our will [0] for the betterment of all mankind!

0: https://en.wikipedia.org/wiki/Indiana_pi_bill

That would require multiplication to be non-commutative, right?

1 + 1 = 3 (for sufficiently large values of 1)

Maybe we can reach there by using integers as fixed point decimals?

There should be a law!

I upvoted you, not because I think your joke is particularly great, but I hate that HN has this tendency to downvote comments that are clearly meant as a humorous contribution. And I get it, no-one wants HN to turn into Reddit. I also understand that not every joke lands. But I just think it's unnecessary to downvote, you could simply ignore.

They address this argument in the blog.

Going from 32 bits to 64 bits doesn't double the range (that would be adding 1 bit), it squares the range.

I don’t think so, because that only gets you up to 2x2^32, which is nowhere near halfway to 2^64

No. 50% of them are the product of 2 and a 63-bit integer.

There are about 18.446 quintillion more 64-bit integers than 32-bit integers.

The chance of a random 64-bit integer matching some pair of 32-bit integers is 100%, though.

Or, the odds of a random 64-bit integer being a 32-bit integer are the same as you or me guessing a random 32 bit integer.

Wonder what the limit is as you add more 32 bit integers to the product. Just the primes over 32 bit?

Perhaps it's binary.

where is the graph and the theorem for integers of n bits, with n going to infinity?

The input space is 32 + 32 = 64 bits. The output space is 64 bits. So the best you can do is a 1-to-1 mapping.

However, since a * b = b * a, our input space has a lot of duplicate outputs. So from this alone you can conclude roughly half of the output space must be uncovered by any input pair, simply because there aren't enough input pairs.

But also all of their multiples. I suspect that those account for the vast majority.

Concatenating arbitrary 32 bit ints covers all possible 64 bit ints. So the space of all pairs of 32 bit ints is in bijection with 64 bit ints.

Commutativity introduces a relation on pairs of 32 bit ints (a,b) ~ (b,a), which accounts for one bit of information. Thus, at most 50% of 64bit ints show up as products of 32 bit ints.

The 2^64 in the GP's argument comes from the number of pairs of 32 bit numbers, not from the upper bound of multiplying two 32 bit numbers. So for the addition case the symmetry argument is still only good enough to get you down to about 2^63, which doesn't help you at all because you have much stronger information from the upper bound.

Addition in this case is cutting from 2^64 to 2^33-1.

The 2^64 number is the number of inputs. For an operation which is commutative, you expect at most 2^63+2^31 distinct outputs, since you’ve introduced symmetry.

It's a bit more subtle than that -- most n>2^32 are not prime in which case 2 x n has more factorizations you would have to check.

(Just by way of example, for n=2^33, 2n=2^34 but also =2^17*2^17)

More precisely one in the range (a,2^32) and two in the range (2^32/a, 2^32). But if the latter have many duplicate prime factors it's worse.

Enough of this divided binary world, we are all one

It helps if you take the limit of 1 going towards 1.5.

Most 1s won't go towards 1.5, but sometimes you're lucky.

I thought you were making a joke but if we're assuming that the 1's are being rounded or truncated before the final value cake is produced I guess you are right.

The math was linked in the article

https://arxiv.org/pdf/1908.04251

OK - thanks. I must have misunderstood what the other poster was saying, since I thought they were objecting to the "most" characterization.

Each x is prime with probability 1/ln(x), each x has M/x multiples less than M, as a fraction of M that is just 1/x. Together that makes 1/(x ln(x)) with the indefinite integral ln(ln(x)). If we plug in 2^32 and 2^64, we get ln(2). So about 69.3 % of all 64 bit integers should have a prime factor larger than 2^32 and therefore not be the product of two 32 bit integers.

Ah, fair enough, thanks everyone. So basically the argument is that if we have a deterministic function taking a pair (x_1, x_2) with x_i in X with |X| = M, then the function can produce at most M^2 outputs. And knowing that the function is symmetric cuts it down to M(M+1)/2. (Which is still far bigger than the 2M in my addition analogy.) Cheers.
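A tiny sketch of those counts, brute-forced at 6 bits so it runs instantly (the helper `distinct_outputs` and the 6-bit size are my choices for the demo). It also checks what M(M+1)/2 works out to for M = 2^32.

```python
import operator


def distinct_outputs(op, bits):
    """Number of distinct values op(a, b) takes over all pairs of `bits`-bit inputs."""
    size = 1 << bits
    return len({op(a, b) for a in range(size) for b in range(size)})


bits = 6
M = 1 << bits
symmetric_bound = M * (M + 1) // 2  # unordered pairs, including a == b

sums = distinct_outputs(operator.add, bits)
products = distinct_outputs(operator.mul, bits)

print(f"M^2 = {M * M}, symmetric bound = {symmetric_bound}")
print(f"distinct sums     = {sums} (range bound 2M - 1 = {2 * M - 1})")
print(f"distinct products = {products}")

# For M = 2^32 the symmetric bound is 2^63 + 2^31.
M32 = 1 << 32
print(M32 * (M32 + 1) // 2 == 2**63 + 2**31)
```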

Except the perfect squares don't reduce by half, so it's not quite 50% but it's very close.

"Ignore" is one of those things that sounds like it's a neutral choice but really isn't in practice - it's still just saying "can only ever be positively pressured". IMO people shouldn't go as far as flag though, at the very least, and if it's already at the bottom of the sort there is no sense dumping on it further.

My current comment itself, for instance, also doesn't really add anything to the discussion about the article, and I'd have no expectation that people keep it from going negative. Maybe they will, maybe they won't, but there is no reason to expect they should on principle just because I love tangents :D.

True, but there are as many 64-bit integers as pairs of 32-bit integers.

Therefore the fact that relatively few 64-bit numbers are products of 32-bit integers means that a lot of pairs of 32-bit integers give by multiplication the same product.

I think they meant to write "There are about 4 billion TIMES more 64 bit integers than 32 bit integers".

There are about 2^64 more 64-bit integers than 32-bit integers.

If you're allowed to multiply as many 32-bit numbers as you want, the only numbers you won't be able to achieve by so doing are those with any prime factor larger than 2^32.

This is more than just the prime numbers. For example, a 41-bit prime can be multiplied by 16 and it will still fit into 64 bits.

That seems intuitively true given that most 32-bit numbers are composite, so if you have

X = ab and aY < 2^32 and bY < 2^32:

X × Y = X/a × aY = X/b × bY = Y × X = aY × X/a = bY × X/b

Which is 6 pairs resulting in the same product. This will be reduced if e.g. aY = X, but still...

Indeed, edited the mistake

What are you assuming about overflow? Three 32-bit numbers multiply out to 96 bits.

In software programming, the product between two integers is often computed to a fixed number of bits with overflow. Consider 8-bit integers. If you multiply 127 by 127, you get back the number 1 as an 8-bit unsigned integer, with an overflow. The actual full product is 16129. To represent 16129, you typically use 16 bits of precision.
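Python integers do not overflow, so the truncation has to be written out by hand. A minimal sketch of the 8-bit example above, where `truncated_product` is a hypothetical helper that keeps only the low bits:

```python
def truncated_product(a, b, bits=8):
    """Low `bits` bits of a * b, as fixed-width unsigned multiplication would return."""
    return (a * b) & ((1 << bits) - 1)


full = 127 * 127
print(full)                         # the full product
print(truncated_product(127, 127))  # what survives in 8 bits
print(full.bit_length())            # bits needed to hold the full product
```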

Thus we have the notion of the full product. The full product of two 32-bit integers is typically represented using 64 bits. The question that preoccupied me is what fraction of all 64-bit integers can be written as the product of two 32-bit integers.

You might wonder why you would care?

We often design hash functions: they are special functions that take an input and generate a random-looking output. Several years ago I designed a very fast hash function called clhash. It is a super-fast hash function for strings having a few hundred bytes or more. If you don’t know about clhash, check it out. It is interesting in its own right.

This clhash hash function uses a type of multiplication typical of cryptographic applications. I was trying to argue that our approach had benefits compared with techniques based on standard multiplications. Let me illustrate. A simple hash function for 32-bit integers could take the least significant bits and multiply them with the most significant bits.

    // simpleHighLowHash is a simple (and weak) 32-bit hash
    // that multiplies the high 16 bits by the low 16 bits.
    func simpleHighLowHash(x uint32) uint32 {
        high := uint16(x >> 16)
        low := uint16(x & 0xFFFF)
        return uint32(high) * uint32(low)
    }

Maybe you’d want the hash function to be uniform: all possible 32-bit hash values should be equally probable. It is only possible in this instance if the hash function can produce all 32-bit hash values, which is not the case.

The great mathematician Erdős showed that the proportion of all 2n-bit values that can be generated by the product of two n-bit values goes to zero as n becomes large. This means that if you have, say, 10000000-bit integers multiplying 10000000-bit integers, you’d expect relatively few 20000000-bit integers to be produced. But what about practical cases like 32-bit integers or 64-bit integers?

You can just brute-force the problem easily up to the multiplication of 16-bit integers into 32-bit products. At that point, roughly one out of five 32-bit numbers is a product of two 16-bit integers. About 80% of all 32-bit integers are never produced by this hash. However, the running time grows exponentially with the bit width, and brute force won’t scale all the way to 32 bits.
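As an illustration of the brute-force approach, here is a minimal Python sketch. It is not the code used for the figures above: the function `product_fraction` is hypothetical, and it stops at 10-bit inputs because pure Python is far too slow for the 16-bit case, which would be better served by a compiled language.

```python
def product_fraction(bits):
    """Fraction of 2*bits-bit values that are a product of two bits-bit values."""
    size = 1 << bits
    seen = bytearray(size * size)  # seen[v] == 1 once v has appeared as a product
    for a in range(size):
        for b in range(a, size):   # a <= b suffices because a * b == b * a
            seen[a * b] = 1
    return sum(seen) / len(seen)


for bits in range(1, 11):
    print(f"{bits:2d}-bit inputs: {product_fraction(bits):.4f}")
```

Each extra input bit quadruples the number of pairs to visit, which is why this approach runs out of steam long before 32 bits.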

So what do we do about the 32-bit case? That is, what do you do when you multiply two 32-bit integers to produce a 64-bit product? What fraction of 64-bit values can the following function produce?

    func simpleHighLowHash(x uint64) uint64 {
        high := uint32(x >> 32)
        low := uint32(x & 0xFFFFFFFF)
        return uint64(high) * uint64(low)
    }

Can we get an exact result?

Yes!!!

Webster and his colleagues built the math to allow us to scale up the exact computation. He was kind enough to publish his code.

There are 3,215,709,724,700,470,902 64-bit (unsigned) integers that can be written as a product of two 32-bit integers. That’s about 17% of all possible values.

What about actually computing a pair of integers given their product? One approach consists of computing its full prime factorization, and then using those factors to build all possible divisors that are strictly less than 2^32, starting with a set of candidates containing only 1 and iteratively multiplying existing candidates by each prime factor (only keeping products that stay below 2^32). We can avoid adding duplicates to our set by processing unique prime factors with their multiplicity. Finally, we select the maximum such candidate m as the largest divisor under 2^32, compute the corresponding leftover n / m, and report whether a valid split into two 32-bit factors exists. In general, the answer (if it exists) is not unique: this returns the pair where one value is maximized. In Python, the code might look as follows.

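    # Assumption: sympy's factorint supplies the factorization as
    # {prime: multiplicity}; any equivalent routine works.
    from sympy import factorint

    n = 1234567890123456789  # example input, replace with your own value
    factor_multiplicities = factorint(n)

    candidates = [1]  # divisors of n below 2**32 found so far
    for p in factor_multiplicities:
        new_candidates = []
        for c in candidates:
            # Start at i = 1: i = 0 would re-add c itself as a duplicate.
            for i in range(1, factor_multiplicities[p] + 1):
                if c * (p ** i) < 2**32:
                    new_candidates.append(c * (p ** i))
        candidates.extend(new_candidates)
    m = max(candidates)  # largest divisor of n below 2**32
    print(f"Maximum candidate: {m}")
    leftover = n // m
    print(f"Leftover: {leftover}")
    if leftover >= 2**32:
        print("Leftover is too large, cannot find a suitable candidate.")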

You might be able to come up with a more efficient algorithm. I find it interesting to consider that if you pick a value at random, it will usually fail! That is, most 64-bit integers cannot be written as the product of two 32-bit integers.

Hacker Times
