While I find the 17% number interesting to think about, "most" is far less interesting. Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63. That's a hair's breadth away from "most" already, and considering even a tiny amount of overlapping results gets you there.
What gets interesting is actually trying to quantify the overlapping results.
Challenge accepted. Suppose we want to know the answer to 3 decimal places (so we'd match the headline). And suppose I allow my algorithm to be wrong one in a thousand times ("probably approximately correct").
Then sample some constant number C of random 64 bit integers. Run the following algorithm which separates each random sample into one of three classes: Y (has 32 but factors), N (does not have 32 bit factors), U (unknown).
Check if prime using probabilistic miller rabin. (Error prob goes to zero exponentially fast). If prime, return N. If it's not a prime, then run T steps of pollard rho to determine whether the number has 32 but factors; return Y,N, or U depending of the factors found up to step T.
The key observation is that T can be chosen to make the UNKNOWN class very small (with high probability), and so our estimate should rapidly converge to 17%Y, 83%N, ~0.001%U
For fixed error tolerance, this would run in roughly a constant number of iterations, independent of N.
By Erdos-Kac, almost all integers of size about n^2 have about log(log(n^2)) ~ log(log(n)) prime factors. However, almost all integers in the multiplication table have about 2*log(log(n)) prime factors.
Kevin Ford gets much more precise asymptotic estimates.
Anyway here is a fun pattern you get when you multiply 8 bit unsigned integers. Not all pairs of (upper bits, lower bits) are reachable, and it has a lot of distinct patterns.
https://i.imgur.com/Gb3HDR0.png
(Should I host the image on GitHub Gists so it doesn't vanish?)
It's much worse than that. It's difficult for a 64-bit product to have the high bit set if the multiplicands are both no larger than 32 bits.
Information quantities are more meaningfully expressed in number of bits.
The chance of a random 64 bit integer being a 32 bit integer is 0.0000000233 %
The chance of a random 64 bit integer being a product of two 32 bit integers is 17%
Nice
That should be "relatively few 20000000-bit integers", right?
Whether a 64-bit number can be written as the product of two 32-bit ones depends only on the prime factors of the 64-bit number - it's a property of the number itself, and apparently 17% of 64-bit numbers have this property.
Not sure I understand.
Adding two 32 bit integers takes you to 33 bit integers. (1111 + 1111 = 11110).
Addition doesn't care about order, so you're instantly cutting 2^33 possibilities down to 2^32. Or so is your argument. But in reality you can reach nearly all of those 2^33 numbers.
You have to redo the math to make the constraint work.
The 2^64 number is the number of inputs. For an operation which is commutative, you expect the outputs to be 2^63+2^32 or smaller, since you’ve introduced symmetry.
(Just by way of example, for n=2^33, 2n=2^34 but also =2^17*2^17)
However, since a * b = b * a, our input space has a lot of duplicate outputs. So from this alone you can conclude roughly half of the output space must be uncovered by any input pair, simply because there aren't enough input pairs.
Commutativity introduces a relation on pairs of 32 bit ints (a,b) ~ (b,a), which accounts for one bit of information. Thus, at most 50% of 64bit ints show up as products of 32 bit ints.
Most 1s won't go towards 1.5, but sometimes you're lucky.
My current comment itself, for instance, also doesn't really add anything to the discussion about the article and I'd have no expectation people leave it from going negative. Maybe the will, maybe they won't, but there is no reason to expect they should in principle of me loving tangents :D.
Therefore the fact that relatively few 64-bit numbers are products of 32-bit integers means that a lot of pairs of 32-bit integers give by multiplication the same product.
This is more than just the prime numbers. For example, a 41-bit prime can be multiplied by 16 and it will still fit into 64 bits.
X = ab and aY < 2^32 and bY < 2^32:
X × Y = X/a × aY = X/b × bY = Y × X = aY × X/a = bY × X/b
Which is 6 pairs resulting in the same product. This will be reduced if e.g. aY = X, but still...
In software programming, the product between two integers is often computed to a fixed number of bits with overflow. Consider 8-bit integers. If you multiply 127 by 127, you get back the number 1 as an 8-bit unsigned integer, with an overflow. The actual full product is 16129. To represent 16129, you typically use 16 bits of precision.
Thus we have the notion of the full product. The full product of two 32-bit integers is typically represented using 64 bits. The question that preoccupied me is what fraction of all 64-bit integers can be written as the product of two 32-bit integers.
You might wonder why you would care?
We often design hash functions: they are special functions that take an input and generate a random-looking output. Several years ago I designed a very fast hash function called clhash. It is a super-fast hash function for strings having a few hundred bytes or more. If you don’t know about clhash, check it out. It is interesting in its own right.
This clhash hash function uses a type of multiplication typical of cryptographic applications. I was trying to argue that our approach had benefits compared with techniques based on standard multiplications. Let me illustrate. A simple hash function for 32-bit integers could take the least significant bits and multiply them with the most significant bits.
// simpleHighLowHash is a simple (and weak) 32-bit hash // that multiplies the high 16 bits by the low 16 bits. func simpleHighLowHash(x uint32) uint32 { high := uint16(x >> 16) low := uint16(x & 0xFFFF) return uint32(high) * uint32(low) }
Maybe you’d want the hash function to be uniform: all possible 32-bit hash values should be equally probable. It is only possible in this instance if the hash function can produce all 32-bit hash values, which is not the case.
The great mathematician Erdös showed that the proportion of all 2n-bit values that can be generated by the product of two n-bit values goes to zero as n becomes large. This means that if you have, say, 10000000-bit integers multiplying 10000000-bit integers, you’d expect relatively few 20000000-bit integers to be produced. But what about practical cases like 32-bit integers or 64-bit integers?
You can just brute-force the problem easily up to the multiplication of 16-bit integers into 32-bit products. At that point, slightly one out of five 32-bit numbers is a product between two 16-bit integers. About 80% of all 32-bit integers are never produced by this hash. However, the running time grows exponentially, and brute force won’t scale all the way to 32 bits.
So what do we do about the 32-bit case? That is, what do you do when you multiply two 32-bit integers to produce a 64-bit product? What fraction of 64-bit values can the following function produce?
func simpleHighLowHash(x uint64) uint64 { high := uint32(x >> 32) low := uint32(x & 0xFFFFFFFF) return uint64(high) * uint64(low) }
Can we get an exact result?
Yes!!!
Webster and his colleagues built the math to allow us to scale up the exact computation. He was kind enough to publish his code.
There are 3,215,709,724,700,470,902 64-bit (unsigned) integers that can be written as a product of two 32-bit integers. That’s about 17% of all possible values.
What about actually computing a pair of integers given their product? One approach consists of computing its full prime factorization, and then using those factors to build all possible divisors that are strictly less than 2^32, starting with a set of candidates containing only 1 and iteratively multiplying existing candidates by each prime factor (only keeping products that stay below 2^32). We can avoid adding duplicates to our set by processing unique prime factors with their multiplicity. Finally, we select the maximum such candidate m as the largest divisor under 2^32, compute the corresponding leftover n / m, and report whether a valid split into two 32-bit factors exists. In general, the answer (if it exists) is not unique: this returns the pair where one value is maximized. In Python, the code might look as follows.
for p in factor_multiplicities: new_candidates = [] for c in candidates: for i in range(factor_multiplicities[p] + 1): if c * (p ** i) < 2**32: new_candidates.append(c * (p ** i)) for new_c in new_candidates: candidates.append(new_c) m = max(candidates) print(f"Maximum candidate: {m}") leftover = n // m print(f"Leftover: {leftover}") if leftover >= 2**32: print("Leftover is too large, cannot find a suitable candidate.")
You might be able to come up with a more efficient algorithm. I find it interesting to consider that if you pick a value at random, it will usually fail! That is, most 64-bit integers cannot be written as the product of two 32-bit integers.