result = [[{"x": x, "y": y, "v": (x+y if (x+y)%2==0 else None)} for (x, y) in row] for row in data]
Now with a bit of reasoning another programmer that hasn't used python might be able to figure out what that means. But what if my audience is non-programmers? The moment they encounter the first unexplained square brackets and then an opening curly brace it will essentially feel like telling them: "Here is a riddle for you" or potentially even like "I expect you to know this, dummy".Not that this text was particular bad in that regard, but I wish more math people had a heightened awareness of the fact that for many the hard part is not understanding the concept (e.g. fourier transformation), but the foreign looking signs mathematicians have decided to use to write them down.
That is as if someone explaines the way to the next train station to you in a foreign language. The hard part isn't understanding the way, it is understanding the noises that are supposed to make up the description.
And as a programmer who from time to time has to translate maths into discrete programs (or the other way around) the hard part was always parsing the notation and when I figured it out I was usually like: "Ohh, this is just a simple algorithm doing that.
So if you want to explain a math concept to programmers you should chose one of two routes:
(A) Stay with your notation and explain every character that isn't visible on a regular keyboard in length and gently lead the reader into being able to read the notation or
(B) let go of the notation and first explain what it does and how, e.g. for our FFT example: FFT slices your list of values into frequency buckets, figures out how much of each frequency is present, and returns those strengths as numbers. And then you can work backwards from that understanding towards your notation explaining which sign relates to which part of the concept (e.g. to the number of buckets).
I would prefer the latter, since it explains both the concept and gives the mathematician a chance to explain how and why math notation can be useful on top, e.g. to figure out certain properties of the method that may even have practical implications.
In contrast, EdDSA (which is based on Schorr signatures) does, by construction: the public key is included in one of the hashes, which binds the signature to a particular public key.
I haven't investigated whether cryptocurrency's use of Schnorr satisfies this property or not. (Indeed, I do not care about cryptocurrency at all.) So it's an exercise to the reader if it's satisfactory or not :3
https://www.youtube.com/live/2IpZWSWUIVE?si=-LRRbU2mJgL9LiNP...
What you wish for is more akin to coding like this:
declaring a function whose name is "max" and its arguments are "x" (of type number) and "y" (of type number) that returns a number:
statement: if a is greater than b, the function returns a
statement: the function returns b
But programmers don't bat an eye at {}[](),.!&^| (and I just realized I used the term "function" which outsiders might wish was replaced by simpler terminology!) // This is more readable if you're "in the know"
// even if it looks like a jumbled mess to outsiders
fn max(a: num, b: num): num => a > b ? a : b
Math uses terms of art like "group", "field", "modulo" and "multiplicative inverse"; and notation like "∑"; because they are short and communicate very specific (and common) things, many of which are implicit and we probably wouldn't even notice.In other words: we're not the target audience.
Note that this is not only a matter of conciseness. See Ken Iverson's (of APL/J fame) "Notation as a Tool of Thought": https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p...
Since my exposition is constructive in nature, the proofs and other remarks are an integral part of the article, not digressions.
The ed25519 issues are absolutely insane. Anywhere I can read more about that?
We devs (take back that "just" :) ) deal with much harder stuff when we build complex APIs, so the problem must be at the syntactic level. To us devs, math may look like an antipattern, with all the short names and operator overloading.
But that's unavoidable, unfortunately. It's normal to spend hours or more on a single concept until it clicks. I'd say don't give up, but I understand one's time is valuable, and the return might not be high enough to justify the cost.
Paraphrasing 'Group' from the article to see if I've understood it:
A set of elements G, and some operation ⊕, where
(g1 ⊕ g2) is also in G. // "Type-safety"
Some g0 exists such that (gn ⊕ g0) == (g0 ⊕ gn) == gn // "Zero"
For every g, there's some inverse gi such that (g ⊕ gi) == (gi ⊕ g) == g0 // "Cancelling-out"
a ⊕ (b ⊕ c) == (a ⊕ b) ⊕ c // "Associative"
If (a ⊕ b) == (b ⊕ a) then the group is also "abelian/commutative"I don't have anything against introducing new words. If your concept can be adequately described by existing language that seems like a good way to allow people to learn and talk about it. Technically as a person who has studied philosophy the greek alphabet is also no big hurdle to me. But it is to others. Try googling some weird sign you found in a formula. First you don't know how it is called or how to write it, second any signed might have been used in 100 different formulae so even if you, know how to search for it (there are applications people use to identify mathematical signs) good luck at finding any meaningful answer.
I know for mathematicians these signs are arbitrary and they would say you could just use emojis as well. But then it turns out mathematicians ascribe meaning to which alphabet they are using and whether it is upper- or lowercase. Except sometimes they will break that convention for what appears to be mostly historical reasons.
I know mathematicians will get used to this just fine, but the mathematical notation system has incredibly bad UX and the ideals embedded within it are more about density and intransparency (only the genius mathematician knows what is going on), than about rigorous precision and understanding.
When I studied philosophy there were philosophers like Hegel who had to expand the German language to express their new thoughts. And there were philosophers who shall remain unnamed that would use nearly unparseable dense and complex language to express trivial thoughts. The latter always felt like an attempt to paper over their own deficiencies with the convoluted language they had learned to express themselves in.
Mathematicans can also have a degree of the latter at times. If your notation is more complex than the problem it describes your notation sucks and you waste collective human potential by using it.
I'm mentioning this, as other people in this thread are discussing "explaining symbols you use", and you're using a non-standard symbol for +. I can easily imagine a circle around + making + a different operation, and wonder if it is so?
Aspirin I've bought in the past has a + on it, and its trademark is a + within a circle. That's why I've latched on what a "common person" might view the symbol as:
https://www.brand.aspirin.com/sites/g/files/vrxlpx46831/file...
Interestingly, I have University level math courses, but decades out of date, and have never run into that symbol. I see it here:
It acts as a normal +, mostly. When you're dealing with modulo math, the "normal" plus becomes a bit weird as there are rules attached to a number expressed as "(a + b) mod c", so mathematicians often use symbols like ⊕ to mean something like "+, but different". The second link you posted does the same, it acts sort of like normal addition, conceptually, except it's not done on actual numbers but groups.
In definitions like these, you may as well use a peace symbol or a picture of a frog; "some operation ⊕" means "there is some operation we write down like this, and it does this and that".
Another place you may find ⊕ is when it's used to represent XOR in some cases; (a + b) mod 2 is a bitwise XOR when operating on single bits (again, it means "normal addition except with weird rules", namely the mod 2 that makes you throw out anything larger than the last bit).
edit: What I mean is that, as a consequence, the symbol used is not really important.
I specifically didn't use an already-existing symbol because then you wouldn't know if I'm talking about that symbol, or any symbol in general.
Integer-multiplication is associative, E.g.
"3 times 9 is 9 times 3"
(3 + 9) == (9 + 3) == 27
and it has an identity element, E.g. "3 times 1 is 3"
(3 + 0) == 3ECDSA attack cryptography ethereum extended euclidean algorithm generating functions signature malleability
In this article, we'll try to understand how ECDSA (Elliptic Curve Digital Signature Algorithm) works.
The version I have in mind is the one used by the Ethereum blockchain. Since my interest lies in security, we'll also explore the signature malleability attack.
I expect you to be familiar with Public Key Cryptography and how it can be used to sign messages, at least conceptually.
You'll only need to know basic math, so abstract algebra is not a requirement. I'll introduce the bare minimum as we go. My exposition will be deliberately unsophisticated, favoring ease of understanding over conciseness and elegance.
The reader I have in mind is someone dissatisfied with the many superficial, hand-wavy explanations of ECDSA often found in articles and books aimed at developers and auditors, but who doesn't have the time or interest to go all the way down the rabbit hole and learn cryptography in a thorough and systematic way.
If you, like me, work in a field where you need to have a working knowledge of multiple disciplines, you'll probably appreciate this kind of compromise.
Finally, this might also serve as an introduction to the topic before you turn to more serious and academic literature.
You can think of this section as a kind of disclaimer.
This article is the result of an exercise where I start from a vague understanding of a topic and try to connect all the dots and fill in all the gaps on my own, without relying on any external sources of information. This means no books, no LLMs, and no internet.
For the exercise to be effective, it needs to be written with an audience in mind, forcing you to keep track of what you've already explained and what you can expect the reader to know. It also helps you do a better job because you feel more exposed.
Have you ever gone back to something you learned in the past and realized you forgot most of it? Your knowledge has become sparse and all you remember are some facts disconnected from each other.
Can you restore the original picture on your own?
If you succeed, your final understanding will be much deeper than the one you'd have if you relied on external help such as books and notes.
With this article, I go a step further and try to connect the dots with knowledge that I never had to begin with. The fact that it's possible is what makes mathematical topics so special.
That should explain why I wrote it, but why should you read it?
Well, you get to read something:
Your role is that of an auditor or verifier, constantly trying to find any inconsistencies and non sequiturs in what I wrote: I'm the generator and you the discriminator. In a (constructively) adversarial setting, this would be an iterative process.
It goes without saying that this article is meant to be read linearly, from the start.
It's all around us:
| Mon | Tue | Wed | Thu | Fri | Sat | Sun |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | |
| 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| 21 | 22 | 23 | 24 | 25 | 26 | 27 |
| 28 | 29 | 30 | 31 |
If we're just interested in the day of the week, then the numbers in the same column are equivalent. What do they have in common? The fact that the difference between any two of them is always a multiple of \(7\):
Since numbers in the same column are equivalent, we can represent all of them by the smallest one. Let's call it the representative of the column. If we do that, we end up with \(7\) numbers:
| Mon | Tue | Wed | Thu | Fri | Sat | Sun |
|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
That's not ideal for a calendar, but it makes sense: we just add multiples of \(7\) to the starting numbers to recover the missing ones.
How do we get the representative from a number? For instance, what's the representative of \(45\)? Well, \(45 = 3 + 7\cdot 6\), so the representative is \(3\). Indeed, starting from \(3\), we add a multiple of \(7\) to get \(45\).
Now what's \(3\) with respect to \(45\)? It's the remainder of \(45\) divided by \(7\). We can get that number by using the mod(ulo) operator: \(45\ \mathrm{mod}\ 7 = 3\), or 45 % 7 == 3, in many programming languages.
Beware:
-45 % 7 is \(-3\)-45 % 7 is \(-3\)-45 % 7 is \(4\)Both values make sense, since they're separated by \(7\) and, thus, in the same column, or equivalence class, i.e. the class of all equivalent elements. But we want the representative, so \(4\) is preferred. Observe that \(-45 = -7\cdot 7 + 4\).
Basically, any time we're outside the window \(\{0, \ldots, 6\}\), we add or subtract \(7\) as many times as we need to land in the window.
Note that ((n % 7) + 7) % 7 will give the representative in any language, since:
n % 7 is in \(\{-6, -5, \ldots, 0, \ldots, 5, 6\}\)(n % 7) + 7 is in \(\{1, \ldots, 13\}\)((n % 7) + 7) % 7 is in \(\{0, \ldots, 6\}\)Observe that:
(x % 7) % 7 is just x % 7. This property is called idempotency (same power): reapplying the operation doesn't increase the extent of the effect, i.e. it gives the same result.Instead of writing mod operators everywhere, we can say that we're computing mod \(p\):
\[y^2 = x^3 + 7\ (\mathrm{mod}\ p)\]
That's equivalent to
\[y^2\ \mathrm{mod}\ p = (x^3 + 7) \ \mathrm{mod}\ p\]
which is a pain to write.
If we're only dealing with addition and multiplication, then we can insert as many "mod \(p\)" as we want wherever we want, so these two expressions are equivalent:
That's not true for exponentiation:
ECDSA doesn't rely on exponentiation, so we don't need to talk about it.
We still don't know how to divide mod \(p\). That is, we don't know how to compute, say, \(3/4\) mod \(p\), or whether it even exists.
What does dividing by \(4\) do? It does something that can be reversed by multiplying by \(4\). So the two operations cancel out and are equivalent to multiplying by \(1\), the neutral element. In other words, we must have
\[a / a = 1\ (\mathrm{mod}\ p)\]
That's usually written as
\[a\cdot a^{-1} = 1\ (\mathrm{mod}\ p)\]
where \(a^{-1}\) is called the multiplicative inverse of \(a\).
As an aside, the additive inverse, or opposite, is simply \(-n\), since \(n + (-n) = 0\), where \(0\) is the neutral element of addition. Of course, we can compute \(-n\ \mathrm{mod}\ p\) to get the representative of \(-n\).
Let's find \(x\) such that \(4\cdot x = 1\ (\mathrm{mod}\ 7)\) by using simple brute force:
I omitted "\(\left(\mathrm{mod}\ 7\right)\)" for convenience. I'll do that often from now on.
As we can see, \(4^{-1} = 2\). That's because \(4\cdot 2 = 8 = 8-7 = 1\).
Let's go back to our \(3/4\):
\[3/4 = 3\cdot 4^{-1} = 3\cdot 2=6\]
Indeed:
\[6\cdot 4 = 24 = 3\]
so we get \(3\) back.
An important fact to know is that a number \(a\) is always invertible mod \(p\), as long as it's coprime with \(p\), i.e. their GCD (greatest common divisor) is \(1\).
Proof (safely skippable)
Let's define \(r_x = a\cdot x\ \mathrm{mod}\ p\). Let \(r\) be the sequence \(r_0, \ldots, r_{p-1}\).
Again, I'll omit "mod \(p\)" for notational convenience.
If \(r_x = r_y\), i.e. \(a\cdot x = a\cdot y\), then \(a(x-y) = 0\), which means that \(a(x-y)\) is divisible by \(p\). If \(a\) and \(p\) are coprime, then \(x-y\) must be divisible by \(p\), so \(x-y = 0\), i.e. \(x=y\). In other words, \(x\neq y\) implies that \(r_x \neq r_y\).
This means that \(r\) has \(p\) distinct values in \(\{0, \ldots, p-1\}\), i.e. \(r\) is a permutation of the sequence \(0, \ldots, p-1\). In particular, \(r\) contains exactly one \(1\), so there's exactly one \(x\) such that \(a\cdot x = 1\).
End of proof!
As an example, let's look again at the brute-forcing we did above to find \(4^{-1}\ \mathrm{mod}\ 7\) and note that the results are a permutation of the numbers from \(0\) to \(6\), so they contain exactly one \(1\). That's expected since \(4\) and \(7\) are coprime.
Observe that when \(p\) is prime, all numbers from \(1\) to \(p-1\) are coprime with it, so they're all invertible.
Technically, the set of representatives \(0, \ldots, p-1\) is often denoted by \(\mathbb{Z}_p\) or \(\mathbb{Z}/p\mathbb{Z}\). It's obtained by partitioning the integers into equivalence classes (our calendar columns, but extended to all integers) and representing each class by a representative in \(\{0, \ldots, p-1\}\). That's what we did informally.
Tip
You can safely skip this section, if you already know or don't care about how the multiplicative inverse can be computed in practice. If you're interested in the method of generating functions you might still want to read the Fibonacci numbers subsection, though.
For a fast and practical way to compute the multiplicative inverse, we can use the extended Euclidean algorithm (EEA).
The Euclidean algorithm (EA) can be used to efficiently compute \(\mathrm{GCD}(a, p)\), and its extended version returns two integers \(x\) and \(y\) such that
\[ax + py = \mathrm{GCD}(a, p)\]
If \(a\) and \(p\) are coprime, then
\[ ax + py = 1 \implies ax = 1\ (\mathrm{mod}\ p) \]
This means that \(x\) is the multiplicative inverse of \(a\) mod \(p\).
How does the algorithm work? It's very simple.
The first observation is that
\[\mathrm{GCD}(a, b) = \mathrm{GCD}(a, b-a)\]
and, by symmetry,
\[\mathrm{GCD}(a, b) = \mathrm{GCD}(a-b, b)\]
We will prove this later.
Since we can subtract repeatedly, we can also use the mod operator:
\[\mathrm{GCD}(a, b) = \mathrm{GCD}(a\ \mathrm{mod}\ b, b\ \mathrm{mod}\ a)\]
This way, we can reduce the two arguments very quickly. Note that in a real implementation we only need one mod per step, since one of the two has clearly no effect.
Let's use it to compute \(\mathrm{GCD}(784, 495)\):
\[ \begin{align*} (784&, 495) \\ (289&, 495)&\qquad\qquad 289 &= 784 - 495 \\ (289&, 206)& 206 &= 495 - 289 \\ (83&, 206)& 83 &= 289 - 206 \\ (83&, 40) & 40 &= 206 - 83\cdot 2 \\ (3&, 40) & 3 &= 83 - 40\cdot 2 \\ (3&, 1) & 1 &= 40 - 3\cdot 13 \end{align*} \]
The second column shows how we got the new values. Since we obtained \(\mathrm{GCD}(3, 1)\), the GCD is \(1\), i.e. \(784\) and \(495\) are coprime.
The extended version of the algorithm uses the second column in a simple way. To start, we notice that the equation at the bottom of the second column is already in the right form, i.e.
\[40\cdot 1 + 3(-13) = 1\]
However, we want the expression with respect to the initial values \(784\) and \(495\).
The solution is easy: we just do substitutions as we go up the second column, starting from the bottom:
\[ \begin{align*} 1 &= 40 - 3\cdot 13 \\ 1 &= 40 - (83 - 40\cdot 2)\cdot 13 \\ &= 40\cdot 27 - 83\cdot 13 \\ 1 &= (206 - 83\cdot 2)\cdot 27 - 83\cdot 13 \\ &= 206\cdot 27 - 83\cdot 67 \\ 1 &= 206\cdot 27 - (289 - 206)\cdot 67 \\ &= 206\cdot 94 - 289\cdot 67 \\ 1 &= (495 - 289)\cdot 94 - 289\cdot 67 \\ &= 495\cdot 94 - 289\cdot 161 \\ 1 &= 495\cdot 94 - (784 - 495)\cdot 161 \\ &= 495\cdot 255 - 784\cdot 161 \\ \end{align*} \]
Indeed, \(495\cdot 255 - 784\cdot 161 = 1\).
So now we know that \(495\cdot 255 = 1\ (\mathrm{mod}\ 784)\).
Now, the only thing missing is to prove that
\[\mathrm{GCD}(a, b) = \mathrm{GCD}(a, b-a)\]
Let me write \(a|b\) to mean that \(a\) divides \(b\), i.e. \(b = ah\) for some integer \(h\).
If two numbers divide each other, they must be equal, so we just need to prove that, for any integer \(k\),
\[\mathrm{GCD}(u, v)\mid \mathrm{GCD}(u, v+ku)\]
Indeed, we can then argue that:
Let's prove that
\[\mathrm{GCD}(u, v)\mid \mathrm{GCD}(u, v+ku)\]
Let \(d_1\) be the GCD on the left and \(d_2\) the one on the right. It's clear that \(d_1|u\) and \(d_1|v\), which implies that \(d_1|(v+ku)\). Now we'd like to conclude that \(d_1|d_2\).
Unfortunately, we only proved that \(d_1\) is a common divisor of \(u\) and \(v+ku\) so far.
Let's show that if \(d'\) divides both \(a\) and \(b\), then it also divides \(d = \mathrm{GCD}(a,b)\).
Proof (safely skippable)
We can always express \(d\) and \(d'\) as
\[ \begin{cases} d' = u\cdot \mathrm{GCD}(d, d') \\ d = v\cdot \mathrm{GCD}(d, d') \\ 1 = \mathrm{GCD}(u, v) \end{cases} \]
where, as indicated, \(u\) and \(v\) are coprime.
Observe that if \(u\) and \(v\) weren't coprime, their common divisor would be absorbed by \(\mathrm{GCD}(d, d')\), so we'd have the same situation as above but for \(u'=u/\mathrm{GCD}(u,v)\) and \(v'=v/\mathrm{GCD}(u,v)\).
Since \(a\) and \(b\) are divisible by both \(d'\) and \(d\), then \(a' = a/\mathrm{GCD}(d, d')\) and \(b' = b/\mathrm{GCD}(d, d')\) must still be divisible by \(u\) and \(v\). So:
for some integers \(k_1\), \(k_2\), \(h_1\), and \(h_2\).
Since \(u\) and \(v\) are coprime, then \(u|k_2\) and \(u|h_2\), i.e. \(k_2 = u k_3\) and \(h_2 = u h_3\) for some integers \(k_3\) and \(h_3\). Therefore:
This means that \(ud\) is a common divisor of \(a\) and \(b\), and \(u>1\) would imply that we found a greater divisor than \(d\), their GCD.
Since \(u = 1\), then \(d' = \mathrm{GCD}(d, d')\), i.e. \(d'|d\).
End of proof!
I seem to recall that some people include this property in the definition of the GCD itself, but I think that's slightly redundant.
Anyway, we're done!
Wait! How fast is this algorithm? Let's look at the reduction again:
\[ \begin{align*} ({\color{red} 784}&, {\color{green}495}) \\ ({\color{green}289}&, {\color{red} 495})&\qquad\qquad 289 &= 784 - 495 \\ ({\color{red} 289}&, {\color{green}206})& 206 &= 495 - 289 \\ ({\color{green}83}&, {\color{red} 206})& 83 &= 289 - 206 \\ ({\color{red} 83}&, {\color{green}40}) & 40 &= 206 - 83\cdot 2 \\ ({\color{green}3}&, {\color{red} 40}) & 3 &= 83 - 40\cdot 2 \\ ({\color{red} 3}&, {\color{green}1}) & 1 &= 40 - 3\cdot 13 \end{align*} \]
I can see two super Fibonacci sequences. Here's the green one:
| Green Seq. | 1 | 3 | 40 | 83 | 206 | 289 | 495 |
| Green Mult. | 0 | 13 | 2 | 2 | 1 | 1 | 1 |
Fibonacci numbers form a sequence \(F_0, F_1, F_2, \ldots\) where the recurrence relation is \(F_i=F_{i-2}+F_{i-1}\), for \(i=2, 3, \ldots\).
In our case, however, the recurrence relation is \(F_i = F_{i-2}+F_{i-1}\cdot M_{i-1}\), where \(F\) is on the first row and \(M\) on the second row of the table.
As an example, I highlighted 4 elements in the table: \(206 = 40 + 83\cdot 2\). I call this a super Fibonacci sequence because the multipliers make it grow faster than the regular one (corresponding to all \(M_i=1\)).
Fibonacci numbers grow exponentially, so the number of steps necessary to reach a number \(n\) is \(\Theta(\log n)\).
Since our sequence is even faster, the number of steps is lower and all we can say for now is that the worst-case complexity of EEA is \(O(\log n)\).
Note
Technically, \(\Theta\) denotes exact growth, \(O\) denotes an upper bound, and \(\Omega\) denotes a lower bound.
For instance, \(n = O(n^2)\) is correct, though in practice people often use \(O\) when they really mean \(\Theta\).
Moreover, \(n = O(n^2)\) really means \(n \in O(n^2)\), but the former notation is more common than the latter.
Can we think of a very slow sequence? But, of course! We can build it starting from the bottom and always choosing \(M_i=1\):
\[ \begin{align*} \cdots\ & \cdots \\ ({\color{red} 34}&, {\color{green}55}) \\ ({\color{red} 34}&, {\color{green}21})&\qquad\qquad 21 &= 55 - 34 \\ ({\color{green}13}&, {\color{red} 21})& 13 &= 34 - 21 \\ ({\color{red} 13}&, {\color{green}8})& 8 &= 21 - 13 \\ ({\color{green}5}&, {\color{red}8})& 5 &= 13 - 8 \\ ({\color{red} 5}&, {\color{green}3})& 3 &= 8 - 5 \\ ({\color{green}2}&, {\color{red} 3})& 2 &= 5 - 3 \\ ({\color{red} 2}&, {\color{green}1})& 1 &= 3 - 2 \end{align*} \]
Those are basically two Fibonacci sequences! This tells us that the worst case of the EEA is indeed logarithmic or, to be precise, \(\Theta(\log (\min\{a, b\}))\). Why min? Because we have two sequences: the green and the red one. Since they start and end together, the faster one dominates the other and faster growth means shorter sequence, so the time complexity is \(\Theta(\min\{\log a, \log b\})\), i.e. \(\Theta(\log (\min\{a, b\}))\).
I had no idea that the EA had such a connection with the Fibonacci numbers before writing this section. As always, check my reasoning!
Tip
You can safely skip this section. You don't need it for the rest of the article, but if you want to learn about generating functions, I think this is a good opportunity.
I want to find the base for the logarithm that appears in the time complexity of the EA and EEA algorithms.
If we assume that Fibonacci numbers grow exponentially, i.e. \(F_i\sim b^i\), then:
\[ \begin{align*} F_{i+2} &= F_{i+1} + F_i \\ &\hspace{10pt}\Downarrow \\ b^{i+2} &= b^{i+1} + b^i \end{align*} \]
We divide by \(b^i\) and get \(b^2-b-1 = 0\), whose positive solution is
\[b_+ = \frac{1 + \sqrt{5}}{2} \approx 1.618\]
That's the well-known golden ratio.
We started from the assumption that the growth is exponential, but what's the exact expression for the \(n\)-th Fibonacci number, just to make sure we're correct?
Let \(V_0\) be the vector of Fibonacci numbers \(F_i\):
| \(V_0:\) | \(F_0\) | \(F_1\) | \(F_2\) | \(F_3\) | \(F_4\) | \(F_5\) | \(F_6\) | \(\ldots\) |
Now let's introduce the two shifted versions \(V_1\) and \(V_2\):
| \(V_0:\) | \(F_0\) | \(F_1\) | \(F_2\) | \(F_3\) | \(F_4\) | \(F_5\) | \(F_6\) | \(\ldots\) |
| \(V_1:\) | \(F_0\) | \(F_1\) | \(F_2\) | \(F_3\) | \(F_4\) | \(F_5\) | \(\ldots\) | |
| \(V_2:\) | \(F_0\) | \(F_1\) | \(F_2\) | \(F_3\) | \(F_4\) | \(\ldots\) |
We can see that, from the third column onward, \(V_0 = V_1 + V_2\), because of the relation \(F_{i+2} = F_{i+1} + F_{i}\).
The advantage of the vector approach is that we don't have to deal with the index \(i\) anymore. In a sense, we vectorized the loop and abstracted away the annoying index. The drawback is that we lost some algebraic power because, unless we introduce other operations, we don't even know how to express the fact that \(V_1\) is a shifted version of \(V_0\).
Instead of reinventing the wheel, why don't we use a power series instead of a simple vector? I'm thinking of something like this:
\[ \begin{align*} P_0(x) &= F_0 + F_1 x + F_2 x^2 + F_3 x^3 + F_4 x^4 + F_5 x^5 + \ldots \\ P_1(x) &= \phantom{F_0 +} F_0 x + F_1 x^2 + F_2 x^3 + F_3 x^4 + F_4 x^5 + \ldots \\ P_2(x) &= \phantom{F_0 + F_1 x +} F_0 x^2 + F_1 x^3 + F_2 x^4 + F_3 x^5 + \ldots \\ \end{align*} \]
The individual elements are kept separated thanks to the different powers of \(x\), and we inherit lots of algebraic properties from power series!
For instance, it's easy to see that \(P_1(x) = x P_0(x)\) and \(P_2(x) = x^2 P_0(x)\).
Moreover, we can state algebraically what we observed before:
We can see that, from the third column onward, \(V_0 = V_1 + V_2\), because of the relation \(F_{i+2} = F_{i+1} + F_{i}\).
With power series, that becomes
\[ P_0(x) - F_0 - F_1 x = P_1(x) - F_0 x + P_2(x) \]
Note that we simply removed the unwanted terms in the first two columns:
\[ \begin{align*} P_0(x) - F_0 - F_1 x &= F_2 x^2 + F_3 x^3 + F_4 x^4 + F_5 x^5 + \ldots \\ P_1(x) - F_0 x &= F_1 x^2 + F_2 x^3 + F_3 x^4 + F_4 x^5 + \ldots \\ P_2(x) &= F_0 x^2 + F_1 x^3 + F_2 x^4 + F_3 x^5 + \ldots \\ \end{align*} \]
To reiterate, we expressed all the following relations at once:
We can now express \(P_1\) and \(P_2\) in terms of \(P_0\). For convenience, let's write \(P\) instead of \(P_0(x)\):
\[ P - F_0 - F_1 x = x P - F_0 x + x^2 P \]
Now we solve for \(P\):
\[ P = \frac{(F_0 - F_1)x - F_0}{x^2 + x - 1} \]
Since Fibonacci numbers start with \(0\) and \(1\), let's substitute \(F_0=0\) and \(F_1=1\):
\[ P = \frac{-x}{x^2 + x - 1} \]
That's in implicit form. If we can put \(P\) in explicit form, then we can read the expression for the generic coefficient of \(x^i\), which, by construction, is the \(i\)-th Fibonacci number! Here's the form we want:
\[ P(x) = \sum_{i=0}^\infty \alpha_i x^i \]
The coefficient \(\alpha_i\) is bound to be a general expression for \(F_i\).
How do we do that?
Let's take the simplest power series we can think of and find both the explicit and implicit forms for it:
\[ S(x) = 1 + x + x^2 + x^3 + x^4 + \ldots \]
We can use the same trick, i.e. the shift:
\[ \begin{align*} S(x) &= 1 + x + x^2 + x^3 + x^4 + x^5 + \ldots \\ xS(x) &= \phantom{1 +} x + x^2 + x^3 + x^4 + x^5 + \ldots \\ S(x) - xS(x) &= 1 \end{align*} \]
Maybe surprisingly:
\[ \begin{gather*} S(x) - xS(x) = 1 &\implies \\ (1 - x) S(x) = 1 &\implies \\ S(x) = \frac{1}{1 - x} \end{gather*} \]
Written more formally, we proved that
\[ \sum_{i=0}^\infty x^i = \frac{1}{1-x} \]
We don't care about convergence, as we only want to read the coefficients of the series. As long as what we do is algebraically correct, we should be fine. We might say that we're repurposing some algebraic machinery to do "something else".
Note
We say a series converges if it evaluates to a real number. Otherwise, we say it diverges. For instance, the following series clearly converges: $$ \sum_{i=0}^\infty \left(\frac{1}{10}\right)^i = 1 + 0.1 + 0.01 + \ldots = 1.1111\ldots $$
In the Fibonacci case, we want to find \(\alpha_i\) such that
\[ \sum_{i=0}^\infty \alpha_i x^i = \frac{-x}{x^2 + x - 1} \]
Now that we've witnessed how the simple case works, we should have more confidence that this method might just work! It might still look like magic, though.
How do we close the gap between the simple case and the Fibonacci case?
First, we notice that the denominator is factorizable:
\[ P = \frac{-x}{(x - c_1)(x - c_2)} \]
where
\[ \begin{align*} c_1 &= \frac{-1 + \sqrt5}{2} \\ c_2 &= \frac{-1 - \sqrt5}{2} \end{align*} \]
These are the same solutions we found for \(b\) at the start of this section, but multiplied by \(-1\).
Now we can split the expression into two simpler ones:
\[ \frac{-x}{(x - c_1)(x - c_2)} = \frac{A}{x - c_1} + \frac{B}{x - c_2} \]
We just need to find the appropriate \(A\) and \(B\) to get \(-x\) at the numerator:
\[ \begin{gather*} \frac{A}{x - c_1} + \frac{B}{x - c_2} =\\ \frac{A(x-c_2) + B(x-c_1)}{(x-c_1)(x-c_2)} =\\ \frac{x(A+B) - c_2 A - c_1 B}{(x-c_1)(x-c_2)} \end{gather*} \]
We want \(-x = x(A+B) - c_2 A - c_1 B\), so we must have
\[ \begin{cases} A+B = -1 \\ c_2 A + c_1 B = 0 \end{cases} \]
The solutions are
\[ \left\{ \begin{align*} A = \frac{-c_1}{c_1-c_2} \\ B = \frac{c_2}{c_1-c_2} \end{align*} \right. \]
Therefore:
\[ (c_1-c_2) P = - \frac{c_1}{x - c_1} + \frac{c_2}{x - c_2} \]
If we can convert each of the two parts into explicit form, then we're done, since explicit forms sum nicely: we just sum the corresponding coefficients.
Now we divide numerator and denominator of the left part by \(c_1\) and of the right part by \(c_2\):
\[ (c_1-c_2) P = - \frac{1}{\frac{x}{c_1} - 1} + \frac{1}{\frac{x}{c_2} - 1} \]
We change some signs:
\[ (c_1-c_2) P = \frac{1}{1 - \frac{x}{c_1}} - \frac{1}{1 - \frac{x}{c_2}} \]
Success:
\[ (c_1-c_2) P = \left( \sum_{i=0}^\infty \left(\frac{x}{c_1}\right)^i \right) - \left( \sum_{i=0}^\infty \left(\frac{x}{c_2}\right)^i \right) \]
Expanding and simplifying:
\[ (c_1-c_2)P = \sum_{i=0}^\infty \left(\frac{1}{c_1^i} - \frac{1}{c_2^i} \right)x^i = \sum_{i=0}^\infty \frac{c_2^i-c_1^i}{c_1^i c_2^i} x^i \]
We can simplify it further, since \(c_1 c_2 = -1\) and \(c_1 - c_2 = \sqrt5\):
\[ P = \sum_{i=0}^\infty (-1)^i \frac{c_2^i-c_1^i}{\sqrt5} x^i \]
At last:
\[ F_i = (-1)^i \frac{c_2^i-c_1^i}{\sqrt5} \]
with
\[ \begin{align*} c_1 &= \frac{-1 + \sqrt5}{2} \\ c_2 &= \frac{-1 - \sqrt5}{2} \end{align*} \]
Let's check this with the simplest and dumbest Python code possible:
[](#__codelineno-0-1)from math import sqrt [](#__codelineno-0-2) [](#__codelineno-0-3)s5 = sqrt(5) [](#__codelineno-0-4)c1 = (-1 + s5) / 2 [](#__codelineno-0-5)c2 = (-1 - s5) / 2 [](#__codelineno-0-6) [](#__codelineno-0-7)def fib(i): [](#__codelineno-0-8) return (-1)**i * (c2**i - c1**i)/s5 [](#__codelineno-0-9) [](#__codelineno-0-10)[int(fib(i)) for i in range(20)]
Fingers crossed:
[](#__codelineno-1-1)[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
Phew...
If we substitute \(c_1\) and \(c_2\) in the formula, we get
\[ F_i = \frac{(1+\sqrt5)^i-(1-\sqrt5)^i}{2^i\sqrt5} \]
We obtained the same solutions we found for \(b\), and we get the same asymptotic growth as well.
Indeed, the numerator grows as \((1+\sqrt5)^i\):
\[ \lim_{i\to\infty} \frac{(1+\sqrt5)^i-(1-\sqrt5)^i}{(1+\sqrt5)^i} = \lim_{i\to\infty} \left(1 - \left(\frac{1 - \sqrt5}{1 + \sqrt5}\right)^i\right) = 1 \]
So, \(F_i \sim \frac{\phi^i}{\sqrt5} = \Theta(\phi^i)\), where \(\phi = \frac{1+\sqrt5}{2}\).
By the way, \(P(x) = \sum_{i=0}^\infty F_i x^i\) is called a generating function.
Ethereum's ECDSA uses the elliptic curve secp256k1, defined as
\[y^2 = x^3 + 7\ (\mathrm{mod}\ p)\]
where \(p\) is a very big prime number.
Here's what the continuous version (i.e. without mod) of the curve looks like:
The continuous elliptic curve is the set of all points \((x, y)\) in the plane such that \(y^2 = x^3 + 7\). Because \(y\) is squared, the curve is symmetric about the X-axis, i.e. \((x, y)\) is on the curve if and only if \((x, -y)\) is.
When we switch to mod \(p\), things get complicated:
To draw that picture I used \(p = 97\), a small prime. The blue line is the continuous curve, while the dots are the solutions in \(\mathbb{Z}_p\times\mathbb{Z}_p\). Note that those solutions must always be finitely many, since they lie on a \(p\times p\) grid.
This figure only shows the upper right part (1st quadrant) of the previous one, so we can't see the symmetry of the continuous curve. Yet, the points in \(\mathbb{Z}_p\times\mathbb{Z}_p\) show a new symmetry: they're reflected across the horizontal line \(y = p/2\). That makes sense:
Let's plot the negative solutions as well:
As we can see, the part above is identical to the part below, since adding \(p\) or \(-p\) to a coordinate doesn't change anything mod \(p\).
A group \(G\) is a set of elements equipped with a binary operation, \(+\), such that:
If for all \(a, b\in G\) we have \(a+b = b+a\), then \(G\) is abelian or commutative.
Notice that we can't have two distinct identities \(0_1 \neq 0_2\), since
\[0_1 = 0_1+0_2 = 0_2\]
Let's break it down:
Therefore, the two identities must be equal.
The same goes for the additive inverse. If \(x\) and \(y\) are opposites of \(a\) then:
\[ \begin{align*} a+x &= 0 \\ a+y &= 0 \end{align*} \]
That means that:
\[ \begin{align*} a+x &= a+y &\implies \\ x+(a+x) &= x+(a+y) &\implies \\ (x+a)+x &= (x+a)+y &\implies \\ 0+x &= 0+y &\implies \\ x &= y \end{align*} \]
So there's only one inverse per element.
Given a subset \(S\) of elements from \(G\), we can define the subgroup \(G_S\) of \(G\) generated by \(S\) as the smallest group that includes all the elements of \(S\). We say that \(S\) is a generator of \(G_S\).
In this article we'll only describe in detail the case where \(S\) has just one element. For simplicity, we'll use the same symbol for the subgroup and its generator, so a generator \(G\) generates the (sub)group \(G\) defined as follows:
\[G = \{0, G, G+G, G+G+G, G+G+G+G, \ldots\}\]
That's very cumbersome to read and write, so let's define
\[nG = G_1 + G_2 + \cdots + G_n,\ \mathrm{where}\ G_1 = G_2 = \cdots = G_n = G\]
Now we can rewrite the definition as
\[G = \{0, G, 2G, 3G, 4G, \ldots\}\]
or even
\[G = \{0G, 1G, 2G, 3G, 4G, \ldots\}\]
where we define \(0G = 0\).
Here's another definition of a group.
A group \(G\) is a set of elements equipped with a binary operation, \(\cdot\), such that:
To write the new definition, I simply copied and pasted the previous one, and replaced:
I also tweaked the text a little, but that's it. The two definitions are perfectly equivalent, and the only difference is notational. Feel free to use any symbol you want for the operator, the identity or whatever...
You should convince yourself that the real numbers form a group:
ECDSA defines a group over the points on the curve mod \(p\). To do that, we first need to define an addition between points.
Here's how it's done on the continuous version of the elliptic curve:
If \(P\) and \(Q\) are two points on the curve, then \(P+Q\) is the reflection across the X-axis of the intersection between the curve and the line passing through \(P\) and \(Q\).
The line through \(P\) and \(Q\) always intersects the curve at a third point, except when the line is perfectly vertical. In that case, the line is said to intersect the curve at \(0\), called the point at infinity. The point \(0\) acts as the identity, but is not really part of the curve, so it needs to be handled as a special case.
Now, observe that the line through \(R = (x, y)\) and \(-R = (x, -y)\) is vertical, so \(R+(-R)=0\), as suggested by the "\(-\)" sign.
Note
Usually, the point at infinity is denoted by \(\mathcal{O}\) and the origin by \(0 = (0, 0)\). However, since we have no need for the origin in this context, we'll denote the point at infinity by \(0\), stressing the fact that it's the zero of the group.
When \(P = Q\), the line through \(P\) and \(Q\) is taken to be the tangent to the curve at \(P\) (or, equivalently, \(Q\)). It makes sense:
After all, the animation is continuous when the slope of the line is continuous, and the slope is continuous at a point when it's equal to its limit, i.e. the derivative, at that point.
Here's a figure with a fixed point \(P\) and secants through \(P\) and several \(Q_i\) points that converge to \(P\). I chose a color map such that the closer a \(Q_i\) is to \(P\), the bluer the secant through it becomes.
By the way, we still count the tangent line as intersecting the elliptic curve in three points, with two coinciding at the point of tangency.
But how is it possible that even an "almost vertical" line through two points on the curve always intersects the curve at a third point? Such a line intersects the curve either at a point towards \((+\infty, +\infty)\) or \((+\infty, -\infty)\).
For \(x\to+\infty\):
What we're saying is that when \(x\) becomes very big, additive terms such as \(q\) and \(7\) are dwarfed and can be ignored, so the line grows like \(mx\), while the curve grows like \(x^{3/2}\). The curve grows asymptotically faster, so, when \(m > 0\), the upper branch of the curve will hit the line from below and cross it sooner or later. Similarly, when \(m < 0\), the lower branch of the curve will hit the line from above and cross it.
Here's a visual example for \(m>0\):
The algebra is easy enough. We just compute the intersection between the curve and the line. Let's start from the two points \(P = (x_P, y_P)\) and \(Q = (x_Q, y_Q)\).
If the line is vertical, the third intersection point is \(0\). Otherwise, its equation has the form \(y = mx + q\), where
\[ \begin{align*} m &= \frac{y_Q-y_P}{x_Q-x_P} \\ q &= y_P - m x_P \end{align*} \]
We need to solve for \(x\) and \(y\) the system
\[ \left\{ \begin{align*} y &= mx + q \\ y^2 &= x^3 + 7 \end{align*} \right. \]
We substitute the first equation into the second and get
\[ \begin{gather*} (mx + q)^2 = x^3 + 7 &\iff\\ m^2x^2 + q^2 + 2mqx = x^3 + 7 &\iff\\ x^3 - m^2x^2 - 2mqx + 7 - q^2 = 0 \end{gather*} \]
Before giving in to despair, we remember that we already know two solutions, \(x_P\) and \(x_Q\), and we're looking for the third point \(-R = (x_{-R}, y_{-R})\). This means that the LHS of the equation in \(x\) must be factorizable as
\[(x - x_P)(x - x_Q)(x - x_{-R}) = 0\]
Let's expand it to get
\[x^3 + x^2(-x_P-x_Q-x_{-R}) + \ldots = 0\]
The second term is all we need: for the two LHS to be equal, we must have
\[ \begin{align*} -m^2 &= -x_P-x_Q-x_{-R} \iff\\ x_{-R} &= m^2 - x_P - x_Q \end{align*} \]
The \(y\) coordinate is simply \(y_{-R} = m x_{-R} + q\).
Therefore, the sum of the two points can be computed as
\[ \left\{ \begin{align*} m &= (y_Q-y_P)/(x_Q-x_P) \\ x_R &= m^2 - x_P - x_Q \\ y_R &= m (x_P - x_R) - y_P \end{align*} \right. \]
As we can see, finding the sum of two points requires subtractions, multiplications, and one division to compute \(m\). To sum two points on the curve mod \(p\), we just need to do the calculations mod \(p\). We know how to "divide" mod \(p\), so we're good.
Let's briefly go through the case with \(P=Q=(x,y)\). Recall that we need to consider the tangent to the curve at \((x,y)\). Note that for \(y=0\) the tangent is perfectly vertical, so the result is \(0\) (look at the figure). For \(y\neq 0\), we need to compute the derivative.
We start from the equation
\[y^2 = x^3 + 7\]
and derive both sides with respect to \(x\):
\[2y \frac{dy}{dx} = 3x^2\]
We solve for the derivative:
\[\frac{dy}{dx} = \frac{3x^2}{2y}\]
That's our \(m\).
A complete implementation will also handle the (trivial) edge cases with \(P=0\) or \(Q=0\), of course.
Note
In previous sections, we wrote \(P+Q=R\), where \(-R\) was the intersection point and \(R\) its reflection, i.e. the final result. However, in this section, and only in this section, we assume that \(P\), \(Q\), and \(R\) are on the same line, so the intersection must be called \(R\) and its reflection \(-R\). The same goes for \(P\) and \(Q\).
Having to reflect the intersection point might seem a little arbitrary at first, but if we think about it, not reflecting it is problematic. Indeed, since the three points lie on the same line, without reflection all the following equations would hold:
By summing the first two equations, we'd get \(2P+Q+R=R+Q\), i.e. \(P=0\). Analogously, by summing the last two equations, we'd get \(R=0\). Since \(P=Q=R=0\), that rule would only work for \(0\).
The correct rule will look less asymmetric if we think of it as \(P+Q+R=0\), which gives
What about the point at infinity? Where does it come from? All I know is that it has to do with the so-called projective space.
I got somewhat acquainted with that space when I was doing 3D graphics. In 3D, we may add a 4th coordinate, so that \((x, y, z)\) is represented by \((wx, wy, wz, w)\) and some computations become more regular (i.e. with fewer exceptions). At the end, we divide by \(w\) to go back to the usual coordinates.
There's also the 2D case when we project a 3D scene onto a 2D screen: \(\pi(x, y, z) = (x/z, y/z)\), where I used \(\pi\) for projection. This has to do with how we perceive the world, so that the farther an object is from us, the smaller it looks (assuming, from our POV, that \(z\) is the distance of the object from us).
Let's say we have some 3D space. We make it projective by imposing that, in general, \((x, y, z) \sim (\lambda x, \lambda y, \lambda z)\) for all \(\lambda\neq 0\), and \((x, y, z)\neq(0, 0, 0)\), where \(\sim\) means equivalent. In words, all non-zero scalings of a non-zero point are equivalent. Those are all the points, origin excluded, on the same line through the origin.
The classes partition the punctured (i.e. without the origin) 3D space. The origin, if included, would be in its own singleton class \(\{0\}\) anyway. If we instead allowed \(\lambda = 0\), then two points \(x\) and \(y\) on different lines through the origin would violate the equivalence relation: \(x\sim 0\) and \(y\sim 0\), but \(x\nsim y\).
Indeed, an equivalence relation must follow three rules:
where "\(\wedge\)" means "and".
To remember the rules, we can just think of equality, which is also an equivalence relation:
So, if \(x\sim 0\) and \(0\sim y\), then we must have \(x\sim y\). If \(0\) is equivalent to elements that belong to different classes, then it breaks transitivity.
Since the origin doesn't belong to the projective space, any generic point \((x, y, z)\) in it is to be considered non-zero.
Back to our elliptic equation. On the 2D plane, the equation is \(y^2 = x^3 + 7\), but that won't work in the projective space. Since \((x, y, z)\sim (\lambda x, \lambda y, \lambda z)\), with \(\lambda\neq 0\), we'd like for \((\lambda x, \lambda y, \lambda z)\) to be on the curve whenever \((x, y, z)\) is.
Let's write the equation in the projective space as
\[Y^2 = X^3 + 7\]
and do the substitution \((X, Y, Z) = (\lambda x, \lambda y, \lambda z)\):
\[\lambda^2 y^2 = \lambda^3 x^3 + 7\]
We want that to hold whenever \((x, y, z)\) is a solution, i.e. whenever \(y^2 = x^3 + 7\). For that to happen, the equation must factorize as
\[f(\lambda)(y^2 - x^3 - 7) = 0\]
so that when the second factor is \(0\), the equation holds regardless of the factor with \(\lambda\).
We still have a \(Z\) to add, so why not use it to balance the degree of the terms? That is:
\[Y^2 Z = X^3 + 7 Z^3\]
Now the substitution gives
\[ \begin{gather*} \lambda^3 y^2 z = \lambda^3 x^3 + 7 \lambda^3 z^3 &\iff \\ \lambda^3 (y^2 z - x^3 - 7 z^3) = 0 \end{gather*} \]
We did it, but what about that annoying extra \(z\)? If we want to recover the original equation, we need to set \(z=1\), i.e. we need to restrict ourselves to \((x, y, 1)\).
That's perfectly fine, though: the original curve lies on the \(z=1\) plane, while on each \(z=\lambda\) plane, with \(\lambda\neq 0\), lies a \(\lambda\)-scaled version of the original curve:
With this setup, we can say that either all the elements of an equivalence class (a punctured line through the origin) are on the curve or none of them are.
There's actually an easier way to get the equation in \(x\), \(y\), and \(z\) coordinates. The original 2D curve is embedded in the 3D space by adding a \(z = 1\) coordinate, i.e.
\[(x, y)\mapsto (x, y, 1) \sim (\lambda x, \lambda y, \lambda) = (X, Y, Z)\]
Starting from a generic point \((X, Y, Z)\) with \(Z\neq 0\), we can go back to the 2D case by just dividing by \(Z\) and dropping the third coordinate, i.e.
\[(x, y) = (X/Z, Y/Z)\]
Now, let's substitute \(x = X/Z\) and \(y = Y/Z\) into the starting equation and get rid of the denominators:
\[ \begin{align*} y^2 &= x^3 + 7 \\ (Y/Z)^2 &= (X/Z)^3 + 7 \\ \frac{Y^2}{Z^2} &= \frac{X^3}{Z^3} + 7 \\ Y^2 Z &= X^3 + 7Z^3 \end{align*} \]
One can also apply other projections. For instance, \(x = X/Z^2\) and \(y = Y/Z^3\) lead to
\[ \begin{align*} y^2 &= x^3 + 7 \\ (Y/Z^3)^2 &= (X/Z^2)^3 + 7 \\ \frac{Y^2}{Z^6} &= \frac{X^3}{Z^6} + 7 \\ Y^2 &= X^3 + 7Z^6 \end{align*} \]
This is actually nicer and used to greatly speed up computations. It's called Jacobian projection.
We still haven't solved the mystery of the point at infinity. That was the main reason why we decided to explore projective spaces.
We know that a vertical line intersects the planar elliptic curve either at no points at all or at the points \(P\), \(-P\), and \(0\) for some point \(P\), where \(0\) is the so-called point at infinity. Let's try to make sense of it.
On the plane, a vertical line has equation \(x = k\), but in our projective space that's the equation of a plane. Like with the elliptic curve, we want to upgrade the equation so that if \((x, y, z)\) is on the line, then so is \((\lambda x, \lambda y, \lambda z)\) for \(\lambda\neq 0\). We use the same substitution as before:
\[ \begin{align*} x &= k \\ \frac{X}{Z} &= k \\ X &= kZ \end{align*} \]
So the equation of the vertical plane becomes \(X = kZ\), which represents a family of \(Z\)-scaled vertical lines. The equation \(X = kZ\) makes sense since, as \(Z\) gets closer to \(0\), the line must also get closer to the origin because everything, curve included, gets scaled down.
Let's find the intersections between the curve and the line by solving
\[ \begin{cases} Y^2 Z = X^3 + 7Z^3\\ X = kZ \end{cases} \]
We substitute the second equation into the first:
\[Y^2 Z = k^3 Z^3 + 7Z^3\]
That can be rewritten as
\[Z (Z^2 (k^3 + 7) - Y^2) = 0\]
For \(Z\neq 0\), we can divide by \(Z\), so we're left with
\[Z^2 (k^3 + 7) - Y^2 = 0\]
which gives two solutions for each \(Z\neq 0\) because of that \(Y^2\). Those two solutions correspond to two points \(P\) and \(-P\).
For \(Z = 0\), we get \(X = 0\) from the second equation, i.e. the solutions are \((0, \lambda, 0)\) for \(\lambda\neq 0\), which is the Y-axis without the origin. We can take \((0, 1, 0)\) as the representative of that class, which is exactly the point at infinity. As we can see, it doesn't live in any plane with the curve, so it doesn't exist in our original 2D space, but we already knew that.
This reminded me that we never designated representatives for the equivalence classes. Let's see:
So, we have three groups of representatives:
Tip
You can safely skip this section!
Computing \(m\), the slope of the line through the points \((x_1, y_1)\) and \((x_2, y_2)\), requires a division, which, mod \(p\), is a relatively expensive operation (compared to simple additions and multiplications).
Let's recall how to compute the point \((x_3, y_3) = (x_1, y_1) + (x_2, y_2)\), assuming \(x_1\neq x_2\):
\[ \begin{align*} m &= (y_2-y_1)/(x_2-x_1) \\ x_3 &= m^2 - x_1 - x_2 \\ y_3 &= m (x_1 - x_3) - y_1 \end{align*} \]
To go from the plane to the 3D projective space, we proceed as we did before with the elliptic curve and the line equations, i.e. we make the substitution \((x,y) = (X/Z, Y/Z)\).
Let's start with \(m\):
\[ \begin{align*} m &= \frac{y_2-y_1}{x_2-x_1} \\ &= \frac{Y_2/Z_2 - Y_1/Z_1}{X_2/Z_2 - X_1/Z_1} \\ &= \frac{Y_2/Z_2 - Y_1/Z_1}{X_2/Z_2 - X_1/Z_1} \cdot \frac{Z_1 Z_2}{Z_1 Z_2} \\ &= \frac{Y_2 Z_1 - Y_1 Z_2}{X_2 Z_1 - X_1 Z_2} \end{align*} \]
We define
\[ \begin{align*} A &= Y_1 Z_2 \\ B &= Y_2 Z_1 - A \\ C &= X_1 Z_2 \\ D &= X_2 Z_1 - C \end{align*} \]
Therefore:
\[ m = \frac{Y_2 Z_1 - Y_1 Z_2}{X_2 Z_1 - X_1 Z_2} = \frac{B}{D} \]
Now we deal with \(x_3\):
\[ \begin{align*} x_3 &= m^2 - x_1 - x_2 \\ \frac{X_3}{Z_3} &= \frac{B^2}{D^2} - \frac{X_1}{Z_1} - \frac{X _2}{Z_2} \\ X_3 &= Z_3 \frac{B^2 Z_1 Z_2 - D^2 (X_1 Z_2 + X_2 Z_1)}{D^2 Z_1 Z_2} \\ &= Z_3 \frac{B^2 Z_1 Z_2 - D^2 (D + 2C)}{D^2 Z_1 Z_2} \end{align*} \]
It's \(y_3\)'s turn:
\[ \begin{align*} y_3 &= m (x_1 - x_3) - y_1 \\ \frac{Y_3}{Z_3} &= \frac{B}{D} \left(\frac{X_1}{Z_1} - \frac{X_3}{Z_3}\right) - \frac{Y_1}{Z_1} \\ Y_3 &= Z_3 \frac{B(X_1 Z_3 - X_3 Z_1) - Y_1 D Z_3}{D Z_1 Z_3} \\ &= \frac{B(X_1 Z_3 - X_3 Z_1) - Y_1 D Z_3}{D Z_1} \\ &= \frac{B\left(X_1 Z_3 - \left( Z_3 \frac{B^2 Z_1 Z_2 - D^2 (D + 2C)}{D^2 Z_1 Z_2} \right) Z_1\right) - Y_1 D Z_3}{D Z_1} \\ &= Z_3\frac{B\left(X_1 - \frac{B^2 Z_1 Z_2 - D^2 (D + 2C)}{D^2 Z_2}\right) - Y_1 D}{D Z_1} \\ &= Z_3\frac{B\left(X_1 - \frac{B^2 Z_1 Z_2 - D^2 (D + 2C)}{D^2 Z_2}\right) - Y_1 D}{D Z_1}\cdot \frac{D^2 Z_2}{D^2 Z_2} \\ &= Z_3\frac{B(X_1 Z_2 D^2 - B^2 Z_1 Z_2 + D^2 (D + 2C)) - Y_1 D^3 Z_2}{D^3 Z_1 Z_2} \\ &= Z_3\frac{B(D^2 (D + 3C) - B^2 Z_1 Z_2) - Y_1 D^3 Z_2}{D^3 Z_1 Z_2} \\ \end{align*} \]
We substituted \(X_3\) into \(Y_3\), so we could factor out \(Z_3\), since \(X_3 = Z_3 (\ldots)\).
We end up with
\[ \begin{align*} X_3 &= Z_3\frac{B^2 Z_1 Z_2 - D^2 (D + 2C)}{D^2 Z_1 Z_2} \\ Y_3 &= Z_3\frac{B(D^2 (D + 3C) - B^2 Z_1 Z_2) - Y_1 D^3 Z_2}{D^3 Z_1 Z_2} \end{align*} \]
and by choosing \(Z_3 = D^3 Z_1 Z_2\), we get rid of the denominators:
\[ \begin{align*} X_3 &= D(B^2 Z_1 Z_2 - D^2 (D + 2C)) \\ Y_3 &= B(D^2 (D + 3C) - B^2 Z_1 Z_2) - Y_1 D^3 Z_2 \\ Z_3 &= D^3 Z_1 Z_2 \end{align*} \]
We can clean that up further by defining
\[ \begin{align*} E &= B^2 Z_1 Z_2 - D^2 (D + 2C) \\ F &= D^3 Z_2 \end{align*} \]
which results in
\[ \begin{align*} X_3 &= DE \\ Y_3 &= B(D^2 C - E) - Y_1 F \\ Z_3 &= Z_1 F \end{align*} \]
Note that further micro-optimizations are possible. For instance, we shouldn't compute \(D^2\) and \(D^3\) separately.
I hope my calculations are correct, since this is my first time doing them. Either way, I'm satisfied with the result from a pedagogical point of view. I hope you are as well. As we can see, the common denominator was put in \(Z_3\) to avoid intermediate divisions. Now we can add many points together without any division and only do one single division when we want to go back to our 2D plane:
\[ \begin{align*} Z_{\text{inv}} &= Z^{-1} \\ (x, y) &= (XZ_{\text{inv}}, YZ_{\text{inv}}) \end{align*} \]
That's one slow (at least in \(\mathbb{Z}_p\)) division and 2 fast multiplications.
Note that the \(P=Q\) case is handled similarly. Moreover, the same approach will also work for the Jacobian projection, i.e. for \((x, y) = (X/Z^2, Y/Z^3)\).
This is almost a philosophical observation. When we substitute \(x\) with \(X/Z\), we're not promoting \(x\) to a fraction, but we're reexpressing it as a fraction, since they're assumed to be equal.
For example, if \(x\) is a simple integer in \(\mathbb{Z}\) (without mod) and we replace it with \(X/Z\) where \(X\) and \(Z\) are also in \(\mathbb{Z}\), then we're ranking up from integers to rational numbers, which is a promotion. Indeed, unless \(X\) is divisible by \(Z\) or we're willing to lose some information, we won't be able to go back to \(x\) when the time comes.
In the ECDSA case, though, \(x\) is in \(\mathbb{Z}_p\), with \(p\) prime, so \(X/Z\) is also in \(\mathbb{Z}_p\): we're not promoting \(x\) to something more, but just reexpressing it.
Let's say we have a huge matrix that can be factorized into two small matrices because it's low-rank (i.e. many of its rows or columns are linear combinations of a selected few). Instead of carrying around the huge matrix during the computations, we may want to keep it in factorized form, \((L, R)\), and then update the factorized form itself:
\[(L', R') = (f(L, R), g(L, R))\]
One way to find \(f\) and \(g\) is to do the substitution \(M = LR\) into whatever expression we want to evaluate involving \(M\), and put the result back into factorized form. Note that if we end up with \(L=I\) or \(R=I\) (where \(I\) is the identity matrix), then the factorization is useless for our purposes.
ECDSA uses the addition operation we defined above to generate a group from a predetermined generator \(G\). All the considered points are on the curve mod \(p\), meaning that their coordinates are in \(\mathbb{Z}_p\). Here's the group:
\[G = \{0, G, 2G, 3G, \ldots, (N-1)G\}\]
Notice that \(G\) has only a finite number of elements, which is to be expected since the points lie on a \(p\times p\) grid, which contains \(p^2\) distinct points at most.
We start from \(0\) and keep adding \(G\) until we loop, i.e. we get a point that we've already seen. Let's assume this is our current list:
\[0, G, 2G, 3G, \ldots, kG, \ldots, hG\]
We assume we've just looped, so the first \(h\) elements are all distinct, and \(hG = kG\), with \(k < h\).
We must have \(0 = hG-kG = (h-k)G\). The only elements in the list that can be 0 are the first one and the last one. Since \(h-k>0\), \((h-k)G\) must be the last element, so \(h-k=h\), which gives \(k=0\). This means that when we loop we restart from \(0\).
So we end up with the following group:
\[G = \{0, G, 2G, 3G, \ldots, (N-1)G\}\]
\(G\) has order \(N\), i.e. it has \(N\) elements. This looping should remind you of \(\mathbb{Z}_N\):
\[\mathbb{Z}_N = \{0, 1, 2, 3, \ldots, N-1\}\]
Indeed, \(\mathbb{Z}_N\) is also a (commutative) group:
Moreover, it's generated by \(1\):
\[\mathbb{Z}_N = \{0, 1, 1+1, 1+1+1, \ldots, (N-1)1\}\]
If we couldn't inspect the single elements, and we just used the group laws, we'd actually be unable to tell \(G\) and \(\mathbb{Z}_N\) apart.
For instance, let's say we're given, as programmers, an opaque type Element together with an identity zero and an operation add. Would we be able to tell whether we're working with \(G\) or \(\mathbb{Z}_N\) without looking inside Element or at the implementation? No, we wouldn't. By defining a common API, we abstracted away the differences.
So, we've got ourselves an isomorphism between groups:
\[aG + bG = (a+b)G\]
More formally, let's define \(f: a\mapsto aG\), which is a bijection, i.e. a 1-1 mapping between all the elements of \(\mathbb{Z}_N\) and all the elements of \(G\). This means that we can invert \(f\) and use \(f^{-1}\) to go the other direction (however computationally expensive it is to do).
Then the equation above can be rewritten as
\[f(a) +_G f(b) = f(a+_{\mathbb{Z}_N}b)\]
That means that we can either
The final result will be the same.
Another way of saying this is that we can move back and forth between \(G\) and \(\mathbb{Z}_N\) without losing information.
For example, for \(N = 7\):
In the first case the looping comes from \(G\), while in the second case it comes from \(\mathbb{Z}_7\). In the second case we only worked inside the parentheses, i.e. with the numbers in \(\mathbb{Z}_7\): we didn't touch \(G\) at all.
The idea is to work with numbers in \(\mathbb{Z}_N\), which are easier to work with, and then transform them into points by multiplying them by \(G\). While it's very easy to go from \(k\) to \(kG\), it's computationally infeasible to go back from \(kG\) to \(k\).
It shouldn't surprise that ECDSA represents private keys as numbers in \(\mathbb{Z}_N\), and then multiplies them by \(G\) to get the associated public keys. This makes recovering the private key from a public key computationally infeasible.
I'll cheat a little and read my notes for the signing part of the algorithm:
[](#__codelineno-2-1)* Message to sign: [](#__codelineno-2-2) z = 256-bit message digest [](#__codelineno-2-3) [](#__codelineno-2-4)* Signature generation (in F_n, i.e. mod n): [](#__codelineno-2-5) k = rand_unif({1, ..., n-1}) # ephemeral nonce [](#__codelineno-2-6) R = k * G = (R_x, R_y) [](#__codelineno-2-7) r = R_x mod n # 1st component [](#__codelineno-2-8) if r = 0, restart and choose a new k [](#__codelineno-2-9) s = (k^{-1} * (z + r * d)) mod n # 2nd component [d = account private key] [](#__codelineno-2-10) if s = 0, restart and choose a new k [](#__codelineno-2-11) v = R_y mod 2 # AKA recid, recovery_id, or is_y_odd [](#__codelineno-2-12) signature = (r, s, v)
Let's see:
Note that I wrote \(F_n\) or, better, \(\mathbb{F}_n\) instead of \(\mathbb{Z}_n\) because, when \(n\) is prime, the latter is actually a field. The coordinates we've been working with for all this time are in \(\mathbb{Z}_p\), which is also a field, since \(p\) is prime. That's why I wrote "some 3D space" before: depending on which field we choose, we'll end up with a different 3D space.
We basically already observed that \(\mathbb{Z}_n\), with \(n\) prime, is a field, but we never spelled it out.
That's actually why our math works both in the continuous case and mod \(p\). The theory only requires that the coordinates are elements of a field. It doesn't matter which one.
A field \(\mathbb{F}\) is a set equipped with two binary operations, addition and multiplication, such that:
Note that \(\mathbb{F}\setminus\{0\}\) means "\(\mathbb{F}\) minus \(\{0\}\)", i.e. \(\mathbb{F}\) without the element \(0\).
\(\mathbb{Z}_n\setminus\{0\}\), with \(n\) prime, is a group under multiplication because:
The \(0\) element has no multiplicative inverse since there's no \(x\) such that \(0\cdot x = 1\). That would be against the very definition of \(0\) as the identity for the addition operation.
When \(n\) is not prime, we lose the group under multiplication because the inverse doesn't exist for all elements. For instance, let's consider mod \(6\):
Since \(3\) and \(6\) are not coprime, \(3\) has no inverse.
When \(n\) is not a prime number, \(\mathbb{Z}_n\) is just a (commutative) ring, which has a weaker structure.
Well-known examples of fields are:
The integers are clearly not a field since, for instance, \(3x = 1\) has no integer solution, so \(3\) has no multiplicative inverse. So, by adding a mod \(p\), with \(p\) prime, we gain more structure, and we get ourselves a field!
Back to the algorithm. The first two lines are pretty easy:
Since \(d\) is the private key, then \(dG\) is the public key. We'll call it \(Q\).
Given \(r\), \(s\), \(Q\), and the message, we can verify the signature by:
We know that \(R = kG\) and that \(s\) contains \(k\), but in inverse form. If we invert \(s\) and multiply it by \(G\), we get
\[s^{-1}G = (z+rd)^{-1}kG = (z+rd)^{-1}R\]
Mhm... if we had \(d\), we could compute \((z+rd)\) and use it to cancel \((z+rd)^{-1}\) and get \(R\).
While we don't have \(d\), we do have \(Q = dG\), which means that although we can't compute \((z + rd)\), we can compute \((z + rd)G\):
\[(z + rd)G = (zG + rQ)\]
If we multiply that by \(s^{-1}\) we get
\[s^{-1}(zG + rQ) = s^{-1}G(z+rd) = (z+rd)^{-1}R(z+rd) = R\]
In words, we factor out \(G\) from \(zG + rQ\) and form \(s^{-1}G\), which is just \((z+rd)^{-1}R\), as we saw at the beginning.
We did it! Now we check that \(r = R_x\ \mathrm{mod}\ n\).
Here's the algorithm:
Observe that \(((zw)G + (rw)Q)\) is more efficient than \(s^{-1}(zG + rQ)\) because, in the former, \(w\) multiplies two numbers, while, in the latter, \(s^{-1}\) multiplies a point.
Now the only problem is that in Ethereum the public key \(Q\) doesn't come with the signature. However, we can recover it from \(r\), \(s\), and \(v\).
As we know, \(Q = dG\) and \(s = k^{-1}(z+rd)\), so we should try multiplying \(s\) by \(G\):
\[sG = k^{-1}(z+rd)G = k^{-1}(zG + rQ)\]
To solve for \(Q\), we need to get rid of that \(k^{-1}\). We can't just multiply \(sG\) by \(k\) because we don't know \(k\)... but wait:
\[(zG + rQ) = ksG = s(kG) = sR\]
Therefore:
\[Q = r^{-1}(sR - zG)\]
Unfortunately, we don't know \(R\). But can we recover it? We know \(r = R_x\), so we only need to recover \(R_y\), since \(R = (R_x, R_y)\). We're forgetting something, though: \(r = R_x\ \mathrm{mod}\ n\). Recall that the coordinates of the points on the curve are in \(\mathbb{Z}_p\), not \(\mathbb{Z}_n\). We need to recover the original \(R_x\) from \(r\).
We know that \(R_x \in \{0, \ldots, p-1\}\), and, apparently, \(n\) is just a little smaller than \(p\), which means that \(R_x = r + jn\) for some very small \(j\). We start from \(j = 0\) and keep trying as long as \(r + jn < p\). We say that \(r + jn\) is a candidate for \(R_x\). Let's call the current candidate simply \(x\).
Given \(x\), we can recover \(y\) by using the equation \(y^2 = x^3 + 7\ (\mathrm{mod}\ p)\) itself and solve for \(y\). There are fast algorithms to do that. If there's no solution, we try the next candidate. Otherwise, we get two possible solutions: \(y\) and \(-y\). If you recall, \(v = R_y\ \mathrm{mod}\ 2\), which tells us the solution to pick:
One might wonder why we preserve the least significant bit of \(R_y\) to select the correct \(y\). That's because if \(y\) is in \(\{0, \ldots, p-1\}\), then \(-y\ \mathrm{mod}\ p = p-y\). It's clear that \(y + (p-y) = p\), which is odd (being a big prime), which implies that only one of \(y\) and \(-y\) can be odd (or even) mod \(p\).
Anyway, once we have \(R = (x, y)\), we compute \(Q = r^{-1}(sR - zG)\).
Now we must check that \(Q\) is valid, i.e. that \(Q\) is on the curve. If it's not, then we try the next candidate.
We should actually check that \(Q\) is in \(G\), but, apparently, \(G\) contains all the solutions of \(y^2 = x^3 + 7\ \mathrm{mod}\ p\), so if \(Q\) is on the curve then it's also in \(G\).
We left this for last, but after all we've been through, this is disappointingly straightforward.
If \((r, s, v)\) is a signature created by signing a message \(M\) with a private key \(d\), then so is \((r, n-s, 1-v)\).
That's it. The problem arises when someone blacklists \((r, s, v)\) (once it's been used) believing that this will prevent double spending. An attacker will use the signature \((r, n-s, 1-v)\) to send the same message for a second time, bypassing the blacklist.
Instead, programs should use nonces contained directly in the messages and blacklist the nonces or the messages themselves.
Let's see why both signatures are valid.
Let's recall the signing algorithm:
[](#__codelineno-3-1)* Message to sign: [](#__codelineno-3-2) z = 256-bit message digest [](#__codelineno-3-3) [](#__codelineno-3-4)* Signature generation (in F_n, i.e. mod n): [](#__codelineno-3-5) k = rand_unif({1, ..., n-1}) # ephemeral nonce [](#__codelineno-3-6) R = k * G = (R_x, R_y) [](#__codelineno-3-7) r = R_x mod n # 1st component [](#__codelineno-3-8) if r = 0, restart and choose a new k [](#__codelineno-3-9) s = (k^{-1} * (z + r * d)) mod n # 2nd component [d = account private key] [](#__codelineno-3-10) if s = 0, restart and choose a new k [](#__codelineno-3-11) v = R_y mod 2 # AKA recid, recovery_id, or is_y_odd [](#__codelineno-3-12) signature = (r, s, v)
Now let's see what happens if we use \(-k\) instead of \(k\):
\[ \begin{align*} k' &= -k \\ R' &= k'G = -kG = -R = (R_x, p-R_y) \\ r' &= R'_x\ \mathrm{mod}\ n = R_x\ \mathrm{mod}\ n = r \\ s' &= (k'^{-1} (z + r'd))\ \mathrm{mod}\ n \\ &= -(k^{-1}(z + rd))\ \mathrm{mod}\ n \\ &= -(k^{-1}(z + rd)\ \mathrm{mod}\ n)\ \mathrm{mod}\ n \\ &= -s\ \mathrm{mod}\ n \\ &= n-s \\ v' &= R'_y\ \mathrm{mod}\ 2 = (p-R_y)\ \mathrm{mod}\ 2 = 1-v \\ (r', s', v') &= (r, n-s, 1-v) \\ \end{align*} \]
That is, given the signature computed with \(k\), we can trivially get the one computed with \(-k\).
Basically, by using \(-k\) instead of \(k\), we reflect \(R\) across the X-axis and flip \(v\) to signal that we switched the \(y\) coordinate.
I hope you enjoyed the ride and deepened your understanding of ECDSA as much as I did.
If you spot any errors, you're welcome to open an issue or leave a comment below, but keep in mind that the article's nature will remain as stated in the disclaimer.
I won't be revisiting this years later unless something truly significant comes up.
Until next time!