Everything is logarithms

The baseless log here is just a torsor [0]!

Lots of things are torsors: position, currency values, calendar dates, etc. The values themselves are arbitrary, and translating or scaling them by some value doesn't make a functional difference. Torsors let us talk about these things without needing to make such an arbitrary choice a priori.

In the case of baseless logs, the underlying set is "information units", i.e. log 2 is bits, log e is nats, log 10 is digits, etc. The conversion factors give us the torsor's group, and picking a privileged unit is just a trivialization of the torsor.

The vector division notation is, similarly, encoding a g-torsor in precisely the same way as length units are.

The examples so far are all torsors with abelian groups, but specifying position requires choosing both an origin and a length unit. The group of this torsor is a suitable semidirect product of translations and scalings, which is non-abelian.
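
To make the non-commutativity concrete, here is a small Python sketch. The `Affine` class and its field names are made up for illustration; it models a change of origin and length unit on a 1-D position as x -> scale * x + shift and composes two such changes in both orders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Hypothetical 1-D change of coordinates: x -> scale * x + shift."""
    scale: float
    shift: float

    def __call__(self, x: float) -> float:
        return self.scale * x + self.shift

    def then(self, other: "Affine") -> "Affine":
        """Return the map that applies self first, then other."""
        # other(self(x)) = other.scale * (self.scale * x + self.shift) + other.shift
        return Affine(other.scale * self.scale,
                      other.scale * self.shift + other.shift)


translate = Affine(scale=1.0, shift=3.0)   # move the origin
rescale = Affine(scale=2.0, shift=0.0)     # change the length unit

a = translate.then(rescale)   # x -> 2 * (x + 3)
b = rescale.then(translate)   # x -> 2 * x + 3
```

The two compositions agree on the scale but not on the shift, which is the sense in which the order of "pick an origin" and "pick a unit" matters.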

Most of the time we just implicitly choose a trivialization, which often causes confusion because it identifies objects with operations on them, e.g. conflating vectors as positions with vectors as translations. The author's treatise on problems with geometric algebra [1] even brings up this point!

[0]:https://math.ucr.edu/home/baez/torsors.html

[1]:https://alexkritchevsky.com/2024/02/28/geometric-algebra.htm...

Logs are awesome. I started a math textbook from the 1920s a while ago, and all the calculations relied on tabulated logs: you would look up the number's log in a table to reduce the operation's degree, then convert back to the ordinary representation. This would reduce operations like finding cube roots to division, which could be converted to log-log to be further reduced to subtraction before you restored the ordinary notation. It feels like you're using a magic wormhole or something when you're doing this stuff by hand; it's really neat.
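
A rough Python sketch of that workflow, assuming a four-figure table of common (base-10) logarithms. The function names are invented for illustration, and `round(math.log10(x), 4)` merely stands in for the table lookup; it is not taken from the textbook.

```python
import math


def table_log(x: float, places: int = 4) -> float:
    """Stand-in for looking up log10(x) in a printed table with `places` decimals."""
    return round(math.log10(x), places)


def table_antilog(y: float) -> float:
    """Stand-in for the reverse lookup: the number whose log10 is y."""
    return 10.0 ** y


def cube_root_via_logs(x: float) -> float:
    # Root extraction becomes a single division of the logarithm by 3.
    return table_antilog(table_log(x) / 3)


def product_via_logs(a: float, b: float) -> float:
    # Multiplication becomes addition of the two logarithms.
    return table_antilog(table_log(a) + table_log(b))
```

With only four decimals in the "table", results are accurate to roughly three or four significant figures, which matches the flavour of hand calculation.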

This essay needs a type system. Every time it says “log” it should say: log of what, into what?

It’s like audio where people say "dB" as if it answers the next question. Relative to what, measured how, and weighted for whom?

Author should brush up on https://en.wikipedia.org/wiki/Lie_theory

The term "baseless logarithm" is really nonsensical and using it would be a great mistake.

Nonetheless, where the author of TFA is correct is that logarithms are a single physical quantity, like length, area or volume, and that choosing the so called "base" is choosing the unit of measurement for logarithms.

Logarithms are included in the dimensional formulae of many derived physical quantities, e.g. for describing the attenuation or amplification of waves during their propagation, where one uses quantities like logarithm per length and logarithm per time.

Changing the "base" of logarithms modifies the numeric values of all derived physical quantities exactly in the same manner as changing any other fundamental unit of measurement, like the unit of length or the unit of time.

Like for any physical quantity, the complete value of a logarithm is independent of the unit of measurement, because it is the product between the numeric value and the unit of measurement. When the unit of measurement is changed, both the numeric value and the unit are changed and the product stays the same (i.e. the logarithm corresponds to the same ratio, regardless what base is used to compute a numeric value for the logarithm).

Nowadays, the unit of logarithms is normally chosen between the octave (binary logarithms), neper (hyperbolic logarithms) or bel (decimal logarithms).

The units of measurement for logarithms are not the bases, but the logarithms of the bases, which is why e.g. the value of the number "e", the base of the hyperbolic logarithms, is never needed in any computation. The only values that are needed are "ln 2" or its inverse "log2 e", which are used to convert the numeric values of logarithms when the unit of measurement is changed between those corresponding to binary logarithms and to hyperbolic logarithms (a.k.a. natural logarithms, but there is nothing more "natural" about hyperbolic logarithms than about any other kind of logarithms).
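
A minimal Python sketch of that point, following the convention above that a unit of logarithm is the logarithm of the base. The table and function names are made up for illustration, and the engineering conventions that distinguish power levels from amplitude levels for bels and nepers are deliberately ignored.

```python
import math

# Each unit is expressed in nepers, i.e. as the natural log of its base.
UNIT_IN_NEPERS = {
    "octave": math.log(2.0),   # binary logarithm
    "neper": 1.0,              # hyperbolic (natural) logarithm
    "bel": math.log(10.0),     # decimal logarithm
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Re-express a logarithmic quantity in another unit."""
    return value * UNIT_IN_NEPERS[from_unit] / UNIT_IN_NEPERS[to_unit]


def ratio_of(value: float, unit: str) -> float:
    """The ratio that the logarithmic quantity stands for (unit-independent)."""
    return math.exp(value * UNIT_IN_NEPERS[unit])


ten_octaves_in_bels = convert(10.0, "octave", "bel")
```

Note that `convert` only ever uses ln 2 and ln 10; the exponential appears only in `ratio_of`, when going back from the logarithm to the ratio itself.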

I think what's going on with the complex logarithm is basically the same as the logarithm that outputs the set of all possible bases for a vector space. The complex logarithm produces a Z-torsor, and the basis logarithm produces a GL(V)-torsor. There's probably some way to represent a choice of branch cut as a part of the choice of the base of the complex logarithm, and similarly the choice of a specific basis as part of the choice of base of the vector space base logarithm.
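
For the complex case, a small Python sketch may help: every value of the complex logarithm of z differs from the principal one by an integer multiple of 2πi, and all of them exponentiate back to z. `log_branch` is a hypothetical helper; only `cmath.log` and `cmath.exp` are standard-library calls.

```python
import cmath
import math


def log_branch(z: complex, k: int) -> complex:
    """k-th value of the multivalued complex logarithm of z (z != 0)."""
    # cmath.log returns the principal value; other values are shifted by 2*pi*i*k.
    return cmath.log(z) + 2j * math.pi * k


z = 1 + 1j
values = [log_branch(z, k) for k in range(-2, 3)]
back = [cmath.exp(w) for w in values]   # each entry should be close to z
```

Choosing k here plays the role of choosing a trivialization: the integers act on the set of values, but no value is singled out until a branch is fixed.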

All this would be way more interesting if it actually helped to demonstrate a novel mathematical fact. Right now it's more like notational play.

Wasn't there some scientific paper recently that proved that every operation can be represented as a logarithm? Like, the same as every logic gate can be derived from NAND gates

This sentiment (not necessarily the content) is what I'm striving to communicate with Mag World[0] (website and podcast so far).

[0] magworld.pw

Does this answer the question of why we see hyperoperations until exponentiation in physics, but not higher?

I can't believe he called normal logarithms 'based'

IIRC, Knuth uses lg for logarithm base 2.

That's a lot of ways to think about logarithms.

Logarithms are laughably simple once you've fully internalized the meaning of the log function; it simply answers the question:

"To what power must I raise the base to get the argument?"

This is why the output tapers out as you increase the argument; because even if you increase the argument exponentially, you only need a fixed increment in the power to reach that number... So if you increase the argument only by a fixed amount (linearly) instead of exponentially, then it makes sense that the output will grow sub-linearly.

I remember when I was doing algebra with logs many years ago at school, I was applying rules to remove the log from one side of the equation.

Then when I got to uni, I had to revise the rules but it was kind of silly of me because those rules can be trivially derived if you just think about what the log function means. Turns out I had been solving equations with logs throughout school without understanding what they even meant... It's only at university that I actually bothered to learn them.

Actually TBH. I didn't even fully understand powers for some time even though I was doing calculus with them at school. I only fully understood powers once I properly internalized the concept of k-ary trees as a proxy.

It's one thing to be able to apply something, another to understand it. And I think to innovate with something, as a tool, it's not enough to be able to apply it. You must understand it.

Look, the whole thing actually makes sense and the core idea is pretty cool because it's true that a lot of stuff in math looks identical. But in my opinion this is way too much of a macro-level overgeneralization and you risk throwing everything into the same pot, which ends up diluting the actual point of things. I mean, if you take a hammer and a meat mallet, at the end of the day they're both chunks of metal used to hit stuff, but if you bunch them together without making any distinction, you lose track of why you use one to drive nails into a wall and the other to prep cutlets. Saying everything is just one big logarithm is a nice mental exercise, but I feel like it flattens out the differences too much and makes you lose the practical utility of the individual math tools, which are meant to solve completely different problems.

I do know about torsors actually but I didn't think to link it from there. I guess I don't find the term very useful; it feels like things are still hard to think about even after you know it's a torsor!---but also, I think I need to get more familiar with the concept, because the other commenter on here who described my basis-logarithm as a "GL(V)-torsor" really said it much more succinctly than what I was hacking out manually.

Regardless of the terminology, I thought it was interesting because I have never seen the logarithm thought about in that way.

Using the term "torsor" for that mathematical concept has been a very bad choice, both because the concept does not have any obvious relationship with the meaning of the word and because the word "torsor" had already been used for a very long time in classical mechanics for a very different concept, i.e. for the quantity that must be null for a rigid body to stay in equilibrium (i.e. the pair of a resultant force and a resultant torque).

Unfortunately, in mathematics there already is a long tradition of reusing common words to designate concepts that have no relationship whatsoever with the original meanings of those words. This obfuscates the content of many mathematical books or research papers, because even when they state trivial facts the statements are opaque for those unfamiliar with the specific jargon used in that niche branch of mathematics.

Thanks for sharing, very interesting. I wonder how this maps to swe

The physical version of that magic wormhole is called a slide rule.

Got a PDF? I love old books like this.

care to share the name of the said book?

The important properties of the logarithm are structural: we usually do not care about units or bases, except when carrying out an actual numerical computation.

As developed in the article, informally, but somewhat sufficiently, the change of base formula shows that the choice of base is largely irrelevant: different bases give equivalent logarithms up to a constant factor.

The Taylor expansion of exp gives a more intrinsic and general definition of the exponential function. This allows exp to be generalised structurally to many algebraic settings, provided the relevant convergence conditions are met: for example, the complex exponential and its many possible logs, the matrix exponential, and so on…
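
As a rough illustration of the series definition, here is a Python sketch that sums the Taylor series term by term. The function name and the fixed term count are arbitrary choices for this example; it handles real and complex scalars only (a matrix version would need a matrix product in place of `*`).

```python
import math


def exp_series(x: complex, terms: int = 40) -> complex:
    """Partial sum of sum_{n>=0} x**n / n! using `terms` terms."""
    total = 1.0
    term = 1.0
    for n in range(1, terms):
        term = term * x / n      # x**n / n! built from the previous term
        total = total + term
    return total


approx_e = exp_series(1.0)              # real argument
euler = exp_series(1j * math.pi)        # complex argument, close to -1
```

The same loop works unchanged for both calls, which is the structural point: the definition only needs addition, multiplication, and division by integers, plus convergence.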

I still don't understand why audio dB are negative. That's relative to what? What happens at 0dB?

The first section details how the author thinks of "log N" with no base as an abstract object rather than a number. Or what are you referring to?

Interesting, it did not occur to me to think of those as two instances of the same phenomenon. Although I still find the complex analytic one hard to think about.

I read this kind of essay as a certain part of the arc by which new thoughts are formed: an act of large-scale pattern matching, laying out a bunch of cases which resemble each other, searching for the essential basis of the resemblance.

To post such a pattern allows the thought process to become distributed. Perhaps someone else will see the insight.

I think that's more about integrations/differentials not producing them (generally speaking). Physics likes to deal with integrals and differentiation as you calculate change over time or over spatial dimensions.

E.g. the integral of x^10 is x^11 / 11 + c. No hyper-operation appears; it's just another power (with a division).

The integral of log(x) is x log(x) - x + c. So still basically just a logarithm.

Even the integral of 2^x is just 2^x / log(2) + c. Still basically the same thing.

There's no easy way to pull a hyper-operation out.

A better way to understand logarithms is to start with the original motivation from Napier himself (https://sites.pitt.edu/~super1/lecture/lec44911/005.htm);

Seeing there is nothing (right well-beloved Students of the Mathematics) that is so troublesome to mathematical practice, nor that doth more molest and hinder calculators, than the multiplications, divisions, square and cubical extractions of great numbers, which besides the tedious expense of time are for the most part subject to many slippery errors, I began therefore to consider in my mind by what certain and ready art I might remove those hindrances. And having thought upon many things to this purpose, I found at length some excellent brief rules to be treated of (perhaps) hereafter. But amongst all, none more profitable than this which together with the hard and tedious multiplications, divisions, and extractions of roots, doth also cast away from the work itself even the very numbers themselves that are to be multiplied, divided and resolved into roots, and putteth other numbers in their place which perform as much as they can do, only by addition and subtraction, division by two or division by three.

This is what provides the intuition, viz. converting multiplication/division/etc. of large numbers into addition/subtraction of two other, smaller numbers. Logarithms as the inverse of exponentiation came much later. Starting with that definition generally confuses students, since they do not see the point of it all.

From https://en.wikipedia.org/wiki/History_of_logarithms;

Napier conceived the logarithm as the relationship between two particles moving along a line, one at constant speed and the other at a speed proportional to its distance from a fixed endpoint.

Since the speed is directly proportional to its remaining distance from the fixed endpoint, it therefore is a deceleration, which results in the characteristic "flattening" of the curve.

Further details for understanding the above can be found at Priority, Parallel Discovery, and Pre-eminence: Napier, Burgi and the Early History of the Logarithm Relation (pdf) - http://www.numdam.org/item/RHM_2012__18_2_223_0.pdf
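
The two-particle picture can be simulated directly. The sketch below is a simplified, normalized version (unit segment, unit speeds, naive Euler steps), not Napier's actual construction or his scaling; the function name and step size are assumptions for illustration.

```python
import math


def napier_time_to_reach(remaining: float, dt: float = 1e-5) -> float:
    """Distance covered by the constant-speed particle (speed 1) by the time
    the slowing particle, which starts 1 unit from the endpoint and moves
    with speed equal to its remaining distance, has `remaining` left.

    Expects 0 < remaining <= 1.
    """
    y = 1.0   # remaining distance of the slowing particle
    t = 0.0   # position of the constant-speed particle
    while y > remaining:
        y -= y * dt   # speed proportional to remaining distance
        t += dt
    return t


half = napier_time_to_reach(0.5)
quarter = napier_time_to_reach(0.25)
```

Halving the remaining distance always costs the uniform particle the same extra distance, so `quarter` comes out at about twice `half`, and `half` is close to `math.log(2)`.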

What made you want to understand it, or did it just happen upon you in college?

I'm a programmer so to me this brings to mind the idea of classes and subclasses. A program is implemented by having a set of classes. The classes can be organized into a class-hierarchy where they inherit methods from their ancestor-classes.

Now assume originally you did not have the feature of inheritance in your programming language, so you would just create all the classes you need without organizing them into an inheritance tree. Then you upgraded to a language that does have inheritance and you wanted to refactor your program to omit duplicate definitions of methods.

What kind of class hierarchy would you come up with? There is no single way to do it. Some ways are better than others. There might be more than one optimal way.

The same goes for generalization in general: it is part of the language we create to describe things, and there are many different languages we may come up with, some simpler, some more difficult to understand.

Thanks for the article. I do think your more elementary approach is good pedagogy since the subject is so broadly familiar already. I just like torsors, since they elegantly encode the "arbitrary choice" needed to deal with lots of objects.

Thanks for the writeup!

Words happen more than they are chosen, cf. "computer". The term "torsor" in this sense likely comes from the French "torseur" [0], which was used to describe rigid-body motions via a fundamental screw-like action.

The hypothesis seems to be that the idea of affine spaces came out of that theory, for whatever reason, which was subsequently generalized to principal bundles and finally into what we have now. The point is that, at every step along the way, we want to connect the incrementally new ideas to existing ones, and creating a hard break with new, idiosyncratic terminology is itself obfuscatory.

My beef is more with use of the heavily-overloaded words "regular" and "normal" in math, which just seems like lazy naming:

> In the normal extension K/Q, every normal subgroup of the regular representation acts on a normal scheme that is regular in codimension one, whose normal bundle — orthonormal to the regular surface at each regular value — carries a normal operator whose spectrum follows a normal distribution over a space that is at once regular and normal, all indexed by a regular cardinal.

That's like 8 different meanings of normal and 6 different meanings of regular. lol

[0]:https://fr.wikipedia.org/wiki/Torseur

Yeah, see this thread --- I assume these guys haven't heard of the other meaning either

https://golem.ph.utexas.edu/category/2013/06/torsors_and_enr...

Consider in particular that use of ‘distance’

>I think you can look at adjoint profunctors from the unit category and show that they consist of giving a consistent ‘distance’ to every object, which in a torsor will be represented.

Another neat application, if a bit simplistic, is the mechanical paper computer that lets you figure out your body-mass index. It is basically two disks with logarithmic scales on them that you rotate relative to each other. Like a slide rule, but circular. I think you can find them under the name 'BMI wheel'.
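
A hedged Python sketch of why rotating logarithmic scales can compute a BMI: since BMI = weight / height², its logarithm is log(weight) - 2·log(height), so a rotation (a subtraction of angles) does the division. How real BMI wheels lay out their scales is an assumption here, as are the function names.

```python
import math


def angle(value: float) -> float:
    """Position of a mark on a logarithmic scale, measured in decades."""
    return math.log10(value)


def bmi_by_rotation(weight_kg: float, height_m: float) -> float:
    # Assumption: the height scale is drawn at twice the pitch of the weight
    # scale, so aligning the two marks subtracts 2 * log10(height).
    result_angle = angle(weight_kg) - 2 * angle(height_m)
    return 10.0 ** result_angle   # read the number off the result scale


example = bmi_by_rotation(70.0, 1.75)
```

The result agrees with the direct formula up to floating-point error; on paper the accuracy is limited by how finely the scales are printed.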

> The important properties of the logarithm are structural: we usually do not care about units or bases, except when carrying out an actual numerical computation.

Units are important as a sort-of type system, even at the conceptual level.

You are right that bases are not as important conceptually.

Trigonometry for Navigating Officers by WP Winter

https://www.google.com/books/edition/Trigonometry_for_Naviga...

I found this book because I was a little rusty on my trig and most celestial navigation texts will just throw the PZX equation (and others) at you without breaking down what's actually being done with it on a mathematical level...it's just kind of treated like a magical black box without any discussion, and I'd rather have a complete understanding of what I'm doing and why. Having an application-specific approach also makes it a lot easier to learn.

I'm using it with Norie's Nautical Tables, which has the log tables and a whole lot else:

https://bluewaterweb.com/product/nories-nautical-tables-2025...

I'm sure there are plenty of free PDF's of log tables you can find though.

(I believe they used log tables on boats primarily because it's easier to use than a slide rule when everything is constantly rocking back and forth.)

The first section is the good part.

The later reuse of “log” across valuations, dimension, vector fields, orders of vanishing is not so good. Those may be related ideas, but each needs a type signature: from what, to what, and preserving which operation?

Well, the brightness of celestial objects is also sometimes negative:

> The apparent magnitude of known objects can range from −26.832 for our Sun to about +31.5 for objects in deep space imaged by the Hubble Space Telescope.[3]

See https://en.wikipedia.org/wiki/Apparent_magnitude

0 dB is usually defined as the loudest sound that the audio system can produce. Hence, everything else must be negative.

That is dB full scale where 0 is an absolute ceiling and you can deduct from there.

I find my explanation simpler.

// The power to which I must raise 10 to get 100 is 2.

log10(100) = 2

// The power to which I must raise 10 to get 1000 is 3.

log10(1000) = 3

// The power to which I must raise 3 to get 27 is 3.

log3(27) = 3

Also it makes solving equations much more intuitive:

log3(x) = 4

^ This means; the power to which I must raise 3 to get x is 4. So it follows logically that if I raise 3 to the power of 4, I will get x. This makes it intuitive that this equation can be rewritten as:

x = 3 ^ 4

You don't even need to know the algebraic rule. I felt retarded when I figured this out. This was a rule I had memorized before. It's even dumber and easier to infer than the rule to compute derivatives. I wonder why teachers even bother to teach you all these rules when they could just explain the fundamentals to you.
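
The same reading can be checked numerically. This is only a sketch: `log_base` is a made-up helper built on `math.log`, and floating-point results are compared with a tolerance rather than exactly.

```python
import math


def log_base(base: float, argument: float) -> float:
    """To what power must `base` be raised to get `argument`?"""
    return math.log(argument) / math.log(base)


# Solving log3(x) = 4 by reading the definition backwards: x = 3 ** 4.
x = 3 ** 4

checks = [
    math.isclose(log_base(10, 100), 2),
    math.isclose(log_base(10, 1000), 3),
    math.isclose(log_base(3, 27), 3),
    math.isclose(log_base(3, x), 4),
]
```

Every entry of `checks` is true, so raising the base to the returned power gets back the argument in each case.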

It happened during college.

I had a weird relationship with Math growing up; I alternated between getting very high grades and terrible grades depending on the teacher. I didn't like all the notations and conventions of Math and the way it was taught, but I enjoyed it conceptually. It had ended badly in high school as I did poorly in advanced Math though I did quite well in all my other subjects so I got into a good Software Engineering degree at a top 50 university for engineering globally anyway.

But early in college, it occurred to me that I didn't understand Math concepts as intuitively as I understood programming concepts so I challenged myself to revisit everything from the beginning including numbers, addition, subtraction, fractions, roots, powers, probabilities, derivatives, integrals, vectors, matrices, calculus...

I had to free myself from thinking of Math as symbols on a piece of paper and think of it as being about actual quantities, transformations and combinations. I needed a completely new way to think about it and visualize every single step. When I was practicing calculus, I would stop at each step and try to visualize the equation. For example, when finding the 3D plane perpendicular to a point on a 3D curve, I would put effort into visualizing what happened to the equations across different dimensions at each step when I found the partial derivatives and combined them to get the 3D plane vectors.

My Math grades at university were quite good. I passed all the Math courses with ease and got several distinctions even.

Or, to say a little more explicitly what you're getting at: when you take a logarithm of some quantity, log x, x absolutely must be unitless. There's no way whatsoever to take a logarithm of something with a unit attached. (This is an important and useful dimensional analysis check in formulas and long calculations!)

So what do you do in practice? You have to normalize: you don't calculate log x, but instead log x/U for some scaling unit U. It's typical for U to be something like 1 mV or 1 W in electrical engineering, for example. This is completely legitimate, but it does mean that the thing that comes out needs a corresponding unit attached to it: dBmV, dBW, et cetera.

And it's really kind of important to be careful about that.
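
A short Python sketch of that normalization for power levels. The helper name and the 0.5 W example value are arbitrary; dBm (reference 1 mW) is used alongside dBW as a second, commonly used reference that the comment above does not itself mention.

```python
import math


def to_db(power_watts: float, reference_watts: float) -> float:
    """Level of a power relative to a reference power, in decibels."""
    # The argument of the logarithm is a unitless ratio.
    return 10.0 * math.log10(power_watts / reference_watts)


power = 0.5                      # watts (arbitrary example value)
level_dbw = to_db(power, 1.0)    # reference 1 W  -> dBW
level_dbm = to_db(power, 1e-3)   # reference 1 mW -> dBm
```

The same physical power gets two different numbers, 30 dB apart, which is why a bare "dB" figure is ambiguous until the reference is stated.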

More specifically, 0 dB is the loudest sound the audio system is rated to produce without distortion. It's common to be able to actually drive systems harder than their specified engineering limits, which is why meters have a short positive dB section marked in red.

Of course, typical of the wonderful ambiguity of decibels, 0 dB is also usually defined as the quietest sound that the human ear can perceive.

https://en.wikipedia.org/wiki/Absolute_threshold_of_hearing

That's why it's important to give the scale. dBFS is full-scale level, and dB SPL is sound pressure level.

"computer" happened a while ago; its usage predates electronic computers, as:

"a person who makes calculations, especially with a calculating machine."

Google ngram view:

https://books.google.com/ngrams/graph?content=computer&year_...

glad you liked it

I wonder if we should really just call them... vectors? Like the thing that torsors do, being defined only relative to a choice of origin in some space / group, is exactly what displacement vectors do. So really they are just generalizations of the concept of a vector. (In this scheme I would be careful to _not_ refer to points as vectors, so as to reserve the term for things that act like, well, torsors. I happen to think that much pedagogical harm has been done by not distinguishing the two concepts, points and displacements, early on.)

That is just the definition of Logarithm which is what is taught to all students today i.e.

Given a^x = b we define log_a(b) = x where 'a' is a +ve real number - https://en.wikipedia.org/wiki/Logarithm#Definition

The above wikipedia page also details the properties, applications and generalization of the logarithm concept which are non-trivial.

As I pointed out above, that does not help in intuiting why it is helpful and needed. That is why you need to read the history of logarithms and see how we arrived at the above standard.

Napier actually calculated logarithms of sines for every minute from 0 to 90 degrees to simplify astronomical calculations. The complexity/sizes involved, precision needed etc. can all be seen in this detailed paper walking you through the entire process of table construction: Napier’s ideal construction of the logarithms (pdf) - https://locomat.loria.fr/napier/napier1619construction.pdf

Some connections between things, which I have not seen elsewhere. Maybe they mean something?

1. The Baseless Logarithm

Normally one writes a logarithm with a base, \(\log_b (x)\), to mean

\[y = \log_b (x) \Lra b^y = x\]

And then you can change the base of the logarithm with

\[\log_b (x) = \frac{\log_a (x)}{\log_a(b)}\]

Which follows from rearranging \(\log_a (x) = \log_a (b^{\log_b x}) = \log_b (x) \times \log_a (b)\).

One way of thinking about what this formula does is that it is a change of units, akin to writing \(2 \text{ km} = 2000 \text{ m} / \frac{1000 \text{ m}}{1 \text{ km}}\) or \(5 \text{ bytes} = 40 \text{ bits}/\frac{8 \text{ bits}}{1\text{ byte}}\). It says: how many copies of \(b\) are in \(x\)? It’s the number of copies of \(a\) in \(x\), divided by the number of copies of \(a\) that are in \(b\).
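As a rough numerical illustration of the unit-conversion reading (a sketch only; `change_base` and the sample value are made up for this example), converting a size from bits to bytes is the change-of-base formula with \(a = 2\) and \(b = 256\):

```python
import math

def change_base(log_a_x: float, log_a_b: float) -> float:
    """Convert log_a(x) into log_b(x) via log_b(x) = log_a(x) / log_a(b)."""
    return log_a_x / log_a_b

x = 4096.0
bits = math.log2(x)                          # log_2(x): the size of x in bits
bytes_ = change_base(bits, math.log2(256))   # log_256(x): the same size in bytes

print(bits, bytes_)
```

Here `math.log2(256)` plays the role of the conversion factor "8 bits per byte".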

This is perfectly simple, but for some reason it’s hard to think about logarithms that way. The notation kind of… obfuscates things? Specifically it is hard to read \(\log_b x\) as “how many copies of \(b\) are in \(x\)”, because that English expression should correspond to the notation \(x/b\), not \(\log_b x\). “How many factors of \(b\) are in \(x\)” is a bit better, but it still feels off.

I found a way of thinking about logarithms which I think makes this clearer, but you have to allow a sort of odd object that I call the baseless logarithm. It is simply a logarithm without a base:

\[\log N\]

which we regard as an abstract object, not a number. Then we write our normal “based” logarithm as a ratio of two of these baseless logarithms:

\[\log_2 N = \frac{\log N}{\log 2}\]

Note, this is already a thing people do colloquially, e.g. leaving out the base of logarithms in asymptotic formulas. But I do not mean it as a shorthand; it is more useful to regard it as an actual algebraic object.

We interpret \(\log 2\) as being the unit “bits”. To write \(\log N\) in bits is to factor it as a multiple of \(\log 2\):

\[\log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \log 2 = \log_2 (N) \text{ bits}\]

Then the change-of-base for logarithms follows from just writing the same geometric quantity in different units. For example \(\log e\) as a unit is sometimes called “nats”:

\[\begin{aligned} \log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} = \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]
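One way to play with this idea in code is a small type whose values only become numbers when divided by another value of the same type. This is a sketch under a loud assumption: the class `BaselessLog` and its internals are invented here, and internally it stores \(\ln N\), which is itself an arbitrary choice of unit that the interface merely hides.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BaselessLog:
    """Stand-in for the abstract object log N.

    Assumption of this sketch: the value is stored as ln N. That choice is
    arbitrary; only ratios of two BaselessLog values are exposed as numbers.
    """
    _nats: float

    @classmethod
    def of(cls, n: float) -> "BaselessLog":
        return cls(math.log(n))

    def __truediv__(self, unit: "BaselessLog") -> float:
        # log N / log b = log_b N: measure this quantity against a unit.
        return self._nats / unit._nats

BIT = BaselessLog.of(2)        # log 2, the unit "bits"
NAT = BaselessLog.of(math.e)   # log e, the unit "nats"

log_n = BaselessLog.of(1024)
in_bits = log_n / BIT          # log_2(1024)
in_nats = log_n / NAT          # ln(1024)
```

The ratio `NAT / BIT` is then the conversion factor \(\log_2 e\) between the two units, so `in_bits` should agree with `in_nats * (NAT / BIT)` up to rounding.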

The baseless \(\log N\) is sort of the multiplicative version of an object that might be familiar from discussions of vectors. It is common with vectors to distinguish between points and displacements: a displacement vector \(\b{v}\) is given by the difference of two points \(\v = (b) - (a)\). When we think of points as having coordinates, this involves an explicit choice of origin \(\O\), such that \(\b{a} \equiv (a) - \O\) and \(\b{b} \equiv (b) - \O\). Then a displacement vector is constructed by subtracting off the factors of \(\O\), \(\b{v} = \b{b} - \b{a} = ((b) - \O) - ((a) - \O) = (b) - (a)\). The baseless logarithm implements the same thing but with multiplication: the value \(\log N\) may be thought of as \(\log N / \log \O\) for an unspecified choice of origin; turning it into an actual numeric value involves dividing two such logarithms to cancel out the origin, \(\log_M N = \log N / \log M = (\log N / \log \O) / (\log M / \log \O)\). I think of \(\log N\) as the point corresponding to \(N\) and \(\log N / \log \O\) as its corresponding displacement vector once you pick a coordinate system. The point version is more fundamental.

You might ask: if we have a baseless logarithm \(\log N\), do we also have a “baseless exponential”? Normally \(b^{\log_b N}\) can be written as something like \(b^{\log_b N} = b^{\ln N / \ln b} = e^{\ln N} = N\); is there any way to do this without actually choosing a base, like \((\ast)^{\log N}\) or something? I think the answer has to be “no”, because I can’t think of a way to make it mean anything. All we can say is that we have split the one object, a logarithm \(\log_b N\) which is the solution of \(b^y = N\), into two objects, \(\log N\) and \(\log b\), each of which on its own is without “units” and so has no numerical meaning.

So logarithms act kinda like multiplicative vectors, in the sense that they have to be defined relative to an ‘origin’, a choice of base. In fact there are many surprising similarities between logarithms and vectors, which I had fun expositing about:

2. Logarithms are Vectors

When doing vector algebra and differential geometry in a properly covariant way, we distinguish between abstract vectors and vectors in a particular coordinate system.

My personal convention for this is to refer to the abstract vectors as “geometric” vectors and always write them in bold, \(\v\), whereas “coordinate” vectors, tuples of their values in coordinates, are written with an arrow over them like \(\vec{v} = (v_x, v_y, v_z)\). Boldface geometric vectors are always coordinate-free, whereas coordinate vectors are just collections of numbers or other objects. The geometric vector \(\b{v}\) can be written as a dot product of its coordinates with a ‘frame’ \(X = (\x, \y, \z)\) of basis vectors

\[\b{v} = \vec{v} \cdot X = (v_x, v_y, v_z) \cdot (\x, \y, \z) = v_x \x + v_y \y + v_z \z\]

The projection of \(\v\) onto a basis vector \(\x\) is then given by ‘measuring’ the vector against the basis vector (which does not have to be of unit length). I like to write this as division because it acts a lot like division (although it’s technically pseudodivision instead):

\[\frac{\v}{\x} = v_x\]

That’s in my own very nonstandard notation 1 for vector division here. The more common way to write this is to project a component of a differential \(df = f_x dx + f_y dy + f_z dz\) with a partial derivative, which is also the pseudodivision operation (which is incidentally the sense in which partial derivatives kinda work like division but not really):

\[\frac{\p f}{\p x} = f_x\]

I will write things in both forms to make it easy to translate between them; I do prefer my vector-division version because it avoids bringing in the irrelevant notations of differential calculus, but since the latter is actually standard I ought to include it for comparison.

Suppose \(\b{v}\) is one-dimensional, \(\b{v} = v_x \x\). Then the projection onto a ‘measuring stick’ \(\b{m} = m \x\) measures its length in terms of multiples of \(m\):

\[\frac{\v}{\b{m}} = \frac{v_x \x}{m \x} = \frac{v_x}{m}\]

Multiplying by \(\b{m}\) again is what we mean by “writing \(\b{v}\) in units of \(\b{m}\)”:

\[\frac{\b{v}}{\b{m}} \b{m} = (\frac{v_x}{m}) (m \x)\]

Here \(m\) is the unit “meters” and \(v_x/m\) is the value of \(v_x\) written in meters. Of course to actually compute \(v_x/m\) you have to have it in units in the first place, but clearly it’s the same kind of thing as in the logarithm case, where you can think of \(\b{v}\) and \(\b{m}\) as “unitless” concepts that are compared geometrically, and then \(v_x/m\) as their projections into an arbitrary coordinate system.2

The baseless logarithm is performing the same operation on logarithms, where \(\log N\) is filling the role of the geometric vector \(\v\) and \(\log 2 = \text{bits}\) is the unit vector or measuring stick, which takes the role of \(\x\).

\[\begin{aligned} \frac{\log N}{\log 2} &= \log_2 N \\ \frac{\log N}{\log 2} \log 2 &= \log_2 N \text{ bits} \end{aligned}\]

In this sense baseless logarithms write numbers in coordinates in exactly the same way that measuring sticks write vectors in coordinates.

The equivalence of logarithms in different units

\[\begin{aligned} \log N &= \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} \\ &= \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]

is the same as the equivalence of geometric vectors in different units

\[\begin{aligned} \v &= \frac{\v}{\x} \x = v_x \x \\[1em] &= \frac{\v}{\x'} \x' = v_{\x'} \x' \\ \end{aligned}\]

\[\begin{aligned} df &= \frac{\p f}{\p x} dx = f_x dx \\ &= \frac{\p f}{\p x'} dx' = f_{x'} dx' \end{aligned}\]

And the change of base formula that computes a ratio of logarithms in different bases

\[\begin{aligned} \log_2 N \text{ bits}&= \ln N \text{ nats} \\ \log_2 N &= \frac{\text{nats}}{\text{bits}} \ln N\\ &= \frac{\log e}{\log 2} \ln N \\ &= \log_2 (e) \ln N \end{aligned}\]

is exactly like the change of coordinates for a vector, where \(\x\) and \(\x'\) are two units for the same quantity.

\[\begin{aligned} v_x \x &= v_{x'} \x' \\ v_x &= \frac{\x'}{\x} v_{x'} \end{aligned}\]

or3

\[\begin{aligned} f_x dx &= f_{x'} dx' \\ f_x &= \frac{dx'}{dx} f_{x'} \end{aligned}\]

What logarithms don’t allow that vector division and differential notations easily do is to talk about a partial projection operation or a partial derivative in isolation. For example, if \(N = 2^a 3^b\), you can only talk about the “total” logarithm, the ratio with respect to a single unit \(\log 2\)

\[\frac{\log N}{\log 2} = a \frac{\log 2}{\log 2} + b \frac{\log 3}{\log 2} = a + b \log_2 3\]

which is equivalent to writing a vector as a multiple of a single basis vector (like in Clifford/geometric algebra)

\[\frac{\v}{\x} = v_x + v_y \frac{\y}{\x}\]

or to a total derivative

\[\frac{df}{dx} = f_x + f_y \frac{dy}{dx}\]

But there is no equivalent of the operation of partial differentiation, a “partial logarithm”, which would let you factor a number like

\[\log N \? (\log_{\p 2} N) \log 2 + (\log_{\p 3} N) \log 3\]

However, I keep finding that people have gone and invented the projection / partial derivative operation on logarithms anyway. For example, the p-adic valuation in number theory

\[\nu_p (n) = \max \{ k \in \bb{N} \mid p^k \mid n \}\]

corresponds to extracting the coefficient of \(\log p\) of a natural number in a logarithmic basis

\[\begin{aligned} \log n &= \log 2^{n_2} 3^{n_3} 5^{n_5} \cdots \\ &= n_2 \log 2 + n_3 \log 3 + n_5 \log 5 + \ldots \\ \nu_p (n) &= n_p \end{aligned}\]

Each coefficient is a non-negative integer, and \(\nu_p\) just takes the component corresponding to \(\log p\). Clearly \(\log n\) acts like a vector (although since the coefficients are in \(\bb{N}\) it is technically a commutative monoid instead of a vector space… nevertheless, it has the familiar structure of a vector). Since \(\nu_p\) is a ‘projection’ out of this logarithm, it still obeys logarithmic identities like \(\nu_p(m/n) = \nu_p(m) - \nu_p(n)\). But there is not really a good notation for actually expressing it as a projection, so sadly it gets a whole separate nomenclature that you have to learn.4

The same thing also works for rational \(n\) or radical \(n\) (meaning it is the product of radicals of prime factors), in which case the coefficients become integers or rationals. (As a bonus the resulting objects live in an actual vector space.)
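To make the "coefficient of \(\log p\)" picture concrete, here is a minimal sketch that computes the exponent vector of a number by trial division and reads off one component. The function names `factor_exponents` and `nu` are hypothetical, and trial division is just the simplest technique to show, not an efficient one.

```python
from collections import Counter
from fractions import Fraction

def factor_exponents(n: int) -> Counter:
    """Return {prime: exponent} for a positive integer n, by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] += 1  # whatever is left over is itself prime
    return exponents

def nu(p: int, q) -> int:
    """p-adic valuation of a nonzero rational q: the coefficient of log p in log |q|."""
    q = Fraction(q)
    # A Counter returns 0 for primes that do not appear.
    return factor_exponents(abs(q.numerator))[p] - factor_exponents(q.denominator)[p]

print(dict(factor_exponents(360)))  # {2: 3, 3: 2, 5: 1}
print(nu(3, Fraction(360, 81)))     # -2
```

For rational arguments the coefficients can be negative, which matches the observation that the exponents then live in the integers rather than in \(\bb{N}\).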

Another example of these logarithmic projections: in complex analysis the “order of vanishing” \(\text{ord}_a f(z)\) of a meromorphic function \(f(z)\) at a point \(z=a\) is the order of the pole or zero at that point (where zeroes are like negative poles). That is, it is the degree of the lowest-degree term in the Laurent series of the function around the point \(z=a\), which is written \(-n\) here,

\[f(z) = f_{-n} (z-a)^{-n} + f_{-n+1} (z-a)^{-n+1} + \cdots + f_{-1} (z-a)^{-1} + f_0 + f_1 (z-a) + \cdots\]

(so \(n\) is the smallest integer such that \((z-a)^n f(z)\) is holomorphic around \(a\)). This is extracted with a logarithm:

\[\text{ord}_a f(z) = \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} = -n\]

since for \(z \approx a\), \(f(z) \sim f_{-n} (z-a)^{-n}\) which dominates the other terms that blow up less quickly. If we write \(g(z)\) for the rest of \(f(z)\) which has \(\text{ord}_a (g(z)) > -n\):

\[\begin{aligned} \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} &= \lim_{z \ra a} \frac{\log (f_{-n} (z-a)^{-n} + g(z))}{\log (z-a)}\\ &= \lim_{z \ra a} \frac{\log f_{-n} (z-a)^{-n} (1 + \frac{g(z)}{f_{-n}} (z-a)^n)}{\log (z-a)} \\ &= \lim_{z \ra a} \frac{\log f_{-n}}{\log (z-a)} -n \frac{\log (z-a)}{\log (z-a)} + \frac{\log (1 + c (z-a))}{\log (z-a)} \\ &= -n \end{aligned}\]

So this is a very similar operation: the limit \(\lim_{z \ra a} \log (z-b)/\log(z-a) = 1_{a=b}\) serves to cancel out the rest of the terms, like how \(\p_j dx^i \sim (\p x^i)/(\p x^j) = 1_{i=j}\) serves to cancel out the terms in a partial derivative, extracting the \(dx\) component of \(df = f_x dx + f_y dy + \ldots\).
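The limit can be probed numerically. This sketch makes two assumptions that are not in the text: it uses the moduli \(\log |f(z)| / \log |z-a|\) instead of complex logarithms (the limit should be the same), and the test function `f` is a made-up example with a double pole at \(0\).

```python
import math

def approx_order(f, a: float, eps: float) -> float:
    """Estimate log|f(z)| / log|z - a| at z = a + eps, for small eps > 0."""
    z = a + eps
    return math.log(abs(f(z))) / math.log(abs(z - a))

def f(z):
    # Hypothetical test function with a pole of order 2 at z = 0.
    return 3 / z**2 + 1 / z

estimate = approx_order(f, a=0.0, eps=1e-100)
print(estimate)  # close to -2, but only slowly approaching it
```

Convergence is logarithmically slow because of the \(\log f_{-n} / \log (z-a)\) term, which is why `eps` has to be so extreme before the estimate looks like an integer.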

(I’m not very good at complex analysis so that’s all I’m going to say about that. Still, it seems clear that this is basically the same operation.)

We see that the baseless logarithm \(\log n\) works a lot like a vector \(\v\) or differential \(df\), and then expressing a logarithm in a base like \(\log_2 n = \log n / \log 2\) is a lot like a total derivative \(df/dx\) or Clifford division \(\v \ast \b{x}^{-1}\). What is missing is some equivalent of the partial derivative / projection operator that projects only onto that component… but various fields have gone and found a way to invent that anyway, either in the form of a partial derivative \(\p f/\p x\), or just by making up the \(p\)-adic valuation \(\nu_p\), or by the limits \(\lim_{z\ra a} \log f(z) / \log (z-a)\) in complex analysis. The similarities are all suspicious, though, and I can’t help but think there is some unifying theory here that ties all this together… but I can’t see what it is yet.

One thing that we might try in order to invent a \(\log_2 N\) that acts like \(\p_x f\) or \(\b{v}/\x\) is to somehow restrict the values of the logarithms to certain spaces, e.g. integers or rationals. Since the \(\{\log p_i\}\) are linearly independent over \(\bb{Q}\) (which is essentially equivalent to prime factorizations being unique), you would end up with objects like \(\log_2 3 = \log 3/\log 2\) which have no value in \(\bb{Q}\); “zeroing” those out then gives something that acts like a partial derivative. But I don’t know if that’s useful. Certainly it doesn’t help in any numeric context.

Anyway, onto more things that are logarithms.

3. Vectors are also Logarithms?

In differential geometry one interprets vectors like \(\v = v_x \x + v_y \y\) as being written in a basis of partial derivative operators, \(\v = v_x \p_x + v_y \p_y\). These can then be used to create discrete translations which move around in the various coordinates,

\[T^{\v} = e^{\v} = e^{v_x \p_x + v_y \p_y }\]

The partial derivatives are here in order to make it operate on functions

\[e^{v_x \p_x + v_y \p_y} f(x,y) = f(x + v_x, y + v_y)\]

which is true at the level of Taylor expansions as well. I often find it easier to dispense with the partial derivatives and just think of these as translation operators on the space \((x,y)\) directly

\[e^{v_x \p_x + v_y \p_y} (x, y) = (x + v_x, y + v_y)\]

(You can think of this acting on the function \(f(x,y) = (x,y)\) also, but that feels like overkill.)
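The Taylor-expansion claim can be checked exactly in one variable on polynomials, where the series \(e^{v \p_x} = \sum_k \frac{v^k}{k!} \p_x^k\) terminates. This is a sketch, not anything from the article: the coefficient-list representation and the helper names are assumptions chosen to keep it short.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (c_k multiplies x^k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def translate(coeffs, v):
    """Apply exp(v d/dx) = sum_k (v^k / k!) d^k/dx^k to a polynomial.

    The sum is finite because repeated differentiation eventually
    reduces any polynomial to the empty coefficient list.
    """
    result = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]  # the k-th derivative, starting at k = 0
    k = 0
    while term:
        scale = Fraction(v) ** k / factorial(k)
        for i, c in enumerate(term):
            result[i] += scale * c
        term = derivative(term)
        k += 1
    return result

f = [1, -2, 0, 5]                  # f(x) = 1 - 2x + 5x^3
shifted = translate(f, Fraction(3, 2))
```

Evaluating `shifted` at any `x` should give the same value as evaluating `f` at `x + 3/2`; exact rational arithmetic is used so that the comparison does not depend on rounding.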

In any case, all this is really doing (in flat space, at least) is rewriting the additive vector \(\b{v}\) into a multiplicative form \(T^{\b{v}}\) which corresponds to the same operation. Things are just being written differently: its terms are multiplied instead of added, and scalar coefficients are applied via exponentiation instead of multiplication. A basis for the vector space now consists of translation operators in each coordinate:5

\[T^{\v} = e^{v_x \p_x} e^{v_y \p_y} = T_x^{v_x} T_y^{v_y}\]

(In non-flat space this is not so simple because the translations in different coordinates may not commute; you can still write it in this form but it’s a lot more complicated.)

What this means for us is: look, vectors are logarithms too!

\[\begin{aligned} \ln T^{\v} &= \ln T_x^{v_x} T_y^{v_y} \\ &= v_x \ln T_x + v_y \ln T_y \\ &= v_x \p_x + v_y \p_y \end{aligned}\]

I can’t exactly say why, but it seems preferable to have this written in terms of baseless logarithms also. We do this by realizing that \(T_x = e^{\p_x} = T^{\p_x}\) and thinking of this symbol \(T\) as a sort of ‘generic’ base for translations, absent the numeric meaning of the symbol \(e\), which has \(\log T_x = \log T^{\p_x} = \p_x \log T\). Then

\[\log T^{\v} = \v \log T = v_x \p_x \log T + v_y \p_y \log T\]

And then we can write \(\v = \log_T T^{\v} = \log T^{\v} / \log T\). This is equivalent to the natural log version but it avoids explicitly depending on the numeric value of \(e\): any choice of base for the logarithm \(T\) gives the same concept of a vector, written in terms of the exponentiation of \(T\), but now we make explicit that the ‘units’ on \(\v\) come in part from the units on \(\log T\) itself.

So vectors in differential geometry may also be thought of as logarithms, specifically, the logarithms of translation operators.

Regular multiplication can even be viewed as an example of this. A product like \(xa\) can be rewritten as “translation” in the \(\ln a\) coordinate:

\[xa = e^{\ln x} e^{\ln a} = e^{(\ln x) \p_{\, \ln a}} a = x^{\p_{\, \ln a}} a\]

I mention this because it’s cute, but I can’t imagine how it would ever be useful.

4. Logarithms are Derivatives?

This part doesn’t really connect to the rest; I just thought I would mention it so that this article contains every fun fact about logarithms that I know.

One way of defining the natural logarithm is

\[\ln x = \lim_{a \ra 0} \frac{x^a - 1}{a}\]

Which can be found by rewriting \(x^a = e^{a \ln x}\) and then Taylor expanding:

\[\frac{x^a - 1}{a} = \frac{e^{a \ln x} - 1}{a} = \frac{(1 + a \ln x + \ldots) - 1}{a} \stackrel{a \ra 0}{=} \ln x\]
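A quick numerical check of this limit, as a sketch only (the function name and the default step `a = 1e-6` are arbitrary choices made here):

```python
import math

def ln_via_limit(x: float, a: float = 1e-6) -> float:
    """Approximate ln x by (x**a - 1) / a for a small, nonzero a."""
    return (x ** a - 1.0) / a

for x in (0.5, 2.0, 10.0):
    print(x, ln_via_limit(x), math.log(x))
```

The leading error is roughly \(a (\ln x)^2 / 2\), the next term of the Taylor expansion, so shrinking `a` helps until floating-point cancellation in `x ** a - 1.0` takes over.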

Plugging in \((1+x)\) reproduces the Taylor series for \(\ln\):

\[\begin{aligned} \ln (1+x) &= \frac{(1+x)^a -1}{a} \\ &= \frac{\sum_k \binom{a}{k} 1^{a-k} x^k - 1}{a} \\ &= \frac{(1 + ax + \frac{a(a-1)}{2} x^2 + \ldots) - 1}{a} \\ &\stackrel{a \ra 0}{=} x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \ldots \end{aligned}\]

The \(\lim_{a \ra 0} (x^a - 1)/a\) formula for \(\ln x\) resembles a derivative. To make it explicit, we can write it as

\[\ln x = \lim_{dy \ra 0} \frac{x^{y + dy} - x^y}{dy} \mid_{y=0} = \p_{y} x^y \mid_{y =0}\]

What I like about this form is that it explains where \(\ln\) comes from in calculus, by connecting \(\ln x\) with \(x^0\). It always struck me as strange that \(\int x^{k} \, dx = \ln x\) for \(k=-1\), whereas it is a power of \(x\) for all other values of \(k\). Why is a logarithm like a power of \(x\)? Turns out it’s because in a lot of ways \(\ln x\) acts like \(x^0\). Specifically it’s the ‘interesting’ part of \(x^0\), the first-order coefficient of \(x^y\) in the exponent around \(y=0\):

\[\ln x \sim \frac{x^0 - 1}{0}\]

Just for fun, try using \(\p_x x^k = k x^{k-1}\) on it:

\[\p_x \ln x = \p_x \frac{x^0 - 1}{0} = \frac{0 x^{-1}}{0} = \frac{1}{x}\]

That’s all I really have to say about this. But I wonder if some of the other ideas on this page would benefit from being interpreted via the \(\ln x = \p_y x^y \mid_{y=0}\) form.

5. Dimensions are Logarithms

Another thing which clearly acts like a logarithm is the dimension operator \(\dim\) in linear algebra.

Compare:

\[\begin{aligned} \dim_{K} K^n &= n \dim_K K = n \\ \dim_K U \oplus V &= \dim_K U + \dim_K V \\ \dim_K U/V &= \dim_K U - \dim_K V \\ \dim_K U \o V &= (\dim_K U) \times (\dim_K V) \\ \end{aligned}\]

(where \(\dim_K V\) means its dimension as a vector space over the base field \(K\), and assume we’re only talking about finite-dimensional spaces here) with

\[\begin{aligned} \log_k k^n &= n \log_k k = n \\ \log_k u \times v &= \log_k u + \log_k v \\ \log_k u/v &= \log_k u - \log_k v \\ \log_k k^{\log_k u \times \log_k v} &= (\log_k u) \times (\log_k v) \end{aligned}\]

The direct sum \(\oplus\) corresponds to multiplication \(\times\), which is really just a notational accident, since it is the same as the direct product on finite-dimensional vector spaces; the \(\oplus\) symbol reflects the fact that it adds bases as sets.6 Meanwhile the tensor product \(\otimes\) multiplies bases as sets, but corresponds in arithmetic to a sort of “commutative exponentiation” \(k^{\log_k u \log_k v} = u^{\log_k v}\) that you don’t see very much, sometimes called a commutative hyperoperation. (The next ‘displacement’ operation after \(b-a\) and \(b/a\) is therefore \(e^{\ln b / \ln a} = b^{1/\ln a}\).)

I am a bit upset that I have never seen anyone point out that \(\dim\) is a logarithm, since it’s so obviously the case. Maybe I’m missing something? After all I am ignoring the infinite-dimensional cases entirely. But I suspect it’s just that math likes to stay on more solid rigorous ground than I do, and this is all too handwavey to be precise. I have no such qualms and I love to speculate about underappreciated connections between things, so I have no problem saying: dimension is a logarithm.

The simple reason why \(\dim_K\) acts like \(\log_k\) in the case of finite \(K\) is as follows. We need three observations:

One, the dimension of a vector space is defined as the cardinality of its basis. An individual vector \(\b{v} = v_1 \x_1 + v_2 \x_2 + \ldots + v_n \x_n \in K^n \simeq V\) can be thought of as a choice of function \(\dim_K V \ra K\), since it assigns a coefficient \(v_i \in K\) to each basis vector \(\x_i\).

Two, the cardinality of the functions between sets \(B \ra A\) is given by \(\| A \|^{\| B \|}\), which is why we use the symbol \(A^B\) for the sets \(B \ra A\). For example the powerset of \(A\), that is, the set of all possible subsets of \(A\), is notated \(2^A\) because it is equivalent to the functions \(A \ra \{ 0, 1 \} \equiv \b{2}\), where a given subset is identified with the elements that map to \(1\).

Three: applying that to a vector space \(V \simeq K^n\), we can interpret \(K^n\) as describing the set of functions from a choice of basis \(\b{n} = \{ \x_1, \x_2, \ldots, \x_n \}\) into the underlying field \(K\), which naturally has cardinality \(\| V \| = \|K\|^{\dim_K V}\). Therefore the logarithm of this is the dimension of \(V\) over \(K\):

\[\dim_K V = \log_{\| K \|} \| V \| = \log_{\| K \|} \|K \|^{\dim_K V}\]
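For a finite field this is easy to verify by brute force. The sketch below only tracks cardinalities: \(K\) is modeled as `range(field_size)` with no field structure, and the names `vector_space` and `dim_via_log` are made up for the example.

```python
from itertools import product

def vector_space(field_size: int, dim: int):
    """Enumerate all vectors of K^dim, with K modeled as range(field_size)."""
    return list(product(range(field_size), repeat=dim))

def dim_via_log(field_size: int, num_vectors: int) -> int:
    """Recover dim_K V as log_{|K|} |V| using exact integer arithmetic."""
    if field_size < 2:
        raise ValueError("a field has at least 2 elements")
    dim, size = 0, 1
    while size < num_vectors:
        size *= field_size
        dim += 1
    if size != num_vectors:
        raise ValueError("|V| is not a power of |K|")
    return dim

U = vector_space(3, 2)   # K^2 over a 3-element field
V = vector_space(3, 3)   # K^3 over the same field

dim_U = dim_via_log(3, len(U))
dim_V = dim_via_log(3, len(V))
# The direct sum has |U| * |V| elements, so its log is the sum of the logs.
dim_sum = dim_via_log(3, len(U) * len(V))
# The tensor product is modeled here as K^(dim U * dim V).
dim_tensor = dim_via_log(3, len(vector_space(3, dim_U * dim_V)))
```

An integer loop is used instead of `math.log` so that the result is exact; the tensor line simply builds a space of the expected size, so it illustrates the bookkeeping rather than proving anything.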

This is literally true in the case where \(V\) is finite dimensional and the field \(K\) is also finite. It’s less solid if either is infinite; however, I tend to think that expressions of this form are also literally true in the case of infinite dimensions, if you define things in a slightly better way. In particular you have to use a concept other than cardinality to measure the size of things if you want infinite expressions like \(\log_{\| \bb{R} \|} \| \bb{R}^2 \| = 2\) to make any sense. I am pretty sure the right choice is what’s sometimes called numerosity, although I don’t know how compatible that is with the rest of linear algebra. More on that some other day.

Anyway, even if you only take this as meaningful on cardinalities of finite-dimensional sets over finite fields, I think it’s strange that it never really comes up. It is such a natural construction! Or maybe it does and I’ve missed it. But anyway, I like it, and I happen to think the correspondence here is much stronger and more significant than what I’ve just described.

If we repeat the above with ‘baseless’ logarithms, we get expressions like

\[\dim K^n = n \dim K\]

such that

\[\dim_K V = \frac{\dim V}{\dim K}\]

This seems mostly fine to me. The one place we have to be careful is in the definition of a tensor product. We want it to be the case that

\[\dim_K K^a \o K^b = \dim_K K^a \times \dim_K K^b = a \times b\]

But the naive approach has an extra factor of \(\dim K\):

\[\dim_K (K^a \o K^b) = \frac{\dim K^a \dim K^b}{\dim K} = \frac{(a \dim K)(b \dim K)}{\dim K} = ab (\dim K)\]

The problem is that the definition of the tensor product is a bit more complicated than just multiplying bases. A vector \(\b{u} \o \b{v} \in K^a \o K^b\) is not the Cartesian product of vectors \(\b{u}\) and \(\b{v}\), but rather the Cartesian product modulo a quotient on its scalar coefficient which combines two scalars \((k_1, k_2)\) into one \((k_1 k_2)\). Since this divides out a factor of \(K\), we have to do the same with our \(\o\) operation in order to make the cardinalities work out. This is done by specifying an \(\o_K\) operation, the “tensor product with respect to the field \(K\)”, as

\[U \o_K V = K^{\dim_K U \dim_K V} = K^{\dim U \dim V / \dim K}\]

which allows \(\dim_K K^a \o_K K^b = \dim_K K^{ab} = ab\) to work. (I suspect sometimes that the quotient in the definition of \(\o_K\) is not actually needed for most purposes, which would have the nice side effect of making this all work out more simply, but let’s not get into that.)

The definition

\[\dim_K K^a = \frac{\dim K^a}{\dim K} = \frac{a \dim K}{\dim K}\]

seems to imply that one could take the dimension/logarithm of a vector space with respect to a different underlying object, not the field \(K\), and get a meaningful result. For example it is my dream to be able to say that this is how you construct a vector space with a “fractional dimension” of \(\frac{1}{2}\):

\[\dim_{K^2} \? K = \frac{\dim K}{2 \dim K} = \frac{1}{2}\]

This works fine at the level of cardinalities, more or less (if you allow that the rationals are invented precisely to let you make objects like \(1/2\) which interpolate between ratios of non-divisible integers). But it is hard to imagine how it should work if you want anything like a “field” or a “vector space” with its usual axioms to be meaningful. Maybe a vector \(\b{v} \in \bb{R}^4\) is viewed as a vector over \(\bb{R}^2\) via \(\b{v} = (v_w, v_x) \cdot (\w, \x) + (v_y, v_z) \cdot (\y, \z)\)? But then how does scalar multiplication work? If the scalars are \(\in K^2\), they have zero divisors, so you are not working in a field anymore. And what would it mean for a vector space with dimension \(\frac{1}{2}\) to be spanned by ‘half’ a basis vector over that pseudo-field? Maybe its elements look like \(\u = (u_x, \bullet) \cdot (\x, \bullet)\)? One must attempt to define versions of the theorems of linear algebra which are compatible with this sort of decomposition. No idea how to do that at the moment, but I suspect it can be done, with sufficient imagination, and I hope to attempt it in a future article.

6. Bases are Logarithms

The dimension of a vector space is the cardinality of its basis. But just like we use expressions like \(B^A\) for functions between sets because they are respected at the level of cardinalities \(\| B \|^{\| A \|}\), we may as well interpret the \(\dim\) operator in the same way: if \(\dim\) returns the cardinality of the basis, then let’s say that \(\log\) returns the basis itself, which happens to have that cardinality. For instance if a vector space \(V \simeq K^3\) has basis \((\x, \y, \z)\), we might write

\[\begin{aligned} \log_K V &= (\x, \y, \z) \\ \end{aligned}\]

And then define \(\dim_K\) as the cardinality of this:

\[\begin{aligned} \dim_K V &= \| \log_K V \| \\ &= \| (\x, \y, \z) \| \\ &= 3 \end{aligned}\]

Why not? \((\x, \y, \z)\) is an object for which \(K^{(\x, \y, \z)} \simeq V\), sorta, therefore \(\log_K K^{(\x, \y, \z)} = (\x, \y, \z)\). (One could also just let \(\dim_K\) refer to both operations, perhaps, or maybe write capital \(\text{Dim}_K V\) for the same thing.) Perhaps it’s a bit weird to treat \(K^{(\x, \y, \z)}\) as a set exponentiation when the exponent is a tuple / Cartesian product, but it should be easy to adjust things to make it work.

There is an obvious issue, though. Why would this particular choice of basis be the value of \(\log_K V\), since \(V\) has very many possible valid bases and no reason to choose a particular one?

Maybe it is more correct to regard \(\log_K V\) as really being an object which refers to all possible bases of \(V\) at once (I’m not sure what it’s called. Sort of a frame bundle but with only one base point?) We can give it coordinates: the space \(X = \log_K V\) is parameterizable by coordinates \((X_0, \Lambda)\), where \(X_0 = (\x, \y, \z)\) is an arbitrary ‘origin’ frame and \(\Lambda\) is an arbitrary linear transformation \(\in GL(V)\), the automorphisms of \(V\).7 I guess we should just write

\[X = \{ \Lambda X_0 \mid \Lambda \in GL(V) \}\]

and then the dimension itself is the cardinality of the quotient of this by \(\Lambda\), which will be a sort of generic object that represents the size of any choice of basis.

\[\dim_K V = \| \frac{\log_K V}{\Lambda} \| = \| \frac{X}{\Lambda} \|\]

If \(\log_K V = X\), then there ought to be an operation which goes the other way, that reconstructs a vector space from its basis. We may as well equate this with the linear span operation;

\[\span(X) = K^X = V\]

This is not quite how span is normally defined. Usually it’s something like: “\(\span(\x, \y, \z)\) is the subspace of the (ambient) vector space \(V\) over the (ambient) field \(K\) which contains the vectors \((\x, \y, \z)\) and is of minimal dimension”. To interpret it algebraically, though, we don’t really want to make reference to an “ambient” vector space or field, because it should just be an operation on the vectors themselves. For this we need to at least explicitly indicate the underlying field, by writing \(\span_K\) with a subscript:

\[\span_K(X) = K^X = V\]

All of this is definitely rife with abuses of notation, and I’m not sure that it’s quite the best way to think about things. But I still wanted to mention it because it’s nice to think of the operators \(\dim\) and \(\span\) as being linear algebra analogues of \(\log\) and \(\exp\).

It is also interesting to consider what might be meant by the baseless logarithm in the sense of bases. In the expression

\[\log_K K^X = \frac{\log K^X}{\log K} = \frac{X \log K}{\log K}\]

what would be meant by \(X \log K\) as a ‘basis’? Presumably the division by \(\log K\) corresponds to some sort of quotient… but we will need a way of interpreting \(\log K\) itself. Perhaps as a “basis for \(K\)”? I’m not sure. I do think there’s something here, but it gets much more speculative so I will leave it for another time.

7. Functions are Logarithms?

Treating \(\log_K K^n = n\) as returning a basis for \(K^n\) as a set is an example of a general procedure which doesn’t quite have a name as far as I know. It is sort of like categorification, but not quite. Rather than locating categories for set operations, we’re locating sets for algebraic operations, and not making any reference to categories really. So I’m not sure. Maybe ‘setification’? Or ‘structurization’? I dunno.

The standard example of this ‘setification’ is to treat arithmetic operations on natural numbers like \(A+B\), \(AB\) and \(B^A\) as being projections out of set operations \(A \sqcup B\), \(A \times B\), and \(B^A\) (the functions \(A \ra B\)). This works nicely for finite sets because the operations respect cardinalities. (As mentioned earlier, I think you have to replace ‘cardinality’ with something like ‘numerosity’ to make this work elegantly on infinite sets, and I don’t know how that works yet.)

A compelling reason for thinking this way is that the setified arithmetic operations in fact explicitly enumerate the sets they describe. For example, given sets \(A = \{ a, b \}\) and \(X = \{ x, y \}\), you can expand \(A^X\) algebraically, using the fact that all the variables will later equal \(1\):

\[(a+b)^{x+y} = (a+b)^x (a+b)^y = (a^x + b^x)(a^y + b^y) = a^x a^y + a^x b^y + b^x a^y + b^x b^y\]

Then upon actually setting the variables to \(1\) this correctly describes the relationship in cardinalities: \(2^2 = 1 + 1 + 1 + 1\), so the number of functions \(X \ra A\) is \(4\). What’s interesting is that it also describes the sets themselves. Each term in the expanded sum is one of the four possible functions \(X \ra A\) exactly when we interpret \(a^x b^y\) as the function which maps \(x \ra a\) and \(y \ra b\). Also, evaluation of these variables corresponds to evaluating the functions, e.g. setting \(x=1\) and \(y=0\) to get \(a^x b^y \mapsto a^1 b^0 = a\), and setting one variable but leaving the other gives restriction, e.g. \(y=0\) sets \(a^x b^y \mapsto a^x\).
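The enumeration can be reproduced mechanically. In this sketch (helper names invented here) each function \(X \ra A\) is a dict, and `as_monomial` prints it in the image^argument convention used above:

```python
from itertools import product

def all_functions(domain, codomain):
    """Enumerate every function domain -> codomain as a dict."""
    domain = list(domain)
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]

def as_monomial(f):
    """Write a function as a monomial such as 'a^x b^y' (image^argument)."""
    return " ".join(f"{image}^{arg}" for arg, image in f.items())

A = ["a", "b"]
X = ["x", "y"]
functions = all_functions(X, A)

for f in functions:
    print(as_monomial(f))
```

There are `len(A) ** len(X)` dicts in the list, one per term of the expanded product.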

All of this basically also works if the variables have values other than \(1\), in which case they represent unlabeled sets of whatever cardinality; however, algebraic manipulations like \((a+b)^x = a^x + b^x\) are then no longer valid and you have to use a binomial expansion instead.

You can do similar constructions with a lot of combinatoric objects, although they don’t always so cleanly correspond to algebraic manipulations. Factorials look like

\[\begin{aligned} (a+b+c)! &= a^a b^b c^c + a^a b^c c^b + a^b b^a c^c + a^b b^c c^a + a^c b^b c^a + a^c b^a c^b \end{aligned}\]

which enumerates the \(3! = 6\) permutations of \(3\) elements. Combinations look like

\[\begin{aligned} \binom{a+b+c}{x+y} &= \frac{1}{x^x y^y+x^y y^x}[ a^x b^y + a^y b^x + a^x c^y + a^y c^x + b^x c^y + b^y c^x] \\ &= a^{q} b^q + b^q c^q + c^q a^q \end{aligned}\]

which enumerates the \(\binom{3}{2} = 3\) \(2\)-element combinations of \(3\) elements; here the \(\frac{1}{x^x y^y+x^y y^x}\) corresponds to the quotient by \((x+y)!\). Dividing through by the number of permutations implements the quotient \(x \sim y\) that avoids double counting, and \(q\) is a new variable that represents carrying out this quotient (I’m not sure if this is the best way to write this). Note that although all these variables will end up equaling \(1\), by leaving them as independent variables they track meaningful information from step to step.
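Both enumerations can be reproduced with the standard library, as a rough sketch. The variable names are hypothetical, and grouping injections by their image set is one assumed way to model the quotient \(x \sim y\); it is not necessarily the construction intended by the \(q\) notation.

```python
from itertools import permutations

elements = ["a", "b", "c"]
slots = ["x", "y"]

# (a+b+c)!: each permutation sigma becomes a^sigma(a) b^sigma(b) c^sigma(c).
perm_terms = [
    " ".join(f"{e}^{img}" for e, img in zip(elements, sigma))
    for sigma in permutations(elements)
]
print(len(perm_terms))  # 6

# Numerator of the binomial: injections {x, y} -> {a, b, c}.
injections = [dict(zip(slots, imgs)) for imgs in permutations(elements, len(slots))]

# Quotient by relabelling the slots: injections with the same image set
# are identified, which is the division by (x+y)! = 2.
classes = {}
for f in injections:
    classes.setdefault(frozenset(f.values()), []).append(f)

print(len(injections), len(classes))  # 6 3
print(sorted("".join(sorted(k)) for k in classes))  # ['ab', 'ac', 'bc']
```

Every equivalence class has exactly \(2! = 2\) members, which is why dividing by the number of permutations of the slots gives a whole number of terms.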

I suspect that every arithmetic identity has some equivalent set-ified expression like this. I also notice that a lot of information is lost when you map these set expressions back onto arithmetic. For example, you elide the distinctions between all possible quotients that lead to the same cardinality. Probably there is a lot of interesting structure there.

Anyway, for our purposes, I want to observe one thing about these. When thinking of functions as sets we usually picture them as ‘relations’: a function \(f: X \ra A\) is modeled as the set

\[\begin{aligned} f = \{ (x, f(x)) \mid x \in X \} \subset X \times A \\ \end{aligned}\]

E.g. \(\{ (x,a), (y, b) \} = xa + yb\) in our example. This set happens to have the cardinality \(\| f \| = \| X \|\), although it’s not clear what use that is.

Now consider \((a+b)^{x+y} = a^x a^y + a^x b^y + a^y b^x + b^x b^y\) from earlier. If \(a^x b^y\) is supposed to describe a single function from \(X = \{ x, y\}\) to \(A = \{ a,b \}\), then why doesn’t it setify to something like \(\{ (x, a), (y, b) \}\), with cardinality \(2\)?

Maybe you see where I’m going with this. \(f = a^x b^y\) has cardinality \(1\), because it’s one function. Its logarithm, however, looks closer to its representation as a relation:

\[\log f \? x \log (a) + y \log (b)\]

It’s a lot like \(xa + yb\), but it’s also suspiciously different. Also, it doesn’t have a cardinality since we need to divide by a base, but when we do it seems like any choice of base \(c\) has to give the cardinality \(\log_c f = \log f / \log c = x \log_c a + y \log_c b = x (0) + y(0) = 0\), since \(a = b = 1\). What do we make of this?

After thinking about this for a while I still don’t really feel like I have a good explanation for it, but I think we are supposed to think of it as equivalent to \(x a + yb\), just with the \(a\) and \(b\) written in a different basis, so it is more like a comparison between \(x \o a + y \o b\) and \(x \o \log a + y \o \log b\) than between numeric expressions. The cardinality being \(0\) doesn’t matter, because it’s not meaningful to talk about the cardinality of a function. And the role of \(\log\) is just to change the algebra of its argument from multiplicative to additive, but the two objects are supposed to be isomorphic and regarded as the same, at least in this case where the cardinality doesn’t mean anything.

I’m not sure about this part, and might come back and rewrite it later if I find a better interpretation. In any case I think it is interesting (or amusing, maybe) that \(\log f = \log a^x b^y\) gives something that at least resembles the function’s representation as a relation. Everything is logarithms?

8. Everything is Logarithms

What we have been discussing is the simplest and most well-behaved version of a logarithm in mathematics, the isomorphism between the additive real algebra \((\bb{R}, +)\) and the multiplicative one \((\bb{R}^{> 0}, \times)\). Of course there are logarithms in mathematics which are more complicated than that, such as the complex logarithm \(\log z = \{ \text{Log } z + 2\pi i k \mid k \in \bb{Z} \}\), or its messier cousins like the logarithm of a matrix. But I suspect these are a confusion of concepts. What’s really going on in the logarithm on \(\bb{C}\), for instance, is that angles really take their values in \(S^1\), not \(\bb{R}\), which has a different topology, and the weird behavior follows from not respecting this. A different set of conventions would move the problem out of the logarithm and into the definitions of the values themselves. Unfortunately that’s not how things are defined today so you have to deal with it. Still, it doesn’t seem like the logarithm’s fault to me.
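A small numerical sketch of the contrast, using Python's `math` and `cmath` modules. The specific numbers are arbitrary choices for illustration, and `cmath.log` returns the principal branch, which is one convention among many.

```python
import cmath
import math

# On the positive reals, log turns products into sums (up to rounding).
x, y = 3.0, 7.0
real_gap = math.log(x * y) - (math.log(x) + math.log(y))
assert abs(real_gap) < 1e-12

# On the unit circle the principal branch only does so modulo 2*pi*i,
# because the angle is wrapped back into (-pi, pi].
z1 = cmath.exp(3j * math.pi / 4)
z2 = cmath.exp(3j * math.pi / 4)
gap = cmath.log(z1 * z2) - (cmath.log(z1) + cmath.log(z2))

# The discrepancy is a whole number of turns: here the sum of angles,
# 3*pi/2, wraps to -pi/2, so the gap is -2*pi*i.
winding = gap / (2j * math.pi)
print(round(winding.real))  # -1
```

The failure of additivity is entirely accounted for by an integer winding number, which fits the reading that the trouble lives in how angles are represented rather than in the logarithm.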

Anyway the discussion in this article ignores those cases and assumes that \(\log\) really is an isomorphism: it’s just a way of taking something expressed in a multiplicative form and re-expressing it in an additive form. This, it turns out, corresponds to many operations that one learns in math, such as the \(\dim\) operator in linear algebra and the \(\nu_p\) operation in number theory (sorta) and the total derivative in calculus (also sorta).
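Two of these correspondences are easy to check by brute force. The sketch below makes some assumptions for the sake of a finite example: `nu` is a hypothetical helper for the \(p\)-adic valuation of a positive integer, and the vector spaces are taken over the two-element field so that \(\dim\) can be read off as \(\log_2\) of the number of vectors.

```python
from itertools import product
from math import log2


def nu(p, n):
    """p-adic valuation of a positive integer n: the exponent of p in n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count


# nu_p sends products to sums, like a logarithm that only sees the prime p.
m, n = 360, 84  # 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7
for p in (2, 3, 5, 7):
    assert nu(p, m * n) == nu(p, m) + nu(p, n)
print(nu(2, m), nu(2, n), nu(2, m * n))  # 3 2 5


def dim_over_f2(vectors):
    """Dimension of a space over the two-element field, as log_2 of its size."""
    return log2(len(vectors))


V = list(product(range(2), repeat=3))            # 8 vectors, dimension 3
W = list(product(range(2), repeat=2))            # 4 vectors, dimension 2
direct_sum = [v + w for v, w in product(V, W)]   # pairs (v, w), 32 vectors

# Sizes multiply while dimensions add.
assert dim_over_f2(direct_sum) == dim_over_f2(V) + dim_over_f2(W)
print(dim_over_f2(direct_sum))  # 5.0
```

Note that `nu` is a homomorphism but not an isomorphism, since it forgets every prime except \(p\), which is one way to read the "sorta".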

All of these things which appear to be very different seem to in some way be instances of the same basic primitives. And although these associations arise from my sort of… numerology… I can’t shake the feeling that it’s all too clean to not matter. Perhaps math needs to clean this all up: we are somehow missing the forest for the trees by keeping all this redundancy buried in the notations; actually there are only a few basic operations which are being written differently everywhere, and with all the patterns disguised everything is a lot harder than it needs to be. I suspect the patterns I’ve written about in this article should not feel like things I had to rediscover for myself. They follow naturally from the material that everybody learns.

I also keep finding that the math of physics seems to end up at a lot of the same structure. I first noticed these patterns in the operator formulation of quantum mechanics because it seems to insist on a certain ontology for its mathematics. I wonder if this is because physics is telling us how things “should be done”. Since in physics the mathematics is a human lens through which we view reality, the math must not impose its own views on how things are done, and any views you accidentally impose eventually clash with the requirements of the physics.

This is the idea behind the concept of general covariance, that the properties of objects are independent of the coordinates we use to express them, and so the meaningful theorems about reality end up being expressed in coordinate-free ways. The same philosophy applied to linear algebra or differential geometry leads to their covariant formulations that are indisputably ‘better’ than the forms in coordinates.

The baseless logarithm, which seems somewhat nonsensical mathematically, is an example of this applied on purely mathematical terms. It basically says that the isomorphism from multiplicative to additive algebraic representations of the same thing is separate from the choice of units on those algebras, and that most of its properties are unrelated to the units, just as the concept of a geometric vector is distinct from its projection onto a particular coordinate system. Meanwhile a bunch of other things with other notations are basically the same operation as the logarithm, or closely related to it.
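The separation between the operation and its units can be illustrated numerically. In this sketch `log_in` is a hypothetical helper, and the bases \(2\), \(e\), and \(10\) are just example choices of unit.

```python
import math


def log_in(base, value):
    """Logarithm of value expressed in the unit determined by base."""
    return math.log(value) / math.log(base)


x, y = 1000.0, 10.0
bases = (2.0, math.e, 10.0)

# The additive law holds in every unit.
for b in bases:
    assert abs(log_in(b, x * y) - (log_in(b, x) + log_in(b, y))) < 1e-9

# Changing units rescales every value by the same constant factor,
# independent of which number is being measured.
factor_x = log_in(2.0, x) / log_in(10.0, x)
factor_y = log_in(2.0, y) / log_in(10.0, y)
assert abs(factor_x - factor_y) < 1e-12
assert abs(factor_x - math.log2(10.0)) < 1e-12

# Ratios of logarithms do not depend on the unit at all.
ratios = [log_in(b, x) / log_in(b, y) for b in bases]
print([round(r, 9) for r in ratios])  # [3.0, 3.0, 3.0]
```

Only the individual values depend on the base; the additive structure and the ratios do not, much as the components of a vector depend on the coordinate system while the vector itself does not.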

When you take general covariance to its extreme you end up asking that all of your mathematics be formulated in a covariant way, as explicit relations between one thing you measure and another thing you measure. For example, we think of a set as having a certain cardinality, but in fact cardinality is a property of the set that we measure, and we have to be clear about how we do that because it’s all relative to the “coordinate system” for those measurements. Such a formulation is necessary to find the answers to ‘why’ questions about how mathematics works, about what is ‘actually’ going on independent of the human definitions and frameworks of set or category theory or whatever. The observations in this article are not very deep, but they seem to me to be among the many clues which point towards that formulation. I still can’t see it, though.

I hope to write a better standalone article about this notation soon. I’ve been trying to do so for a few years now but I seem to start losing my sanity whenever I try to work on it so it hasn’t happened yet. When I do finally manage to do it I’ll update this. ↩
In differentials, this operation is the differential of \(f\) but restricted to its \(dx\) component: \(\frac{\p f}{\p x} dx = f_x dx = df \mid_{x}\). This is a perfectly interesting object, a covariant derivative on the foliations of the \(x\) coordinate, I believe (if I have that right), but it’s not normally written this way. ↩
The \(f_x = \p_x f = \p f / \p x\) notation for partial derivatives is unfortunate; it should be \(df_{dx}\), to indicate that it is the “\(dx\) component” of the vector \(df\), or \(d_x f\), meaning the \(x\) component of \(d\) acting on \(f\). Better yet it would be \((\p f)_{\p x} = \p_x f\) and the \(d\) symbol would be retired, but that seems like a tall order. ↩
There is also a thing called an arithmetic derivative and a corresponding partial derivative \(D_p(n) = \nu_p(n)/p\), but as far as I can tell it’s not quite the same thing and not what I’m looking for. ↩
If you happen to have a vector in a polar form like \(\b{v} = v_r e^{R v_{\theta}}\), that refers to a second layer of exponential representation, via \(T^{\v} = T^{v_r e^{\p_{\theta} v_{\theta}}(\p_x)}\), where \(\p_x\) is a choice of origin for the rotational \(\theta\) coordinate (which may be multidimensional as well). ↩
Apparently the \(\oplus\) symbol is due to Bourbaki because everything was a mess prior to that. Also it happens to be a coproduct (which came later) and those do correspond to addition on sets, so there is at least a connection to addition… but at present I think it is largely a mistake. ↩
The technical term is that it is a \(GL(V)\)-torsor since the choice of origin \(X\) is arbitrary. The concept is easier to understand from Baez. This is one of those mathematical terms which I don’t like because it is so simple that it should not really have a special name (nor such a technical Wikipedia article). ↩
