This argument is very much in line with Mike Acton's data-oriented design philosophy [1] -- understand the actual specific problem you need to solve, not the abstract general problem. Understand the statistical distribution of actual problem instances; they'll have parameters in some range. Understand the hardware you're writing software for, and its finite capacity & capability.
It's common for new algorithms or data structures with superior asymptotic efficiency to perform worse than simpler alternatives at smaller problem sizes. As always, it depends on the specifics of any given application.
[1] see Mike's CppCon 2014 talk "Data-Oriented Design and C++" https://www.youtube.com/watch?v=rX0ItVEVjHc
(side note: does anyone else get thrown off by the Epilogue font? It looks very wide in some cases and very narrow in others... makes me want to override it with Stylus if my employer hadn't blocked browser extensions for security reasons, which raises the question of why I am even reading the article on this computer....)
https://www.quantamagazine.org/new-method-is-the-fastest-way...
Supposing one uses a 'trie' as a priority queue, the inserts and pops are effectively constant.
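For illustration, here's a minimal sketch of that idea using a bucket queue -- essentially a one-level trie over bounded integer priorities (my own example, assuming small integer keys that are popped in non-decreasing order, as in Dijkstra with bounded edge weights):

```python
# Minimal sketch (my own illustration, not from the thread): a bucket queue,
# i.e. a one-level "trie" over bounded integer priorities. With keys in a small
# fixed range, popped in non-decreasing order, insert and pop-min are O(1) amortized.
from collections import deque

class BucketQueue:
    def __init__(self, max_priority):
        self.buckets = [deque() for _ in range(max_priority + 1)]
        self.cursor = 0          # smallest bucket that might be non-empty
        self.size = 0

    def push(self, priority, item):
        self.buckets[priority].append(item)
        self.cursor = min(self.cursor, priority)
        self.size += 1

    def pop_min(self):
        # assumes the queue is non-empty (check self.size first otherwise)
        while not self.buckets[self.cursor]:
            self.cursor += 1     # total forward cursor movement is O(max_priority)
        self.size -= 1
        return self.cursor, self.buckets[self.cursor].popleft()

q = BucketQueue(max_priority=100)
q.push(7, "a"); q.push(3, "b"); q.push(7, "c")
print(q.pop_min())   # (3, 'b')
print(q.pop_min())   # (7, 'a')
```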
I'm sure I read this article a few days ago, word-for-word, but it wasn't on the site in the OP? It stood out because when it mentioned textbooks and said "including ours" I looked at the site and thought to myself, "they do textbooks?".
this was a lot of words that sum up to "I heard that new algorithm exists but spent zero effort actually evaluating it"
https://youtu.be/3ge-AywiFxs?si=TbcRsBNkzGhpOxQ4&t=842
(timestamped) He shows a derivation that the best a sorting algorithm can do is O(n log(n)) for n real positive numbers.
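For anyone who wants to sanity-check that bound numerically (my own snippet, not from the talk): a comparison sort has to distinguish all n! orderings of the input, so it needs at least log2(n!) comparisons, which by Stirling grows like n log n:

```python
# Quick numeric check of the decision-tree lower bound: log2(n!) vs n*log2(n).
import math

for n in (10, 100, 1000, 10_000):
    lower_bound = math.lgamma(n + 1) / math.log(2)   # log2(n!) via ln(n!)
    print(f"n={n:>6}  log2(n!)={lower_bound:12.1f}  n*log2(n)={n * math.log2(n):12.1f}")
```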
It's actually common for algorithms with a lower asymptotic complexity to be worse in practice, a classic example is matrix multiplication.
Also please, please, can we stop with the "eww, math" reactions?
> The new approach claims order (m log^(2/3) n) which is clearly going to be less for large enough n. (I had to take a refresher course on log notation before I could even write that sentence with any confidence.)
I'm sure the author is just exaggerating, he's clearly very competent, but it's a sentence with the vibes of "I can't do 7x8 without a calculator."
if you don't care about W or it is essentially constant - then it can be dropped
but it is an input parameter that will change execution time
"sorting" means assigning things into bins (which are usually ordered).
(There's also the problem of how you define your computational model. You can do better than O(n log n) in transdichotomous models. I'm assuming the hand-wavy, naive model the average algorithms class goes along with.)
Indeed: https://systemsapproach.org/books-html/
If you're short on money but have time and would like to get into networking, I can highly recommend https://book.systemsapproach.org/
> if you don't care about W or it is essentially constant - then it can be dropped
Also, every algorithm that ends up running on a real computer is bounded by some constant time, since its inputs are bounded. Treating everything as O(1) is still not a practical thing to do.
But sorting by arbitrary strings like names can’t avoid comparison sort.
Also, if you're taking an average of floating point numbers, you might want to sort them first and add from smallest to largest, to better preserve precision.
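A tiny self-contained demo of that effect (my own example):

```python
# Adding many tiny values to a large one loses them entirely unless the tiny
# terms get a chance to accumulate first.
import math

values = [1.0] + [1e-16] * 10

print(sum(values))          # 1.0 -- each 1e-16 falls below 1.0's rounding threshold
print(sum(sorted(values)))  # ~1.000000000000001 -- tiny terms accumulate before the big one
print(math.fsum(values))    # exactly rounded reference sum
```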
I struggle to see the point of your comment. The blog post in question does not say that the paper in question claims to be faster in practice. It simply is examining if the new algorithm has any application in network routing; what is wrong with that?
That said, the finite-alphabet and bounded-length requirements can be softened a bit, even for general sorting algorithms.
I mean, for the kind of lexicographically sortable data we're talking about, you can basically pick a convenient alphabet size without cost.
And unbounded length is not that big an obstruction. Sure you are going to need O(n log(n)) comparisons. But you can't compare data of unbounded length in constant time anyway. In the end you end up taking an amount of time that is at least proportional to the amount of data, which is optimal up to a constant factor. And if you fiddle with radix sort enough you can get it within something similar.
Basic ASCII strings and tuples aren't that big an obstruction. Fractions are more complicated.
Really, the O(n log(n)) for comparison-based sorts and the O(n) for radix sort mean something different. One counts comparisons as a function of the number of elements; the other is closer to the number of operations per amount of data. Though that assumes O(1) swaps, which is technically incorrect for data that doesn't fit in a 64-bit word.
The transdichotomous model looks interesting.
You can generally reduce the problem to a finite alphabet by taking the finite subset that actually appears in the input.
If you have an unbounded length then you can make sorting O(l n) where `l` is a bound on the lengths of your input. It's still linear in n, and also better than the O(l n log n) you would get with traditional comparison-based algorithms once you factor in the O(l) complexity of the comparison function for such elements.
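A rough sketch of that O(l n) idea (my own code, assuming ASCII strings padded to a common length l, with one stable counting pass per character position):

```python
# LSD radix sort over strings: O(l * n) symbol operations for n strings of length <= l.
def radix_sort_strings(strings):
    if not strings:
        return strings
    l = max(len(s) for s in strings)
    padded = [s.ljust(l, "\0") for s in strings]   # "\0" sorts before any real character
    for pos in range(l - 1, -1, -1):               # least-significant position first
        buckets = [[] for _ in range(256)]         # assumes single-byte (ASCII) characters
        for s in padded:
            buckets[ord(s[pos])].append(s)          # stable bucket pass: O(n) per position
        padded = [s for bucket in buckets for s in bucket]
    return [s.rstrip("\0") for s in padded]

print(radix_sort_strings(["banana", "apple", "app", "cherry"]))
# ['app', 'apple', 'banana', 'cherry']
```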
If m > n (log n)^{1/3}, then this algorithm is slower.
For 1 million nodes, that means if the average degree is > 3.5, the new algorithm has worse complexity (ignoring unstated constant factors).
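To make that concrete, here's my own back-of-the-envelope check (base-2 logs, every constant factor ignored, and the +m term kept; the exact crossover shifts a little with those choices but lands in the same ballpark as the ~3.5 above):

```python
# Dijkstra ~ m + n*log(n), the new bound ~ m*log(n)**(2/3). Setting them equal
# and solving for m/n gives the average degree below which the new bound wins.
import math

for n in (1_000, 100_000, 1_000_000, 100_000_000):
    log_n = math.log2(n)
    crossover_degree = log_n / (log_n ** (2 / 3) - 1)
    print(f"n={n:>11,}  new bound wins only if average degree < ~{crossover_degree:.2f}")
```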
https://arxiv.org/pdf/2504.17033 - Linked from the second sentence of the submission, not hard to track down. And the complexity (by which I presume you mean algorithmic complexity) is stated in the submission and in the PDF linked by the submission and that I just shared with you.
You might substitute "sorted by height", but it's certainly not a correction, while "ordered into lines" would be an error.
The unstated implication is that the theory tracks with real world behavior. This is more or less the case for simple problems: your O(n^2) algorithm won't scale, your database query without an index will take forever and so on, it's definitely a useful and high-signal way of thinking about computational problems.
Obviously modern hardware isn't much like a 70s machine and things like "all memory accesses are O(1)" are so wrong it's not even funny.
It's a pure thought experiment with no possible counterfactual, but I think if you tried to develop basic CS theory from a purely mathematical angle (e.g. consider a machine defined so and so with basic operations costing 1, let's run with this ball without caring much about the real world) you'd naturally end up with some (but not all) of the concepts we're used to. For example, arrays and buffers are very natural. We need more space than our basic unit of memory can accommodate, so let's use several in a row. Pointers also follow very nicely, and with them structures like linked lists. Other stuff like B-trees and to some extent hash-tables and friends very much less so; they're definitely "imported" from real-world usage.
You can generally sort any array in constant time by taking that constant to be the time it takes to sort the array using bubble sort.
e.g. for two-body gravity I can just do the math and get exact answers out. But for N > 2 bodies that doesn't work, and it's not that I need to think a bit harder or crack out some graduate textbooks to find a formula; if I did, hopefully the grad books would say "the three-body problem is generally not amenable to exact solution". I will need to do an approximation; exact answers are not available (except in a few edge cases).
If you don’t have large numbers of repeats of each element, then l needs to scale like O(log n), so O(l * n) is at least O(n log n).
Fundamentally, what’s going on here is that switching between computation models can easily add and remove log factors.
And all of that was wasted time since it seems that this just isn't at all applicable to A* heuristics the way Dijkstra's is. It's only an improvement in a very specific case.
That's actually given as a reason to study NP-completeness in the classic 1979 book "Computers and Intractability: A Guide to the Theory of NP-Completeness" by Garey & Johnson, which is one of the most cited references in computer science literature.
Chapter one starts with a fictional example. Say you have been trying to develop an algorithm at work that validates designs for new products. After much work you haven't found anything better than exhaustive search, which is too slow.
You don't want to tell your boss "I can't find an efficient algorithm. I guess I'm just too dumb".
What you'd like to do is prove that the problem is inherently intractable, so you could confidently tell your boss "I can't find an efficient algorithm, because no such algorithm is possible!".
Unfortunately, the authors note, proving intractability is also often very hard. Even the best theoreticians have been stymied trying to prove commonly encountered hard problems are intractable. That's where the book comes in:
> However, having read this book, you have discovered something almost as good. The theory of NP-completeness provides many straightforward techniques for proving that a given problem is “just as hard” as a large number of other problems that are widely recognized as being difficult and that have been confounding the experts for years.
Using the techniques from the book you prove the problem is NP-complete. Then you can go to your boss and announce "I can't find an efficient algorithm, but neither can all these famous people". The authors note that at the very least this informs your boss that it won't do any good to fire you and hire another algorithms expert. They go on:
> Of course, our own bosses would frown upon our writing this book if its sole purpose was to protect the jobs of algorithm designers. Indeed, discovering that a problem is NP-complete is usually just the beginning of work on that problem.
...
> However, the knowledge that it is NP-complete does provide valuable information about what lines of approach have the potential of being most productive. Certainly the search for an efficient, exact algorithm should be accorded low priority. It is now more appropriate to concentrate on other, less ambitious, approaches. For example, you might look for efficient algorithms that solve various special cases of the general problem. You might look for algorithms that, though not guaranteed to run quickly, seem likely to do so most of the time. Or you might even relax the problem somewhat, looking for a fast algorithm that merely finds designs that meet most of the component specifications. In short, the primary application of the theory of NP-completeness is to assist algorithm designers in directing their problem-solving efforts toward those approaches that have the greatest likelihood of leading to useful algorithms.
I guess you mean "at least O(n*log(n))".
A maximally sparse connected graph by mean degree (edge/node ratio) is any tree (mean degree ~ 1), not necessarily a linked list.
In order to have n/10 unique elements, you need to be able to construct n/10 different strings, which means that L needs to be at least log_(base = how many distinct symbols you have)(n/10), which is O(log n). So you have L * n = O(n log n) symbols to write down, and even reading the input takes time O(n log n).
As a programmer, it's very very easy to think "64-bit integers can encode numbers up to 2^64, and 2^64 is HUUUUGE, so I'll imagine that my variables can store any integer". But asymptotic complexity is all about what happens when inputs get arbitrarily large, and your favorite computer's registers and memory cells cannot store arbitrarily large values, and you end up with extra factors of log n that you need to deal with.
P.S. For fun, you can try to extend the above analysis to the case where the number of unique elements is sublinear in the number of elements. The argument almost carries straight through if there are at least n^c unique elements for 0 < c < 1 (as the c turns into a constant factor when you take the log), but there's room to quibble: if the number of unique elements is sublinear in n, one might argue that one could write down a complete representation of the input and especially the sorted output in space that is sublinear in L * n. So then the problem would need to be defined a bit more carefully, for example by specifying that the input format is literally just a delimited list of the element values in input order.
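A tiny numeric version of that counting argument (my own snippet; the 26-letter alphabet is just an assumption for illustration):

```python
# With an alphabet of k symbols, strings of length L express at most k**L distinct
# values, so n/10 unique elements forces L >= log_k(n/10), i.e. L grows like log n.
import math

k = 26  # assumed lowercase-ASCII alphabet
for n in (1_000, 1_000_000, 1_000_000_000):
    min_len = math.ceil(math.log(n / 10, k))
    print(f"n={n:>13,}  need L >= {min_len}")
```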
Last year a couple of people forwarded to me the same article on a new method of finding shortest paths in networks. The underlying research claims to improve on the classic approach pioneered by Dijkstra that is taught in most networking textbooks (including ours). I was initially a bit skeptical, much as I would be if I read that the Riemann Hypothesis had been proved. Dijkstra is a legend in computer science and his algorithm, which he published in 1959, predates packet switching by a few years. The specification for OSPF, one of two dominant link-state routing protocols (IS-IS is the other), is unusually detailed in its guidance to implementors, and it basically tells them to use Dijkstra’s algorithm. And that is what most implementations have done for decades, with a few minor improvements through the years to speed up the implementation but no real change to the fundamentals.
The new algorithm is no minor tweak to Dijkstra’s but a significantly different approach. Its headline claim is that, whereas Dijkstra requires a sorting operation, and thus is only able to perform as well as the best sorting algorithm, this new approach “breaks the sorting barrier”. That is, it avoids the need for sorting, and is able to deliver better bounds on performance than Dijkstra.
While I don’t consider myself qualified to evaluate the paper that describes the new algorithm, it has passed peer-review at a top-tier conference (ACM Symposium on the Theory of Computing) and received enough scrutiny that I don’t doubt that the theory works. The question I want to discuss here is: does it matter?
Two main issues came immediately to mind when I tried to assess the implications of a theoretical improvement in algorithmic performance. The first is, what are the actual scaling limits that we need to address in a real routing system. For example, the running time of Dijkstra’s algorithm is on the order of (n log n + m) for a network of n vertices (routers) and m edges (links). The new approach claims order (m log^(2/3) n) which is clearly going to be less for large enough n. (I had to take a refresher course on log notation before I could even write that sentence with any confidence.) The problem with assessing the scaling properties of anything is you have to wonder how big n must get before it makes a difference. There are constant factors that might differ between the two approaches; at small n, a “less scalable” approach might actually display better performance.
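To get a rough feel for the numbers, I tabulated the two bare bounds side by side (my own back-of-the-envelope script; it deliberately ignores the constant factors, which could easily dominate at these sizes, and assumes an average degree of about 3, i.e. m = 3n):

```python
# Compare the bare bounds only: Dijkstra ~ n*log(n) + m vs new ~ m*log(n)**(2/3).
import math

print(f"{'n':>10} {'Dijkstra ~ n*log(n)+m':>24} {'new ~ m*log(n)**(2/3)':>24}")
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    m = 3 * n                                   # assumed average degree of 3
    dijkstra_bound = n * math.log2(n) + m
    new_bound = m * math.log2(n) ** (2 / 3)
    print(f"{n:>10,} {dijkstra_bound:>24,.0f} {new_bound:>24,.0f}")
```

At a few hundred or a few thousand routers the two bounds come out essentially indistinguishable, so the ignored constants would decide everything.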
One of my first jobs was part of the team building a scalable packet switch based on arrays of small switching elements. (This is where I got to build the accidental SmartNIC). We had papers to show that we could build switches with thousands of ports at 155 Mbps, in an era when shared Ethernet ran at 10 Mbps and had not yet been overtaken by switched Ethernet. While we at Bellcore were investing lots of time and money to put together a set of 32-port prototype switches, FORE systems delivered commercially successful 16-port switches. Almost certainly they were not as scalable as our design, but it turned out that n=16 was a pretty useful capacity for switches with 155 Mbps links in the 1990s. We felt quite sad that our research seemed to have been overtaken by commercial products. My takeaway was that scalability was a good thing to strive for but that sometimes you might achieve a good result with a less scalable solution. One of my favorite text books, “Principles of Computer Systems Design”, uses the example of stopping a supertanker to demonstrate the scaling limits of a system. The fact that supertankers have a scaling limit doesn’t prevent them from being the backbone of oil shipping. You just need to avoid making them too big.
What’s a large value for n in SPF calculations? I checked in with a couple of colleagues to update my recollection of how many routers you might find in a big OSPF or IS-IS backbone. In my memory it was in the hundreds; for the largest service provider networks today it seems to be in the small number of thousands. So that’s not tiny but it’s small compared to, say, the number of prefixes carried in BGP.
And it doesn’t seem to be limited by the performance of the SPF calculation, as I will explain below.
Another memory I have from my time working for Big Router was an analysis of all the things that go into the performance of routing protocols. I was working on MPLS in its early days, and we were pretty excited about a technology called “fast reroute” (FRR) which uses MPLS to divert packets around a failed link without waiting for routing to converge after the failure. But at the same time, the people responsible for routing protocol development were hard at work improving routing convergence times. One of the most important things, it turns out, for both MPLS and standard routing, was simply detecting failure faster. For example, if you have to wait for tens of seconds of missing OSPF Hello packets before you declare a link down, it doesn’t really matter if you can calculate the shortest path in a fraction of a second. This was the reasoning behind the creation of BFD (Bidirectional Forwarding Detection): a fast mechanism, independent of routing, by which link failures could be detected for any type of link.
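To put rough numbers on that point (illustrative arithmetic only; the timer values are the commonly cited defaults, not measurements from any particular network):

```python
# With classic OSPF timers, a dead neighbor is only declared after the
# RouterDeadInterval (commonly 40 s = 4 x 10 s hellos), which dwarfs any SPF run.
# BFD with, say, 50 ms packets and a detection multiplier of 3 detects in ~150 ms.
ospf_dead_interval_s = 40.0          # typical default dead interval
bfd_detect_s = 3 * 0.050             # 3 missed 50 ms BFD packets (illustrative)
spf_time_s = 0.010                   # hypothetical 10 ms SPF run

print(f"convergence floor with OSPF timers: ~{ospf_dead_interval_s + spf_time_s:.3f} s")
print(f"convergence floor with BFD:         ~{bfd_detect_s + spf_time_s:.3f} s")
```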
Beyond fast failure detection, other factors playing into routing convergence include: the time to send a new link state packet out a link and propagation delay across the network; time to receive such a packet and dispatch it to the right process in the operating system; SPF time; time to update the routing information base; time to calculate the impact on the forwarding table; time to push any forwarding table updates out to the line cards (on big routers that have such things); time to flood the link state packet out to neighbors. All of these steps have been analyzed and optimized over the years, to the point where sub-second routing convergence is now routine. As early as 2003 the improvements to all the steps above had enabled sub-second convergence, as this NANOG presentation shows. Yes, you couldn’t afford to spend 10 seconds doing an SPF calculation if you wanted fast convergence, but that was already a solved issue by then. Optimizing all the other parts was at least as important as improving the SPF calculation time.
Finally, I chatted with another colleague about this topic and he reminded me of a reason that Dijkstra’s algorithm remains the implementation method of choice: it can be understood by the people who have to write code. Dijkstra himself puts it well in a 2001 interview:
The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities.
In other words, Keep It Simple, Stupid. I would certainly rather point an engineer at the OSPF spec than send them off to understand the hybrid Bellman-Ford/Dijkstra approach that might shave a few milliseconds off the non-bottleneck part of routing convergence. Maybe one day someone will write the explanation of the novel SPF approach that is as clear as Dijkstra’s paper and the OSPF spec. And the hybrid algorithm might be great for large mapping applications. But I don’t see Dijkstra’s algorithm being replaced in production routers any time soon.
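For comparison, here is roughly how little code the textbook heap-based Dijkstra takes (a sketch of the classic algorithm, not of what any production router actually ships):

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]}. Returns shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, node already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

print(dijkstra({"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}, "a"))
# {'a': 0, 'b': 1, 'c': 3}
```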
Following up on our recent post on Frank Gehry, the New York Review of Books has a good article on him from an actual architecture critic.
Datacenters in space are a terrible idea and you can tell the FCC if you’re motivated.
Wiki Education has studied the AI slop being submitted to Wikipedia and found that LLM-generated references are often plausible-looking but don’t contain the stated content. It’s almost like LLMs don’t know what they are talking about. There is a project to clean up such articles.