Recently I was made aware by colleagues of a publication by authors of a new agent-based modeling toolkit in a different, hipper programming language. They compared their system to others, including mine, and made kind of a big checklist of who's better in what, and no surprise, theirs came out on top. But digging deeper, it quickly became clear that they didn't understand how to run my software correctly; and in many other places they bent over backwards to cherry-pick, and made a lot of bold and completely wrong claims. Correcting the record would place their software far below mine.
Mind you, I'm VERY happy to see newer toolkits which are better than mine -- I wrote this thing over 20 years ago after all, and have since moved on. I wouldn't have bothered complaining to the journal, but several colleagues demanded I do so. After a lot of back-and-forth, however, it became clear that the journal's editor was too embarrassed to require a retraction or revision. And the authors kept coming up with excuses for their errors. So the journal quietly dropped the complaint.
I'm afraid that this is very common.
For example, look at how people interact with LLMs. Lots of superstition ("take a deep breath"), not much reading about the underlying architecture.
Once something enters The Canon, it becomes “untouchable,” and no one wants to question it. Fairly classic human nature.
> "The most erroneous stories are those we think we know best -and therefore never scrutinize or question."
-Stephen Jay Gould
And from the comments:
> From my experience in social science, including some experience in management studies specifically, researchers regularly believe things – and will even give policy advice based on those beliefs – that have not even been seriously tested, or have straight up been refuted.
Sometimes people rely on fewer than one non-replicable study: they invent a study and cite that! An example is the "Harvard Goal Study" that is often trotted out at self-review time at companies. The supposed study suggests that people who write down their goals are more likely to achieve them than people who do not. However, Harvard itself cannot find any record that such a study exists:
I talked about it years ago: https://news.ycombinator.com/item?id=26125867
Others said they’d never seen it. So maybe it’s rare. But no one will tell you even if they encounter it. Guaranteed career blackball.
https://pmc.ncbi.nlm.nih.gov/articles/PMC1182327/pdf/pmed.00...
If this isn't bad people, then who can ever be called bad people? The word "bad" loses its meaning if you explain away every bad deed by such people as something else. Putting other people's lives at risk by deciding to drive when you are drunk sounds like very bad people to me.
> They’re living in a world in which doing the bad thing–covering up error, refusing to admit they don’t have the evidence to back up their conclusions–is easy, whereas doing the good thing is hard.
I don't understand this line of reasoning. So if people do bad things because they know they can get away with it, they aren't bad people? How does this make sense?
> As researchers they’ve been trained to never back down, to dodge all criticism.
Exactly the opposite is taught. These people are deciding not to back down and admit wrongdoing of their own accord. Not because of some "training".
That is not at all how science is supposed to work.
If a result can't be replicated, it is useless. Replicators should not be told to "tread lightly", they should be encouraged. And replication papers should be published, regardless of the result (assuming they are good quality).
The paper in question shows - credibly or not - that companies focusing on sustainability perform better in a variety of metrics, including generating revenue. In other words: Not only can you have companies that do less harm, but these ethically superior companies also make more money. You can have your cake and eat it too. It likely has given many people a way to align their moral compass with their need to gain status and perform well within our system.
Even if the paper is a complete fabrication, I'm convinced it has made the world a better place. I can't help but wonder if Gelman and King paused to consider the possible repercussions of their actions, and what kinds of motivations they might have had. The linked post briefly dips into ethics, benevolently proclaiming that the original authors of the paper are not necessarily bad people.
Which feels ironic, as it seems to me that Gelman and King are the ones doing wrong here.
No, we shouldn't. Research fraud is committed by people, who must be held accountable. In this specific case, if the issues had truly been accidental, the authors would have responded and revised their paper. They did not; ergo their false claims were likely deliberate.
That the school and the journal show no interest - equally bad, and deserving of public shaming.
Of course, this is also a consequence of "publish or perish."
Made me think of the black plastic spoon study, where the math was off by a factor of 10 and the author likewise said it didn't impact the main findings.
https://statmodeling.stat.columbia.edu/2024/12/13/how-a-simp...
All the talks they were invited to give, all the followers they had, all the courses they sold and impact factor they have built. They are not going to come forward and say "I misinterpreted the data and made far-reaching conclusions that are nonsense, sorry for misleading you and thousands of others".
The process protects them as well. Someone can publish another paper and make different conclusions. There is zero effort to get to the truth, to tell people what is and what isn't current consensus and what is reasonable to believe. Even if it's clear to anyone who digs a bit deeper, it will not be communicated to the audience academia is supposed to serve. The consensus will just quietly shift while the heavily quoted paper is still there. The talks are still out there, the false information is still propagated, while the author enjoys all the benefits and suffers none of the negative consequences.
If it functions like that, I don't think it's fair that taxpayers fund it. It's there to serve the population, not to exist in its own world and play its own politics and power games.
The benefits we can get from collective works, including scientific endeavors, are indefinitely large, as in far more important than what can be held in the head of any individual.
Incentives are just irrelevant as far as global social good is concerned.
Citation studies are problematic, and their use can and should be criticized. But this here is just hot air built on a fundamental misunderstanding of how to measure and interpret citation data.
No real surprise. I'm pretty sure most academics spend little time critically reading sources and just scan to see if it broadly supports their point (like an undergrad would). Or just cite a source if another paper says it supports a point.
I've heard the most brutal thing an examiner can do in a viva voce is to ask what a cited paper is about, lol.
But if you're going to quote the whole thing, it seems easier to just say so rather than quoting it bit by bit interspersed with "King continues" and annotating each "I" with "[King]".
This is a frustrating aspect of studies. You have to contact the authors for full datasets. I can see why it would not have been possible to publish them in the past, due to limited space in printed publications. In today's world, though, every paper should be required to have its full datasets published to a website, so others can access them to verify and replicate.
Actually it’s not science at all.
Our conclusion was to never trust psychology majors with computer code. And as with any other field of expertise, they should have shown their idea and/or code to some CS majors at the very least before publishing.
From the perspective of the academic community, there will be less incentive to publish incorrect results if data and code are shared.
Now I'm not saying that everything in M-S is junk, but the small subset I was exposed to was.
They make a lot of claims about how much faster they are than MASON, NetLogo, and Mesa. But in practice I am not finding that to be the case. Also, they aren't counting the Julia compilation step, which takes an absurdly long time; by the time that gets done, similar simulations are already finished, and only then do they start the clock on their own benchmark (see the timing sketch below).
Agents.jl and Mesa have the selling point of having better languages/libraries for numerical computation. But that's really a subset of MS/OR ABM, I think.
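To make the benchmark point concrete, here's roughly how I'd time it so the comparison is fair: each toolkit runs as a fresh process, so Julia's JIT compilation is counted. A minimal sketch, where run_model.jl and run_model.py are hypothetical stand-ins for one comparable simulation each:

```python
# Minimal sketch (my assumed fair setup, not the paper's actual
# benchmark): time each toolkit as a fresh process, so startup and
# JIT compilation are included in the wall-clock number.
import subprocess
import time

def wall_time(cmd):
    """Wall-clock time for a full process run, startup and all."""
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - t0

julia_total = wall_time(["julia", "run_model.jl"])    # hypothetical script
mesa_total = wall_time(["python", "run_model.py"])    # hypothetical script

print(f"Agents.jl, end to end (incl. compilation): {julia_total:.1f}s")
print(f"Mesa, end to end:                          {mesa_total:.1f}s")
```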
This can directly undermine the scientific process.
There has to be a better path forward.
“Your email is too long.”
This whole thing is filled with “yeah, no s**” and lmao.
More seriously, pretty sure the whole ESG thing has been debunked already, and those who care to know the truth already know it.
A good rule of thumb is to be skeptical of results that make you feel good because they “prove” what you want them to.
On my side-project todo list, I have an idea for a scientific service that overlays a "trust" network over the citation graph. Papers that uncritically cite other work that contains well-known issues should get tagged as "potentially tainted". Authors and institutions that accumulate too many of such sketchy works should be labeled equally. Over time this would provide an additional useful signal vs. just raw citation numbers. You could also look for citation rings and tag them. I think that could be quite useful but requires a bit of work.
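For the curious, here's a minimal sketch of the taint-propagation part, assuming a toy citation graph; real data would have to come from a citation index such as OpenAlex, and the paper names are made up:

```python
# Toy citation graph: paper -> list of papers it cites.
from collections import deque

citations = {
    "A": [],     # "A" is a paper with well-known issues
    "B": ["A"],  # "B" cites A uncritically
    "C": ["B"],  # "C" builds on B
    "D": [],     # "D" is independent
}
known_bad = {"A"}

def potentially_tainted(citations, known_bad):
    """Mark every paper that transitively cites a known-bad paper."""
    # Invert the graph: paper -> papers that cite it.
    cited_by = {p: [] for p in citations}
    for paper, refs in citations.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)
    # Breadth-first search outward from the known-bad papers.
    marked, queue = set(known_bad), deque(known_bad)
    while queue:
        for citer in cited_by.get(queue.popleft(), []):
            if citer not in marked:
                marked.add(citer)
                queue.append(citer)
    return marked - known_bad

print(potentially_tainted(citations, known_bad))  # -> {'B', 'C'}
```

The same inverted graph would also support the citation-ring check: look for small sets of papers whose citations point mostly at each other.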
Even if nobody cheated and massaged data, we would still have studies that do not replicate on new data. 95% confidence means that one in twenty surveys finds an effect that is only noise. Reporting failed hypothesis tests would really help to find these cases.
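That one-in-twenty figure is easy to verify yourself. A quick simulation sketch (assuming numpy and scipy are available), running a standard t-test on pure noise many times:

```python
# How often does a test at alpha = 0.05 "find" an effect in pure noise?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_subjects, alpha = 10_000, 30, 0.05

false_positives = 0
for _ in range(n_studies):
    # Both groups come from the SAME distribution: any "effect" is noise.
    a = rng.normal(size=n_subjects)
    b = rng.normal(size=n_subjects)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(f"{false_positives / n_studies:.1%} of null studies were 'significant'")
# Expect roughly 5%, i.e. one in twenty.
```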
So pre-registration helps, and it would also help to establish the standard that everything needed to replicate must be published, if not in the article itself, then in an accompanying repository.
But in the brutal fight for promotion and resources, of course labs won’t share all their tricks and process knowledge. Same problem if there is an interest in using the results commercially. E.g., in EE the method is often described in general terms, but crucial parts of the code or circuit design are held back.
Of course doing so is not free, and it takes time. But a paper represents at least months of work in data collection, analysis, writing, and editing. A tarball seems like a relatively small amount of effort to provide a huge increase in confidence in the result.
ResearchGate says 3936 citations. I'm not sure what they are counting; probably all the PDFs uploaded to ResearchGate.
I'm not sure how they count 6000 citations, but I guess they are counting everything, including quotes by the vice president. Probably 6001 after my comment.
Quoted in the article:
>> 1. Journals should disclose comments, complaints, corrections, and retraction requests. Universities should report research integrity complaints and outcomes.
All comments, complaints, corrections, and retraction requests? Unmoderated? Einstein articles will be full of comments explaining why he is wrong, from racists to people who can't spell Minkowski to save their lives. In /newest there is about one post per week from someone who has discovered a new physics theory with the help of ChatGPT. Sometimes it's the same guy, sometimes it's a new one.
[1] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1964011
[2] https://www.researchgate.net/publication/279944386_The_Impac...
I often say that "hard sciences" have often progressed much more than social/human sciences.
Institutions could do something, surely. Require one in n papers to be a replication. Only give prizes to replicated studies. Award prize money split between the first two or three independent groups demonstrating a result.
The 6k citations though ... I suspect most of those instances would just assert the result if a citation wasn't available.
This one is pretty egregious.
These probably have a bigger chance of being published, as you are providing a "novel" result instead of fighting the get-along culture (which is, honestly, present in the workplace as well). But ultimately, they are harder to do (research-wise! but not politically) because they presumably mean you have figured out an actual thing.
Not saying this is the "right" approach, but it might be a cheaper, more practical way to get a paper turned around.
Whether we can work this out in research in a proper way is linked to whether we can work this out everywhere else. How many times have you seen people pat each other on the back despite lousy performance and no results? It's just easier to switch positions in the private sector than in research, so you'll have more people there who aren't afraid to call out a bad job. And, well, there's this profit that needs to pay your salary too.
And therein lies the uncomfortable truth: Collaborative opportunities take priority over veracity in publications every time.
We need to throw all of this out by default. From public policy to courtrooms, we need to treat it like any other eyewitness claim. We shouldn't believe anything unless it has strong arguments or data backing it. For science, we need the scientific method applied with skeptical review and/or replication. Our tools, like statistical methods and programs, must be vetted.
Like with logic, we shouldn't allow them to go beyond what's proven in this way. So, only the vetted claims are allowed as building blocks (premises) in newly vetted work. The premises must be used as they were used before. If not, they are re-checked for the new circumstances. Then, the conclusions are stated with their preconditions and limitations, to only be applied that way.
I imagine many non-scientists and taxpayers assumed what I described is how all these "scientific facts" and "consensus" claims were produced. The opposite was true in most cases. So, we need to not only redo it but apply the scientific method to the institutions themselves, assessing their reliability. If they don't get reliable, they lose their funding, and quickly.
(Note: There are groups in many fields doing real research and experimental science. We should highlight them as exemplars. Maybe let them take the lead in consulting for how to fix these problems.)
I was an undergraduate at the University of Maryland when you were a graduate student there in the mid nineties. A lot of what you had to say shaped the way I think about computer science. Thank you.
I recommended that the journal not publish the paper, and gave them a good list of improvements to give to the authors that should be made before re-submitting. The journal agreed with me, and rejected the paper.
A couple of months later, I saw it had been published unchanged in a different journal. It wasn't even a lower-quality journal, if I recall the impact factor was actually higher than the original one.
I despair of the scientific process.
How sad. Admitting and correcting a mistake may feel difficult, but it makes you credible.
As a reader, I would have much greater trust in a journal that solicited criticism and readily published corrections and retractions when warranted.
> We should distinguish the person from the deed. We all know good people who do bad things
> They were just in situations where it was easier to do the bad thing than the good thing
I can't believe I just read that. What's the bar for a bad person if you haven't passed it at "it was simply easier to do the bad thing?"
In this case, it seems not owning up to the issues is the bad part. That's a choice they made. Actually, multiple choices at different times, it seems. If you keep choosing the easy path instead of the path that is right for those that depend on you, it's easier for me to just label you a bad person.
Ranking 1 to 3, with 1 being the best and 3 the bare minimum for publication:
3. Citations only.
2. Citations + full disclosure of data.
1. Citations + full disclosure of data + replicated.
Straight-up replications are rare, but if a finding is real, other PIs will partially replicate and build upon it, typically as a smaller step in a related study. (E.g., a new finding about memory comes out, my field is emotion, I might do a new study looking at how emotion and your memory finding interact.)
If the effect is replicable, it will end up used in other studies (subject to randomness and the file drawer effect, anyway). But if an effect is rarely mentioned in the literature afterwards...run far, FAR away, and don't base your research off it.
A good advisor will be able to warn you off lost causes like this.
When a junior researcher, e.g. a grad student, fails to replicate a study, they assume it's technique. If they can't get it after many tries, they just move on, and try some other research approach. If they claim it's because the original study is flawed, people will just assume they don't have the skills to replicate it.
One of the problems is that science doesn't have great collaborative infrastructure. The only way to learn that nobody can reproduce a finding is to go to conferences and have informal chats with people about the paper. Or maybe if you're lucky there's an email list for people in your field where they routinely troubleshoot each other's technique. But most of the time there's just not enough time to waste chasing these things down.
I can't speak to whether people get blackballed. There's a lot of strong personalities in science, but mostly people are direct and efficient. You can ask pretty pointed questions in a session and get pretty direct answers. But accusing someone of fraud is a serious accusation and you probably don't want to get a reputation for being an accuser, FWIW.
When you added it up, most of the hard parts were Engineering, and a bit of Econ. You would really struggle to work through tough questions in engineering, spend a lot of time on economic theory, and then read the management stuff like you were reading a newspaper.
Management you could spot a mile away as being soft. There's certainly some interesting ideas, but even as students we could smell it was lacking something. It's just a bit too much like a History Channel documentary. Entertaining, certainly, but it felt like false enlightenment.
https://en.wikipedia.org/wiki/Addiction_Rare_in_Patients_Tre...
I've also seen the resistance that results from trying to investigate or even correct an issue in a key result of a paper. Even before it's published, the barrier can be quite high (and I must admit that since it's not my primary focus and my name was not on it, I did not push as hard as I could have).
The replication crisis is largely particular to psychology, but I wonder about the scope of the don't-rock-the-boat issue.
Schools should be using these kinds of examples in order to teach critical thinking. Unfortunately the other side of the lesson is how easy it is to push an agenda when you've got a little bit of private backing.
Personally, I would agree with you. That's how these things are supposed to work. In practice, people are still people.
This actually doesn't surprise much. I've seen a lot of variety in the ethical standards that people will publicly espouse.
These people are terrible at their job, perhaps a bit malicious too. They may be great people as friends and colleagues.
> Vonnegut is not, I believe, talking about mere inauthenticity. He is talking about engaging in activities which do not agree with what we ourselves feel are our own core morals while telling ourselves, “This is not who I really am. I am just going along with this on the outside to get by.” Vonnegut’s message is that the separation I just described between how we act externally and who we really are is imaginary.
https://thewisdomdaily.com/mother-night-we-are-what-we-prete...
For example, here's an article that argues (with data) that there is actually little publication bias in medical studies in the Cochrane database:
> because they know they can get away with it
The point is that the paved paths lead to bad behavior.
Well-designed systems make it easy to do good.
> Exactly the opposite is taught.
"trained" doesn't mean "taught". most things are learned but not taught
1. There are bad people
2. We know bad people are bad because they do bad things
3. There does not exist any set of bad actions that one could do to qualify one for the label of "bad person."
I've just come to the conclusion that a "bad person" is just a term for someone who does bad things, and for whom their extenuating circumstances don't count because they are the member of the wrong tribe.
I read the submitted version and told her it wasn't OK. She withdrew the paper and I left her lab shortly after. I simply could not stand the tendency to juice up papers, and I didn't want to have my reputation tainted by a paper that was false (I'm OK with my reputation being tainted by a paper that was just not very good).
What really bothers me is when authors intentionally leave out details of their method. There was a hot paper (this was ~20 years ago) about a computational biology technique ("evolutionary trace"), and when we did the journal club, we tried to reproduce their results, which started with writing an implementation from their description. About halfway through, we realized that the paper left out several key steps. We were able to infer roughly what they did, but as far as we could tell, it was an intentional omission made to keep the competition from catching up quickly.
That's reference-stealing: "some other paper I read cited this, so it should be OK; I'll steal their reference." I always make sure I read the cited paper before citing it myself, and it's scary how often it says something rather different from what the citation implies. That's not necessarily bad research; more that the author of the citing paper was looking for effect A in the cited reference and I'm looking for effect B, so their reason for citing differs from mine, and it's a valid reference in their paper but wouldn't be in mine.
IMHO this should be expected for any, literally any publication. If you have secrets, or proprietary information, fine - but then, you don't get to publish.
The number appears to be from Google Scholar, which currently reports 6269 citations for the paper.
If the flow of tax, student debt and philanthropic money were cut off, the journals would all be wiped out because there's no organic demand for what they're doing.
They are pushed to publish a lot, which means journals have to review a lot of stuff (and they cannot replicate findings on their own). Once a paper is published on a decent journal, other researchers may not "waste time" replicating all findings, because they also want to publish a lot. The result is papers getting popular even if no one has actually bothered to replicate the results, especially if those papers are quoted by a lot of people and/or are written by otherwise reputable people or universities.
This is one of the reasons you should never accept a single publication at face value. But this isn’t a bug — it’s part of the algorithm. It’s just that most muggles don’t know how science actually works. Once you read enough papers in an area, you have a good sense of what’s in the norm of the distribution of knowledge, and if some flashy new result comes over the transom, you might be curious, but you’re not going to accept it without a lot more evidence.
This situation is different, because it’s a case where an extremely popular bit of accepted wisdom is both wrong, and the system itself appears to be unwilling to acknowledge the error.
Yes, the complicity is normal. No the complicity isn't right.
The banality of evil.
When the good thing is easier to do and they still knowingly pick the bad one for the love of the game?
You'll just get replication rings in addition to citation rings.
People who cheat in their papers will have no issues cheating in their replication studies too. All this does is give them a new tool to attack papers they don't like by faking a failed replication.
“That’s a bad thing to do…”
Maybe should be: “That’s a stupid thing to do…”
Or: reckless, irresponsible, selfish, etc.
In other words, maybe it has nothing to do with morals and ethics. Bad is kind of a lame word with limited impact.
https://blog.plan99.net/replication-studies-cant-fix-science...
The idea failed a simple sanity check: just going to Google Scholar, doing a generic search and reading randomly selected papers from within the past 15 years or so. It turned out most of them were bogus in some obvious way. A lot of ideas for science reform take as axiomatic that the bad stuff is rare and just needs to be filtered out. Once you engage with some field's literatures in a systematic way, it becomes clear that it's more like searching for diamonds in the rough than filtering out occasional corruption.
But at that point you wonder, why bother? There is no alchemical algorithm that can convert intellectual lead into gold. If a field is 90% bogus then it just shouldn't be engaged with at all.
Still I'm skeptical about any sort of system trying to figure out 'trust'. There's too much on the line for researchers/students/... to the point where anything will eventually be gamed. Just too many people trying to get into the system (and getting in is the most important part).
[1] https://en.wikipedia.org/wiki/Replication_crisis#In_medicine
Judging from PubPeer, which allows people to post all of the above anonymously and with minimal moderation, this is not an issue in practice.
I only needed the Spanish translation. Now I am proficient in spoken and written Spanish, and I can perfectly understand what is said, and yet I still ran the English through Google Translate and printed it out without really checking through it.
I got to the podium and there was a line where I said "electricity is in the air" (a metaphor, obviously) and the Spanish translation said "electricidad no está en el aire" ("electricity is not in the air"). I was able to correct that on the fly, but I was pissed at Translate, and I badmouthed it for months. And sure, it was my fault for not proofing and vetting the entire output, but come on!
That's not right; retractions should only be for research misconduct cases. It is a problem with the article's recommendations too. Even if a correction is published that the results may not hold, the article should stay where it is.
But I agree with the point about replications, which are much needed. That was also the best part in the article, i.e. "stop citing single studies as definitive".
Pushing for retraction just like that and going off to the private sector is… idk, it's a decision.
> We need to throw all of this out by default. From public policy to courtrooms, we need to treat it like any other eyewitness claim.
If you can't trust eyewitness claims, if you can't trust video or photographic or audio evidence, then how does one Find Truth? Nobody really seems to have a solid answer to this.
God gave us free will to choose good or evil in various circumstances. We need to recognize that in our assessments. We must reward good choices and address bad ones (e.g., the study authors'). We should also change environments to promote good and oppose evil, so the pressures are pushing in the right direction.
People are on average both bad and stupid, and they operate without a framework of consequences and expectations under which they would expect to suffer and feel shame. They didn't make a mistake; they stood in front of all their professional colleagues and published what they knew were lies. The fact that they can publish lies and others are happy to build on those lies indicates the whole community is a cancer. The fact that the community rejects calls for correction indicates it has metastasized, and at least as far as that particular community goes, the patient is dead and there is nothing left to save.
They ought to be properly ridiculed and anyone who has published obvious trash should have any public funds yanked and become ineligible for life. People should watch their public ruin and consider their own future action.
If you consider the sheer amount of science that has turned out to be outright fraud in the last decade this is a crisis.
However if we stop teaching people that villains are bad and they shouldn't be villains, we'll end up with a whole lot more problems of the "yeah that guy is just bad" variety.
As I said, harder from a research perspective, but if you can show, for instance, that sustainable companies are less profitable with a better study, you have basically contradicted the original one.
She was just done with it then and a pharma company said "hey you fed up with this shit and like money?" and she was and does.
edit: as per the other comment, my background is mathematics and statistics after engineering. I went into software but still have connections back to academia which I left many years ago because it was a political mess more than anything. Oh and I also like money.
This is true though, and one of those awkward times where good ideals like science and critical feedback brush up against potentially ugly human things like pride and ego.
I read a quote recently, and I don't like it, but it's stuck with me because it feels like it's dancing around the same awkward truth:
"tact is the art of make a point without making an enemy"
I guess part of being human is accepting that we're all human and will occasionally fail to be a perfect human.
Sometimes we'll make mistakes in conducting research. Sometimes we'll make mistakes in handling mistakes we or others made. Sometimes these mistakes will chain together to create situations like the post describes.
Making mistakes is easy - it's such a part of being human we often don't even notice we do it. Learning you've made a mistake is the hard part, and correcting that mistake is often even harder. Providing critical feedback, as necessary as it might be, typically involves putting someone else through hardship. I think we should all be at least slightly afraid and apprehensive of doing that, even if it's for a greater good.
A blameless organization can work, so long as people within it police themselves. As a society, this does not happen, thus making people more steadfast in their anti-social behavior.
I happen to agree that labeling them as villains wouldn’t have been helpful to this story, but they didn’t do that.
> It obscures the root causes of why the bad things are happening, and stands in the way of effective remedy.
There’s a toxic idea built into this statement: It implies that the real root cause is external to the people and therefore the solution must be a systemic change.
This hits a nerve for me because I’ve seen this specific mindset used to avoid removing obviously problematic people, instead always searching for a “root cause” that required us all to ignore the obvious human choices at the center of the problem.
Like blameless postmortems taken to a comical extreme, where one person is always doing something careless that causes problems and we all have to brainstorm a way to pretend that the system failed, not the person who continues to cause us problems.
On the one hand, it is possible to become judgmental, habitually jumping to unwarranted and even unfair conclusions about the moral character of another person. On the other, we can habitually externalize the “root causes” instead of recognizing the vice and bad choices of the other.
The latter (externalization) is obvious when people habitually blame “systems” to rationalize misbehavior. This is the same logic that underpins the fantastically silly and flawed belief that under the “right system”, misbehavior would simply evaporate and utopia would be achieved. Sure, pathological systems can create perverse incentives, even ones that put extraordinary pressure on people, but moral character is not just some deterministic mechanical response to incentive. Murder doesn’t become okay because you had a “hard life”, for example. And even under “perfect conditions”, people would misbehave. In fact, they may even misbehave more in certain ways (think of the pathologies characteristic of the materially prosperous first world).
So, yes, we ought to condemn acts, we ought to be charitable, but we should also recognize human vice and the need for justice. Justly determined responsibility should affect someone’s reputation. In some cases, it would even be harmful to society not to harm the reputations of certain people.
1. Who is responsible for adding guardrails to ensure all papers coming in are thoroughly checked & reviewed?
2. Who reviews these papers? Shouldn’t they own responsibility for accuracy?
3. How are we going to ensure this is not repeated by others?
> If we systematically tie bad deeds to bad people, then surely those people we love and know to be good are incapable of what they're being accused.
A strong claim that needs to be supported, and actually the very question whose nuances are being discussed in this thread.
I think perhaps blackball is guaranteed. No one likes a snitch. “We’re all just here to do work and get paid. He’s just doing what they make us do”. Scientist is just job. Most people are just “I put thing in tube. Make money by telling government about tube thing. No need to be religious about Science”.
You guys are saying that drink driving does not make someone a bad person. Ok. Let's say I grant you that. Where do you draw the line for someone being a bad person?
I mean, with this line of reasoning you can "explain away" every bad deed, and then nobody is a bad person. So do you guys consider anyone to be actually a bad person, and what did they have to do to cross that line where you can't explain away their bad deed anymore and you really consider them to be bad?
1) Anyone publishes anything they want, whenever they want, as much or as little as they want. Publishing does not say anything about your quality as a researcher, since anyone can do it.
2) Being published doesn't mean it's right, or even credible. No one is filtering the stream, so there's no cachet to being published.
We then let memetic evolution run its course. This is the system that got us Newton, Einstein, Darwin, Mendeleev, Euler, etc. It works, but it's slow, sometimes ugly to watch, and hard to game so some people would much rather use the "Approved by A Council of Peers" nonsense we're presently mired in.
It has 0 comments, for an article that forgot "not" in "the result is *** statistical significative".
I read the paper as well. My background is mathematics and statistics and the data was quite frankly synthesised.
Next, we need to understand why that is, which sources should be trusted, and which can't be. Also, what methods to use in what contexts. We need to develop education for people about how humanity actually works. We can improve steadily over time.
On my end, I've been collecting resources that might be helpful. That includes Christ-centered theology with real-world application, philosophies of knowledge with guides on each one, differences between real vs. organized science, biological impacts on these, dealing with media bias (e.g., AllSides), worldview analyses, critical thinking (logic), statistical analyses (esp. error spotting), writing correct code, and so on.
One day, I might try to put it together into a series that equips people to navigate all of this stuff. For right now, I'm using it as a refresher to improve my own abilities ahead of entering the Data Science field.
Certainly, you are aware we literally had more crime back then, right? Additionally, we heaped shame on people who did not deserve it, like women and black people and gay people.
So what the fuck good does that do?
You know what actually changed? White collar crime stopped being a thing.
It's both, obviously. To address the human cause, you have to call out the issues and put the person's career at risk by damaging their reputation. That's what this article is doing. You can't fix a person, but you can address their bad behavior in this way, by creating consequences for the bad things.
Part of the root cause definitely is the friction aspect. The system is designed to make the bad thing easier, and when designing a system you need the good outcomes to be lower friction.
> This hits a nerve for me because I’ve seen this specific mindset used to avoid removing obviously problematic people, instead always searching for a “root cause” that required us all to ignore the obvious human choices at the center of the problem.
The real conversations like that take place in places where there's no recordings, or anything left in writing. Don't assume they aren't taking place, or that they go how you think they go.
In a blame-focused postmortem you say “Johnny fucked up” and close it.
When you care about accountability, the responsible parties are known, or discovered if unknown, and are responsible for prevention/response/repair/etc. The corrective action can incorporate any number of things, including getting rid of Johnny.
1) Basic morality (good vs evil) with total agency ascribed to the individual
2) Basic systems (good vs bad), with total agency ascribed to the system and people treated as perfectly rational machines (where most of the comments here seem to sit)
3) Blended system and morality, or "Systemic Morality": agency can be system-based or individual-based, and morality can be good or bad. This is the single largest rung, because there's a lot to digest here, and it's where a lot of folks get stuck on one ("you can't blame people for making rational decisions in a bad system") or the other ("you can't fault systems designed by fallible humans"). It's why there's a lot of "that's just the way things are" useless attitudes at present, because folks don't want to climb higher than this rung lest they risk becoming accountable for their decisions to themselves and others.
4) "Comprehensive Morality": an action is net good or bad because of the system and the human. A good human in a bad system is more likely to make bad choices via adherence to systemic rules, just as a bad human in a good system is likely to find and exploit weaknesses in said system for personal gain. You cannot ascribe blame to one or the other, but rather acknowledge both separately and together. Think "Good Place" logic, with all of its caveats (good people in bad systems overwhelmingly make things worse by acting in good faith towards bad outcomes) and strengths (predictability of the masses at scale).
5) "Historical Morality": a system or person is net good or bad because of repeated patterns of behaviors within the limitations (incentives/disincentives) of the environment. A person who routinely exploits the good faith of others and the existing incentive structure of a system purely for personal enrichment is a bad person; a system that repeatedly and deliberately incentivizes the exploitation of its members to drive negative outcomes is a bad system. Individual acts or outcomes are less important than patterns of behavior and results. Humans struggle with this one because we live moment-to-moment, and we ultimately dread being held to account for past actions we can no longer change or undo. Yet it's because of that degree of accountability - that you can and will be held to account for past harms, even in problematic systems - that we have the rule of law, and civilization as a result.
Like a lot of the commenters here, I sat squarely in the third rung for years before realizing that I wasn't actually smart, but instead incredibly ignorant and entitled by refusing to truly evaluate root causes of systemic or personal issues and address them accordingly. It's not enough to merely identify a given cause and call it a day, you have to do something to change or address it to reduce the future likelihood of negative behaviors and outcomes; it's how I can rationalize not necessarily faulting a homeless person in a system that fails to address underlying causes of homelessness and people incentivized not to show empathy or compassion towards them, but also rationalize vilifying the wealthy classes who, despite having infinite access to wealth and knowledge, willfully and repeatedly choose to harm others instead of improving things.
Villainy and Heroism can be useful labels that don't necessarily simplify or ignorantly abstract the greater picture, and I'd like to think any critically-thinking human can understand when someone is using those terms from the first rung of the ladder versus the top rung.
It's a paradox. We know for an absolute fact that changing the underlying system matters massively, but we must continue to acknowledge individual choice, because the system of consequences and, as importantly, the system of shame keeps those who wouldn't act morally in check. So we punish the person who was probably lead-poisoned the same as any other, despite knowing that we are partially at fault for the system that led to their misbehavior.
It's the obedience to authority that we must be able to challenge to stand a chance.
Milgram's and Zimbardo's tests were somewhat flawed, yes, but still, WW2 and Germany kinda proved this tendency IMO. And that's why I brought it up. Perhaps not the best comparison, I admit, but a comparison that seems more and more relevant in many cases.
The whole "bad vs good person" framing is probably not a very robust framework, never thought about it much, so if that's your position you might well be right. But it's not a consideration that escaped me, I reasoned under the same lens the person above did on intention.
Anyone can do a bad deed.
Anyone can also be a good person to someone else.
If a bad deed automatically makes a bad person, those who recognize the person as good have a harder time reconciling the two realities. Simple.
Also, is the point recognizing bad people, or getting rid of bad science? Like I said, choose your victories.
One thing I would very much like to see is personal financial disclosures about grant awards, salaries, and funding sources of the main authors of a paper.
The main author received a $400,000 grant from the Save Our Turtles foundation and a $2 million grant from the John "Turtle Lover" Heisenberg foundation while writing his peer-reviewed paper revealing that more public funding for turtle sanctuaries unlocks massive local economic benefits in the Upper Mississippi Delta.
In terms of solutions, the practice of 'preregistration' seems like a move in the right direction.
Whatever happens, avoid direct confrontation at all costs.
Well, I'd argue the system failed in that the bad person was not removed. The root cause is then a bad hiring decision and bad management of problematic people. You can do a blameless postmortem guiding a change in policy which ends with some people getting fired.
Not necessarily, although certainly people sometimes fall into that trap. When dealing with a system you need to fix the system. Ejecting a single problematic person doesn't fix the underlying problem - how did that person get in the door in the first place? If they weren't problematic when they arrived, does that mean there were corrosive elements in the environment that led to the change?
When a person who is a cog within a larger machine fails that is more or less by definition also an instance of the system failing.
Of course individual intent is also important. If Joe dropped the production database intentionally then in addition to asking "how the hell did someone like him end up in this role in the first place" you will also want to eject him from the organization (or at least from that role). But focusing on individual intent is going to cloud the process and the systemic fix is much more important than any individual one.
There's also a (meta) systemic angle to the above. Not everyone involved in carrying out the process will be equally mature, objective, and deliberate (by which I mean that unfortunately any organization is likely to contain at least a few fairly toxic people). If people jump to conclusions or go on a witch hunt that can constitute a serious systemic dysfunction in and of itself. Rigidly adhering to a blameless procedure is a way to guard against that while still working towards the necessary systemic changes.
Post-mortems are a terrible place for handling HR issues. I'd much rather they be kept focused on processes and technical details, and human problem be kept private.
Dogpiling in public is an absolutely awful thing to encourage, especially as it turns from removing a problematic individual to looking for whoever the scapegoat is this time.
I've read multiple times that a large percentage of the crime comes from a small group of people. Jail them, and the overall crime rate drops by that percentage.
Both views are maximalistic.
Ioannidis' work during Covid raised him in my esteem. It's rare to see someone in academics who is willing to set their own reputation on fire in search of truth.
“Most Published Research Findings Are False” -> “Most Published COVID-19 Research Findings Are False” -> “Uh oh, I did a wrongthink, let’s backtrack a bit”.
Is that it?
I don't think that that line can be drawn exactly. There are many factors to consider and I'm not sure that even considering them will allow you to draw this line and not come to claims like '99% of people are bad' or '99% of people are not bad'.
'Bad' is not an innate property of a person. 'Bad' is a label that exists only in an observer's model of the world. A spherical person in vacuum cannot be 'bad', but if we add an observer of the person, then they may become bad.
To my mind, the decision to label a person as bad or not is a decision reflecting how the labeling subject cares about the one on the receiving side. So it goes like this: first you decide what to do about someone's bad behavior; if you decide to go about it with punishment, then you call them 'bad', and if you decide to help them somehow to stop their bad behavior, then you don't call them bad.
It works like this: when observing some bad behavior, I decide what to do about it. If I decide to punish the person, I declare them to be bad. If I decide to help them stop their behavior, I declare them to be not bad, but 'confused' or circumstantially forced, or whatever. Y'see, you cannot change the personal traits of others, so if you declare that the reason for the bad behavior is the personal trait 'bad', then you cannot do anything about it. If you want to change things, you need to find a cause of the bad behavior that can be controlled.
The system ends up promoting an even more conservative culture. What might start great will end up with groups and institutions being even more protective of 'their truths' to avoid getting tainted.
Don't think there's any system which can avoid these sort of things, people were talking about this before WW1, globalisation just put it in overdrive.
We are just back to the universities under the religious control system that we had before the Enlightenment. Any change would require separating academia from political government power.
Academia is just the propaganda machine for the government, just like the church was the tool for justifying god-gifted powers to kings.
Yet, I believe there hasn't been much progress as compared with STEM. But it is just a belief at the end of the day. There might be some study about this out there.
With the above, I think we've empirically proven that we can't trust mathematicians more than any other humans. We should still rigorously verify their work with diverse, logical, and empirical methods. Also, build from the ground up on solid ideas that are highly vetted. (Which linear algebra actually does.)
The other approach people are taking is foundational, machine-checked proof assistants. These use a vetted logic whose assistant produces a series of steps that can be checked by a tiny, highly verified checker. They'll also often use a reliable formalism to check other formalisms. The people doing this have been making everything from proof checkers to compilers to assembly languages to code extraction in those tools, so they are highly trustworthy.
But we still need people to look at the specs of all that to see if there are spec errors. There are fewer people who can vet the specs than can check the original English-and-code combos. So, are they more trustworthy? (Who knows, except when tested empirically on many programs or proofs, like CompCert was.)
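For a flavor of what machine-checked means here, a trivial Lean 4 example: the kernel verifies the proof mechanically, but a human still has to read the theorem statement, the spec, to confirm it says what was intended.

```lean
-- The proof is checked mechanically by Lean's small kernel.
-- The statement itself is the "spec" a human must still vet.
-- `Nat.add_comm` is a lemma from the Lean 4 standard prelude.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```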
Scientists that have studied this over long periods of time and diverse population groups?
I've done this firsthand - remembered an event a particular way only to see video (in the old days, before easy video editing) and find out it... didn't quite happen as I remembered.
That's because human beings aren't video recorders. We're encoding emotions into sensor data, and get blinded by things like Weapon Focus and Selective Attention.
But the article is generally weird or even harmful too. Going to social media with these things and all; we have enough of that "pretty" stuff already.
Also if that doesn’t work, “Hey Bro I notice you like to give a lot of detail in standup. That’s great, but we want to keep it a short meeting so we try to focus on just the highlights and surfacing any key blockers. I don’t want to interrupt you, so if you like I can help you distill what you’ve worked on before the meeting starts.”
If one party decides that they don’t want to address a material error, then they’re not acting in good faith. At that point we don’t use blameless procedures anymore, we use accountability procedures, and we usually exclude the recalcitrant people from the remediation process, because they’ve shown bad faith.
One strategy for correcting the institution is to start holding individuals accountable. The military does this often. They'll "make an example" of someone violating the norms and step up enforcement to steer the institutional norms back.
Sure it can feel unfair, and "everyone else is doing it" is a common refrain, but holding individuals accountable is one way to fix the institution.
One problem is that if you behave as if a person isn't the cause, you end up with all sorts of silly rules and processes, which are just in place to counter "problematic individual".
You end up using "process" as the scapegoat.
It's way too easy to pretend the system is the problem while sticking your head in the sand because you don't want to solve the actual human problem.
Sure, use the post mortem to brainstorm how to prevent/detect/excise the systematic problem ("How do we make sure no one else can make the same mistake again"), but eventually you just need to deal with the repeat offender.
On the other hand, it sounds like this workplace has weak leadership. Have you considered leaving for some place better? If the manager can’t do their job well enough to give you decent feedback and stop a guy giving 10-minute standups, LEAVE.
Reasons for not leaving? Ok, then don’t be a victim. Tell yourself you’re staying despite the management and focus on the positive.
In theory maybe, but in my experience the blameless postmortem culture gets taken to such an extreme that even when one person is consistently, undeniably to blame for causing problems we have to spend years pretending it’s a system failure instead. I think engineers like the idea that you can engineer enough rules, policies, and guardrails that it’s impossible to do anything but the right thing.
This can create a feedback loop where the bad players realize they can get away with a lot because if they get caught they just blame the system for letting them do the bad thing. It can also foster an environment where it’s expected that anything that is allowed to happen is implicitly okay to do, because the blameless postmortem culture assigns blame on the faceless system rather than the individuals doing the actions.
1) The immediate action _is more important immediately_ than the systemic change. We should focus on maximizing our "fixing"; letting a toxic element continue to poison you while you waste time wondering how you got there is counterproductive. It is important to focus on the systemic change, but only once you have removed the person who will otherwise destroy the organization/kill us all.
2) I forgot. Sorry
This is just a proxy for "the person is bad" then. There's no need to invoke a system. Who can possibly trace back all the things that could or couldn't have been spotted at interview stage or in probation? Who cares, when the end result is "fire the person" or, probably, "promote the person".
This is exactly the toxicity I’ve experienced with blameless postmortem culture:
Hiring is never perfect. It’s impossible to identify every problematic person at the interview stage.
Sometimes, it really is the person’s own fault. Doing mental gymnastics to assume the system caused the person to become toxic is just a coping mechanism to avoid acknowledging that some people really are problematic, and it’s nobody’s fault but their own.
is answered by:
> any organization is likely to contain at least a few fairly toxic people
This had been assigned many times previously. When my friend disproved the lemma, he asked the professor what he had done wrong. Turns out the lemma was in fact false, despite dozens of grad students having turned in "proofs" of the lemma already. The paper itself still stood, as a weaker form of the lemma was sufficient for its findings, but still very interesting.
Much of what many learned about life came from their parents. That included lots of foundational knowledge that was either true or worked well enough.
You learned a ton in school from textbooks that you didn't personally verify.
You learned lots from media, online experts, etc. Much of which you couldn't verify.
In each case, they are making eyewitness claims that are a mix of first-hand and hearsay. Many books or journals report others' claims. So, even most education involves tons of hearsay claims.
So, how do scientists raised, educated, and informed by eyewitness claims write reports saying eyewitness testimony isn't reliable? How do scientists educated by tons of hearsay not believe eyewitness testimony is trustworthy?
Or did they personally do the scientific method on every claim, technique, machine, circuit, etc they ever considered using? And make all of it from first principles and raw materials? Did they never believe another person's claims?
Also, "scientists that have studied this over long periods of times and diverse population groups" is itself an eyewitness claim and hearsay if you want us to take your word for it. If we look up the studies, we're believing their eyewitness claims on faith while we've validated your claim that theirs exist.
It's clear most people have no idea how much they act on faith in others' word, even those scientists who claim to refute the value of it.
If IFR is low then a lot of the assumptions that justified lockdowns are invalidated (the models and assumptions were wrong anyway for other reasons, but IFR is just another). So Ioannidis was a bit of a class traitor in that regard and got hammered a lot.
The claim he's a conspiracy theorist isn't supported, it's just the usual ad hominem nonsense (not that there's anything wrong with pointing out genuine conspiracies against the public! That's usually called journalism!). Wikipedia gives four citations for this claim and none of them show him proposing a conspiracy, just arguing that when used properly data showed COVID was less serious than others were claiming. One of the citations is actually of an article written by Ioannidis himself. So Wikipedia is corrupt as per usual. Grokipedia's article is significantly less biased and more accurate.
However, there are two problems with it. Firstly, it's a step towards gamification; having tried that model on reputation scoring in a fintech, it was a bit of a disaster. Secondly, very few studies are replicated in the first place unless there is demand from linked research to replicate them before building on them.
There are also entire fields which are mostly populated by bullshit generators. And they actively avoid replication studies. Certain branches of psychology are rather interesting in that space.
If Joe dropped the production database and you're uncertain about his intentions then perhaps it would be a good idea to do the bare minimum by reducing his access privileges for the time being. No more than that though.
Whereas if you're reasonably certain that there was no intentional foul play involved then focusing on the individual from the outset isn't likely to improve the eventual outcome (rather it seems to me quite likely to be detrimental).
I'm not saying you shouldn't eventually arrive at the conclusion you're suggesting. I'm saying that it's extremely important not to start there and not to use the possibility of arriving there as an excuse to shirk asking difficult questions about the inner workings and performance of the broader organization.
> Doing mental gymnastics to assume the system caused the person to become toxic
No, don't assume. Ask if it did. "No that does not appear to be the case" can sometimes be a perfectly reasonable conclusion to arrive at but it should never be an excuse to avoid confronting uncomfortable realities.
It’s a good thing to take a look at where the process went wrong, but that’s literally just a postmortem. Going fully blameless adds the precondition that you can’t blame people; you’re obligated to recast even an obvious individual failure as a problem with some process or policy.
Anyone who has hired at scale will eventually encounter an employee who seems lovely in interviews but turns out to be toxic and problematic on the job. The most toxic person I ever worked with, whose behavior culminated in dozens of peers quitting the company before he was caught red-handed sabotaging company work, was one of the nicest and most compassionate people in interviews and when you first met him. He was, of course, a big proponent of blameless postmortems, and his toxicity thrived under blameless culture for longer than it should have.
It could also well be that Joe did the same thing at his last employer, someone in hiring happened to catch wind of it, a disorganized or understaffed process resulted in the ball somehow getting dropped, and here you are.
Your customers would prefer to have the enterprise doing stuff rather than hiring and firing.
I am unfamiliar with the reasons to which the varying murder rate is ascribed. If I had to guess, I would guess economics is #1.
Negative consequences and money always work!
I don’t think the general idea of co-opting is hard to understand; it’s quite easy to understand. But there is a certain personality type, common among people who earn a living by telling Claude what to do, with a compulsion to “prove” people on the Internet “wrong,” and these people are constantly, blithely mobilized to further the political cause of someone who truly doesn’t give a fuck about them. Ioannidis is such a personality type, and as you can see, a victim.
https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaw...
That said, I'd put both his serosurvey and the conduct he criticized in "Most Published Research Findings Are False" in a different category from the management science paper discussed here. Those seem mostly explainable by good-faith wishful thinking and motivated reasoning to me, while that paper seems hard to explain except as a knowing fraud.
I hope this was sarcasm.
Maybe, I cannot say, but what I can say is that CS is in the midst of a huge replication crisis because LLM research cannot be replicated by definition. So I'd perhaps tone down the claims about other fields.
I think there are too many people that actually like "blaming someone else" and that causes issues besides software development.
If someone wants to be bad or shitty in a way that makes their own lives better while making the lives of everyone around them worse, that's evil and parasitic, and I'm not going to wring my hands about labeling it as such.
In hindsight, I can't see any plausible argument for an IFR actually anywhere near 1%. So how were the other researchers "not necessarily wrong"? Perhaps their results were justified by the evidence available at the time, but that still doesn't validate the conclusion.
In rhetoric, yes. (At least, except when people are given the opportunity to appear virtuous by claiming that they would sacrifice themselves for others.)
In actions and revealed preferences, not so much.
It would be rather difficult to be a functional human being if one took that principle completely seriously, to its logical conclusion.
I can't recall ever hearing any calls for compulsory public interaction, only calls to stop forbidding various forms of public interaction.
There's the other angle of selective outrage. The case for lockdowns was being promoted based on, amongst other things, the idea that PCR tests have a false positive rate of exactly zero, always, under all conditions. This belief is nonsense although I've encountered wet lab researchers who believe it - apparently this is how they are trained. In one case I argued with the researcher for a bit and discovered he didn't know what Ct threshold COVID labs were using; after I told him he went white and admitted that it was far too high, and that he hadn't known they were doing that.
Gelman's demands for an apology seem very different in this light. Ioannidis et al not only took test FP rates into account in their calculations but directly measured them to cross-check the manufacturer's claims. Nearly every other COVID paper I read simply assumed FPs don't exist at all, or used bizarre circular reasoning like "we know this test has an FP rate of zero because it detects every case perfectly, when we define a case as a positive test result". I wrote about it at the time because this problem was so prevalent:
https://medium.com/mike-hearn/pseudo-epidemics-part-ii-61cb0...
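For reference, the standard way to take FP (and FN) rates into account when estimating prevalence is the Rogan–Gladen correction. A minimal sketch in Python, with purely illustrative numbers rather than figures from any paper discussed here:

```python
# Rogan-Gladen correction: recover true prevalence from a raw test
# positive rate, given the test's sensitivity and specificity.
# All numbers below are illustrative, not from any particular study.

def true_prevalence(apparent: float, sens: float, spec: float) -> float:
    # apparent = sens*p + (1 - spec)*(1 - p)  =>  solve for p:
    p = (apparent + spec - 1) / (sens + spec - 1)
    return min(1.0, max(0.0, p))  # clamp to [0, 1]

# With 99.5% specificity, a 1.5% raw positive rate implies real signal:
print(true_prevalence(0.015, 0.80, 0.995))  # ~0.0126

# With 98.5% specificity, the same raw rate is consistent with zero:
print(true_prevalence(0.015, 0.80, 0.985))  # 0.0
```

The point being that an assumed FP rate of zero makes "apparent" and "true" coincide, which is exactly the circularity complained about above.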
I think Gelman realized after the fact that he was being over the top in his assessment, because the article has since been amended with numerous "P.S." paragraphs which walk back some of his own rhetoric. He's not a bad writer, but in this case I think the overwhelming peer pressure inside academia to conform to the public health narratives got to even him. If the cost of pointing out problems in your field is that every paper you write has to be considered perfect by every possible critic from that point on, that's just another way to stop people flagging problems.
Yes. One's past behavior is a strong predictor of future behavior.
> so they should be preemptively jailed even if they didn't do a crime this time?
No, it means that each successive conviction should result in a longer prison sentence.
https://sites.stat.columbia.edu/gelman/research/unpublished/...
I don't think Gelman walked anything back in his P.S. paragraphs. The only part I see that could be mistaken for that is his statement that "'not statistically significant' is not the same thing as 'no effect'", but that's trivially obvious to anyone with training in statistics. I read that as a clarification for people without that background.
We'd already discussed PCR specificity ad nauseam, at
https://news.ycombinator.com/item?id=36714034
These test accuracies mattered a lot while trying to forecast the pandemic, but in retrospect one can simply look at the excess mortality, no tests required. So it's odd to still be arguing about that after all the overrun hospitals, morgues, etc.
It's also hard to determine whether that serosurvey (or any other study) got the right answer. The IFR is typically observed to decrease over the course of a pandemic. For example, the IFR for COVID is much lower now than in 2020 even among unvaccinated patients, since they almost certainly acquired natural immunity in prior infections. So high-quality later surveys showing lower IFR don't say much about the IFR back in 2020.
But then in the P.P.P.S. sections he says things like "I’m not saying that the claims in the above-linked paper are wrong" (he has to repeat that twice, because that's exactly what it sounds like he's saying), and "When I wrote that the authors of the article owe us all an apology, I didn’t mean they owed us an apology for doing the study"; but given that he wrote extensively about how he would not have published the study, I think that is what he meant.
Also bear in mind there was a followup in which Ioannidis's team went the extra mile to satisfy people like Gelman:
> They added more tests of known samples. Before, their reported specificity was 399/401; now it’s 3308/3324. If you’re willing to treat these as independent samples with a common probability, then this is good evidence that the specificity is more than 99.2%. I can do the full Bayesian analysis to be sure, but, roughly, under the assumption of independent sampling, we can now say with confidence that the true infection rate was more than 0.5%.
After taking into account the revised paper, which raised the standard from high to very high, there's not much of Gelman's critique left, tbh. I would respect this kind of critique more if he had mentioned the garbage-tier quality of the rest of the literature. Ioannidis' standards were still much higher than everyone else's at the time.
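For anyone who wants to check the quoted numbers, here's a rough sketch of that "full Bayesian analysis" under the quote's own independence assumption, with a uniform prior; the real analysis would presumably also model sensitivity and sample clustering:

```python
# Lower confidence bound on specificity from the 3308/3324
# known-negative results quoted above, assuming independent samples
# and a uniform Beta(1, 1) prior.
from scipy.stats import beta

correct, total = 3308, 3324  # negatives correctly identified

# Posterior over specificity.
posterior = beta(correct + 1, total - correct + 1)
spec_lower = posterior.ppf(0.025)  # conservative 2.5% quantile
print(f"specificity > {spec_lower:.3f}")  # ~0.992, matching the quote

# At that bound, false positives alone can explain at most this much
# of any raw positive rate; anything above it must be real infections.
print(f"max FP contribution: {1 - spec_lower:.2%}")
```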
Epidemiology tends to conflate IFR and CFR; that's one of the issues Ioannidis was highlighting in his work. IFR estimates do decline over time, but they decline even in the absence of natural immunity buildup, because doctors become aware of more mild cases where the patient recovered without being detected. That yields a higher number of known infections against the same number of fatalities, hence a lower computed IFR even retroactively, with no biological change happening. It's just a limit of data collection.
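To make the denominator effect concrete, a toy example (made-up numbers):

```python
# Fatalities are fixed, but as more mild infections are identified
# retroactively, the computed IFR falls with no biological change.
deaths = 100
for known_infections in (10_000, 20_000, 40_000):
    print(f"{known_infections:>6} known infections -> IFR = {deaths / known_infections:.2%}")
# 10000 known infections -> IFR = 1.00%
# 20000 known infections -> IFR = 0.50%
# 40000 known infections -> IFR = 0.25%
```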
That problem is what motivated the serosurvey. A theoretically perfect serosurvey doesn't have such issues, so one would expect it to calculate a lower IFR and to be a valuable type of study to do well. Part of the background of that work, and why it was controversial, is that large parts of the public health community didn't actually want to know the true IFR, because they knew it would be much lower than their initial back-of-the-envelope calculations based on e.g. news reports from China. Surveys like that should have been commissioned by governments at scale, with enough data to resolve any possible complaint, but weren't, because public health bodies are just not incentivized that way. Ioannidis didn't play ball and the pro-lockdown camp gave him a public beating. I think he was much closer to reality than they were, though. The whole saga spoke to the very warped incentives that come into play the moment you put the word "public" in front of something.
> The point is, if you’re gonna go to all this trouble collecting your data, be a bit more careful in the analysis!
I read that as a complaint about the analysis, not a claim that the study shouldn't have been conducted (and analyzed correctly).
Gelman's blog has exposed bad statistical research from many authors, including the management scientists under discussion here. I don't see any evidence that they applied a harsher standard to Ioannidis.
The current effective IFR (very often post-vaccination or post-exposure, and with weaker strains) is much lower. But a 1% IFR estimate in early 2020 was entirely justified and fairly accurate.
For what it's worth, epidemiologists are well aware of the distinction between IFR, CFR, and CMR (crude mortality rate = deaths/total population), and it is well known that CFR and CMR bracket IFR.
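To make the bracketing concrete, a toy sketch with invented numbers (and the simplifying assumption that all recorded deaths occur among confirmed cases):

```python
# Why CFR and CMR bracket IFR: confirmed cases are (roughly) a subset
# of infections, and infections a subset of the population, so the
# same death toll is divided by successively larger denominators.
deaths     = 1_000
confirmed  = 50_000
infections = 200_000      # includes undetected mild cases
population = 10_000_000

cfr = deaths / confirmed    # 2.00%  (upper bound on IFR)
ifr = deaths / infections   # 0.50%
cmr = deaths / population   # 0.01%  (lower bound on IFR)
assert cmr <= ifr <= cfr
print(f"CMR {cmr:.2%} <= IFR {ifr:.2%} <= CFR {cfr:.2%}")
```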