Surely this depends on how the vendor sets their prices? If you're going to buy something from a website to test a stolen credit card you don't just get to make up your own prices.
And I think you may be over-indexing on the US "prices don't include tax" thing. Elsewhere, round-number prices are extremely common.
In fact a lot of the rest of the stuff in the post seems like it wouldn't work very well either. (E.g. you're flagging anyone who has done a transaction in the last 90 days outside the range of hours at which they have 2+ transactions? Wouldn't that be like 50% of people?).
It's unclear to me whether this article is an attempt at breaking down complex expertise into over-simplified SQL queries, or whether it is all speculative and made up.
There is a conflict between "Six SQL patterns I use to catch transaction fraud" and "Nothing here comes from anything I’ve actually worked on or seen".
I don't usually buy gas, coffee or snacks at 2am. But on the very rare occasion that I do, I'm dealing with some kind of personal emergency and don't also want to have to call my bank.
I get that that's also a time opportunistic thieves, etc, might be operating. But the cost of false positives is also a thing.
For example "Impossible travel": these days you can add your credit card to your phone and use Apple Pay. Well, this is useful for many things, one of them being adding your credit card to your kid's (teenager) phone, so that your kid can use your card in case of need/emergency when they are away from you. I did exactly that recently and actually worried about fraud control systems when my child paid using my card in Boston while I was in Europe.
Many things which you think are true might not be.
Anecdotally, US banks are terrible at building fraud control systems. It seems US banks assume any transaction that is charged by an entity outside the US is fraud. In my 10-year history of running a SaaS, the US banks and their "fraud control" systems have been one of the biggest billing problems.
> The roundness is the signal.
> Slight pain, same result.
to point at a few.
We learn simultaneously that 'your team' shouldn't rely on any one of those patterns ('none of them is enough'), but that pattern 1 'alone will surface a useful amount of fraud'.
We also read strange sentences like "Every analyst on your team will use them (ie window functions) once they exist, and adding the next fraud pattern stops being a project. [end of paragraph]"
Or irrelevant discussions about how filtering by "IS NULL" might be not applicable when almost none of the provided examples uses it (and the one which does uses it in different context).
This is low quality and too long.
But I must admit there was a point where I suddenly lost my love for SQL and it was pretty much when the OVER PARTITION BY syntax appeared.
It never clicks. I always have to look up how it works, I always find it unintuitive. I've never understood why I hate it so much.
How do you deal with vacations and online shopping. You could be in another country or two in a few hours and purchase from across the world
Signal's he can check? So some random dude is looking at my credit card purchase history while playing around with his SQL queries?
What about the tables?
I question the described approaches. For example, while impossible travel is a legitimate and widely used technique, it's related to online user behaviour based on IP address. Moreover, tirreno, for example, has separate rules for cases where the IP clearly comes from Apple Relay or VPN/Tor — those are separate flags. I assume some or all examples are LLM-generated, as the context is mixed up and no one actually collects GPS location in bulk for card swipes.
Bunch of thresholds, no data proving those thresholds are meaningful.
Lagging window functions and/or lateral joins probably would have reduced it to 1/4 the size but definitely increased the cost versus just narrowing the sets into smaller tables first.
> Most people are creatures of habit when they spend money. A nine-to-fiver doesn’t suddenly start buying gas at 3am.
Breaking out of a habit once in a while is what keeps one's mind sharp.
A big "fuck you" to financial analysts with those groundhog-day mindsets for making my life much more miserable than it needs to be and for adding a chilling effect to those little getaways that make life interesting and worthwhile. I despise you for this.
chargeback-mcp
or would you turn it all into a markdown file and call it a skill?
Or, the cardholder is trying to do the cannonball run:
Machine learning systems also learn your pattern. The article gives simple SQL rules. Don't dismiss this article as worthless.
However, before going to a distant country, which was also in very different time zone, I warned the bank that issued the card that I intended to use, so that they would not consider suspicious either the place or the time of the transactions.
> When a skimmer compromises a card reader at, say, a gas pump, you don’t get one fraud case. You get dozens. Every card swiped at that pump for the next few weeks is now in someone’s database. So the symptom from the merchant side is: an unusual number of unrelated cards spending more than usual, in a short window.
So he checks for hour-bucketed increases in high-value transactions originating from that merchant.
Seems to me like a good way to catch a sale, an opening, a launch event, or a product “drop,” a single high-value sale that somebody spreads across several cards… less so a good way to detect a steady trickle of stolen card data that’s inexplicably used back at the same merchant.
If you’re installing a card skimmer, why would you charge the stolen cards at the same business where you’re stealing them? And why would you concentrate your spending into bursts if the skimmer’s harvesting all day every day?
If you’re the merchant doing the skimming in order to spend at your own store, wouldn’t it be easier to punch a higher amount into the terminal? If you’re a skimming ring, wouldn’t you prefer to have purchasing power rather than this $5000 threshold (?!) of extra gas (plus a giant neon sign advertising where you placed your skimmer)?
Wouldn’t a more sensible approach involve something like looking for merchant clusters in the combined transaction histories of known-stolen accounts?
The LLM runs so strong in this whole enterprise… I want to give the person the benefit of the doubt, but I can’t resist the sneaking suspicion that LLM fabulism to push a slop novel just wasted 15 minutes of my life.
"Fixel Smith" is an AI-generated person, with an article that has very little to do with fraud analysis. 'This' is also a music artist (1), novelist (2), fraud analyst (3), influencer (4), and whatever else you can imagine.
220+ points and 70 comments, and very few notice it's quite a fake post — and no one that it's an AI generated person?
1. https://www.amazon.it/Forged-Soundtrack-Explicit-Fixel-Smith...
This is an underrated CX factor: If my card gets denied when i’m a new customer or exhibiting a new pattern, i’m impressed with their software.
However if they deny a transaction where there is any previous history of me authenticating, then I’m frustrated by their naive paranoid algorithm.
So this is really just surfacing cases, but with not enough context to be useful to prioritise. I would expect a score to be included.
Apart from that it misses a lot of signals like refunds, declines, disputes etc [1].
1) https://stripe.com/gb/guides/improve-fraud-management-with-r...
Or normal people living in Europe in border-adjacent areas.
Also, I guess you don't include card-not-present transactions in this, but you incorrectly assume that every merchant has their location set correctly. And that every sale happens in a brick-and-mortar establishment, not from travelling salespeople or whatever. And that all transactions happen online.
It's also not all program-integrity, which is the only work that could justify such blanket statements. Worse is better as long as it addresses the problem domain.
Fintech clients are generally interested in knowing whether a transaction happening _right now_ is fraud. They want to know that in a few milliseconds, for high-dimensional data. It's work done at a scale where relational databases cannot meet these real-time constraints, and instead find other uses like historical data loading. That's how you end up with in-memory databases, stream-processing engines, and yes, even machine learning.
Having said that, some of the author's points are valid, and I'm looking forward for their next writings, in particular dealing with noisy alerts is a general problem beyond performance engineering.
If a card swipes in Chicago and seven minutes later swipes in Los Angeles, one of those swipes is fake.
How does this work with online shopping? When I am sitting on the couch and buy from Amazon, where does the address get registered?Can also imagine an edge case: couple shares an online account, one is traveling and purchases with the saved card details.
Coffee usually _is_ a round number in my experience, and I know of people who aim for round numbers when filling their car, and of fuel stations which require a pre-set value, often 10, 20, 50€ etc
a) trivial to bypass by adding dither to the test transactions and
b) trivial to improve upon with proper statistical analysis and
c) shouldn't this kind of heuristic pattern recognition with no expectation of near-100% accuracy be what AI is good at?
This rings home so true, as a Canadian company I am SO TIRED of US banks flagging our transactions as fraud. We have done so much to try to prevent it too. We have a mail forwarding office address in the US. A bank account in USD in the US registered to that address, the merchant account tied to that charging in USD, and still we get these fraud flags. And we’re over the 10 year mark now, I think almost 15. You would thing we would have built up some trust at these banks, but nope.
My next biggest hassle lately is we are a “tokenize and bill later” type service, and we don’t charge monthly recurring exact same amount, depends on the users incurred charges in that period. And lately it seems most Americans leave their cards on a permanently lock, and only unlock to allow a charge, this means most of our charges decline initially until the user unlocks their card and retries the payment. A real support headache if any has a fix to either of these problems I would pay good money for it.
Buying a full tank of gas and then a full tank of petrol (dual-fuel vehicle) in two separate card transactions one after the other. Can't use the same pump, annoyingly, at least with the old system in Morrisons. Don't know if it's different since their petrol stations have been bought over by Motor Fuel Group.
Similarly, buying a full tank of gas at Morrisons in Bradford where the supermarket chain's headquarters are, then driving five hours north and refuelling again in a different Morrisons which show the transaction as coming from their banking systems in Bradford but tagged as a city in Scotland. This is apparently because it's implausible to drive from the central England to central Scotland in a few hours, and then need to refuel.
They are (or was at least, last checked a year ago) 100% repeatable.
Basically it's not just banks and formal financial institutions doing this, and how they do it depends on the company size. Size tends to correlate not only with how many resources you have for a risk team, but also with whether fraud rings are targeting you.
Usually what I've seen is that companies start with some kind of batch SQL/simple logic process that runs daily and tends to flag accounts for manual review and block automatic events like settlement or trading (or whatever the platform does) until that review has been done. Then over time the company will transition to an ML-based approach that still mostly flags things for manual review. The goal of the ML is to improve the precision of the flagging without hurting dollar recall or fraud event recall too much. Depending on the payment system companies may be sensitive to both (for example, in ACH if you get too many returns, even very low dollar payment returns, you're going to get a hard time from your partner bank and you risk not being able to use ACH anymore).
This is Claude talking isn’t it.
Card got blocked as they thought it was fraud. Annoying! And not something inebriated me wanted to deal with at 2am.
Ok. Maybe they protected me from myself, but still!
And my favourite most hated pattern, the no no no:
> Not machine learning, not graph databases, not whatever Gartner is hyping this year.
For Prevention, you're always going to be constrained by latency requirements, available data and an incomplete picture of user behaviour. You make a quick decision using ML and rules that deals with the majority of cases. But those constraints make it impossible to precisely prevent all fraud.
Detection deals with the downstream consequences of this. A team of analysts will typically analyse the accepted transactions for signs of fraud. This is particularly important for fraud types where you don't get an external signal like a chargeback or customer complaint. Platform integrity is one such example. But Fintechs will also see this building anti-money laundering systems - you need to go looking for the fraud. This is the process the article is describing.
I say they're complementary because the detected transactions become the labels for training and evaluating the next iteration of prevention models.
I know someone who worked in fraud detection of financial transactions. He told me that indeed lots of filters that are applied mostly test for anomalies. The thing is that most criminals are not insanely smart, and commonly don't have a lot of inside knowledge about accounting, banking, finance system etc., so criminals often have a bad intuition about more subtle things that are looked at for fraud detection.
But if you are a very dedicated criminal with lots of inside knowledge about, say, accounting, banking, finance system, ..., you could likely outsmart these filters. But these people typically have much better career options (even if they want a career as a "big fish criminal": just look at the history of accounting scandals, stock manipulations, Ponzi schemes, ...).
The DPAN is translated into the CPAN by software at the issuing bank, so it's not identifiable by the merchants.
Merchants get the "last 4" digits, but that's not enough to identify specific CPANs.
Makes me wonder if this AI flood uncovers the unflattering truth about this community acuteness, or it's only a failure of existing guardrails and we just need to change them.
It's definitely a real post. Yes it's obviously LLM-written, but if the worst thing you can say about an article is that it looks LLM-written, then maybe you don't have any real criticisms.
Whether the contents are made up or not is unclear, but you can criticise the content of the article without needing to speculate on whether it was written by an LLM or whether it is a work of fiction. It has plenty of much more concrete flaws.
Kind of fascinating, though it could still be a person doing this using AI as opposed to an entirely generated persona. Thanks for bringing it up.
> What works is running them all and scoring each transaction across the signals. A transaction that fails on three or four of them is almost always fraud. A transaction that fails on one might be your grandma being weird with her debit card on vacation.
Fraudulent transactions will eventually cost the bank (when they would have to reverse/reimburse it and eat the loss). A denied transaction only results in an angry customer who will quickly forget after they complained - so the customer bears the brunt of the externalized cost.
Therefore, the bank's incentive is to err on the side of more caution, and deny transactions when finding false positives.
Human analysts need to be able to explain to compliance in a single 5 minute email why a specific transaction was declined, and most importantly, what could have been done differently to avoid the adverse decision.
Fixing one problem with ML often creates two new problems that aren't quite obvious yet. SQL tends to have fewer surprises with regard to regressions and unexpected side effects as things change over time.
Just set up a direct debit to your favourite charity.
> Real cardholders almost never buy something for exactly $1.00. Coffee is $4.73, gas is $52.81. The roundness is the signal.
Well,sure. But some people come here just for the comments and don't read the articles
The fact that the 'person' behind this post managed to publish a novel, a music album, and a few posts on fraud prevention all within a few days this month is enough for me to redflag it.
> you don't have any real criticisms.
Please check once again, I've already given my opinion on the article itself below. Here it is again for your convenience.
> I question the described approaches. For example, while impossible travel is a legitimate and widely used technique, it's related to online user behaviour based on IP address. Moreover, tirreno, for example, has separate rules for cases where the IP clearly comes from Apple Relay or VPN/Tor — those are separate flags. I assume some or all examples are LLM-generated, as the context is mixed up and no one actually collects GPS location in bulk for card swipes.
Last week I was on a privacy panel discussion where one of the speakers had a bad microphone, and honestly no one in the audience could actually hear what he was talking about for five minutes. Afterward, the audience, perhaps out of politeness, perhaps because no one really cared, reacted and applauded as usual.
This post, and especially its reception, feels like a caricature of what's happening in the tech industry right now. Hundreds upvote, very few really try to understand what the context is about, much less have the real experience to judge it, and yet here we are on the front page of HN.
Now the East Coast will wake up, and I'm really curious whether anything will change in this thread.
The worst scenario for a credit card issuer is when a customer, for whatever reason, starts using another bank’s card in their wallet as their daily driver.
the point of these drives is to get more people to give to charity. Then you use a lullaby-word as if setting up a charitable donation is as easy as saying "yes" when the checker asks if you want to give a small donation.
A fake blog post would be something that purports to be a blog post but actually isn't. This is definitely a real post.
It isn't though, so balancing it with other rules is fine.
Its actually very easy to give ÂŁ5/month direct to a charity. Takes about 2 minutes, just gotta do it.
What would be a possible example of this?
But examples might include something that looks at a glance like a blog post, but actually it's all just squiggles instead of words.
Or a link on the home page that looks like there's a blog post, but when you click it it's just a YouTube video.
May 12, 2026
Quick disclaimer: I do data work on a program-integrity team. Examples below use generic transaction tables and made-up scenarios. Nothing here comes from anything I’ve actually worked on or seen. Views are mine, not my employer’s.
Fraud detection in transaction data is mostly SQL. Not machine learning, not graph databases, not whatever Gartner is hyping this year. SQL, run against the right tables, with the right joins, looking for the right shapes.
I work mostly with government-funded benefit programs, but the patterns below port over to anything with a transactions table: credit cards, healthcare claims, e-commerce, point-of-sale. If money moves and gets logged, these queries will find weird things in the log.
Six patterns. Roughly in the order I’d build them out on a new dataset.
The simplest one. Someone with a stolen card wants to drain it before the holder notices. So they hit the card fast.
SELECT
cardholder_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(*) AS tx_count,
min(timestamp) AS first_tx,
max(timestamp) AS last_tx
FROM transactions
WHERE timestamp >= current_date - INTERVAL '30 days'
GROUP BY 1, 2
HAVING count(*) > 10;
Tune two knobs: the window size and the count threshold. I usually run a 1-minute, 5-minute, and 1-hour version in parallel and compare. Different fraud shows up at different scales — a card-testing ring hits a server in seconds; a benefits-trafficking ring might take an afternoon.
A few cardholders will legitimately blow past the threshold. Route operators servicing vending machines. People reloading prepaid cards in bulk. Your false positives. Worth keeping a whitelist after the first pass.
For sliding-window velocity, this is the form I use:
SELECT
cardholder_id,
timestamp,
count(*) OVER (
PARTITION BY cardholder_id
ORDER BY timestamp
RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
) AS tx_in_last_5min
FROM transactions
QUALIFY tx_in_last_5min >= 5
ORDER BY cardholder_id, timestamp;
QUALIFY works in Snowflake, BigQuery, Databricks, Teradata. For Postgres you wrap the whole thing in a CTE and filter on the outside. Slight pain, same result.
If a card swipes in Chicago and seven minutes later swipes in Los Angeles, one of those swipes is fake. The card is cloned. This is the most uncontroversial fraud signal you’ll find — there’s almost no legitimate reason a single card is in two distant places in seven minutes.
WITH ordered_tx AS (
SELECT
cardholder_id,
timestamp,
location,
LAG(timestamp) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_ts,
LAG(location) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_loc
FROM transactions
)
SELECT
cardholder_id,
prev_ts AS first_tx,
timestamp AS second_tx,
prev_loc AS first_location,
location AS second_location,
EXTRACT(EPOCH FROM (timestamp - prev_ts)) / 60 AS minutes_apart,
haversine(prev_loc, location) AS miles_apart
FROM ordered_tx
WHERE prev_ts IS NOT NULL
AND prev_loc <> location
AND haversine(prev_loc, location)
/ nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0)
* 3600 > 600;
haversine is the great-circle distance function. Most warehouses ship one. If yours doesn’t, it’s about ten lines to write your own.
The 600 mph threshold is rough — commercial jet cruise is around 575, so this is “faster than a plane could possibly do it.” You can tighten it to 100 mph if you want to catch suspiciously-fast ground travel too, but at that threshold you start picking up real airline travelers, kids with parents driving them home from camp, that kind of thing.
A few other shapes in the same family are worth running:
There are a couple of amounts that show up disproportionately in fraud and almost never in normal use.
SELECT cardholder_id, timestamp, amount, merchant_id
FROM transactions
WHERE
(amount >= 99.50 AND amount < 100.00)
OR (amount >= 499.50 AND amount < 500.00)
OR amount IN (1.00, 5.00, 10.00)
ORDER BY cardholder_id, timestamp;
What’s happening:
Round dollar amounts at small values — $1.00, $5.00, $10.00 — are almost always card tests. Someone got a card number from a dump and they’re checking if it works before reselling it. Real cardholders almost never buy something for exactly $1.00. Coffee is $4.73, gas is $52.81. The roundness is the signal.
Amounts just below a threshold are different. $99.99 is interesting because at a lot of places, $100 is the line where the cashier is supposed to check ID. $499.99 is interesting because $500 is often a daily ATM cap. Whoever’s doing the transaction knows the rules and is staying under them.
(For benefits transactions specifically, the round-number pattern doesn’t help much. Benefits don’t get card-tested the same way. There the signal is usually duplicate recipients, which is a different post.)
When a skimmer compromises a card reader at, say, a gas pump, you don’t get one fraud case. You get dozens. Every card swiped at that pump for the next few weeks is now in someone’s database. So the symptom from the merchant side is: an unusual number of unrelated cards spending more than usual, in a short window.
SELECT
merchant_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(DISTINCT cardholder_id) AS unique_cards,
count(*) AS total_tx,
sum(amount) AS total_amount
FROM transactions
WHERE timestamp >= current_date - INTERVAL '7 days'
GROUP BY 1, 2
HAVING count(DISTINCT cardholder_id) > 20
AND sum(amount) > 5000
ORDER BY total_amount DESC;
The problem with static thresholds (20 unique cards, $5000) is they don’t account for size. A Costco does that in 90 seconds. A used bookshop, never. So the better version compares each merchant against itself:
WITH merchant_hourly AS (
SELECT
merchant_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(DISTINCT cardholder_id) AS unique_cards
FROM transactions
WHERE timestamp >= current_date - INTERVAL '60 days'
GROUP BY 1, 2
),
with_baseline AS (
SELECT
*,
avg(unique_cards) OVER (
PARTITION BY merchant_id
ORDER BY hour_bucket
ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING
) AS rolling_avg_cards
FROM merchant_hourly
)
SELECT *,
unique_cards / nullif(rolling_avg_cards, 0) AS spike_ratio
FROM with_baseline
WHERE unique_cards > rolling_avg_cards * 3
ORDER BY spike_ratio DESC;
The 168 is the trailing seven days of hourly buckets. I use a week because daily and weekly seasonality matters — Tuesday 2pm at a coffee shop is not the same baseline as Saturday 9am at the same shop. A week catches both cycles.
Three times normal is where I start. It’s loose enough not to drown you in alerts but tight enough to flag the actually weird hours.
Most people are creatures of habit when they spend money. A nine-to-fiver doesn’t suddenly start buying gas at 3am. If their card does, it’s either being used by someone else or they’re traveling — and travel produces other signals you can check.
WITH cardholder_hour_pattern AS (
SELECT
cardholder_id,
EXTRACT(HOUR FROM timestamp) AS hour_of_day,
count(*) AS tx_count
FROM transactions
WHERE timestamp >= current_date - INTERVAL '90 days'
GROUP BY 1, 2
),
cardholder_normal AS (
SELECT
cardholder_id,
min(hour_of_day) FILTER (WHERE tx_count >= 2) AS earliest_hour,
max(hour_of_day) FILTER (WHERE tx_count >= 2) AS latest_hour
FROM cardholder_hour_pattern
GROUP BY 1
)
SELECT t.cardholder_id, t.timestamp, t.amount, t.merchant_id
FROM transactions t
JOIN cardholder_normal cn USING (cardholder_id)
WHERE EXTRACT(HOUR FROM t.timestamp) NOT BETWEEN cn.earliest_hour AND cn.latest_hour
ORDER BY t.timestamp DESC;
The “two or more in that hour” filter on the inner query is doing important work. Without it, one stray late-night gas station purchase three months ago becomes part of the cardholder’s “normal” hours, and you never flag them again. Requiring at least two purchases in a given hour, in 90 days, sets the bar at “actually a habit” instead of “happened once.”
Drawback: this doesn’t work until you have history. New accounts have no baseline. For those, you fall back to global hour patterns or just skip this pattern entirely until they’ve been around for a couple months.
This one isn’t really a pattern. It’s a setup that makes the other five patterns composable.
SELECT
cardholder_id,
timestamp,
amount,
merchant_id,
timestamp - LAG(timestamp) OVER w AS time_since_last,
CASE WHEN merchant_id <> LAG(merchant_id) OVER w
THEN 'changed' ELSE 'same' END AS merchant_change,
sum(amount) OVER (
PARTITION BY cardholder_id
ORDER BY timestamp
RANGE BETWEEN INTERVAL '24 hours' PRECEDING AND CURRENT ROW
) AS running_24h_total,
ROW_NUMBER() OVER (
PARTITION BY cardholder_id, date(timestamp)
ORDER BY timestamp
) AS tx_of_day
FROM transactions
WINDOW w AS (PARTITION BY cardholder_id ORDER BY timestamp)
ORDER BY cardholder_id, timestamp;
Once you’ve materialized those columns, fraud rules collapse to filter expressions. Say you’re hunting card-testing rings, where the tell is “lots of small charges, all at different merchants, within minutes of each other.” The rule becomes:
SELECT *
FROM tx_with_windows
WHERE tx_of_day >= 5
AND time_since_last < INTERVAL '60 seconds'
AND merchant_change = 'changed';
Three filters. That’s it.
The reason this matters is that the moment your analysts can express new fraud hypotheses as SQL filters instead of engineering tickets, your iteration loop drops from weeks to hours. You catch more fraud, faster.
None of these alone is enough. Velocity has false positives (vending operators). Geographic impossibility misses anything inside a single metro. Amount anomalies don’t apply outside of card-test contexts. The off-hours rule needs history.
What works is running them all and scoring each transaction across the signals. A transaction that fails on three or four of them is almost always fraud. A transaction that fails on one might be your grandma being weird with her debit card on vacation.
If you’re brand new to fraud detection, start with pattern 1. It alone will surface a useful amount of fraud and very little legitimate activity, and it’s cheap to run.
If you’ve already got 1 through 5, the place to invest is pattern 6 — those window-function primitives. Every analyst on your team will use them once they exist, and adding the next fraud pattern stops being a project.
A few things this post doesn’t cover that come up constantly:
NULL handling. Real transaction tables don’t use NULL the way intro SQL books do. A lot of legacy systems use sentinel values like 9999-12-31 for “no end date” or 0001-01-01 for “no start date.” Filtering with IS NULL will silently miss those rows. Always check what the convention is in your specific table before writing WHERE clauses that assume NULL.
False positives. Every rule above will flag real cardholders doing weird-but-legitimate things. Your fraud workflow needs human review of flagged cases, with a feedback loop that lets you tune thresholds based on what’s actually fraud and what isn’t. Auto-blocking on a single rule is how you lose customers.
Privacy. If the data has PII, your queries need to comply with your applicable data-use policies. De-identified or sampled data first, production data with authorization second.
Cost. Window functions with big partitions are not cheap. Filter your date range first, then apply the window, not the other way around. I’ve watched a junior analyst burn through a warehouse credit budget by running a LAG() across two years of transactions on the entire dataset before adding the WHERE.
Things I want to write about next, depending on what people ask for:
LAG and ROW_NUMBERIf there’s something specific you want covered, message through fixelsmith.com.
Fixel Smith is an experienced Program Integrity Analyst working in public-sector data.