I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?

And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…

It would be nice to add (2024) to the title, this is not news (see: https://research.google/blog/a-decoder-only-foundation-model...)

Here is the link to the blogpost, that actually describe what this is: https://github.com/google-research/timesfm?tab=readme-ov-fil...

Can someone explain ELI5 how it does work? and how many data points it can read?

I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?

And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…

It would be nice to add (2024) to the title, this is not news (see: https://research.google/blog/a-decoder-only-foundation-model...)

Here is the link to the blogpost, that actually describe what this is: https://github.com/google-research/timesfm?tab=readme-ov-fil...

Can someone explain ELI5 how it does work? and how many data points it can read?

I'm willing to bet an intelligent LLM with a dataset and a pandas stats package could outperform this model by running its own experiments and making predictions

So the time series are provided with no context? It's just trained on lots of sets of numbers? Then you give it a new set of numbers and it guesses the rest, again with no context?

My guess as to how this would work: the machine will first guess from the data alone if this is one of the categories it has already seen/inferred (share prices, google trend cat searches etc.) Then it'll output a plausible completion for the category.

That doesn't seem as if it will work well for any categories outside the training data. I would rather just use either a simple model (ARIMA or whatever) or a theoretically-informed model. But what do I know.

This has been around a few months now, has anyone built anything on it?

Let's say I have long time series of past solar irradiation and long time series of past weather forecasts. Can this model make use of weather forecasts for time X in the future to predict electricity prices in the future?

That is, can it use one time series at time X to predict another time series at time X?

Or is this strictly about finding patterns WITHIN a time series.

Somehow I missed that one. Are there any competition on this?

I always had difficulties with ML and time series, I'll need to try that out.

Has anyone gotten this to run on MLX yet?

isn't this basically prophet?

Can this finally break the stock markets?

This has been around a few months now, has anyone built anything on it?

Let me be blunt: Shannon would tell us that time forecasting is bullshit:

There is infinitely more entropy in the real world out there than any model can even remotely capture.

The world is not minecraft.

What is not generally understood is that these models don’t predict egg prices or inflation in Italy.

They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.

They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).

My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.

As they say in appendix 8:

> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple times series patterns:

> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.

> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.

> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.

If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.

And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.

  > How can the same model predict egg prices in Italy, and global inflation in a reliable way?

For one, there's Benford's law: https://en.wikipedia.org/wiki/Benford%27s_law

So, predict sign (branch predictors in modern CPUs also use neural networks of sorts), exponent (most probably it changes slowly) and then predict mantissa using Benford's law.

Actually it can. See https://youtu.be/FUQwijSDzg8?si=LWd5gVNYRd3HH9rJ

Or just search for the James-Stein paradox.

It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.

The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.

People ask how one model can understand everything, but that assumes there’s any understanding involved at all.

At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?

I would say:

- decomposition: discover a more general form of Fourrier transform to untangle the underlying factors

- memorization: some patterns are recurrent in many domains such as power low

- multitask: exploit cross-domain connections such as weather vs electricity

> How can the same model predict egg prices in Italy, and global inflation in a reliable way?

How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?

Not directly 2024, there was a big update end 2025

Wish they gave some numbers for total GPU hours to train this model, seems comparatively tiny when compared to LLMs so interested to know how close this is to something trainable by your average hobbyist/university/small lab

That takes me to the same content as the submission, a GitHub repo (Chrome on iOS)

This has been around a few months now, has anyone built anything on it?

What is not generally understood is that these models don’t predict egg prices or inflation in Italy.

They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.

They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).

So the time series are provided with no context? It's just trained on lots of sets of numbers? Then you give it a new set of numbers and it guesses the rest, again with no context?

If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.

That is, can it use one time series at time X to predict another time series at time X?

Or is this strictly about finding patterns WITHIN a time series.

The paper suggests it’s for forecasting. How this doesn’t just represent the relatively small number of training samples isn’t obvious to me. If most of the time series for training go up and to the right then I assume that’s what the model will (generally) do, but who knows.

> They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).

well...

That's what traditional time-series modelling does. This is a foundational model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")

Wars in the middle east seem to have increasingly regular patterns tied to stock market opening hours, unfortunately.

I am not familiar with time series models, but judging from your answer, it would be necessary to feed long time series into this model for it to detect trends. What is a token here? Can it, for the lack of a better example, take in all intraday movements of a stock for a day, a week, a month, etc?

What makes these models different from models used for e.g. audio?

Or other low-dimensional time domain signals?

Do these models predict on just a single time series then?

it is far more useful for predictions to look for correlations between time series. This is far more complex than looking for correlations in general because most time series trend up or down and therefore correlate.

ARIMA and ARMA models

It is the Middle East. Wars are always in season. And supply is more than the demand.

The main issue is that people do use them to predict bitcoin prices intraday and that sort of things.

My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.

As they say in appendix 8:

> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple times series patterns:

> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.

> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.

> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.

If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.

And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.

Actually it can. See https://youtu.be/FUQwijSDzg8?si=LWd5gVNYRd3HH9rJ

Or just search for the James-Stein paradox.

  > How can the same model predict egg prices in Italy, and global inflation in a reliable way?

For one, there's Benford's law: https://en.wikipedia.org/wiki/Benford%27s_law

So, predict sign (branch predictors in modern CPUs also use neural networks of sorts), exponent (most probably it changes slowly) and then predict mantissa using Benford's law.

I would say:

- decomposition: discover a more general form of Fourrier transform to untangle the underlying factors

- memorization: some patterns are recurrent in many domains such as power low

- multitask: exploit cross-domain connections such as weather vs electricity

Not directly 2024, there was a big update end 2025

If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.

Somehow I missed that one. Are there any competition on this?

I always had difficulties with ML and time series, I'll need to try that out.

There are some other transformer based models on the GIFT leaderboard: https://huggingface.co/spaces/Salesforce/GIFT-Eval

https://www.datadoghq.com/blog/datadog-time-series-foundatio...

https://moment-timeseries-foundation-model.github.io/

https://arxiv.org/abs/2403.07815

A friend at work used one to predict when our CEO would post in Slack, which is verry entertaining to see if correct.

there is TabPFN [1] which also has time series capabilities.

[1] https://priorlabs.ai/tabpfn

Can this finally break the stock markets?

The safe bet is no. Based on other comments, this would depend a lot on the specific trends you're trying to predict. But it wouldn't work for everything in the stock market.

This has been around a few months now, has anyone built anything on it?

we did some internal tests. The quality isn't bad, it works quite well. But it's essentially on the same level of an ARIMA model trained on the data just much bigger and slower.

So in my opinion it currently falls into a kind of void. If your use case is worth predicting and you put a data scientist on it, you're better off just training cheaper ARIMA models.

Let me be blunt: Shannon would tell us that time forecasting is bullshit:

There is infinitely more entropy in the real world out there than any model can even remotely capture.

The world is not minecraft.

Time series forecasting has proven useful in a number of different domains from weather to health monitoring. Sure you can easily over fit on the training data, but in general that's a data source/input problem where you need many high quality data sources to find the signal in the noise.

The world is chaotic sure, but there are still truths to be found in noisy time series data; saying that the world is too random to be knowable is a bit dismissive, no?

Yeah all weather forecasts are just magic

> Shannon would tell us that time forecasting is bullshit

If you're trying to forecast random data, then yes, it's bullshit. Otherwise you have a chance.

> They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).

well...

The Middle East war season is upon us once again

Because traditional time-series modelling (ARIMA, GARCH, ...) is too "simple" and "strict". Just like "simple" computer vision (OpenCV, edge-detection, ...) was crushed by neural networks when having to deal with real world images.

I tend to avoid time series forecasting when I can help it because I find it hard to communicate to stakeholders that a neural network (or another method) is not an oracle.

If you are talking about granularity of observations, it would depend on what you are trying to predict (the price in an hour or the price in 12 months?) and how quickly you need the prediction (100ms? Tomorrow morning?). If I had infinite data I would use granularity as a hyper parameter and tune that to a level that produced the best test results.

I am for example currently using weekly averages for non-price data forecasting. I could use daily data but weekly is absolutely adequate for this purpose.

Wars in the middle east seem to have increasingly regular patterns tied to stock market opening hours, unfortunately.

I mean it's super obvious, it's directly tied to scrubs popularity.

New season of scrubs = new war in the middle east.

I totally agree with the sentiment but from what I can tell, I’d say they tend happen immediately before or after markets open and close. Essentially, and to their maximum, screwing absolutely everyone who isn’t in the clique from participating in the trade.

FWIW— the only sure fire way to win the trade is to buy time and assume both gross incompetence and negligence when it comes action. The only caveat is if the markets tank enough, this administration will signal capitulation before hand, e.g. Trump mildly capitulating on tariffs last April after the markets proceed to relentlessly defecate themselves.

0-DTE options are typically, and for good reason, stupid gambles. But, right now they can’t even be considered gambling, because there’s zero chance of winning. Not just bad odds, but no odds. Again just signaling how truly malicious this admin is and its disdain for anyone and everyone not close to them.

What makes these models different from models used for e.g. audio?

Or other low-dimensional time domain signals?

You could abstract speech or other audio as a series of sounds, where time is indeed a factor. Speech, however, has patterns that are more similar to written language than to seasonal patterns that are typically assumed in time series. While trained on different data, the architecture of TimesFM is actually similar to LLMs. But not identical, as pointed out at https://research.google/blog/a-decoder-only-foundation-model...:

> Firstly, we need a multilayer perceptron block with residual connections to convert a patch of time-series into a token that can be input to the transformer layers along with positional encodings (PE). > [...] > Secondly, at the other end, an output token from the stacked transformer can be used to predict a longer length of subsequent time-points than the input patch length, i.e., the output patch length can be larger than the input patch length.

It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.

The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.

People ask how one model can understand everything, but that assumes there’s any understanding involved at all.

At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?

"curve-fitting" has a long history (centuries old) and could be regarded more as a numerical method issue.

Rigorous understanding of what is over fitting, techniques to avoid it and select the right complexity of the model, etc, are much newer. This is a statistical issue.

My point is that forecasting isn't curve fitting, even thought curve fitting is one element of it.

> How can the same model predict egg prices in Italy, and global inflation in a reliable way?

How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?

It can't compress pictures of everything in a reliable way.

Text and anything with lots of high frequency components looks terrible

Edit, it looks like the paper does

TPUv5e with 16 tensor cores for 2 days for the 200M param model.

Claude reckons this is 60 hours on a 8xA100 rig, so very accessibile compared to LLMs for smaller labs

That takes me to the same content as the submission, a GitHub repo (Chrome on iOS)

There are some other transformer based models on the GIFT leaderboard: https://huggingface.co/spaces/Salesforce/GIFT-Eval

there is TabPFN [1] which also has time series capabilities.

[1] https://priorlabs.ai/tabpfn

The safe bet is no. Based on other comments, this would depend a lot on the specific trends you're trying to predict. But it wouldn't work for everything in the stock market.

The world is chaotic sure, but there are still truths to be found in noisy time series data; saying that the world is too random to be knowable is a bit dismissive, no?

ARIMA and ARMA models

It is the Middle East. Wars are always in season. And supply is more than the demand.

"curve-fitting" has a long history (centuries old) and could be regarded more as a numerical method issue.

Rigorous understanding of what is over fitting, techniques to avoid it and select the right complexity of the model, etc, are much newer. This is a statistical issue.

My point is that forecasting isn't curve fitting, even thought curve fitting is one element of it.

It can't compress pictures of everything in a reliable way.

Text and anything with lots of high frequency components looks terrible

It still doesn't pretty well on text. And we have newer formats and ideas that would also deal with that. (To be really dead simple: have a minimal container format that decides between png or jpg, use png for text.)

However: white noise is where it really struggles. But real pictures of the real world don't look like white noise. Even though in some sense white noise is the most common type of picture a priori.

Similar for real world time series: reality mostly doesn't look like white noise.

Edit, it looks like the paper does

TPUv5e with 16 tensor cores for 2 days for the 200M param model.

Claude reckons this is 60 hours on a 8xA100 rig, so very accessibile compared to LLMs for smaller labs

And https://arxiv.org/pdf/2310.10688 if you want the full paper.

https://www.datadoghq.com/blog/datadog-time-series-foundatio...

https://moment-timeseries-foundation-model.github.io/

https://arxiv.org/abs/2403.07815

A friend at work used one to predict when our CEO would post in Slack, which is verry entertaining to see if correct.

Many thanks for the links!

we did some internal tests. The quality isn't bad, it works quite well. But it's essentially on the same level of an ARIMA model trained on the data just much bigger and slower.

So in my opinion it currently falls into a kind of void. If your use case is worth predicting and you put a data scientist on it, you're better off just training cheaper ARIMA models.

That is disappointing. One would say that with all the budget and compute, Google would be able to create something that beats methods from 70s. Maybe we are hitting some hard limits.

Maybe it would be better to train an LLM with various tuning methodologies and make a dedicated ARIMA agent. You throw in data, some metadata and requested window of forecast. Out comes parameters for "optimal" conventional model.

Yeah all weather forecasts are just magic

Weather forecasts are notoriously iffy, and accuracy drops with time, but we understand the physics behind it (to a large extent). There's also a lot of fine-grained data available. For some arbitrary time series, there's only one data sequence, and the model is unknown. Extrapolation then becomes a lot more magical.

Whether forecasting is simple: it either rains or it doesn’t. 50/50 probability!

And JPG doesn't work either..

Do these models predict on just a single time series then?

The main issue is that people do use them to predict bitcoin prices intraday and that sort of things.

Is it an issue because it works, or because it doesn’t? Or because it’s bitcoin?

I genuinely want to know. Thank you

I'm willing to bet an intelligent LLM with a dataset and a pandas stats package could outperform this model by running its own experiments and making predictions

Many thanks for the links!

Whether forecasting is simple: it either rains or it doesn’t. 50/50 probability!

And JPG doesn't work either..

The Middle East war season is upon us once again

Is it an issue because it works, or because it doesn’t? Or because it’s bitcoin?

I genuinely want to know. Thank you

Instead of willing to bet, you can do it yourself and prove it. It is not like there is a ceiling for doing what you are proposing. I am willing to bet that you are wrong.

However: white noise is where it really struggles. But real pictures of the real world don't look like white noise. Even though in some sense white noise is the most common type of picture a priori.

Similar for real world time series: reality mostly doesn't look like white noise.

And https://arxiv.org/pdf/2310.10688 if you want the full paper.

That is disappointing. One would say that with all the budget and compute, Google would be able to create something that beats methods from 70s. Maybe we are hitting some hard limits.

I think this could be an interesting read for you, I read it last week and it kind of argues the same points: https://shakoist.substack.com/p/against-time-series-foundati...

> Shannon would tell us that time forecasting is bullshit

If you're trying to forecast random data, then yes, it's bullshit. Otherwise you have a chance.

I tend to avoid time series forecasting when I can help it because I find it hard to communicate to stakeholders that a neural network (or another method) is not an oracle.

I am for example currently using weekly averages for non-price data forecasting. I could use daily data but weekly is absolutely adequate for this purpose.

I mean it's super obvious, it's directly tied to scrubs popularity.

New season of scrubs = new war in the middle east.

I think this could be an interesting read for you, I read it last week and it kind of argues the same points: https://shakoist.substack.com/p/against-time-series-foundati...

But, if you don't have the information required for a forecast, then the outcome can look random. We know the physics needed to predict the outcome of a dice throw, but, since to predict the outcome you would need a lot of information that you don't have, the output is random to you.

This seemed like a good answer at first. But on further thought, images on the whole really do seem to have quite a bit more standard structure / "grammar" to exploit compared to arbitrary time-series. Many images are of the world, where there is gravity so you might see preponderance of blobs at the bottom, or the repetitive types like people, animals, faces, eyes. Wildly abstract images still have some continuity, pixels in a neighborhood are likely to be similar.

Time series in general have none of this kind of structure that's strictly necessary. I'm sure that many real-world sensors typically have some gaussian distribution aspects + noise and/or smoothness and locality types of assumptions that are pretty safe, but presumably that simple stuff is exactly what traditional time-series modelling was exploiting.

Maybe the real question is just what kind of time-series are in the training data, and why do we think whatever implicit structure that is there actually generalizes? I mean, you can see how any training that mixes pictures of dogs and cats with picturing of people could maybe improve drawing hair, detecting hair, or let you draw people AND dogs. It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.

You can use lightgbm with appropriate feature engineering.

Wow, I didn't know. Thank you! Such a great show.

If "seasonal patterns" is the thing that differentiates between these two data sources, then perhaps time series models should be called seasonal models?

It is an issue because bitcoin is highly unpredictable.

These tools are good at predicting timeseries that are in fact quite predictable. Like insurances will use this to estimate the number of people who will die from cancer in the next year, the year after that, and so on up to 50 years in the future. The model will extrapolate the progresses made in cancer treatment from the current trend, etc. It is a prediction, cause it's still possible that a breakthrough comes in and suddenly people don't die from a certain form of cancer, but generally it should be roughly correct.

Bitcoin prices are a lot more chaotic, influenced by a ton of unrelated events that shape its path a certain way. There is absolutely no certainty that studying the shape of its past evolution will help in any way understand its future evolution.

Of course here I mean by studying its price alone. If you add more information, like who's behind each trend and why, you have a much better sense of what could happen next.

White noise is random, so it's incompressible by definition. By JPG or by any other method no matter how clever.

Has anyone gotten this to run on MLX yet?

You can use lightgbm with appropriate feature engineering.

If "seasonal patterns" is the thing that differentiates between these two data sources, then perhaps time series models should be called seasonal models?

isn't this basically prophet?

No. Prophet is based on curve-fitting.

> It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.

Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

I'm sure people will be doing mechanical interpretation on these models to extract what they pattern match for prediction.

Wow, I didn't know. Thank you! Such a great show.

It's suprisingly good, like it's it's 100% worth watching if you liked scrubs.

It is an issue because bitcoin is highly unpredictable.

Of course here I mean by studying its price alone. If you add more information, like who's behind each trend and why, you have a much better sense of what could happen next.

White noise is random, so it's incompressible by definition. By JPG or by any other method no matter how clever.

I have a very peculiar coin. With 1% probability it turns up heads and with 99% probability it turns up tails.

A string of flips is random, but it's very compressible.

In any case, my point was that reality ain't uniformly random. And not only that: pretty much anything you can point your camera at shares enough similarity in their distribution that we pretty much have universal compression algorithms for real world data.

It's suprisingly good, like it's it's 100% worth watching if you liked scrubs.

I have a very peculiar coin. With 1% probability it turns up heads and with 99% probability it turns up tails.

A string of flips is random, but it's very compressible.

TimesFM

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.

Paper: A decoder-only foundation model for time-series forecasting, ICML 2024.
All checkpoints: TimesFM Hugging Face Collection.
Google Research blog.
TimesFM in BigQuery: an official Google product.

This open version is not an officially supported Google product.

Latest Model Version: TimesFM 2.5

Archived Model Versions:

1.0 and 2.0: relevant code archived in the sub directory v1. You can pip install timesfm==1.3.0 to install an older version of this package to load them.

Update - Oct. 29, 2025

Added back the covariate support through XReg for TimesFM 2.5.

Update - Sept. 15, 2025

TimesFM 2.5 is out!

Comparing to TimesFM 2.0, this new 2.5 model:

uses 200M parameters, down from 500M.
supports up to 16k context length, up from 2048.
supports continuous quantile forecast up to 1k horizon via an optional 30M quantile head.
gets rid of the frequency indicator.
has a couple of new forecasting flags.

Along with the model upgrade we have also upgraded the inference API. This repo will be under construction over the next few weeks to

add support for an upcoming Flax version of the model (faster inference).
add back covariate support.
populate more docstrings, docs and notebook.

Install

Clone the repository:

git clone https://github.com/google-research/timesfm.git
cd timesfm

Create a virtual environment and install dependencies using uv:

# Create a virtual environment
uv venv

# Activate the environment
source .venv/bin/activate

# Install the package in editable mode with torch
uv pip install -e .[torch]
# Or with flax
uv pip install -e .[flax]
# Or XReg is needed
uv pip install -e .[xreg]

[Optional] Install your preferred torch / jax backend based on your OS and accelerators (CPU, GPU, TPU or Apple Silicon).:

Install PyTorch.
Install Jax for Flax.

Code Example

import torch
import numpy as np
import timesfm

torch.set_float32_matmul_precision("high")

model = timesfm.TimesFM_2p5_200M_torch.from_pretrained("google/timesfm-2.5-200m-pytorch")

model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        force_flip_invariance=True,
        infer_is_positive=True,
        fix_quantile_crossing=True,
    )
)
point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),
        np.sin(np.linspace(0, 20, 67)),
    ],  # Two dummy inputs
)
point_forecast.shape  # (2, 12)
quantile_forecast.shape  # (2, 12, 10): mean, then 10th to 90th quantiles.

> It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.

Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

I'm sure people will be doing mechanical interpretation on these models to extract what they pattern match for prediction.

Personally, coming from an EE background and not finance or statistics, I would go about identifying these patterns with an Signals & Systems toolbox, like systems identification, various matched filters/classifiers.

This might be a totall wrong approach, but I think it might make sense to try to model a matched filter based on previous stock selloff/bullrun trigger events, and then see if the it has any predictive ability, likewise the market reaction seems to be usually some sort of delayed impulse-like activity, with the whales reacting quickly, and then a distribution of less savvy investors following up the signal with various delays.

I'm sure other smarter people have explored this approach much more in depth before me.

> Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Or, you know, maybe they aren't. Thermometers and photon counts are related to weather sometimes, but not holidays. Holidays are related to traffic sensors and to markets, but not Geiger counters.

> Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

Prices are the opposite, up like a shot during shocks, falling slowly like a feather. So that particular pattern seems like a great example of over-fitting danger and why you wouldn't expect mixing series of different types to be work very well.

> Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Or, you know, maybe they aren't. Thermometers and photon counts are related to weather sometimes, but not holidays. Holidays are related to traffic sensors and to markets, but not Geiger counters.

> Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

Instead of willing to bet, you can do it yourself and prove it. It is not like there is a ceiling for doing what you are proposing. I am willing to bet that you are wrong.

No. Prophet is based on curve-fitting.

I'm sure other smarter people have explored this approach much more in depth before me.

You're crafting features. The modern approach to ML (deep learning) is to use over-parameterized models and let them learn the features. Perhaps you remember this? https://www.nytimes.com/2012/06/26/technology/in-a-big-netwo...

Hacker Times

Hacker Times

Google's 200M-parameter time-series foundation model with 16k context

Discussion

Discussion

TimesFM

Update - Oct. 29, 2025

Update - Sept. 15, 2025

Install

Code Example