Quick request: Unsloth quants; bit for bit they're usually better. Or, more generally, a UI for Hugging Face model selection. I understand you won't be able to serve everything, but I want to mix and match!
Also - grounding:
"open safari" (safari opens, voice says: "I opened safari") "navigate to google.com in safari" (nothing happens, voice says: "I navigated to google.com")
Anyway, really fun.
Not sure why they decided to reinvent the wheel and write yet another ML engine (MetalRT), which is proprietary. I would most likely bet on Core ML, since it has support for the ANE (Apple's NPU), or on MLX.
Other popular repos for such tasks I would recommend:
https://github.com/FluidInference/FluidAudio
https://github.com/DePasqualeOrg/mlx-swift-audio
I think this has to be the future for AI tools to really be truly useful. The things that are truly powerful are not general purpose models that have to run in the cloud, but specialized models that can run locally and on constrained hardware, so they can be embedded.
I'd love to see this added in-path as an audio passthrough device, so you could add on-device native transcription to any application that handles audio, such as video conferencing apps.
And sorry to say, but I don't think "Lets go!!" is a valid comment; this makes me even more suspicious.
Especially given the history and suspicions I already had.
Either way, this is a tremendous achievement and it's extremely relevant in the OpenClaw world where I might not want to have sensitive information leave my computer.
How does the RAG fit in, a voice-to-RAG seems a bit random as a feature?
I don’t mean to come across as dismissive, I’m genuinely confused as to what you’re offering.
they are a company that registers domains similar to their main one, and then uses those domains to spam people they scrape off of github without affecting their main domain reputation.
edit: here is the post https://news.ycombinator.com/item?id=47163885
----
edit2: it appears that RunAnywhere is getting damage-control help by dang or tom.
this comment, at this time, has 23 upvotes yet is below 2 grey comments (i.e. <=0 upvotes) that were posted at roughly the same time (1 before, 1 after) -- strong evidence of artificial ordering by the moderators. gross.
Before I install, is there any telemetry enabled here or is this entirely local by default?
I was curious, so I did some more research into the company and found more shady stuff going on, like intentionally buying new domains a month before sending that spam so their main website's mail reputation wouldn't take the hit. You can read my comment here[2]
Just to be on the safe side here, @dang (yes, pinging doesn't work, but still), can you give us some average stats on who upvoted this, and run an internal investigation into whether botting was done? I could be wrong about this, and I never mean to harm any company, but I can't in good faith understand this.
Some stats I would want: average karma, words written, and account creation date of the accounts that upvoted this post. I'd also like to know the conclusion of the internal investigation, if one takes place.
[There is a bit of a conflict of interest with this being a YC product, but I trust the Hacker News moderators and dang to do what's right, yeah]
I am just skeptical, that's all, and this is my opinion. I just want to provide some historical context into this company and I hope that I am not extrapolating too much.
It's just really strange to me, that's all.
[0]: https://news.social-protocols.org/stats?id=47326101 (see the expected upvotes vs real upvotes and the context of this app and negative reception and everything combined)
[1]: Tell HN: YC companies scrape GitHub activity, send spam emails to users: https://news.ycombinator.com/item?id=47163885
What about for on-device RAG use cases?
Seems pretty clear. You can supply documents to the model as input and then verbally ask questions about them.
In other words, your perception wasn't wrong, but the interpretation was off. I've put "Launch HN" and "YC W26" back in the title to make that clearer - I edited them out earlier, which was my mistake.
As for the booster comments, those are pretty common on launch threads and often pretty innocent - most people who aren't active HN users have no idea that it's against the rules. We do our best to communicate about that, but it's not a cardinal sin—there are far worse offenses.
Maybe it's just us two (n=2) who remember this fiasco, but I don't agree with that. I don't really understand how this got so many upvotes in such a short time frame, especially given the company's history of, to say the very least, not doing good things... I am especially skeptical of it.
Thoughts?
Edit: I looked deeper into Sanchit's Hacker News id and found that 3 days ago they posted the same thing, as far as I can tell (the only difference being that it used the runanywhere.ai domain rather than github.com/runanywhere, but that could well be because on Hacker News you can't submit the same link twice within a short period, so they are definitely skirting that rule by posting the GitHub link instead).
Another point: that post (https://news.ycombinator.com/item?id=47283498) was stuck at 5 points until right now (at time of writing).
So this just got a lot crazier, which is actually wild.
Edit: just reloaded, it's fixed now.
What...? It is terrible, even compared to Whisper Tiny, which was released years ago under an Apache 2.0 license so Apple could have adopted it instantly and integrated it into their devices. The bigger Whisper models are far better, and Parakeet TDT V2 (English) / V3 (Multilingual) are quite impressive and very fast.
I have no idea what would make someone say that iOS dictation is good at understanding speech... it is so bad.
For a company that talks so much about accessibility, it is baffling to me that Apple continues to ship such poor quality speech to text with their devices.
Umm, ah, wait no, uhh yes you are. Unless, hang on, you are possessed with greater umm speech capabilities than most, wait nevermind start over. Unless you never make a mistake while talking, you want AI to take out the "three, wait no four" and just leave the output with "four" from what you actually spoke. Depending on your use case.
Talk to your Mac, query your docs, no cloud required.
RCLI is an on-device voice AI for macOS. A complete STT + LLM + TTS pipeline running natively on Apple Silicon — 43 macOS actions via voice, local RAG over your documents, sub-200ms end-to-end latency. No cloud, no API keys.
Powered by MetalRT, a proprietary GPU inference engine built by RunAnywhere, Inc. specifically for Apple Silicon.
Real-time screen recordings on Apple Silicon — no cloud, no edits, no tricks.
Requires macOS 13+ on Apple Silicon (M1 or later).
One command:
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
Or via Homebrew:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup
rcli # interactive TUI (push-to-talk + text)
rcli listen # continuous voice mode
rcli ask "open Safari" # one-shot command
rcli ask "play some jazz on Spotify"
A full STT + LLM + TTS pipeline running on Metal GPU with three concurrent threads:
Control your Mac by voice or text. The LLM routes intent to actions executed locally via AppleScript and shell commands.
| Category | Examples |
|---|---|
| Productivity | create_note, create_reminder, run_shortcut |
| Communication | send_message, facetime_call |
| Media | play_on_spotify, play_apple_music, play_pause, next_track, set_music_volume |
| System | open_app, quit_app, set_volume, toggle_dark_mode, screenshot, lock_screen |
| Web | search_web, search_youtube, open_url, open_maps |
Run rcli actions to see all 43, or toggle them on/off in the TUI Actions panel.
Tip: If tool calling feels unreliable, press X in the TUI to clear the conversation and reset context. With small LLMs, accumulated context can degrade tool-calling accuracy — a fresh context often fixes it.
Index local documents, query them by voice. Hybrid vector + BM25 retrieval with ~4ms latency over 5K+ chunks. Supports PDF, DOCX, and plain text.
rcli rag ingest ~/Documents/notes
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
A terminal dashboard with push-to-talk, live hardware monitoring, model management, and an actions browser.
| Key | Action |
|---|---|
| SPACE | Push-to-talk |
| M | Models — browse, download, hot-swap LLM/STT/TTS |
| A | Actions — browse, enable/disable macOS actions |
| B | Benchmarks — run STT, LLM, TTS, E2E benchmarks |
| R | RAG — ingest documents |
| X | Clear conversation and reset context |
| T | Toggle tool call trace |
| ESC | Stop / close / quit |
MetalRT is a high-performance GPU inference engine built by RunAnywhere, Inc. specifically for Apple Silicon. It delivers the fastest on-device inference for LLM, STT, and TTS — up to 550 tok/s LLM throughput and sub-200ms end-to-end voice latency.
Apple M3 or later required. MetalRT uses Metal 3.1 GPU features available on M3, M3 Pro, M3 Max, M4, and later chips. M1/M2 support is coming soon. On M1/M2, RCLI automatically falls back to the open-source llama.cpp engine.
MetalRT is automatically installed during rcli setup (choose "MetalRT" or "Both"). Or install separately:
rcli metalrt install
rcli metalrt status
Supported models: Qwen3 0.6B, Qwen3 4B, Llama 3.2 3B, LFM2.5 1.2B (LLM) · Whisper Tiny/Small/Medium (STT) · Kokoro 82M with 28 voices (TTS)
MetalRT is distributed under a proprietary license. For licensing inquiries: founder@runanywhere.ai
RCLI supports 20+ models across LLM, STT, TTS, VAD, and embeddings. All run locally on Apple Silicon. Use rcli models to browse, download, or switch.
LLM: LFM2 1.2B (default), LFM2 350M, LFM2.5 1.2B, LFM2 2.6B, Qwen3 0.6B, Qwen3.5 0.8B/2B/4B, Qwen3 4B
STT: Zipformer (streaming), Whisper base.en (offline, default), Parakeet TDT 0.6B (~1.9% WER)
TTS: Piper Lessac/Amy, KittenTTS Nano, Matcha LJSpeech, Kokoro English/Multi-lang
Default install (rcli setup): ~1GB — LFM2 1.2B + Whisper + Piper + Silero VAD + Snowflake embeddings.
rcli models # interactive model management
rcli upgrade-llm # guided LLM upgrade
rcli voices # browse and switch TTS voices
rcli cleanup # remove unused models
Mic → VAD → STT → [RAG] → LLM → TTS → Speaker
                           │
                           └── Tool Calling → 43 macOS Actions
Three dedicated threads in live mode, synchronized via condition variables:
| Thread | Role |
|---|---|
| STT | Captures audio, runs VAD, detects speech endpoints |
| LLM | Generates tokens, dispatches tool calls |
| TTS | Double-buffered sentence-level synthesis and playback |
Key design decisions:
src/
engines/ STT, LLM, TTS, VAD, embedding engine wrappers
pipeline/ Orchestrator, sentence detector, text sanitizer
rag/ Vector index, BM25, hybrid retriever
core/ Types, ring buffer, memory pool, hardware profiler
audio/ CoreAudio mic/speaker I/O
tools/ Tool calling engine with JSON schema definitions
actions/ 43 macOS action implementations
api/ C API (rcli_api.h)
cli/ TUI dashboard (FTXUI), CLI commands
models/ Model registries with on-demand download
CPU-only build using llama.cpp + sherpa-onnx (no MetalRT):
git clone https://github.com/RunanywhereAI/RCLI.git && cd RCLI
bash scripts/setup.sh
bash scripts/download_models.sh
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(sysctl -n hw.ncpu)
./rcli
All dependencies are vendored or CMake-fetched. Requires CMake 3.15+ and Apple Clang (C++17).
rcli Interactive TUI (push-to-talk + text + trace)
rcli listen Continuous voice mode
rcli ask <text> One-shot text command
rcli actions [name] List actions or show detail
rcli rag ingest <dir> Index documents for RAG
rcli rag query <text> Query indexed documents
rcli models [llm|stt|tts] Manage AI models
rcli voices Manage TTS voices
rcli bench [--suite ...] Run benchmarks
rcli setup Download default models
rcli info Show engine and model info
Options:
--models <dir> Models directory (default: ~/Library/RCLI/models)
--rag <index> Load RAG index for document-grounded answers
--gpu-layers <n> GPU layers for LLM (default: 99 = all)
--ctx-size <n> LLM context size (default: 4096)
--no-speak Text output only (no TTS)
--verbose, -v Debug logs
Contributions welcome. See CONTRIBUTING.md for build instructions and how to add new actions, models, or voices.
RCLI is open source under the MIT License.
MetalRT is proprietary software by RunAnywhere, Inc., distributed under a separate license.
Built by RunAnywhere, Inc.
Clearly I am not the only one here, as john_strinlai seems to have reached somewhat the same conclusion as me.
Dang, I know you care about this community, so can you please say more about what you think of this in particular as well?
I understand that YC companies get preferential treatment; fine by me. But this feels like something larger to me.
I have written up everything I could find in this thread, from the same post being shown here 3 days ago under the runanywhere.ai link to it now changing to the GitHub link to skirt the HN rule that the same link can't be posted within a short period, and everything else.
This feels somewhat intentional, just like the spam issue; I hope you understand what I mean.
(If you also feel suspicious, can you then do a basic analysis/investigation with all of these suspicious points in mind, and upload the results in an anonymous way if possible?)
I wish you a nice day and await your thoughts on all of this.
what i do know is that their name is etched into my mind under the category of "shady, never do business with them".
https://news.ycombinator.com/item?id=47326953 is grey (i.e. <=0 karma). my top-level comment is at 14 karma. we posted within 15 minutes of each other. their comment is higher up the page. i've never seen something like that before.
the two posts calling out unethical behavior have been living at the bottom of this post the entire time, until a couple of actually [flagged] comments ended up under them.
i do not care about the karma itself, at all. but i do care to know if launch/show posts have comment sections with cherry-picked ordering or organic ordering.
edit 2: i am at 19 points, and now below two grey (<=0 karma) comments (https://news.ycombinator.com/item?id=47326455). whats up dang?
Maybe you just don’t know what you’re missing? Google’s default speech to text is still bad compared to Whisper and Parakeet, but even Google’s is markedly better than Apple’s.
I cannot think of a single speech to text system that I’ve run into in the past 5 years that is less accurate than the one Apple ships.
I was writing the comment when it was at 18 upvotes, and then it suddenly went to 24 upvotes, which made me suspicious.
See the 2026-03-10T17:38-17:39Z timeframe in this particular graph[0].