How Cruxwire works: pipeline, scoring, and architecture

The pipeline

What happens on every run

By default, the pipeline runs every two hours between 06:00 and 22:00, and once on startup. Each run performs the same six steps.

1. Fetch & filter

Fetches every feed, parses RSS/Atom, drops items older than your lookback_hours window or matching any blocklist, and de-duplicates by URL.
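Here is a minimal Python sketch of the filter stage. It assumes feeds have already been parsed into items that carry a URL, a title, and a timezone-aware publish time. `Item` and `filter_items` are hypothetical names. Treating blocklist terms as case-insensitive substrings of the title or URL is also an assumption, and Cruxwire's blocklists may match differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Item:
    url: str
    title: str
    published: datetime  # timezone-aware publish time from the feed


def filter_items(items, lookback_hours, blocklist, now=None):
    """Keep items inside the lookback window, drop blocklisted ones, de-dupe by URL."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=lookback_hours)
    blocked = [term.lower() for term in blocklist]
    seen = set()
    kept = []
    for item in items:
        if item.published < cutoff:
            continue  # older than the lookback window
        text = f"{item.title} {item.url}".lower()
        if any(term in text for term in blocked):
            continue  # matches a blocklist term (assumed substring match)
        if item.url in seen:
            continue  # same URL already kept, e.g. from another feed
        seen.add(item.url)
        kept.append(item)
    return kept
```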

2. Carry forward unread

Unread stories from the previous digest are carried into the new one, so a story you haven't gotten to doesn't vanish when its feed rotates it out. Read stories are dropped, and stale ones are cut.
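Carry-forward can be sketched as a merge keyed by URL. `Story`, `carry_forward`, and the `max_age_hours` staleness cutoff are hypothetical. The page doesn't say how Cruxwire decides a story is stale, so using the age since a story was first seen is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Story:
    url: str
    first_seen: datetime          # when the story first entered a digest
    read: bool = False
    score: Optional[float] = None


def carry_forward(previous, fresh, max_age_hours, now=None):
    """Merge unread stories from the previous digest into this run's fresh stories."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    merged = {story.url: story for story in fresh}
    for story in previous:
        if story.read:
            continue  # read stories are dropped
        if story.first_seen < cutoff:
            continue  # unread but stale: cut
        # Keep the carried copy (even if the URL reappeared fresh) so its score is reused.
        merged[story.url] = story
    return list(merged.values())
```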

3. Score, summarize & embed

Each fresh article is scored 0–10 for relevance, given a 1–2 sentence summary and a category, and embedded, all via your local Ollama model. Carried-forward stories keep their existing score and are only re-embedded.
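The sketch below shows one way to make these calls against Ollama's HTTP API, using `/api/generate` with `format: "json"` and `/api/embed`. Several pieces are assumptions: the function names, the model names you pass in, embedding only the title, and a model reply that contains `score`, `summary`, and `category` keys. `OLLAMA_HOST` is read from the environment and assumed to include the scheme. The `post` parameter lets you swap in a fake for testing.

```python
import json
import os
import urllib.request

# Assumed to include the scheme, e.g. "http://ollama:11434".
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")


def ollama_post(path, payload):
    """POST a JSON payload to the Ollama HTTP API and return the decoded reply."""
    req = urllib.request.Request(
        OLLAMA_HOST.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def score_article(prompt, model, post=ollama_post):
    """Ask the model for {"score", "summary", "category"} as a JSON object."""
    reply = post("/api/generate", {
        "model": model,
        "prompt": prompt,
        "format": "json",  # constrain the model's output to valid JSON
        "stream": False,   # one response object instead of a token stream
    })
    result = json.loads(reply["response"])
    # Clamp in case the model drifts outside 0-10.
    result["score"] = max(0.0, min(10.0, float(result["score"])))
    return result


def embed(text, model, post=ollama_post):
    """Return one embedding vector from /api/embed."""
    return post("/api/embed", {"model": model, "input": text})["embeddings"][0]


def process(article, prompt, model, embed_model, carried=False, post=ollama_post):
    """Fresh articles are scored and embedded; carried ones keep their score."""
    if not carried:
        article.update(score_article(prompt, model, post))
    article["embedding"] = embed(article["title"], embed_model, post)
    return article
```

Injecting `post` keeps the network out of unit tests. It also makes the carried-story shortcut easy to see: those stories never hit `/api/generate`.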

4. Cluster same-story coverage

Articles about the same story from different outlets are grouped when the cosine similarity of their embeddings exceeds your merge threshold. Each story is then boosted by the number of independent sources covering it.
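This page doesn't spell out Cruxwire's exact grouping rule. The sketch uses single-link clustering with a union-find, so any pair above `threshold` ends up in the same group. `cluster`, `n_sources`, and the `embedding`/`source` keys are hypothetical. The boost counts distinct sources rather than articles, so repeated posts from one outlet don't inflate it.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(articles, threshold):
    """Single-link clustering: any pair above `threshold` ends up in one group."""
    parent = list(range(len(articles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if cosine(articles[i]["embedding"], articles[j]["embedding"]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, article in enumerate(articles):
        groups.setdefault(find(i), []).append(article)
    return list(groups.values())


def n_sources(group):
    """Independent outlets in a group: two posts from one source count once."""
    return len({article["source"] for article in group})
```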

5. Retain within a band

The pool is pruned to a rank-weighted keep set held inside a floor/ceiling band, so the inbox never goes dry on a slow weekend or floods on a busy news day.
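Here is one reading of "rank-weighted keep set inside a floor/ceiling band", sketched with hypothetical names. The sketch keeps everything ranked at or above `min_rank`, then clamps the count so it never drops below `floor` or exceeds `ceiling`. Cruxwire's actual retention weighting may be more nuanced.

```python
def retain(stories, min_rank, floor, ceiling):
    """Keep stories ranked at or above min_rank, clamped to [floor, ceiling] items."""
    ranked = sorted(stories, key=lambda s: s["rank"], reverse=True)
    keep = sum(1 for s in ranked if s["rank"] >= min_rank)
    # Slow weekend: top up to the floor. Busy news day: cut at the ceiling.
    keep = max(floor, min(ceiling, keep))
    return ranked[:keep]
```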

6. Write the digest

The ranked, de-duplicated result is written atomically to digest.json. The frontend picks it up on its next load, with no restart and no downtime.
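This page doesn't show Cruxwire's writer. The standard way to get an atomic file swap in Python is to write to a temporary file in the same directory and `os.replace` it over the target. A reader then sees either the old digest or the new one, never a half-written file. `write_atomic` is a hypothetical name.

```python
import json
import os
import tempfile


def write_atomic(path, data):
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave temp files behind on failure
        raise
```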

Ranking

How a story rises to the top

A story's position blends three signals: the LLM's relevance score against your category interest descriptions, a cross-source boost for how widely it's covered, and your learned affinity for its source.

The cross-source boost is min(cap, k · log₂ N), where N is the number of sources covering the story. Because it grows only logarithmically and is capped, a genuinely big story climbs without letting one noisy topic swamp the page.

ranking signal

from math import log2

# relevance from your local model (0–10)
score     = llm_relevance(article, category.interest)

# bounded boost for multi-source coverage
boost     = min(BOOST_CAP, BOOST_K * log2(n_sources))

# learned per-source affinity (0.5×–2.0×)
affinity  = source_affinity(article.source)

rank = (score + boost) * affinity
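Below is a runnable version of the formula with illustrative constants. `BOOST_CAP = 3.0` and `BOOST_K = 1.0` are placeholders, not Cruxwire's defaults. The hypothetical `rank` function takes the relevance score and affinity as plain numbers instead of calling the model.

```python
import math

# Illustrative values only; the real BOOST_CAP and BOOST_K may differ.
BOOST_CAP = 3.0
BOOST_K = 1.0


def rank(score, n_sources, affinity):
    """(relevance + bounded multi-source boost) scaled by source affinity."""
    boost = min(BOOST_CAP, BOOST_K * math.log2(n_sources))  # n_sources >= 1
    affinity = max(0.5, min(2.0, affinity))  # stay inside the 0.5x-2.0x band
    return (score + boost) * affinity


print(rank(7, 4, 1.0))   # 9.0: log2(4) = 2 points of boost
print(rank(7, 32, 1.0))  # 10.0: log2(32) = 5, capped at 3
print(rank(8, 1, 0.5))   # 4.0: one source earns no boost; low affinity halves it
```

Because log₂ 1 = 0, a single-source story gets no boost at all. Going from 4 to 32 sources adds only one more point here, because the cap kicks in.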

Personalization

It learns quietly, in the background

Two learned signals shape your feed. A per-source affinity multiplier moves with your opens, saves, and dismisses. An embedding-based "taste" vector boosts stories similar to ones you engage with and sinks ones near things you dismiss.

Both are learned automatically and stored per device. Inbox hygiene never wipes them: your learned preferences persist independently of which stories are currently in the digest.
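The sketch below covers both signals. The nudge factors in `NUDGE`, the moving-average weight `alpha`, and all function names are hypothetical. Only the 0.5×–2.0× affinity band comes from the ranking section. The taste score is the cosine similarity to an average of engaged embeddings, minus the similarity to an average of dismissed ones.

```python
import math

# Hypothetical per-event multipliers; the real weights aren't documented here.
NUDGE = {"open": 1.05, "save": 1.15, "dismiss": 0.90}


def update_affinity(current, event):
    """Multiplicative nudge, clamped to the 0.5x-2.0x affinity band."""
    return max(0.5, min(2.0, current * NUDGE[event]))


def update_taste(vec, embedding, alpha=0.1):
    """Exponential moving average of embeddings for one kind of interaction."""
    if vec is None:
        return list(embedding)
    return [(1 - alpha) * v + alpha * e for v, e in zip(vec, embedding)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def taste_boost(embedding, liked, dismissed):
    """Positive near stories you engage with, negative near ones you dismiss."""
    up = cosine(embedding, liked) if liked else 0.0
    down = cosine(embedding, dismissed) if dismissed else 0.0
    return up - down
```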

Example categories (Settings → Categories):

AI (ai): AI / machine learning, models, research, tools, industry news

Dev Tools (devtools): Developer tools, editors, CLIs, platforms, DevOps, cloud-native

Productivity (productivity): Productivity / PKM, Obsidian, PARA, note-taking, workflows

Each category's interest sentence feeds the scoring prompt.
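Here is a sketch of how the three example categories above might be folded into a single scoring prompt. The prompt wording, the JSON reply shape, and `build_prompt` are hypothetical. Only the category keys and interest sentences come from the example.

```python
CATEGORIES = {
    "ai": "AI / machine learning, models, research, tools, industry news",
    "devtools": "Developer tools, editors, CLIs, platforms, DevOps, cloud-native",
    "productivity": "Productivity / PKM, Obsidian, PARA, note-taking, workflows",
}


def build_prompt(title, body, categories=CATEGORIES):
    """Fold every category's interest sentence into one scoring prompt."""
    interests = "\n".join(f"- {key}: {desc}" for key, desc in categories.items())
    reply_shape = (
        '{"score": <0-10>, "summary": "<1-2 sentences>", '
        '"category": "<one of ' + ", ".join(categories) + '>"}'
    )
    return (
        "Rate how relevant this article is to the reader's interests (0-10).\n"
        f"Interests:\n{interests}\n\n"
        f"Title: {title}\n{body}\n\n"
        f"Reply with JSON only: {reply_shape}"
    )
```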

State lives in JSON, on a volume

Read state, Read Later, History, and learned source stats persist to state.json on the cruxwire-data volume. The app code is baked into the image, so rebuilds preserve your data and new settings keys fall back to defaults.
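The "new keys fall back to defaults" behavior amounts to layering saved settings over built-in defaults. Here is a minimal sketch. `load_settings` is a hypothetical name, and apart from `lookback_hours`, the sample keys are hypothetical too.

```python
import json
import os
import tempfile


def load_settings(path, defaults):
    """Saved values win; keys added in a newer image fall back to defaults."""
    try:
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
    except FileNotFoundError:
        saved = {}  # first run: nothing saved yet
    return {**defaults, **saved}


if __name__ == "__main__":
    # Simulate a settings.json written by an older version that lacks a newer key.
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "settings.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"lookback_hours": 24}, f)
        print(load_settings(path, {"lookback_hours": 48, "new_key": True}))
        # {'lookback_hours': 24, 'new_key': True}
```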

Read state syncs across devices

Read / Read Later / History sync through the server, so what you've cleared on your laptop is cleared on your phone. The app is fully usable offline from local cache, then reconciles when it reconnects.
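This page doesn't describe the sync protocol. One simple approach is last-writer-wins per story, sketched here under the assumption that every status change carries a timestamp. `reconcile` and the `(status, updated_at)` tuples are hypothetical.

```python
def reconcile(local, remote):
    """Merge two {story_id: (status, updated_at)} maps; the newest change per story wins."""
    merged = dict(remote)
    for story_id, (status, ts) in local.items():
        if story_id not in merged or ts > merged[story_id][1]:
            merged[story_id] = (status, ts)
    return merged
```

Ties go to the remote (server) copy. Last-writer-wins depends on reasonably synced clocks, so a device with a skewed clock can override newer changes.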

The architecture

Everything ships in one image. The only things outside the container are your Ollama endpoint and a local volume for state.

single container
│
├─ server.py    HTTP: UI, /digest.json, /state, /settings,
│               /feeds, /status, /refresh
│
├─ pipeline.py  background scheduler (cron-like):
│                 fetch feeds → carry forward unread → score + embed
│                 (Ollama) → cluster → retain → write digest.json
│
└─ /data volume → state.json, feeds.json, digest.json, settings.json
        │
        └─ HTTP → Ollama (OLLAMA_HOST)

server.py

The HTTP server: serves the single-file UI, exposes the state / settings / feeds API, serves the current digest, and prunes state server-side.
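Here is a stripped-down sketch of the read-only side of such a server, using only the standard library's `http.server`. It handles just `/digest.json` and `/status`. The `/status` payload, the 503 response, and the names `route`, `Handler`, and `serve` are hypothetical. The UI, `/state`, `/settings`, `/feeds`, and `/refresh` are omitted.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DATA_DIR = "/data"  # the state volume from the diagram above


def route(path, data_dir=DATA_DIR):
    """Map a GET path to (status, content_type, body)."""
    digest = os.path.join(data_dir, "digest.json")
    if path == "/digest.json":
        try:
            with open(digest, "rb") as f:
                return 200, "application/json", f.read()
        except FileNotFoundError:
            return 503, "application/json", b'{"error": "no digest yet"}'
    if path == "/status":
        body = json.dumps({"digest_ready": os.path.exists(digest)}).encode()
        return 200, "application/json", body
    return 404, "text/plain", b"not found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, ctype, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocking server loop; the port is a placeholder."""
    ThreadingHTTPServer(("", port), Handler).serve_forever()
```

Serving the digest straight from disk pairs naturally with an atomic write: each request reads whichever complete file is current.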

pipeline.py

The ingestion pipeline and scheduler. Fetches, carries forward, scores, clusters, and retains, then atomically writes the new digest.
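The cron-like schedule (every two hours between 06:00 and 22:00, plus once on start) can be sketched as a next-slot calculation. Several details are assumptions: 22:00 is the last slot, times are naive local time (so DST shifts are ignored), and the names `next_run` and `run_forever` are hypothetical.

```python
import time
from datetime import datetime, timedelta


def next_run(now, start_hour=6, end_hour=22, every_hours=2):
    """Next slot on the every-N-hours grid from start_hour to end_hour, inclusive."""
    for hour in range(start_hour, end_hour + 1, every_hours):
        slot = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if slot > now:
            return slot
    # Past today's last slot: the first slot tomorrow.
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=start_hour, minute=0, second=0, microsecond=0)


def run_forever(run_pipeline):
    """Run once on start, then sleep until each scheduled slot."""
    run_pipeline()
    while True:
        now = datetime.now()
        time.sleep(max(0.0, (next_run(now) - now).total_seconds()))
        run_pipeline()
```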

digest.html

A vanilla-JS single-file frontend: Home, Read Later, History, and Settings. Applies your per-device affinity multiplier when ordering.

See the knobs for yourself

Every threshold, schedule, and retention band is documented and editable. The setup guide walks through getting it running and tuning it to your reading.


One container. A local model. A pipeline that runs itself.