Tayf: real-time Turkish news media-bias analyser
"Ground News for Turkey." Ingests 144 outlets across the political spectrum, clusters the same story with a 3-method ensemble, and shows who is framing it which way. "Aynı haber, farklı dünyalar." ("Same news, different worlds.")
00 Overview
Turkish media is heavily polarised: depending on which outlet you open, the same event arrives pre-framed. Tayf is a "Ground News for Turkey": it ingests RSS from 144 outlets spanning pro-government, opposition, nationalist, Kurdish, Islamist-conservative, state-media and international sources, clusters articles describing the same story, and renders the spread so a reader sees every frame at once.
Co-founded January 2026 (AI SaaS, Istanbul/remote). The rarest signal at new-grad level: a real, production-grade shipped product, averaging ~2,200 monthly site visits as of June 2026. Early stage, stated plainly. Alongside clustering, the pipeline runs news sentiment analysis and who-said-what views across bias zones. Same Next.js 16 / React 19 / Supabase / Vercel stack as this portfolio.
01 The problem
Readers see one frame at a time. A protest is "chaos" in one outlet and "a democratic awakening" in another; a court ruling is "justice served" or "a political purge." Without seeing the distribution, you cannot tell whether you are reading the consensus, a blindspot, or an outlier. Tayf's job is to make that distribution glanceable.
02 Live demo: bias distribution
Pick a sample clustered story. The bias-distribution bar rolls 10 source-level categories up into 3 Medya DNA zones. This is the component that leads every card on Tayf: a distribution, not a logo grid.
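The roll-up behind the bar can be sketched as a small counting function. This is a minimal sketch, not Tayf's actual code: the three zone labels, the category names in the example mapping, and the function name `zoneDistribution` are all assumptions, since the text above only states that 10 categories collapse into 3 zones.

```typescript
// Zone labels are hypothetical: the source says only that 10 source-level
// categories roll up into 3 Medya DNA zones.
type Zone = "pro_government" | "independent" | "opposition";
type ZoneShares = Record<Zone, number>;

// Roll per-article source categories up into zone shares that sum to 1.
function zoneDistribution(
  articleCategories: string[],
  zoneOf: Map<string, Zone>,
): ZoneShares {
  const counts: ZoneShares = { pro_government: 0, independent: 0, opposition: 0 };
  let total = 0;
  for (const category of articleCategories) {
    const zone = zoneOf.get(category);
    if (zone === undefined) continue; // unmapped category: leave it out of the bar
    counts[zone] += 1;
    total += 1;
  }
  if (total === 0) return counts; // empty cluster: all shares stay 0
  return {
    pro_government: counts.pro_government / total,
    independent: counts.independent / total,
    opposition: counts.opposition / total,
  };
}

// Illustrative mapping only; the real category list is not given here.
const exampleZones = new Map<string, Zone>([
  ["state_media", "pro_government"],
  ["pro_government", "pro_government"],
  ["international", "independent"],
  ["opposition", "opposition"],
]);

const exampleShares = zoneDistribution(
  ["state_media", "pro_government", "opposition", "international"],
  exampleZones,
);
```

Passing the mapping in as data keeps the editorial decision (which category belongs to which zone) out of the rendering code.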
03 How it works: event-driven worker stream
No long-running workers. Everything is event-driven: ingestion is a cron-triggered Edge Function, and a Postgres AFTER INSERT trigger fans new articles into pgmq queues that serverless consumers drain.
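The fan-out step can be modelled in a few lines. In Tayf this decision lives in a Postgres trigger; the TypeScript below only models the routing so it can be read in isolation. The queue names (`cluster`, `sentiment`, `image`) and the row fields are hypothetical.

```typescript
// Shape of a newly inserted article row (field names are assumptions).
interface ArticleRow {
  id: number;
  title: string;
  imageUrl: string | null;
}

interface QueueMessage {
  queue: string; // hypothetical queue name
  payload: { articleId: number };
}

// Models what the AFTER INSERT trigger decides: which queues get a message.
function fanOut(article: ArticleRow): QueueMessage[] {
  const payload = { articleId: article.id };
  const messages: QueueMessage[] = [
    { queue: "cluster", payload },
    { queue: "sentiment", payload },
  ];
  // Only enqueue image work when the article actually has an image.
  if (article.imageUrl !== null) {
    messages.push({ queue: "image", payload });
  }
  return messages;
}
```

Messages carry only the article id, so each consumer re-reads the row and always sees its current state.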
04 Clustering: a 3-method ensemble
Matching the same story across outlets that deliberately word it differently is hard. No single method is robust, so Tayf votes across three:
- Turkish character 4-gram fingerprint: catches near-duplicate phrasings and wire copy, and copes with Turkish morphology.
- TF-IDF cosine over a 48h window: lexical overlap of weighted vocabulary, time-boxed so unrelated old stories don't collide.
- Entity-overlap heuristic: shared named entities (people, places, institutions) confirm two articles are about the same event.
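The two text-similarity signals in the list above can be sketched as follows. This is an illustration under stated assumptions, not Tayf's implementation: Jaccard overlap is assumed as the way to compare 4-gram fingerprints, the IDF uses a common smoothed form, and every function name is hypothetical. The Turkish-locale lowercasing matters because `I` and `İ` lowercase differently in Turkish than in English.

```typescript
// Lowercase with Turkish casing rules (I -> ı, İ -> i), keep letters and digits.
function normalize(text: string): string {
  return text
    .toLocaleLowerCase("tr-TR")
    .replace(/[^a-z0-9çğıöşüâîû]+/g, " ")
    .trim();
}

function tokenize(text: string): string[] {
  return normalize(text).split(" ").filter((t) => t.length > 0);
}

// Set of overlapping character n-grams (n = 4 by default).
function charNgrams(text: string, n = 4): Set<string> {
  const s = normalize(text);
  const grams = new Set<string>();
  for (let i = 0; i + n <= s.length; i++) grams.add(s.slice(i, i + n));
  return grams;
}

// Jaccard similarity of two n-gram sets (assumed comparison measure).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((g) => { if (b.has(g)) shared += 1; });
  return shared / (a.size + b.size - shared);
}

type Vector = Map<string, number>;

// TF-IDF vectors with smoothed IDF: ln((1 + N) / (1 + df)) + 1.
function tfidfVectors(docs: string[][]): Vector[] {
  const df = new Map<string, number>();
  for (const tokens of docs) {
    new Set(tokens).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));
  }
  return docs.map((tokens) => {
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    const vec: Vector = new Map();
    tf.forEach((count, t) => {
      const idf = Math.log((1 + docs.length) / (1 + (df.get(t) ?? 0))) + 1;
      vec.set(t, count * idf);
    });
    return vec;
  });
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, t) => { dot += w * (b.get(t) ?? 0); normA += w * w; });
  b.forEach((w) => { normB += w * w; });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Only compare articles published within the window (48 hours per the text).
function withinWindow(aMs: number, bMs: number, windowHours = 48): boolean {
  return Math.abs(aMs - bMs) <= windowHours * 3600 * 1000;
}
```

Character n-grams tolerate Turkish suffixation because two inflected forms of a word still share most of their 4-grams, whereas a word-level tokenizer treats them as unrelated terms.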
An ensemble beats any single method: character 4-grams catch lightly edited wire copy and inflected word forms that word-level TF-IDF misses, TF-IDF catches paraphrases that share key vocabulary but little surface text, and entity overlap vetoes false positives where vocabulary coincides but the subject differs.
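One way to combine the three signals is a vote in which entity overlap is required and at least one text method must agree. The rule below is an assumption about how the vote and veto fit together, and the thresholds are placeholder values, not Tayf's tuned numbers.

```typescript
// Count named entities shared by two articles, compared case-insensitively
// with Turkish casing rules.
function sharedEntityCount(a: string[], b: string[]): number {
  const seen = new Set(a.map((e) => e.toLocaleLowerCase("tr-TR")));
  const shared = new Set<string>();
  for (const e of b) {
    const key = e.toLocaleLowerCase("tr-TR");
    if (seen.has(key)) shared.add(key);
  }
  return shared.size;
}

interface PairSignals {
  ngramJaccard: number; // 4-gram fingerprint similarity, 0..1
  tfidfCosine: number; // TF-IDF cosine similarity, 0..1
  sharedEntities: number; // count of shared named entities
}

interface Thresholds {
  ngram: number;
  tfidf: number;
  minSharedEntities: number;
}

// Hypothetical values chosen for illustration only.
const DEFAULT_THRESHOLDS: Thresholds = { ngram: 0.5, tfidf: 0.4, minSharedEntities: 1 };

// Two articles are the same story when the entity check passes (otherwise
// it vetoes) and at least 2 of the 3 methods vote yes.
function sameStory(s: PairSignals, t: Thresholds = DEFAULT_THRESHOLDS): boolean {
  const entityVote = s.sharedEntities >= t.minSharedEntities;
  if (!entityVote) return false; // veto: similar vocabulary, different subject
  const votes =
    1 +
    (s.ngramJaccard >= t.ngram ? 1 : 0) +
    (s.tfidfCosine >= t.tfidf ? 1 : 0);
  return votes >= 2;
}
```

With the veto in place the rule reduces to "entities match and either text method fires", so two articles with heavy vocabulary overlap but no shared entity are never merged.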
05 Signature features
Bias-distribution bar
Every cluster leads with the spread of coverage: a distribution, never a logo grid.
Blindspot detection
Flags stories covered by only one side of the spectrum: what your usual feed hides.
Cross-spectrum surprise
Catches an outlet framing a story against its own bias: the most informative signal.
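The blindspot and surprise checks above can be sketched as two small predicates. Both are illustrations under loud assumptions: the zone labels are hypothetical, the minimum cluster size is a placeholder, and the surprise check presumes an upstream step that scores each outlet's usual lean and each article's framing on the same -1 to 1 axis, which the text does not describe.

```typescript
// Hypothetical zone labels; the source does not name the 3 zones.
type Zone = "pro_government" | "independent" | "opposition";
type ZoneCounts = Record<Zone, number>;

// Blindspot: every article in the cluster comes from a single zone.
// Returns that zone, or null when coverage is mixed or the cluster is too small.
function blindspotZone(counts: ZoneCounts, minArticles = 3): Zone | null {
  const zones = Object.keys(counts) as Zone[];
  const total = zones.reduce((sum, z) => sum + counts[z], 0);
  if (total < minArticles) return null; // too few articles to call it a blindspot
  const covering = zones.filter((z) => counts[z] > 0);
  return covering.length === 1 ? covering[0] : null;
}

// Surprise: the article's framing points the opposite way from the outlet's
// usual lean, and both are strong enough to matter.
// outletLean and articleFraming are assumed scores in [-1, 1] on one axis.
function isCrossSpectrumSurprise(
  outletLean: number,
  articleFraming: number,
  minMagnitude = 0.3,
): boolean {
  const opposed = outletLean * articleFraming < 0;
  return (
    opposed &&
    Math.abs(outletLean) >= minMagnitude &&
    Math.abs(articleFraming) >= minMagnitude
  );
}
```

The magnitude floor keeps near-neutral outlets and near-neutral articles from producing spurious surprises when a score merely crosses zero.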
06 Stack & engineering decisions
The core decision was no long-running workers. On a serverless platform, a daemon polling queues burns money and complicates deploys. Instead, work is triggered by data: a Postgres AFTER INSERT trigger enqueues into pgmq, and Edge Functions drain queues on cron ticks. The system scales to zero between ticks and has no process to babysit.
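A consumer that fits this model does a bounded amount of work per tick and then exits. The sketch below is written against a hypothetical `Queue` interface loosely modelled on pgmq's read and delete operations (a read hides messages for a visibility timeout, a delete acknowledges them); the interface, the function name `drainOnTick`, and the batch limits are assumptions rather than Tayf's code.

```typescript
interface Message<T> {
  id: number;
  body: T;
}

// Hypothetical queue contract: read() hides the returned messages for a
// visibility timeout, delete() acknowledges one message for good.
interface Queue<T> {
  read(batchSize: number): Promise<Message<T>[]>;
  delete(id: number): Promise<void>;
}

// Drain up to maxBatches batches, then return so the function can exit.
async function drainOnTick<T>(
  queue: Queue<T>,
  handle: (body: T) => Promise<void>,
  batchSize = 10,
  maxBatches = 5,
): Promise<number> {
  let processed = 0;
  for (let i = 0; i < maxBatches; i++) {
    const batch = await queue.read(batchSize);
    if (batch.length === 0) break; // queue is empty: nothing left for this tick
    for (const message of batch) {
      try {
        await handle(message.body);
        await queue.delete(message.id); // acknowledge only after success
        processed += 1;
      } catch {
        // Leave the message unacknowledged; it becomes visible again
        // after the timeout and a later tick retries it.
      }
    }
  }
  return processed;
}
```

Capping the batches per tick keeps each invocation inside the platform's execution limit, and acknowledging only after success means a crashed invocation loses no work.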