How I Built a Swedish Crossword Solver with Astro and 400,000+ Words
I recently launched Korsordsakuten — a free Swedish crossword solver — and learned a lot about building SEO-driven content sites with Astro. Here's what I built and what I learned along the way.
What It Does
The site lets you:
- Search 400,000+ Swedish word forms by clue, pattern, or length (a pattern-matching sketch follows this list)
- Filter answers by letter count (e.g. "give me 6-letter synonyms only")
- Browse prefix/suffix indexes (words starting with SK, ending with ERA, etc.)
- Solve anagrams
- Play a daily Wordle-style word game
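To give a flavor of the pattern search (an illustrative sketch, not the production code): a pattern like S??RA, where ? stands for one unknown letter, maps directly onto a regular expression over the word list.

// Illustrative sketch: "?" is a single-letter wildcard, everything else is literal.
function matchPattern(words, pattern) {
  const re = new RegExp('^' + pattern.toUpperCase().replaceAll('?', '.') + '$');
  return words.filter(w => re.test(w));
}

matchPattern(['STYRA', 'SKÖRA', 'STORA', 'SPARKA'], 'S??RA');
// → ['STYRA', 'SKÖRA', 'STORA']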
The Stack
- Astro 6 (SSR mode, Node adapter) — perfect for content-heavy sites. Each route is server-rendered but the build is still fast.
- Node.js 22 on Render (free tier)
- GitHub Pages as a static sitemap mirror — more on this below
- Zero client-side JS frameworks. Just vanilla JS where needed.
The Data Pipeline
The word database is built from public Swedish word lists, processed through a Node.js pipeline:
scripts/
build-worddb.mjs # Build words.json, synonyms.json, related.json
build-phrases.mjs # Process multi-word crossword entries
build-sitemaps.mjs # Generate 278 sitemap files × 1000 URLs each
publish-sitemaps.mjs # Push sitemaps to GitHub Pages mirror
The synonym/related data is derived from Swedish lexical resources, giving each word a list of crossword-appropriate answers.
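To make the pipeline concrete, here is a hedged sketch of what the synonym-grouping step in build-worddb.mjs might look like. The input path and tab-separated format are assumptions, not the actual lexical resource.

// Hypothetical sketch of the synonym-grouping step (paths and format assumed).
import { readFileSync, writeFileSync } from 'node:fs';

const pairs = readFileSync('data/raw/synonym-pairs.tsv', 'utf-8')
  .trim()
  .split('\n')
  .map(line => line.split('\t')); // [word, synonym]

const synonyms = {};
for (const [word, syn] of pairs) {
  (synonyms[word.toUpperCase()] ??= []).push(syn.toUpperCase());
}

writeFileSync('data/synonyms.json', JSON.stringify(synonyms));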
The Trickiest Part: Memory on a Free Tier
The site has some large JSON files:
- clue-index.json — 18 MB
- related.json — 8.5 MB
- synonyms.json — 6.75 MB
Loading all of these at module startup caused OOM crashes on Render's 256 MB free tier. The fix: lazy loading via getter functions.
// Before: the static JSON import is parsed at module load and crashes on startup
import wordsData from '../data/words.json';

// After: drop the static import; read and parse only on first access
import { readFileSync } from 'node:fs';

let _words: string[] | null = null;
export function getWords(): string[] {
  if (!_words) {
    _words = JSON.parse(
      readFileSync(new URL('../data/words.json', import.meta.url), 'utf-8')
    ) as string[];
  }
  return _words;
}
I also added --max-old-space-size=460 to the start script; Render tolerates going slightly over the nominal 256 MB before it kills the process.
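For reference, the flag sits on the Node invocation in the start script. With the Astro Node adapter that looks roughly like this (the entry path depends on your adapter config):

{
  "scripts": {
    "start": "node --max-old-space-size=460 ./dist/server/entry.mjs"
  }
}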
The Sitemap Problem
With 227,000+ URLs, Google wasn't reading the sitemaps. Two issues:
- The files were too large. 45,000 URLs × ~120 bytes ≈ 5.4 MB per file, while Google has an unofficial ~1 MB soft limit. Fixed by reducing to 1,000 URLs per file (278 files); the chunking is sketched right after this list.
- The host kept going down. Render's free tier sleeps on inactivity, and while the site is asleep Google can't fetch the sitemap and backs off for weeks.
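The chunking itself is mechanical. A minimal sketch of what build-sitemaps.mjs does, assuming a urls array and an output directory (both illustrative):

// Illustrative: split the URL list into 1,000-URL sitemap files.
import { writeFileSync } from 'node:fs';

const CHUNK = 1000;
const urls = []; // in reality: the 227,000+ generated URLs
for (let i = 0; i * CHUNK < urls.length; i++) {
  const chunk = urls.slice(i * CHUNK, (i + 1) * CHUNK);
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    chunk.map(u => `  <url><loc>${u}</loc></url>`).join('\n') +
    '\n</urlset>';
  writeFileSync(`dist/client/sitemap-${i}.xml`, xml);
}

At ~120 bytes per URL, each file lands around 120 KB, comfortably under the soft limit.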
The fix for the downtime problem: push the sitemaps to GitHub Pages as a static mirror. A simple script clones the mirror repo, copies the XML files, rewrites the sitemap index URLs to point at the mirror, and pushes:
// publish-sitemaps.mjs (simplified)
import { readdirSync, readFileSync, writeFileSync, copyFileSync } from 'node:fs';
import { join } from 'node:path';

const SRC_DIR = 'dist/client';     // where the built sitemap XML lives (illustrative)
const WORK_DIR = 'sitemap-mirror'; // local clone of the GitHub Pages repo

const files = readdirSync(SRC_DIR).filter(f => /^sitemap.*\.xml$/.test(f));
for (const f of files) {
  if (f === 'sitemap.xml') {
    // Rewrite the index so sub-sitemap URLs point to the mirror
    const content = readFileSync(join(SRC_DIR, f), 'utf-8');
    const rewritten = content.replace(
      /https:\/\/www\.korsordsakuten\.se\/(sitemap-\d+\.xml)/g,
      'https://sitemaps.korsordsakuten.se/$1'
    );
    writeFileSync(join(WORK_DIR, f), rewritten);
  } else {
    copyFileSync(join(SRC_DIR, f), join(WORK_DIR, f));
  }
}
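The clone-and-push steps around that loop are plain git calls. A hedged sketch (the repo URL is a placeholder, and a real script should skip the commit when nothing changed):

// Around the copy loop: clone the mirror, then commit and push the fresh files.
import { execSync } from 'node:child_process';

execSync(`git clone --depth 1 git@github.com:<user>/sitemap-mirror.git ${WORK_DIR}`);
// ... copy/rewrite loop from above ...
execSync('git add -A && git commit -m "Update sitemaps" && git push', { cwd: WORK_DIR });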
A custom subdomain (sitemaps.korsordsakuten.se → GitHub Pages via CNAME) makes the URLs clean. Now Google can always fetch sitemaps even if the main site is sleeping.
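For completeness, the setup is one DNS record pointing the subdomain at Pages (shown zone-file style; the GitHub user is a placeholder):

; DNS record, zone-file style (illustrative)
sitemaps  IN  CNAME  <user>.github.io.

GitHub Pages then needs the custom domain configured in the repo settings, which stores a CNAME file containing sitemaps.korsordsakuten.se in the repo root.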
Structured Data That Actually Helps
For clue pages (/korsord/[word]), I added QAPage schema instead of just FAQPage:
{
  "@context": "https://schema.org",
  "@type": "QAPage",
  "mainEntity": {
    "@type": "Question",
    "name": "avslutningsvis — korsordssvar",
    "answerCount": 8,
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "SLUTLIGEN (9 bokstäver)",
      "upvoteCount": 8
    },
    "suggestedAnswer": []
  }
}
This signals to Google that each clue page is structured Q&A content — similar to how Q&A forum sites are interpreted — rather than auto-generated thin content.
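In Astro, JSON-LD like this can be emitted from the page's frontmatter with the set:html directive. A minimal sketch for the clue page (the answer lookup is illustrative):

---
// src/pages/korsord/[word].astro (sketch)
const word = Astro.params.word ?? '';
const answers = ['SLUTLIGEN']; // in reality: looked up from the clue index
const schema = {
  '@context': 'https://schema.org',
  '@type': 'QAPage',
  mainEntity: {
    '@type': 'Question',
    name: `${word} — korsordssvar`,
    answerCount: answers.length,
    acceptedAnswer: {
      '@type': 'Answer',
      text: `${answers[0]} (${answers[0].length} bokstäver)`,
      upvoteCount: answers.length,
    },
    suggestedAnswer: [],
  },
};
---
<script type="application/ld+json" set:html={JSON.stringify(schema)} />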
For word pages, DefinedTerm with synonyms as alternateName marks the page as a lexical resource:
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "PLATS",
  "alternateName": ["STÄLLE", "POSITION", "LÄGE"],
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "Korsordsakuten ordlista"
  }
}
Daily Fresh Content
One thing competitor sites have that purely static sites lack: freshness signals. Google re-crawls active sites more frequently.
Solution: /dagens-ledtradar — a page showing 24 curated crossword clues that rotates every day using a deterministic seed:
const epoch = Date.UTC(2025, 0, 1); // reference date (illustrative)
const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13); // 7919 is just a large prime
// Pick 24 unique entries from the top-2000 clues
Same date, same picks (so the page is cacheable), but crawling the URL the next day returns different content. That triggers Google's freshness heuristic without any database or cron job.
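Filling in the pieces the snippet assumes: mulberry32 is the standard tiny 32-bit PRNG, and the unique picks can come from a seeded partial Fisher-Yates shuffle (the clue list shape is illustrative).

// mulberry32: tiny deterministic PRNG; same seed, same sequence.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Partial Fisher-Yates: deterministically pick `count` unique entries.
function dailyPicks(clues, rng, count = 24) {
  const pool = clues.slice(0, 2000); // top-2000 clues
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

Seeding with dayNum * 7919 + 13, as above, gives every calendar day its own stable shuffle.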
What I'd Do Differently
- Start with a paid host. Render free tier sleeping + OOM issues cost weeks of SEO recovery time.
- Plan for JSON size early. Lazy loading was the fix but the problem was predictable.
- Submit 10 hand-picked URLs to GSC from day one. Don't wait for the sitemap crawler to discover everything.
Links
- Site: korsordsakuten.se
- Daily clues: /dagens-ledtradar
Happy to answer questions about any part of the stack!