Why I wrote this: After maintaining aijsons.com for 18 months, I learned that static sites fail quietly. A broken sitemap returns 500. A blog index lists 105 articles while only 28 exist. A redirect glob pattern silently fails to match. None of these throw errors during git push — they only surface when Google crawlers or AdSense reviewers hit dead links. We previously ran a GitHub Actions workflow that auto-generated blog posts daily; disabling it was the right call, but it left a gap: we still needed automated quality gates before deploy. This article documents the CI checks I wish we had from day one, and the workflow structure we use now.
What "CI" Means for a Static Site
There is no compile step for aijsons.com — no TypeScript build, no test suite in the traditional sense. GitHub Pages serves HTML files directly from the main branch. CI for this stack is not about catching type errors; it is about catching structural drift between files that should stay in sync:
- sitemap-blog.xml URLs must match directories under blog/
- blog/index.html links must not point to deleted articles
- _redirects rules must not exceed Cloudflare's 100-rule limit
- XML sitemaps must parse without errors
These are the failures that caused our AdSense "low value content" review problems, not missing unit tests.
The Workflow We Retired (And Why)
Our repository still contains a disabled workflow at .github/workflows/daily-content.yml.disabled. It ran nightly, called an OpenAI API to generate blog posts, and auto-committed them:
```yaml
on:
  schedule:
    - cron: "15 2 * * *"
  workflow_dispatch:

jobs:
  publish-content:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python automation/content_automation.py
      - run: git commit -m "chore: daily blog/news automation" && git push
```
This produced ~97 JSON-themed articles in three months — nearly identical structure, titles ending in "in 2026", one per day. Google classified the site as programmatic SEO. We disabled the workflow in May 2026 and pruned the blog to 28 curated articles. The lesson: automate quality checks, not content creation.
The CI Workflow We Use Now
Our replacement workflow runs on every push to main and on every pull request targeting main. It performs four checks; on pull requests, they run before anything reaches the deployed site:
```yaml
name: Site Quality Checks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate XML sitemaps
        run: |
          # Refresh the package index so the install does not hit stale mirrors
          sudo apt-get update
          sudo apt-get install -y libxml2-utils
          xmllint --noout sitemap.xml
          xmllint --noout sitemap-pages.xml
          xmllint --noout sitemap-tools.xml
          xmllint --noout sitemap-blog.xml

      - name: Count _redirects rules
        run: |
          # Skip blank lines and comments, including indented ones
          RULES=$(grep -Ev '^[[:space:]]*(#|$)' _redirects | wc -l)
          echo "Redirect rules: $RULES / 100"
          if [ "$RULES" -gt 100 ]; then
            echo "ERROR: Cloudflare Pages allows max 100 redirect rules"
            exit 1
          fi

      - name: Verify blog URLs in sitemap exist on disk
        run: |
          python3 - <<'PY'
          import re, sys, pathlib
          xml = pathlib.Path("sitemap-blog.xml").read_text(encoding="utf-8")
          # Capture the path part of each blog <loc>, e.g. /blog/some-post/
          urls = re.findall(r"<loc>https://www\.aijsons\.com(/blog/[^/]+/)</loc>", xml)
          missing = [u for u in urls if not (pathlib.Path(".") / u.strip("/")).exists()]
          if missing:
              print("Missing blog directories:", *missing, sep="\n  ")
              sys.exit(1)
          print(f"OK: {len(urls)} blog URLs verified")
          PY

      - name: Check ads.txt present
        run: test -f ads.txt && grep -q "google.com" ads.txt
```
None of these steps require a build toolchain. They run in under 30 seconds on GitHub's free tier. The critical insight: each check maps to a real incident we hit in production.
Check 1: Sitemap XML Validation
Our sitemap returned HTTP 500 for weeks because of a malformed XML comment in an auto-generated file. xmllint --noout catches unclosed tags, invalid characters, and encoding issues before deploy. This single line would have saved hours of Search Console debugging.
We split sitemaps into an index plus three sub-files (pages, tools, blog) — see our deployment guide for the rationale. CI validates all four files on every push.
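The xmllint step only checks that each file is well-formed. As a hedged sketch, Python's standard xml.etree.ElementTree can do the same parse and also confirm that every sub-sitemap listed in the index actually exists in the repository. It assumes sitemap.xml is the index, that it uses the standard sitemaps.org namespace, and that the sub-sitemaps sit at the repository root; check_sitemaps is a hypothetical helper name.

```python
import pathlib
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol for <sitemapindex> and <urlset>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemaps(root_dir: str = ".", index_name: str = "sitemap.xml") -> list[str]:
    """Parse the sitemap index and every sub-sitemap it lists.

    Returns human-readable problems; an empty list means everything parsed.
    Assumes (hypothetically) that sub-sitemaps live at the repository root.
    """
    root = pathlib.Path(root_dir)
    try:
        index = ET.parse(root / index_name).getroot()
    except (ET.ParseError, OSError) as exc:
        return [f"{index_name}: {exc}"]

    problems: list[str] = []
    for loc in index.findall("sm:sitemap/sm:loc", NS):
        # https://www.aijsons.com/sitemap-blog.xml -> sitemap-blog.xml
        name = urlparse((loc.text or "").strip()).path.lstrip("/")
        path = root / name
        if not path.is_file():
            problems.append(f"{index_name} references missing file: {name}")
            continue
        try:
            ET.parse(path)
        except ET.ParseError as exc:
            problems.append(f"{name}: {exc}")
    return problems
```

A workflow step can print the returned list and exit non-zero when it is non-empty, which fails the run the same way xmllint does.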
Check 2: Redirect Rule Budget
Cloudflare Pages enforces a hard limit of 100 rules in _redirects. During our content prune, we accumulated ~90 rules merging old blog URLs. Without a counter, we would have silently failed to add new rules — duplicate tool pages would stay indexed alongside their canonical versions.
The CI step prints the current count and fails the build if it exceeds 100, forcing us to remove stale rules before adding new ones.
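The shell step counts lines and nothing more. If you want the budget check to explain why you are near the limit, a Python sketch along these lines also reports source paths that appear more than once and lines that lack a destination. It assumes the whitespace-separated "source destination [status]" layout of a Cloudflare _redirects file; redirect_report and RULE_LIMIT are hypothetical names.

```python
from collections import Counter

# Rule budget used throughout this article.
RULE_LIMIT = 100


def redirect_report(text: str, limit: int = RULE_LIMIT) -> dict:
    """Summarize a _redirects file: rule count, budget status, duplicate
    sources, and lines without both a source and a destination."""
    rules: list[tuple[str, str]] = []
    malformed: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines do not count against the budget.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) < 2:
            malformed.append(stripped)
            continue
        rules.append((parts[0], parts[1]))

    sources = Counter(src for src, _ in rules)
    return {
        "count": len(rules),
        "over_limit": len(rules) > limit,
        "duplicates": sorted(src for src, n in sources.items() if n > 1),
        "malformed": malformed,
    }
```

Duplicated sources are the cheapest rules to reclaim: only one of them can be the rule a visitor actually hits, yet every copy consumes budget.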
Check 3: Sitemap ↔ Filesystem Consistency
This check, together with the extended version described below, targets our worst SEO incident: the blog index listing 105 articles while sitemap-blog.xml contained 28. The Python script extracts every <loc> URL from sitemap-blog.xml and verifies that the corresponding directory exists under blog/.
Real incident: We deleted 70 blog directories in one commit but updated only sitemap-blog.xml, not blog/index.html. Users clicking from the blog page got 404s. Google Search Console reported hundreds of "Not found" crawl errors within 48 hours.
Extended version (recommended): also parse blog/index.html for href="/blog/..." links and verify each target exists. We added this after the incident.
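One way to implement that extended check, sketched with Python's standard html.parser: collect every <a href> that points under /blog/ in blog/index.html and confirm each target resolves to a file on disk. It assumes root-relative links of the form /blog/slug/ served from blog/slug/index.html; absolute https:// links would need an extra normalization step. BlogLinkCollector and missing_blog_targets are hypothetical names.

```python
import pathlib
from html.parser import HTMLParser


class BlogLinkCollector(HTMLParser):
    """Collects href values of <a> tags that point under /blog/."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without "=..."
            if name == "href" and value and value.startswith("/blog/"):
                self.links.append(value)


def missing_blog_targets(index_html: str, site_root: str = ".") -> list[str]:
    """Return /blog/ links from the index whose target is not on disk."""
    collector = BlogLinkCollector()
    collector.feed(index_html)
    root = pathlib.Path(site_root)
    missing: list[str] = []
    for href in collector.links:
        # Drop any #fragment or ?query before mapping the URL to a path.
        rel = href.split("#", 1)[0].split("?", 1)[0].strip("/")
        target = root / rel
        if target.is_dir():
            target = target / "index.html"  # /blog/slug/ -> blog/slug/index.html
        if not target.is_file() and href not in missing:
            missing.append(href)
    return missing
```

In CI you would feed it pathlib.Path("blog/index.html").read_text(encoding="utf-8") and fail the step if the returned list is non-empty.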
Check 4: ads.txt Presence
AdSense requires ads.txt at the domain root with the correct publisher ID. Our file is two lines:
```text
google.com, pub-1902338749101993, DIRECT, f08c47fec0942fa0
```
Accidentally deleting this file during a repo cleanup would break AdSense authorization. A one-line CI check flags that mistake on the very push that introduces it.
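The grep in the workflow only proves that the string google.com appears somewhere in the file, so a mistyped or truncated publisher ID would still pass. A stricter sketch parses the comma-separated ads.txt fields (domain, publisher ID, relationship, optional certification authority ID) and looks for the exact record. has_adsense_record is a hypothetical name, and requiring DIRECT reflects the line above rather than a general AdSense rule.

```python
# Publisher ID taken from the ads.txt line shown above.
EXPECTED_PUB_ID = "pub-1902338749101993"


def has_adsense_record(ads_txt: str, pub_id: str = EXPECTED_PUB_ID) -> bool:
    """True if ads.txt contains a 'google.com, <pub_id>, DIRECT' record."""
    for line in ads_txt.splitlines():
        # Everything after '#' is a comment in ads.txt.
        record = line.split("#", 1)[0].strip()
        if not record:
            continue
        fields = [field.strip() for field in record.split(",")]
        if (
            len(fields) >= 3
            and fields[0].lower() == "google.com"
            and fields[1] == pub_id
            and fields[2].upper() == "DIRECT"
        ):
            return True
    return False
```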
GitHub Pages Deploy Integration
With branch-based publishing, GitHub Pages deploys automatically when you push to main, so no separate deploy job is needed. Our CI job and the Pages build both trigger on that same push and run in parallel, so a failed check flags a problem but does not stop the deploy.
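If you want an explicit gate (deploy only when CI passes), move deployment into the workflow itself:
- Settings → Pages → Build and deployment → Source: GitHub Actions
- Add a deploy job with needs: validate that publishes the site using actions/upload-pages-artifact and actions/deploy-pages (the job needs pages: write and id-token: write permissions)
- For pull requests, protect main with a rule that requires the validate status check to pass before merging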
For a solo-maintainer site like aijsons.com, running checks on every push without a deploy gate is sufficient. Failed checks send an email notification; I fix before the next push.
Optional Checks Worth Adding
As the site grows, these checks pay for themselves:
| Check | Tool | Catches |
|---|---|---|
| Internal link crawl | linkinator, lychee | Related-article 404s on live pages |
| HTML validation | html-validate | Unclosed tags, duplicate IDs |
| Canonical URL check | custom script | non-www canonicals after domain migration |
| File size budget | find + du | Accidentally committed large binaries |
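As an example of the "custom script" in the canonical row, here is a hedged sketch that walks every .html file, reads its <link rel="canonical">, and flags pages whose canonical is missing, duplicated, or not on the www origin. The https://www.aijsons.com/ prefix is an assumption based on the sitemap URLs above; CanonicalFinder, canonical_problems, and scan_site are hypothetical names, and pages that intentionally omit a canonical would need an allowlist.

```python
import pathlib
from html.parser import HTMLParser

# Assumed preferred origin (matches the <loc> URLs in sitemap-blog.xml).
CANONICAL_PREFIX = "https://www.aijsons.com/"


class CanonicalFinder(HTMLParser):
    """Records the href of every <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.hrefs.append(attr_map.get("href") or "")


def canonical_problems(html: str, prefix: str = CANONICAL_PREFIX) -> list[str]:
    """List canonical issues for one page; empty means it looks fine."""
    finder = CanonicalFinder()
    finder.feed(html)
    problems: list[str] = []
    if not finder.hrefs:
        problems.append("missing canonical")
    elif len(finder.hrefs) > 1:
        problems.append("multiple canonicals")
    problems += [f"off-domain canonical: {h}" for h in finder.hrefs if not h.startswith(prefix)]
    return problems


def scan_site(root: str = ".") -> dict[str, list[str]]:
    """Map each .html file (relative to root) to its canonical problems."""
    report: dict[str, list[str]] = {}
    for page in sorted(pathlib.Path(root).rglob("*.html")):
        problems = canonical_problems(page.read_text(encoding="utf-8", errors="replace"))
        if problems:
            report[page.relative_to(root).as_posix()] = problems
    return report
```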
What We Deliberately Do NOT Automate
- Content generation: every article is hand-written; see our Editorial Policy
- Sitemap regeneration: we update sitemap XML manually in the same commit as content changes, keeping the diff reviewable
- Auto-merging Dependabot PRs: static sites have few dependencies, so manual review is fine
The goal of CI is to protect editorial quality, not to replace human judgment about what to publish.
Debugging Failed CI Runs
When a check fails, GitHub Actions shows the exact step and log output. Common fixes:
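- xmllint error → open the sitemap file and look for an unescaped & or an unclosed tag
- Missing blog directory → either create the directory or remove the URL from sitemap-blog.xml in the same commit
- Redirect count > 100 → audit _redirects for rules targeting already-deleted paths and remove them
- ads.txt missing → restore it from git history with `git checkout HEAD~1 -- ads.txt`; HEAD~1 only works if the file was deleted in the most recent commit, otherwise `git log --diff-filter=D -- ads.txt` finds the deleting commit so you can restore from its parent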
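For the redirect audit in the list above, a hedged Python sketch can list rules whose destination no longer resolves to a file in the repo; those only lead visitors to a 404 and are the safest to delete. It assumes destinations are root-relative paths served from the repository root, where /x may map to x, x/index.html, or x.html; dead_destinations is a hypothetical name, and external or splat destinations are skipped rather than checked.

```python
import pathlib


def dead_destinations(redirects_text: str, site_root: str = ".") -> list[str]:
    """Return _redirects rules whose local destination no longer exists on disk."""
    root = pathlib.Path(site_root)
    dead: list[str] = []
    for line in redirects_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) < 2:
            continue
        dest = parts[1]
        # Skip external URLs and splat/placeholder destinations (*, :name),
        # which cannot be resolved to a single file.
        if not dest.startswith("/") or "*" in dest or ":" in dest:
            continue
        path = root / dest.split("#", 1)[0].split("?", 1)[0].strip("/")
        # Assumed static-host lookup: /x may be served from x, x/index.html, or x.html.
        candidates = [path, path / "index.html", path.parent / (path.name + ".html")]
        if not any(c.is_file() for c in candidates):
            dead.append(stripped)
    return dead
```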
Key Takeaways
- Static site CI should validate structure (sitemaps, links, redirects), not run traditional build/test pipelines.
- Automate quality gates, not content creation — our daily auto-generator caused an AdSense rejection.
- Verify sitemap URLs match filesystem directories on every push.
- Track the _redirects rule count against Cloudflare's 100-rule limit.
- Protect ads.txt with a CI presence check if you use AdSense.
- Keep CI fast (under 60 seconds) so it does not discourage frequent small commits.