Why I wrote this: after maintaining aijsons.com for 18 months, I learned that static sites fail quietly. A broken sitemap returns 500. A blog index lists 105 articles while only 28 exist. A redirect glob pattern silently fails to match. None of these throw errors during git push; they surface only when Google's crawlers or AdSense reviewers run into them. We previously ran a GitHub Actions workflow that auto-generated blog posts daily. Disabling it was the right call, but it left a gap: we still needed automated quality gates before deploy. This article documents the CI checks I wish we had had from day one, and the workflow structure we use now.

What "CI" Means for a Static Site

There is no compile step for aijsons.com — no TypeScript build, no test suite in the traditional sense. GitHub Pages serves HTML files directly from the main branch. CI for this stack is not about catching type errors; it is about catching structural drift between files that should stay in sync:

  • sitemap-blog.xml URLs must match directories under blog/
  • blog/index.html links must not point to deleted articles
  • _redirects rules must not exceed Cloudflare's 100-rule limit
  • XML sitemaps must parse without errors

These are the failures that contributed to our AdSense "Low value content" review problems, not missing unit tests.

The Workflow We Retired (And Why)

Our repository still contains a disabled workflow at .github/workflows/daily-content.yml.disabled. It ran nightly, called an OpenAI API to generate blog posts, and auto-committed them:

on:
  schedule:
    - cron: "15 2 * * *"
  workflow_dispatch:

jobs:
  publish-content:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python automation/content_automation.py
      - run: git commit -m "chore: daily blog/news automation" && git push

This produced ~97 JSON-themed articles in three months — nearly identical structure, titles ending in "in 2026", one per day. Google classified the site as programmatic SEO. We disabled the workflow in May 2026 and pruned the blog to 28 curated articles. The lesson: automate quality checks, not content creation.

The CI Workflow We Use Now

Our replacement workflow runs on every push to main and on every pull request targeting main. It performs four checks:

name: Site Quality Checks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate XML sitemaps
        run: |
          sudo apt-get update
          sudo apt-get install -y libxml2-utils
          xmllint --noout sitemap.xml
          xmllint --noout sitemap-pages.xml
          xmllint --noout sitemap-tools.xml
          xmllint --noout sitemap-blog.xml

      - name: Count _redirects rules
        run: |
          RULES=$(grep -v '^#' _redirects | grep -v '^$' | wc -l)
          echo "Redirect rules: $RULES / 100"
          if [ "$RULES" -gt 100 ]; then
            echo "ERROR: Cloudflare Pages allows max 100 redirect rules"
            exit 1
          fi

      - name: Verify blog URLs in sitemap exist on disk
        run: |
          python3 - <<'PY'
          import re, sys, pathlib
          xml = pathlib.Path("sitemap-blog.xml").read_text()
          urls = re.findall(r"<loc>https://www\.aijsons\.com(/blog/[^/]+/)</loc>", xml)
          missing = [u for u in urls if not (pathlib.Path(".") / u.strip("/")).exists()]
          if missing:
              print("Missing blog directories:", *missing, sep="\n  ")
              sys.exit(1)
          print(f"OK: {len(urls)} blog URLs verified")
          PY

      - name: Check ads.txt present
        run: test -f ads.txt && grep -q "google.com" ads.txt

None of these steps require a build toolchain, and they run in under 30 seconds on GitHub's free tier. The critical insight: every check targets a concrete failure mode, and most map to incidents or near-misses we hit in production.

Check 1: Sitemap XML Validation

Our sitemap returned HTTP 500 for weeks because of a malformed XML comment in an auto-generated file. xmllint --noout catches unclosed tags, invalid characters, and encoding issues before deploy. This single line would have saved hours of Search Console debugging.
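
xmllint --noout proves only that a file is well-formed XML. A file can pass it and still be a poor sitemap: it can have the wrong root element, lack the sitemaps.org namespace, or contain a <url> entry with no <loc>. Below is a sketch of a slightly stricter check that uses only the standard library. check_sitemap_structure and main are hypothetical names. The check inspects the root element, the namespace and the presence of <loc>; it does not validate against the full sitemaps.org schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Allowed root element -> the child element each entry must use.
ENTRY_TAG = {SM + "urlset": SM + "url", SM + "sitemapindex": SM + "sitemap"}


def check_sitemap_structure(xml_data: bytes | str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag not in ENTRY_TAG:
        return [f"unexpected root element {root.tag!r}"]
    entries = root.findall(ENTRY_TAG[root.tag])
    if not entries:
        return ["no entries"]
    problems = []
    for i, entry in enumerate(entries, start=1):
        loc = entry.find(SM + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {i} has no <loc>")
    return problems


def main(paths: list[str]) -> int:
    """Check each file and return a process exit code (0 = all OK)."""
    failed = False
    for path in paths:
        # Read bytes so the parser honors the file's XML encoding declaration.
        with open(path, "rb") as fh:
            for problem in check_sitemap_structure(fh.read()):
                print(f"{path}: {problem}")
                failed = True
    return 1 if failed else 0
```

You could save this as a script, for example at the hypothetical path scripts/check_sitemaps.py. Add import sys and a final raise SystemExit(main(sys.argv[1:])), and it can then run after the xmllint lines with the four sitemap filenames as arguments.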

We split sitemaps into an index plus three sub-files (pages, tools, blog) — see our deployment guide for the rationale. CI validates all four files on every push.
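
The index and its three sub-files are edited by hand, so they can drift apart. A sub-file might be renamed without updating sitemap.xml, or an index entry might point at a file that was never committed. The following sketches a cross-check between the two. It assumes the index lists each sub-sitemap as an absolute URL whose path maps to a file at the repository root. index_problems is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumption: sub-sitemaps sit at the repository root next to sitemap.xml.
EXPECTED = frozenset({"sitemap-pages.xml", "sitemap-tools.xml", "sitemap-blog.xml"})


def index_problems(index_xml: bytes | str, site_root: Path,
                   expected: frozenset[str] = EXPECTED) -> list[str]:
    """Compare a sitemap index's <loc> entries with the expected files on disk."""
    root = ET.fromstring(index_xml)
    listed = set()
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        # "https://www.aijsons.com/sitemap-blog.xml" -> "sitemap-blog.xml"
        listed.add(urlparse((loc.text or "").strip()).path.lstrip("/"))
    problems = [f"not listed in index: {n}" for n in sorted(expected - listed)]
    problems += [f"unexpected index entry: {n}" for n in sorted(listed - expected)]
    problems += [f"listed but missing on disk: {n}" for n in sorted(listed)
                 if not (site_root / n).is_file()]
    return problems
```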

Check 2: Redirect Rule Budget

Cloudflare Pages enforces a hard limit of 100 rules in _redirects. During our content prune, we accumulated ~90 rules redirecting old blog URLs to their merged replacements. Without a counter, we would not have noticed new rules being ignored once we passed the limit, and duplicate tool pages would have stayed indexed alongside their canonical versions.

The CI step prints the current count and fails the build if it exceeds 100, forcing us to remove stale rules before adding new ones.
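
The workflow step treats 100 as one budget shared by every rule. Cloudflare's Pages documentation distinguishes static rules from dynamic ones. A rule is dynamic if its source path contains a * splat or a :placeholder. At the time of writing, the docs list a separate cap for each kind: 2,000 static and 100 dynamic. The sketch below reports the two counts separately. count_redirects and RedirectBudget are hypothetical names, and the limit constants are assumptions to re-check against Cloudflare's current docs.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed limits; verify against Cloudflare's current Pages _redirects docs.
STATIC_LIMIT = 2000
DYNAMIC_LIMIT = 100


@dataclass
class RedirectBudget:
    static: int
    dynamic: int

    def over_limit(self, static_limit: int = STATIC_LIMIT,
                   dynamic_limit: int = DYNAMIC_LIMIT) -> bool:
        return self.static > static_limit or self.dynamic > dynamic_limit


def count_redirects(text: str) -> RedirectBudget:
    """Count rule lines in a _redirects file, split into static and dynamic."""
    static = dynamic = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank/whitespace-only lines and comments are not rules
        source = line.split()[0]
        # A splat (*) or a :placeholder path segment makes a rule dynamic.
        if "*" in source or any(seg.startswith(":") for seg in source.split("/")):
            dynamic += 1
        else:
            static += 1
    return RedirectBudget(static, dynamic)
```

The workflow enforces a single, stricter budget. To keep that behavior, compare budget.static + budget.dynamic against 100 instead of calling over_limit().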

Check 3: Sitemap ↔ Filesystem Consistency

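This check targets the failure mode behind our worst SEO incident: the blog index listing 105 articles while sitemap-blog.xml contained 28. The Python script extracts every <loc> URL from sitemap-blog.xml and verifies that the corresponding directory exists under blog/. On its own it covers only the sitemap side, and in that incident the sitemap was correct. The stale links were in blog/index.html, which is what the extended version below checks.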

Real incident: We deleted 70 blog directories in one commit but updated only sitemap-blog.xml, not blog/index.html. Users clicking from the blog page got 404s. Google Search Console reported hundreds of "Not found" crawl errors within 48 hours.

Extended version (recommended): also parse blog/index.html for href="/blog/..." links and verify each target exists. We added this after the incident.
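
Here is a minimal sketch of that extended check, as an illustration rather than the exact script we added. It uses the standard library's HTMLParser instead of a regex, so attribute order and quoting do not matter. It makes two assumptions. First, blog links are either root-relative or point at the aijsons.com hosts. Second, GitHub Pages serves /blog/slug/ from blog/slug/index.html. BlogLinkCollector and missing_blog_targets are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

# Assumption: links count as internal when relative or on these hosts.
SITE_HOSTS = {"", "aijsons.com", "www.aijsons.com"}


class BlogLinkCollector(HTMLParser):
    """Collect the paths of <a href> links that point into /blog/."""

    def __init__(self) -> None:
        super().__init__()
        self.paths: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        url = urlparse(dict(attrs).get("href") or "")
        if url.netloc not in SITE_HOSTS:
            return  # external link
        # Keep article links such as /blog/slug/; skip the /blog/ index itself.
        if url.path.startswith("/blog/") and url.path.strip("/") != "blog":
            self.paths.append(url.path)


def missing_blog_targets(index_html: str, site_root: Path) -> list[str]:
    """Return linked /blog/... paths with no matching file on disk."""
    collector = BlogLinkCollector()
    collector.feed(index_html)
    collector.close()
    missing = []
    for path in dict.fromkeys(collector.paths):  # de-duplicate, keep order
        target = site_root / path.strip("/")
        # /blog/slug/ is served from blog/slug/index.html; also allow direct files.
        if not ((target / "index.html").is_file() or target.is_file()):
            missing.append(path)
    return missing
```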

Check 4: ads.txt Presence

AdSense expects ads.txt at the domain root, listing your correct publisher ID. The Google record in our file is:

google.com, pub-1902338749101993, DIRECT, f08c47fec0942fa0

Accidentally deleting this file during a repo cleanup would break AdSense authorization. A one-line CI check prevents that class of mistake entirely.
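
The grep in the workflow only confirms that the file mentions google.com. A stricter check would parse each record according to the IAB ads.txt format. Each record holds a domain, an account ID, a relationship, and an optional certification authority ID. The check would then require a DIRECT google.com entry for the expected publisher ID. The sketch below does this. ads_txt_problems is a hypothetical helper. The parser skips comments and variable lines such as contact=... rather than validating them.

```python
from __future__ import annotations

# Publisher ID from the record above; replace with your own.
PUB_ID = "pub-1902338749101993"


def ads_txt_problems(text: str, pub_id: str = PUB_ID) -> list[str]:
    """Check that ads.txt contains a DIRECT google.com record for pub_id."""
    problems = []
    found = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or "=" in line:
            continue  # blank line, or a variable such as contact=...
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            problems.append(f"line {lineno}: expected at least 3 fields")
            continue
        domain, account, relationship = fields[0].lower(), fields[1], fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            problems.append(f"line {lineno}: bad relationship {fields[2]!r}")
        if domain == "google.com" and account == pub_id and relationship == "DIRECT":
            found = True
    if not found:
        problems.append(f"no DIRECT google.com record for {pub_id}")
    return problems
```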

GitHub Pages Deploy Integration

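With the "Deploy from a branch" source, GitHub Pages publishes automatically when you push to main, so no deploy job of your own is needed. With the "GitHub Actions" source, you supply that deploy workflow yourself. Our check workflow and the Pages deployment are triggered by the same push, so they run concurrently. CI does not hold the deploy back; it reports on the commit that is already going live.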

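If you want an explicit gate (deploy only when CI passes), switch Pages to the GitHub Actions source and make deployment a job in the same workflow:

  1. Settings → Pages → Build and deployment → Source: "GitHub Actions"
  2. Add a deploy job with needs: validate, limited to pushes (not pull requests). The job uploads the site with actions/upload-pages-artifact and publishes it with actions/deploy-pages, and it needs the pages: write and id-token: write permissions
  3. Optionally, protect main with a branch ruleset that requires the validate check on pull requests, so failing changes cannot be merged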

For a solo-maintainer site like aijsons.com, running checks on every push without a deploy gate is sufficient, even though a failing commit is already live by the time the check reports. Failed runs send an email notification, and I fix the problem before pushing anything else.

Optional Checks Worth Adding

As the site grows, these checks pay for themselves:

| Check | Tool | Catches |
|---|---|---|
| Internal link crawl | linkinator, lychee | Related-article 404s on live pages |
| HTML validation | html-validate | Unclosed tags, duplicate IDs |
| Canonical URL check | custom script | Non-www canonicals after domain migration |
| File size budget | find + du | Accidentally committed large binaries |
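
The canonical URL check in that table is a custom script. Here is a sketch of one way it could look. It walks every .html file, requires exactly one <link rel="canonical">, and flags any canonical that does not start with the www origin. The www prefix is an assumption based on the sitemap URLs the workflow checks. CanonicalFinder and canonical_problems are hypothetical names. Pages that deliberately omit a canonical, such as a 404 page, would need an allowlist.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path

# Assumption: every page should declare a canonical URL on the www origin.
CANONICAL_PREFIX = "https://www.aijsons.com/"


class CanonicalFinder(HTMLParser):
    """Record the href of every <link rel="canonical"> in one page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.hrefs.append(attr.get("href") or "")


def canonical_problems(site_root: Path, prefix: str = CANONICAL_PREFIX) -> list[str]:
    """Flag pages with zero or several canonicals, or one off the expected origin."""
    problems = []
    for page in sorted(site_root.rglob("*.html")):
        finder = CanonicalFinder()
        finder.feed(page.read_text(encoding="utf-8", errors="replace"))
        finder.close()
        name = page.relative_to(site_root).as_posix()
        if len(finder.hrefs) != 1:
            problems.append(f"{name}: {len(finder.hrefs)} canonical links")
        elif not finder.hrefs[0].startswith(prefix):
            problems.append(f"{name}: canonical {finder.hrefs[0]!r}")
    return problems
```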

What We Deliberately Do NOT Automate

  • Content generation: every article is hand-written; see our Editorial Policy
  • Sitemap regeneration: we update sitemap XML manually in the same commit as content changes, keeping the diff reviewable
  • Auto-merging Dependabot PRs: static sites have few dependencies, so manual review is fine

The goal of CI is to protect editorial quality, not to replace human judgment about what to publish.

Debugging Failed CI Runs

When a check fails, GitHub Actions shows the exact step and log output. Common fixes:

  • xmllint error → open the sitemap file, look for unescaped & or unclosed tags
  • Missing blog directory → either create the directory or remove the URL from sitemap-blog.xml in the same commit
  • Redirect count > 100 → audit _redirects for rules targeting already-deleted paths; remove them
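  • ads.txt missing → restore it from the commit before the deletion: git checkout "$(git rev-list -n 1 HEAD -- ads.txt)~1" -- ads.txt (a plain HEAD~1 works only if the deletion was the most recent commit)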
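
In CI, xmllint already prints the line number of a parse error. When you reproduce the failure locally without xmllint, a small standard-library helper can point at the offending line instead. Below is a sketch. describe_parse_error is a hypothetical name. It reports only the first error, because the parser stops there.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def describe_parse_error(xml_text: str) -> str | None:
    """Return "line N, col M: <offending line>" if xml_text is malformed, else None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, col = exc.position  # (line, column) as reported by the parser
        lines = xml_text.splitlines()
        source = lines[line - 1].strip() if 0 < line <= len(lines) else ""
        return f"line {line}, col {col}: {source}"
    return None
```

For the unescaped & case, the returned line shows the raw & that needs to become &amp;.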

Key Takeaways

  • Static site CI should validate structure (sitemaps, links, redirects), not run traditional build/test pipelines.
  • Automate quality gates, not content creation — our daily auto-generator caused an AdSense rejection.
  • Verify sitemap URLs match filesystem directories on every push.
  • Track _redirects rule count against Cloudflare's 100-rule limit.
  • Protect ads.txt with a CI presence check if you use AdSense.
  • Keep CI fast (<60 seconds) so it does not discourage frequent small commits.