---
name: programmatic-seo
description: Build SEO-optimized pages at scale using templates and data.
---

# Programmatic SEO

Build SEO-optimized pages at scale using templates and data. Create page generators that target keyword patterns and produce unique, valuable content for each variation.

## When to Use

- User wants to create many SEO-driven pages from a template (e.g., "[product] vs [competitor]", "[service] in [city]")
- User mentions programmatic SEO, template pages, directory pages, location pages, or comparison pages at scale

- User wants to build an "alternatives to X" page set, integrations directory, or glossary
- User has a data set they want to turn into individual landing pages

## When NOT to Use

- Auditing existing SEO issues (use seo-auditor skill)
- Writing a single blog post or landing page (use content-machine skill)

- One-off competitive analysis (use competitive-analysis skill)

## Core Principles

1. **Information gain is everything** — Every page must contain information the user cannot find on existing top-10 results. Proper H1s, keyword density, and clean structure are baseline hygiene, not a ranking advantage. Pages that "look like SEO" but add no new information are what Google's spam systems now target.
2. **Unique value per page** — Every page must provide value specific to that page, not just swapped variables in a template. If you removed the entity name, would two pages be indistinguishable? If yes, don't publish.

3. **Proprietary data wins** — Hierarchy: proprietary > product-derived > user-generated > licensed > public (weakest). The strongest pSEO pages contain data no one else has — original tests, first-party metrics, real user reviews, proprietary calculations.
4. **Subfolders, not subdomains** — `yoursite.com/templates/resume/`, not `templates.yoursite.com/resume/`

5. **Match search intent precisely** — Pages must match the intent behind the query, not just contain the keywords. This means choosing the right page type (blog post vs. comparison table vs. tool vs. directory listing), answering the actual question, and structuring content the way users expect for that query type.
6. **Quality over quantity** — 100 great pages beat 10,000 thin ones. Scale without information gain is just industrialized noise. Google has more content than it needs — your pages must earn their place by being genuinely better than what already ranks.

7. **Topical authority compounds** — A site that demonstrates deep expertise in a specific topic earns trust signals that lift all pages in that topic cluster. Scattered, shallow coverage across unrelated topics builds no authority.

---

## ⚠️ CRITICAL: Check the App's Rendering Strategy First

**Before writing a single line of code**, determine whether the existing application is a Single Page Application (SPA).

**SPAs (React, Vue, Angular, Svelte) are effectively invisible to Googlebot.** Googlebot's first pass fetches only the raw HTML; JavaScript rendering is deferred, resource-constrained, and unreliable at pSEO scale. A React component that renders `<h1>Currency Converter</h1>` only works in a browser — in the initial HTML response, Googlebot sees an empty `<div id="root"></div>`.

### Decision Tree

```text
Is the app a SPA (React/Vue/Angular)?
│
├── YES → SEO pages MUST be server-rendered (SSR)
│         Options:
│         A. Express/Fastify route returning a complete HTML string ← simplest for existing SPAs
│         B. Migrate to Next.js/Nuxt/SvelteKit with SSR support
│         C. Add a static site generator (Astro) alongside the SPA for SEO pages only
│
└── NO (already Next.js, Nuxt, etc.) → Use the framework's built-in SSR/SSG/ISR patterns (see Step 6)
```

**The SPA trap:** The most common pSEO mistake is adding routes to a React Router / Wouter / Vue Router app and assuming they will be indexed. They will not. Always verify by `curl`-ing the URL and checking if the content is present in the raw HTML response:

```bash
curl -s https://yoursite.com/your-seo-page | grep "target keyword"
# If empty or only returns <div id="root"></div> → not crawlable
```

---

## Content Authenticity — Don't Hallucinate Business Data

When building programmatic SEO for **the user's own company**, you will not have access to their internal data (customer stories, case studies, testimonials, product metrics, pricing, team bios, etc.). **Do not fabricate this information.**

### Before generating any company-specific content, ask the user for

- Customer names, logos, or testimonials they want featured
- Case study data (metrics, outcomes, quotes)

- Product-specific details (features, pricing tiers, integrations list)
- Any proprietary data that should populate template variables

#### If the user hasn't provided this data, default to safe content patterns

- Industry research and statistics (sourced via `webSearch`)
- General descriptions of the problem/solution category

- Feature explanations based on what's publicly visible on their site (use `webFetch` on their domain)
- Placeholder blocks clearly marked `[INSERT: customer testimonial]` or `[INSERT: case study metrics]`

- Comparison data pulled from public sources (G2, Capterra reviews via `webSearch`)

**Never generate:** fake customer quotes, fabricated ROI numbers, invented case studies, made-up testimonials, or fictional company metrics. These damage trust and can create legal liability.

For **generic/research topics** (e.g., "[city] cost of living", "[tool A] vs [tool B]", glossary terms), use `webSearch` to gather real data and cite sources.

---

## ⚠️ Strategic Judgment — What AI Cannot Do For You

**This is the most important section in this skill.** AI excels at scaling execution — drafting content, structuring pages, generating variations. But it cannot make the strategic decisions that determine whether a pSEO campaign succeeds or fails. These decisions must come from the user (or a human SEO strategist), and the agent must actively prompt for them rather than making assumptions.

### Decisions that require human judgment

Before building anything, the user must answer (or the agent must ask):

1. **Is this keyword worth a page at all?** Not every keyword with search volume deserves a page. If the top 10 results are dominated by high-authority brands and the user's site has no topical authority, building 500 pages won't help.
2. **What page type matches the query intent?** The same topic might need a blog post, a comparison table, an interactive tool, a directory listing, or a data profile. Getting the page type wrong means the page will never rank, regardless of content quality. For example:

- "best CRM for real estate" → listicle/comparison (not a product page)
- "HubSpot vs Salesforce" → head-to-head comparison (not a blog post)

- "what is a CRM" → educational/glossary (not a product page)
- "CRM pricing" → pricing table with real numbers (not a blog post about pricing)

3. **What information gain does this page offer over the current top 10?** If the answer is "nothing — we're just rewriting what's already there with AI," do not build the page. Specifically ask: do we have original data, unique analysis, proprietary metrics, real case studies, or first-hand experience that the current results lack?
4. **Does the user's site have topical authority in this area?** A brand-new domain publishing 200 glossary pages on a topic it has no track record in will be ignored. Topical authority is built through depth (many related, high-quality pages), external trust signals (backlinks from relevant sites), and user engagement over time.

5. **What is the commercial intent and conversion path?** Pages that attract traffic but have no clear next step for the user waste crawl budget and dilute site quality. Every pSEO page should have a purpose beyond "get impressions."

### What the agent should do

- **Always ask the user** what unique data, experience, or perspective they can bring to the topic before generating content. If they say "just write something about X," push back and explain that pages without information gain will not rank.
- **Never assume scale equals strategy.** If the user asks for "500 SEO articles," the correct response is to first validate whether 500 pages is the right number, whether the topics have demand, and whether the user has enough unique data to make each page valuable.

- **Recommend starting small.** Build 10-20 high-quality pages first, validate they get indexed and earn impressions, then scale what works. This is the opposite of the "blast 500 pages and hope" approach.
- **Be explicit about what AI is doing vs. what requires human input.** AI writes the page. The human decides whether the page should exist.

### The "SEO cosplay" trap

A page can have perfect H1 tags, clean H2/H3 hierarchy, optimized meta descriptions, proper schema markup, good keyword density, and fast load times — and still rank nowhere. These are table stakes, not differentiators. Google's ranking systems evaluate whether a page genuinely satisfies the user's query better than alternatives. Structure without substance is what practitioners call "SEO cosplay" — it looks like SEO but doesn't perform.

**The agent must not conflate technical SEO hygiene with content quality.** Both are necessary. Neither is sufficient alone.

---

## Information Gain — The Core Quality Signal

Google's systems increasingly evaluate "information gain" — whether a page adds something new to the corpus of existing results for a query. This is the single most important concept in modern pSEO.

### What counts as information gain

| Source of gain | Example | Strength |
|---|---|---|
| **Original data / metrics** | Your own A/B test results, proprietary benchmarks, internal analytics | Strongest — impossible to replicate |
| **First-hand experience** | Actual product reviews you conducted, real case studies with named clients | Very strong — hard to fake |
| **Unique analysis** | Novel comparisons, calculated scores, derived insights from raw data | Strong — requires expertise |
| **Curated judgment** | Expert picks, opinionated rankings with reasoning, "here's what we'd actually use" | Moderate — requires credibility |
| **Structured aggregation** | Data from multiple sources combined into a single useful view (with attribution) | Moderate — useful but reproducible |
| **Rewritten common knowledge** | The same information available on 50 other sites, just rephrased | Zero — this is what gets penalized |

### The information gain test

Before publishing any pSEO page set, run this test on 5 random pages:

1. Search Google for the target keyword
2. Read the top 3 results

3. Read your generated page
4. Ask: **"Does our page contain at least one piece of information, data point, or insight that none of the top 3 results have?"**

5. If no → do not publish. Improve the data source or reduce the page set to only pages where you have genuine information gain.

### Scaled content abuse (Google's 2024+ spam policy)

Google now explicitly targets "scaled content abuse" — content produced at scale (whether by AI, humans, or automation) that exists primarily to manipulate rankings rather than help users. The pattern Google detects is:

- Many pages with similar structure
- Smooth, readable prose with no new information

- Comprehensive coverage that is really just reorganized common knowledge
- High page count with no corresponding trust signals or user engagement

- Each page is readable, but after reading it you've learned nothing new

**This pattern triggers penalties regardless of whether AI wrote the content.** A human content farm producing the same pattern gets the same treatment. The issue is not AI — it is the absence of value at scale.

---

## Proven Playbooks (Real Traffic Numbers)

| Playbook | URL pattern | Who does it | Scale |
|---|---|---|---|
| **Integrations** | `/apps/[A]/integrations/[B]` | **Zapier** — ~56k pages, 5.8M+ monthly organic visits, ranks for 1.3M keywords. Proprietary data (triggers/templates per app pair) no one else can replicate. | N² combinations |
| **Conversions** | `/currency-converter/[from]-to-[to]-rate` | **Wise** — 8.5M pages across locale subfolders, 60M+ monthly visits. Live exchange-rate data + fee calculators = unique value per page. | N² × locales |
| **Locations** | `/Restaurants-[city]`, `/[cuisine]-Restaurants-[city]`, `/Restaurants-[neighborhood]` | **Tripadvisor** — 700M+ pages, 226M+ monthly visits. UGC reviews keep pages fresh; layered matrix (city × cuisine × neighborhood). | city × category × modifier |
| **Data profiles** | `/[city-slug]` | **Nomad List** — cost-of-living, internet speed, safety scores per city. Pages are pure data tables — minimal prose, high value. | N entities |
| **Comparisons** | `/[A]-vs-[B]`, `/alternatives/[A]` | **G2, Capterra** — "vs" pages + "alternatives" pages, populated by user reviews. | N² / 2 |
| **Templates** | `/templates/[type]` | **Canva, Notion** — each template is a landing page. | N types |
| **Glossary** | `/learn/[term]` | **Ahrefs, HubSpot** — definition pages cluster topical authority. | N terms |
| **How-to guides** | `/guides/[task]-with-[tool]` | Documentation sites — step-by-step guides, HowTo schema. | N tasks × M tools |
| **Personas** | `/[product]-for-[audience]` | "CRM for real estate agents" | N × M |

**The test:** If your data doesn't meaningfully change between page variations, don't build it. Zapier works because Slack+Asana genuinely differs from Slack+Trello. "Plumber in Austin" vs "Plumber in Dallas" with identical boilerplate = thin content penalty.

Layer playbooks for long-tail: Tripadvisor's "Best Italian Restaurants in Chinatown NYC" = curation × cuisine × neighborhood.

### Why these playbooks work — and why copycats fail

Every successful pSEO example above shares one trait: **the data genuinely changes between pages.** Zapier's Slack+Asana page has different triggers, different actions, and different templates than Slack+Trello. Wise's USD-to-EUR page has a different exchange rate, different fee structure, and different historical chart than USD-to-GBP. Tripadvisor's pages have different restaurants, different reviews, and different ratings per city.

Copycats fail because they replicate the URL structure without replicating the information gain. Creating 500 "/service-in-city" pages where only the city name changes and the rest is identical prose is not programmatic SEO — it is spam at scale. Google's systems detect this pattern reliably: many pages, similar structure, no meaningful data variation.

**Before choosing a playbook, ask:** "Do I have genuinely different data for each page variation, or am I just swapping one variable into the same template?" If the latter, reduce the page count to only variations where you have real data, or choose a different playbook entirely.

---

## Implementation

### Step 1: Keyword Pattern Research

- Identify the repeating structure and variables
- Count how many unique combinations exist

- Validate demand: aggregate search volume, distribution (head vs. long tail), trend direction

### Step 2: Data Requirements

- What data populates each page?
- Is it first-party, scraped, licensed, or public?

- How is it updated and maintained?

### Step 3: Template Design

#### Page structure

- H1 with target keyword
- Unique intro (not just variables swapped — conditional content based on data)

- Data-driven sections with original insights/analysis per page
- Related pages / internal links

- CTAs appropriate to intent

##### Ensuring uniqueness — critical to avoid thin content penalties

- Conditional content blocks that vary based on data attributes
- Calculated or derived data (not just raw display)

- Editorial commentary unique to each entity
- User-generated content where possible
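
A minimal sketch of the conditional-block idea, assuming a hypothetical `Integration` entity with a few data attributes — the point is that a section renders only when the underlying data exists and actually differs between entities:

```typescript
// Hypothetical entity shape — field names are illustrative, not prescriptive
interface Integration {
  name: string;
  triggerCount: number;
  templates: { title: string; installs: number }[];
  userRating?: number; // present only when real review data exists
}

// Render a section only when the data behind it exists and varies per entity
export function renderUniqueSections(a: Integration, b: Integration): string {
  const sections: string[] = [];
  if (a.triggerCount > 0 && b.triggerCount > 0) {
    sections.push(`<section><h2>Available triggers</h2>
      <p>${a.name} exposes ${a.triggerCount} triggers; ${b.name} exposes ${b.triggerCount}.</p></section>`);
  }
  const shared = a.templates.filter(t => b.templates.some(u => u.title === t.title));
  if (shared.length > 0) {
    sections.push(`<section><h2>Popular shared templates</h2><ul>${shared
      .map(t => `<li>${t.title} (${t.installs.toLocaleString()} installs)</li>`)
      .join("")}</ul></section>`);
  }
  if (a.userRating !== undefined && b.userRating !== undefined) {
    sections.push(`<section><h2>User ratings</h2><p>${a.name}: ${a.userRating}/5 · ${b.name}: ${b.userRating}/5</p></section>`);
  }
  return sections.join("\n");
}
```

If a section would read identically on two pages, omit it rather than pad it with boilerplate.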

### Step 4: Internal Linking Architecture

#### Hub and spoke model

- Hub: Main category page (e.g., "/integrations/")
- Spokes: Individual programmatic pages (e.g., "/integrations/slack-asana/")

- Cross-links between related spokes

Every page must be reachable from the main site. **Update the main app's footer and navigation to include links to SEO hub pages** — SEO pages that aren't linked from the main site are orphan pages that Google may never discover or trust.

Include XML sitemap and breadcrumbs with structured data.
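
For the breadcrumb structured data, `BreadcrumbList` markup can be generated directly from the hub/spoke path. A sketch (the URLs and the `ssrHtmlShell` wiring are illustrative):

```typescript
// Build BreadcrumbList JSON-LD from the hub/spoke path, e.g. Home → Integrations → Slack + Asana
export function breadcrumbJsonLd(crumbs: { name: string; url: string }[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: c.name,
      item: c.url,
    })),
  });
}

// Example usage inside a page generator (URLs are illustrative):
// ssrHtmlShell({ ..., schemaJson: breadcrumbJsonLd([
//   { name: "Home", url: "https://yoursite.com/" },
//   { name: "Integrations", url: "https://yoursite.com/integrations/" },
//   { name: "Slack + Asana", url: "https://yoursite.com/integrations/slack-asana/" },
// ]) })
```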

### Step 5: Indexation Strategy

- Prioritize high-volume patterns for initial crawling
- Noindex very thin variations rather than publishing them

- Manage crawl budget (separate sitemaps by page type)
- Monitor indexation rate in Search Console

### Step 6: Build the Page Generator

#### Rendering strategy decision

| Page count | Data freshness | Strategy |
|---|---|---|
| <1,000 | Rarely changes | **SSG** — pre-render everything at build |
| 1,000-100,000 | Changes daily/weekly | **ISR** — pre-render popular subset, generate rest on-demand + cache |
| 100,000+ or live data | Real-time (prices, rates) | **ISR with short revalidate** or SSR |

SSG is fastest but build time scales linearly — 50k pages can mean 30+ min builds. ISR is the pSEO sweet spot: instant deploys, pages generate on first request then cache.

#### For SPAs: Express SSR pattern

When adding pSEO to an existing SPA, the simplest approach is adding Express routes that return complete HTML strings. Organize into three layers:

##### 1. Shared HTML shell (create this first)

```typescript
// server/ssrShared.ts — create before writing any individual page
interface ShellOptions {
  title: string;
  description: string;
  canonical: string;
  schemaJson?: string; // serialized JSON-LD, if the page has structured data
  css?: string;        // page-specific CSS appended to the shared styles
  body: string;
  // ...other shared options as needed
}

// sharedCss(), header(), footer() are small helpers defined alongside in this file
export function ssrHtmlShell({ title, description, canonical, schemaJson, css, body }: ShellOptions): string {
  return `<!DOCTYPE html>
<html lang="...">
<head>
  <meta charset="UTF-8" />
  <title>${title}</title>
  <meta name="description" content="${description}" />
  <link rel="canonical" href="${canonical}" />
  <!-- OG tags, fonts -->
  ${schemaJson ? `<script type="application/ld+json">${schemaJson}</script>` : ""}
  <style>${sharedCss()}${css ?? ""}</style>
</head>
<body>
  ${header()}
  ${body}
  ${footer()}
</body>
</html>`;
}
```

Create this shared shell **before** writing any individual page. Every new SEO page reuses it. This ensures consistent branding (header, footer, fonts, base CSS) across all SSR pages without duplication.

##### 2. Data + page generator modules (one per playbook)

```typescript
// server/glossary.ts
export const TERMS: GlossaryTerm[] = [ /* structured data */ ];

export function getTermHtml(slug: string, logoBase64: string): string | null { ... }

export function getTermIndexHtml(logoBase64: string): string { ... }
```

Keep data as typed arrays/objects in the same file as the generator. This makes content easy to update without touching routing logic.

##### 3. Routes (thin, just wiring)

```typescript
// in routes.ts
app.get("/glossary/:slug", (req, res) => {
  const html = getTermHtml(req.params.slug, LOGO_BASE64);
  if (!html) { res.status(404).end(); return; }
  res.setHeader("Cache-Control", "public, max-age=86400");
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.send(html);
});
```

#### Cache headers per page type

Set cache headers based on data freshness — not one-size-fits-all:

| Page type | Recommended header | Reason |
|---|---|---|
| Static content (glossary, guides) | `public, max-age=86400` | Content rarely changes; CDN can serve |
| Live data (exchange rates, prices) | `no-cache` or `s-maxage=60` | Must be fresh; stale data damages credibility |
| Semi-dynamic (weekly updates) | `public, s-maxage=3600, stale-while-revalidate=86400` | Balance freshness vs. load |
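
To keep routes from drifting away from this policy, a small shared helper can own the mapping — a sketch, with the page-type names as assumptions:

```typescript
import type { Response } from "express";

// Map page type → Cache-Control header so individual routes can't drift from the policy above
type PageType = "static" | "live" | "semiDynamic";

const CACHE_HEADERS: Record<PageType, string> = {
  static: "public, max-age=86400",
  live: "no-cache",
  semiDynamic: "public, s-maxage=3600, stale-while-revalidate=86400",
};

export function setCacheHeader(res: Response, type: PageType): void {
  res.setHeader("Cache-Control", CACHE_HEADERS[type]);
}
```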

#### For Next.js App Router

```tsx
import { notFound } from "next/navigation";

export async function generateStaticParams() {
  // Pre-render only the highest-volume slugs at build time; the rest generate on demand
  const popular = await db.query('SELECT slug FROM entities ORDER BY search_volume DESC LIMIT 500');
  return popular.map(e => ({ slug: e.slug }));
}

export const dynamicParams = true;
export const revalidate = 3600;

export async function generateMetadata({ params }) {
  const { slug } = await params;
  const entity = await getEntity(slug);
  return {
    title: `${entity.name} — ${entity.category} | Brand`,
    description: entity.summary,
    alternates: { canonical: `https://site.com/${entity.category}/${slug}` },
  };
}

export default async function Page({ params }) {
  const { slug } = await params;
  const entity = await getEntity(slug);
  if (!entity) notFound();
  return <main>{/* render entity data */}</main>;
}
```

**Critical ISR rules:** `generateStaticParams` is NOT re-run on revalidation. It must return an array (even `[]`) or the route becomes fully dynamic. Set `dynamicParams = false` only if you want 404s for anything not pre-generated.

#### Astro alternative

Better for content-heavy, less-interactive pages. Ships zero JS by default — better Core Web Vitals.

```js
export async function getStaticPaths() {
  const entities = await loadEntities();
  return entities.map(e => ({ params: { slug: e.slug }, props: { entity: e } }));
}
```

No native ISR; use on-demand rendering + `Cache-Control: s-maxage=3600, stale-while-revalidate`.

#### Sitemaps at scale

Google's limit is 50,000 URLs per sitemap file. Shard into `sitemap-1.xml`, `sitemap-2.xml`... referenced by a sitemap index. For ISR/SSR sites, generate sitemaps server-side from the DB, not at build time. **Warning:** Google will NOT index all pages immediately — indexation at scale takes weeks/months. Prioritize high-volume slugs in the first sitemap.
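
A sketch of serving a sitemap index plus sharded sitemaps from Express, generated from the database at request time (table and column names are assumptions, and the query style assumes a Postgres-like client returning row arrays):

```typescript
// Sharded sitemaps: /sitemap.xml is an index pointing at /sitemap-1.xml, /sitemap-2.xml, ...
const URLS_PER_SITEMAP = 50_000;

app.get("/sitemap.xml", async (_req, res) => {
  const [{ count }] = await db.query("SELECT COUNT(*)::int AS count FROM entities");
  const shards = Math.ceil(count / URLS_PER_SITEMAP);
  const entries = Array.from({ length: shards }, (_, i) =>
    `<sitemap><loc>https://yoursite.com/sitemap-${i + 1}.xml</loc></sitemap>`).join("");
  res.type("application/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</sitemapindex>`);
});

app.get("/sitemap-:n.xml", async (req, res) => {
  const offset = (Number(req.params.n) - 1) * URLS_PER_SITEMAP;
  // Order by search volume so high-priority slugs land in sitemap-1.xml
  const rows = await db.query(
    "SELECT slug, updated_at FROM entities ORDER BY search_volume DESC LIMIT $1 OFFSET $2",
    [URLS_PER_SITEMAP, offset]);
  const urls = rows.map(r =>
    `<url><loc>https://yoursite.com/glossary/${r.slug}</loc><lastmod>${new Date(r.updated_at).toISOString()}</lastmod></url>`).join("");
  res.type("application/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}</urlset>`);
});
```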

---

## SSR-Specific Implementation Notes

### Asset handling in SSR

Frontend assets (images, fonts) processed by Vite/webpack/esbuild are **not** directly accessible to server-side routes. Common problems:

- Images imported with `import logo from "@assets/logo.png"` work in React but not in Express route handlers
- Files copied to `public/` by the build tool may not be served correctly in development

- Files with restricted permissions (`rw-------`) will throw EACCES errors when read from disk

#### Solutions by asset type

| Asset | Recommended approach |
|---|---|
| Logo / brand images | Base64-encode at server startup; embed inline in `<img src="data:...">` |
| Small icons | Inline SVG string in the HTML template |
| External fonts | Google Fonts CDN link in `<head>` |
| Background images | Use CSS gradients or CDN URLs |

```typescript
// Base64 encode logo once at startup — safe for all SSR pages
import fs from "fs";

const logoBuffer = fs.readFileSync("./attached_assets/logo.png");
const LOGO_BASE64 = `data:image/png;base64,${logoBuffer.toString("base64")}`;
```

### Header/footer consistency

SSR pages and the React/Vue/SPA app will share the same domain. Users will navigate between them. **Brand consistency is critical** — a mismatch between the main app's header and the SSR page's header breaks trust.

Steps:

1. Study the main app's header/footer colors, logo treatment, link structure, and typography
2. Replicate them in the shared `ssrShared.ts` shell functions — not approximations, pixel-for-pixel matches

3. When the main app's footer is updated (new links, new columns), update the SSR footer function too
4. Conversely, when new SSR hub pages are added (e.g., `/glossary`, `/guides`), add them to the main app's footer navigation

### Logo rendering in SSR

If the logo image has a non-standard aspect ratio or significant whitespace, a plain `height: 48px` will render it incorrectly. Inspect how the main React app clips and positions the logo, then replicate exactly:

```html
<!-- If the main app uses overflow-hidden clipping to show only part of the image: -->
<div style="height:56px; width:160px; overflow:hidden; display:flex; align-items:center; justify-content:center;">
  <img src="${LOGO_BASE64}" style="width:auto; height:250%; object-fit:contain; margin-top:-5%;" />
</div>
```

Don't guess — read the main app's logo component before writing the SSR equivalent.

### Breadcrumb placement and negative margins

A common hero design pattern uses negative `margin-top` on the content section to create a "card floating up into the hero" visual effect:

```css
.hero { padding-bottom: 72px; }
.main { margin-top: -32px; } /* pulls first card up into hero */
```

**This breaks breadcrumbs.** If a breadcrumb element sits between the hero and `.main` in the HTML, the negative margin on `.main` will pull the main content over the breadcrumb, hiding it.

**Rule: breadcrumbs must always be the first child inside `.main`**, never between `.main` and the preceding section.

```html
<!-- WRONG: breadcrumb between hero and main -->
<div class="hero">...</div>
<nav class="breadcrumb">...</nav> <!-- will be covered by .main -->
<div class="main">...</div>

<!-- CORRECT: breadcrumb inside main -->
<div class="hero">...</div>
<div class="main">
  <nav class="breadcrumb">...</nav> <!-- safely in the content flow -->
  <div class="card">...</div>
</div>
```

Additionally, if a page has a breadcrumb and the hero uses large `padding-bottom` (designed to pair with negative margin), that padding becomes wasted dark space. Override it on breadcrumb pages:

```html
<div class="hero" style="padding-bottom: 32px;">
```

---

## Schema Markup by Playbook

Match schema type to content type — don't use a generic WebPage schema when a more specific type exists:

| Playbook | Schema type | Key properties |
|---|---|---|
| Glossary / definitions | `DefinedTerm` + `DefinedTermSet` | `name`, `description`, `inDefinedTermSet` |
| How-to guides | `HowTo` | `step[]` with `HowToStep`, `tool`, `supply` |
| FAQ sections | `FAQPage` | `mainEntity[]` with `Question` + `Answer` |
| Comparisons | `ItemList` or `Review` | `itemListElement[]`, `ratingValue` |
| Data profiles | `Dataset` or `Place` | `name`, `description`, `spatialCoverage` |
| Recipes/templates | `CreativeWork` | `name`, `description`, `genre` |

Multiple schema types can appear on the same page (e.g., a HowTo guide can also have a FAQPage block at the bottom).
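
For example, a glossary generator could emit `DefinedTerm` + `DefinedTermSet` markup like this (a sketch; the glossary name and URLs are illustrative):

```typescript
// JSON-LD for a single glossary page: the term plus the set it belongs to
export function glossaryJsonLd(term: { slug: string; name: string; definition: string }): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    name: term.name,
    description: term.definition,
    url: `https://yoursite.com/glossary/${term.slug}`,
    inDefinedTermSet: {
      "@type": "DefinedTermSet",
      name: "Marketing Glossary", // illustrative set name
      url: "https://yoursite.com/glossary/",
    },
  });
}
```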

---

## Internationalization and Locale Handling

Only add multilingual pSEO when you have **genuinely translated data** — not machine-translated boilerplate. Machine-translating 10,000 thin pages at once is one of the fastest ways to trigger a spam manual action.

### URL structure — subfolders win

| Approach | Example | SEO verdict |
|---|---|---|
| Subfolder | `yoursite.com/es/glosario/presupuesto` | **Preferred** — consolidates domain authority |
| ccTLD | `yoursite.es/glosario/presupuesto` | Strong signal but costly to maintain |
| Subdomain | `es.yoursite.com/...` | Treated as separate site; avoid for pSEO |
| Query param | `yoursite.com/glossary?lang=es` | Not crawlable reliably; never use |

### hreflang implementation

Every page that has a locale variant must declare all variants (including `x-default`) in `<head>`:

```html
<link rel="alternate" hreflang="en" href="https://site.com/glossary/budget" />
<link rel="alternate" hreflang="es" href="https://site.com/es/glosario/presupuesto" />
<link rel="alternate" hreflang="x-default" href="https://site.com/glossary/budget" />
```

Rules:

- Every page in the set must list **all** pages in the set (bidirectional)
- Omitting a language from even one page breaks the cluster for Google

- `x-default` should point to the most general version (usually English or a language-selector page)
- Validate with an hreflang testing tool (Search Console's legacy International Targeting report has been retired)

### Data structure for multilingual pages

Store translations as a map keyed by locale so the generator stays clean:

```typescript
const TERM = {
  slug: { en: "budget", es: "presupuesto" },
  name: { en: "Budget", es: "Presupuesto" },
  definition: { en: "...", es: "..." },
};

function getTermHtml(slug: string, locale: "en" | "es"): string { ... }
```

### Machine-translation risk

Machine-translated prose is detectable by Google and often results in near-duplicate content across locales. Only translate if:

- A human reviews/edits the output, OR
- The page is primarily structured data (tables, numbers, steps) with minimal prose

---

## Content Freshness Signals

Google rewards pages that are genuinely kept up to date. Stale pSEO pages — especially data-driven ones — gradually lose rankings as fresher competitors appear.

### `dateModified` in schema

Every page with structured data should carry a timestamp:

```json
{
  "@type": "WebPage",
  "datePublished": "2024-01-15",
  "dateModified": "2025-03-01"
}
```

**Only update `dateModified` when content actually changes.** Bumping it on a schedule without real changes is a deception signal that Google penalizes. If you auto-regenerate pages, tie `dateModified` to the data source's `updatedAt`, not the deploy timestamp.

### When and how to regenerate

| Data type | Trigger strategy |
|---|---|
| Exchange rates / prices | SSR with short cache (`s-maxage=60`); no static generation |
| Product/company data | Webhook from data source on change → re-render and invalidate CDN cache |
| Weekly-updated datasets | Nightly cron job → regenerate changed pages → update sitemap `<lastmod>` |
| Static reference content (glossary, guides) | Regenerate on code deploy only |

For Express SSR, the page is always fresh — no regeneration needed. The cost is latency; mitigate with a CDN in front.

For ISR (Next.js), trigger on-demand revalidation via the revalidation API when upstream data changes:

```typescript
// When data changes, invalidate the specific path
await fetch(`/api/revalidate?secret=TOKEN&path=/glossary/${slug}`, { method: "POST" });
```
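
On the receiving side, a small route handler can validate the token and call Next.js's `revalidatePath` — a sketch (the route location and env variable name are assumptions):

```typescript
// app/api/revalidate/route.ts — on-demand ISR invalidation
import { NextRequest, NextResponse } from "next/server";
import { revalidatePath } from "next/cache";

export async function POST(req: NextRequest) {
  const secret = req.nextUrl.searchParams.get("secret");
  const path = req.nextUrl.searchParams.get("path");
  if (secret !== process.env.REVALIDATE_TOKEN) {
    return NextResponse.json({ revalidated: false }, { status: 401 });
  }
  if (!path) {
    return NextResponse.json({ revalidated: false, error: "missing path" }, { status: 400 });
  }
  revalidatePath(path); // marks the cached page stale; the next request regenerates it
  return NextResponse.json({ revalidated: true, path });
}
```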

### Signals that indicate staleness to Google

- Dates in page content (e.g., "Updated January 2022") that are old
- Prices or rates that no longer match the live source

- Links to pages that now 404
- Schema `dateModified` older than 12 months for competitive queries

---

## Crawl Budget Management at Scale

Crawl budget is the number of pages Googlebot will crawl on your site per day. For sites with 10k+ URLs it becomes a real constraint — Googlebot will simply stop crawling before it reaches all your pages, and newly added pages may take weeks to be discovered.

### What wastes crawl budget

- **Faceted navigation**: Filter/sort URLs like `?color=red&size=M` that produce near-duplicate pages
- **Pagination**: `/page/2`, `/page/3`... where the same products appear on multiple pages

- **Session IDs / tracking params**: `?sessionid=abc123` creating millions of unique URLs
- **Internal search results**: `/search?q=anything` — never crawlable, always thin

### Solutions

**Block via `robots.txt`** (prevents crawling, doesn't affect indexation of already-indexed pages):

```text
User-agent: Googlebot
Disallow: /search
Disallow: /*?sessionid=
```

**`noindex` on parameterized URLs** (crawled but not indexed — use when the URL must exist for users):

```html
<meta name="robots" content="noindex, follow" />
```

**Canonical for near-duplicates** (one sorted/filtered variant is canonical; others point to it):

```html
<!-- On /products?sort=price-asc -->
<link rel="canonical" href="https://site.com/products" />
```

**Pagination**: Google no longer uses `rel="next"`/`rel="prev"` as an indexing signal — keep paginated pages reachable through normal links, or consolidate paginated content into one long page. Don't noindex paginated pages if they contain unique products not on page 1.

### Diagnosing crawl budget problems

In Google Search Console → Settings → Crawl stats:

- **Crawled pages/day declining** → Googlebot is throttling due to slow responses or crawl errors
- **Response codes: high 404 rate** → fix broken links; they waste crawl budget

- **Crawled but not indexed** rising → thin content or crawl budget exhausted before quality assessment

For large sites, use separate sitemaps per page type (`sitemap-glossary.xml`, `sitemap-guides.xml`) and prioritize high-value pages in the first sitemap.

---

## AI-Generated Content Policy

### What Google actually detects

Google does not need to determine whether content was written by AI or a human. It identifies a **pattern**: low-cost mass production where pages look different on the surface but are essentially repetitive underneath. The signals of this pattern include:

- Many pages with similar titles and angles
- Smooth, grammatically correct prose with no new information

- Comprehensive coverage that is really just reorganized common knowledge
- Each page is readable, but after reading it you've learned nothing new

- High page count with no corresponding user engagement or trust signals

**This pattern triggers penalties regardless of authorship.** A human content farm producing the same pattern gets the same treatment. The issue is never "AI wrote this" — it is "this adds nothing."

### The correct framing

The question is not: "Is it safe to use AI for SEO content?"

The question is: **"Does this page contain enough information gain, search intent match, and trust signals that Google would rank it — regardless of how it was produced?"**

If yes, AI is a powerful accelerator. If no, AI just lets you produce worthless pages faster.

### Where AI helps in pSEO

| Use case | Why it works |
|---|---|
| Summarising structured data into readable prose | Each page's data is unique → output is unique |
| Generating FAQ blocks from a data object | Templated but data-driven → not boilerplate |
| Translating + adapting content (with review) | Saves time; review catches errors |
| Writing meta descriptions from page data | Short, formula-driven → low risk |
| Structuring raw data into tables and comparisons | Saves formatting time; data does the heavy lifting |
| Drafting initial content for human review and enrichment | Faster starting point; human adds the information gain |

### Where AI hurts in pSEO

| Anti-pattern | Risk |
|---|---|
| Generic "overview" paragraphs with no data | Identical across all pages → thin content |
| AI-expanded boilerplate ("X is a type of Y that...") | Detectable by pattern; no unique value |
| Mass-generating content without data variation | Amplifies the thin content problem at scale |
| Fabricating entity-specific facts | Hallucinations become trust/legal liabilities |
| Using AI to "rewrite the top 10 results" for a keyword | Produces content with zero information gain — just reshuffled common knowledge |
| Letting AI make strategic decisions (keyword selection, page type, whether to publish) | AI will confidently produce content for keywords where the user has no chance of ranking |

### The role boundary

**AI is an execution tool, not a strategy tool for SEO.** AI should:

- ✅ Draft content from structured data the user provides
- ✅ Format, structure, and polish pages

- ✅ Generate meta descriptions, FAQ blocks, schema markup
- ✅ Help with technical implementation (SSR, sitemaps, caching)

AI should NOT:

- ❌ Decide which keywords to target
- ❌ Determine whether a page has enough information gain to publish

- ❌ Replace competitive analysis and intent matching
- ❌ Assess whether the user's site has enough authority to compete

- ❌ Generate "content" without underlying data that varies per page

When the user asks for "SEO content," the agent's first job is to ensure there is a real data source and strategic rationale — not to start generating pages.

### Practical checklist before publishing AI-assisted content

- [ ] Does each page's AI output differ meaningfully from every other page? (run a diff on 5 random pages — see the sketch after this checklist)
- [ ] Is the AI prose grounded in real data from the data source, or is it freestanding prose?

- [ ] Does each page contain at least one data point or insight not found in the current top 10 results?
- [ ] Could a human reviewer catch factual errors before publish? (build a review step for high-stakes verticals)

- [ ] Would a user who arrived from Google find this page more useful than the search results they came from?
- [ ] Has the page type been matched to the query intent (not just a blog post by default)?

If all answers are yes, the content is likely safe. If any is no, rethink the content strategy before scaling.
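
For the first checklist item, a rough way to quantify "meaningfully different" is to fetch a handful of live pages, strip the markup, and compare word overlap pairwise — a sketch for Node 18+ (the URLs and the interpretation of the overlap score are assumptions):

```typescript
// Rough uniqueness check: fetch N pages, strip tags, compare pairwise word overlap (Jaccard)
const urls = [
  "https://yoursite.com/glossary/budget",
  "https://yoursite.com/glossary/forecast",
  "https://yoursite.com/glossary/runway",
];

const wordsOf = async (url: string): Promise<Set<string>> => {
  const html = await (await fetch(url)).text();
  return new Set(html.replace(/<[^>]+>/g, " ").toLowerCase().split(/\W+/).filter(Boolean));
};

const jaccard = (a: Set<string>, b: Set<string>): number => {
  const inter = [...a].filter(w => b.has(w)).length;
  return inter / (a.size + b.size - inter);
};

const sets = await Promise.all(urls.map(wordsOf));
for (let i = 0; i < sets.length; i++) {
  for (let j = i + 1; j < sets.length; j++) {
    // Very high overlap suggests the pages are mostly the same words — investigate before scaling
    console.log(`${urls[i]} vs ${urls[j]}: ${(jaccard(sets[i], sets[j]) * 100).toFixed(0)}% word overlap`);
  }
}
```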

---

## Performance and Core Web Vitals for SSR Pages

pSEO pages are often deprioritized for performance work because they're "just content pages." But Core Web Vitals are a confirmed ranking factor, and a slow pSEO page competes against well-optimized competitors who target the same keyword.

### The three metrics that matter

| Metric | Threshold | Common cause in pSEO |
|---|---|---|
| **LCP** (Largest Contentful Paint) | < 2.5s | Hero image without dimensions/preload; render-blocking CSS |
| **CLS** (Cumulative Layout Shift) | < 0.1 | Images without `width`/`height`; fonts causing FOUT |
| **INP** (Interaction to Next Paint) | < 200ms | Large JS bundles blocking the main thread |

### For SSR Express pages specifically

#### Eliminate render-blocking resources

```html
<!-- WRONG: blocks render until stylesheet loads -->
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Inter" />

<!-- CORRECT: preconnect + async load -->
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link rel="preload" as="style" href="https://fonts.googleapis.com/css2?family=Inter&display=swap" />
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Inter&display=swap" media="print" onload="this.media='all'" />
```

**Inline critical CSS** (the styles needed to render above-the-fold content):

- Inline it in `<style>` tags in `<head>` — eliminates one round trip
- Load remaining CSS asynchronously or omit it (pSEO pages rarely need large stylesheets)

**Images must have explicit dimensions** to prevent CLS:

```html
<!-- WRONG: browser doesn't know height → layout shift on load -->
<img src="${hero}" alt="..." />

<!-- CORRECT -->
<img src="${hero}" width="1200" height="630" alt="..." loading="lazy" />
```

**No JavaScript is ideal.** SSR content pages need zero client-side JS. If you must include JS (analytics, interactivity), load it `defer` or `async`.

### Measuring pages not yet in Chrome UX Report

New pages have no real-user data (CrUX) for 28 days. Use:

- **Lighthouse** (Chrome DevTools or CLI): `lighthouse https://yoursite.com/glossary/term --only-categories=performance`
- **PageSpeed Insights API**: programmatic scoring for a batch of URLs (see the sketch below)

- **WebPageTest**: detailed waterfall view to identify bottlenecks

Run Lighthouse on a representative sample of pages (hub, high-volume spoke, low-volume spoke) — don't assume all pages perform equally.
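
For batch scoring, the public PageSpeed Insights `runPagespeed` v5 API can be called per URL — a sketch for Node 18+ (the URL list is illustrative; the API key is optional but raises rate limits):

```typescript
// Batch Lighthouse performance scores via the PageSpeed Insights API
const pages = [
  "https://yoursite.com/glossary/",        // hub
  "https://yoursite.com/glossary/budget",  // high-volume spoke
  "https://yoursite.com/glossary/runway",  // low-volume spoke
];

for (const url of pages) {
  const api = new URL("https://www.googleapis.com/pagespeedonline/v5/runPagespeed");
  api.searchParams.set("url", url);
  api.searchParams.set("strategy", "mobile");
  // api.searchParams.set("key", process.env.PSI_API_KEY!); // optional

  const data = await (await fetch(api)).json();
  const score = data.lighthouseResult?.categories?.performance?.score; // 0–1
  console.log(`${url}: performance ${score !== undefined ? Math.round(score * 100) : "n/a"}`);
}
```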

---

## Monitoring and Iteration Loop

Shipping pSEO pages is not a one-time event. The standard pattern is: launch → wait 4-8 weeks → analyze → iterate. Most pSEO playbooks fail because teams skip the iteration step.

### What to track in Search Console (per page type)

Open the **Performance** report, filter by URL prefix for each playbook (`/glossary/`, `/guides/`, etc.):

| Signal | Healthy | Problem indicator |
|---|---|---|
| **Impressions growing** | Steady week-over-week increase | Flat after 8 weeks → pages not indexed or no demand |
| **CTR** | >2% for informational, >5% for navigational | Low CTR → title/description not compelling |
| **Average position** | Trending toward <20 | Stuck at 30-50 → content quality or authority issue |
| **Coverage: Crawled, not indexed** | Decreasing | Increasing → thin content signal |
| **Coverage: Excluded (noindex)** | Matches intentional noindex count | Higher than expected → template accidentally sets noindex |

### Decision framework per page set

```text
8 weeks post-launch:
│
├── Indexed AND getting impressions?
│   ├── YES, position <10   → monitor; optimize title/meta for CTR
│   ├── YES, position 11-30 → improve content depth, add internal links
│   └── YES, position >30   → thin content; merge with a stronger page or rewrite
│
├── Not indexed (Crawled, not indexed)?
│   ├── Low page count (<100)  → submit to Search Console for inspection
│   └── High page count (100+) → thin content; noindex worst pages, improve rest
│
└── Not even crawled?
    ├── Check sitemap submission
    ├── Check robots.txt for accidental blocks
    └── Check crawl budget — too many thin pages may be crowding out good ones
```

### When to kill a playbook

Kill it (noindex + 301 redirect survivors to a hub) when after 6 months:

- Zero pages from the set are in the top 50 for any target keyword, AND
- Impressions are flat or declining, AND

- The page type is consuming crawl budget (visible in crawl stats)

Keeping dead pages alive harms the rest of the site — Google distributes trust across the whole domain. A smaller set of high-quality pages consistently outperforms a large set of mediocre ones.

### Iteration cadence

| Frequency | Action |
|---|---|
| Weekly | Check indexation coverage for errors |
| Monthly | Review impressions/CTR per playbook; update titles on underperformers |
| Quarterly | Full content audit — merge, improve, or kill underperforming page sets |
| On data change | Regenerate affected pages; update `dateModified` in schema |

---

## Quality Checks

### Pre-Strategy (before writing any code)

- [ ] User has identified unique data source that varies meaningfully per page
- [ ] Page type matches query intent (blog vs. comparison vs. tool vs. directory)

- [ ] Information gain validated: each page offers something the current top 10 results don't have
- [ ] Competitive viability assessed: user's site has (or is building) topical authority in this area

- [ ] Starting with a small batch (10-20 pages) before scaling to hundreds
- [ ] Human has made the strategic decisions; AI is handling execution only

### Pre-Launch (technical and content)

- [ ] `curl` the URL and verify target keywords appear in raw HTML (not rendered by JS)
- [ ] Each page provides unique value beyond variable substitution — if you removed the entity name, pages would NOT be indistinguishable

- [ ] Information gain test passed: 5 random pages each contain data/insights not in the current top 10 results
- [ ] Answers search intent for the target keyword with the correct page type

- [ ] Unique titles and meta descriptions per page
- [ ] Proper heading structure (one H1, logical H2/H3 hierarchy)

- [ ] Correct schema markup with no validation errors (test with the Schema Markup Validator at validator.schema.org)
- [ ] Page speed acceptable — LCP <2.5s, CLS <0.1, no render-blocking resources

- [ ] Images have explicit `width` and `height` attributes
- [ ] Connected to site architecture (footer/nav links to hub pages; hub links to spokes)

- [ ] Breadcrumbs render correctly and are not obscured
- [ ] Logo and branding match the main application exactly

- [ ] In XML sitemap and crawlable (no `noindex` accidentally set)
- [ ] 404s return proper status codes (no pages for invalid slugs)

- [ ] Cache headers set appropriately per page type (live data: no-cache; static: max-age=86400)
- [ ] Faceted/filtered URLs blocked via `robots.txt` or marked `noindex`

- [ ] Multilingual pages: hreflang tags present, bidirectional, with `x-default`
- [ ] AI-assisted content: each page's output meaningfully differs from every other page

- [ ] `dateModified` in schema tied to actual data change, not deploy date

### Post-Launch Monitoring

Track (per playbook URL prefix in Search Console): indexation rate, impressions, average position, CTR

8-week review: assess which page sets are indexed and gaining impressions, which are stuck, which are excluded. Apply the decision framework above.

Watch for: "Crawled, not indexed" count rising (thin content), crawl budget errors, manual actions, ranking drops after content updates.

---

## Common Mistakes

### Strategic mistakes (most damaging)

- **Confusing scale with strategy**: Believing "more pages = more traffic" without validating that each page has information gain. This is the single most common pSEO failure mode in 2025+.
- **SEO cosplay**: Pages with perfect technical structure (H1, H2, meta tags, schema) but zero information gain over existing results. Structure is table stakes, not a ranking advantage.

- **No information gain audit**: Publishing pages without checking whether they add anything the current top 10 results don't already have.
- **Skipping intent matching**: Building blog posts when the query demands a comparison table, or building a product page when the query demands educational content. Wrong page type = won't rank regardless of quality.

- **Ignoring topical authority**: Publishing hundreds of pages in a domain where the site has no track record. Authority is earned through depth and trust signals, not page count.
- **Letting AI make strategic decisions**: AI generates confidently even when the keyword is unwinnable, the page type is wrong, or the content has no unique value. Human judgment is required for: keyword selection, page type, competitive viability, and publish/no-publish decisions.

- **Treating AI output as finished content**: AI drafts need human enrichment — original data, case studies, expert judgment, first-hand experience. Raw AI output is a starting point, not a final product.

### Content mistakes

- **Thin content**: Just swapping entity names in identical boilerplate (Google will deindex)
- **AI prose with no data variation**: All pages share the same generic paragraphs — amplifies thin content at scale. If you removed the entity name, pages would be indistinguishable.

- **Rewriting the top 10 results**: Using AI to rephrase what already ranks is zero information gain. Google doesn't need another version of existing content.
- **Keyword cannibalization**: Multiple pages targeting the same keyword

- **Over-generation**: Creating pages with no search demand
- **Poor data quality**: Outdated or incorrect information erodes trust

### Technical mistakes

- **SPA without SSR**: Adding pSEO pages to a React/Vue app without server-rendering — Google sees nothing
- **Orphan pages**: New SEO pages not linked from anywhere in the main site's navigation or footer

- **Inconsistent branding**: SSR pages that look different from the main app break user trust
- **Breadcrumb obscured by layout tricks**: Negative margin patterns hiding breadcrumb navigation

- **Wrong cache headers**: Live-data pages cached for 24h showing stale rates; static pages uncached causing unnecessary server load
- **Asset permission errors**: Referencing frontend assets from SSR routes that can't read them

- **Machine-translating pSEO content at scale**: Produces near-duplicate content across locales; triggers spam signals
- **Missing or asymmetric hreflang**: One page in the set not listing all variants breaks the entire locale cluster

- **Bumping `dateModified` without real content changes**: Deception signal; Google can detect and penalise it
- **Faceted navigation URLs not blocked**: Filter/sort param combinations create thousands of near-duplicate crawlable URLs, wasting crawl budget

- **No crawl budget monitoring**: Teams discover Googlebot stopped crawling new pages only after rankings plateau
- **Skipping Core Web Vitals on pSEO pages**: Slow LCP or high CLS on "just content" pages loses rankings to faster competitors

- **Images without dimensions**: Causes CLS on SSR pages (browser doesn't know reserved height until image loads)

### Process mistakes

- **Blasting hundreds of pages at once**: Instead of starting with 10-20 pages, validating indexation and impressions, then scaling what works
- **No iteration loop**: Shipping pSEO and never reviewing Search Console — dead pages accumulate and drag down domain trust

- **No freshness strategy for live-data pages**: Exchange rates or prices cached for 24h showing stale data destroys credibility
- **Ignoring UX**: Pages that exist for Google but not for users — no conversion path, no value to visitors

---

## Output Format

Deliver strategy validation first, then implementation. **Do not skip straight to building pages.**

1. **Strategic validation** (before any code):

- What unique data does the user have that varies per page?
- What page type matches the query intent?

- What information gain does each page offer over current top 10 results?
- Does the user's site have topical authority in this area?

- Recommended initial batch size (start small, validate, then scale)

2. **Rendering audit**: Confirm whether the app is a SPA and what SSR approach to use
3. **Strategy**: Opportunity analysis, chosen playbook(s), keyword patterns, data sources, page count estimate

4. **Shared infrastructure**: `ssrShared.ts` (or equivalent) HTML shell before any individual pages
5. **Template**: URL structure, title/meta templates, content outline, schema markup

6. **Implementation**: Working server-rendered pages that return full HTML to `curl`
7. **Integration**: Footer/nav updates in the main app linking to the new hub pages

8. **Validation plan**: How to verify pages are indexed, earning impressions, and when to scale or kill the campaign

## Limitations

- Cannot access Search Console data to monitor indexation
- Cannot check existing backlink profiles

- Data quality depends on the source — always validate before publishing
- Cannot guarantee rankings — SEO involves many factors beyond on-page optimization
