When 'Google can read JavaScript now' isn't enough: a teardown of a React SPA marketing site in 2026
A line-by-line teardown of a real AI-infrastructure company's React SPA marketing site. Empty raw HTML, soft 404s returning HTTP 200, sitewide duplicate meta descriptions, zero structured data — and why 'Google can render JavaScript now' is a half-truth that quietly costs you traffic.
If you're an engineering leader at a dev-tools company, an AI infrastructure platform, or an API-first SaaS, there's a good chance your marketing site is a React SPA — Vite, Create React App, or something similar — that your team built five-to-eight years ago, shipped, and never had a reason to revisit. Conversions weren't bad. The site loaded. Your engineers respected it because they wrote it. There was no fire.
Quietly, though, your marketing pages may not be in Google's index. Or they may be in the index in a degraded form that's worse than not being indexed at all. Either way, the searches that should bring buyers to your pricing page or your developer-experience post are going to your competitors instead — competitors who probably migrated to Next.js the same year you decided not to.
I want to walk through one site in detail to show what this actually looks like. Not to embarrass the team that built it — they built a real product and a real business — but because the failure mode is so widespread, and the specific shape of the failure so different from what most engineers expect, that the only way to make it concrete is to look at one site, in 2026, with current tools.
The site is beam.cloud, an AI-infrastructure platform out of New York. It's a real product with real paying customers (Magellan AI, Geospy, Hooktheory). It is, today, an unusually clear example of several common SEO failure modes that show up in older React SPA marketing sites.
Beam appears to be a legitimate and successful company with real customers and a real engineering team. The purpose of this teardown is not to criticize the product or the engineers who built it, but to illustrate a class of SEO failure modes that are surprisingly common in older React SPA marketing sites — and that are almost always more solvable than they look. I have no relationship with the company. Every finding below is reproducible in your browser in under a minute.
This class of failure — silent search-acquisition loss caused by a frontend architecture chosen years ago — is exactly what I help engineering teams diagnose and migrate. The important part, as you'll see at the end, is that the fix is narrowly scoped: only the public-marketing surfaces move, while the authenticated application stays exactly as it is.
The findings at a glance
I ran my diagnostic tool against beam.cloud on 2026-05-11. The tool uses headless Chromium to capture the raw HTML response (JavaScript disabled), parses meta tags and structured data, checks the sitemap and robots.txt, and runs Lighthouse multiple times per surface. Thirteen public URL templates were analyzed. None of them appear fully optimized for reliable search indexing.
In rough order of severity:
- No <h1> in the raw HTML of any page. The body of every route's initial response is an empty <div id="root">.
- Six of thirteen routes return a 404 • Beam title in raw HTML while responding HTTP 200. A textbook soft-404 pattern.
- All thirteen pages share an identical meta description. The homepage description, copied sitewide.
- No JSON-LD anywhere. Zero structured data on any surface, including 35+ technical blog posts.
- Seven of thirteen route templates are missing from sitemap.xml. Including the case-study pages — the most credibility-rich content on the site.
- Lighthouse measurements were unstable and consistently poor, with five surfaces exceeding the tool's measurement budget without producing metrics at all.
The rest of this post walks through each finding with the underlying evidence and explains what's actually happening — both in beam.cloud's stack and in how Google's indexing pipeline interacts with it.
The findings, top to bottom
1. There is no <h1> in the raw HTML of any page
When Google's crawler fetches https://www.beam.cloud, it gets back an HTML document of around 11 kilobytes. That document is identical across every route on the site: /, /pricing, /about, /blog, /customers. The body of the document contains essentially this:
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<script src="/static/js/main.[hash].js"></script>
</body>
That's it. No <h1>. No <p>. No headings. No content of any kind. Every page on the site renders its actual content client-side after JavaScript executes, inside that empty <div id="root">.
Google's rendering pipeline is capable of executing JavaScript — the search team has documented this thoroughly, and Martin Splitt from Google has been clear since at least 2019 that there is no longer a separate "first wave / second wave" indexer. But "Google can render JavaScript" and "Google reliably indexes your SPA exactly the way you'd expect" are very different statements. In practice, JavaScript-heavy sites introduce uncertainty around crawl timing, DOM capture timing, metadata extraction, and indexing consistency. The initial HTML response provides little meaningful body content for the crawler to extract before rendering. What it does have — title, meta description, canonical URL — comes only from the <head>. Everything else is contingent on rendering completing cleanly, in the right order, within the budget Google decides to grant your site.
That budget is not unlimited and not uniform. In practice, higher-authority domains appear to receive more consistent and timely rendering than smaller domains competing for crawl and rendering resources. For a 35-person AI-infra company competing with established players, you're on the smaller side of that allocation curve.
So when beam.cloud's homepage advertises "Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works" — that string appears nowhere in the raw HTML response. It appears only inside the React component tree that renders after JavaScript executes. Whether Google captures that string, and when, depends on a pipeline you don't observe and can't control.
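You can verify the raw-HTML claim yourself without a headless browser. Here's a minimal sketch, assuming Node 18+ for the built-in fetch; the hero string and URL are the ones quoted above, and the regex is a rough check rather than a full HTML parser:

```ts
// Fetch the homepage the way a non-rendering crawler would: one HTTP GET, no JS.
const HERO = "Run sandboxes, inference, and training";

async function checkRawHtml(url: string): Promise<void> {
  const res = await fetch(url, { headers: { "User-Agent": "Mozilla/5.0" } });
  const html = await res.text();

  // Is the visible marketing copy present before any JavaScript executes?
  console.log("hero copy in raw HTML:", html.includes(HERO));

  // Is the body just an empty SPA mount point? (Rough check, not a parser.)
  console.log("empty #root shell:", /<div id="root">\s*<\/div>/.test(html));
}

checkRawHtml("https://www.beam.cloud/").catch(console.error);
```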
2. Half the routes return a "404" title in raw HTML — while returning HTTP 200
This is the finding I wasn't expecting, and it's probably the most damaging single problem on the site.
Six of the thirteen routes — /careers, /contact, /docs, /help, /privacy, /terms — return the title 404 • Beam in their raw HTML. Not Careers • Beam. Not Privacy Policy • Beam. Just 404 • Beam.
What's happening is that beam.cloud uses react-router to handle routing client-side. When you request https://www.beam.cloud/privacy, the server returns the same 11-kilobyte SPA shell it returns for every route, with HTTP status 200 OK. The React app then loads, looks at the URL, decides there's no matching route, and renders the <NotFound /> component. That component sets document.title to 404 • Beam.
This is a textbook soft 404. The HTTP status says the page exists. The rendered page says it doesn't. Google's documentation has been explicit about this for over a decade: Google treats soft 404s as low-quality URLs and progressively de-prioritizes the site for crawling.
It gets worse. When Google's renderer does execute the page's JavaScript on these routes, what it captures is document.title — which by that point has been set to 404 • Beam. So even when rendering completes successfully on these surfaces, what gets indexed is a 404 title, not the rendered "Privacy Policy" or "Careers" content that should be there.
In other words: even the optimistic story about JavaScript SEO ("rendering will fix it") doesn't hold for these six routes. Rendering runs, captures the title, and confirms the page is a 404. These aren't obscure routes. They include /careers, /contact, and pages adjacent to /pricing — exactly the URLs a buyer or candidate would search for.
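I don't have access to Beam's source, so the following is a hypothetical reconstruction rather than their actual code, but the mechanism is generic to client-routed SPAs: the server returns the same shell with HTTP 200 for every path, and only the client decides whether the route exists.

```tsx
import { useEffect } from "react";
import { Routes, Route } from "react-router-dom";

// Stub pages, just to keep the sketch self-contained.
const Home = () => <h1>Beam</h1>;
const Pricing = () => <h1>Pricing</h1>;

function NotFound() {
  useEffect(() => {
    // Google's renderer captures whatever document.title is at snapshot time.
    // The HTTP status was already sent as 200; nothing here can change it.
    document.title = "404 • Beam";
  }, []);
  return <h1>Page not found</h1>;
}

export default function App() {
  return (
    <Routes>
      <Route path="/" element={<Home />} />
      <Route path="/pricing" element={<Pricing />} />
      {/* /privacy, /careers, /contact, etc. have no matching route, so they
          fall through to the catch-all and render as a "404" on a 200 response. */}
      <Route path="*" element={<NotFound />} />
    </Routes>
  );
}
```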
3. Every page has the same meta description
The raw HTML for every page on beam.cloud — every single one of the thirteen surfaces — contains the same 130-character meta description:
"Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works."
This is the homepage description. It appears on /pricing, on /about, on /customers/geospy-case-study. Every page's "what is this page about" hint is identical.
Google's official guidance treats site-wide duplicate meta descriptions as a quality signal: they tell Google your pages aren't differentiated, and Google responds by reducing the weight given to the description in snippet generation. In practice, that often means Google generates a snippet from whatever's in the body of the page — which on beam.cloud is empty — so it falls back to showing just the URL and title.
If you search site:beam.cloud today, you can see the result of this. Every snippet shows the same meta description, regardless of the page being indexed. The user clicking through has no way to tell what each page is actually about.
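The duplication is also easy to confirm directly from the raw responses. A quick sketch, assuming Node 18+; the regex is a rough extraction that assumes the name attribute comes before content, so treat it as a spot check rather than a crawler:

```ts
// Compare the meta description across a handful of routes from the findings above.
const routes = ["/", "/pricing", "/about", "/customers/geospy-case-study"];

async function descriptionFor(path: string): Promise<string | undefined> {
  const html = await (await fetch(`https://www.beam.cloud${path}`)).text();
  return html.match(/<meta name="description" content="([^"]*)"/)?.[1];
}

for (const path of routes) {
  descriptionFor(path).then((desc) => console.log(path, "=>", desc));
}
```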
4. Zero structured data anywhere
There is no JSON-LD on any page of beam.cloud. Zero. No Organization schema. No WebSite. No Product for the platform. No Article schema on the blog posts. No FAQPage on /help. No BreadcrumbList.
Structured data is how a modern SEO site tells Google "this is a software product, here are its features, here's its pricing, this article is by this author, this page is a step-by-step guide." It's how rich snippets get generated. It's how AI search engines (ChatGPT browsing, Perplexity, Claude) extract entities from your page to surface in answers.
Beam has 35+ blog posts at beam.cloud/blog/* covering topics like "CUDA vs Tensor Cores," "Fine-tuning Llama 3," "Faster Whisper" — exactly the kind of high-intent technical content a buyer in the AI-infra space would be searching for. None of those posts have Article schema. None of them tell Google who wrote them, when they were published, or what topic cluster they belong to. They're competing with structured-data-enriched competitor blog posts and losing.
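For reference, this is roughly what Article markup looks like when emitted as a React component. The field values are illustrative, not Beam's actual metadata, and it only helps if the script tag ends up in the server-rendered HTML rather than being injected client-side after load:

```tsx
// A minimal Article JSON-LD component, following the schema.org/Article shape.
type ArticleMeta = {
  title: string;
  description: string;
  authorName: string;
  datePublished: string; // ISO 8601, e.g. "2026-01-15"
  url: string;
};

export function ArticleJsonLd({ meta }: { meta: ArticleMeta }) {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: meta.title,
    description: meta.description,
    author: { "@type": "Person", name: meta.authorName },
    datePublished: meta.datePublished,
    mainEntityOfPage: meta.url,
  };
  return (
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
    />
  );
}
```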
5. Seven of thirteen routes are missing from the sitemap
Beam has a sitemap.xml. It lists fifty-three URLs. But seven of the thirteen route templates that the SPA renders are not in that sitemap — including all six of the soft-404 routes above, plus /customers/[slug] (which has at least three real case-study pages).
This means Google has no way to discover the case-study pages except by crawling internal links — and the internal links live inside a JavaScript-rendered DOM, which the initial crawl doesn't see. So customers/geospy-case-study, customers/hooktheory-case-study, and customers/magellan-ai-case-study are much harder for Google to consistently discover and index. These are the most credibility-rich pages on the entire site — real customer stories with real results.
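The durable fix is to generate the sitemap from the same data that generates the pages, so the two can't drift apart. As a sketch of what that can look like in a Next.js App Router project (app/sitemap.ts is the framework's convention; getCaseStudySlugs is a hypothetical helper standing in for wherever your case studies actually live):

```ts
import type { MetadataRoute } from "next";
import { getCaseStudySlugs } from "@/lib/case-studies"; // hypothetical helper

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const base = "https://www.example.com";
  const staticRoutes = ["", "/pricing", "/about", "/careers", "/privacy", "/terms"];

  // e.g. ["geospy-case-study", "hooktheory-case-study", ...]
  const caseStudies = await getCaseStudySlugs();

  return [
    ...staticRoutes.map((path) => ({ url: `${base}${path}` })),
    ...caseStudies.map((slug: string) => ({ url: `${base}/customers/${slug}` })),
  ];
}
```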
6. Lighthouse measurements were unstable and consistently poor
I'll mention this last because it's the least surprising finding and the one most likely to be misread.
The most telling part of the Lighthouse data is what's missing from it. On five of the thirteen surfaces — /help, /pricing, /privacy, /terms, and one of the docs templates — Lighthouse exhausted its measurement budget without producing metrics at all. When a tool that exists specifically to measure page performance gives up before producing a number, that is itself a finding. It tells you the page is so slow to reach an interactive state that synthetic measurement doesn't terminate cleanly within reasonable bounds.
On the surfaces where Lighthouse did complete, the results were poor and unstable. LCP exceeded the recommended 2.5-second threshold by an order of magnitude. CLS reached 1.71 on /blog/[slug] (threshold: 0.1). Performance scores landed between 0.06 and 0.28 out of 1.0.
I want to be careful with these numbers, though. Lighthouse measures the rendered experience, not the SEO impact. Even if every Core Web Vitals number on beam.cloud was perfect, the structural problems above would still be there. The page that scores 100 on Lighthouse but has no <h1> and a 404 title is just as invisible to Google as the page that scores 6. The CWV findings are an additional symptom of the same root cause — a heavyweight JavaScript bundle rendering everything client-side — but they're not the SEO story. The SEO story is the raw-HTML story.
I include them only because they tell you something about user experience: a buyer who clicks through from a search result is going to wait many seconds for the page to become usable. Most of them won't.
So what does this mean if your site looks similar?
If you're reading this and thinking "wait, this might be us," here's the short list of things to check, in roughly this order of severity:
1. View raw HTML on your homepage — actually view-source, or curl -A "Mozilla/5.0" https://yoursite.com/. If the body of the document is <div id="root"></div> and a couple of script tags, you have finding #1. The fix is server-side rendering for at least your marketing routes.
2. View raw HTML on a non-existent route — try https://yoursite.com/this-page-does-not-exist. If the response is HTTP 200 with your SPA shell and an unrelated title, you have soft 404s. If it's HTTP 404 with the same shell, that's still bad — Google has to render the page to discover it's a 404, which costs you crawl budget — but it's recoverable. If it's HTTP 404 with a server-rendered 404 page, you're fine on this finding.
3. Check site:yoursite.com in Google. Look at the snippets. If every snippet shows the same description, you have finding #3. If your case-study pages or pricing page aren't appearing at all, you may have finding #5 (sitemap incomplete) or finding #1 (raw HTML empty) or both.
4. Check structured data. View source on any page and search for application/ld+json. If there's nothing, that's finding #4. The fix is to inject JSON-LD into your page templates — every modern framework supports this. (A runnable sketch of checks 2 and 4 follows this list.)
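For checks 2 and 4, here's a minimal runnable version, assuming Node 18+ and substituting your own domain:

```ts
const SITE = "https://www.example.com"; // replace with your own site

async function probe(): Promise<void> {
  // Check 2: a made-up path should return HTTP 404, not 200 with the SPA shell.
  const missing = await fetch(`${SITE}/this-page-does-not-exist`);
  console.log("non-existent route status:", missing.status);

  // Check 4: does any JSON-LD ship in the raw HTML of the homepage?
  const home = await (await fetch(SITE)).text();
  console.log("has JSON-LD:", home.includes("application/ld+json"));
}

probe().catch(console.error);
```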
The Lighthouse score is the least important thing on this list. A buyer who lands on a slow page can still choose to be patient with it; a buyer who never reaches the page because it isn't indexed never gets the chance.
Why this happens — and why "Google can read JavaScript now" is a half-truth
I want to address the most common pushback I get when I describe these problems, which is: "We solved this in 2018. Google can render JavaScript now."
It's a half-truth.
The literal claim is correct: Google's rendering pipeline does execute JavaScript, and the old "two-wave" mental model where rendering was a wholly separate downstream stage has been officially retired. What hasn't gone away — and what the simplified version of the claim glosses over — is that rendering JavaScript-heavy sites introduces uncertainty at several points:
- Rendering is rationed by domain authority and crawl budget. Google does not have infinite Chromium instances. Higher-authority domains get rendered quickly and frequently. Lower-authority domains get rendered slowly, partially, or with stale results. If you're a smaller engineering-led company competing with established players, you're on the smaller side of that allocation.
- DOM capture is timing-dependent. The renderer captures the DOM at some point during page load. If your content loads asynchronously after that point — fetched from an API, hydrated late, lazy-rendered — it may not be captured in the indexed version.
- What gets captured is whatever exists at render time, regardless of correctness. This is the trap that catches beam.cloud's soft-404 pages. By the time the renderer captures the DOM, react-router has already set document.title = "404 • Beam". Rendering ran successfully. What it captured was wrong.
- Metadata extraction happens against the rendered DOM, not your component tree. If your title, canonical URL, or structured data are injected late in the JavaScript lifecycle, what Google extracts can be different from what your developers expect to see when they view the page.
The "Google can read JavaScript" claim is true the way "I can lift 200 pounds" is true when what I mean is that I can do it sometimes, under the right conditions, once my back is warmed up.
What you do about it
You move the public-facing routes — and only the public-facing routes — onto server-side rendering. Next.js is the dominant choice in the React ecosystem because it's specifically designed for this hybrid: the public marketing surfaces are server-rendered for SEO, while the authenticated application keeps running as the SPA you already have.
The migration doesn't have to be all-or-nothing. The hybrid pattern — Next.js in front, the existing SPA continuing to serve authenticated routes — is the cheapest path through. You keep your component tree. You keep your routing logic for the app. Your engineers don't rewrite product logic. They don't touch the parts of the codebase that actually deliver value to logged-in users. The marketing layer — and only the marketing layer — moves onto a renderer that gives Google what it actually needs: real HTML, with real content, on the first request.
What that looks like in practice, mapped against the six findings above:
- Every public route returns real <h1> and body copy in raw HTML before any JavaScript executes.
- Non-existent routes return HTTP 404 with a server-rendered 404 page — soft 404s become impossible.
- Each surface has its own per-route title and meta description, generated at build or request time from the underlying content.
- JSON-LD is injected at render time, deterministically, on every page where it applies.
- The sitemap is generated from the same route definitions that produce the pages, so it cannot drift out of sync.
- The surfaces that previously timed out under Lighthouse measurement reach interactive state in well under a second on a cold load.

None of this depends on Google's render queue cooperating; the HTML is correct on the first byte.
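As a sketch of the wiring, under the assumption of a Next.js App Router front with the existing SPA staying on its current host (hostnames and routes below are placeholders, not anyone's production config):

```js
// next.config.js — anything Next.js doesn't serve falls through to the existing SPA.
module.exports = {
  async rewrites() {
    return {
      beforeFiles: [],
      afterFiles: [],
      fallback: [
        // Authenticated app routes keep hitting the legacy SPA, untouched.
        { source: "/:path*", destination: "https://legacy-spa.example.com/:path*" },
      ],
    };
  },
};
```

```tsx
// app/pricing/page.tsx — a marketing route that ships real HTML and its own metadata.
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Pricing • Example",
  description: "A description written for this page, not copied from the homepage.",
};

export default function PricingPage() {
  return (
    <main>
      <h1>Pricing</h1>
      {/* Real body copy lands in the raw HTML before any JavaScript executes. */}
    </main>
  );
}
```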
That's the work I do. If your site looks anything like beam.cloud's, I run the same diagnostic against your specific site, produce a PDF report identifying every surface that needs migration, and quote a fixed-fee project to migrate the public surfaces while leaving your existing app untouched.
The diagnostic is free. Most teams I talk to are surprised by what shows up — particularly the soft-404 finding, which almost no one has seen before. If you'd like to know what your site looks like to Google in 2026, the service page at richardwrobinson.com/react-spa-seo-migration explains how the engagement works and has a 15-minute call booking right at the top.
The diagnostic tool used to generate the findings above is private, but the underlying methodology — headless Chromium with JavaScript disabled, real-Chromium Lighthouse runs, sitemap and robots.txt validation — is reproducible. If you want to verify any of the beam.cloud findings yourself, view-source on the URL and look at the body. It will take about thirty seconds.