Step 1: Export and Clean Your GSC Query Data
Every content plan we build starts with what Google already knows about our site. Google Search Console (GSC) holds the raw demand signals: which queries trigger impressions for our pages, how often people click, and where we rank. Before we can cluster topics or prioritize what to write, we need a clean dataset to work from.
Which Date Range to Pull
We default to the last 90 days. This window balances recency with enough data volume to smooth out weekly fluctuations. If our site operates in a seasonal niche (think tax preparation or holiday gifting), we extend the range to capture one full cycle, or we pull the same 90-day window from the prior year for comparison.
Getting Past the 1,000-Row UI Cap
The GSC web interface caps exports at 1,000 rows. For most sites, that misses the long-tail queries where cluster opportunities hide. Two ways around this:
- GSC API export using a script or connector (Google Sheets add-ons work for smaller sites).
- BigQuery bulk export, which Google offers natively from the GSC settings panel. This removes the row limit, although anonymized (privacy-filtered) queries are still not itemized.
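For the API route, the core of any export script is a paging loop. The sketch below is one way to write it, not a definitive implementation: `fetch_all_rows` and `run_query` are hypothetical names, and `fake_run_query` stands in for the real call so the example runs offline. With google-api-python-client, `run_query` would typically wrap `service.searchanalytics().query(siteUrl=..., body=body).execute()`. The page size of 25,000 reflects the documented per-request maximum as we understand it; verify it against the current API reference.

```python
from typing import Callable, Dict, List


def fetch_all_rows(run_query: Callable[[Dict], Dict],
                   start_date: str, end_date: str,
                   page_size: int = 25000) -> List[Dict]:
    """Page through Search Analytics results until a short or empty page returns."""
    rows: List[Dict] = []
    start_row = 0
    while True:
        body = {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["query"],
            "rowLimit": page_size,
            "startRow": start_row,
        }
        # The API omits the "rows" key entirely when there is nothing left.
        page = run_query(body).get("rows", [])
        rows.extend(page)
        if len(page) < page_size:
            break
        start_row += page_size
    return rows


def fake_run_query(body: Dict) -> Dict:
    """Offline stand-in (assumption): pretends the property has five queries."""
    total = 5
    lo = body["startRow"]
    hi = min(lo + body["rowLimit"], total)
    return {"rows": [{"keys": [f"query {i}"], "clicks": i, "impressions": i * 10}
                     for i in range(lo, hi)]}


all_rows = fetch_all_rows(fake_run_query, "2024-01-01", "2024-03-31", page_size=2)
print(len(all_rows))
```

Passing the query function in as a parameter keeps the paging logic testable without credentials.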
Filtering Out Branded Queries
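Branded queries (searches containing our company name or product names) tell us about brand demand, not topic demand. We filter these out using GSC's built-in query filter. For a single brand term, set it to "Queries not containing." To exclude several brand terms or common misspellings at once, use the custom regex option with "Doesn't match regex."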
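When the data comes from an API or BigQuery export instead of the UI, the same filter has to be applied to the exported rows. A minimal sketch, assuming a plain list of query strings; the function name and the brand term "acme" are hypothetical:

```python
import re
from typing import Iterable, List


def drop_branded(queries: Iterable[str], brand_terms: Iterable[str]) -> List[str]:
    """Return only the queries that contain none of the brand terms."""
    escaped = [re.escape(t.strip()) for t in brand_terms if t.strip()]
    if not escaped:
        return list(queries)  # nothing to filter on
    # Whole-word, case-insensitive match so "acme" does not hit "acmeology".
    pattern = re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)
    return [q for q in queries if not pattern.search(q)]


queries = [
    "acme crm pricing",
    "best crm for startups",
    "Acme login",
    "crm migration checklist",
]
non_branded = drop_branded(queries, ["acme"])
print(non_branded)
```

Misspellings of the brand will slip through a whole-word match, so it helps to add known variants to the term list.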
Removing Noise
Not every query in the export belongs in a content plan. We remove:
- Zero-click queries where the SERP feature (calculator, definition box) satisfies the user without a click. If a query has thousands of impressions but a CTR below 0.5% across all positions, it likely falls here.
- Irrelevant impressions where GSC matched our page to a query we have no business ranking for.
- Duplicate variants like singular/plural or slight reorderings ("best running shoes" vs. "running shoes best"). We keep the variant with the most impressions and note the others as synonyms.
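Two of these rules can be roughed out in code; irrelevant impressions still need a human eye. The sketch below is illustrative only. The singularization is deliberately crude (it strips a trailing "s"), the 1,000-impression floor is our stand-in for "thousands of impressions," and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def variant_key(query: str) -> Tuple[str, ...]:
    """Order-insensitive, crudely singularized key for spotting variants."""
    tokens = []
    for tok in query.lower().split():
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        tokens.append(tok)
    return tuple(sorted(tokens))


def merge_variants(rows: List[Dict]) -> List[Dict]:
    """Keep the highest-impression variant; record the others as synonyms."""
    best: Dict[Tuple[str, ...], Dict] = {}
    for row in rows:
        key = variant_key(row["query"])
        if key not in best:
            best[key] = {**row, "synonyms": []}
        elif row["impressions"] > best[key]["impressions"]:
            old = best[key]
            best[key] = {**row, "synonyms": old["synonyms"] + [old["query"]]}
        else:
            best[key]["synonyms"].append(row["query"])
    return list(best.values())


def is_likely_zero_click(row: Dict, min_impressions: int = 1000,
                         max_ctr: float = 0.005) -> bool:
    """High impressions with a CTR under 0.5% suggests the SERP answers the query."""
    return row["impressions"] >= min_impressions and row["ctr"] < max_ctr


rows = [
    {"query": "best running shoes", "impressions": 900, "ctr": 0.044},
    {"query": "running shoes best", "impressions": 120, "ctr": 0.025},
    {"query": "mortgage calculator", "impressions": 4000, "ctr": 0.00125},
]
merged = merge_variants(rows)
kept = [r for r in merged if not is_likely_zero_click(r)]
print(kept)
```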
Final Output
A clean spreadsheet with five columns: Query, Clicks, Impressions, Average Position, and CTR. This becomes the foundation for every step that follows.
Step 2: Tag Every Query by Search Intent
Intent determines what type of page a query needs. A query asking "what is topic clustering" needs an educational article. A query like "best topic cluster tool" needs a comparison page. Mixing these on one page dilutes relevance for both.
Four Intent Categories
We use a simple taxonomy:
- Informational: The searcher wants to learn. Modifiers include "how to," "what is," "guide," "examples," "why does."
- Commercial investigation: The searcher is comparing options before buying. Modifiers include "best," "vs," "review," "top," "alternative to."
- Transactional: The searcher is ready to act (buy, sign up, download). Modifiers include "pricing," "buy," "discount," "free trial," "login."
- Navigational: The searcher wants a specific site or page. Usually contains a brand name or exact URL fragment.
How Intent Determines Hub vs. Spoke Assignment
Hub pages typically align with broad informational or commercial investigation intent (e.g., "topic cluster strategy"). Spoke pages target narrower informational queries (e.g., "how to identify hub topics from GSC data") or specific commercial queries (e.g., "best tools for SERP overlap analysis").
Practical Tagging Rules
We scan the modifier words in each query and assign intent. For ambiguous queries, we check the actual SERP: if Google shows mostly blog posts, it is informational. If it shows comparison lists or reviews, it is commercial investigation. If it shows product pages, it is transactional.
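The modifier scan is easy to automate as a first pass. This is a sketch under assumptions: the modifier lists come from the taxonomy above, but the precedence order (brand first, then transactional, commercial, informational) and the "unclassified" fallback are our own choices. Anything left unclassified goes to the manual SERP check.

```python
import re
from typing import Iterable

# Checked in order; the first matching category wins (assumed precedence).
INTENT_MODIFIERS = [
    ("transactional", ["pricing", "buy", "discount", "free trial", "login"]),
    ("commercial investigation", ["best", "vs", "review", "top", "alternative to"]),
    ("informational", ["how to", "what is", "guide", "examples", "why does"]),
]


def tag_intent(query: str, brand_terms: Iterable[str] = ()) -> str:
    q = query.lower()

    def has(term: str) -> bool:
        # Whole-word match so "top" does not fire on "topic".
        return re.search(r"\b" + re.escape(term.lower()) + r"\b", q) is not None

    if any(has(b) for b in brand_terms):
        return "navigational"
    for intent, modifiers in INTENT_MODIFIERS:
        if any(has(m) for m in modifiers):
            return intent
    return "unclassified"  # needs a manual SERP check


print(tag_intent("what is topic clustering"))
print(tag_intent("best topic cluster tool"))
```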
Flagging Homepage Ranking Signals
If our homepage ranks for a query instead of a dedicated page, that is a content gap. The homepage is a generalist page. A dedicated spoke would almost always outperform it for a specific query. We flag these rows in our spreadsheet for priority content creation.
Final Output
Our spreadsheet now has a sixth column: Intent (informational, commercial investigation, transactional, or navigational).
Step 3: Group Queries into Topic Clusters Using SERP Overlap
This is where the cluster structure starts to form. The question we are answering: which queries should live on the same page, and which need their own page?
Why SERP Overlap Determines Grouping
Search volume alone does not tell us whether two queries need separate pages. Google does. If Google ranks the same pages for two different queries, it is treating them as the same topic. If Google ranks completely different pages, it sees them as distinct topics requiring their own content.
Example: "how to export GSC data" and "download search console queries" might look like separate topics. But if three or more of the top five results are identical for both queries, Google considers them one topic. We merge them into a single spoke.
Manual SERP Overlap Method
- Take two queries from the same general subject area.
- Search both in an incognito browser.
- Compare the top five organic results.
- If three or more URLs appear in both SERPs, merge the queries into one cluster entry (one page should target both).
- If fewer than three overlap, keep them as separate spokes.
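Once the top results for each query are pasted into two lists, the comparison itself is a set intersection. A minimal sketch; the function names are hypothetical, the URLs are placeholders, and stripping the trailing slash is our own normalization choice:

```python
from typing import List


def serp_overlap(urls_a: List[str], urls_b: List[str], top_n: int = 5) -> int:
    """Count URLs that appear in the top N results of both SERPs."""
    def norm(url: str) -> str:
        return url.rstrip("/")

    top_a = {norm(u) for u in urls_a[:top_n]}
    top_b = {norm(u) for u in urls_b[:top_n]}
    return len(top_a & top_b)


def should_merge(urls_a: List[str], urls_b: List[str], min_shared: int = 3) -> bool:
    """Three or more shared URLs in the top five means one page targets both queries."""
    return serp_overlap(urls_a, urls_b) >= min_shared


serp_a = ["https://example.com/a", "https://example.org/b/", "https://example.net/c",
          "https://example.com/d", "https://example.org/e"]
serp_b = ["https://example.org/b", "https://example.com/a/", "https://example.net/x",
          "https://example.net/c", "https://example.com/y"]
print(serp_overlap(serp_a, serp_b), should_merge(serp_a, serp_b))
```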
Lightweight NLP Alternative
For larger query sets (500+), manual checks become impractical. We can group queries by semantic similarity using free tools. The approach: convert each query into an embedding (a numerical representation of meaning), then group queries whose embeddings are highly similar. Python libraries like sentence-transformers handle this without paid API costs. The output is rough clusters that we then validate with a few manual SERP checks per group.
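The grouping step can be sketched in plain Python once the embeddings exist. The toy vectors below are made up so the example runs without downloading a model. With sentence-transformers installed, they could instead come from something like `SentenceTransformer("all-MiniLM-L6-v2").encode(queries)`, where the model choice is an assumption on our part. The greedy strategy and the 0.75 threshold are also assumptions to tune, not recommendations from any library.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(queries: List[str], vectors: List[Sequence[float]],
                        threshold: float = 0.75) -> List[List[str]]:
    """Greedy grouping: join the first group whose seed vector is similar enough."""
    groups = []  # each entry: (seed_vector, [member queries])
    for query, vec in zip(queries, vectors):
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(query)
                break
        else:
            groups.append((vec, [query]))
    return [members for _, members in groups]


queries = ["how to export gsc data", "download search console queries",
           "best running shoes"]
toy_vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.95]]
clusters = group_by_similarity(queries, toy_vectors)
print(clusters)
```

Greedy grouping depends on input order, which is one reason the rough clusters still need manual SERP validation.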
Naming Each Cluster
Once queries are grouped, we name each cluster after the broadest query in the group (usually the one with the highest impression volume and the widest informational intent). This becomes our candidate hub topic.
Example: If a group contains "content cluster strategy," "how to build topic clusters," "topic cluster template," and "hub and spoke SEO," the hub topic might be "Topic Cluster Strategy" because it is the broadest framing.
How Many Spokes Per Hub
We aim for 5 to 12 spokes per hub as a working target. Fewer than 5 means the cluster may not have enough supporting content to build topical authority. More than 12 and we risk spreading internal links too thin or forcing artificial subtopics. The actual number depends on how many distinct queries our GSC data and later research steps surface.
Final Output
A cluster map document listing each hub topic with its associated spoke topics beneath it. Each spoke includes the merged queries it targets and the intent tag from Step 2.
Step 4: Validate and Expand Clusters with Reddit Questions
GSC data shows us what people already search for. Reddit shows us what people struggle with, argue about, and ask in their own unfiltered language. This step fills subtopic gaps and gives us phrasing we can use to match real user pain.
Selecting Subreddits
We look for subreddits that meet three criteria:
- Minimum 50,000 members (ensures enough activity to identify patterns, not just one-off questions).
- Niche-relevant to our topic clusters.
- Active question threads (weekly Q&A posts, frequent "help me" flair, or recurring beginner threads).
What to Extract
We are looking for:
- Recurring pain points: Questions that appear multiple times in different threads signal persistent demand.
- Objections: "I tried X but it didn't work because..." tells us what existing content fails to address.
- Exact phrasing: The words real people use often differ from keyword tool suggestions. These become headline candidates and subheadings.
- Emotional language: Frustration, confusion, and urgency signals help us score pain intensity later in Step 6.
Mapping Reddit Findings to Existing Clusters
Most Reddit questions will map to a cluster we already have. They fill in subtopic angles we had not considered. For example, our cluster on "GSC data export" might gain a spoke idea like "how to handle GSC data discrepancies" because multiple Reddit threads express confusion about why query-level clicks do not match page-level totals.
Some Reddit findings will not fit any existing cluster. These become candidates for entirely new spokes or even new hub topics, depending on breadth.
Avoiding Over-Reliance on Anecdotal Data
A question appearing five times on Reddit does not guarantee search demand. It might be a niche frustration that nobody searches for in Google. Every Reddit-sourced idea still needs a search demand check: does the topic or a close variant show meaningful impressions in GSC, or does a keyword research tool confirm monthly search volume?
Quick Validation Check
Before adding a Reddit-sourced spoke to our cluster map, we verify:
- Does a related query appear in our GSC data (even with low impressions)?
- If not in GSC, does a keyword tool show any monthly volume for the topic?
- Is existing SERP competition low enough that we could realistically rank?
If all three answers are "no," we park the idea in a backlog for future reassessment rather than wasting a publishing slot on it.
Final Output
New spoke ideas appended to the relevant clusters in our cluster map, each tagged with "Reddit-sourced" so we can track performance of this validation method over time.
Step 5: Run a Competitor Gap Check to Find Missing Subtopics
Our GSC data shows what we already rank for. Reddit shows what people ask. Competitors show what is already working for someone else that we are missing entirely.
Selecting Comparison Targets
We pick 2 to 3 direct competitors (sites targeting the same audience with similar offerings) plus one unexpected SERP competitor. The unexpected one is a site that keeps appearing in our target SERPs but is not an obvious business competitor (maybe a media site, a community wiki, or an adjacent niche blog). They often reveal subtopics our industry peers also miss.
Running a Keyword Gap Report
Most SEO platforms offer a gap analysis feature where we input our domain and competitor domains, and the tool surfaces queries where competitors rank but we do not. We do not need a specific tool for this; the methodology is the same across platforms.
We export the results and focus on queries where:
- At least one competitor ranks in positions 1 through 20.
- We either do not rank at all or rank below position 50.
Filtering for Opportunity
For sites in an authority-building phase, we filter further:
- Keyword difficulty under 30 (or equivalent low-competition indicator in whatever tool we use). This ensures we target subtopics where a well-structured spoke can realistically rank without needing extensive backlinks.
- Minimum search volume threshold to avoid targeting queries with negligible demand.
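These filters translate directly into a row predicate. The sketch assumes the gap report has been exported to rows with the column names shown, which are hypothetical and will differ by tool. The volume floor of 50 is a placeholder, since the right threshold depends on the niche.

```python
from typing import Dict, List, Optional


def is_gap_opportunity(row: Dict, max_difficulty: int = 30,
                       min_volume: int = 50) -> bool:
    """Competitor in the top 20, we are absent or beyond 50, low difficulty, real demand."""
    our_position: Optional[float] = row["our_position"]
    competitor_ranks = row["competitor_best_position"] <= 20
    we_are_absent = our_position is None or our_position > 50
    return (competitor_ranks and we_are_absent
            and row["difficulty"] < max_difficulty
            and row["volume"] >= min_volume)


gap_rows: List[Dict] = [
    {"query": "serp overlap tool", "competitor_best_position": 6,
     "our_position": None, "difficulty": 18, "volume": 210},
    {"query": "seo software", "competitor_best_position": 3,
     "our_position": 72, "difficulty": 81, "volume": 9900},
    {"query": "cluster map template", "competitor_best_position": 14,
     "our_position": 12, "difficulty": 12, "volume": 140},
]
opportunities = [r["query"] for r in gap_rows if is_gap_opportunity(r)]
print(opportunities)
```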
Distinguishing Competitor Gaps from Cluster Gaps
A competitor gap means they have content we do not. A cluster gap means no one covers the subtopic well. We distinguish by checking the quality of the top-ranking content for the gap query. If existing pages are thin, outdated, or off-topic, that is a cluster gap with extra opportunity.
Decision Rule for Existing vs. New Content
- Average rank 30 to 40 for a related query: We likely have a page that touches this topic but does not address it thoroughly. Update the existing page rather than creating a new one.
- Average rank 50+ or no ranking at all: Create a new spoke. Our existing content is not close enough for Google to consider it relevant.
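Expressed as a function, the rule looks like the sketch below. The rule above only names two bands, so positions outside them (better than 30, or between 40 and 50) are routed to manual review here. That fallback is our assumption, not part of the rule.

```python
from typing import Optional


def gap_action(our_avg_position: Optional[float]) -> str:
    """Map our average rank for a gap query to a content action."""
    if our_avg_position is None or our_avg_position >= 50:
        return "create new"
    if 30 <= our_avg_position <= 40:
        return "update existing"
    return "review manually"  # band not covered by the rule (assumption)


for position in (None, 35, 62, 45):
    print(position, "->", gap_action(position))
```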
Final Output
Gap-sourced spokes added to our cluster map with a "competitor-gap" source tag. Each entry notes whether the action is "update existing" or "create new."
Step 6: Score and Rank Every Topic with the Four-Factor Prioritization Rubric
We now have a cluster map full of potential spokes from three sources: GSC queries, Reddit research, and competitor gaps. We cannot publish everything at once. This rubric helps us decide what to write first.
Factor 1: Impressions Trend (Momentum)
We compare impressions for the query (or query cluster) in the last 30 days versus the prior 30 days.
- 40% or greater month-over-month growth: Score 3 (high priority; rising demand).
- 10% to 39% growth: Score 2 (moderate; topic is gaining traction).
- Below 10% growth, flat, or declining: Score 1 (stable or fading; lower urgency).
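The momentum score is a percentage-change calculation with two cut-offs. A minimal sketch: growth below 10% falls through to score 1, and the handling of queries with no prior-period impressions (treated as rising if they have any now) is an assumption we chose for illustration.

```python
def momentum_score(last_30_days: int, prior_30_days: int) -> int:
    """Score 3 for growth of 40% or more, 2 for 10% to under 40%, else 1."""
    if prior_30_days <= 0:
        # Assumption: a query that is newly earning impressions counts as rising.
        return 3 if last_30_days > 0 else 1
    growth = (last_30_days - prior_30_days) / prior_30_days
    if growth >= 0.40:
        return 3
    if growth >= 0.10:
        return 2
    return 1


print(momentum_score(1400, 1000))  # 40% growth
print(momentum_score(1200, 1000))  # 20% growth
print(momentum_score(900, 1000))   # declining
```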
Why this matters: publishing into rising demand means the content matures (gains authority, earns links) just as search volume peaks.
Factor 2: Position Band
Where we currently rank for the target query determines the effort required:
- Positions 4 to 10: Score 3. These are quick wins. A content refresh, better on-page optimization, or a dedicated spoke can push us into the top three. Minimal new content needed.
- Positions 11 to 15: Score 2. A focused push with a dedicated spoke or significant page update can move us onto page one.
- Positions 16 to 20: Score 1. We likely need a new, more targeted page. The existing content is not specific enough.
- No ranking (new topic): Score 1. Full new content creation required.
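The bands map to a short lookup. GSC reports average position as a decimal, so the sketch treats each band's upper bound as inclusive (10.4 lands in the 11 to 15 band). Two cases the rubric does not cover, positions better than 4 and positions beyond 20, are scored 1 here as an assumption.

```python
from typing import Optional


def position_band_score(avg_position: Optional[float]) -> int:
    """Score 3 for positions 4 to 10, 2 for 11 to 15, otherwise 1."""
    if avg_position is None:
        return 1  # no ranking: new topic
    if 4 <= avg_position <= 10:
        return 3
    if 10 < avg_position <= 15:
        return 2
    return 1  # 16 to 20, plus the uncovered cases (assumption)


for position in (7.2, 12.8, 18.0, None):
    print(position, "->", position_band_score(position))
```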
Factor 3: Pain Intensity
This score draws from our Reddit research:
- High (Score 3): The question appears frequently (5+ threads), uses emotional or urgent language ("desperate," "nothing works," "need this yesterday"), and threads remain unresolved.
- Medium (Score 2): The question appears 2 to 4 times, with moderate frustration.
- Low (Score 1): The question appears only once, casual tone, easily resolved.
Binary qualifier: Is the problem urgent enough that someone would click today? If the answer is no (purely hypothetical or academic interest), we cap the pain score at 1 regardless of frequency.
Factor 4: Internal-Link Potential
Can we place 2 to 3 contextual internal links from existing published pages on the day we publish this new spoke?
- Yes, easily (Score 3): We have existing hub and spoke pages with relevant paragraphs where a link fits naturally.
- Partially (Score 2): We have one page that could link, but would need to stretch context.
- No (Score 1): No existing content mentions this subtopic. The spoke would launch as an orphan page.
Why this matters: orphan pages (pages with no internal links pointing to them) struggle to get crawled and indexed quickly. If we cannot link to a spoke on day one, we either build supporting content first or delay publication.
Combining Scores into a Ranked Backlog
We weight the four factors based on what drives results fastest:
| Factor | Weight |
|---|---|
| Impressions Trend | 2x |
| Position Band | 2x |
| Pain Intensity | 1.5x |
| Internal-Link Potential | 1.5x |
Example calculation:
A spoke with Momentum 3, Position Band 3, Pain Intensity 2, Internal-Link Potential 3:
(3 × 2) + (3 × 2) + (2 × 1.5) + (3 × 1.5) = 6 + 6 + 3 + 4.5 = 19.5
A spoke with Momentum 1, Position Band 1, Pain Intensity 3, Internal-Link Potential 1:
(1 × 2) + (1 × 2) + (3 × 1.5) + (1 × 1.5) = 2 + 2 + 4.5 + 1.5 = 10
Tiering the Backlog
- High priority (publish weeks 1 to 2): Scores 16 and above. These combine rising demand, a realistic ranking path, validated pain, and ready-to-deploy internal links.
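- Medium priority (weeks 3 to 4): Scores from 12 up to, but not including, 16. The 1.5x weights produce half-point totals such as 15.5, which belong in this tier.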
- Lower priority (next monthly cycle): Scores below 12.
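The weighting and tiering can be sketched in a few lines. The dictionary keys are hypothetical names for the four factors; the weights and cut-offs are the ones given above. Because the 1.5x weights produce half-point totals, the sketch treats anything from 12 up to just under 16 as medium.

```python
from typing import Dict

WEIGHTS = {
    "momentum": 2.0,
    "position_band": 2.0,
    "pain": 1.5,
    "internal_links": 1.5,
}


def priority_score(scores: Dict[str, int]) -> float:
    """Weighted sum of the four factor scores (each 1 to 3)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def tier(score: float) -> str:
    if score >= 16:
        return "high"
    if score >= 12:
        return "medium"
    return "lower"


spoke_a = {"momentum": 3, "position_band": 3, "pain": 2, "internal_links": 3}
spoke_b = {"momentum": 1, "position_band": 1, "pain": 3, "internal_links": 1}
print(priority_score(spoke_a), tier(priority_score(spoke_a)))
print(priority_score(spoke_b), tier(priority_score(spoke_b)))
```

With these weights the total ranges from 7 (all factors at 1) to 21 (all factors at 3).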
Step 7: Build Your 30-Day Publishing Cadence and Internal Linking Plan
With our ranked backlog in hand, we map topics to a four-week publishing schedule. The cadence below assumes a team that can publish 6 to 8 pieces per month. Scale up or down based on capacity.
Week 1: Hub Page + Highest-Priority Spokes
- Publish the hub page for our primary cluster.
- Publish 1 to 2 spokes targeting striking-distance queries (positions 4 to 10 from our rubric). These are the fastest to show ranking movement.
- Immediately add internal links from the hub to each spoke and from each spoke back to the hub.
Week 2: Cluster Gap Spokes
- Publish 2 spokes addressing subtopics where no competitor has strong coverage (cluster gaps from Step 5).
- Link each new spoke to the hub and to at least one Week 1 spoke where context overlaps.
Week 3: Reddit-Validated and Competitor Gap Spokes
- Publish 2 spokes sourced from Reddit pain points and competitor gap analysis.
- Cross-link between related spokes (not just hub-to-spoke, but spoke-to-spoke where the reader benefits from the connection).
Week 4: Update, Link, and Measure
- Update the hub page to incorporate mentions and links to all new spokes published in weeks 1 through 3.
- Publish 1 additional spoke if capacity allows.
- Run an initial GSC performance check: are new pages indexed? Are impressions appearing for target queries?
- Submit all new URLs via GSC's URL Inspection tool after each publish batch to accelerate crawling.
Internal Linking Checklist
For every new spoke we publish:
- Hub → Spoke: The hub page links to the new spoke with descriptive anchor text (not "click here" but a phrase reflecting the spoke's topic).
- Spoke → Hub: The new spoke links back to the hub, reinforcing the cluster relationship.
- Spoke → Related Spoke: If another spoke in the same cluster covers a prerequisite or adjacent concept, cross-link them.
- Anchor text rule: Use natural, descriptive phrases. Vary anchor text across links rather than repeating the exact same keyword phrase.
Google discovers new pages primarily through internal links. If we cannot assign at least 2 links from existing pages on publish day, the spoke is not ready.
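If we keep a simple map of which pages link to which, the checklist can be verified before publishing. A sketch under assumptions: the link map is a hand-maintained dictionary (a crawler export would work too), the function name is hypothetical, and the paths are placeholders.

```python
from typing import Dict, Set


def link_check(links: Dict[str, Set[str]], hub: str, spoke: str,
               min_inbound: int = 2) -> Dict:
    """Check hub-to-spoke, spoke-to-hub, and the minimum inbound link count."""
    inbound = sorted(src for src, targets in links.items()
                     if spoke in targets and src != spoke)
    hub_to_spoke = spoke in links.get(hub, set())
    spoke_to_hub = hub in links.get(spoke, set())
    return {
        "hub_to_spoke": hub_to_spoke,
        "spoke_to_hub": spoke_to_hub,
        "inbound_sources": inbound,
        "ready": hub_to_spoke and spoke_to_hub and len(inbound) >= min_inbound,
    }


planned_links = {
    "/hub": {"/spoke-a", "/spoke-new"},
    "/spoke-a": {"/hub", "/spoke-new"},
    "/spoke-new": {"/hub"},
}
report = link_check(planned_links, hub="/hub", spoke="/spoke-new")
print(report)
```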
Step 8: Review Performance and Refresh the Backlog Monthly
A content plan is not a one-time document. We treat it as a living backlog that improves with each monthly cycle.
When to Expect Measurable Movement
- New content: 6 to 12 weeks before we see stable ranking positions. Early impressions may appear within days, but meaningful click data takes longer.
- Optimized existing content: 4 to 8 weeks for re-crawl and rank adjustment.
We avoid making strategic decisions based on less than 3 to 4 weeks of stable data. Single-week spikes or drops are noise.
Monthly Refresh Process
- Re-pull GSC data for the last 90 days.
- Re-score the prioritization rubric for all backlog items. Some medium-priority spokes will have risen (their queries gained impressions momentum). Some high-priority items we published will have moved off the backlog.
- Identify new striking-distance queries that entered positions 4 to 20 since our last cycle. These are immediate candidates for the next month's plan.
- Check for new homepage-ranking signals (queries where our homepage still ranks instead of a dedicated page).
Quarterly Refresh Additions
Every three months, we re-run:
- Reddit research (new threads, new pain points, language shifts).
- Competitor gap analysis (competitors publish new content constantly; gaps change).
Excluding Outlier Data
Before making any strategic decision, we look for data anomalies: a viral social post that spiked impressions for one week, a Google algorithm update that temporarily shuffled rankings, or a site outage that suppressed clicks. We exclude these outlier weeks and base decisions on the stable baseline.
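One simple way to surface candidate outlier weeks is to compare each week against the median. This is only a screening aid, and it is our assumption rather than a prescribed method: the 0.5x and 2x bounds are arbitrary starting points, and a flagged week still needs a human explanation (viral post, algorithm update, outage) before we drop it.

```python
from statistics import median
from typing import Dict


def stable_weeks(weekly_impressions: Dict[str, int],
                 low: float = 0.5, high: float = 2.0) -> Dict[str, int]:
    """Keep weeks whose impressions fall within [low, high] times the median."""
    baseline = median(weekly_impressions.values())
    return {week: value for week, value in weekly_impressions.items()
            if low * baseline <= value <= high * baseline}


weekly = {"W1": 1000, "W2": 1100, "W3": 4200, "W4": 950, "W5": 1050}
baseline_weeks = stable_weeks(weekly)
print(sorted(baseline_weeks))
```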
Common Mistakes That Break This Workflow
These are the errors we see most often when teams attempt this process:
Creating spokes without a distinct angle. If two spokes target essentially the same query intent and SERP overlap shows they should be one page, publishing both creates cannibalization. Google splits ranking signals between them, and neither performs as well as a single consolidated page would.
Treating Reddit volume as proof of search demand. A question asked 20 times on Reddit might have zero monthly Google searches. Always validate with GSC data or a keyword research tool before committing a publishing slot.
Ignoring the privacy-filtered impression gap in GSC. GSC does not report all queries. A large share of query-level impressions (roughly 75% by some estimates, though the figure varies widely by site) is filtered out for privacy. This means the sum of impressions across all individual queries will be significantly lower than the total impressions shown at the page or site level. We account for this by using query-level data for relative prioritization (comparing queries to each other) rather than treating absolute numbers as complete.
Publishing without pre-planned internal links. An orphan page (no internal links pointing to it) struggles to get crawled and to gain cluster authority. The Step 7 rule applies: if we cannot assign at least 2 links from existing pages on publish day, the spoke is not ready.
Skipping the SERP overlap check. Without verifying whether Google treats two queries as the same topic, we risk splitting one strong page into two weak pages. The five minutes spent on a SERP overlap check prevents months of underperformance.
Expert Summary: What Makes This Method Work Over Time
The four signals (GSC momentum, position band, Reddit pain, internal-link potential) compound across cycles. Each month of data makes the next month's prioritization more accurate. Queries that were invisible in cycle one surface as striking-distance opportunities in cycle two, because our published cluster content earns impressions for related terms we had not originally targeted.
Clustered content outperforms standalone pages because it sends consistent topical relevance signals to search engines. Hub pages accumulate authority from spoke links, and spokes benefit from the hub's broader visibility. Over time, this structure holds rankings longer than isolated articles because the internal linking network reinforces each page's relevance.
The constraint that matters most in this entire workflow: if we cannot assign internal links on day one, the spoke is not ready to publish. This single rule prevents the most common failure mode (orphan pages that never gain traction) and forces us to build clusters intentionally rather than publishing scattered content.
Refresh cadence is not optional. Competitor landscapes shift, new queries emerge, and search behavior changes every quarter. A cluster map built in January will be partially outdated by April. The monthly re-scoring process catches these shifts early.
This is not a one-time project. It is a recurring monthly operating rhythm. The first cycle takes the most effort (building the initial cluster map, establishing the rubric, publishing the first hub and spokes). Each subsequent cycle is faster because we are refining an existing backlog rather than building from scratch. The system improves with each iteration as we accumulate more GSC data, more Reddit insights, and more internal pages to link from.