Find pages cannibalizing each other by clustering URLs that share significant SERP overlap.
- Analyzes SERP data to find queries that return the same pages
- Clusters pages using graph-based algorithms (connected components + cliques)
- Calculates consolidation scores (0-100) for each cluster
- Outputs actionable recommendations for content consolidation

Use it to:

- Identify content cannibalization issues
- Find pages competing for the same keywords
- Make data-driven decisions about merging or consolidating content
- Improve internal linking between related pages
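
The clustering idea can be sketched in a few lines. This is not the tool's actual implementation: it shows only the connected-components half (no clique step), uses a standard-library union-find, and assumes that two queries are linked when their results share at least `min_urls` URLs, which is one reading of the `--min-urls` option. The function name `cluster_queries` and the extra sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def cluster_queries(rows, min_urls=4):
    """Group queries whose result sets share at least `min_urls` URLs."""
    urls_by_query = defaultdict(set)
    for query, url in rows:
        urls_by_query[query].add(url)

    # Union-find over queries: parent[q] points towards the cluster root.
    parent = {q: q for q in urls_by_query}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    # Link every pair of queries that shares enough URLs.
    for a, b in combinations(urls_by_query, 2):
        if len(urls_by_query[a] & urls_by_query[b]) >= min_urls:
            parent[find(a)] = find(b)

    # Collect connected components and drop single-query clusters.
    clusters = defaultdict(list)
    for q in urls_by_query:
        clusters[find(q)].append(q)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Illustrative data only, using the default column names.
sample = """search.q,result.organic_results.link
best running shoes,https://example.com/shoes
best running shoes,https://example.com/trainers
running shoe reviews,https://example.com/shoes
running shoe reviews,https://example.com/trainers
trail running tips,https://example.com/trails
"""
reader = csv.DictReader(io.StringIO(sample))
rows = [(r["search.q"], r["result.organic_results.link"]) for r in reader]
print(cluster_queries(rows, min_urls=2))
# → [['best running shoes', 'running shoe reviews']]
```

A lower `min_urls` links queries more aggressively and yields larger clusters; a higher value keeps only queries whose results overlap heavily.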

Install dependencies:

    pip install -r requirements.txt

Basic usage:

    python content_consolidation_analyzer.py --input serp_data.csv --output results.csv

Multiple input files:

    python content_consolidation_analyzer.py --input "data/*.csv" --output results.csv

Custom column names and threshold:

    python content_consolidation_analyzer.py \
      --input serp_data.csv \
      --query-col "keyword" \
      --url-col "url" \
      --min-urls 3

| Argument | Description | Default |
|---|---|---|
| `--input`, `-i` | Input CSV file or pattern | Required |
| `--output`, `-o` | Output CSV file | `consolidation_results.csv` |
| `--min-urls` | Minimum shared URLs to cluster | `4` |
| `--query-col` | Column name for queries | `search.q` |
| `--url-col` | Column name for URLs | `result.organic_results.link` |

CSV file with columns for queries and URLs. Example:

    search.q,result.organic_results.link
    best running shoes,https://example.com/shoes
    best running shoes,https://example.com/trainers
    running shoe reviews,https://example.com/shoes

The tool outputs a CSV with:

- `cluster_id`: Identifier for the cluster
- `query`: The search query
- `consolidation_score`: 0-100 score indicating consolidation strength
- `consolidation_recommendation`: Human-readable recommendation
- `cluster_size`: Number of pages in the cluster
- `shared_urls`: URLs that appear for multiple queries
- And more metrics...

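
Once the results file exists, a short script can pull out the clusters worth reviewing first. The sketch below assumes only the `cluster_id`, `query`, and `consolidation_score` columns listed above; the helper name `strong_clusters`, the threshold of 60, and the sample rows are illustrative, not part of the tool.

```python
import csv
import io


def strong_clusters(csv_file, threshold=60.0):
    """Return (cluster_id, query, score) rows at or above `threshold`, highest first."""
    rows = []
    for row in csv.DictReader(csv_file):
        score = float(row["consolidation_score"])
        if score >= threshold:
            rows.append((row["cluster_id"], row["query"], score))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Made-up rows standing in for a real results file.
sample_output = io.StringIO(
    "cluster_id,query,consolidation_score\n"
    "1,best running shoes,85\n"
    "1,running shoe reviews,85\n"
    "2,trail running tips,35\n"
)
for cluster_id, query, score in strong_clusters(sample_output):
    print(cluster_id, query, score)
```

To run it against real output, replace the `io.StringIO` object with `open("results.csv", newline="")`.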
| Score | Recommendation |
|---|---|
| 80-100 | Strong consolidation candidate |
| 60-79 | Good consolidation candidate |
| 40-59 | Possible consolidation |
| 20-39 | Weak consolidation candidate |
| 0-19 | Keep separate |
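
The score bands translate directly into a lookup. This is a minimal sketch of the table above, not code taken from the tool; the function name `recommendation` is hypothetical, and treating fractional scores such as 79.5 as belonging to the lower band is an assumption.

```python
def recommendation(score):
    """Map a 0-100 consolidation score to the recommendation in the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Bands are checked from highest to lowest lower bound.
    bands = [
        (80, "Strong consolidation candidate"),
        (60, "Good consolidation candidate"),
        (40, "Possible consolidation"),
        (20, "Weak consolidation candidate"),
        (0, "Keep separate"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label


print(recommendation(72))  # → Good consolidation candidate
```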
Lee Foot - leefoot.com