Automatic Category Page Suggester

A Streamlit app that analyzes your crawl data to suggest new category pages based on your product inventory and real search demand.

Originally presented at Brighton SEO.

How It Works

1. N-Gram Generation: Extracts 2- to 7-word n-grams from product H1 tags to generate thousands of potential category keywords
2. Product Matching: Keeps only suggestions that match a minimum number of products (exact and fuzzy matching)
3. Duplicate Detection: Uses PolyFuzz TF-IDF matching to identify suggestions too similar to existing categories
4. Search Validation: Checks suggestions against search volume data from the Keywords Everywhere API, keeping only keywords with actual demand
5. Fragment Removal: Optionally keeps only the longest keyword variant, removing shorter fragments
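
The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names `ngrams` and `candidate_keywords` and the sample H1s are assumptions made up for the example, and the tokenizer is a plain whitespace split.

```python
from collections import Counter


def ngrams(text, min_n=2, max_n=7):
    """Yield every run of min_n..max_n consecutive words in text, lowercased."""
    words = text.lower().split()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def candidate_keywords(h1s, min_products=3):
    """Count how many product H1s contain each n-gram; keep those meeting the minimum."""
    counts = Counter()
    for h1 in h1s:
        # set() so each product is counted at most once per n-gram
        counts.update(set(ngrams(h1)))
    return {kw: c for kw, c in counts.items() if c >= min_products}


# Hypothetical product H1s, for illustration only
h1s = [
    "Red Leather Office Chair",
    "Black Leather Office Chair",
    "Brown Leather Office Chair with Arms",
    "Oak Standing Desk",
]
candidates = candidate_keywords(h1s)
print(candidates)
```

With a minimum of 3 products, only n-grams shared by the three chair titles survive; one-off phrases such as "oak standing desk" are dropped.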

Features

- Generates thousands of keyword variations from product titles using n-grams
- Matches suggestions to existing categories using fuzzy matching (PolyFuzz)
- Keywords Everywhere API integration filters to only real search terms
- Configurable similarity threshold to avoid duplicate categories
- Export results to CSV

Requirements

pip install -r requirements.txt

Usage

1. Start the app:
   ```
   streamlit run category_generator.py
   ```
2. Upload files from Screaming Frog:
   - inlinks.csv - Export inlinks to your product pages (Bulk Export > Links > All Inlinks)
   - internal_html.csv - HTML export (Bulk Export > All > Internal HTML)
3. Map columns:
   - Select which custom extraction column identifies product pages
   - Select which custom extraction column identifies category pages
4. Configure settings in the sidebar
5. Download results as CSV

Input Files

inlinks.csv

Export from Screaming Frog: Bulk Export > Links > All Inlinks

Required columns:

- From / Source - The linking page
- To / Destination - The linked page

internal_html.csv

Export from Screaming Frog: Bulk Export > All > Internal HTML

Required columns:

- Address - Page URL
- Indexability - Indexability status
- H1-1 - Primary H1 tag
- Title 1 - Page title
- Custom extraction columns for product/category identification
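
Before uploading, it can help to confirm that an export contains the fixed columns listed above. The following is a rough pre-flight check using only the standard library; the helper name `missing_columns` and the sample CSV content are assumptions for illustration, and the custom extraction columns are left out because their names depend on your crawl configuration.

```python
import csv
import io

# Fixed columns named in this README; custom extraction columns vary per crawl
REQUIRED_COLUMNS = ["Address", "Indexability", "H1-1", "Title 1"]


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical export that lacks the H1-1 column
sample = io.StringIO(
    "Address,Indexability,Title 1\n"
    "https://example.com/,Indexable,Home\n"
)
missing = missing_columns(sample)
print(missing)
```

For a real export, pass an open file handle instead of the `io.StringIO` sample.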

Configuration Options

| Setting | Description | Default |
| --- | --- | --- |
| Min Product Match (Exact) | Minimum products a keyword must exactly match | 3 |
| Min Product Match (Fuzzy) | Minimum fuzzy matches required | 3 |
| Min Similarity | Max similarity to existing category (lower = more unique) | 96% |
| Min CPC | Minimum cost-per-click filter | $0 |
| Min Search Volume | Minimum monthly search volume | 100 |
| Keep Longest Word | Remove shorter keyword fragments | Enabled |
| Fuzzy Product Match | Enable slower but more thorough matching | Disabled |
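
As an illustration of what the Keep Longest Word setting is described as doing, the sketch below drops any keyword that appears as a whole-word phrase inside a longer keyword. This is one plausible reading of "remove shorter keyword fragments"; the app's actual rule may differ, and the function name `keep_longest` is an assumption.

```python
def keep_longest(keywords):
    """Drop any keyword that occurs as a whole-word phrase inside a longer kept keyword."""
    kept = []
    # Process longest first so fragments are compared against their longer variants
    for kw in sorted(keywords, key=lambda k: len(k.split()), reverse=True):
        padded = f" {kw} "  # pad with spaces so only whole words match
        if not any(padded in f" {longer} " for longer in kept):
            kept.append(kw)
    return kept


result = keep_longest(
    ["office chair", "leather office chair", "leather office", "standing desk"]
)
print(result)
```

Here "office chair" and "leather office" are both contained in "leather office chair", so only the longest variant and the unrelated "standing desk" remain.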

Keywords Everywhere Integration

Add your Keywords Everywhere API key to validate suggestions against real search data. This filters out n-gram combinations that nobody actually searches for.

- Validates keyword suggestions have real search volume
- Provides CPC data for commercial intent analysis
- PAYG pricing with no expiration
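
Once volume and CPC figures are available, the filtering step amounts to applying the Min Search Volume and Min CPC thresholds. The sketch below makes no API call: the `metrics` dictionary, its field names, and its numbers are placeholder values invented for illustration, not real Keywords Everywhere data or its response format.

```python
def filter_by_demand(metrics, min_volume=100, min_cpc=0.0):
    """Keep keywords whose monthly volume and CPC both meet the thresholds."""
    return {
        kw: m
        for kw, m in metrics.items()
        if m["volume"] >= min_volume and m["cpc"] >= min_cpc
    }


# Placeholder metrics (hypothetical values, not real search data)
metrics = {
    "leather office chair": {"volume": 1900, "cpc": 1.20},
    "office chair with": {"volume": 0, "cpc": 0.0},
    "brown leather office": {"volume": 10, "cpc": 0.35},
}
validated = filter_by_demand(metrics)
print(list(validated))
```

The defaults mirror the Min Search Volume (100) and Min CPC ($0) values from the configuration table.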

Output

CSV file with columns:

- Parent Category - The category the products belong to
- Keyword - Suggested new category name
- Search Volume - Monthly searches (if KWE enabled)
- CPC - Cost per click (if KWE enabled)
- Matching Products (Exact) - Products containing exact keyword
- Matching Products (Fuzzy) - Products matching all words
- Matched Category - Most similar existing category
- Similarity - How similar to existing (lower = more unique)
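
The difference between the two Matching Products columns can be shown with a small example. This sketch assumes "exact" means the keyword appears as a contiguous phrase and "fuzzy" means every keyword word appears somewhere in the title in any order; the function names and sample titles are hypothetical, and the app's own matching may be more involved.

```python
def exact_match(keyword, title):
    """True when the keyword appears in the title as a contiguous whole-word phrase."""
    normalized = " ".join(title.lower().split())
    return f" {keyword.lower()} " in f" {normalized} "


def fuzzy_match(keyword, title):
    """True when every keyword word appears somewhere in the title, in any order."""
    return set(keyword.lower().split()) <= set(title.lower().split())


# Hypothetical product titles
titles = [
    "Leather Office Chair",
    "Office Chair in Black Leather",
    "Oak Standing Desk",
]
keyword = "leather office chair"
exact = [t for t in titles if exact_match(keyword, t)]
fuzzy = [t for t in titles if fuzzy_match(keyword, t)]
print(len(exact), len(fuzzy))
```

Every exact match is also a fuzzy match under these definitions, so the fuzzy count is never lower than the exact count.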

Use Cases

- Expand category structure based on actual product inventory
- Align taxonomy with search demand using real keyword data
- Identify gaps between what you sell and how users search
- Prioritize new categories by search volume and product coverage

Author

Lee Foot - eCommerce SEO Consultant
