A Streamlit app that analyzes your crawl data to suggest new category pages based on your product inventory and real search demand.
Originally presented at Brighton SEO.
- N-Gram Generation: Extracts 2-7 word n-grams from product H1 tags to generate thousands of potential category keywords
- Product Matching: Filters suggestions to only those matching a minimum number of products (exact and fuzzy matching)
- Duplicate Detection: Uses PolyFuzz TF-IDF matching to identify suggestions too similar to existing categories
- Search Validation: Keywords Everywhere API validates suggestions against real search volume data, keeping only legitimate keywords with actual demand
- Fragment Removal: Optionally keeps only the longest keyword variant, removing shorter fragments
- Generates thousands of keyword variations from product titles using n-grams
- Matches suggestions to existing categories using fuzzy matching (PolyFuzz)
- Keywords Everywhere API integration filters to only real search terms
- Configurable similarity threshold to avoid duplicate categories
- Export results to CSV
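
The n-gram generation and product-matching steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the app's actual code: the function names, the whitespace tokenizer, and the sample H1 tags are all assumptions made for the example.

```python
from collections import Counter


def word_ngrams(text, min_n=2, max_n=7):
    """Yield every contiguous run of min_n..max_n words from text, lowercased."""
    words = text.lower().split()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def candidate_keywords(h1_tags, min_products=3):
    """Count the products whose H1 contains each n-gram; keep n-grams meeting the minimum."""
    counts = Counter()
    for h1 in h1_tags:
        # set() so a phrase repeated inside one H1 still counts as one product
        counts.update(set(word_ngrams(h1)))
    return {kw: n for kw, n in counts.items() if n >= min_products}


def fuzzy_match_count(keyword, h1_tags):
    """Count products whose H1 contains all of the keyword's words, in any order."""
    kw_words = set(keyword.lower().split())
    return sum(kw_words <= set(h1.lower().split()) for h1 in h1_tags)


# Hypothetical product H1 tags, for illustration only
h1_tags = [
    "Red Leather Office Chair",
    "Black Leather Office Chair with Arms",
    "Brown Leather Office Chair",
    "Mesh Office Chair",
]

candidates = candidate_keywords(h1_tags, min_products=3)
print(candidates)
print(fuzzy_match_count("leather chair", h1_tags))
```

With these four sample titles, only "office chair", "leather office", and "leather office chair" appear in at least three products, so every other n-gram is discarded before any search-volume lookup.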
`pip install -r requirements.txt`
- Start the app: `streamlit run category_generator.py`
- Upload files from Screaming Frog:
  - `inlinks.csv` - Export inlinks to your product pages (Bulk Export > Links > All Inlinks)
  - `internal_html.csv` - HTML export (Bulk Export > All > Internal HTML)
- Map columns:
  - Select which custom extraction column identifies product pages
  - Select which custom extraction column identifies category pages
- Configure settings in the sidebar
- Download results as CSV
`inlinks.csv`: export from Screaming Frog via Bulk Export > Links > All Inlinks
Required columns:
- From/Source - The linking page
- To/Destination - The linked page

`internal_html.csv`: export from Screaming Frog via Bulk Export > All > Internal HTML
Required columns:
- Address - Page URL
- Indexability - Indexability status
- H1-1 - Primary H1 tag
- Title 1 - Page title
- Custom extraction columns for product/category identification
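
Before uploading, it can help to confirm that an export contains the required columns. The sketch below checks a CSV header using only the standard library; the `missing_columns` helper and the sample data are hypothetical and are not part of the app. Real exports may begin with a byte-order mark, in which case opening the file with `encoding="utf-8-sig"` is a reasonable precaution.

```python
import csv
import io

# Required columns for the Internal HTML export, as listed above
REQUIRED_INTERNAL_HTML = ["Address", "Indexability", "H1-1", "Title 1"]


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# Hypothetical export that lacks the "Title 1" column
sample = io.StringIO(
    "Address,Indexability,H1-1\n"
    "https://example.com/p/1,Indexable,Red Leather Office Chair\n"
)
missing = missing_columns(sample, REQUIRED_INTERNAL_HTML)
print(missing)  # ['Title 1']
```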
| Setting | Description | Default |
|---|---|---|
| Min Product Match (Exact) | Minimum products a keyword must exactly match | 3 |
| Min Product Match (Fuzzy) | Minimum fuzzy matches required | 3 |
| Min Similarity | Max similarity to existing category (lower = more unique) | 96% |
| Min CPC | Minimum cost-per-click filter | $0 |
| Min Search Volume | Minimum monthly search volume | 100 |
| Keep Longest Word | Remove shorter keyword fragments | Enabled |
| Fuzzy Product Match | Enable slower but more thorough matching | Disabled |
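
To make the settings concrete, here is a rough sketch of how such filters might be applied to a list of suggestions. It is not the app's implementation: the app uses PolyFuzz TF-IDF matching for similarity, while this example substitutes the standard library's `difflib.SequenceMatcher` as a stand-in. The function names, the `volume`/`cpc` field names, and the choice to treat a score at or above the threshold as a duplicate are all assumptions.

```python
from difflib import SequenceMatcher


def too_similar(keyword, existing_categories, threshold=0.96):
    """True if the keyword scores at or above the threshold against any existing category.

    SequenceMatcher is a stand-in here; the app itself uses PolyFuzz TF-IDF.
    """
    return any(
        SequenceMatcher(None, keyword.lower(), cat.lower()).ratio() >= threshold
        for cat in existing_categories
    )


def passes_demand(metrics, min_volume=100, min_cpc=0.0):
    """Apply the minimum search volume and minimum CPC filters (field names are assumed)."""
    return metrics["volume"] >= min_volume and metrics["cpc"] >= min_cpc


def keep_longest(keywords):
    """Drop any keyword that is a whole-word fragment of a longer kept keyword."""
    kept = []
    for kw in sorted(keywords, key=len, reverse=True):
        # Pad with spaces so "office chair" matches only on word boundaries
        if not any(f" {kw} " in f" {longer} " for longer in kept):
            kept.append(kw)
    return kept


existing = ["leather office chair"]
print(too_similar("Leather Office Chairs", existing))  # True
print(too_similar("mesh office chair", existing))      # False

print(passes_demand({"volume": 880, "cpc": 0.95}))     # True
print(passes_demand({"volume": 40, "cpc": 0.95}))      # False

longest = keep_longest(
    ["office chair", "leather office chair", "leather office", "standing desk"]
)
print(longest)  # ['leather office chair', 'standing desk']
```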
Add your Keywords Everywhere API key to validate suggestions against real search data. This filters out n-gram combinations that nobody actually searches for.
- Validates keyword suggestions have real search volume
- Provides CPC data for commercial intent analysis
- PAYG pricing with no expiration
CSV file with columns:
- Parent Category - The category the products belong to
- Keyword - Suggested new category name
- Search Volume - Monthly searches (if Keywords Everywhere is enabled)
- CPC - Cost per click (if Keywords Everywhere is enabled)
- Matching Products (Exact) - Products containing exact keyword
- Matching Products (Fuzzy) - Products matching all words
- Matched Category - Most similar existing category
- Similarity - How similar to existing (lower = more unique)
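
Once downloaded, the results can be ranked outside the app. The sketch below sorts suggestions by search volume and then by exact product matches, using only the standard library. The sample rows are invented, and it assumes that Search Volume and Matching Products (Exact) are stored as whole numbers; the real export may format these columns differently.

```python
import csv
import io

# Invented sample rows using the column names listed above
sample_export = """\
Parent Category,Keyword,Search Volume,CPC,Matching Products (Exact),Matching Products (Fuzzy),Matched Category,Similarity
Office Chairs,leather office chair,2400,1.10,12,15,Office Chairs,0.71
Office Chairs,mesh office chair,880,0.95,5,6,Office Chairs,0.69
Desks,standing desk,5400,1.80,3,4,Desks,0.62
"""


def rank_suggestions(csv_file):
    """Sort rows by search volume, then by exact product matches, highest first."""
    rows = list(csv.DictReader(csv_file))
    return sorted(
        rows,
        key=lambda r: (int(r["Search Volume"]), int(r["Matching Products (Exact)"])),
        reverse=True,
    )


ranked = rank_suggestions(io.StringIO(sample_export))
for row in ranked:
    print(row["Keyword"], row["Search Volume"], row["Matching Products (Exact)"])
```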
- Expand category structure based on actual product inventory
- Align taxonomy with search demand using real keyword data
- Identify gaps between what you sell and how users search
- Prioritize new categories by search volume and product coverage
Lee Foot - eCommerce SEO Consultant