Template Fingerprinting Tool

Automatically identify and classify pages by their template type using HTML structure analysis and machine learning clustering.

Features

Analyzes HTML structure (tags, classes, IDs, meta tags)
Uses TF-IDF vectorization to turn the extracted features into numeric vectors
Applies K-Means clustering to identify template patterns
Identifies common structural patterns per cluster
Exports results with page type classifications

Use Cases

Identify different page templates on a website (PDP, PLP, blog, etc.)
Audit template usage across large sites
Find pages with unusual/broken templates
Group pages for template-specific SEO recommendations

Requirements

pip install -r requirements.txt

Usage

1. Export URLs from Screaming Frog (or create a CSV with an Address column).
2. Update the configuration variables in the script:
   - INPUT_FILE: Path to your CSV file
   - OUTPUT_FILE: Where to save results
   - N_CLUSTERS: Number of template types to identify
3. Run the script:

python template_fingerprinting.py
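
If you are not exporting from Screaming Frog, the input file only needs an Address column. The following is a minimal sketch of building and reading back such a file; `write_url_csv` and `read_addresses` are hypothetical helper names, not functions from the script, and the example.com URLs are placeholders.

```python
import csv
import io


def write_url_csv(handle, urls):
    """Write URLs to an open text file as a CSV with a single 'Address' column."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Address"])
    writer.writerows([url] for url in urls)


def read_addresses(handle):
    """Read the 'Address' column back, skipping rows where it is blank."""
    rows = csv.DictReader(handle)
    cells = ((row.get("Address") or "").strip() for row in rows)
    return [cell for cell in cells if cell]


# Demo with an in-memory buffer. For a real file, pass
# open("urls.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_url_csv(buffer, ["https://example.com/", "https://example.com/blog/post-1"])
buffer.seek(0)
addresses = read_addresses(buffer)
```

A Screaming Frog export contains many more columns; reading by column name, as above, means the extra columns can simply be ignored.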

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `INPUT_FILE` | `./urls.csv` | CSV file with 'Address' column |
| `OUTPUT_FILE` | `./classified_urls.csv` | Output file path |
| `N_CLUSTERS` | `5` | Number of template types to detect |
| `TIMEOUT` | `10` | HTTP request timeout in seconds |
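
As a rough illustration, the settings in the table above could appear near the top of the script as plain constants. The variable names and defaults come from the table; the `check_config` helper is hypothetical and only sketches a sanity check you might run before a long crawl. It relies on the fact that K-Means cannot form more clusters than there are pages.

```python
# Names and defaults taken from the configuration table.
INPUT_FILE = "./urls.csv"
OUTPUT_FILE = "./classified_urls.csv"
N_CLUSTERS = 5
TIMEOUT = 10  # seconds


def check_config(n_clusters, timeout, n_urls):
    """Hypothetical helper: return a list of problems with the chosen settings."""
    problems = []
    if n_clusters < 1:
        problems.append("N_CLUSTERS must be at least 1")
    elif n_clusters > n_urls:
        problems.append("N_CLUSTERS cannot exceed the number of URLs")
    if timeout <= 0:
        problems.append("TIMEOUT must be positive")
    return problems
```

For example, `check_config(N_CLUSTERS, TIMEOUT, 3)` reports a problem because five clusters cannot be formed from three pages.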

Output

The script generates a CSV file with:

Original URL data
Cluster: Numeric cluster ID
Page Type: Human-readable type label (Type 0, Type 1, etc.)

Console output includes top features for each cluster to help identify what each template type represents.
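
Once the output file exists, a quick count of pages per type shows how the site splits across templates and which types contain very few pages (candidates for unusual or broken templates). This is a sketch that assumes the output columns are named exactly `Cluster` and `Page Type` as described above; the helper names and sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_page_types(handle):
    """Count rows per 'Page Type' label in a classified CSV."""
    return Counter(row["Page Type"] for row in csv.DictReader(handle))


def small_clusters(counts, max_pages=1):
    """Return the labels whose cluster holds at most `max_pages` pages."""
    return sorted(label for label, n in counts.items() if n <= max_pages)


# Illustrative data standing in for classified_urls.csv.
sample = io.StringIO(
    "Address,Cluster,Page Type\n"
    "https://example.com/p/1,0,Type 0\n"
    "https://example.com/p/2,0,Type 0\n"
    "https://example.com/blog/a,1,Type 1\n"
)
counts = count_page_types(sample)
outliers = small_clusters(counts)
```

With a real file, replace `sample` with `open("classified_urls.csv", newline="", encoding="utf-8")`.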

How It Works

1. Feature Extraction: For each URL, the script fetches the HTML and extracts:
   - Tag counts (e.g., div:15, article:1)
   - CSS class names
   - ID attributes
   - Meta tag properties
2. Vectorization: Features are converted to TF-IDF vectors
3. Clustering: K-Means groups similar page structures
4. Analysis: Top features per cluster help identify template types
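
The first two steps can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the script's actual implementation: the token format (`tag:div`, `class:price`, and so on), the `StructureTokens` class, and the hand-rolled `tfidf` and `cosine` functions are all hypothetical, and the README does not say which libraries or weighting the script uses. The K-Means step is left out; cosine similarity stands in to show that pages built from the same template end up with closer vectors.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class StructureTokens(HTMLParser):
    """Collect structural tokens: tag names, classes, ids, meta properties."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.tokens.append(f"tag:{tag}")
        for name in (attrs.get("class") or "").split():
            self.tokens.append(f"class:{name}")
        if attrs.get("id"):
            self.tokens.append(f"id:{attrs['id']}")
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key:
                self.tokens.append(f"meta:{key}")


def extract_tokens(html):
    """Return the structural tokens of one HTML document, in document order."""
    parser = StructureTokens()
    parser.feed(html)
    parser.close()
    return parser.tokens


def tfidf(docs):
    """Turn token lists into L2-normalized {token: weight} dicts."""
    n = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        # Smoothed IDF: tokens found in every document keep a nonzero weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Two product-like pages and one blog-like page (made-up markup).
product_a = ('<html><head><meta property="og:type" content="product"></head><body>'
             '<div class="product gallery"></div><div class="price"></div>'
             '<button id="add-to-cart"></button></body></html>')
product_b = ('<html><head><meta property="og:type" content="product"></head><body>'
             '<div class="product gallery"></div><div class="price"></div>'
             '<div class="reviews"></div><button id="add-to-cart"></button></body></html>')
blog = ('<html><head><meta property="og:type" content="article"></head><body>'
        '<article class="post"><h1></h1><p></p><p></p></article></body></html>')

vectors = tfidf([extract_tokens(page) for page in (product_a, product_b, blog)])
same_template = cosine(vectors[0], vectors[1])
different_template = cosine(vectors[0], vectors[2])
```

Here `same_template` comes out higher than `different_template`, which is the property a clustering step relies on when it groups pages by template.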

Author

Lee Foot - eCommerce SEO Consultant
