Duplicate detection
Duplicate content cannibalizes rankings. Two articles on the same topic split the SEO juice; Google then picks one (often the wrong one) and demotes the rest. SeoFreshUp detects two kinds of duplicates.
Classic similarity
Token-based comparison: takes the title + first 500 words of each article, computes similarity scores, flags pairs above a threshold (default 80 %).
Catches: re-published articles, syndicated content, articles copied from one another with minor edits.
Misses: articles covering the same topic in completely different words.
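The classic scan can be pictured with a minimal sketch. It assumes Jaccard overlap on word sets as the similarity measure; the actual scoring formula SeoFreshUp uses is not documented here, and the names `tokens`, `jaccard` and `find_duplicates` are hypothetical.

```python
import re
from itertools import combinations

def tokens(title: str, body: str, max_words: int = 500) -> set[str]:
    """Lowercased word set from the title plus the first max_words words of the body."""
    body_words = re.findall(r"\w+", body.lower())[:max_words]
    title_words = re.findall(r"\w+", title.lower())
    return set(title_words) | set(body_words)

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tokens divided by all distinct tokens (assumed metric, not the plugin's)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def find_duplicates(articles, threshold: float = 0.80):
    """articles: iterable of (id, title, body). Returns (id_a, id_b, score) for flagged pairs."""
    sets = [(aid, tokens(title, body)) for aid, title, body in articles]
    flagged = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets, 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            flagged.append((id_a, id_b, score))
    return flagged
```

Comparing every pair grows quadratically with the number of articles, which is acceptable for a sketch but would need an index or shingling for very large sites. It also shows why this approach misses rewrites: two articles with no shared vocabulary score close to zero even when they cover the same topic.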
AI semantic matching
LLM-based detection: groups your articles into clusters and asks the AI which ones cover the same topic regardless of wording.
Catches: “10 best WordPress SEO plugins” + “Top WordPress SEO plugins for 2026” + “Best plugins for WP SEO” (different wording, same topic).
Cost: ~$0.005 per article analyzed via gpt-4o-mini. For a 1000-article blog: ~$5 for the full semantic scan.
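To budget a scan before running it, the arithmetic above can be written as a small helper. This is only a sketch: the per-article rate is the approximate figure quoted above, real usage depends on article length and current model pricing, and `estimate_semantic_scan_cost` is a hypothetical name.

```python
# Approximate rate quoted above for gpt-4o-mini; treat it as an estimate.
COST_PER_ARTICLE_USD = 0.005

def estimate_semantic_scan_cost(article_count: int,
                                cost_per_article: float = COST_PER_ARTICLE_USD) -> float:
    """Rough cost in USD of a full semantic scan, rounded to cents."""
    return round(article_count * cost_per_article, 2)

if __name__ == "__main__":
    for count in (250, 1000, 4000):
        print(f"{count} articles: ~${estimate_semantic_scan_cost(count):.2f}")
```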
How to run
In the 📑 Doublons (Duplicates) tab:
- Click 🔍 Scan classic (free, fast — 30 sec for 1000 articles)
- Optionally click 🤖 Scan IA sémantique (AI semantic scan; paid, slower, 2-5 min for 1000 articles)
Both scans complement each other — run classic first to clean obvious duplicates, then semantic for deeper analysis.
Working with the results
Each duplicate cluster shows:
- The articles involved (title, URL, traffic, AI verdict, age)
- Similarity score or AI confidence level
- Recommended canonical (the article with the most traffic / best verdict / oldest URL)
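One way to read the canonical recommendation is as a tie-breaking order: most traffic first, then best verdict, then oldest article. The sketch below assumes that order, and the verdict labels, field names and `recommend_canonical` function are all hypothetical; the plugin may weigh these criteria differently.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical verdict labels, ranked best-first; the plugin's real labels may differ.
VERDICT_RANK = {"keep": 0, "refresh": 1, "rewrite": 2}

@dataclass
class Article:
    url: str
    monthly_traffic: int
    verdict: str
    published: date

def recommend_canonical(cluster: list[Article]) -> Article:
    """Pick the article with the most traffic, then the best verdict, then the oldest date."""
    return min(
        cluster,
        key=lambda a: (
            -a.monthly_traffic,                               # higher traffic wins
            VERDICT_RANK.get(a.verdict, len(VERDICT_RANK)),   # unknown verdicts rank last
            a.published,                                      # older article wins
        ),
    )
```

The recommendation is only a starting point: an older article with slightly less traffic but stronger backlinks may still be the better page to keep.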
For each cluster, choose:
- ✓ Choisir comme canonique (Choose as canonical): keep this one, redirect or NOINDEX the rest
- 🔀 Redirect: 301-redirect the duplicates to the canonical (preserves SEO)
- 🗑 Supprimer (Delete): trash the duplicates (auto-301 to canonical)
- 🚩 Pas un doublon (Not a duplicate): mark as false positive, ignore in future scans
Why it matters
Google’s quality guidance treats large amounts of duplicated content as a sign of thin or spammy pages. A blog where 30 % of the articles are duplicates risks a sitewide quality signal that can drag down the rankings of even your unique content.
Cleaning duplicates is one of the highest ROI actions for an aging blog.
Performance impact
- Classic scan: free and fast. Run weekly if you publish a lot.
- AI semantic scan: more expensive, run quarterly or after major content additions.