Duplicate detection

Duplicate content cannibalizes rankings. Two articles on the same topic split the SEO juice; Google then picks one (often the wrong one) and demotes the rest. SeoFreshUp detects two kinds of duplicates.

Classic similarity

Token-based comparison: takes the title and first 500 words of each article, computes a similarity score for each pair, and flags pairs above a threshold (default 80 %).

Catches: re-published articles, syndicated content, and articles copied from one another with minor edits.

Misses: articles covering the same topic in completely different words.
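
As an illustration of the classic scan, here is a minimal Python sketch. It assumes Jaccard similarity over word sets, which is one common token-based measure; the scoring method SeoFreshUp actually uses is not specified here, and the names `tokens`, `jaccard`, and `flag_duplicates` are hypothetical.

```python
import re
from itertools import combinations


def tokens(title: str, body: str, max_words: int = 500) -> set[str]:
    """Lowercased word set from the title plus the first `max_words` body words."""
    title_words = re.findall(r"\w+", title.lower())
    body_words = re.findall(r"\w+", body.lower())[:max_words]
    return set(title_words) | set(body_words)


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tokens divided by all distinct tokens; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_duplicates(articles: dict[str, tuple[str, str]], threshold: float = 0.80):
    """Return (url_a, url_b, score) for every pair scoring at or above the threshold.

    `articles` maps a URL to a (title, body) tuple. Jaccard is an assumed
    stand-in for whatever similarity measure the plugin really computes.
    """
    sets = {url: tokens(title, body) for url, (title, body) in articles.items()}
    pairs = []
    for a, b in combinations(sets, 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

Because the score depends only on shared words, two articles on the same topic written in different vocabulary share few tokens and stay under the threshold, which is the blind spot described above.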

AI semantic matching

LLM-based detection: groups your articles into clusters and asks the AI which ones cover the same topic, regardless of wording.

Catches: “10 best WordPress SEO plugins” + “Top WordPress SEO plugins for 2026” + “Best plugins for WP SEO” (different wording, same topic).

Cost: ~$0.005 per article analyzed via gpt-4o-mini. For a 1000-article blog: ~$5 for the full semantic scan.
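
To budget a scan for a different blog size, the arithmetic above can be sketched as a small helper. The function name `estimate_semantic_scan_cost` is hypothetical, and the per-article rate is the approximate figure quoted above, not a guaranteed price.

```python
def estimate_semantic_scan_cost(article_count: int, cost_per_article: float = 0.005) -> float:
    """Approximate cost in USD of a full semantic scan.

    The default rate is the rough ~$0.005 per article quoted in the text;
    real API pricing and article length will move this number.
    """
    if article_count < 0:
        raise ValueError("article_count must be non-negative")
    return round(article_count * cost_per_article, 2)
```

For example, `estimate_semantic_scan_cost(1000)` gives the ~$5 figure above, and a 250-article blog comes to about $1.25.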

How to run

In the 📑 Doublons (Duplicates) tab:

  1. Click 🔍 Scan classic: free and fast, about 30 sec for 1000 articles.
  2. Optionally click 🤖 Scan IA sémantique (AI semantic scan): paid and slower, 2-5 min for 1000 articles.

The two scans complement each other: run the classic scan first to clean up obvious duplicates, then the semantic scan for deeper analysis.

Working with the results

Each duplicate cluster shows:

  • The articles involved (title, URL, traffic, AI verdict, age)
  • Similarity score or AI confidence level
  • Recommended canonical (the article with the most traffic / best verdict / oldest URL)
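
One way to picture the recommendation is a sort with tie-breakers. The sketch below assumes the three criteria apply in the order listed (traffic first, then verdict, then age); the real weighting is not documented here, and `Article`, `verdict_score`, and `recommend_canonical` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    url: str
    traffic: int        # e.g. monthly visits
    verdict_score: int  # hypothetical numeric form of the AI verdict; higher is better
    published: date


def recommend_canonical(cluster: list[Article]) -> Article:
    """Pick the article with the most traffic, then the best verdict, then the oldest date.

    The ordering of the criteria is an assumption made for this sketch.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one article")
    # min() with negated traffic and verdict selects the highest values first;
    # the publication date breaks remaining ties in favour of the oldest article.
    return min(cluster, key=lambda a: (-a.traffic, -a.verdict_score, a.published))
```

The recommendation is only a starting point: you can still choose a different canonical for a cluster.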

For each cluster, choose:

  • ✓ Choisir comme canonique (Choose as canonical): keep this one, then redirect or NOINDEX the rest
  • 🔀 Redirect: 301-redirect the duplicates to the canonical (preserves SEO)
  • 🗑 Supprimer (Delete): trash the duplicates (auto-301 to the canonical)
  • 🚩 Pas un doublon (Not a duplicate): mark as a false positive and ignore it in future scans
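
Conceptually, the redirect and false-positive actions come down to two small data structures: a map from each duplicate URL to its canonical, and a set of pairs to skip on later scans. The following sketch illustrates that idea only; `build_redirect_map` and `filter_ignored` are hypothetical names, not part of the plugin.

```python
def build_redirect_map(cluster_urls: list[str], canonical: str) -> dict[str, str]:
    """Map every non-canonical URL in the cluster to the canonical URL (one 301 each)."""
    if canonical not in cluster_urls:
        raise ValueError("canonical must be one of the cluster URLs")
    return {url: canonical for url in cluster_urls if url != canonical}


def filter_ignored(
    pairs: list[tuple[str, str]], ignored: set[frozenset[str]]
) -> list[tuple[str, str]]:
    """Drop pairs previously marked as false positives, regardless of URL order."""
    return [pair for pair in pairs if frozenset(pair) not in ignored]
```

Storing ignored pairs as `frozenset` values means a pair flagged once is skipped whichever of the two articles the scan lists first.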

Why it matters

Google’s quality team explicitly mentions duplicate content as a thin/spam signal. A blog where 30 % of the articles are duplicates risks a sitewide quality signal that can affect the rankings of even your unique content.

Cleaning up duplicates is one of the highest-ROI actions for an aging blog.

Performance impact

  • Classic scan: cheap, fast. Run weekly if you publish a lot.
  • AI semantic scan: more expensive. Run quarterly or after major content additions.
