The Impact of “Duplicate Content” on SEO and the Ultimate Solution

When optimizing for search, many people overlook the impact of “duplicate content” on SEO. Customers often ask: “I wrote all the content on my website myself, so how can there be duplicate content?” In fact, duplicate content is common, and the first thing to know is that “Google doesn’t like duplicate content.”

Google has worked for years to give users the best possible search experience. What counts as the best experience? Google constantly tries to understand users’ “search intent” and also tracks user behavior, including click-through rate, browsing time, bounce rate, and repeated searches, to judge whether each search result it provides meets the user’s needs.

Duplicate content makes it difficult for Google to decide which page to show in search results, so the duplicated pages compete with each other and each performs poorly. Showing searchers several copies of the same content also makes for a poor user experience.

Duplicate content will affect your website’s SEO performance. However, if you are the original owner of the content and do not intentionally copy from other people’s websites, Google will not penalize you for duplication caused by technical settings. Copying other websites’ content in large quantities, intentionally or maliciously, is another matter. Here’s what Google says about it.

Duplicate content due to technical reasons

Duplicate content can be caused by technical reasons. The root cause is that Googlebot identifies a page differently from the way you might expect. We humans judge duplication by reading the content, whereas crawlers identify pages by URL: every distinct URL counts as a separate page, even when the content behind it is identical. So when a website lacks sound basic planning and configuration, duplicate content problems arise easily.

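1. Non-www vs. www and HTTP vs. HTTPS – No single domain defined

A website should define a single standard domain so that Google knows which version of the site is the main one. Without it, the www and non-www versions, and the HTTP and HTTPS versions, can all be treated as separate sites with the same content. Google also prefers sites served over HTTPS with an SSL certificate.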

2. Duplicate content caused by URL parameters

These problems usually fall into the following categories:

  • URL case: www.example.com/page and www.example.com/Page are treated as two different pages.
  • URL trailing slash (/): www.example.com/page and www.example.com/page/ are treated as two pages with duplicate content.
  • URL with parameters: This often occurs on e-commerce websites, where parameters are used to sort products or to filter them by item, color, price, and so on. For example, https://www.example.com/shoes and https://www.example.com/shoes?color=white are treated as different pages.
  • URL containing a session ID or UTM tracking parameters: For example, www.example.com/products/shoes, www.example.com/products/shoes?sessionid=123, and www.example.com/products/shoes?utm_source=fb&utm_medium=ads are treated as different pages with duplicate content.
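To see how many of your URLs are really the same page, the variants listed above can be collapsed into one comparable form. The following is a minimal Python sketch, not a definitive implementation: the function name `normalize_url` and the `TRACKING_PARAMS` list are assumptions made for illustration, and lowercasing the path is only safe if your server treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameters that never change what the page shows.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def normalize_url(url: str) -> str:
    """Collapse common URL variants into one comparable form."""
    parts = urlsplit(url)
    # Lowercase the path and drop a trailing slash (keep "/" for the root).
    path = parts.path.lower().rstrip("/") or "/"
    # Keep only parameters that are not used purely for tracking.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), path,
                       urlencode(query), ""))

variants = [
    "https://www.example.com/products/shoes",
    "https://www.example.com/products/Shoes/",
    "https://www.example.com/products/shoes?sessionid=123",
    "https://www.example.com/products/shoes?utm_source=fb&utm_medium=ads",
]
# All four variants reduce to a single normalized URL.
unique_pages = {normalize_url(u) for u in variants}
```

Filter parameters such as color are deliberately kept, because they can change what the page displays; whether those pages deserve their own URL is a separate decision.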

3. CMS Content Management System Settings

Many CMS platforms, including the common WordPress, will automatically create specific taxonomy pages, including tag and category pages, as well as separate pages for images or attachments, and even separate search results pages and pagination pages.

  • Category / Tag Page

These pages list the content associated with a specific category or tag, and each category or tag has its own URL. If a piece of content belongs to multiple categories or tags, several URLs may point to the same content. For example, suppose you have a blog post about “Weight Loss” that is categorized as both “Healthy Eating” and “Exercise”. Your CMS might generate URLs like www.example.com/category/dietary/weight-loss and www.example.com/category/exercise/weight-loss. Both URLs point to the same blog post, leading to potential duplicate content issues.

  • Image / Attachment Page

A CMS often generates a separate page for each image, which usually just displays the image on an otherwise blank page. Because the page has no other content, it looks very similar to every other image page and can easily be considered duplicate content. In addition, image search results may point to this attachment page instead of the article page you want.

  • Pagination

When a page contains a large number of items (such as blog posts, products, or reviews), the CMS will often split it into paginated versions. Each paginated page usually has a unique URL that differs only in the page number, for example: “/page/1”, “/page/2”, “/page/3”, etc. Search engines may index each paginated page as a separate page.

  • Indexable search results pages

Many websites offer a search function that allows visitors to search for content on the website. The pages that display search results are all very similar and in most cases do not provide any value to the search engines. That’s why you don’t want them to be indexable by search engines.

4. Localized multilingual website

Duplicate content may also be a problem when you use the same content on localized multilingual websites. To cater to different regions that speak the same language, you might have localized versions of your product pages. While these product pages may differ slightly in pricing, currency, and shipping, the core product information, descriptions, and images are essentially the same. Therefore, if not configured correctly, search engines may consider these pages as duplicates. It looks like this:

USA: www.example.com/us/products/product-line

UK: www.example.com/uk/products/product-line

Canada: www.example.com/ca/products/product-line

Australia: www.example.com/au/products/product-line

5. Printable version of the page

When a page has a print-friendly version at a separate URL, there are essentially two versions of the same content. For example:
 www.example.com/page/ and www.example.com/print/page/

Duplicate content caused by the content itself

1. Content with similar intent

As you publish more and more articles, you will sometimes create two articles with similar intent that target similar keywords, which leads to keyword cannibalization. This confuses Google: it may treat the content as duplicate and cannot tell which page is more important. Multiple pages then compete for the same keyword ranking, and in the end Google is likely to show only one of them.

You can see the reasoning in how Google’s results are put together. For example, when you search for “car body wrap”, the search engine results page (SERP) may show articles such as “Evaluations of car body wrap brands” and “Recommendations of which car body wrap brands are good”. However, if two articles with similar intent and similar keywords both appear on your own website, Google does not know which one to recommend. The two articles end up competing with each other for rankings and traffic within your site.

2. Duplication of content across domains

  • Articles cited by others: This is especially a problem if your site has low domain authority and the site citing your content without attribution has higher domain authority. Sites with higher domain authority are generally crawled more frequently, so the copy may be crawled before your original. The citing site may then be treated as the original author and outrank you.
  • Article plagiarism: If your article is plagiarized and the same domain authority problem occurs, your article may rank lower than the plagiarist’s copy. In that case, you can file an infringement complaint with Google, providing the publication time and URL of your article.
  • Posting the same product description on an e-commerce platform or the same article on a blog platform: This is similar to the cases above. The e-commerce or blog platform usually carries more weight than your own website, so it is recommended that you avoid publishing identical content on different platforms. At a minimum, rewrite it so the versions differ, or add a link back to the source.

How to fix duplicate content

1. Non-www vs. www and HTTP vs. HTTPS: set up a single domain

  • Redirect all traffic to https + non-www: If you have an Apache server, add the following code to the .htaccess file (a hidden file) in your website’s root directory:
RewriteEngine On
# Redirect when the request is not HTTPS or the host starts with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^(.*)$ https://yourwebsitehere.com/$1 [R=301,L]
  • Redirect all traffic to https + www: If you have an Apache server, add the following code to the .htaccess file (a hidden file) in your website’s root directory:
RewriteEngine On
# Redirect when the request is not HTTPS or the host does not start with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.yourwebsitehere.com/$1 [R=301,L]

2. Canonical attribute solves URL parameter problem

The canonical URL tells search engines that, while there may be multiple URLs pointing to the same content, there is only one canonical URL which is the original URL. Generally, Google will use this URL in its results.

Canonicalize your URLs with the rel="canonical" attribute on a <link> element. Add the following inside the page’s <head> </head>:

<link rel="canonical" href="http://www.example.com/blogs/my-blog-post" />

This tells Google that http://www.example.com/blogs/my-blog-post is the URL that should be indexed, even when the page is reached through URLs such as:
http://www.example.com/blogs/my-blog-post?utm_source=fb or
http://www.example.com/blogs/my-blog-post?show-comments=true&page=3.
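Once the tag is in place, it is worth confirming that each page variant really declares the canonical URL you expect. The sketch below uses only Python’s standard `html.parser` module to read the canonical URL out of a page’s HTML. The class and function names (`CanonicalFinder`, `find_canonical`) are assumptions for illustration, and fetching the HTML from your site is left out.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")

def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head>'
    '<link rel="canonical" href="http://www.example.com/blogs/my-blog-post" />'
    '</head><body>Post</body></html>'
)
canonical_url = find_canonical(page)
```

If `find_canonical` returns None, or a URL other than the one you chose, the page is not sending the signal described above.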

3. 301 Redirect

Use a 301 Redirect when you want to direct all traffic to a preferred page. Google considers this a strong signal that the redirect target should be the canonical page.

Assume that you can access the homepage via: 

  • https://example.com/
  • https://www.example.com 
  • https://home.example.com 

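You can choose one preferred URL and redirect all traffic from the other URLs to it. Google treats the different redirect methods as having the same effect for canonicalization, although a permanent server-side 301 redirect is generally the quickest for it to pick up.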
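The decision behind such a redirect can be sketched in a few lines of Python. This is only an illustration of the logic, not server configuration: it assumes `www.example.com` is the preferred host, and the names `PREFERRED_HOST`, `ALTERNATE_HOSTS`, and `redirect_target` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # assumption: the variant you chose
ALTERNATE_HOSTS = {"example.com", "home.example.com"}

def redirect_target(url: str):
    """Return the 301 target for a non-preferred URL, or None if none is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if parts.scheme == "https" and host == PREFERRED_HOST:
        return None  # already the preferred URL
    if host == PREFERRED_HOST or host in ALTERNATE_HOSTS:
        # Keep the path and query, change only the scheme and host.
        return urlunsplit(("https", PREFERRED_HOST, parts.path or "/",
                           parts.query, ""))
    return None  # unknown host: leave it alone

targets = [redirect_target(u) for u in (
    "https://example.com/",
    "https://www.example.com",
    "https://home.example.com",
)]
```

Every alternate homepage URL maps to the same target, while the preferred URL itself is left untouched, so no redirect loop is created.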

4. Configure the CMS correctly

  • Select a single “Category” for each article.
  • Keep “Tag” and “Search” pages out of the index: Add a noindex robots meta tag (or an X-Robots-Tag HTTP header) to these pages. Note that robots.txt does not support noindex; it only controls crawling.
  • Redirect each “Image/Attachment” page to its parent post URL, that is, the original article.

5. Robots.txt

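Use Disallow rules in the robots.txt file to stop search engines from crawling pages that have no search value, such as internal search results. Keep in mind that robots.txt controls crawling, not indexing, and Google does not support a noindex rule in it. To keep a page out of the index, use a noindex robots meta tag on the page and leave the page crawlable so that Google can see the tag.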
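Before uploading a robots.txt file, you can check which URLs its rules would block. The sketch below uses Python’s standard `urllib.robotparser` module. The rules and paths are hypothetical examples for a WordPress-style site, and the helper name `is_crawlable` is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths are assumptions about your site's structure.
rules = """\
User-agent: *
Disallow: /tag/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def is_crawlable(url: str) -> bool:
    """Return True if a generic crawler may fetch the URL under these rules."""
    return parser.can_fetch("*", url)

tag_page_allowed = is_crawlable("https://www.example.com/tag/shoes")
article_allowed = is_crawlable("https://www.example.com/blog/my-blog-post")
```

A blocked URL is only kept from being crawled. If other sites link to it, it can still appear in the index, which is why a noindex meta tag is the right tool when the goal is removal from search results.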

6. Hreflang Tags

The hreflang attribute is the answer to “localizing your multilingual site.” It tells Google which language and region each version of a page is intended for, so Google can show, for example, the .com site to US searchers and the .co.uk site to UK searchers.
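As a rough illustration, the hreflang tags for the localized product pages mentioned earlier could be generated like this. The mapping of language-region codes to URLs, the added https:// scheme, and the function name `hreflang_links` are assumptions; the tags belong in the <head> of every localized version, and each version should list all the others as well as itself.

```python
from html import escape

# Assumption: one URL per language-region code, using the example paths.
LOCALIZED = {
    "en-us": "https://www.example.com/us/products/product-line",
    "en-gb": "https://www.example.com/uk/products/product-line",
    "en-ca": "https://www.example.com/ca/products/product-line",
    "en-au": "https://www.example.com/au/products/product-line",
}

def hreflang_links(versions, default=None):
    """Build one <link rel="alternate"> tag per localized version."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in versions.items()
    ]
    if default:
        # x-default marks the fallback page for unmatched languages or regions.
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(default)}" />'
        )
    return tags

head_tags = "\n".join(hreflang_links(LOCALIZED, default=LOCALIZED["en-us"]))
```

Generating the block once and reusing it on every version helps keep the annotations reciprocal, which hreflang requires to work.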

7. Create a site map

Choose your canonical URLs and add them to your sitemap. This is an easy way to let Google know which pages are important to you. If your website has no more than a few dozen pages, you can create the sitemap manually; otherwise, use one generated by your CMS. Then submit the sitemap in Google Search Console under “Indexing” > “Sitemaps” > “Add a new sitemap”.
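For a small site, a sitemap can also be produced with a short script. The following sketch uses Python’s standard `xml.etree.ElementTree` module and the sitemaps.org namespace; the function name `build_sitemap` and the example URLs are assumptions, and only canonical URLs should be passed in.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls) -> str:
    """Return sitemap XML containing one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical canonical URLs for illustration.
sitemap_xml = build_sitemap([
    "https://www.example.com/page",
    "https://www.example.com/blogs/my-blog-post",
])
```

Save the result as sitemap.xml in the site root before submitting it.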

8. How to resolve content with similar intent?

Plan a keyword map that assigns each target keyword to a single article. With a keyword map, you can avoid keyword cannibalization and spread your keywords effectively across different articles.
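A keyword map can be as simple as a table of article URLs and the keyword each one targets. The sketch below, with hypothetical URLs and keywords, flags any keyword that more than one article is competing for.

```python
from collections import defaultdict

# Hypothetical keyword map: article URL -> primary target keyword.
ARTICLES = {
    "/blog/car-body-wrap-brand-evaluations": "car body wrap brands",
    "/blog/car-body-wrap-brand-recommendations": "Car Body Wrap Brands",
    "/blog/car-body-wrap-cost": "car body wrap cost",
}

def find_cannibalization(articles):
    """Return keywords targeted by more than one article, with their URLs."""
    by_keyword = defaultdict(list)
    for url, keyword in articles.items():
        # Normalize case and spacing so near-identical entries are grouped.
        by_keyword[keyword.strip().lower()].append(url)
    return {kw: urls for kw, urls in by_keyword.items() if len(urls) > 1}

conflicts = find_cannibalization(ARTICLES)
```

Each conflict is a candidate for merging the articles, retargeting one of them to a different keyword, or pointing one at the other with a canonical tag or 301 redirect.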

9. What should you do if your article is cited by others?

You can ask them to add a canonical tag that points to the original article on your domain. If they are unwilling to do so, you can send Google a DMCA request or take legal action.

10. What should I do if my article is plagiarized?

You can report the infringement to Google, which will review it and may remove the duplicate content from its results. This is not the same as taking legal action, although you can do both at the same time.

You can find duplicate content by:

1. Use Google Search Console:

You can use the Index Coverage report in Google Search Console. There you can see which pages on your site Google has indexed, which pages it hasn’t, and any errors or warnings.

Three types of issues in the report indicate duplicate pages:

  • Duplicate without a canonical tag: These pages on your site are duplicates of another page, but they don’t declare a canonical URL. Google selected another version of these pages to show in search results.
  • The canonical page selected by Google differs from the one you chose: The canonical URL determined by Google is different from the canonical URL the page points to with rel="canonical".
  • Submitted URL not selected as canonical: Similar to the above, this happens when you submit a URL to Google Search Console but Google determines that it is not canonical and chooses a different page to index.

You can view the list of pages affected by each issue by clicking on the message. Depending on the cause, you can follow the suggestions above to fix it, such as setting up a 301 Redirect, adding a rel="canonical" attribute, and submitting a new sitemap to Google.

2. Use a professional platform:

If you have a larger site, you may want to use a tool like Screaming Frog or Copyscape to detect duplicate or near-duplicate content on the web.
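For a quick check on a handful of pages, near-duplicate text can also be estimated with a few lines of Python. The sketch below compares overlapping three-word sequences (“shingles”) and reports their Jaccard similarity. This is a generic technique chosen for illustration, not how the tools above work internally, and the function names and sample texts are assumptions.

```python
import re

def shingles(text: str, size: int = 3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    count = max(len(words) - size + 1, 1)
    return {tuple(words[i:i + size]) for i in range(count)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a), shingles(b)
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical product descriptions that differ by a single word.
page_a = "Our white running shoes are light, durable and comfortable."
page_b = "Our white running shoes are light, durable and affordable."
score = similarity(page_a, page_b)
```

Scores close to 1.0 suggest that two pages are near-duplicates and worth reviewing; the threshold you act on is a judgment call for your own content.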

Please include the source with a <dofollow> link if you share this post on your site. Thank you.
