@@ -131,20 +131,66 @@ like the price and rating of a product:
 robots.txt
 ----------

-A robots.txt file tells search engine crawlers which URLs the crawler can access on your site, to
-index its content. This is used mainly to avoid overloading your site with requests.
+A `robots.txt` file instructs search engine crawlers which parts of a website they are permitted
+to access. Its primary purposes are to:

-When indexing your website, search engines take a first look at the robots.txt file. Odoo
-automatically creates one robot.txt file available on `mydatabase.odoo.com/robots.txt`.
+- **Prevent overloading the website:** By guiding crawlers away from certain sections, the
+  `robots.txt` file helps manage server load.
+- **Control access to resources and detailed descriptions:** It can prevent crawlers from
+  accessing media files (images, videos), CSS stylesheets, and JavaScript files, and from reading
+  the content (text) of specific pages.
+
+When indexing your website, search engines first look at the `robots.txt` file. Odoo
+automatically creates a `robots.txt` file, available at `mydatabase.odoo.com/robots.txt`.
+
+.. note::
+   Reputable bots adhere to `robots.txt` rules; others may require blocking via
+   :ref:`Cloudflare <domain-name/naked/cloudflare>` on your custom domain.
+
+Edit robots.txt
+~~~~~~~~~~~~~~~

 By editing a robots.txt file, you can control which site pages are accessible to search engine
 crawlers. To add custom instructions to the file, go to :menuselection:`Website --> Configuration
 --> Settings`, scroll down to the :guilabel:`SEO` section, and click :guilabel:`Edit robots.txt`.

 .. example::
-   If you do not want the robots to crawl the `/about-us` page of your site, you can edit the
+   If you do not want robots to crawl the `/about-us` page of your site, you can edit the
    robots.txt file to add `Disallow: /about-us`.

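+A fuller custom `robots.txt` sketch might combine several such rules. The paths and the crawler
+name below are illustrative assumptions, not Odoo defaults:
+
+.. code-block:: text
+
+   # Keep all crawlers out of two hypothetical paths
+   User-agent: *
+   Disallow: /about-us
+   Disallow: /internal
+
+   # Block one specific (hypothetical) crawler from the entire site
+   User-agent: BadBot
+   Disallow: /
+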
+.. important::
+   While `robots.txt` prevents content from being crawled, **it does not guarantee that a page
+   will not be indexed**. A page can still appear in search results if it is linked to from other
+   crawled pages (indexed by "reference"). Google generally does not recommend using `robots.txt`
+   to block webpages that you wish to keep out of search results entirely.
+
+Prevent a page from being indexed
+---------------------------------
+
+To effectively prevent a page from appearing in search engine results, use one of the following
+methods:
+
+- **noindex tag:** Access the page's :ref:`properties <website/pages/page_properties>` and toggle
+  the :guilabel:`Indexed` switch off.
+
+  .. note::
+     This option is not yet available for :ref:`dynamic pages <website/pages/page_type>`.
+
+- **404 or 403:** Configure the page to return a 404 (Not Found) or 403 (Forbidden) HTTP status
+  code. These codes signal to search engines that the page does not exist or is inaccessible,
+  leading to its eventual removal from the index.
+
+  - **404:** :ref:`Configure a 404 redirection <website/pages/URL-redirection>`.
+  - **403:** Access the page's :ref:`properties <website/pages/page_properties>` and toggle the
+    :guilabel:`Visibility` switch off, or :ref:`unpublish the page <website/pages/un-publish-page>`.
+
+- **Google Search Console:** Use Google Search Console to request the removal of specific URLs
+  from Google's index.
+
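+The **noindex tag** method relies on the standard `noindex` robots directive, which is commonly
+delivered as an HTML meta tag or an HTTP response header. A generic illustration of the meta-tag
+form (not Odoo-specific output):
+
+.. code-block:: html
+
+   <meta name="robots" content="noindex">
+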
+.. seealso::
+   - :doc:`../configuration/google_search_console`
+   - :doc:`../pages`
+
 Sitemap
 -------
