w3c · xfq · Jul 17, 2025 · May 19, 2025 · May 20, 2025 · Jul 17, 2025
diff --git a/questions/qa-byte-order-mark-data/translations.js b/questions/qa-byte-order-mark-data/translations.js
@@ -1,8 +1,8 @@
 var trans = { }
 
-trans.versions = ['de','en', 'fr']
+trans.versions = ['en']
 
-trans.outofdatetranslations = []
+trans.outofdatetranslations = ['de', 'fr']
 
 trans.updatedtranslations = []
 

diff --git a/questions/qa-byte-order-mark.de.html b/questions/qa-byte-order-mark.de.html
@@ -20,8 +20,8 @@
 f.path = '../' // what you need to prepend to a URL to get to the /International directory 
 
 // AUTHORS AND TRANSLATORS should fill in these assignments:
-f.thisVersion = { date:'2016-04-20', time:'11:10'} // date and time of latest edits to this document/translation
-f.contributors = 'Albert Lunde, Asmus Freytag, Björn Höhrmann, Henri Sivonen, John Cowan, Leif Halvard Silli, Norbert Lindenberg' // people providing useful contributions or feedback during review or at other times
+f.thisVersion = { date:'2025-07-16', time:'11:10'} // date and time of latest edits to this document/translation
+f.contributors = 'Albert Lunde, Asmus Freytag, Björn Höhrmann, Fuqiao Xue, Henri Sivonen, John Cowan, Leif Halvard Silli, Norbert Lindenberg' // people providing useful contributions or feedback during review or at other times
 // also make sure that the lang attribute on the html tag is correct!
 f.sources = '' // describes sources of information
 
@@ -86,12 +86,13 @@ <h2>Antwort</h2>
   <section id="bomwhat">
 <h3>Was ist ein BOM?</h3>
     <div class="sidenoteGroup">
-      <p>Am Anfang einer Webseite, die eine <a class="termref" href="/International/articles/definitions-characters/Overview#unicode">Unicode</a>-<a class="termref" href="/International/articles/definitions-characters/Overview#charsets">Zeichencodierung</a> verwendet, stehen möglicherweise einige Bytes, die das Unicode-Zeichen U+FEFF <span lang="en" xml:lang="en" translate="no">BYTE ORDER MARK</span> (abgekürzt <dfn>BOM</dfn>) darstellen.</p>
+      <p>A <dfn>Byte Order Mark</dfn>, sometimes abbreviated "BOM", is a special Unicode character intended to appear at the very beginning of a text file. Its original purpose was to indicate the <q><a href="https://en.wikipedia.org/wiki/Endianness">endianness</a></q> of text that used the UTF-16 or UTF-32 character encodings of Unicode. The Byte Order Mark is U+FEFF ZERO WIDTH NO-BREAK SPACE: the character name refers to a separate, deprecated use of the character.</p>
+      <p>Some systems also place the BOM code point at the start of a file to indicate that the file uses the UTF-8 character encoding, even though UTF-8 does not need a marker to indicate endianness.</p>
+      <p>Although the BOM is usually invisible and is intended to help software interpret text correctly, it can cause unexpected display issues or software problems if it is not handled correctly.</p>
       <div class="insideinfonote">
         <p class="info">Die Bezeichnung <span lang="en" xml:lang="en" translate="no">BYTE ORDER MARK</span> ist ein Alias für die ursprüngliche Bezeichnung <span lang="en" xml:lang="en" translate="no">ZERO WIDTH NO-BREAK SPACE</span> (ZWNBSP, nullbreites geschütztes Leerzeichen). Mit der Einführung des Zeichens U+2060 <span lang="en" xml:lang="en" translate="no">WORD JOINER</span> (Wortverbinder) besteht es keine Notwendigkeit mehr, U+FEFF in seiner ZWNSP-Funktion zu verwenden. Ab diesem Zeitpunkt und weil es einen formellen Alias gibt, ist die Bezeichnung <span lang="en" xml:lang="en" translate="no">ZERO WIDTH NO-BREAK SPACE</span> nicht mehr passend. Hier wird deswegen der Alias verwendet.</p>
       </div>
     </div>
-    <p>Das BOM ist bei korrekter Verwendung unsichtbar.</p>
     <p>Bevor UTF-8 Anfang 1993 eingeführt wurde, war der vorgesehene Weg, Unicode-Text zu übertragen, Zeichen in 16 Bit zu codieren. Die Zeichencodierung wurde UCS-2 genannt und später zu UTF-16 erweitert. Einheiten zu 16 Bit können auf zwei Arten in Bytes repräsentiert werden: das höherwertige Byte zuerst (<span class="qterm" lang="en" xml:lang="en" translate="no">big-endian</span>) oder das niederwertige Byte zuerst (<span class="qterm" lang="en" xml:lang="en" translate="no">little-endian</span>). Um anzugeben, welche Reihenfolge der Bytes verwendet wurde, wird das Zeichen U+FEFF (das BOM, <span lang="en" xml:lang="en" translate="no">byte-order mark</span>) an den Anfang des Datenstroms gesetzt – als Wundermittel, das sinngemäß nicht zum Text gehört, den der Datenstrom repräsentiert.</p>
     <p>Die folgende Abbildung zeigt die Bytes für eine Folge von Zwei-Byte-Zeichen. Jede Hexadezimalzahl mit 2 Ziffern steht für ein Byte im Datenstrom. Sie können sehen, das die Reihenfolge der beiden Bytes, die ein Zeichen repräsentieren, bei <span class="qterm" lang="en" xml:lang="en" translate="no">big-endian</span> gegenüber <span class="qterm" lang="en" xml:lang="en" translate="no">little-endian</span> umgedreht ist. Das BOM zeigt an, welche Reihenfolge gilt, damit die Anwendung den Inhalt unmittelbar decodieren kann.</p>
     <p><img src="qa-byte-order-mark-data/bom.png" alt="Bytes, die das BOM repräsentieren." /></p>

diff --git a/questions/qa-byte-order-mark.en.html b/questions/qa-byte-order-mark.en.html
@@ -20,8 +20,8 @@
 f.path = '../' // what you need to prepend to a URL to get to the /International directory 
 
 // AUTHORS AND TRANSLATORS should fill in these assignments:
-f.thisVersion = { date:'2016-04-20', time:'11:10'} // date and time of latest edits to this document/translation
-f.contributors = 'Albert Lunde, Asmus Freytag, Björn Höhrmann, Henri Sivonen, John Cowan, Leif Halvard Silli, Norbert Lindenberg, Gwendoline Clavé' // people providing useful contributions or feedback during review or at other times
+f.thisVersion = { date:'2025-07-17', time:'11:10'} // date and time of latest edits to this document/translation
+f.contributors = 'Albert Lunde, Asmus Freytag, Björn Höhrmann, Fuqiao Xue, Henri Sivonen, John Cowan, Leif Halvard Silli, Norbert Lindenberg, Gwendoline Clavé' // people providing useful contributions or feedback during review or at other times
 // also make sure that the lang attribute on the html tag is correct!
 f.sources = '' // describes sources of information
 
@@ -104,14 +104,14 @@ <h2>Answer</h2>
 <h3> What is a byte-order mark?</h3>
 
 <div class="sidenoteGroup">
-<p>At the beginning of a page that uses a <a class="termref" href="/International/articles/definitions-characters/#unicode">Unicode</a> <a class="termref" href="/International/articles/definitions-characters/#charsets">character encoding</a> you may find some bytes that represent the Unicode code point U+FEFF BYTE ORDER MARK (abbreviated as <dfn>BOM</dfn>).</p>
+<p>A <dfn>Byte Order Mark</dfn>, sometimes abbreviated "BOM", is a special Unicode character intended to appear at the very beginning of a text file. Its original purpose was to indicate the <q><a href="https://en.wikipedia.org/wiki/Endianness">endianness</a></q> of text that used the UTF-16 or UTF-32 character encodings of Unicode. The Byte Order Mark is U+FEFF ZERO WIDTH NO-BREAK SPACE: the character name refers to a separate, deprecated use of the character.</p>
+<p>Some systems also place the BOM code point at the start of a file to indicate that the file uses the UTF-8 character encoding, even though UTF-8 does not need a marker to indicate endianness.</p>
+<p>Although the BOM is usually invisible and is intended to help software interpret text correctly, it can cause unexpected display issues or software problems if it is not handled correctly.</p>
 <div class="insideinfonote">
 <p class="info">The name BYTE ORDER MARK is an alias for the original character name ZERO WIDTH NO-BREAK SPACE (ZWNBSP). With the introduction of U+2060 WORD JOINER, there's no longer a need to ever use U+FEFF for its ZWNSP effect, so from that point on, and with the availability of a formal alias, the name ZERO WIDTH NO-BREAK SPACE is no longer helpful, and we will use the alias here.</p>
 </div>
 </div>
 
-<p>The BOM, when correctly used, is invisible.</p>
-
 <p>Before UTF-8 was introduced in early 1993, the expected way for transferring Unicode text was using 16-bit code units using an encoding called UCS-2 which was later extended to UTF-16. 16-bit code units can be expressed as bytes in two ways: the most significant byte first (<span class="qterm">big-endian</span>) or the least significant byte first (<span class="qterm">little-endian</span>). To communicate which byte order was in use,  U+FEFF (the byte-order mark) was used at the start of the stream as a magic number that is not logically part of the text the stream represents.</p>
 
 <p>The picture below shows the bytes used in a sequence of two-byte characters. Each 2-digit hexadecimal number represents a byte in the stream of text. You can see that the order of the two bytes that represent a single character is reversed for big endian vs. little endian storage. The byte-order mark indicates which order is used, so that applications can immediately decode the content.</p>
@@ -142,6 +142,34 @@ <h3> What do I need to know about the BOM?</h3>
 <p>If you use a UTF-16 encoding for your page (and we strongly recommend that you don't), there are some <a href="#additionalinfo">additional considerations</a>.</p>
 </section>
 
+<section id="whenToUseBOM">
+<h3>When to Use (and Not Use) the BOM</h3>
+<p>Whether a BOM is needed, and whether it is recommended, depends on which Unicode encoding scheme is in use.</p>
+
+<h4>UTF-8</h4>
+<p>For UTF-8, the BOM is the byte sequence <code>EF BB BF</code>. Unlike UTF-16 and UTF-32, UTF-8 does not have byte order (endianness) issues, so a BOM is not needed for this purpose. Its only function in UTF-8 is to act as a "signature" to indicate that the file is UTF-8 encoded. The Unicode Standard permits the BOM in UTF-8 but does not recommend its use.</p>
+<p><strong>Recommendation:</strong> Avoid using a BOM with UTF-8 files unless you have a specific reason or compatibility requirement. Prefer UTF-8 without a BOM wherever possible.</p>
+
+<h4>UTF-16 (UTF-16BE &amp; UTF-16LE)</h4>
+<p>For UTF-16, the BOM indicates endianness when the character set label does not already specify it (for example, when the content is labeled just as "UTF-16").</p>
+<ul>
+<li><code>FE FF</code>: Indicates Big Endian (UTF-16BE).</li>
+<li><code>FF FE</code>: Indicates Little Endian (UTF-16LE).</li>
+<li>If a UTF-16 stream is read with the wrong endianness, the BOM character <code>U+FEFF</code> will appear as <code>U+FFFE</code>, which is a noncharacter.</li>
+<li>If the character set is explicitly stated as "UTF-16BE" or "UTF-16LE", a BOM should <em>not</em> be used as the byte order is already known.</li>
+<li><strong>Recommendation:</strong> Use a BOM if your UTF-16 data might be interpreted by systems with different native endianness and the specific endianness (BE or LE) is not declared by a higher-level protocol. If the specific UTF-16 encoding (LE or BE) is known and declared, omit the BOM. (For HTML, however, UTF-8 is strongly preferred over UTF-16.)</li>
+</ul>
+
+<h4>UTF-32 (UTF-32BE &amp; UTF-32LE)</h4>
+<p>As with UTF-16, the BOM in UTF-32 indicates endianness, but UTF-32 is rarely used for transmission or web content.</p>
+<ul>
+<li><code>00 00 FE FF</code>: Indicates Big Endian (UTF-32BE).</li>
+<li><code>FF FE 00 00</code>: Indicates Little Endian (UTF-32LE).</li>
+<li><strong>Recommendation:</strong> As with UTF-16, use a BOM if endianness is not otherwise specified. (Again, UTF-8 is preferred for HTML.)</li>
+</ul>
+</section>
+
+
 
 
 
@@ -271,18 +299,18 @@ <h3>Removing the BOM</h3>
 <section id="additionalinfo">
 <h2>Additional information</h2>
 
-<p>Here are some additional notes for those who are encoding their HTML pages using UTF-16. Note that, for HTML it's recommended that you use UTF-8 and that you avoid UTF-16. So for most people this section will be academic.</p>
+<p>This section provides further details, primarily for those encoding HTML pages using UTF-16 or UTF-32. As a strong general recommendation, <strong>use UTF-8 for all HTML content</strong> in preference to UTF-16 or UTF-32.</p>
 
 <div class="sidenoteGroup">
-<p>According to RFC 2718 and the Unicode Standard, if you declare the character encoding of your page using HTTP as either &quot;UTF-16LE&quot; or &quot;UTF-16BE&quot; then you should not use a byte-order mark at the beginning of the page. Only if the page is labelled in HTTP using IANA charset name &quot;UTF-16&quot; is a byte-order mark appropriate.</p>
+<p>For <strong>UTF-16</strong>, as detailed in the <a href="#whenToUseBOM">"When to Use (and Not Use) the BOM"</a> section, a BOM is appropriate for indicating endianness if the page is labeled simply with the IANA charset name "UTF-16". However, if the character encoding is declared via HTTP as specifically "UTF-16LE" or "UTF-16BE", a BOM should not be used. This guidance aligns with RFC 2781 and the Unicode Standard.</p>
 <div class="sideinfonote">
 <p class="warning">Note that this is solely about the <em>labeling</em> of the content.  Of course, the actual sequence of bytes is the same, whether you label content as UTF-16 and add a BOM, or whether you label it as UTF-16LE or UTF-16BE.</p>
 </div>
 </div>
 
-<p>The HTML5 specification currently disallows the use of any other, text-based in-document encoding declaration for pages using the UTF-16 encoding. In effect, this means that the BOM is, itself, the declaration that you have to add.</p>
+<p>The HTML5 specification currently disallows any other text-based in-document encoding declaration (such as a <code class="kw" translate="no">meta</code> tag) for pages using UTF-16. In effect, if you are using the generic "UTF-16" label, the BOM itself serves as the necessary in-stream declaration of byte order.</p>
 
-<p>The byte-order mark is also used for text labeled as UTF-32, and should not be used for text labeled as UTF-32BE or UTF-32LE. The use of UTF-32 for HTML content, however, is strongly discouraged and some implementations have removed support for it, so we haven't even mentioned it until now.</p>
+<p>Similarly, for <strong>UTF-32</strong>, a BOM can be used if the content is labeled generically as "UTF-32". It should not be used if the label is specifically "UTF-32BE" or "UTF-32LE". However, the use of UTF-32 for HTML content is strongly discouraged, and some implementations have removed support for it.</p>
 </section>
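
Not part of the patch: as an aid to checking the byte sequences listed in the new "When to Use (and Not Use) the BOM" section, here is a minimal Python sketch of BOM sniffing. The names `BOM_SIGNATURES`, `sniff_bom` and `strip_bom` are hypothetical and chosen for illustration; the only library assumption is the set of BOM constants in Python's standard `codecs` module. It reports the encoding a leading BOM suggests and is not a full encoding detector.

```python
import codecs

# BOM byte signatures as listed in the section. The 4-byte UTF-32 marks are
# tested first because FF FE 00 00 (UTF-32LE) also begins with FF FE (UTF-16LE).
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOM_SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def strip_bom(data: bytes) -> bytes:
    """Return the data with any leading BOM removed (hypothetical helper)."""
    _, length = sniff_bom(data)
    return data[length:]
```

One design caveat: the bytes `FF FE 00 00` are ambiguous, since they could also be a UTF-16LE BOM followed by U+0000. This sketch resolves the ambiguity in favor of UTF-32LE, which is an assumption a real consumer may not want to make, particularly for web content where UTF-32 support may be absent.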