Escape characters on xml attributes #217

Open: wants to merge 1 commit into base: main
3 changes: 2 additions & 1 deletion cjs/interface/document.js
@@ -34,6 +34,7 @@ const {NodeList} = require('./node-list.js');
const {Range} = require('./range.js');
const {Text} = require('./text.js');
const {TreeWalker} = require('./tree-walker.js');
const {XMLAttr} = require('./xml-attr.js');

const query = (method, ownerDocument, selectors) => {
let {[NEXT]: next, [END]: end} = ownerDocument;
@@ -170,7 +171,7 @@ class Document extends NonElementParentNode {
return this[EVENT_TARGET];
}

createAttribute(name) { return new Attr(this, name); }
createAttribute(name) { return this[MIME].isXML ? new XMLAttr(this, name) : new Attr(this, name); }
createComment(textContent) { return new Comment(this, textContent); }
createDocumentFragment() { return new DocumentFragment(this); }
createDocumentType(name, publicId, systemId) { return new DocumentType(this, name, publicId, systemId); }
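
The `createAttribute` change above picks the attribute class from a per-MIME-type flag. The dispatch can be sketched in isolation as follows. This is a minimal sketch, not linkedom's actual code: the `MIME` symbol, the trimmed `Mime` table, and the bare `Attr`, `XMLAttr`, and `Document` classes are simplified stand-ins introduced for illustration.

```javascript
// Hypothetical stand-ins for linkedom's internals, for illustration only.
const MIME = Symbol('mime');

// Trimmed-down MIME table: only the flag this PR adds.
const Mime = {
  'text/html': {isXML: false},
  'text/xml': {isXML: true}
};

class Attr {
  constructor(ownerDocument, name, value = '') {
    this.ownerDocument = ownerDocument;
    this.name = name;
    this.value = value;
  }
}

// XML documents get a subclass so serialization can differ.
class XMLAttr extends Attr {}

class Document {
  constructor(mimeType) {
    this[MIME] = Mime[mimeType];
  }

  // Same ternary as in the diff: choose the class from the MIME flag.
  createAttribute(name) {
    return this[MIME].isXML ? new XMLAttr(this, name) : new Attr(this, name);
  }
}

const htmlAttr = new Document('text/html').createAttribute('title');
const xmlAttr = new Document('text/xml').createAttribute('title');

console.log(htmlAttr instanceof XMLAttr); // false
console.log(xmlAttr instanceof XMLAttr);  // true
```

Only XML documents pay for the extra escaping this way; HTML documents keep creating the plain `Attr`.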
23 changes: 23 additions & 0 deletions cjs/interface/xml-attr.js
@@ -0,0 +1,23 @@
'use strict';
const {VALUE} = require('../shared/symbols.js');
const {emptyAttributes} = require('../shared/attributes.js');
const {escape} = require('../shared/text-escaper.js');
const {Attr} = require('./attr.js');

const QUOTE = /"/g;

/**
* @implements globalThis.Attr
*/
class XMLAttr extends Attr {
constructor(ownerDocument, name, value = '') {
super(ownerDocument, name, value);
}

toString() {
const {name, [VALUE]: value} = this;
return emptyAttributes.has(name) && !value ?
name : `${name}="${escape(value).replace(QUOTE, "&quot;")}"`;
}
}
exports.XMLAttr = XMLAttr
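
The serialization in `XMLAttr#toString` escapes the value and then replaces double quotes. A minimal standalone sketch of that two-step approach is shown below. `escapeText` is an assumed stand-in for `escape` from `../shared/text-escaper.js` (the real helper may cover more characters), `serializeXMLAttribute` is a hypothetical free function, and the `emptyAttributes` shortcut is left out.

```javascript
// Assumed stand-in for `escape` from ../shared/text-escaper.js:
// replaces &, < and > with their entities (the real helper may cover more).
const ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;'};
const escapeText = text => text.replace(/[&<>]/g, ch => ENTITIES[ch]);

const QUOTE = /"/g;

// Hypothetical free function mirroring the non-empty branch of XMLAttr#toString:
// escape markup characters first, then the quote that delimits the value.
const serializeXMLAttribute = (name, value) =>
  `${name}="${escapeText(value).replace(QUOTE, '&quot;')}"`;

console.log(serializeXMLAttribute('attr', '&<>"'));
// attr="&amp;&lt;&gt;&quot;"
```

The order matters: if the quote were replaced first, the `&` in `&quot;` would be escaped again by the second step.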
5 changes: 5 additions & 0 deletions cjs/shared/mime.js
@@ -7,26 +7,31 @@ const Mime = {
'text/html': {
docType: '<!DOCTYPE html>',
ignoreCase: true,
isXML: false,
voidElements: /^(?:area|base|br|col|embed|hr|img|input|keygen|link|menuitem|meta|param|source|track|wbr)$/i
},
'image/svg+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
Contributor Author

I think ignoreCase could be removed in favor of isXml

Owner

So this MR is just about using isXml instead of ignoreCase? I am not sure what I am looking at in here, but I am sure the MR could be way simpler without adding oddly-cased fields (XML is XML, not Xml) and slow/XSS-prone transformers, as commented.

Contributor Author
@dt-jean-baptiste-lemee, Aug 23, 2023

Nope, this PR is about fixing issue #216. I think the test speaks for itself. I commented here just to say that we could remove the ignoreCase boolean: case only has to be ignored for non-XML documents, so it's redundant here.

Owner

I think I've named it as such due to XHTML, but again I am not sure why we need to change that ... it could be breaking if anyone out there brand-checks that property.

Contributor Author
@dt-jean-baptiste-lemee, Aug 28, 2023

I've renamed isXml to isXML and XmlAttr to XMLAttr

Tell me if there's anything else I could improve on this MR

Contributor Author
@dt-jean-baptiste-lemee, Aug 28, 2023

There is a regression on &quot;. I've pushed a solution, but again I'm chaining escape and a replace. I'm wondering why we don't escape all HTML entities (https://www.w3schools.com/html/html_entities.asp) in Attr.toString through the escape function? That way we would need neither XMLAttr nor Mime.isXML.
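
The chaining mentioned in this comment (an escape pass followed by a separate quote replace) could in principle be folded into one pass over the value. The sketch below is only an illustration of that idea, not code from this PR or from linkedom; `escapeXMLAttribute`, `XML_ATTR_CHARS`, and `XML_ATTR_ENTITIES` are hypothetical names.

```javascript
// Hypothetical single-pass escaper for XML attribute values:
// one regular expression and one lookup table handle all four characters.
const XML_ATTR_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;'
};
const XML_ATTR_CHARS = /[&<>"]/g;

const escapeXMLAttribute = value =>
  value.replace(XML_ATTR_CHARS, ch => XML_ATTR_ENTITIES[ch]);

console.log(escapeXMLAttribute('a < b && "c"'));
// a &lt; b &amp;&amp; &quot;c&quot;
```

Because each character is visited once, the ampersand in an emitted entity is never escaped a second time.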

Owner

> why don't we escape all html entities

I think you are better off with JSDOM there ... it's 100% standard, and 100% slower than LinkeDOM

Contributor Author

😆 I'm trying to switch from JSDOM to LinkeDOM because JSDOM is 100 000% slower (JSDOM sometimes takes 5 minutes to handle an XML document when LinkeDOM takes 1 second on the same document!). But yes, we still need a bit more "standardization". I don't think it's a bad idea for LinkeDOM, but you're the boss, it's your choice. We can use our fork. Thanks for this lib and the work, it's huge.

Owner
@WebReflection, Aug 28, 2023

To be fair, XML is not the main target here, and you are suggesting XML-related changes and paths that would slow down the common use case by far, for example escaping all HTML entities ... I understand this is valuable for your business, but this project is about perf ... perf for the most common use cases, which are not XML. Apologies if my answers are not the most welcoming ones, but I am trying to preserve the original idea of this project, which is: work for the most common cases and as fast as possible ... your quest feels a bit against that original goal, hence my nitpicking in here. Hope you can understand, and hope your fork will work great for your use case too!

isXML: true,
voidElements
},
'text/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xhtml+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
}
};
3 changes: 2 additions & 1 deletion esm/interface/document.js
@@ -34,6 +34,7 @@ import {NodeList} from './node-list.js';
import {Range} from './range.js';
import {Text} from './text.js';
import {TreeWalker} from './tree-walker.js';
import {XMLAttr} from './xml-attr.js';

const query = (method, ownerDocument, selectors) => {
let {[NEXT]: next, [END]: end} = ownerDocument;
@@ -170,7 +171,7 @@ export class Document extends NonElementParentNode {
return this[EVENT_TARGET];
}

createAttribute(name) { return new Attr(this, name); }
createAttribute(name) { return this[MIME].isXML ? new XMLAttr(this, name) : new Attr(this, name); }
createComment(textContent) { return new Comment(this, textContent); }
createDocumentFragment() { return new DocumentFragment(this); }
createDocumentType(name, publicId, systemId) { return new DocumentType(this, name, publicId, systemId); }
21 changes: 21 additions & 0 deletions esm/interface/xml-attr.js
@@ -0,0 +1,21 @@
import {VALUE} from '../shared/symbols.js';
import {emptyAttributes} from '../shared/attributes.js';
import {escape} from '../shared/text-escaper.js';
import {Attr} from './attr.js';

const QUOTE = /"/g;

/**
* @implements globalThis.Attr
*/
export class XMLAttr extends Attr {
constructor(ownerDocument, name, value = '') {
super(ownerDocument, name, value);
}

toString() {
const {name, [VALUE]: value} = this;
return emptyAttributes.has(name) && !value ?
name : `${name}="${escape(value).replace(QUOTE, "&quot;")}"`;
}
}
5 changes: 5 additions & 0 deletions esm/shared/mime.js
@@ -6,26 +6,31 @@ export const Mime = {
'text/html': {
docType: '<!DOCTYPE html>',
ignoreCase: true,
isXML: false,
voidElements: /^(?:area|base|br|col|embed|hr|img|input|keygen|link|menuitem|meta|param|source|track|wbr)$/i
},
'image/svg+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'text/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xhtml+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
}
};
22 changes: 15 additions & 7 deletions test/xml/document.js
@@ -2,20 +2,28 @@ const assert = require('../assert.js').for('XMLDocument');

const {DOMParser} = global[Symbol.for('linkedom')];

const document = (new DOMParser).parseFromString('<root></root>', 'text/xml');
{
const document = (new DOMParser).parseFromString('<root></root>', 'text/xml');

assert(document.toString(), '<?xml version="1.0" encoding="utf-8"?><root />');
assert(document.toString(), '<?xml version="1.0" encoding="utf-8"?><root />');

assert(document.documentElement.tagName, 'root');
assert(document.documentElement.nodeName, 'root');
assert(document.documentElement.tagName, 'root');
assert(document.documentElement.nodeName, 'root');


document.documentElement.innerHTML = `
document.documentElement.innerHTML = `
<Something>
<Element>Text</Element>
<Element>Text</Element>
</Something>
`.trim();

assert(document.querySelectorAll('Element').length, 2, 'case sesntivive 2');
assert(document.querySelectorAll('element').length, 0, 'case sesntivive 0');
assert(document.querySelectorAll('Element').length, 2, 'case sensitive 2');
assert(document.querySelectorAll('element').length, 0, 'case sensitive 0');
}

{
const document = (new DOMParser).parseFromString('<root checked attr="&amp;&lt;&gt;&quot;"></root>', 'text/xml');
assert(document.toString(), '<?xml version="1.0" encoding="utf-8"?><root checked attr="&amp;&lt;&gt;&quot;" />');
}

6 changes: 6 additions & 0 deletions types/esm/interface/xml-attr.d.ts
@@ -0,0 +1,6 @@
/**
* @implements globalThis.Attr
*/
export class XMLAttr extends Attr implements globalThis.Attr {
}
import { Attr } from "./attr.js";
5 changes: 5 additions & 0 deletions types/esm/shared/mime.d.ts
@@ -2,32 +2,37 @@ export const Mime: {
'text/html': {
docType: string;
ignoreCase: boolean;
isXML: boolean;
voidElements: RegExp;
};
'image/svg+xml': {
docType: string;
ignoreCase: boolean;
isXML: boolean;
voidElements: {
test: () => boolean;
};
};
'text/xml': {
docType: string;
ignoreCase: boolean;
isXML: boolean;
voidElements: {
test: () => boolean;
};
};
'application/xml': {
docType: string;
ignoreCase: boolean;
isXML: boolean;
voidElements: {
test: () => boolean;
};
};
'application/xhtml+xml': {
docType: string;
ignoreCase: boolean;
isXML: boolean;
voidElements: {
test: () => boolean;
};
28 changes: 25 additions & 3 deletions worker.js
@@ -4418,7 +4418,7 @@ let Node$1 = class Node extends DOMEventTarget {
}
};

const QUOTE = /"/g;
const QUOTE$1 = /"/g;

/**
* @implements globalThis.Attr
@@ -4451,7 +4451,7 @@ let Attr$1 = class Attr extends Node$1 {
toString() {
const {name, [VALUE]: value} = this;
return emptyAttributes.has(name) && !value ?
name : `${name}="${value.replace(QUOTE, '&quot;')}"`;
name : `${name}="${value.replace(QUOTE$1, '&quot;')}"`;
}

toJSON() {
@@ -11224,26 +11224,31 @@ const Mime = {
'text/html': {
docType: '<!DOCTYPE html>',
ignoreCase: true,
isXML: false,
voidElements: /^(?:area|base|br|col|embed|hr|img|input|keygen|link|menuitem|meta|param|source|track|wbr)$/i
},
'image/svg+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'text/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
},
'application/xhtml+xml': {
docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
isXML: true,
voidElements
}
};
@@ -11442,6 +11447,23 @@ class TreeWalker {
}
}

const QUOTE = /"/g;

/**
* @implements globalThis.Attr
*/
class XMLAttr extends Attr$1 {
constructor(ownerDocument, name, value = '') {
super(ownerDocument, name, value);
}

toString() {
const {name, [VALUE]: value} = this;
return emptyAttributes.has(name) && !value ?
name : `${name}="${escape(value).replace(QUOTE, "&quot;")}"`;
}
}

const query = (method, ownerDocument, selectors) => {
let {[NEXT]: next, [END]: end} = ownerDocument;
return method.call({ownerDocument, [NEXT]: next, [END]: end}, selectors);
@@ -11577,7 +11599,7 @@ let Document$1 = class Document extends NonElementParentNode {
return this[EVENT_TARGET];
}

createAttribute(name) { return new Attr$1(this, name); }
createAttribute(name) { return this[MIME].isXML ? new XMLAttr(this, name) : new Attr$1(this, name); }
createComment(textContent) { return new Comment$1(this, textContent); }
createDocumentFragment() { return new DocumentFragment$1(this); }
createDocumentType(name, publicId, systemId) { return new DocumentType$1(this, name, publicId, systemId); }