pdf2json-3.1.5.tgz: 5 vulnerabilities (highest severity is: 7.5) #454

@mend-for-github-com

Description

Vulnerable Library - pdf2json-3.1.5.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Vulnerabilities

| Vulnerability | Severity | CVSS | Dependency | Type | Fixed in (pdf2json version) | Remediation Possible** |
|---|---|---|---|---|---|---|
| CVE-2026-41675 | High | 7.5 | xmldom-0.9.7.tgz | Transitive | N/A* | |
| CVE-2026-41674 | High | 7.5 | xmldom-0.9.7.tgz | Transitive | N/A* | |
| CVE-2026-41673 | High | 7.5 | xmldom-0.9.7.tgz | Transitive | N/A* | |
| CVE-2026-41672 | High | 7.5 | xmldom-0.9.7.tgz | Transitive | N/A* | |
| CVE-2026-34601 | High | 7.5 | xmldom-0.9.7.tgz | Transitive | N/A* | |

*For some transitive vulnerabilities, there is no version of direct dependency with a fix. Check the "Details" section below to see if there is a version of transitive dependency where vulnerability is fixed.

**In some cases, a remediation PR cannot be created automatically for a vulnerability even though a remediation is available.
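
Because the "Fixed in (pdf2json version)" column is N/A for every row, the fix has to come from the transitive dependency itself. The sketch below assumes the resolved "@xmldom/xmldom" version string is already known (for example, read from a lockfile) and checks it against the fix versions named in the Details section: 0.8.13 on the 0.8.x line and 0.9.10 on the 0.9.x line. The names "parseVersion" and "isPatchedXmldom" are hypothetical, and pre-release tags are deliberately rejected rather than interpreted.

```javascript
// Hypothetical helper: compares a plain "major.minor.patch" version string
// against the fix versions named in this report (0.8.13 and 0.9.10).
function parseVersion(version) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new TypeError('Expected "major.minor.patch", got: ' + version);
  }
  return parts;
}

function isPatchedXmldom(version) {
  const [major, minor, patch] = parseVersion(version);
  if (major > 0) return true; // assumption: any later major line carries the fixes
  if (minor === 8) return patch >= 13; // 0.8.x LTS line
  if (minor === 9) return patch >= 10; // 0.9.x line
  return minor > 9; // assumption: later 0.x lines are fixed, older lines are not
}

console.log(isPatchedXmldom('0.9.7')); // false
console.log(isPatchedXmldom('0.9.10')); // true
```

With npm 8.3 or later, an "overrides" entry in package.json is one way to force a transitive dependency to a patched version. Whether pdf2json 3.1.5 works unchanged with xmldom 0.9.10 is not confirmed by this report and would need testing.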

Details

CVE-2026-41675

Vulnerable Library - xmldom-0.9.7.tgz

A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Library home page: https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.9.7.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Dependency Hierarchy:

  • pdf2json-3.1.5.tgz (Root Library)
    • xmldom-0.9.7.tgz (Vulnerable Library)

Found in base branch: dev

Vulnerability Details

Summary

The package allows attacker-controlled processing instruction data to be serialized into XML without validating or neutralizing the PI-closing sequence "?>". As a result, an attacker can terminate the processing instruction early and inject arbitrary XML nodes into the serialized output.

Details

The issue is in the DOM construction and serialization flow for processing instruction nodes. When "createProcessingInstruction(target, data)" is called, the supplied "data" string is stored directly on the node without validation. Later, when the document is serialized, the serializer writes PI nodes by concatenating "<?", the target, a space, the data, and "?>" directly.

That behavior is unsafe because processing instructions are a syntax-sensitive context. The closing delimiter "?>" terminates the PI. If attacker-controlled input contains "?>", the serializer does not preserve it as literal PI content. Instead, it emits output where the remainder of the payload is treated as live XML markup.

The same class of vulnerability was previously addressed for CDATA sections (GHSA-wh4c-j3r5-mjhp / CVE-2026-34601), where "]]>" in CDATA data was handled by splitting. The serializer applies no equivalent protection to processing instruction data.

Affected code

"lib/dom.js", "createProcessingInstruction" (lines 2240–2246):

    createProcessingInstruction: function (target, data) {
      var node = new ProcessingInstruction(PDC);
      node.ownerDocument = this;
      node.childNodes = new NodeList();
      node.nodeName = node.target = target;
      node.nodeValue = node.data = data;
      return node;
    },

No validation is performed on "data". Any string including "?>" is stored as-is.

"lib/dom.js", serializer PI case (line 2966):

    case PROCESSING_INSTRUCTION_NODE:
      return buf.push('<?', node.target, ' ', node.data, '?>');

"node.data" is emitted verbatim. If it contains "?>", that sequence terminates the PI in the output stream and the remainder appears as active XML markup.

Contrast: CDATA (line 2945, patched):

    case CDATA_SECTION_NODE:
      return buf.push(g.CDATA_START, node.data.replace(/]]>/g, ']]]]><![CDATA[>'), g.CDATA_END);

PoC

Minimal (from @tlsbollei report, 2026-04-01)

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const doc = new DOMImplementation().createDocument(null, 'r', null);
    doc.documentElement.appendChild(
      doc.createProcessingInstruction('a', '?>
    // ^^^^ injected element is active markup

With re-parse verification (from @tlsbollei report)

    const assert = require('assert');
    const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
    const doc = new DOMParser().parseFromString('', 'application/xml');
    doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?>

This check applies regardless of how "?>" entered the node, whether via "createProcessingInstruction" directly or a subsequent mutation (".data =", "CharacterData" methods).

On "@xmldom/xmldom" ≥ 0.9.10, the serializer additionally applies the full W3C DOM Parsing §3.2.1.7 checks when "requireWellFormed: true":

1. Target check: throws "InvalidStateError" if the PI target contains a ":" character or is an ASCII case-insensitive match for "xml".
2. Data Char check: throws "InvalidStateError" if the PI data contains characters outside the XML Char production.
3. Data sequence check: throws "InvalidStateError" if the PI data contains "?>".

On "@xmldom/xmldom" ≥ 0.8.13 (LTS), only the "?>" data check (check 3) is applied. The target and XML Char checks are not included in the LTS fix.

PoC: fixed path

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const doc = new DOMImplementation().createDocument(null, 'r', null);
    doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?>
    // Opt-in guard: throws InvalidStateError before serializing
    try {
      new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
    } catch (e) {
      console.log(e.name, e.message);
      // InvalidStateError: The ProcessingInstruction data contains "?>"
    }

The guard catches "?>" regardless of when it was introduced:

    // Post-creation mutation: also caught at serialization time
    const pi = doc.createProcessingInstruction('target', 'safe data');
    doc.documentElement.appendChild(pi);
    pi.data = 'safe?>';
    new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
    // InvalidStateError: The ProcessingInstruction data contains "?>"

Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a "require well-formed" flag whose default value is "false". With the flag unset, the spec explicitly permits serializing PI data verbatim. This matches browser behavior: Chrome, Firefox, and Safari all emit "?>" in PI data verbatim by default without error. Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in "requireWellFormed: true" flag allows applications that require injection safety to enable strict mode without breaking existing code.

Residual limitation

"createProcessingInstruction(target, data)" does not validate "data" at creation time. The WHATWG DOM spec (§4.5 step 2) mandates an "InvalidCharacterError" when "data" contains "?>"; enforcing this check unconditionally at creation time is a breaking change and is deferred to a future breaking release.

When the default serialization path is used (without "requireWellFormed: true"), PI data containing "?>" is still emitted verbatim. Applications that do not pass "requireWellFormed: true" remain exposed.
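
For code that stays on a version without the "requireWellFormed" option (such as the 0.9.7 build flagged here), the three checks listed above can be approximated in application code before untrusted values reach "createProcessingInstruction". The following is a minimal sketch and not xmldom's implementation: "findPIProblems" and "XML_CHARS" are hypothetical names, and the regular expression is a hand-written rendering of the XML 1.0 Char production.

```javascript
// XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const XML_CHARS = /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

// Hypothetical pre-check: returns a list of reasons the PI would be unsafe to serialize.
function findPIProblems(target, data) {
  const problems = [];
  // Check 1: target must not contain ":" and must not be "xml" in any ASCII case.
  if (target.includes(':') || target.toLowerCase() === 'xml') {
    problems.push('invalid target');
  }
  // Check 2: data must consist only of XML Char code points.
  if (!XML_CHARS.test(data)) {
    problems.push('data contains characters outside XML Char');
  }
  // Check 3: data must not contain the PI-closing sequence.
  if (data.includes('?>')) {
    problems.push('data contains "?>"');
  }
  return problems;
}

console.log(findPIProblems('a', 'safe data')); // []
console.log(findPIProblems('a', 'safe?>'));
```

Returning a list rather than throwing lets a caller decide whether to reject the input or log it. A caller that wants the same behavior as the patched serializer would throw when the list is non-empty.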

Publish Date: 2026-04-22

URL: CVE-2026-41675

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Release Date: 2026-04-22

Fix Resolution: https://github.com/xmldom/xmldom.git - 0.9.10, https://github.com/xmldom/xmldom.git - 0.8.13

CVE-2026-41674

Vulnerable Library - xmldom-0.9.7.tgz

A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Library home page: https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.9.7.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Dependency Hierarchy:

  • pdf2json-3.1.5.tgz (Root Library)
    • xmldom-0.9.7.tgz (Vulnerable Library)

Found in base branch: dev

Vulnerability Details

Summary

The package serializes "DocumentType" node fields ("internalSubset", "publicId", "systemId") verbatim without any escaping or validation. When these fields are set programmatically to attacker-controlled strings, "XMLSerializer.serializeToString" can produce output where the DOCTYPE declaration is terminated early and arbitrary markup appears outside it.

Details

"DOMImplementation.createDocumentType(qualifiedName, publicId, systemId, internalSubset)" validates only "qualifiedName" against the XML QName production. The remaining three arguments are stored as-is with no validation.

When the XMLSerializer emits a "DocumentType" node, all fields are pushed into the output buffer verbatim: no escaping, no quoting added.

"internalSubset" injection: The serializer wraps "internalSubset" with " [" and "]". A value containing "]>" closes the internal subset and the DOCTYPE declaration at the injection point. Any content after "]>" in "internalSubset" appears outside the DOCTYPE in the serialized output as raw XML markup. Reported by @TharVid (GHSA-f6ww-3ggp-fr8h). Affected: "@xmldom/xmldom" ≥ 0.9.0 via "createDocumentType" API; 0.8.x only via direct property write.

"publicId" injection: The serializer emits "publicId" verbatim after "PUBLIC" with no quoting added. A value containing an injected system identifier (e.g., '"pubid" SYSTEM "evil"') breaks the intended quoting context, injecting a fake SYSTEM entry into the serialized DOCTYPE declaration. Identified during internal security research. Affected: both branches, all versions back to 0.1.0.

"systemId" injection: The serializer emits "systemId" verbatim. A value containing ">" terminates the DOCTYPE declaration early; content after ">" appears as raw XML markup outside the DOCTYPE context. Identified during internal security research. Affected: both branches, all versions back to 0.1.0.

The parse path is safe: the SAX parser enforces the "PubidLiteral" and "SystemLiteral" grammar productions, which exclude the relevant characters, and the internal subset parser only accepts a subset it can structurally validate. The vulnerability is reachable only through programmatic "createDocumentType" calls with attacker-controlled arguments.

Affected code

"lib/dom.js", "createDocumentType" (lines 898–910):

    createDocumentType: function (qualifiedName, publicId, systemId, internalSubset) {
      validateQualifiedName(qualifiedName); // only qualifiedName is validated
      var node = new DocumentType(PDC);
      node.name = qualifiedName;
      node.nodeName = qualifiedName;
      node.publicId = publicId || ''; // stored verbatim
      node.systemId = systemId || ''; // stored verbatim
      node.internalSubset = internalSubset || ''; // stored verbatim
      node.childNodes = new NodeList();
      return node;
    },

"lib/dom.js", serializer DOCTYPE case (lines 2948–2964):

    case DOCUMENT_TYPE_NODE:
      var pubid = node.publicId;
      var sysid = node.systemId;
      buf.push(g.DOCTYPE_DECL_START, ' ', node.name);
      if (pubid) {
        buf.push(' ', g.PUBLIC, ' ', pubid);
        if (sysid && sysid !== '.') {
          buf.push(' ', sysid);
        }
      } else if (sysid && sysid !== '.') {
        buf.push(' ', g.SYSTEM, ' ', sysid);
      }
      if (node.internalSubset) {
        buf.push(' [', node.internalSubset, ']'); // internalSubset emitted verbatim
      }
      buf.push('>');
      return;

PoC

internalSubset injection

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const impl = new DOMImplementation();
    const doctype = impl.createDocumentType( 'root', '', '', ']><![CDATA[' );
    const doc = impl.createDocument(null, 'root', doctype);
    const xml = new XMLSerializer().serializeToString(doc);
    console.log(xml);
    // <![CDATA[]>
    // ^^^^^^^^^^ injected element outside DOCTYPE

publicId quoting context break

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const impl = new DOMImplementation();
    const doctype = impl.createDocumentType( 'root', '"injected PUBLIC_ID" SYSTEM "evil"', '', '' );
    const doc = impl.createDocument(null, 'root', doctype);
    console.log(new XMLSerializer().serializeToString(doc));
    //
    // quoting context broken: SYSTEM entry injected

systemId injection

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const impl = new DOMImplementation();
    const doctype = impl.createDocumentType( 'root', '', '"sysid">', '' );
    const doc = impl.createDocument(null, 'root', doctype);
    console.log(new XMLSerializer().serializeToString(doc));
    // >
    // > in sysid closes DOCTYPE early; appears as sibling element

Impact

An application that programmatically constructs "DocumentType" nodes from user-controlled data and then serializes the document can emit a DOCTYPE declaration where the internal subset is closed early or where injected SYSTEM entities or other declarations appear in the serialized output. Downstream XML parsers that re-parse the serialized output and expand entities from the injected DOCTYPE declarations may be susceptible to XXE-class attacks if they enable entity expansion.

Fix Applied

«⚠ Opt-in required. Protection is not automatic. Existing serialization calls remain vulnerable unless "{ requireWellFormed: true }" is explicitly passed. Applications that pass untrusted data to "createDocumentType()" or write untrusted values directly to a "DocumentType" node's "publicId", "systemId", or "internalSubset" properties should audit all "serializeToString()" call sites and add the option.»

"XMLSerializer.serializeToString()" now accepts an options object as a second argument. When "{ requireWellFormed: true }" is passed, the serializer validates the "DocumentType" node's "publicId", "systemId", and "internalSubset" fields before emitting the DOCTYPE declaration and throws "InvalidStateError" if any field contains an injection sequence:

- "publicId": throws if non-empty and does not match the XML "PubidLiteral" production (XML 1.0 [12])
- "systemId": throws if non-empty and does not match the XML "SystemLiteral" production (XML 1.0 [11])
- "internalSubset": throws if it contains "]>" (which closes the internal subset and DOCTYPE declaration early)

All three checks apply regardless of how the invalid value entered the node, whether via "createDocumentType" arguments or a subsequent direct property write.

PoC: fixed path

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const impl = new DOMImplementation();
    // internalSubset injection
    const dt1 = impl.createDocumentType('root', '', '', ']><![CDATA[');
    const doc1 = impl.createDocument(null, 'root', dt1);
    // Default (unchanged): verbatim, injection present
    console.log(new XMLSerializer().serializeToString(doc1));
    // <![CDATA[]>
    // Opt-in guard: throws InvalidStateError
    try {
      new XMLSerializer().serializeToString(doc1, { requireWellFormed: true });
    } catch (e) {
      console.log(e.name, e.message);
      // InvalidStateError: DocumentType internalSubset contains "]>"
    }

The guard also covers post-creation property writes:

    const dt2 = impl.createDocumentType('root', '', '');
    dt2.systemId = '"sysid">';
    const doc2 = impl.createDocument(null, 'root', dt2);
    new XMLSerializer().serializeToString(doc2, { requireWellFormed: true });
    // InvalidStateError: DocumentType systemId is not a valid SystemLiteral

Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a "require well-formed" flag whose default value is "false". With the flag unset, the spec permits verbatim serialization of DOCTYPE fields. Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in "requireWellFormed: true" flag allows applications that require injection safety to enable strict mode without breaking existing deployments.

Residual limitation

"createDocumentType(qualifiedName, publicId, systemId[, internalSubset])" does not validate "publicId", "systemId", or "internalSubset" at creation time. This creation-time validation is a breaking change and is deferred to a future breaking release.

When the default serialization path is used (without "requireWellFormed: true"), all three fields are still emitted verbatim. Applications that do not pass "requireWellFormed: true" remain exposed.
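
The three field checks described above can also be applied in application code before untrusted values are passed to "createDocumentType". The sketch below is an illustration and not xmldom's implementation: "findDoctypeProblems" and the two regular expressions are hypothetical names, the patterns are hand-written from the XML 1.0 "PubidLiteral" and "SystemLiteral" productions, and it assumes the stored identifiers include their surrounding quotes, as the PoC payloads in this report suggest.

```javascript
// PubidLiteral: a quoted run of PubidChar (space, CR, LF, ASCII letters and digits,
// and -'()+,./:=?;!*#@$_%); the single-quoted form excludes the apostrophe.
const PUBID_LITERAL =
  /^(?:"[ \r\na-zA-Z0-9\-'()+,.\/:=?;!*#@$_%]*"|'[ \r\na-zA-Z0-9\-()+,.\/:=?;!*#@$_%]*')$/;
// SystemLiteral: any characters between matching quotes, excluding that quote character.
const SYSTEM_LITERAL = /^(?:"[^"]*"|'[^']*')$/;

// Hypothetical pre-check: returns the reasons a DOCTYPE would serialize unsafely.
function findDoctypeProblems(fields) {
  const publicId = fields.publicId || '';
  const systemId = fields.systemId || '';
  const internalSubset = fields.internalSubset || '';
  const problems = [];
  if (publicId && !PUBID_LITERAL.test(publicId)) {
    problems.push('publicId is not a valid PubidLiteral');
  }
  if (systemId && !SYSTEM_LITERAL.test(systemId)) {
    problems.push('systemId is not a valid SystemLiteral');
  }
  if (internalSubset.includes(']>')) {
    problems.push('internalSubset contains "]>"');
  }
  return problems;
}

console.log(findDoctypeProblems({ publicId: '"injected PUBLIC_ID" SYSTEM "evil"' }));
console.log(findDoctypeProblems({ systemId: '"sysid">' }));
```

Empty fields pass, matching the "throws if non-empty" wording above. Anchoring both patterns at the start and end of the string is what rejects a payload that begins with a valid quoted literal and then continues with extra markup.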

Publish Date: 2026-04-22

URL: CVE-2026-41674

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Release Date: 2026-04-22

Fix Resolution: https://github.com/xmldom/xmldom.git - 0.8.13, https://github.com/xmldom/xmldom.git - 0.9.10

CVE-2026-41673

Vulnerable Library - xmldom-0.9.7.tgz

A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Library home page: https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.9.7.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Dependency Hierarchy:

  • pdf2json-3.1.5.tgz (Root Library)
    • xmldom-0.9.7.tgz (Vulnerable Library)

Found in base branch: dev

Vulnerability Details

Summary

Seven recursive traversals in "lib/dom.js" operate without a depth limit. A sufficiently deeply nested DOM tree causes a "RangeError: Maximum call stack size exceeded", crashing the application.

Reported operations:

- "Node.prototype.normalize()": reported by @praveen-kv (email 2026-04-05) and @KarimTantawey (GHSA-fwmp-8wwc-qhv6, via "DOMParser.parseFromString()")
- "XMLSerializer.serializeToString()": reported by @Jvr2022 (GHSA-2v35-w6hq-6mfw) and @KarimTantawey (GHSA-j2hf-fqwf-rrjf)

Additionally, discovered in research:

- "Element.getElementsByTagName()" / "getElementsByTagNameNS()" / "getElementsByClassName()" / "getElementById()"
- "Node.cloneNode(true)"
- "Document.importNode(node, true)"
- "node.textContent" (getter)
- "Node.isEqualNode(other)"

All seven share the same root cause: pure-JavaScript recursive tree traversal with no depth guard. A single deeply nested document (parsed successfully) triggers any or all of these operations.

Details

Root cause

"lib/dom.js" implements DOM tree traversals as depth-first recursive functions. Each level of element nesting adds one JavaScript call frame. The JS engine's call stack is finite; once exhausted, a "RangeError: Maximum call stack size exceeded" is thrown. This error may not be caught reliably at stack-exhaustion depths because the catch handler itself requires stack frames to execute. This is especially true in async scenarios, where an uncaught "RangeError" inside a callback or promise chain can crash the entire Node.js process.

Parsing a deeply nested document succeeds, because the SAX parser in "lib/sax.js" is iterative. The crash occurs during subsequent operations on the parsed DOM.

"Node.prototype.normalize()", reported by @praveen-kv

"lib/dom.js:1296–1308" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1296-L1308) (main):

    normalize: function () {
      var child = this.firstChild;
      while (child) {
        var next = child.nextSibling;
        if (next && next.nodeType == TEXT_NODE && child.nodeType == TEXT_NODE) {
          this.removeChild(next);
          child.appendData(next.data);
        } else {
          child.normalize(); // recursive call, no depth guard
          child = next;
        }
      }
    },

Crash threshold (Node.js 18, default stack): ~10,000 levels.

"XMLSerializer.serializeToString()", reported by @Jvr2022

"lib/dom.js:2790–2974" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L2790-L2974) (main):

The internal "serializeToString" worker recurses into child nodes at four call sites, each passing a "visibleNamespaces.slice()" copy. The per-frame allocation causes earlier stack exhaustion than "normalize()".

Crash threshold (Node.js 18, default stack): ~5,000 levels.

Additional recursive entry points

All five crash at ~10,000 levels on Node.js 18.

| Function | Definition | Public API entry point(s) | Crash depth (Node.js 18) |
|---|---|---|---|
| "_visitNode" | "lib/dom.js:1529" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1529) | "getElementsByTagName()", "getElementsByTagNameNS()", "getElementsByClassName()", "getElementById()" | ~10,000 levels |
| "cloneNode" (module fn) | "lib/dom.js:3037" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L3037) | "Node.prototype.cloneNode(true)" | ~10,000 levels |
| "importNode" (module fn) | "lib/dom.js:2975" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L2975) | "Document.prototype.importNode(node, true)" | ~10,000 levels |
| "getTextContent" (inner fn) | "lib/dom.js:3130" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L3130) | "node.textContent" (getter) | ~10,000 levels |
| "isEqualNode" | "lib/dom.js:1120" (https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1120) | "Node.prototype.isEqualNode(other)" | ~10,000 levels |

Both active branches ("main" and "release-0.8.x") are identically affected. The unscoped "xmldom" package (≤ 0.6.0) carries the same recursive patterns from its initial commit.

Browser behavior

Tested with Chromium 147 (Playwright headless). Chromium's native C++ implementations of all seven DOM methods are iterative: they traverse the DOM without consuming JS call stack frames. All seven succeed at depths up to 20,000 without any crash.

When "@xmldom/xmldom" is bundled and run in a browser context, the same recursive JS code executes under the browser's V8 stack limit (~12,000–13,000 frames). The crash thresholds are similar to those observed on Node.js 18 (~5,000 for "serializeToString", ~10,000 for the remaining six). The vulnerability is specific to xmldom's pure-JavaScript recursive implementation, not an inherent property of the DOM operations.

PoC

"normalize()" (from @praveen-kv report, 2026-04-05)

    const { DOMParser } = require('@xmldom/xmldom');
    function generateNestedXML(depth) {
      return '' + ''.repeat(depth) + 'text' + ''.repeat(depth) + '';
    }
    const doc = new DOMParser().parseFromString(generateNestedXML(10000), 'text/xml');
    doc.documentElement.normalize();
    // RangeError: Maximum call stack size exceeded

"XMLSerializer.serializeToString()" (from GHSA-2v35-w6hq-6mfw)

    const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
    const depth = 5000;
    const xml = ''.repeat(depth) + ''.repeat(depth);
    const doc = new DOMParser().parseFromString(xml, 'text/xml');
    new XMLSerializer().serializeToString(doc);
    // RangeError: Maximum call stack size exceeded

The other methods have been verified using similar PoCs.

Impact

Any service that accepts attacker-controlled XML and subsequently calls any of the seven affected DOM operations can be forced into a reliable denial of service with a single crafted payload. The immediate result is an uncaught "RangeError" and failed request processing. In deployments where uncaught exceptions terminate the worker or process, the impact can extend beyond a single request and disrupt service availability more broadly.

No authentication, special options, or invalid XML is required. A valid, deeply nested XML document is enough.

Disclosure

The "normalize()" vector was publicly disclosed at 2026-04-06T11:25:07Z via "xmldom/xmldom#987" (https://github.com/xmldom/xmldom/pull/987) (closed without merge). "serializeToString()" and the five additional recursive entry points were not mentioned in that PR.

Fix Applied

All seven affected traversals have been converted from recursive to iterative implementations, eliminating call-stack consumption on deep trees.

"walkDOM" utility

A new "walkDOM(node, context, callbacks)" utility is introduced. It traverses the subtree rooted at "node" in depth-first order using an explicit JavaScript array as a stack, consuming heap memory instead of call-stack frames. "context" is an arbitrary value threaded through the walk: each "callbacks.enter(node, context)" call returns the context to pass to that node's children, enabling per-branch state (e.g. namespace snapshots in the serializer). "callbacks.exit(node, context)" (optional) is called in post-order after all children have been visited.

The following six operations are re-implemented on top of "walkDOM":

| Operation | Public entry point(s) |
|---|---|
| "_visitNode" helper | "getElementsByTagName()", "getElementsByTagNameNS()", "getElementsByClassName()", "getElementById()" |
| "getTextContent" inner function | "node.textContent" getter |
| "cloneNode" module function | "Node.prototype.cloneNode(true)" |
| "importNode" module function | "Document.prototype.importNode(node, true)" |
| "serializeToString" worker | "XMLSerializer.prototype.serializeToString()", "Node.prototype.toString()", "NodeList.prototype.toString()" |
| "normalize" | "Node.prototype.normalize()" |

"normalize" uses "walkDOM" with a "null" context and an "enter" callback that merges adjacent Text children of the current node before "walkDOM" reads and queues those children, so the surviving post-merge children are what the walker descends into.

Custom iterative loop for "isEqualNode"

One function cannot use "walkDOM": "Node.prototype.isEqualNode(other)" (0.9.x only; absent from 0.8.x) compares two trees in parallel. It maintains an explicit stack of "{node, other}" node pairs (one node from each tree), which cannot be expressed with "walkDOM"'s single-tree visitor.

After the fix

All seven entry points succeed on trees of arbitrary depth without throwing "RangeError". The original PoCs still demonstrate the vulnerability on unpatched versions and confirm the fix on patched versions.
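
The explicit-stack traversal described under "walkDOM" can be sketched on plain objects, without depending on xmldom. This is an illustration of the technique and not the library's code: "walkTree", "buildDeepTree", "textContent", and "maxDepth" are hypothetical names, nodes are assumed to be plain objects with a "childNodes" array, and passing the value returned by "enter" to "exit" is an assumption of this sketch.

```javascript
// Depth-first walk using an array as the stack, so nesting depth costs heap, not call frames.
function walkTree(root, context, callbacks) {
  const stack = [{ node: root, context: context, phase: 'enter' }];
  while (stack.length > 0) {
    const frame = stack.pop();
    if (frame.phase === 'exit') {
      callbacks.exit(frame.node, frame.context);
      continue;
    }
    // enter() returns the context handed to this node's children.
    const childContext = callbacks.enter(frame.node, frame.context);
    if (callbacks.exit) {
      // Queued before the children, so it is popped after all of them (post-order).
      stack.push({ node: frame.node, context: childContext, phase: 'exit' });
    }
    const children = frame.node.childNodes || [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ node: children[i], context: childContext, phase: 'enter' });
    }
  }
}

// Builds a single chain of nested elements with one text node at the bottom.
function buildDeepTree(depth) {
  const root = { nodeName: 'root', childNodes: [] };
  let current = root;
  for (let i = 0; i < depth; i++) {
    const child = { nodeName: 'a', childNodes: [] };
    current.childNodes.push(child);
    current = child;
  }
  current.childNodes.push({ nodeName: '#text', data: 'text', childNodes: [] });
  return root;
}

function textContent(root) {
  const parts = [];
  walkTree(root, null, {
    enter(node, ctx) {
      if (node.nodeName === '#text') parts.push(node.data);
      return ctx;
    },
  });
  return parts.join('');
}

// Uses the threaded context to carry each node's depth to its children.
function maxDepth(root) {
  let deepest = 0;
  walkTree(root, 0, {
    enter(node, depth) {
      deepest = Math.max(deepest, depth);
      return depth + 1;
    },
  });
  return deepest;
}

const deep = buildDeepTree(100000);
console.log(textContent(deep)); // text
console.log(maxDepth(deep)); // 100001
```

On a version that still recurses, a depth measurement of this kind, written against the real DOM's child links, is one possible way to reject over-deep documents before calling an affected method. The safe threshold depends on the runtime's stack size and is not something this sketch can determine.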

Publish Date: 2026-04-22

URL: CVE-2026-41673

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

Suggested Fix

Type: Upgrade version

Release Date: 2026-04-22

Fix Resolution: https://github.com/xmldom/xmldom.git - 0.8.13, https://github.com/xmldom/xmldom.git - 0.9.10

CVE-2026-41672

Vulnerable Library - xmldom-0.9.7.tgz

A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Library home page: https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.9.7.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Dependency Hierarchy:

  • pdf2json-3.1.5.tgz (Root Library)
    • xmldom-0.9.7.tgz (Vulnerable Library)

Found in base branch: dev

Vulnerability Details

Summary

The package allows attacker-controlled comment content to be serialized into XML without validating or neutralizing comment breaking sequences. As a result, an attacker can terminate the comment early and inject arbitrary XML nodes into the serialized output.

Details

The issue is in the DOM construction and serialization flow for comment nodes. When "createComment(data)" is called, the supplied string is stored as comment data through the generic character-data handling path. That content is kept as-is. Later, when the document is serialized, the serializer writes comment nodes by concatenating the XML comment delimiters with the stored "node.data" value directly.

That behavior is unsafe because XML comments are a syntax-sensitive context. If attacker-controlled input contains a sequence that closes the comment, the serializer does not preserve it as literal comment text. Instead, it emits output where the remainder of the payload is treated as live XML markup.

This is a real injection bug, not a formatting issue. The serializer already applies context-aware handling in other places, such as escaping text nodes and rewriting unsafe CDATA terminators. Comment content does not receive equivalent treatment. Because of that gap, untrusted data can break out of the comment boundary and modify the structure of the final XML document.

PoC

    const { DOMImplementation, DOMParser, XMLSerializer } = require('@xmldom/xmldom');
    const doc = new DOMImplementation().createDocument(null, 'root', null);
    doc.documentElement.appendChild( doc.createComment('-->
    const reparsed = new DOMParser().parseFromString(xml, 'text/xml');
    console.log(reparsed.documentElement.childNodes.item(1).nodeName); // injected

Impact

An application that uses the package to build XML from untrusted input can be made to emit attacker-controlled elements outside the intended comment boundary. That allows the attacker to alter the meaning and structure of generated XML documents.

In practice, this can affect any workflow that generates XML and then stores it, forwards it, signs it, or hands it to another parser. Realistic targets include XML-based configuration, policy documents, and message formats where downstream consumers trust the serialized structure.

Disclosure

This vulnerability was publicly disclosed at 2026-04-06T11:25:07Z via "xmldom/xmldom#987" (https://github.com/xmldom/xmldom/pull/987), which was subsequently closed without being merged.

Fix Applied

«⚠ Opt-in required. Protection is not automatic. Existing serialization calls remain vulnerable unless "{ requireWellFormed: true }" is explicitly passed. Applications that pass untrusted data to "createComment()" or mutate comment nodes with untrusted input (via "appendData", "insertData", "replaceData", ".data =", or ".textContent =") should audit all "serializeToString()" call sites and add the option.»

"XMLSerializer.serializeToString()" now accepts an options object as a second argument. When "{ requireWellFormed: true }" is passed, the serializer throws "InvalidStateError" before emitting a Comment node whose ".data" would produce malformed XML.

On "@xmldom/xmldom" ≥ 0.9.10, the full W3C DOM Parsing §3.2.1.4 check is applied: throws if ".data" contains "--" anywhere, ends with "-", or contains characters outside the XML Char production.

On "@xmldom/xmldom" ≥ 0.8.13 (LTS), only the "-->" injection sequence is checked. The "0.8.x" SAX parser accepts comments containing "--" (without ">"), so throwing on bare "--" would break a previously-working round-trip on that branch. The "-->" check is sufficient to prevent injection.

PoC: fixed path

    const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');
    const doc = new DOMImplementation().createDocument(null, 'root', null);
    doc.documentElement.appendChild(doc.createComment('-->
    // Opt-in guard: throws InvalidStateError before serializing
    try {
      new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
    } catch (e) {
      console.log(e.name, e.message);
      // InvalidStateError: The comment node data contains "--" or ends with "-" (0.9.x)
      // InvalidStateError: The comment node data contains "-->" (0.8.x, only --> is checked)
    }

Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.4 defines a "require well-formed" flag whose default value is "false". With the flag unset, the spec explicitly permits serializing ill-formed comment content verbatim. This is also the behavior of browser implementations (Chrome, Firefox, Safari): "new XMLSerializer().serializeToString(doc)" produces the injection sequence without error in all major browsers. Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in "requireWellFormed: true" flag allows applications that require injection safety to enable strict mode without breaking existing deployments.

Residual limitation

The fix operates at serialization time only. There is no creation-time check in "createComment"; the spec does not require one for comment data. Any path that leads to a Comment node with "--" in its data ("createComment", "appendData", ".data =", etc.) produces a node that serializes safely only when "{ requireWellFormed: true }" is passed to "serializeToString".
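
The difference between the 0.9.x and 0.8.x checks described above can be made concrete with a small stand-alone function. This is a sketch of the rules as stated in this report and not xmldom's code: "findCommentProblems", its "mode" argument, and "XML_CHARS" are hypothetical names.

```javascript
// XML 1.0 Char production, as a Unicode-aware regular expression.
const XML_CHARS = /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

// mode 'full' mirrors the checks described for 0.9.10 and later;
// mode 'lts' mirrors the narrower check described for 0.8.13 and later.
function findCommentProblems(data, mode) {
  const problems = [];
  if (mode === 'lts') {
    if (data.includes('-->')) problems.push('contains "-->"');
    return problems;
  }
  if (data.includes('--')) problems.push('contains "--"');
  if (data.endsWith('-')) problems.push('ends with "-"');
  if (!XML_CHARS.test(data)) problems.push('contains characters outside XML Char');
  return problems;
}

console.log(findCommentProblems('a -- b', 'full')); // [ 'contains "--"' ]
console.log(findCommentProblems('a -- b', 'lts')); // []
```

The second call shows why the LTS branch checks only "-->": a bare "--" is rejected under the full rules but cannot close the comment by itself.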

Publish Date: 2026-04-22

URL: CVE-2026-41672

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Release Date: 2026-04-22

Fix Resolution: https://github.com/xmldom/xmldom.git - 0.8.13, https://github.com/xmldom/xmldom.git - 0.9.10

CVE-2026-34601

Vulnerable Library - xmldom-0.9.7.tgz

A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Library home page: https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.9.7.tgz

Path to dependency file: /package.json

Path to vulnerable library: /package.json

Dependency Hierarchy:

  • pdf2json-3.1.5.tgz (Root Library)
    • xmldom-0.9.7.tgz (Vulnerable Library)

Found in base branch: dev

Vulnerability Details

xmldom is a pure JavaScript W3C standard-based (XML DOM Level 2 Core) "DOMParser" and "XMLSerializer" module. In xmldom versions 0.6.0 and prior and @xmldom/xmldom prior to versions 0.8.12 and 0.9.9, xmldom/xmldom allows attacker-controlled strings containing the CDATA terminator ]]> to be inserted into a CDATASection node. During serialization, XMLSerializer emitted the CDATA content verbatim without rejecting or safely splitting the terminator. As a result, data intended to remain text-only became active XML markup in the serialized output, enabling XML structure injection and downstream business-logic manipulation. This issue has been patched in xmldom version 0.6.0 and @xmldom/xmldom versions 0.8.12 and 0.9.9.
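
The "safely splitting" approach corresponds to the replacement quoted in the CVE-2026-41675 details of this report, where each "]]>" in the data is rewritten as "]]]]><![CDATA[>". The sketch below applies that replacement in isolation. The wrapper "serializeCData" and the checker "readCDataText" are hypothetical names added for illustration, and the delimiters are written out as string literals rather than taken from xmldom's constants.

```javascript
// Ends the current section after "]]" and opens a new one before ">",
// so the terminator never appears intact inside a single CDATA section.
function serializeCData(data) {
  return '<![CDATA[' + data.replace(/]]>/g, ']]]]><![CDATA[>') + ']]>';
}

// Illustration only: concatenates the text of adjacent CDATA sections
// to show that the original data is recovered.
function readCDataText(xml) {
  const parts = [];
  const pattern = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    parts.push(match[1]);
  }
  return parts.join('');
}

const serialized = serializeCData('a]]><injected/>');
console.log(serialized); // <![CDATA[a]]]]><![CDATA[><injected/>]]>
console.log(readCDataText(serialized)); // a]]><injected/>
```

The payload stays character data in both sections, so a re-parse sees text where the unpatched serializer would have produced an element.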

Publish Date: 2026-04-02

URL: CVE-2026-34601

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Release Date: 2026-04-01

Fix Resolution: https://github.com/xmldom/xmldom.git - 0.9.9
