fix(html): correctly handle <caption> element in HTML table deserialization #672

Open
Satvik77777 wants to merge 1 commit into accordproject:main from Satvik77777:fix/html-table-caption-parsing

Fixes #635

Problem

HTML tables containing a <caption> element produced broken Markdown output, with each cell rendered on its own line instead of forming a valid table row:

Before (broken): Employee Details
| | Name
| Role
|
| Alice
| Engineer
|

Root Cause

The <caption> element was being processed as a regular table child node, corrupting the TableHead/TableBody structure during deserialization. This caused the table rows to break apart into individual lines.

Fix

  • Extract <caption> directly from the DOM before processing table children, so no intermediate caption nodes leak into the output
  • Emit the caption as a standalone Paragraph block before the table (plain text only: no Strong wrapping of arbitrary content, no \n\n text injection)
  • Filter table children to TableHead/TableBody only, ignoring other child nodes
  • Promote first body row to header when it contains th cells and no thead exists
  • Normalize whitespace in table cells via cleanTableNodes() to remove Softbreak nodes and merge adjacent Text nodes
  • Add regression tests covering: caption present, no caption (no regression), whitespace normalization in caption
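
The filtering, header promotion, and whitespace steps listed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the node shape (`type`, `text`, `nodes`) is a simplified stand-in for the real CommonMark nodes, and `normalizeTableChildrenSketch` and `cleanTableNodesSketch` are hypothetical names. Whether the actual `cleanTableNodes()` inserts a space where it removes a Softbreak is not stated here, so the sketch simply drops the node.

```javascript
// Hypothetical, simplified node shape: { type, text?, nodes? }.
// Not the PR's implementation; it only illustrates the listed steps.

// Keep only TableHead/TableBody children and, when no head exists,
// promote the first body row to a head if it contains header cells.
function normalizeTableChildrenSketch(children) {
    const sections = children.filter(
        (node) => node.type === 'TableHead' || node.type === 'TableBody'
    );
    let head = sections.find((node) => node.type === 'TableHead') || null;
    const body = sections.find((node) => node.type === 'TableBody') || null;
    let bodyRows = body ? [...body.nodes] : [];

    if (!head && bodyRows.length > 0) {
        const firstRow = bodyRows[0];
        const hasHeaderCell = firstRow.nodes.some(
            (cell) => cell.type === 'HeaderCell'
        );
        if (hasHeaderCell) {
            head = { type: 'TableHead', nodes: [firstRow] };
            bodyRows = bodyRows.slice(1);
        }
    }
    return { head, body: { type: 'TableBody', nodes: bodyRows } };
}

// Drop Softbreak nodes and merge adjacent Text nodes, recursing into children.
// Nodes are copied, so the input array is left unchanged.
function cleanTableNodesSketch(nodes) {
    const result = [];
    for (const node of nodes) {
        if (node.type === 'Softbreak') {
            continue;
        }
        const previous = result[result.length - 1];
        if (node.type === 'Text' && previous && previous.type === 'Text') {
            previous.text += node.text;
            continue;
        }
        const copy = { ...node };
        if (Array.isArray(node.nodes)) {
            copy.nodes = cleanTableNodesSketch(node.nodes);
        }
        result.push(copy);
    }
    return result;
}
```

In this sketch, any child that is neither a TableHead nor a TableBody (for example a stray caption node) is discarded before rows are assembled, which is the property the fix relies on.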

Note on #640

This PR addresses the same issue as #640 but resolves all review feedback that blocked that PR from merging:

  • ✅ Tests included (missing in #640, "fix(table): handle HTML table captions correctly")
  • ✅ Caption extracted from DOM directly, so no type: 'caption' nodes leak
  • ✅ No Strong wrapping of arbitrary block content
  • ✅ No \n\n text node hack inside Paragraph
  • ✅ 4-space indentation throughout (ESLint compliant)
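
As a rough illustration of the caption handling described in this list, the sketch below pulls the caption out of a table element, collapses its whitespace, and returns it as a plain-text Paragraph alongside the remaining children. It is an assumption-laden sketch rather than the PR's code: `extractCaptionSketch` is a hypothetical name, the element only needs `children`, `tagName`, and `textContent` (as DOM elements provide), and the output node shape is simplified.

```javascript
// Hypothetical helper, not the PR's implementation.
// Accepts any DOM-like element exposing children, tagName and textContent.
function extractCaptionSketch(tableEl) {
    const children = Array.from(tableEl.children);
    const captionEl = children.find(
        (el) => el.tagName.toLowerCase() === 'caption'
    );
    // Everything except the caption is passed on to normal table processing
    const rest = children.filter((el) => el !== captionEl);
    if (!captionEl) {
        return { caption: null, rest };
    }
    // Collapse runs of whitespace so the caption becomes a single line of text
    const text = captionEl.textContent.replace(/\s+/g, ' ').trim();
    const caption = text
        ? { type: 'Paragraph', nodes: [{ type: 'Text', text }] }
        : null;
    return { caption, rest };
}
```

Because the caption is returned as a separate Paragraph, a caller can emit it before the table without wrapping it in Strong or injecting newline text nodes.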

Checklist

  • DCO sign-off provided (--signoff)
  • Regression tests added
  • Commits follow AP conventional commits format
  • No documentation changes required
  • Merging to main from fork

Signed-off-by: Satvik <satviksaini02@gmail.com>

Development

Successfully merging this pull request may close these issues.

HTML Transform: Erronous HTML Table Parsing
