[BUG] Streaming WorkbookReader _parseSharedStrings doesn't handle rich text within shared string nodes #1431

Closed
@rheidari

Description

🐛 Bug Report

Lib version: 4.1.1

When a shared string in the internal sharedStrings.xml file contains rich text, the streaming WorkbookReader pushes (or emits) each text node inside that shared string as a separate entry in the cached sharedStrings array. Every entry after the rich text string is therefore shifted, so cells resolve to the wrong shared string index.
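
To make the index shift concrete, here is a minimal sketch that does not use exceljs at all. It models sharedStrings.xml as one inner array per `<si>` element with one string per text node; the names `sharedStringItems`, `cachePerTextNode` and `cachePerItem` are hypothetical and exist only for this illustration.

```javascript
// Hypothetical, simplified model of sharedStrings.xml:
// one inner array per <si>, one string per <t> text node inside it.
const sharedStringItems = [
  ["This should ", "be one shared string value"], // rich text <si> with two runs
  ["this should be the second shared string"], // plain <si>
];

// Behaviour described in this report: every text node becomes its own entry.
function cachePerTextNode(items) {
  return items.flat();
}

// Expected behaviour: exactly one entry per <si>, however many runs it has.
function cachePerItem(items) {
  return items.map((runs) =>
    runs.length > 1 ? { richText: runs.map((text) => ({ text })) } : runs[0]
  );
}

const buggy = cachePerTextNode(sharedStringItems);
const fixed = cachePerItem(sharedStringItems);

// A cell that references shared string index 1 should get the plain string.
console.log(buggy[1]); // "be one shared string value"
console.log(fixed[1]); // "this should be the second shared string"
```

With the per-text-node cache, the array has three entries instead of two, so index 1 points at the second run of the rich text rather than at the second shared string.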

Steps To Reproduce

// Run inside an async function, for example a Jest test body.
const Excel = require("exceljs");

const workbook = new Excel.stream.xlsx.WorkbookWriter({
    filename: "./test.xlsx",
    useSharedStrings: true,
});

const sheet = workbook.addWorksheet("data");

const rowData = [
    {
        richText: [
            { font: { bold: true }, text: "This should " },
            { font: { italic: true }, text: "be one shared string value" },
        ],
    },
    "this should be the second shared string",
];

sheet.addRow(rowData);

await workbook.commit();

const workbookReader = new Excel.stream.xlsx.WorkbookReader("./test.xlsx", {
    entries: "emit",
    hyperlinks: "cache",
    sharedStrings: "cache",
    styles: "cache",
    worksheets: "emit",
});

for await (const worksheetReader of workbookReader) {
    for await (const row of worksheetReader) {
        // expected: the whole rich text object; actual: 'This should '
        expect(row.values[1]).toEqual(rowData[0]);
        // expected: the second shared string; actual: 'be one shared string value'
        expect(row.values[2]).toEqual(rowData[1]);
    }
}

The expected behaviour:

Each shared string value should be the entire rich text object rather than being split into separate pieces, so that one `<si>` node always maps to exactly one entry in the sharedStrings array.
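
As an illustration of that shape, the sketch below writes out what the cache would be expected to hold for the workbook in the reproduction steps. It is an assumption based on the `rowData` above, not output captured from the library, and `toPlainText` is a hypothetical helper added only to show that the runs still belong to a single entry.

```javascript
// Assumed cache contents for the reproduction workbook: two entries, one per <si>.
const expectedSharedStrings = [
  {
    richText: [
      { font: { bold: true }, text: "This should " },
      { font: { italic: true }, text: "be one shared string value" },
    ],
  },
  "this should be the second shared string",
];

// Hypothetical helper: flatten a cached entry to plain text for comparison.
function toPlainText(value) {
  if (value && Array.isArray(value.richText)) {
    return value.richText.map((run) => run.text).join("");
  }
  return String(value);
}

console.log(expectedSharedStrings.length); // 2
console.log(toPlainText(expectedSharedStrings[0])); // "This should be one shared string value"
```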

Possible solution (optional, but very helpful):

async *_parseSharedStrings(entry) {
  this._emitEntry({type: 'shared-strings'});
  switch (this.options.sharedStrings) {
    case 'cache':
      this.sharedStrings = [];
      break;
    case 'emit':
      break;
    default:
      return;
  }

  let text = null;
  let richText = [];
  let index = 0;
  let font = null;
  for await (const events of parseSax(iterateStream(entry))) {
    for (const {eventType, value} of events) {
      if (eventType === 'opentag') {
        const node = value;
        switch (node.name) {
          case 'b':
            font = font || {};
            font.bold = true;
            break;
          case 'charset':
            font = font || {};
            // <charset val="..."/> carries its value in the val attribute
            font.charset = parseInt(node.attributes.val, 10);
            break;
          case 'color':
            font = font || {};
            font.color = {};
            if (node.attributes.rgb) {
              // the XML attribute is named rgb; it maps to argb on the font
              font.color.argb = node.attributes.rgb;
            }
            if (node.attributes.val) {
              font.color.argb = node.attributes.val;
            }
            if (node.attributes.theme) {
              font.color.theme = parseInt(node.attributes.theme, 10);
            }
            break;
          case 'family':
            font = font || {};
            font.family = parseInt(node.attributes.val, 10);
            break;
          case 'i':
            font = font || {};
            font.italic = true;
            break;
          case 'outline':
            font = font || {};
            font.outline = true;
            break;
          case 'rFont':
            font = font || {};
            // <rFont val="..."/> carries the font name in the val attribute
            font.name = node.attributes.val;
            break;
          case 'si':
            font = null;
            richText = [];
            text = null;
            break;
          case 'sz':
            font = font || {};
            font.size = parseInt(node.attributes.val, 10);
            break;
          case 'strike':
            font = font || {};
            font.strike = true;
            break;
          case 't':
            text = null;
            break;
          case 'u':
            font = font || {};
            font.underline = true;
            break;
          case 'vertAlign':
            font = font || {};
            font.vertAlign = node.attributes.val;
            break;
        }
      } else if (eventType === 'text') {
        text = text ? text + value : value;
      } else if (eventType === 'closetag') {
        const node = value;
        switch (node.name) {
          case 'r':
            richText.push({
              font,
              text
            });

            font = null;
            text = null;
            break;
          case 'si': {
            // one entry per <si>: a rich text object if any runs were seen,
            // otherwise the plain text
            let data = text;
            if (richText.length) {
              data = { richText };
            }
            if (this.options.sharedStrings === 'cache') {
              this.sharedStrings.push(data);
            } else if (this.options.sharedStrings === 'emit') {
              yield { index: index++, text: data };
            }

            richText = [];
            font = null;
            text = null;
            break;
          }
        }
      }
    }
  }
}
