Skip to content

Infinite loop in Parser when input object has empty strings as keys and flatten is used #76

@mikkopiu

Description

@mikkopiu

Description

The base Parser (or BaseParser) from @json2csv/plainjs will enter an infinite loop when attempting to parse an array of objects where at least one contains empty string keys like:

[
  {
    "a": {
      "": {
        "b": 1
      }
    }
  },
  {
    "anything": 2
  }
]

and flatten is used, so the headers end up like "a..b".

The current regular expression

const rePropName = RegExp(
  // Match anything that isn't a dot or bracket.
  '[^.[\\]]+' +
    '|' +
    // Or match property names within brackets.
    '\\[(?:' +
    // Match a non-string expression.
    '([^"\'][^[]*)' +
    '|' +
    // Or match strings (supports escaping characters).
    '(["\'])((?:(?!\\2)[^\\\\]|\\\\.)*?)\\2' +
    ')\\]' +
    '|' +
    // Or match "" as the space between consecutive dots or empty brackets.
    '(?=(?:\\.|\\[\\])(?:\\.|\\[\\]|$))',
  'g',
);

ends up forever testing true against strings like a..b, causing castPath's while loop:

  while ((match = rePropName.exec(value))) {
    result.push(match[3] ?? match[1]?.trim() ?? match[0]);
  }

to never finish until the result array blows past whatever the runtime's Array size limit is, throwing RangeError: Invalid array length (V8's error message).


Obviously this is an extreme edge case and you might wonder why one would even have empty strings as keys but that's the data I've found myself with 😅

This seems to have regressed in v7.0.4 with the introduction of the above regular expression rePropName in packages/plainjs/src/utils.ts, based on some unit tests that pass with v7.0.3. Unfortunately the expression is a bit too complex for me to confidently point to a certain solution.


As a sidenote, it also seems that the above regular expression would only work with .-separated paths but I haven't tested that enough yet to be sure.

Reproduction

Simple test script
import { Parser } from '@json2csv/plainjs';
import { flatten } from '@json2csv/transforms';
const parser = new Parser({
  delimiter: ';',
  transforms: [flatten({ arrays: true, objects: true, separator: '.' })],
});
parser.parse([
  {
    a: { '': { b: 1 } },
  },
  {
    anything: 2, // This can be literally anything, even an empty object
  },
]);

which when run, will enter the infinite loop and soon crash after maxing out the V8 Array size limit (1 GB):

Stack trace
/my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:21
        result.push((_c = (_a = match[3]) !== null && _a !== void 0 ? _a : (_b = match[1]) === null || _b === void 0 ? void 0 : _b.trim()) !== null && _c !== void 0 ? _c : match[0]);
               ^

RangeError: Invalid array length
    at Array.push (<anonymous>)
    at castPath (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:21:16)
    at getProp (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:30:56)
    at Object.value (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:65:37)
    at JSON2CSVParser.processCell (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:133:44)
    at file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:118:71
    at Array.map (<anonymous>)
    at JSON2CSVParser.processRow (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:118:47)
    at file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/Parser.js:62:48
    at Array.map (<anonymous>)

Node.js v20.11.0

Workaround

An extremely silly workaround using null terminators and find-and-replace

As a temporary workaround, one silly solution that happens to work in my case is to use null terminators and a custom formatter; with all else equal in the repro above, we can change the Parser configuration to be:

const parser = new Parser({
  delimiter: ';',
  transforms: [flatten({ objects: true, arrays: true, separator: '\0' })],
  formatters: {
    header: (h) => h.replaceAll('\0\0', '."".').replaceAll('\0', '.'),
  },
})

which will work around the issue of only supporting . as a separator without accidentally replacing non-separator .s while also working around this issue by making the empty string sections explicit as ."". (instead of ..).

Entirely silly but appears to work. 🤡

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions