brent brent - 5 months ago 26
YAML Question

Processing JSON with YAML Parser; throws on tab whitespace

I'm a little unsure exactly where to point the finger (other than at myself, of course):

  1. JSON is a subset of YAML 1.2
    "every JSON file is also a valid YAML file"

  2. JSON allows 'insignificant whitespace', which includes tabs
    "Insignificant whitespace is allowed ..."

  3. YAML does not allow tabs for indentation
    "tab characters must not be used in indentation"

So, using my YAML parser to process the JSON below:

\t"result" : "success",

NOTE: the \t is just to visualize, the input contains a real tab character.

Hits an error 'not allowed to use tab for indenting' <- which seems correct.

But then how does the "every JSON file is also a valid YAML file" rule hold, given that my file is valid JSON?

Since the tab character is meaningless here, should I just run a pre-processing step to strip out all tabs? The only literal whitespace allowed inside JSON strings is the space character, so it should be safe to strip every tab from the file.


    "Hits an error 'not allowed to use tab for indenting' <- which seems correct."

It is not.

This is the relevant production in the spec:

[140]   c-flow-mapping(n,c) ::= "{" s-separate(n,c)?
                                ns-s-flow-map-entries(n,in-flow(c))? "}"

s-separate(n,c) resolves to s-separate-lines(n) here (because we are not inside a block-key or flow-key context). Skipping some steps, we reach s-separate-in-line, which allows tab characters.

The bottom line is that the tab character in your JSON is not indentation. Indentation is only relevant in block style (i.e. when sequences and mappings are not written with [ or { respectively). In flow style, whitespace only serves as separation.
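To make the skipped steps a bit more concrete, here is a rough Python sketch of the whitespace productions involved. It is not how any real parser is written: the helper names (s_indent, s_flow_line_prefix, ok_in_flow, ok_as_block_indent) are made up for illustration, the "start of line" alternative of s-separate-in-line is left out, and the block-style check is a simplification that treats all leading whitespace as indentation.

```python
import re

# s-white ::= s-space | s-tab
S_WHITE = r"[ \t]"
# s-separate-in-line ::= s-white+  (the "start of line" alternative is omitted here)
S_SEPARATE_IN_LINE = S_WHITE + "+"


def s_indent(n: int) -> str:
    """Regex for s-indent(n): exactly n space characters, never tabs."""
    return " {%d}" % n


def s_flow_line_prefix(n: int) -> str:
    """Regex for s-flow-line-prefix(n) ::= s-indent(n) s-separate-in-line?"""
    return s_indent(n) + "(?:" + S_SEPARATE_IN_LINE + ")?"


def leading_whitespace(line: str) -> str:
    """Return the run of spaces and tabs at the start of a line."""
    return re.match(r"[ \t]*", line).group(0)


def ok_in_flow(line: str, n: int) -> bool:
    """Inside { } or [ ]: n spaces of indentation, then optional separation."""
    return re.fullmatch(s_flow_line_prefix(n), leading_whitespace(line)) is not None


def ok_as_block_indent(line: str) -> bool:
    """Simplified block-style rule: leading whitespace must be spaces only."""
    return "\t" not in leading_whitespace(line)


json_line = '\t"result" : "success",'
print(ok_in_flow(json_line, 0))        # True: the tab is separation, not indentation
print(ok_as_block_indent(json_line))   # False: a tab cannot be block indentation
```

With n = 0 the indentation part matches nothing at all, so the whole tab is consumed by s-separate-in-line, which is exactly why it does not count as indentation inside the braces.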

Edit: Removed example link because it was somewhat misleading.

Edit 2: To answer your second question: No, do not strip tabs. They may be content inside scalars! See this example where a tab character actually determines the indentation of a block scalar.
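If your particular parser keeps rejecting the tab and you need a workaround, one option that leaves string content alone is to round-trip the input through a JSON parser instead of deleting characters. The sketch below is only an illustration using Python's standard json module; the sample inputs and the function names normalize_json and strip_all_tabs are invented for this example, and it assumes the input really is JSON.

```python
import json

# Hypothetical JSON input: real tabs as insignificant whitespace,
# plus an escaped tab (backslash + t) inside a string value.
raw = '{\n\t"result" : "success",\n\t"note" : "a\\tb"\n}'


def normalize_json(text: str) -> str:
    """Parse as JSON and re-serialize; the output has no literal tabs."""
    return json.dumps(json.loads(text))


def strip_all_tabs(text: str) -> str:
    """The naive pre-processing step from the question."""
    return text.replace("\t", "")


print(normalize_json(raw))
# {"result": "success", "note": "a\tb"}

# Hypothetical YAML input with a literal tab inside a double-quoted scalar,
# where it is content. Blindly stripping tabs silently changes the value.
yaml_text = 'msg: "col1\tcol2"\n'
print(repr(strip_all_tabs(yaml_text)))
# 'msg: "col1col2"\n'
```

The round-trip keeps the tab inside "note" (it stays escaped), while the blanket replace is only harmless as long as no scalar anywhere contains a literal tab.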