brent brent - 1 month ago 6
YAML Question

Processing JSON with YAML Parser; throws on tab whitespace

I'm a little unsure exactly where to point the finger (other than at myself of course)


  1. JSON is a subset of YAML 1.2
    http://www.yaml.org/spec/1.2/spec.html
    "every JSON file is also a valid YAML file"

  2. JSON allows tabs as 'insignificant whitespace'
    http://www.ietf.org/rfc/rfc4627.txt
    "Insignificant whitespace is allowed ..."

  3. YAML does not allow tabs for indentation
    http://www.yaml.org/spec/1.2/spec.html
    "tab characters must not be used in indentation"



So using my YAML parser to process the below JSON

{
\t"result" : "success"
}


NOTE: the \t is just to visualize, the input contains a real tab character.

Hits an error 'not allowed to use tab for indenting' <- which seems correct.

But then how does the "every JSON file is also a valid YAML file" rule hold, given that my file is valid JSON?

As the tab character is meaningless, should I just run a pre-processing step to strip out all tabs? Since the only whitespace allowed inside strings is 'space', it should be safe to strip every tab in the file.

Answer

"Hits an error 'not allowed to use tab for indenting' <- which seems correct."

It is not.

This is the relevant production in the Spec:

[140]   c-flow-mapping(n,c) ::= "{" s-separate(n,c)?
                                ns-s-flow-map-entries(n,in-flow(c))? "}"

s-separate(n,c) resolves to s-separate-lines(n) here (because we are not inside block-key or flow-key). Skipping some steps, we reach s-separate-in-line which allows tab characters.

The bottom line is that this tab character in your JSON is not indentation. Indentation is only relevant in block style (i.e. not using [ or { for sequences and mappings respectively). In Flow style, whitespace is only for separation.
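On the JSON side, the premise is easy to check. The following is only a minimal sketch using Python's standard json module (Python is picked here purely for illustration, the question does not say which language or parser is in use). It parses a document shaped like the one in the question, with a real tab in front of the key, and the tab is treated as plain separation whitespace:

```python
import json

# A document shaped like the one in the question.
# "\t" inside this Python literal produces a real tab character in the input.
doc = '{\n\t"result" : "success"\n}'

# The JSON parser skips the tab as insignificant whitespace between tokens.
parsed = json.loads(doc)
print(parsed)
```

A YAML 1.2 parser that follows the production above should accept the same input, since the tab sits inside a flow mapping and not in block indentation.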

Edit: Removed example link because it was somewhat misleading.

Edit 2: To answer your second question: No, do not strip tabs. They may be content inside scalars! See this example where a tab character actually determines the indentation of a block scalar.
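To make the risk concrete, here is a rough sketch in Python (chosen only for illustration). It assumes the input really is JSON, so it can be round-tripped through a JSON parser instead of being edited as text. The helper name normalize_json is made up for this example. The second half shows how a blanket replace also changes the content of a YAML double-quoted scalar that contains a tab:

```python
import json


def normalize_json(text: str) -> str:
    """Hypothetical helper: re-serialize JSON, dropping whitespace between tokens."""
    return json.dumps(json.loads(text))


# JSON input with a real tab before the key.
doc = '{\n\t"result" : "success"\n}'
normalized = normalize_json(doc)
print(normalized)  # {"result": "success"}

# Blind stripping, by contrast, also touches scalar content:
# this line holds a real tab inside a YAML double-quoted scalar.
yaml_line = 'name: "a\tb"'
stripped = yaml_line.replace("\t", "")
print(stripped)  # name: "ab"  (the value changed from a<TAB>b to ab)
```

If the YAML parser at hand cannot be fixed or replaced, the round trip only removes whitespace between tokens and leaves string values as they were parsed, which a text-level replace cannot guarantee.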
