
Comprehending YAML

YAML is a little clever for my taste but I’m starting to get it.

Before yesterday’s post, I had no idea how it related to JSON. In that post, I personalized a few examples from the YAML docs and manually converted them to JSON.

And after sleeping on my exercise, I have a better picture of what’s going on.

YAML infers complex types

As much as possible, YAML infers complex data structures from the data they contain.

- Evermore
- Folklore
- Lover

This is a sequence and the containing Array structure is implied. Here’s how it looks in JSON:

["Evermore", "Folklore", "Lover"]

Compare this to a YAML mapping, where the root structure is implied to be an object.

Evermore: 2020
Folklore: 2020
Lover: 2019

Again, here’s what that looks like in JSON.

{
  "Evermore": 2020,
  "Folklore": 2020,
  "Lover": 2019
}
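A quick way to double-check these two conversions without a YAML parser: the sketch below writes the data each example describes as a Python list and dict by hand (an assumption standing in for what a YAML loader would return) and lets the standard library's `json` module print the JSON.

```python
import json

# The sequence example: "- item" lines imply an array (a Python list).
albums = ["Evermore", "Folklore", "Lover"]

# The mapping example: "key: value" lines imply an object (a Python dict).
release_years = {"Evermore": 2020, "Folklore": 2020, "Lover": 2019}

print(json.dumps(albums))         # ["Evermore", "Folklore", "Lover"]
print(json.dumps(release_years))  # {"Evermore": 2020, "Folklore": 2020, "Lover": 2019}
```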

So what happens if we try to mix sequences and mappings at the root level?

-Evermore
Folklore: 2020
Lover: 2019

It breaks:

YAMLException: end of the stream or a document separator is expected at line 2, column 9: Folklore: 2020

This tracks because something can’t be both an object AND an array.
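The inference rule at the root can be sketched in a few lines of Python. This is a hypothetical toy, not how a real YAML parser works: it looks only at unindented lines, tags each one by its leading indicator, and refuses a root that mixes kinds.

```python
def line_kind(line: str) -> str:
    """Classify one unindented line by its leading indicator (toy rules)."""
    if line == "-" or line.startswith("- "):
        return "sequence"  # a dash followed by a space (or nothing) starts an array item
    if line.endswith(":") or ": " in line:
        return "mapping"   # a colon followed by a space (or end of line) separates key and value
    return "scalar"        # anything else, e.g. "-Evermore", is plain text


def root_kind(text: str) -> str:
    """Infer the root structure from a document's unindented lines.

    Hypothetical toy helper, not a YAML parser: it ignores quotes,
    comments, flow syntax like {...} and [...], and multiple documents.
    """
    top_level = [line for line in text.splitlines() if line.strip() and not line[0].isspace()]
    kinds = {line_kind(line) for line in top_level}
    if len(kinds) > 1:
        raise ValueError(f"root mixes {sorted(kinds)}; it can only be one structure")
    return kinds.pop() if kinds else "empty"


print(root_kind("- Evermore\n- Folklore\n- Lover"))              # sequence
print(root_kind("Evermore: 2020\nFolklore: 2020\nLover: 2019"))  # mapping
```

Feeding it the broken example raises a `ValueError`. One detail worth knowing: a dash only acts as a sequence indicator when a space follows it, so in real YAML `-Evermore` is plain text, and plain text still can't share the root with `key: value` lines.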

Understanding inference

Identifying when you’re describing an Array (sequence) and when you’re describing an Object (mapping) is critically important. And it’s not always clear.

Can you guess what the JSON equivalent for this YAML is?

- Evermore: 2020
- Folklore: 2020
- Lover: 2019

The dashes (-) indicate that the root structure is an array (sequence). But each array item is a discrete object (mapping) with one key-value pair. This is inferred from the colon (:) between the key and value in each array item.

So the JSON output for the YAML above is this:

[
  {
    "Evermore": 2020
  },
  {
    "Folklore": 2020
  },
  {
    "Lover": 2019
  }
]

An array of objects (or “sequence of mappings”).

Now that I understand it, I see the dash (-) like list-items in Markdown.
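The difference the dashes make shows up most clearly when you read a value back out. Here's a small Python sketch of both shapes, with the data typed by hand rather than loaded from YAML.

```python
# With dashes: a sequence of three mappings, one key-value pair each.
with_dashes = [{"Evermore": 2020}, {"Folklore": 2020}, {"Lover": 2019}]

# Without dashes: a single mapping that holds all three pairs.
without_dashes = {"Evermore": 2020, "Folklore": 2020, "Lover": 2019}

# Reaching the same value takes an index, then a key...
print(with_dashes[1]["Folklore"])  # 2020
# ...versus a single key lookup.
print(without_dashes["Folklore"])  # 2020

# Each list item is its own dict with exactly one key.
print([list(item) for item in with_dashes])  # [['Evermore'], ['Folklore'], ['Lover']]
```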

YAML infers simple types

Let’s take an array (sequence) of objects (mappings).

- Evermore: 2020
- Folklore: 2020
- Lover: 2019

The keys are strings and the values are numbers.

Evermore becomes "Evermore".
2020 stays 2020.
Taylor Swift becomes "Taylor Swift".
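As a rough sketch of how that inference works for individual values, here's a Python resolver modeled on the YAML 1.2 core schema. It's simplified (no hex, octal, `.inf`, or `.nan`), and `resolve_scalar` is a hypothetical name. Real loaders differ, too: PyYAML, for instance, follows YAML 1.1, where words like `yes` and `off` are also booleans.

```python
import re

# Patterns simplified from the YAML 1.2 core schema's tag resolution rules.
_NULL = re.compile(r"null|Null|NULL|~|")  # the empty string also means null
_TRUE = re.compile(r"true|True|TRUE")
_FALSE = re.compile(r"false|False|FALSE")
_INT = re.compile(r"[-+]?[0-9]+")
_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def resolve_scalar(text: str):
    """Guess the type of an unquoted scalar (toy version of the core schema)."""
    if _NULL.fullmatch(text):
        return None
    if _TRUE.fullmatch(text):
        return True
    if _FALSE.fullmatch(text):
        return False
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return text  # everything else stays a string


for raw in ["Evermore", "2020", "Taylor Swift", "true", "~", "3.5"]:
    print(repr(raw), "->", repr(resolve_scalar(raw)))
```

Quoting a value in YAML sidesteps this inference: `'2020'` is always a string.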

Takeaways

After looking at so much JavaScript and JSON, this is a little unsettling. But in small examples like this, there’s a simplicity to the representation.

YAML shorthand can be confusing

Below we have an array (sequence) of objects (mappings).

- name: Taylor Swift
- name: The National

JSON looks like this.

[
  {
    "name": "Taylor Swift"
  },
  {
    "name": "The National"
  }
]

What does it look like to add more properties to these objects?

YAML allows us to use JSON-like object syntax, which YAML calls flow style.

- {name: Taylor Swift, album_count: 9}
- {name: The National, album_count: 8}

But it’s not super YAML-y, so there’s an alternative that uses newlines and indentation.

- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8

This is challenging for me to interpret, because the dash (-) separates discrete objects (mappings) in the array (sequence). So it feels like the dash (-) is a directive for the object (mapping). But it’s not; it communicates that the containing structure is an array.
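One way to untangle this is to read the lines the way a parser might. Below is a deliberately tiny, hypothetical loader in Python that understands only this one shape: a line starting with `- ` opens a new object in the array, and an indented `key: value` line adds a pair to the object most recently opened. It's a sketch of the idea, not how real YAML parsers are built.

```python
import json


def load_sequence_of_mappings(text: str) -> list[dict]:
    """Toy loader for a block sequence of flat mappings (hypothetical helper)."""
    items: list[dict] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("- "):
            items.append({})  # the dash opens a new item in the array...
            line = line[2:]   # ...and the rest of the line is an ordinary pair
        elif not items or not line[0].isspace():
            raise ValueError(f"expected '- ' or an indented pair: {line!r}")
        key, value = line.strip().split(": ", 1)
        # Indented pairs land in the most recently opened object.
        items[-1][key] = int(value) if value.isdigit() else value
    return items


yaml_text = """\
- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8
"""
print(json.dumps(load_sequence_of_mappings(yaml_text), indent=2))
```

Remove the indentation before `album_count` and the toy loader raises instead, which mirrors the earlier rule that the root can't mix array items and object keys.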

Pressing into the confusion, consider this array (sequence) containing a string, number, object, and array.

- Taylor Swift
- 1989
- album_count: 9
  nationality: American
- - Big Machine
  - Republic

This is that same file in JSON.

[
  "Taylor Swift",
  1989,
  {
    "album_count": 9,
    "nationality": "American"
  },
  ["Big Machine", "Republic"]
]
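Written out as the Python data I'd expect a loader such as PyYAML's `yaml.safe_load` to return for that file (typed by hand here, so treat it as an assumption), the four items really are four different types:

```python
import json

mixed = [
    "Taylor Swift",                                 # "- Taylor Swift" -> string
    1989,                                           # "- 1989" -> number
    {"album_count": 9, "nationality": "American"},  # dash + indented pair -> object
    ["Big Machine", "Republic"],                    # "- - ..." + indented item -> array
]

print([type(item).__name__ for item in mixed])  # ['str', 'int', 'dict', 'list']
print(json.dumps(mixed))
```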

Takeaways

The presence of a dash (-) or colon (:) describes the surrounding structure.

In the case of arrays (sequences) of objects (mappings) with multiple key-value pairs, this terseness can be unclear, at least until you’ve trained yourself to see the invisible structures that the dash (-) and colon (:) represent.

- name: Taylor Swift
  album_count: 9

For me, learning to read the lines above as one object in an array has made the biggest difference in my ability to quickly parse YAML.

YAML patterns

Identifying patterns is helpful.

Here are a few nested combinations that I had trouble with.

Array of arrays : sequence of sequences

- - Evermore
  - Folklore
  - Lover
- - I Am Easy to Find
  - Sleep Well Beast
  - Trouble Will Find Me

This YAML file is an array (sequence) containing two arrays (sequences) each with three strings (scalars).

Array of objects : sequence of mappings

- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8

This YAML file is an array (sequence) referencing two objects (mappings) each with two key-value pairs.

Object of arrays : mapping of sequences

Taylor Swift:
  - Evermore
  - Folklore
  - Lover
The National:
  - I Am Easy to Find
  - Sleep Well Beast
  - Trouble Will Find Me

This YAML file is an object (mapping) with two key-value pairs, each key referencing an array (sequence) of strings (scalars).

Object of objects : mapping of mappings

Taylor Swift:
  album_count: 9
  label: Republic
The National:
  album_count: 8
  label: 4AD

This YAML file is an object (mapping) with two key-value pairs, each key referencing another object (mapping) with two key-value pairs whose values mix strings and numbers (scalars).
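Since each pattern is just "outer container of inner containers," the names can be derived mechanically. Here's a hypothetical `describe` helper in Python that looks one level deep and names a structure using the post's terms; the data is typed by hand to match the four examples.

```python
def kind(value) -> str:
    """Map a Python value to the YAML term for its node kind."""
    if isinstance(value, list):
        return "sequence"
    if isinstance(value, dict):
        return "mapping"
    return "scalar"


def describe(value) -> str:
    """Name a structure '<outer> of <inner>s' when all children share one kind (toy helper)."""
    if isinstance(value, list):
        children = value
    elif isinstance(value, dict):
        children = list(value.values())
    else:
        children = []
    child_kinds = {kind(child) for child in children}
    if len(child_kinds) == 1:
        return f"{kind(value)} of {child_kinds.pop()}s"
    return kind(value)  # empty, scalar, or mixed children: just name the container


patterns = {
    "Array of arrays": [
        ["Evermore", "Folklore", "Lover"],
        ["I Am Easy to Find", "Sleep Well Beast", "Trouble Will Find Me"],
    ],
    "Array of objects": [
        {"name": "Taylor Swift", "album_count": 9},
        {"name": "The National", "album_count": 8},
    ],
    "Object of arrays": {
        "Taylor Swift": ["Evermore", "Folklore", "Lover"],
        "The National": ["I Am Easy to Find", "Sleep Well Beast", "Trouble Will Find Me"],
    },
    "Object of objects": {
        "Taylor Swift": {"album_count": 9, "label": "Republic"},
        "The National": {"album_count": 8, "label": "4AD"},
    },
}

for label, data in patterns.items():
    print(f"{label}: {describe(data)}")
```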

All mixed up

Parsing isolated patterns is a good start, but the big game is reading entire YAML files.

Look at this GitHub Actions workflow.

name: Netlify Rebuild
on:
  schedule:
    - cron: '0 21 * * MON-FRI'
jobs:
  build:
    name: Netlify Rebuild
    runs-on: ubuntu-latest
    steps:
      - name: Curl request
        run: curl -X POST -d {} https://api.netlify.com/build_hooks/601321b7879709a8b8874175

Here’s what we can evaluate.

  • The root structure is an object with three properties
  • name references a string
  • on references an object with one property
    • schedule references an array with one object containing one property
      • cron references a string
  • jobs references an object with one property
    • build references an object with three properties
      • name references a string
      • runs-on references a string
      • steps references an array with one object containing two properties
        • name references a string
        • run references a string
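That evaluation is mechanical enough to automate. The sketch below walks nested Python data and builds a similar outline. The `workflow` dict is typed by hand from the YAML above rather than loaded, partly because YAML 1.1 loaders such as PyYAML read the bare `on` key as the boolean `True`. `outline` is a hypothetical helper, and its wording only approximates the list above.

```python
# Hand-typed equivalent of the workflow above (assumption: not produced by a loader).
workflow = {
    "name": "Netlify Rebuild",
    "on": {"schedule": [{"cron": "0 21 * * MON-FRI"}]},
    "jobs": {
        "build": {
            "name": "Netlify Rebuild",
            "runs-on": "ubuntu-latest",
            "steps": [
                {
                    "name": "Curl request",
                    "run": "curl -X POST -d {} https://api.netlify.com/build_hooks/601321b7879709a8b8874175",
                }
            ],
        }
    },
}

SCALAR_NAMES = {"str": "a string", "int": "a number", "float": "a number", "bool": "a boolean"}


def plural(count: int, word: str, words: str) -> str:
    return f"{count} {word if count == 1 else words}"


def outline(value, name: str = "root", depth: int = 0) -> list[str]:
    """Describe nested data one line per node, indented by depth (toy helper)."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [f"{pad}{name} references an object with {plural(len(value), 'property', 'properties')}"]
        for key, child in value.items():
            lines += outline(child, key, depth + 1)
        return lines
    if isinstance(value, list):
        lines = [f"{pad}{name} references an array with {plural(len(value), 'item', 'items')}"]
        for child in value:
            lines += outline(child, "item", depth + 1)
        return lines
    # Anything else is a scalar; fall back to the Python type name if unmapped.
    kind = SCALAR_NAMES.get(type(value).__name__, type(value).__name__)
    return [f"{pad}{name} references {kind}"]


print("\n".join(outline(workflow)))
```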

Conclusion

I think I understand YAML enough to move on with my life. I hope you feel the same way.

Learning how to identify the implied structures has made all the difference.
