Comprehending YAML
YAML is a little clever for my taste but I’m starting to get it.
Before yesterday’s post, I had no idea how it related to JSON. In that post I personalized a few examples from the YAML docs and manually converted them.
And after sleeping on my exercise, I have a better picture of what’s going on.
- YAML infers complex types
- YAML infers simple types
- Takeaways
- YAML shorthand can be confusing
- YAML patterns
- All mixed up
- Conclusion
YAML infers complex types
As much as possible, YAML infers complex data structures from the data they contain.
This is a sequence and the containing Array structure is implied. Here’s how it looks in JSON:
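As a minimal sketch of such a sequence (the album titles are illustrative values, not necessarily the ones from the YAML docs), with the JSON equivalent written out in a trailing comment:

```yaml
# A sequence at the root: each dash introduces one item,
# and the enclosing array is implied rather than written out.
- Evermore
- Folklore
- Lover

# JSON equivalent:
# ["Evermore", "Folklore", "Lover"]
```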
Compare this to a YAML mapping, where the root structure is implied to be an object.
Again, here’s what that looks like in JSON.
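A minimal sketch of a root-level mapping, assuming two illustrative keys (`album` and `artist` are made up for this example):

```yaml
# A mapping at the root: each "key: value" line is one property,
# and the enclosing object is implied.
album: Evermore
artist: Taylor Swift

# JSON equivalent:
# {"album": "Evermore", "artist": "Taylor Swift"}
```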
So what happens if we try to mix sequences and mappings at the root level?
It breaks:
This tracks because something can’t be both an object AND an array.
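A sketch of the kind of mix that fails, using illustrative values. The exact error message depends on the parser, so none is shown here:

```yaml
# INVALID: this does not parse as a single document.
# The first line commits the root to being a sequence,
# so the mapping entry on the second line has nowhere to go.
- Evermore
album: Folklore
```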
Understanding inference
Identifying when you’re describing an Array (sequence) and when you’re describing an Object (mapping) is critically important. And it’s not always clear.
Can you guess what the JSON equivalent for this YAML is?
The dashes (-) indicate that the root structure is an array (sequence). But each array item comprises a discrete object (mapping) with a key-value pair. This is inferred from the colon (:) between the key and the value in each array item.
So the JSON output for the YAML above is this:
An array of objects (or “sequence of mappings”).
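A minimal sketch of this shape, assuming a hypothetical `album` key in each item:

```yaml
# The dashes make the root a sequence; the colon inside each item
# makes that item a mapping with a single key-value pair.
- album: Evermore
- album: Folklore

# JSON equivalent:
# [{"album": "Evermore"}, {"album": "Folklore"}]
```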
Now that I understand it, I see the dash (-) like list-items in Markdown.
YAML infers simple types
Let’s take an array (sequence) of objects (mappings).
The keys are strings and the values are numbers.
- Evermore becomes "Evermore".
- 2020 stays 2020.
- Taylor Swift becomes "Taylor Swift".
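A sketch of what such a file could look like, assuming each object maps an album title to its release year. The second year is a value supplied here for illustration:

```yaml
# Unquoted text is read as a string; unquoted digits are read as a number.
- Evermore: 2020
- Taylor Swift: 2006

# JSON equivalent:
# [{"Evermore": 2020}, {"Taylor Swift": 2006}]
```

Wrapping a value in quotes, as in `"2020"`, would keep it a string instead.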
Takeaways
After looking at so much JavaScript and JSON, the unquoted strings are a little unsettling, but in simple examples like this there’s a simplicity to the representation.
YAML shorthand can be confusing
Below we have an array (sequence) of objects (mappings).
JSON looks like this.
What does it look like to add more properties to these objects?
YAML allows us to use JSON object syntax.
But it’s not super YAML-y. So there’s an alternative that uses newlines.
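Both styles can be sketched side by side. The `title` and `year` keys are hypothetical, and the `---` line separates two YAML documents in one file so the two forms can be compared:

```yaml
# Flow style: each item is written with JSON-like braces.
- { title: Evermore, year: 2020 }
- { title: Folklore, year: 2020 }
---
# Block style: the same data, with one key-value pair per line.
# The dash marks where each item starts; the indented line
# underneath belongs to the same mapping.
- title: Evermore
  year: 2020
- title: Folklore
  year: 2020

# JSON equivalent of either document:
# [{"title": "Evermore", "year": 2020}, {"title": "Folklore", "year": 2020}]
```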
This is challenging for me to interpret because the dash (-) separates discrete objects (mappings) in the array (sequence), so it feels like the dash (-) is a directive for the object (mapping). But it’s not; it’s communicating that the containing structure is an array.
Pressing into the confusion, consider this array (sequence) containing a string, number, object, and array.
This is that same file in JSON.
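A minimal sketch of such a mixed array, with illustrative values and the JSON equivalent in a trailing comment:

```yaml
- Evermore               # string
- 2020                   # number
- artist: Taylor Swift   # object (mapping) with two key-value pairs
  title: Evermore
- - willow               # array (sequence) nested inside the root array
  - gold rush

# JSON equivalent:
# ["Evermore", 2020, {"artist": "Taylor Swift", "title": "Evermore"}, ["willow", "gold rush"]]
```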
Takeaways
The presence of a dash (-) or a colon (:) describes the surrounding structure.
In the case of arrays (sequences) of objects (mappings) with multiple key-value pairs, this terseness can be unclear. At least until we’ve trained ourselves to see the invisible structures that (-) and (:) represent.
For me, learning to read a line that starts with a dash (-) and contains a colon (:) as an object in an array has made the biggest difference in my ability to quickly parse YAML.
YAML patterns
Identifying patterns is helpful.
Here are a few complex handoffs that I had trouble with.
Array of arrays : sequence of sequences
This YAML file is an array (sequence) containing two arrays (sequences) each with three strings (scalars).
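A sketch of this pattern with illustrative strings:

```yaml
# The outer dash starts an item of the root array;
# the inner dashes are the items of the nested array.
- - willow
  - gold rush
  - dorothea
- - cardigan
  - exile
  - betty

# JSON equivalent:
# [["willow", "gold rush", "dorothea"], ["cardigan", "exile", "betty"]]
```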
Array of objects : sequence of mappings
This YAML file is an array (sequence) containing two objects (mappings) each with two key-value pairs.
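A sketch of this pattern, assuming hypothetical `title` and `year` keys:

```yaml
# Each dash starts a new object; the indented line below it
# adds a second key-value pair to that same object.
- title: Evermore
  year: 2020
- title: Lover
  year: 2019

# JSON equivalent:
# [{"title": "Evermore", "year": 2020}, {"title": "Lover", "year": 2019}]
```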
Object of arrays : mapping of sequences
This YAML file is an object (mapping) with two key-value pairs, each key referencing an array (sequence) of strings (scalars).
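A sketch of this pattern, with made-up keys and values:

```yaml
# The colons make the root an object; the dashes under each key
# make that key's value an array.
evermore:
  - willow
  - gold rush
folklore:
  - cardigan
  - exile

# JSON equivalent:
# {"evermore": ["willow", "gold rush"], "folklore": ["cardigan", "exile"]}
```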
Object of objects : mapping of mappings
This YAML file is an object (mapping) with two key-value pairs, each key referencing another object (mapping) with two key-value pairs whose values are a mix of strings and numbers (scalars).
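A sketch of this pattern, again with made-up keys and values:

```yaml
# No dashes anywhere: indentation alone nests one object inside another.
evermore:
  artist: Taylor Swift
  year: 2020
lover:
  artist: Taylor Swift
  year: 2019

# JSON equivalent:
# {"evermore": {"artist": "Taylor Swift", "year": 2020},
#  "lover": {"artist": "Taylor Swift", "year": 2019}}
```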
All mixed up
Parsing isolated patterns is a good start, but the big game is reading entire YAML files.
Look at this GitHub Actions workflow.
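A minimal sketch of a workflow with the shape this section walks through. The workflow name, cron expression, job name, and step command are hypothetical placeholders:

```yaml
# Root object with three properties: name, on, jobs.
name: Nightly build
on:
  schedule:
    # An array holding one object with a single "cron" property.
    - cron: "0 0 * * *"
jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    steps:
      # An array holding one object with "name" and "run" properties.
      - name: Run the build
        run: echo "building"
```

The cron expression is quoted because an unquoted `*` has special meaning in YAML.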
Here’s what we can evaluate.
- The root structure is an object with three properties
  - name references a string
  - on references an object with one property
    - schedule references an array with one object containing one property
      - cron references a string
  - jobs references an object with one property
    - build references an object with three properties
      - name references a string
      - runs-on references a string
      - steps references an array with an object containing two properties
        - name references a string
        - run references a string
Conclusion
I think I understand YAML enough to move on with my life. I hope you feel the same way.
Learning how to identify the implied structures has made all the difference.