This is an introductory post about the Inventory in Ansible where I’m looking at the current design and implementation, some of it internals, and where hope to yield some discussion and ideas on how it could be improved, be extended. A recent discussion at Devopsdays Ghent last week, re-spawned my interest in this topic, with some people actively showing interest to participate. Part of that interest is about building some standard inventory tool with some API and frontend, similar to what Vincent Van der Kussen started (and also lots of other often now abandoned projects) but going way further. Of course, that exercise would be pointless when not looking at what parts need to happen upstream, and what parts are more fitting in a separate project, or not. That’s also why my initial call to people interested in this didn’t focus on immediately bringing that discussion on one of the Ansible mailing lists.
In it’s current state, the Ansible Inventory – at the time of writing 2.2 was just released – hasn’t changed since it’s initial inception on how it was modelled during it’s early 0.x releases. This post tries to explain a bit how it was designed, how it works, and what might be its limits. I might oversimplify some details whilst focusing on the internal model and how data is kept in data structures.
Whilst most people tend to see the inventory as just the host list, and a way to group them, it is much more than that, as parameters to each host, inventory variables, are also part of the inventory. Initially, those group and host variables where implemented as vars plugins, and whilst the documentation still seems to imply this, this hasn’t been true since around a major bugfix and update to the inventory in the 1.7 release, now over two years ago, where this part now is a fixed part of the ansible.inventory code. As far as I know, nobody seems to use custom vars plugins. I’d argue that the part where one manages variables, parameters to playbooks and roles, is the most important part in inventory. Structurally, the inventory model comes down to this:
The inventory basically is a list of hosts, which can be member of one or more groups, and where each group can be a child of one or more other (parent) groups. One can define variables on each of those hosts and groups, and define different values for all of those hosts. Groups and hosts inherit variables from their parents.
The inventory is mostly pre-parsed at the beginning of an Ansible run. After that, you can consider the inventory as being a set of groups, where hosts live, and each host has a set of variables with one specific value attached to it. An often made misunderstanding that comes up on the mailing list every now and then, is thinking a host can have a different value for a specific variable, depending on which group was used to target that host in a playbook. Ansible doesn’t work like that. Ansible takes a hosts: definition and calculates a list of hosts. In the end which exact group was used to get to that host, doesn’t matter anymore. Before hosts are touched, and variables are used, those variables always are calculated down to the host. Assigning different values to different groups, is how you can manage those, but in the end, you could choose to never use group_vars, and put everything yourself, manually, in host_vars, and get the same end result.
Now, if a host is member of multiple groups, and the same variable is defined in (some of) those groups, the question is, which value will prevail? Variable precedence in Ansible is quite a beast, and can be quite complex or at least daunting to both new and experienced users. The Ansible docs overview doesn’t explain alle the nifty details, and I couldn’t even find an explanation how precedence works within group_vars. Now, the short story here is, the more specific and down the tree wins. A child group wins over a parent group, host vars always win over group vars.
When host kid is member of a father group, and that father group is member of a grandfather group, then the kid will inherit variables from grandfather and father. Father could overrule a value from grandfather, and kid can overrule his father and grandfather if he wants. Modern family values.
There are also two special default groups: all and ungrouped. The former contains all groups that are not defined as a child group of another group, the latter contains all hosts that are not made member of a group.
But what if I have two application groups, app1 and app2, which are not parent-child related, and both define the same variable? In this case, both groups app1 and app2 live on the same level, and have the same ‘depth‘. Which one will prevail depends on the alphanumerical sorting of both names – IIRC – but I’m not even sure of the details.
That depth parameter is actually an internal parameter of the Group object. Each time a group is made member of a parent group, that group gets the depth from its parent + 1 unless if that group’s depth was already bigger than that newly calculated depth. The special ALL group has a depth of 0, app1 and app2 both have a depth of 1, and app21 got a depth of two. For a variable defined in all those groups, the value in app21 will be inherited by node2, whilst node1 will get the value from either app1 or app2, which is more or less undefined. That’s one of the reasons why I recommend to not define the same variables in multiple group “trees”, where a group tree is one big category of groups. It’s already hard to manage groups within a specific sub tree, whilst keeping precedence (and hence depth) in mind, it’s totally impractical to track that amongst groups from different sub trees.
Oh, if you want to generate a nice graphic of your inventory, to see how it lays out, Will Thames manages a nice project (https://github.com/willthames/ansible-inventory-grapher) that does just that. As long as graphviz manages to generate a small enough graph that fits on a sheet of paper, whilst remaining readable, you probably have a small inventory.
Unless you write your own application to manage all this, and write a dynamic inventory script to export data to Ansible, one is stuck with mostly the INI style hosts files, and the yaml group_vars and host_vars files to manage those groups and variables. If you need to manage lots of hosts, lots of applications, you probably end up with lots of categories to organise these groups, and then it’s very easy to lose any overview in how those groups are structured, and how variables inherit over those groups. It becomes hard to predict which value will prevail for a particular host, which doesn’t help to ensure consistency. If you were to change default values in the all group, whilst some hosts have an overruled value defined in child groups, but not all, you suddenly change values for a bunch of hosts. That might be your intention, but when managing hundreds of different groups, you know such a mistake might easily happen when all you have are the basic Ansible inventory files.
Ansible tends (or at least often used to) to recommend not doing such complicated things, “better remodel how you structure your groups” – but I think managing even moderately large infrastructures can quickly have complex demands on how to structure your data model. Different levels of organisation (sub trees) are often needed in this concept. Many of us need to manually create intersection of groups such as app1 ~ development => app1-dev to be able to manage different parameters for different applications in different environments. At scale this quickly becomes cumbersome with the ini and yaml file based inventory, as that doesn’t scale. Maybe we need a good pattern to handle dynamic intersections implemented upstream? Yes, hosts: app1&dev, but that is parsed at run time, and you can’t assign vars to such an intersection. An interesting approach is how Saltstack – which doesn’t have the notion of groups in the way Ansible does – lets you target hosts using grains, a kind of dynamic groups filtering hosts based on facts. Perhaps this project does something similar?
Putting more logic on how we manage the inventory, could be beneficial in several ways. Some ideas.
Besides scaling the current model by having a better tooling, it could for example allow to write simpler roles/playbooks, e.g. where the only lookup plugin we need to use for with_ iterations, would be the standard with_items, as the other ones are actually used to handle more complex data and generate lists from it. That could be part of the inventory, where the real infrastructure-as-data is modelled, doing a better decoupling of code (roles) and config (inventory). Possibly a way to really describe infrastructure as a higher level data model, describing a multi-tier application, that gets translated into specific configurations and action on the node level.
How about tracking the source of (the value for) a variable, allowing to make a difference between having a variable inherit a default and updating from changing that default from a more generic group (e.g. the dns servers for the DC), as opposed to only instantiate a variable from a default, and not letting it change afterwards by inheriting that default (e.g. the Java version development starts out with during the very first development deploy).
Where are gathered facts kept? For some, just caching those as it currently happens, is more than enough. For others, especially more specific custom facts – I’d call those run time values – that can have a direct relationship with inventory parameters, it might make more sense to keep them in the inventory. Think about the version of your application: you configure that as a parameter, but how do you know if that version is also the one that is currently deployed? One might need to keep track of the deployed version (runtime fact) and compare that to what was configured to be deployed (inventory parameter).
An often needed pattern when deploying clusters, is the concept of a specific cluster group. I’ve seen roles acting on the master node in a cluster by conditionally running when: inventory_hostname = myapplicationcluster.
How many of you are aware that the order of the node list of a group was reversed somewhere between two 1.x releases? It was initially reverse alphanumerically ordered.
This should nicely illustrate the problem of relying on undocumented and untested behaviour. This pattern also makes you hard coding an inventory group name, in a role, which I think is ugly, and makes your role less portable. Doing a run_once can solve a similar problem, but what if your play targets a group of multiple clusters, instead of a specific cluster where you need to run_once per cluster? Perhaps we need to introduce a special kind of group that can handle the concept of a cluster? Metadata to groups, hosts and their variables?
Another hard pattern to implement in Ansible, is what Puppet solves with their exported resources. Take a list of applications that are deployed on multiple (application) hosts, and put that in a list so we can deploy a load balancer on 1 specific node. Or monitoring, or ACL to access those different applications. As Ansible in the end manages just vars per hosts, doing things amongst multiple hosts is hard. Can’t solve everything with a delegate_to.
At some point, we might want to parse the code (the roles) that will be run, as to import a list of variables that are needed, and build some front-end, so the user can instantiate his node definitions and parameterize the right vars in inventory.
How about integrating a well managed inventory, with external sources? Currently, that happens by defining the Ansible inventory as a directory, and combining ini files with dynamic inventory scripts. I won’t get started on the merits of dir.py, but let’s say we really need to redesign that into something more clever, that integrates multiple sources, keeps track of metadata, etc. William Leemans started something on this after some discussion at Loadays in 2014, implementing specific external connectors.
Perhaps on a side note, I have also been thinking a lot on versioning. Versioning of the deploy code, and especially roles here. I want to be able to track the life cycle of an application, its different versions, and the life cycle and versions of it’s deployment. Imagine I start a business where I offer hosted Gitlab for multiple customers. Each of those customers potentially has multiple organisations, and hence Gitlab instances. Each of them potentially want to always run the latest version, whilst some want to remain ages on the same enterprizy first install, and others are somewhere in between. Some might even have different DTAP environments – Gitlab might be a bad example here as not all customers will do custom Gitlab development, but you get the idea. In the end you really have LOTS of Gitlab instances to manage. Will you manage them all with one single role? At some point the role needs development. Needs testing. A new version of the role needs to be used in production, for a specific customer’s instance. How can I manage these different role version in the Ansible eco-system? Doing that in the inventory sounds like the central source of information?
Lots of ideas. Lots of things to discuss. I fully expect to hear about many other use cases I never thought of, or never even would need. And as to not re-invent the wheel, insights from other tools are very welcome here!