Node Task Registry Design

Node Tasks

Basic Task Design

Warble Nodes can have one or more (or all) tasks assigned to it. Each task consists of a target to test, as well as what to test and how to go about that, encapsulated in a payload object. Each check you wish to perform requires an associated task, but may be performed by multiple nodes. Thus, testing whether your main web site works on port 80 requires a task, as does a test for https on port 443, as they are technically two distinct targets. Specific tasks may have optional tests built into them,for instance a SSL certificate check on a https site.

Task status

A task can be either enabled, disabled, or muted. Disabling a task prevents it from running on nodes, whereas muting a task will still cause nodes to perform it, but alerting will be silenced. Muting can be used for when you still need to monitor a situation, but you don’t need to be reminded whenever the test results changes.

Task sensitivity

A task can also have a specific sensitivity set. Sensitivity denotes how failures are treated, and when to alert about state changes:

  • low: Alerting only happens if all currently active nodes agree that the test has failed, e.g. the service is down completely.
  • default: Alerting happens if a majority of nodes agree that the test has failed. This is the default behavior and balances out the need for speedy alerting versus the need for fewer false positives.
  • high: Alerting happens if more than one node sees failures. While more sensitive than the default, it still removes a fair bit of false positives by requiring confirmation of a reported failure by at least one other node.
  • twitchy: Alerting happens if any node registers a failure. This may be useful for services that have guaranteed service level agreements, but can lead to a lot of false positives.

It should be noted that if you run a setup of Warble with only one, or very few nodes attached, the sensitivity levels may differ very little in terms of when alerting happens, as the definition of quorum changes based on how many active nodes you have at any given time.

Task Categories

Each task is assigned a task category, which helps you separate tasks into easily recognizable groups and access definitions.

Each task category has a distinct alerting and escalation path, meaning you can assign different teams to different categories, and have alerts go to that team, independent of other task categories. This can be useful for having front-end issues go to a specific team, while back-end issues go to another team.

Task Category Access

Users can be assigned the following access levels to categories, on a per-user basis:

  1. Read-only access: The user can read and analyze test results, but cannot edit or remove tasks, nor see the specific payload details (thus, if you add a test with credentials, users with read-only access cannot see the credentials)
  2. Read/write access: The user can read, modify, and remove existing tests. They can also add new tests to the category.
  3. Admin access: The user can, besides permissions listed above, also modify or remove the category altogether or change its alerting options. This access level should generally be reserved for power users only.

It should be noted that super users on the system (such as the account you create at setup) can freely access and modify any aspect of the tasks/categories.