This week VictorOps Inc., a real-time incident management company producing DevOps solutions, announced the release of the Incident Automation Engine. The new product provides a set of automation features for on-call teams, designed to reduce mean time to resolution (MTTR) through intelligent alerts, sophisticated routing and smarter system outputs.
It has long been one of the bugbears of operations that when an incident happens, it takes time for that incident to be discovered (often via customers having trouble), more time for the on-call team to review the reports and logs, and still more time to effect a proper response, which may require the development team to provide and deploy a patch. Even more problematic, part of this lifecycle can be triggered by a false alarm for an incident that didn’t happen, wasting the time of a DevOps team that must triage every potential problem.
“More and more often, the best way to speed organizational innovation is by enabling developers to move quickly,” said Todd Vernon, VictorOps CEO and co-founder. “When it comes to a company’s uptime, this same concept applies. Today valuable time is lost with archaic alerting systems and manual processes. The [Incident Automation] Engine now becomes a foundation to automate out these inefficiencies, gain clearer access to actionable information and continuously evaluate and improve the inputs driving on-call processes.”
VictorOps expects that the Incident Automation Engine will help reduce the overall amount of incident noise by performing its own automated triage of each incident trigger before it is delivered to the DevOps on-call team.
And, when the Engine sees a problem that fits all the criteria of a proper incident, it delivers documentation about similar past solutions, documentation on the systems involved and all communication involved in incident resolution. This means when something goes wrong, everyone who gets involved has a full history of what has been done to resolve it (to date) and can quickly get an idea of who needs to know what.
Key features in the VictorOps Incident Automation Engine
With this release VictorOps described five key features of the Incident Automation Engine: alert automation, alert annotations, outbound webhooks, an API and a post-mortem report generator.
The platform includes alert automation, which routes specific alerts to the correct team members according to who is responsible for what, and quiets alerts it finds to be unactionable (or false).
The system provides alert annotations, which automatically append specialized information to alerts, including runbooks, monitoring graphs, logs and other relevant material, in order to provide remediation solutions and better visibility into the incident.
With outbound webhooks, the VictorOps platform’s data can be exported into other systems and VictorOps incidents can be integrated into other service dashboards, while an API is available to extend VictorOps to existing legacy tools and third-party reporting systems.
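To make the routing-and-quieting idea concrete, here is a minimal sketch in Python. It is not VictorOps code or its API; the Alert fields, the team names and the rule tables are all hypothetical and exist only to illustrate the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    # Hypothetical alert shape, not a VictorOps schema.
    service: str
    severity: str
    message: str


# Assumed routing table: which on-call team owns which service.
ROUTES = {"database": "dba-oncall", "web": "frontend-oncall"}
# Assumed rule: alerts at these severities are treated as unactionable.
QUIET_SEVERITIES = {"info"}
# Assumed fallback team for services with no explicit owner.
DEFAULT_TEAM = "ops-oncall"


def route_alert(alert: Alert) -> Optional[str]:
    """Return the team to page, or None if the alert is quieted."""
    if alert.severity in QUIET_SEVERITIES:
        return None
    return ROUTES.get(alert.service, DEFAULT_TEAM)
```

In this sketch, a critical "database" alert would go to the hypothetical dba-oncall team, while an "info" alert would page no one.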
Finally, there is a post-mortem report generator that automatically pulls a snapshot of all activities (alerts, conversations and remediation actions) associated with an incident. This gives a DevOps team something to review during its next meeting, or a way to gain insight into what went well and what went wrong when responding to an incident.
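As a rough illustration of what such a snapshot involves, the sketch below gathers every event tied to one incident and lists it in time order. The Event record and the build_postmortem function are hypothetical and say nothing about how VictorOps actually stores or formats its reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    # Hypothetical event record, not a VictorOps data model.
    incident_id: str
    timestamp: float  # seconds since some reference point
    kind: str         # "alert", "conversation" or "remediation"
    text: str


def build_postmortem(incident_id: str, events: List[Event]) -> str:
    """Collect the events for one incident into a chronological report."""
    related = sorted(
        (e for e in events if e.incident_id == incident_id),
        key=lambda e: e.timestamp,
    )
    lines = [f"Post-mortem for incident {incident_id}"]
    for e in related:
        lines.append(f"[{e.timestamp:.0f}] {e.kind}: {e.text}")
    return "\n".join(lines)
```

Events belonging to other incidents are ignored, so the report reflects only the activity recorded against the incident under review.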
Continued enhancements and investment in VictorOps from the industry
VictorOps continues to produce incident-related platform intelligence for DevOps teams, and in November of last year it received $10.6 million USD from The Foundry Group and Costanoa Venture Capital to further the development of products such as the Incident Automation Engine.
Also this year, VictorOps released enhancements to its incident management mobile app platform, designed to ease team friction and burnout.