How Unomaly detects anomalies
At a high level, to detect anomalies in log events, Unomaly needs to learn how the source (system) that produces the logs behaves. A system may be an application, container image, network device, or other source of log data. Anomalies are log events that do not match the learned pattern of events that the system normally emits; they represent changes in the system’s behavior.
For the most part, systems generate large volumes of log events that look very similar but are almost never exactly the same. Unomaly’s anomaly detection works on any kind of log data, so it needs to be able to learn behavior and detect anomalies without taking the meaning of the logs into account.
As soon as Unomaly starts receiving log events from a system, Unomaly starts learning the behavior of the system. Since at this point all log events are new, meaning that they are all anomalies, Unomaly labels the system as “In training”. Unomaly analyzes the data but does not display anomalies or situations and does not generate notifications.
To learn the baseline, Unomaly builds a learnings database by analyzing the structure of the incoming log events and the variations of the contents inside that structure. The algorithm merges log event structures (profiles) that have similar, but slightly different, content to build higher-level profiles that represent patterns of behavior. The sum of all the profiles seen for a system is referred to as the system profile and describes how the system behaves under normal conditions.
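The merging step described above can be illustrated with a minimal sketch. This is not Unomaly's actual algorithm, which is not described here in detail; it assumes whitespace tokenization, a hypothetical `<*>` placeholder for varying content, and an arbitrary `max_diff` threshold for how different two events may be and still merge.

```python
from typing import List, Optional

WILDCARD = "<*>"  # hypothetical marker for a position whose content varies


def merge(profile: List[str], tokens: List[str], max_diff: float = 0.5) -> Optional[List[str]]:
    """Merge an event into a profile if both have the same number of tokens
    and differ in at most max_diff of the positions; otherwise return None."""
    if len(profile) != len(tokens):
        return None
    # Positions already marked as variable match any token.
    mismatches = sum(1 for p, t in zip(profile, tokens) if p != WILDCARD and p != t)
    if mismatches > max_diff * len(profile):
        return None
    return [p if p == t else WILDCARD for p, t in zip(profile, tokens)]


def learn(events: List[str]) -> List[List[str]]:
    """Build a list of profiles by merging each event into the first profile
    that accepts it, or by starting a new profile."""
    profiles: List[List[str]] = []
    for event in events:
        tokens = event.split()
        for i, profile in enumerate(profiles):
            merged = merge(profile, tokens)
            if merged is not None:
                profiles[i] = merged
                break
        else:
            profiles.append(tokens)
    return profiles


profiles = learn([
    "user alice logged in from 10.0.0.1",
    "user bob logged in from 10.0.0.2",
    "disk sda1 is 91% full",
])
```

In this sketch the two login events collapse into one profile with two variable positions, while the disk event starts a profile of its own. Note that the sketch never looks at what the tokens mean, only at where they differ.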
For each event, Unomaly keeps a profile that holds all the relevant information about the event that Unomaly uses for anomaly detection, including:
- Timestamps of the first and last occurrences of this event that Unomaly received.
- The number of times this event was seen on a system. Unomaly uses this number and the time between occurrences to statistically determine the frequency of the event.
- Which parameters in the event are dynamic or continuously changing. Unomaly uses these changing parameters to create summaries and to aggregate events.
- A time series data store that lets Unomaly store a history of the occurrences of each event profile. Unomaly uses this time series data to detect frequency spikes.
- Metrics that allow Unomaly to detect whether events are periodic. When an event is periodic (for example, the output of a cron job), Unomaly can predict when the next occurrence should happen. Unomaly uses this learning to detect when events stop happening.
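As a rough illustration of the profile contents listed above, the following sketch keeps an occurrence history per event and derives a count, a periodicity check, and a predicted next occurrence from it. The class name `EventProfile`, its methods, and the thresholds `max_cv`, `min_gaps`, and `tolerance` are all assumptions made for this example; they are not Unomaly's data model or its statistical tests.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class EventProfile:
    """Hypothetical per-event record holding occurrence times in seconds."""
    timestamps: List[float] = field(default_factory=list)

    def observe(self, ts: float) -> None:
        self.timestamps.append(ts)

    @property
    def count(self) -> int:
        return len(self.timestamps)

    def gaps(self) -> List[float]:
        """Time between consecutive occurrences."""
        return [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]

    def is_periodic(self, max_cv: float = 0.1, min_gaps: int = 3) -> bool:
        """Treat the event as periodic when the gaps vary little relative
        to their mean (coefficient of variation at most max_cv)."""
        gaps = self.gaps()
        if len(gaps) < min_gaps:
            return False
        avg = mean(gaps)
        return avg > 0 and pstdev(gaps) / avg <= max_cv

    def next_expected(self) -> Optional[float]:
        """Predicted time of the next occurrence, or None if not periodic."""
        if not self.is_periodic():
            return None
        return self.timestamps[-1] + mean(self.gaps())

    def has_stopped(self, now: float, tolerance: float = 2.0) -> bool:
        """A periodic event counts as stopped once it is overdue by more
        than tolerance times its usual gap."""
        if not self.is_periodic():
            return False
        return now - self.timestamps[-1] > tolerance * mean(self.gaps())


cron = EventProfile()
for ts in (0.0, 60.0, 120.0, 180.0):
    cron.observe(ts)
```

A production system would more likely keep aggregated counters than a full list of timestamps; the list is used here only to keep the example short.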
The time needed for Unomaly to learn the system depends on:
- The number of log events emitted by the system: more is faster.
- The diversity of log event structures: less is faster.
- The diversity of log event content within similar structures: less is faster.
During training, Unomaly checks that the system produces logs on a regular basis. If Unomaly has not detected any anomalies for 6 hours, which means 6 hours of continuously normal behavior, the system is “learned” and taken out of training.
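The 6-hour rule can be sketched as a small state tracker. The class `TrainingTracker` and its method are hypothetical names for this example. The sketch also simplifies the rule: it re-evaluates the state only when an event arrives and does not separately verify that logs arrive regularly.

```python
from dataclasses import dataclass

QUIET_PERIOD_SECONDS = 6 * 60 * 60  # 6 hours without anomalies, as stated above


@dataclass
class TrainingTracker:
    """Hypothetical tracker for one system; times are in seconds."""
    last_anomaly_at: float
    state: str = "In training"

    def on_event(self, ts: float, is_anomaly: bool) -> str:
        """Update the state for an event received at time ts."""
        if self.state == "In training":
            if is_anomaly:
                # Any anomaly restarts the quiet period.
                self.last_anomaly_at = ts
            elif ts - self.last_anomaly_at >= QUIET_PERIOD_SECONDS:
                self.state = "Active"
        return self.state


HOUR = 3600.0
tracker = TrainingTracker(last_anomaly_at=0.0)
states = [
    tracker.on_event(1 * HOUR, False),   # only 1 quiet hour so far
    tracker.on_event(5 * HOUR, True),    # anomaly resets the clock
    tracker.on_event(10 * HOUR, False),  # 5 quiet hours since the reset
    tracker.on_event(11 * HOUR, False),  # 6 quiet hours: training ends
]
```

Once the tracker reaches the active state, later anomalies no longer move it back into training.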
In some cases, training completes within 24-36 hours. Systems with low log event volumes and large diversity of log structures may take up to 2 weeks for Unomaly to learn the baseline of behavior.
You can enable Unomaly to show anomalies detected during training. This is not recommended because it means that the anomalies that you see during this period may be normal events that Unomaly has seen for the first time.
After training is complete, Unomaly automatically moves the system into an “Active” state. Unomaly continues to receive and analyze the log event stream and will show any new anomalies it detects on the system and, if configured to do so, send notifications.
Unomaly detects the following types of anomalies:
|Anomaly type|Description|
|---|---|
|Never before seen|Events that are new in the entire IT environment that Unomaly is monitoring.|
|New in system|Events that are new in a system but may have occurred in other systems.|
|Parameter change|Events that match previously detected anomalies but have different parameter values.|
|System away|Events indicating that Unomaly has not received data from the system for a certain amount of time.|
|Frequency spike|Anomalies where an event is produced at a significantly greater rate than previously seen.|
|Event stop|Anomalies where a periodic log event (that is, an event that was seen regularly) is no longer produced.|
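To make the distinctions between some of these anomaly types concrete, here is a minimal sketch covering "Never before seen", "New in system", and "Frequency spike". The function names, the `known` mapping from system name to learned event structures, and the `factor` threshold are assumptions for this example and do not reflect how Unomaly stores or compares its learnings.

```python
from typing import Dict, List, Optional, Set


def classify(system: str, structure: str, known: Dict[str, Set[str]]) -> Optional[str]:
    """Return an anomaly type for an event structure, or None if the
    structure is already part of this system's learned profile."""
    if structure in known.get(system, set()):
        return None
    if any(structure in structures for structures in known.values()):
        return "New in system"
    return "Never before seen"


def is_frequency_spike(current_rate: float, past_rates: List[float], factor: float = 3.0) -> bool:
    """Flag a spike when the current rate exceeds the highest previously
    seen rate by more than the (assumed) factor."""
    if not past_rates:
        return False
    return current_rate > factor * max(past_rates)


known = {
    "web-1": {"user <*> logged in"},
    "db-1": {"user <*> logged in", "checkpoint complete"},
}
result = classify("web-1", "checkpoint complete", known)
```

Here `result` is "New in system" because the structure has been learned for `db-1` but not for `web-1`.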
A detected anomaly is not necessarily bad or a sign of a brewing incident. For example, deploying new features results in a set of anomalies that Unomaly will detect. These anomalies are usually the expected result of changes that you set out to make. If they are not, you can use Unomaly to review the detected anomalies and investigate what is going wrong.