At a high level, to detect anomalies in log events, Unomaly needs to learn how the source (system) that produces the logs behaves. A system may be an application, container image, network device, or other source of log data. Anomalies are log events that do not match the learned pattern of log events that the system normally emits, and they represent changes in the system's behavior.
For the most part, systems generate large volumes of log events that look very similar but are almost never exactly the same. Unomaly’s anomaly detection works on any kind of log data, so it needs to be able to learn behavior and detect anomalies without taking the meaning of the logs into account.
Training the algorithm
As soon as Unomaly starts receiving log events from a system, it starts learning that system's behavior. Since at this point all log events are new, meaning that they are all anomalies, Unomaly labels the system as "In training". During this phase, Unomaly analyzes the data but does not display anomalies or situations and does not generate notifications.
To learn the baseline, Unomaly builds a learnings database by analyzing the structure of the incoming log events and the variations of the contents inside that structure. The algorithm merges log event structures (profiles) that have similar, but slightly different, content with each other to build higher-level profiles that represent patterns of behavior. The sum of all the profiles seen for a system is referred to as the system profile and describes how the system behaves under normal conditions.
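This kind of structure merging can be illustrated with a small sketch (not Unomaly's actual algorithm): two log events with the same token structure collapse into one profile, with the positions where content differs marked as dynamic parameters.

```python
# Illustrative sketch only: merge log lines that share a structure into
# a profile, where "*" marks a dynamic (continuously changing) field.

def merge(profile, line):
    """Merge a log line into a token profile; returns None on a
    structural mismatch (the line belongs to a different profile)."""
    tokens = line.split()
    if profile is None:
        return tokens
    if len(profile) != len(tokens):
        return None  # different structure
    return [p if p == t else "*" for p, t in zip(profile, tokens)]

profile = None
for line in [
    "accepted connection from 10.0.0.5 port 51234",
    "accepted connection from 10.0.0.9 port 48711",
]:
    profile = merge(profile, line)

print(profile)
# ['accepted', 'connection', 'from', '*', 'port', '*']
```

The resulting profile matches any future event with the same structure, regardless of which address or port appears in the dynamic positions.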
Learning frequency patterns
For each event, Unomaly keeps a profile that holds all the relevant information about the event that Unomaly uses for anomaly detection, including:
- Timestamps for the first and last seen occurrence when Unomaly received this event.
- The number of times this event was seen on a system. Unomaly uses this number and the time between occurrences to statistically determine the frequency of the event.
- Which parameters in the event are dynamic or continuously changing. Unomaly uses these changing parameters to create summaries and to aggregate events.
- A time series data store that lets Unomaly store a history of the occurrences of each event profile. Unomaly uses this time series data to detect frequency spikes.
- Metrics that allow Unomaly to detect whether events are periodic. When an event is periodic (such as the output of a cron job), Unomaly can predict when the next occurrence of the event should happen. This learning is used to detect when events stop happening.
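Taken together, the points above suggest a per-event record along these lines. The field and method names here are hypothetical, not Unomaly's actual schema; this is only a sketch of the bookkeeping an event profile needs.

```python
# Hypothetical sketch of an event profile's bookkeeping (illustrative
# names, not Unomaly's schema).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EventProfile:
    first_seen: float = 0.0                       # timestamp of first occurrence
    last_seen: float = 0.0                        # timestamp of last occurrence
    count: int = 0                                # occurrences on this system
    dynamic_params: set = field(default_factory=set)      # changing positions
    occurrences: List[float] = field(default_factory=list)  # time series

    def record(self, ts: float) -> None:
        """Record one occurrence of this event at timestamp ts."""
        if self.count == 0:
            self.first_seen = ts
        self.last_seen = ts
        self.count += 1
        self.occurrences.append(ts)

    def mean_interval(self) -> Optional[float]:
        """Average gap between occurrences; a stable value over time
        suggests the event is periodic."""
        if self.count < 2:
            return None
        return (self.last_seen - self.first_seen) / (self.count - 1)
```

An event recorded every 60 seconds, for example, yields a `mean_interval()` of 60.0, which is the kind of signal a periodicity check could use.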
How long does training take?
The time needed for Unomaly to learn the system depends on:
- The volume of log events emitted by the system (more is faster).
- The diversity of log event structures (less is faster).
- The diversity of log event content within similar structures (less is faster).
During training, Unomaly checks that the system produces logs on a regular basis. If Unomaly has not detected any anomalies for 6 hours, that is, 6 hours of continuously normal behavior, the system is considered learned and taken out of training.
In some cases, training completes within 24-36 hours. Systems with low log event volumes and large diversity of log structures may take up to 2 weeks for Unomaly to learn the baseline of behavior.
You can enable Unomaly to show anomalies detected during training. This is not recommended, because the anomalies that you see during this period may simply be normal events that Unomaly is seeing for the first time.
After training is complete, Unomaly automatically moves the system into an “Active” state. Unomaly continues to receive and analyze the log event stream and will show any new anomalies it detects on the system and, if configured to do so, send notifications.
Unomaly detects anomalies based on the log event structure that it parses, on frequency changes, and on the stopping of periodic log events. It classifies detected anomalies into the following types:
|Anomaly type|Description|
|---|---|
|Never before seen|Events that are new in the entire IT environment that Unomaly is monitoring.|
|New in system|Events that are new in a system but may have occurred on other systems.|
|Parameter change|Events that match previously detected anomalies but have different parameter values.|
|System away|Events indicating that Unomaly has not received data from the system for a certain amount of time.|
|Frequency spike|Anomalies where an event is produced at a significantly greater rate than previously seen.|
|Event stop|Anomalies where a periodic log event (that is, an event that was seen regularly) is no longer produced.|
A detected anomaly is not necessarily bad or a sign of a brewing incident. For example, deploying new features results in a set of anomalies that Unomaly will detect. These anomalies are the expected, positive result of changes that you set out to make. If they are not, you can use Unomaly to review the detected anomalies and investigate what is going wrong.
Detecting accelerating events
The acceleration detector identifies events that are accelerating at a rate that significantly exceeds the normal pattern. Each event profile has a rolling-window time series containing the number of events over time (that is, the rate). Unomaly compares the current rate to the mean historic rate to determine whether an event profile is accelerating.
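A minimal sketch of that comparison might look as follows; the threshold factor and minimum-rate floor here are illustrative assumptions, not Unomaly's actual tuning.

```python
# Illustrative rate-spike check: compare the event count in the current
# window to the mean count of the historic windows. The factor and
# min_rate values are assumptions for illustration.
from statistics import mean

def is_accelerating(history, current, factor=3.0, min_rate=10):
    """history: event counts per past window; current: count in the
    current window. Returns True when the current rate is far above
    the historic mean."""
    if not history:
        return False  # no baseline yet (system still in training)
    baseline = mean(history)
    return current >= min_rate and current > factor * baseline

print(is_accelerating([4, 5, 6, 5], 40))  # True: ~8x the mean rate
print(is_accelerating([4, 5, 6, 5], 8))   # False: within normal variation
```

The `min_rate` floor keeps very low-volume events from triggering a spike on a handful of extra occurrences.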
Detecting stopping events
The stop detector identifies event profiles that should be receiving data but aren't. The detector only works on periodic events, which are events that always occur at the same interval (such as the normal events that make up the system profile).
Sometimes events are dropped before they reach Unomaly (such as due to a network issue). For events that have a short interval (approximately 1-10 minutes), Unomaly checks for multiple missing events before a stop is reported. For longer intervals (greater than 10 minutes), a single missing event will be reported as a stop.
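The rule described above can be sketched as a simple function of the learned interval and the time since the event was last seen. The exact cutoff and miss count are assumptions for illustration; the source only gives approximate ranges.

```python
# Illustrative stop check: short-interval periodic events must miss
# several occurrences before a stop is reported (tolerating dropped
# events), while longer-interval events report after a single miss.
# The cutoff (10 minutes) and miss count are assumptions.

def is_stopped(interval_s, last_seen, now, short_cutoff_s=600, misses=3):
    """interval_s: learned period of the event, in seconds.
    last_seen/now: timestamps in seconds."""
    elapsed = now - last_seen
    if interval_s <= short_cutoff_s:
        return elapsed > misses * interval_s  # several misses required
    return elapsed > interval_s               # one miss is enough

# A 5-minute cron event silent for 20 minutes has missed ~4 occurrences:
print(is_stopped(300, last_seen=0, now=1200))   # True
# An hourly event 50 minutes after its last occurrence is not yet due:
print(is_stopped(3600, last_seen=0, now=3000))  # False
```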
Continuously updating existing learnings
Unomaly continues learning after training is complete. Every log event that Unomaly analyzes updates or adds to its learnings database of profiles. Anomalies are learned just like any other log event. This means that after some time, repeated anomalies become part of the normal behavior of the system. If this is not what you want, you can define a log event as known "bad" behavior. This influences how Unomaly handles the log event, in this case causing the algorithm to always treat it as an anomaly. See Define knowns to highlight log events.
Raw log events and storage
Unomaly retains only a small buffer of raw incoming log events. These log events are deleted as soon as that buffer is full. For high-volume log environments, we recommend turning off raw event storage. Unomaly retains the anomalies and situations it has learned from the log data for as long as possible, managing its storage based on the available storage space. If space is limited, Unomaly removes anomalies older than 1 year.