TimeSeries: Anomaly detection in a time series
In this example I am going to explain how to detect a type of anomaly in a time series. The time series is composed by a slow varying background signal with gaussian noise on top of which we simulate a anomaly (aka feature) defined as a set of continuous values above the average.
For a Gaussian distribution the probability of a value falling out of 3σ is 0.3%. That means that the probability of three consecutive values above the 3σ limit is 0.0033=2.7−8. This is extemely rare. We can use this reasoning to lower our detection threshold so that for instance the probability of a value above 2σ is 5%. This means that out of 20 times, 1 will be above the threshold. However the probability of three consecutive values above 2σ is 0.053=0.000125
i.e. about 1 in 10,000.
In this example we are then going to detect features that occur when at least three consecutive values are above a 2σ
threshold, discarding non connected values.
Our 1D signal has a slow varying sinusoidal component representing a count of events over time plus an anomaly defined as a set of continuous bins with a count higher than the average. Additionally we add some abnormal counts to isolated bins and some random noise.
Below is the signal with all its components.
Time Series Background removal
The first step in the analysis is to remove the slow varying background. We do this using a combination of a median and a linear filter over scales larger than the features that we want to keep. The figure below shows the original signal with the fitted background (top panel) and the signal after the background has been removed (bottom panel). The red dashed line represents the robust 2σ
Time Series Filtered Morphological operations
As it is clear from the figure above if we selected all features above 2σ we would detect a lot of anomalies. The trick here is to use a couple of morphology operations to select only the features we are interested in. First we create a binary mask with all values larger than 2σ
set to True and False everywhere else. Then we define a binary structure of 3 elements, all set to True and use it for the first binary operation called erosion. This will set to True all pixels that are True and surrounded by True. In other words, erosion will remove isolated detections. After this step we perform the inverse operation, dilation, to restore the detection to its original state.
The figure below shows the process and result.
Erosion and dilation
For more information on these two operations:
Erosion is a mathematical morphology operation that uses a structuring element for shrinking the shapes in an image. The binary erosion of an image by a structuring element is the locus of the points where a superimposition of the structuring element centered on the point is entirely contained in the set of non-zero elements of the image.
Dilation is a mathematical morphology operation that uses a structuring element for expanding the shapes in an image. The binary dilation of an image by a structuring element is the locus of the points covered by the structuring element, when its center lies within the non-zero points of the image.
As for pseudo-code in Python:
# Create the signal signal = create_signal() # Substract the backgroud filtered_signal = substract_background(signal) # Calculate MAD mad = astropy.stats.median_absolute_deviation(filtered_signal) # Apply erosion and dilation to keep only # important features sT = (filtered_signal >= 2*mad) neighborhood = sn.generate_binary_structure(1, 1) sTe = sn.binary_erosion(sT, structure=neighborhood, border_value=0) sTd = sn.binary_dilation(sTe, structure=neighborhood, border_value=0)
The resulting binary mask contains all elements of the feature set to True. As a final extra we can use the code below to only produce one detection per feature by only flagging the peak:
sF = sn.filters.gaussian_filter(sTd*s, 1) neighborhood = np.ones(3) local_max = sn.maximum_filter(sF, footprint=neighborhood)==sF background = (sF==0) eroded_background = sn.binary_erosion(background, structure=neighborhood, border_value=1) detected_peaks = local_max ^ eroded_background
The result is shown in the figure below:
Note that while this example deals with 1D data in the shape of a time series, the same method can be applied to an image in 2D.