The World of Anomalies
Detecting anomalies — unusual or unexpected events in data — is critical across industries such as BFSI, healthcare, and operations. This article explains the concept, demonstrates practical R packages and workflows (AnomalyDetection and anomalize), and shows how to interpret anomaly outputs for decision-making.
Problem
Imagine a credit-card issuer that expects a customer to spend about $25 per week. If the same customer suddenly makes a one-off purchase of $700, this deviation is an anomaly: the behaviour has altered enough to warrant attention. Anomalies like this can indicate fraud, operational issues, or real-world changes (e.g., a household change or an event), so correctly identifying and classifying them is essential.
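Before reaching for specialised packages, a minimal, self-contained sketch (with invented numbers, not taken from the packages used later) shows how even a simple robust z-score flags the $700 purchase against a roughly $25 weekly baseline:
#Toy illustration: flag a one-off purchase far outside a customer's usual weekly spend
#(the weekly_spend values are invented for this example)
weekly_spend = c(24, 26, 25, 23, 27, 25, 24, 700)
robust_z = (weekly_spend - median(weekly_spend)) / mad(weekly_spend)
weekly_spend[abs(robust_z) > 3]  #only the 700 purchase exceeds the threshold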
Anomaly detection is therefore a common offering from AI consulting companies, combining analytics, visualization (Power BI, Tableau), and ML to deliver real-time monitoring and alerts.
Approach — How we detect anomalies
Anomalies are outliers relative to the expected behavior in a time series or dataset. While this article focuses on time-series methods, another way to identify outliers in a dataset is through clustering. You can read more about how to perform hierarchical clustering in R to group similar data points. Detecting anomalies robustly requires handling seasonality, trends, and noise. In this article we demonstrate:
- Using the AnomalyDetection package (Seasonal Hybrid ESD) to find global and local anomalies in time series.
- Using the anomalize workflow (tidyverse-friendly) to decompose series, remove seasonality and trend, and identify anomalous points.
We will illustrate with two example datasets: Wikipedia pageviews for “FIFA” and historic Bitcoin prices.
Data preparation
The first step is to prepare time-series data with a date column and a numeric measure (e.g., page views, transaction amount, price). For the Wikipedia example we use the wp_trend() function (wikipediatrend) to retrieve page-view counts; for Bitcoin we pull historic price data via the coindeskr package.
#Install the devtools package then github packages
install.packages("devtools")
install.packages("Rcpp")
library(devtools)
install_github("petermeissner/wikipediatrend")
install_github("twitter/AnomalyDetection")
#Loading the libraries
library(Rcpp)
library(wikipediatrend)
library(AnomalyDetection)
Example: download FIFA pageviews (daily) starting 2013-03-18:
#Download wikipedia webpage "fifa"
fifa_data_wikipedia = wp_trend("fifa", from="2013-03-18", lang = "en")
Inspect and retain only the columns required for anomaly detection (date and views):
# Keep only date & page views and discard all other variables
columns_to_keep = c("date","views")
fifa_data_wikipedia = fifa_data_wikipedia[, columns_to_keep]
Example 1 — Wikipedia pageviews (FIFA)
Plot the time series to inspect spikes and seasonal patterns. The example below shows visible spikes in the observed data — candidate anomalies.
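The plot can be produced with the same ggplot2 call used in the full listing at the end of the article:
#Plot the raw pageview series to eyeball spikes and seasonality
library(ggplot2)
ggplot(fifa_data_wikipedia, aes(x=date, y=views, color=views)) + geom_line()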

Apply Seasonal Hybrid ESD via AnomalyDetectionTs(). Example code:
#Apply anomaly detection and plot the results
anomalies = AnomalyDetectionTs(fifa_data_wikipedia, direction="pos", plot=TRUE)
anomalies$plot

The function returns both a plot and a table of anomaly timestamps and values. Example output (truncated) lists dates and anomalous values:
# Look at the anomaly dates
anomalies$anoms
timestamp anoms
1 2015-07-01 269
2 2015-07-02 233
...
58 2017-10-14 373
Interpretation: each anomalous date must be investigated (e.g., matches, news, page edits). If no event explains a spike, treat it as a true anomaly and escalate (fraud check, manual review, alert).
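One practical way to record this review is to join the anomaly table against a log of known events, so that unexplained spikes are easy to isolate; the known_events table below is hypothetical and only illustrates the pattern:
#Annotate anomalies with a (hypothetical) log of known events
library(dplyr)
known_events = data.frame(
  timestamp = as.Date(c("2015-07-01", "2017-10-14")),
  event = c("major FIFA news coverage", "page restructuring")
)
anomalies$anoms %>%
mutate(timestamp = as.Date(timestamp)) %>%
left_join(known_events, by = "timestamp") %>%
arrange(desc(is.na(event)))  #unexplained anomalies (event is NA) rise to the top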
Example 2 — Bitcoin price (anomalize workflow)
The anomalize package provides a tidyverse-friendly approach: decompose the series (trend + seasonality + remainder), detect anomalies in residuals, and recompose for visualization.
#Installing anomalize
install.packages('anomalize')
#Update from github if needed
library(devtools)
install_github("business-science/anomalize")
#Load the package
library(anomalize)
library(tidyverse)
library(coindeskr)
#Get bitcoin data from 2017-01-01
bitcoin_data = get_historic_price(start = "2017-01-01")
#Convert bitcoin data to a time series
bitcoin_data_ts = bitcoin_data %>%
rownames_to_column() %>%
as_tibble() %>%
mutate(date = as.Date(rowname)) %>%
select(-one_of('rowname'))
Decompose, anomalize, and plot anomaly decomposition (STL method):
#Decompose and detect anomalies
bitcoin_data_ts %>%
time_decompose(Price, method = "stl", frequency = "auto", trend = "auto") %>%
anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.1) %>%
plot_anomaly_decomposition()

For clearer anomaly visualization after recomposition:
#Recompose and visualize anomalies
bitcoin_data_ts %>%
time_decompose(Price) %>%
anomalize(remainder) %>%
time_recompose() %>%
plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.5)

Extract the identified anomalies to review specific rows:
#Extract anomalies
anomalies = bitcoin_data_ts %>%
time_decompose(Price) %>%
anomalize(remainder) %>%
time_recompose() %>%
filter(anomaly == 'Yes')
Solution patterns & operational recommendations
From the examples above, here are common patterns and practical next steps when operationalizing anomaly detection:
- Data hygiene and feature selection: ensure date/time consistency, remove irrelevant columns, and create features (hour, weekday, user segment) that improve detection quality. Handling this stage correctly may also require a solution for imputing missing data.
- Choose the detection method to match the problem:
  - Use SH-ESD (AnomalyDetection) for robust detection in seasonal time series.
  - Use anomalize for an end-to-end tidy workflow with decomposition + detection + visualization.
  - Use multivariate or model-based approaches (Isolation Forest, Autoencoders, Mahalanobis distance, PCA-based) where anomalies depend on multiple correlated features.
- Directionality & thresholds: explicitly set whether you detect positive spikes, negative dips, or both; set max_anoms to cap the percentage of points flagged (see the sketch after this list).
- Contextual validation: enrich anomaly records with external data (news, marketing events, maintenance windows) to reduce false positives.
- Operationalize with alerts & playbooks: integrate anomaly outputs into alerting systems (email, ticketing), and create triage playbooks (automated blocking, manual review, customer confirmation).
- Monitoring & feedback: track precision/recall of anomaly detection and add a feedback loop so analysts can label results and the system improves over time.
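As referenced in the directionality bullet above, both packages expose these knobs as function arguments; the settings below are a sketch with illustrative values, not tuned recommendations:
#Two-sided detection with at most 2% of points flagged (values are illustrative only)
anomalies_both = AnomalyDetectionTs(fifa_data_wikipedia, max_anoms = 0.02, direction = "both", plot = TRUE)
#The equivalent knobs in the anomalize workflow
bitcoin_data_ts %>%
time_decompose(Price) %>%
anomalize(remainder, method = "gesd", alpha = 0.025, max_anoms = 0.02) %>%
time_recompose() %>%
plot_anomalies(time_recomposed = TRUE)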
Impact — why this matters
Proper anomaly detection reduces financial losses (fraud), improves operational resilience (spotting incidents early), and supports business growth through detection of emerging customer behaviours. Anomaly detection turns raw signals into actions (alerts, blocks, follow-ups), and—when paired with investigation workflows—creates measurable business value.
Full example code
The full runnable code used in this article (installation, data download, decomposition, detection, and plotting) is provided below. Use it as a starting point and adapt parameters (alpha, max_anoms, frequency) to your dataset and business needs.
#Install the devtools package then github packages
install.packages("devtools")
install.packages("Rcpp")
library(devtools)
install_github("petermeissner/wikipediatrend")
install_github("twitter/AnomalyDetection")
#Loading the libraries
library(Rcpp)
library(wikipediatrend)
library(AnomalyDetection)
# Download wikipedia webpage "fifa"
fifa_data_wikipedia = wp_trend("fifa", from="2013-03-18", lang = "en")
#First_look
fifa_data_wikipedia
# Plotting data
library(ggplot2)
ggplot(fifa_data_wikipedia, aes(x=date, y=views, color=views)) + geom_line()
# Keep only date & page views and discard all other variables
columns_to_keep=c("date","views")
fifa_data_wikipedia=fifa_data_wikipedia[,columns_to_keep]
#Apply anomaly detection and plot the results
anomalies = AnomalyDetectionTs(fifa_data_wikipedia, direction="pos", plot=TRUE)
anomalies$plot
# Look at the anomaly dates
anomalies$anoms
#Installing anomalize
install.packages('anomalize')
#Update from github
library(devtools)
install_github("business-science/anomalize")
#Load the package
library(anomalize)
# We will also use tidyverse package for processing and coindeskr to get bitcoin data
library(tidyverse)
library(coindeskr)
#Get bitcoin data from 1st January 2017
bitcoin_data = get_historic_price(start = "2017-01-01")
#Convert bitcoin data to a time series
bitcoin_data_ts = bitcoin_data %>%
rownames_to_column() %>%
as_tibble() %>%
mutate(date = as.Date(rowname)) %>%
select(-one_of('rowname'))
#Decompose data using time_decompose() function in anomalize package. We will use stl method which extracts seasonality
bitcoin_data_ts %>%
time_decompose(Price, method = "stl", frequency = "auto", trend = "auto") %>%
anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.1) %>%
plot_anomaly_decomposition()
#Plot the data again by recomposing data
bitcoin_data_ts %>%
time_decompose(Price) %>%
anomalize(remainder) %>%
time_recompose() %>%
plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.5)
#Extract the anomalies
anomalies = bitcoin_data_ts %>%
time_decompose(Price) %>%
anomalize(remainder) %>%
time_recompose() %>%
filter(anomaly == 'Yes')