Hierarchical Clustering with Python Part 1: Introduction

Don't make the same mistake I made by ignoring cluster analysis. It's wildly useful for ANY professional!

Apr 21, 2026

There’s an unfortunate reality when it comes to how data science is defined on social media and in most organizations:

Data science == machine learning.
Machine learning == predictive ML models.
Predictive ML models == production deployments.

Before I get a bunch of 🔥 comments and email replies, let me state something for the record.

When done well, the business value of production ML predictive models can be substantial.

However, these situations are typically the exception rather than the rule. This has been my hands-on experience with my clients and is also reported in industry data collected by TDWI, Forrester, and Gartner.

For example, the percentage of ML projects that are intended for production but never make it there is very high.

This is unfortunate, because what often gets lost in the discussions about data science is that there are two forms of ML commonly used in business analytics:

Supervised Learning: The machine learns from labeled examples.
Unsupervised Learning: The machine learns from unlabeled examples.

Supervised Learning is how you craft ML predictive models, such as decision trees and random forests. These models learn from datasets in which each row contains an outcome of interest (i.e., the label).

For example, you work for a governmental agency and want to craft an ML model to predict claims fraud. Every row of your historical dataset needs a label indicating whether a claim was fraudulent.
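To make the idea of a label concrete, here's a minimal sketch using scikit-learn's DecisionTreeClassifier. The claims data, the two feature columns, and the is_fraud values are all made up for illustration, so treat this as a toy under those assumptions rather than a real fraud model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical claims: [claim_amount, days_until_reported].
features = np.array([
    [500.0, 2], [750.0, 1], [620.0, 3],
    [9800.0, 45], [12000.0, 60], [8700.0, 38],
])

# The label for each row: 1 = fraudulent, 0 = legitimate (invented values).
is_fraud = np.array([0, 0, 0, 1, 1, 1])

# Supervised learning: the model is fit on features AND labels together.
model = DecisionTreeClassifier(random_state=0)
model.fit(features, is_fraud)

# Score a new, unseen (and equally hypothetical) claim.
new_claim = np.array([[9100.0, 50]])
prediction = model.predict(new_claim)
print(prediction)
```

Notice that the call to fit needs the is_fraud array. Without a label for every row, there is nothing for this kind of model to learn from.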

Supervised Learning gets all the love on social media, but there’s a problem.

Most of the world’s data is unlabeled - including the data in your organization.

So what do you do?

You use Unsupervised Learning.

Introducing Cluster Analysis

More specifically, you use a form of Unsupervised Learning called cluster analysis. Here’s a definition from my favorite machine learning textbook:

“Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships.

The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups.

The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.”
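Here's one way to put numbers on that definition. This is a small sketch with NumPy, assuming Euclidean distance as the measure of similarity and using six made-up points that already have a group assignment. It compares the average distance within groups to the average distance between groups.

```python
import numpy as np

# Hypothetical toy data: two visibly separated groups of 2-D points.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group 0
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],   # group 1
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Pairwise Euclidean distances between all points.
diffs = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diffs ** 2).sum(axis=-1))

# Boolean masks: pairs in the same group (excluding a point paired with itself).
same_group = labels[:, None] == labels[None, :]
not_self = ~np.eye(len(points), dtype=bool)

within = dist[same_group & not_self].mean()
between = dist[~same_group].mean()

print(f"mean within-group distance:  {within:.2f}")
print(f"mean between-group distance: {between:.2f}")
```

A small within-group distance paired with a large between-group distance is what the quote means by a "better or more distinct" clustering.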

Because so much real-world data is unlabeled, cluster analysis is a widely used tool in analytics for discovering structure and generating new insights.

While many forms of cluster analysis have been invented over the years, the three clustering algorithms that are most used in business analytics are:

K-means clustering
DBSCAN clustering
Hierarchical clustering

The third is the subject of this newsletter tutorial series.

If you’re serious about building analytics skills, my Cluster Analysis with Python online course will teach you how to use k-means and DBSCAN in a weekend.

Introducing Hierarchical Clustering

Based on the above definition, hierarchical clustering mines groupings from unlabeled datasets. What distinguishes hierarchical clustering is how the mined groupings are defined.

The easiest way to intuit how hierarchical clustering works is to see a typical real-world example:

The image above is a typical representation of a company - an org chart. This is an example of hierarchical clustering. Organizations worldwide cluster employees based on management hierarchies.

BTW - In machine learning terminology, a tree diagram like the one above is known as a dendrogram and is commonly used to visualize hierarchical clustering results.

Hierarchical clustering can take an unlabeled dataset and mine a hierarchical structure (often referred to as a taxonomy) directly from the data.

You can then analyze the hierarchical clustering to derive new insights about your business and its processes.
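As a rough preview of what mining a hierarchy looks like in code, here's a minimal sketch using SciPy's linkage and fcluster functions. The six data points are invented, and the choice of SciPy with Ward linkage is my assumption for illustration, not necessarily the tooling used later in this series.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: six observations with two numeric features.
X = np.array([
    [1.0, 1.0], [1.1, 0.9], [1.0, 1.2],
    [3.0, 3.0], [3.1, 2.9],
    [9.0, 9.0],
])

# Agglomerative clustering: start with every observation in its own
# cluster, then repeatedly merge the two closest clusters.
Z = linkage(X, method="ward")

# Z records one merge per row, so n observations produce n - 1 rows.
print(Z.shape)

# Cut the same hierarchy at two different levels.
coarse = fcluster(Z, t=2, criterion="maxclust")  # at most 2 clusters
fine = fcluster(Z, t=3, criterion="maxclust")    # at most 3 clusters

print("coarse:", coarse)
print("fine:  ", fine)
```

The key point is that both sets of cluster assignments come from the same tree. Cutting higher gives a few broad groups, while cutting lower splits those groups into smaller ones, much like moving from divisions down to teams in an org chart.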

For example, consider the highlighted portion of the dendrogram below:

Let’s assume you’re unfamiliar with the above organization and its people. You can use hierarchical clustering to derive insights like:

“The lower left cluster comprises observations (i.e., employees) with titles indicative of supply chain management functions.”
“The lower right cluster comprises observations with titles indicative of manufacturing functions.”
“The upper cluster appears to represent the organization’s manufacturing and supply chain division.”

While a contrived example to be sure, the above illustrates that cluster analysis is a universally applicable skill: