2 Comments
Dan Mccoy:

Since it's computationally expensive, what sort of solution is there for massive datasets? Could you use a partitioning system in a hybrid fashion, so to speak?

David Langer:

Unfortunately, the answer depends on what you mean by “massive,” as that often varies widely by organization and/or use case.

For the sake of argument, I will assume “massive” means so large that it’s stored in a distributed processing environment like Databricks or Snowflake.

These “big data” platforms provide distributed implementations of common algorithms that run across the cluster at scale.
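The thread doesn't name the specific algorithm, but the "partitioning system" Dan asks about is essentially the map-reduce pattern that platforms like Databricks (via Spark) implement at cluster scale: compute a partial result per partition, then combine the partials into a global answer. A minimal single-machine sketch of that idea in plain Python (the dataset and partition count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    # Map step: each partition reports its local sum and count.
    return sum(chunk), len(chunk)

def chunked(seq, n_parts):
    # Split the data into roughly equal partitions.
    size = (len(seq) + n_parts - 1) // n_parts
    return [seq[i:i + size] for i in range(0, len(seq), size)]

data = list(range(1, 101))  # toy stand-in for a massive dataset

with ThreadPoolExecutor() as pool:
    parts = list(pool.map(partial_stats, chunked(data, 4)))

# Reduce step: combine the partial results into one global statistic.
total = sum(s for s, _ in parts)
count = sum(c for _, c in parts)
mean = total / count  # global mean of 1..100 → 50.5
```

The key property is that the reduce step only sees a handful of small partial results, never the raw data, which is why the same pattern scales when the partitions live on different machines.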