2 Comments
Dan Mccoy:

Since it's computationally expensive, what sort of solution is there for massive datasets? Could you use a partitioning system in a hybrid fashion, so to speak?

David Langer:

Unfortunately, the answer depends on what you mean by “massive,” as that often varies widely by organization and/or use case.

For the sake of argument, I will assume “massive” means so large that it’s stored in a distributed processing environment like Databricks or Snowflake.

These “big data” platforms provide distributed implementations of common algorithms that run across the cluster at scale.
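The thread doesn't name the specific algorithm, but the "partitioning system" Dan asks about is essentially the map-reduce pattern that platforms like Databricks (via Spark) implement at cluster scale: compute a partial result per partition, then combine the partials into a global answer. A minimal single-machine sketch of that idea in plain Python (the dataset and partition count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    # Map step: each partition reports its local sum and count.
    return sum(chunk), len(chunk)

def chunked(seq, n_parts):
    # Split the data into roughly equal partitions.
    size = (len(seq) + n_parts - 1) // n_parts
    return [seq[i:i + size] for i in range(0, len(seq), size)]

data = list(range(1, 101))  # toy stand-in for a massive dataset

with ThreadPoolExecutor() as pool:
    parts = list(pool.map(partial_stats, chunked(data, 4)))

# Reduce step: combine the partial results into one global statistic.
total = sum(s for s, _ in parts)
count = sum(c for _, c in parts)
mean = total / count  # global mean of 1..100 → 50.5
```

The key property is that the reduce step only sees a handful of small partial results, never the raw data, which is why the same pattern scales when the partitions live on different machines.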