Cardinality Estimation

From Algorithm Wiki
Revision as of 15:36, 15 February 2023

Description

Given a multiset of (possibly hashed) values, estimate the number of distinct elements in the multiset while using as little storage as possible.
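To illustrate why hashing allows a distinct count to be estimated in very little memory, here is a minimal Python sketch of the Flajolet–Martin idea listed in the table below. Each hashed value sets the bit at the position of its hash's lowest 1 bit; the position R of the lowest unset bit in the resulting bitmap satisfies roughly 2^R ≈ 0.77351 × (number of distinct values). Assumptions: the names `fm_estimate`, `_hash64` and `_lowest_set_bit` are hypothetical, the hash functions are simulated by seeding SHA-256, and R is averaged over several independent hash functions for simplicity rather than using the stochastic-averaging variant usually associated with the algorithm.

```python
import hashlib

PHI = 0.77351  # Flajolet–Martin correction constant


def _hash64(value: str, seed: int) -> int:
    # Hypothetical helper: a 64-bit hash taken from SHA-256.
    # Different seeds simulate independent hash functions.
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _lowest_set_bit(h: int) -> int:
    # 0-based position of the least significant 1 bit (64 if h == 0).
    return (h & -h).bit_length() - 1 if h else 64


def fm_estimate(values, num_hashes: int = 64) -> float:
    # One bitmap per hash function; each needs only about log2(n) bits.
    bitmaps = [0] * num_hashes
    for v in values:
        for i in range(num_hashes):
            bitmaps[i] |= 1 << _lowest_set_bit(_hash64(str(v), i))
    # R = position of the lowest 0 bit in each bitmap.
    total_r = 0
    for bm in bitmaps:
        r = 0
        while bm & (1 << r):
            r += 1
        total_r += r
    mean_r = total_r / num_hashes
    # Averaging R over the bitmaps, then undoing the 2^R ~ PHI * n relation.
    return 2 ** mean_r / PHI


if __name__ == "__main__":
    data = [i % 2000 for i in range(6000)]  # 6000 values, 2000 distinct
    print(round(fm_estimate(data)))
```

Duplicates never change a bitmap, so repeated values cost nothing beyond the hashing work; the many SHA-256 calls are there for readability, not speed.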

Parameters

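N: number of values in the multiset (counting repetitions)

n: cardinality of the multiset, i.e. the number of distinct values (not known in advance; this is the quantity being estimated)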
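To make the two parameters concrete, the following minimal sketch computes both N and n exactly in one pass by keeping every distinct value in a hash set. It is one straightforward way to realize an exact solution (not necessarily the one the naive entry in the table refers to); its memory grows linearly with n, in line with the $O(n)$ space listed for the naive solution. The function name `exact_counts` is hypothetical.

```python
def exact_counts(values):
    """Return (N, n): the total number of values and the number of distinct values."""
    seen = set()  # holds every distinct value, so memory is O(n)
    total = 0
    for v in values:
        total += 1
        seen.add(v)
    return total, len(seen)


print(exact_counts(["a", "b", "a", "c", "b", "a"]))  # (6, 3)
```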

Table of Algorithms

Name | Year | Time | Space | Approximation Factor | Model | Reference
--- | --- | --- | --- | --- | --- | ---
Naive solution | 1940 | $O(N)$ | $O(n)$ | Exact | Deterministic |
Flajolet–Martin algorithm | 1984 | $O(N)$ | $O(\log n)$ | | Randomized | Time & Space
LogLog algorithm | 2003 | $O(N)$ | $O(\log \log n)$ | | Randomized | Time & Space
HyperLogLog algorithm | 2007 | $O(N)$ | $O(\epsilon^{-2} \log \log n + \log n)$ | | Randomized | Time & Space
HyperLogLog++ | 2014 | $O(N)$ | $O(\epsilon^{-2} \log \log n + \log n)$ | | Randomized | Time
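A minimal Python sketch of the HyperLogLog estimator shows where its space bound comes from: with $m = 2^p$ registers the relative standard error is about $1.04/\sqrt{m}$, so a target error $\epsilon$ needs $m = O(\epsilon^{-2})$ registers, and each register stores a bit position of at most about $\log_2 n$, which fits in $O(\log \log n)$ bits. Assumptions: the class and method names are hypothetical; the 64-bit hash is taken from SHA-256 for reproducibility; the 2007 paper's large-range correction is omitted because a 64-bit hash makes it unnecessary at practical cardinalities (a change HyperLogLog++ also adopts); HyperLogLog++'s empirical bias correction and sparse representation are not included.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 12):
        if not 4 <= p <= 16:
            raise ValueError("p must be in [4, 16]")
        self.p = p
        self.m = 1 << p  # number of registers, m = 2^p
        # Plain ints for clarity; a compact version would pack ~6 bits per register.
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Hypothetical choice: first 8 bytes of SHA-256 as a 64-bit hash.
        digest = hashlib.sha256(str(value).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        q = 64 - self.p
        j = h >> q                    # first p bits select a register
        w = h & ((1 << q) - 1)        # remaining q bits
        rho = q - w.bit_length() + 1  # 1-based position of leftmost 1 bit (q + 1 if w == 0)
        if rho > self.registers[j]:
            self.registers[j] = rho

    def _alpha(self) -> float:
        # Bias-correction constant alpha_m from the HyperLogLog paper.
        return {16: 0.673, 32: 0.697, 64: 0.709}.get(
            self.m, 0.7213 / (1 + 1.079 / self.m)
        )

    def estimate(self) -> float:
        m = self.m
        # Raw estimate: normalized harmonic mean of 2^register.
        raw = self._alpha() * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(50000):
        hll.add(i)
        hll.add(i)  # duplicates do not change any register
    print(round(hll.estimate()))  # close to 50000 (about 1.6% standard error)
```

Adding a value touches a single register, so the time per value is constant and a full pass is $O(N)$, matching the table.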

Time Complexity Graph

[Figure: Cardinality Estimation - Time.png]

Space Complexity Graph

[Figure: Cardinality Estimation - Space.png]

Space-Time Tradeoff Improvements

[Figure: Cardinality Estimation - Pareto Frontier.png]