The cardinality of a Set refers to the number of elements it contains.

In math notation, we represent the cardinality of a set SS as S|S|. For example, the set S={1,2,3}S = \{1, 2, 3\} has a cardinality of 3, expressed as S=3|S| = 3.

In machine learning, the "cardinality of a feature" denotes the number of unique elements or categories within that feature. High-cardinality features may require feature engineering or be excluded entirely (for example, user_id).

Use of the vertical bar |A| notation

Initially it seemed confusing to me that mathematical notation employs the vertical bar symbol for different purposes.

For instance:

  • The absolute value of a number aa is expressed as a|a|.
  • The Matrix Determinate of a matrix M\mathbf{M} is expressed as M|\mathbf{M}|:

    M=[ABCD],M=ADBC \begin{aligned} \mathbf{M} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} ,\quad |\mathbf{M}| = AD - BC \end{aligned}

However, these notations share a common theme of representing size or magnitude:

  • In set theory, cardinality describes the size of a set by the number of elements it contains.
  • For numbers, the absolute value captures the distance from zero on the number line.
  • In linear algebra, the Matrix Determinate determinant describes how much a Matrix Transformation scales space.