Types of data and data types
Please note that this page appears in both the chapters “Statistics — Basics & location” and “Pandas Basics”. This is because the topic is relevant to both the general understanding of data and the specific implementation in Pandas.
Learning objectives
After working through this topic, you should be able to:
determine the correct description of a given data column:
nominal
cardinal
ordinal
explain how these categories map to Pandas data types:
unordered categorical
floating point
integer
ordered categorical
Materials
Video with English subtitles:
Download the slides.
Video with German subtitles:
(turn subtitles on in the bottom right corner of the video)
Decision tree
The decision tree below helps you find the “correct” form for a data column. A variable that arrives as a number is not necessarily numerical data. This often happens when categories are encoded as numbers, e.g., 0, 1, 2 standing for the intervals \([0, 30{,}000)\), \([30{,}000, 60{,}000)\), \([60{,}000, \infty)\), with the encoding described in some metadata.
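As a minimal sketch of the encoded-category case, the snippet below turns integer codes into an ordered categorical column. The column name `income_bracket`, the sample values, and the label strings are assumptions made up for illustration. In practice the mapping would come from the dataset's metadata.

```python
import pandas as pd

# Hypothetical raw column: bracket codes stored as plain integers.
raw = pd.Series([0, 2, 1, 1, 0], name="income_bracket")

# Assumed metadata: which interval each code stands for (illustrative labels).
labels = {0: "[0, 30000)", 1: "[30000, 60000)", 2: "[60000, inf)"}

# The brackets have a natural order but no meaningful arithmetic,
# so an ordered categorical describes them better than an integer.
bracket_dtype = pd.CategoricalDtype(categories=list(labels.values()), ordered=True)
income = raw.map(labels).astype(bracket_dtype)

# Order-based operations still work; numeric ones such as the mean do not.
at_least_middle = income >= "[30000, 60000)"
```

After the conversion, comparisons and `min`/`max` respect the bracket order, while an accidental `income.mean()` raises an error instead of returning a meaningless number.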
flowchart LR
A -->|"`Yes`"| C("`Numerical?`")
A(Ordered?) -->|No| B("`Unordered
Categorical`")
C -->|"`Yes`"| E("`Continuous?`")
C -->|"`No`"| D("`Ordered
Categorical`")
E -->|"`Yes`"| G("`Floating
point`")
E -->|"`No`"| F("`Integer`")
style A fill:#FFE5B4
style C fill:#FFE5B4
style E fill:#FFE5B4
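The same three questions can be written as a small function. This is only a sketch of the flowchart's logic. The function name `suggest_dtype` and its boolean parameters are hypothetical and are not part of Pandas.

```python
def suggest_dtype(ordered: bool, numerical: bool = False, continuous: bool = False) -> str:
    """Walk the decision tree and return the name of the leaf it ends in."""
    if not ordered:
        # No meaningful order at all: e.g. eye color.
        return "unordered categorical"
    if not numerical:
        # Ordered, but the values are labels rather than numbers: e.g. S < M < L.
        return "ordered categorical"
    # Numerical: continuous values are floats, countable values are integers.
    return "floating point" if continuous else "integer"


print(suggest_dtype(ordered=True, numerical=True, continuous=False))
```

The answers to the three questions have to come from your understanding of the data; the function only records where each combination leads.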
Note that for numerical data, you still have to decide whether a variable is measured on a cardinal or an ordinal scale. Both are possible for continuous and for discrete data, and the distinction is not encoded in the Pandas data type.
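The following sketch shows one column for each leaf of the decision tree, plus a second integer column to illustrate the note above. All column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical example data; every column name and value is made up.
df = pd.DataFrame({
    "eye_color": ["brown", "blue", "green", "brown"],  # nominal
    "shirt_size": ["M", "S", "L", "M"],                # ordered labels
    "height_m": [1.62, 1.80, 1.75, 1.68],              # continuous
    "n_siblings": [0, 2, 1, 3],                        # discrete count (cardinal)
    "rating": [1, 5, 3, 4],                            # 1-5 survey score (ordinal)
})

# Unordered categorical: categories without any ranking.
df["eye_color"] = df["eye_color"].astype("category")

# Ordered categorical: the order has to be stated explicitly.
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
df["shirt_size"] = df["shirt_size"].astype(size_dtype)

# Both integer columns end up with the same dtype, although only
# n_siblings is measured on a cardinal scale.
print(df.dtypes)
```

Here `n_siblings` and `rating` share an integer dtype, so whether a difference or a mean is meaningful remains a decision for the analyst, not something Pandas can check.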