The Many Shapes of Data
Data rarely shows up in the neat spreadsheet we hope for. It arrives messy, uneven, and unexpected: a mix of numbers, text, images, timestamps, social connections, and whatever else the world records. Before we can analyze anything, we must learn to recognize the shape the data takes.
Engineers sometimes describe data as “raw.” That sounds intimidating, but it really means the information hasn’t been processed yet. It is potential, waiting to become evidence, insight, or sometimes, just noise.
What Counts as Data?
If you sit in a coffee shop for ten minutes and simply observe, you’ll see data everywhere: prices on a menu, Wi-Fi signal strength, the time between notifications on your phone, the music playlist, the number of people wearing headphones, and the language of their conversations. All of it is data, even if nobody wrote it down.
In computing and data science, we usually store data as numbers, text, images, audio, or combinations of these. But the real distinction isn’t the material; it’s the structure.
What is Data?
Data is raw information: facts, measurements, observations, or descriptions that can be stored, transmitted, and analyzed. In data science, data refers to representations of information that can be digitized and manipulated by computational systems. It can take many forms, including numerical, categorical, text, image, audio, temporal, or relational.
Depending on how it is organized, data may be structured, semi-structured, or unstructured.
Scientists, analysts, and businesses collect data because the patterns hidden within it help them understand behavior, make decisions, and predict what comes next.
Remember: data becomes valuable only after it is interpreted and transformed into knowledge or used to answer meaningful questions.
At its simplest, data is recorded information about the world. It might be a number, a word, a photograph, a sound, or a sensor reading. On its own, data does not explain or prove anything; it simply captures what exists or what has happened.
Data becomes useful when we organize it, analyze it, and interpret it in context. A single heart rate reading means very little; a series of readings across time can reveal stress, fitness, or disease. Data does not guarantee truth; it offers evidence.
How Do We Describe or Measure Data?
Before we think about how data is stored or structured, it is useful to ask a simpler question: what kind of information are we dealing with? Not all data behaves the same way. Some can be counted, some can be measured with precision, and some can only be described or categorized.
Broadly, based on what the data represents, we can divide it into two classical types:
Quantitative data, which expresses numerical quantities (how much, how many, how fast)
Qualitative data, which expresses categories, labels, or descriptive attributes (what kind, which type)
Quantitative Data
Quantitative data expresses quantities. It tells us how much, how many, or how fast. This kind of data can be measured numerically and analyzed mathematically. We can add it, average it, compare it, and track how it changes over time.
Quantitative data often comes in two forms:
Discrete: countable values (e.g., number of laptops, number of cars, number of students)
Continuous: measurable values that vary smoothly (e.g., height, weight, temperature, time)
Discrete values make jumps (from 1 to 2 to 3), while continuous values can take any value within a range.
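The difference between counting and measuring can be sketched in a few lines of Python. This is a minimal illustration, not a definitive test: the sample values and the helper `is_discrete` are hypothetical, invented for this example, and the whole-number check is only a rough heuristic (a continuous measurement recorded as 170.0 would be misclassified).

```python
from statistics import mean

# Hypothetical sample values, invented purely for illustration.
laptops_per_household = [0, 1, 1, 2, 3, 1]        # discrete: whole-number counts
heights_cm = [162.5, 170.2, 158.9, 181.0, 175.4]  # continuous: any value in a range


def is_discrete(values):
    """Rough heuristic: treat a sample as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)


# Quantitative data supports arithmetic: we can add it and average it.
total_laptops = sum(laptops_per_household)
average_height = mean(heights_cm)

print(is_discrete(laptops_per_household))  # True
print(is_discrete(heights_cm))             # False
print(total_laptops)                       # 8
print(round(average_height, 2))            # 169.6
```

In practice, whether a variable is discrete or continuous is decided by what it represents, not by how the numbers happen to be stored.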
Qualitative Data
Qualitative data describes qualities, categories, or labels. Instead of telling us how much of something exists, it tells us what kind it is.
Examples include hair color, species, survey responses, movie genres, or customer satisfaction levels. Qualitative data can be grouped into categories and counted. Some categories have a natural order (satisfaction levels, for instance), but even then arithmetic operations like averaging or subtracting the labels do not make sense.
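A short Python sketch shows what can and cannot be done with qualitative values. The list of genres is hypothetical, made up for illustration; the point is that counting and finding the most frequent category work, while averaging the labels fails.

```python
from collections import Counter
from statistics import mode

# Hypothetical survey answers, invented for illustration.
genres = ["drama", "comedy", "drama", "horror", "comedy", "drama"]

# Grouping into categories and counting is meaningful.
counts = Counter(genres)
most_common_genre = mode(genres)

# Averaging labels is not: Python refuses to add strings to numbers.
try:
    sum(genres) / len(genres)
    averaging_worked = True
except TypeError:
    averaging_worked = False

print(counts["drama"], counts["comedy"], counts["horror"])  # 3 2 1
print(most_common_genre)                                    # drama
print(averaging_worked)                                     # False
```

The most frequent category (the mode) is the usual summary for qualitative data, since a mean is undefined for labels.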
Figure 1. Classical categorization of data types. Quantitative data may be discrete or continuous, while qualitative data describes categorical properties. Source: International Journal of Neurolinguistics & Gestalt Psychology.
Why This Matters in Data Science
These distinctions shape how we visualize, summarize, and model data. For example:
Heights (continuous) may be plotted on a histogram
Survey answers (qualitative) may be shown as bar charts
Counts (discrete) may be modeled with Poisson or binomial distributions
Categorical labels may be encoded for machine learning (e.g., one-hot encoding)
Knowing what kind of data we have is often the first step in deciding what tools to use.
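Two of the choices listed above can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions: the functions `one_hot` and `bin_counts` are hypothetical helpers written for this example, not the API of any particular library, and the sample values are invented.

```python
from collections import Counter


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        vectors.append(row)
    return categories, vectors


def bin_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Qualitative labels: encode them before feeding a model.
hair_colors = ["brown", "black", "blonde", "black"]  # hypothetical sample
categories, encoded = one_hot(hair_colors)
print(categories)  # ['black', 'blonde', 'brown']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

# Continuous measurements: group into bins, as a histogram does.
heights_cm = [158.9, 162.5, 170.2, 175.4, 181.0]  # hypothetical sample
histogram = bin_counts(heights_cm, 10)
print(sorted(histogram.items()))  # [(150, 1), (160, 1), (170, 2), (180, 1)]
```

One-hot encoding avoids implying an order among categories, which is why it suits labels like hair color; binning, by contrast, relies on the numeric order that continuous data already has.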