A Beginner’s Working Model

Before we touch syntax, it helps to understand what Pandas adds to Python.

From Python Objects to DataFrames

Python already has ways to represent data:

  • lists ([1, 2, 3])

  • dictionaries ({"age": 21, "major": "CS"})

  • lists of dictionaries (a common “JSON-like” pattern)

The problem is that none of these act like a table. For example, consider a list of student records:

students = [
    {"name": "Alice", "year": 2025, "major": "CS"},
    {"name": "Bob",   "year": 2024, "major": "Math"},
]

Python can iterate through this, but asking simple questions (“which majors?”, “how many in each year?”, “filter by year”) requires loops, conditionals, and manual bookkeeping.
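To make that bookkeeping concrete, here is a minimal plain-Python sketch that answers the three questions for the list above. The variable names (major_counts, year_counts, class_of_2025) are illustrative assumptions, not part of the original example.

```python
students = [
    {"name": "Alice", "year": 2025, "major": "CS"},
    {"name": "Bob",   "year": 2024, "major": "Math"},
]

# "Which majors?" and "how many in each year?" both need a manual tally.
major_counts = {}
year_counts = {}
for student in students:
    major = student["major"]
    year = student["year"]
    major_counts[major] = major_counts.get(major, 0) + 1
    year_counts[year] = year_counts.get(year, 0) + 1

# "Filter by year" needs a loop with a conditional (here a comprehension).
class_of_2025 = [s for s in students if s["year"] == 2025]

print(major_counts)
print(year_counts)
print(class_of_2025)
```

None of this is hard, but every new question means writing another loop by hand.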

Pandas provides a format that behaves like a table:

import pandas as pd
df = pd.DataFrame(students)

Now the same questions become natural:

df["major"].value_counts()
df[df["year"] == 2025]

This illustrates the core Pandas philosophy:

Take common data-analysis questions and make them one line instead of a small program.
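As a rough illustration of that idea, the sketch below answers the three questions from earlier with one line each. It reuses the students list from above; the result names are assumptions chosen for readability.

```python
import pandas as pd

students = [
    {"name": "Alice", "year": 2025, "major": "CS"},
    {"name": "Bob",   "year": 2024, "major": "Math"},
]
df = pd.DataFrame(students)

major_counts = df["major"].value_counts()   # which majors, and how many of each
year_counts = df["year"].value_counts()     # how many students in each year
class_of_2025 = df[df["year"] == 2025]      # keep only rows where year is 2025

print(major_counts)
print(year_counts)
print(class_of_2025)
```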

Core Data Structures

It’s tempting to think Pandas has only one object — the DataFrame — because it dominates most workflows. But understanding the smaller building block helps.

The strength of Pandas lies in two core objects:

  1. Series: a single column (one-dimensional labeled array)

  2. DataFrame: a table of columns (two-dimensional labeled data structure)

People often describe the relationship like this:

A DataFrame is a collection of Series that share the same index.

That index part will matter later.
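One way to check this relationship yourself is the small sketch below. It assumes a tiny two-column DataFrame built just for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "year": [2025, 2024],
})

# Selecting a single column returns a Series, not a DataFrame.
year_column = df["year"]
print(type(year_column).__name__)

# Every column carries the same index as the DataFrame it came from.
print(year_column.index.equals(df.index))
print(df["name"].index.equals(df["year"].index))
```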


You won’t use Series much explicitly at first, but understanding it pays off later (especially for groupby, apply, or time series).

Series: The One-Dimensional Workhorse

A Series is a one-dimensional labeled array that can hold any data type. Think of it as a single column in a spreadsheet.


Unlike arrays that require every element to share one type (homogeneous), a Series can also store mixed values together, such as numbers, text, or dates; in that case pandas falls back to the generic object dtype. Each value has a label, and the labels together form the index. Labels can be numbers, words, or timestamps, and you can use them to quickly find or select values. Here are some examples:
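As a first illustration of selecting values through the index, here is a minimal sketch. It assumes a short temperatures Series with weekday labels, made up for this demonstration.

```python
import pandas as pd

temperatures = pd.Series([22, 25, 18], index=["Mon", "Tue", "Wed"])

# Select a single value by its label.
print(temperatures["Tue"])
print(temperatures.loc["Tue"])    # .loc makes the label-based lookup explicit

# Select by position instead of label.
print(temperatures.iloc[1])

# Select several labels at once; the result is again a Series.
print(temperatures.loc[["Mon", "Wed"]])
```

Using .loc for labels and .iloc for positions keeps the intent clear, which matters most when the labels are themselves integers.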

How to Create a Series

A Series can be created directly from a Python list. If you do not pass an index, pandas automatically assigns a default integer index (0, 1, 2, …); you can also supply your own labels through the index argument.

You can also create a Series from a Python dictionary, where the dictionary keys become the index labels and the dictionary values become the Series values. In Python 3.7 and later, the order of the keys is preserved, so the Series keeps the same order as the dictionary.
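The short sketch below shows both defaults side by side: the automatic integer index for a list, and dictionary keys becoming labels in insertion order. The names no_index and grades_series are assumptions used only here.

```python
import pandas as pd

# From a list with no index argument: pandas supplies 0, 1, 2, ...
no_index = pd.Series([22, 25, 18])
print(list(no_index.index))

# From a dictionary: keys become the index, in insertion order.
grades_series = pd.Series({"Math": 90, "English": 85, "Science": 95})
print(list(grades_series.index))
```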

import pandas as pd

# Creating a Series from a list, with custom index labels and a name
temperatures = pd.Series([22, 25, 18, 30, 27],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                        name='Daily_Temps')
print(temperatures)


# Creating a Series from a Dictionary

grades = {"Math": 90, "English": 85, "Science": 95}
dict_series = pd.Series(grades)

print(dict_series)

Output:

Mon    22
Tue    25
Wed    18
Thu    30
Fri    27
Name: Daily_Temps, dtype: int64
Math       90
English    85
Science    95
dtype: int64

import pandas as pd

# Homogeneous Series (all integers)
print("Homogeneous Series \n")
homo_series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
print(homo_series)
print(f"The data type is: {homo_series.dtype}\n")

# Heterogeneous Series (mix of int, float, string, bool)
print("Heterogeneous Series \n")
hetero_series = pd.Series([10, 20.5, 'hello', True])
print(hetero_series)
print(f"The data type is: {hetero_series.dtype}")

Output:

Homogeneous Series

A    10
B    20
C    30
D    40
dtype: int64
The data type is: int64

Heterogeneous Series 

0       10
1     20.5
2    hello
3     True
dtype: object
The data type is: object
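The dtype is worth checking because it affects what pandas can do with the values. As a hedged sketch (the variable names are assumptions), mixing integers and floats still gives a numeric dtype, while adding a string pushes the whole Series to object:

```python
import pandas as pd

# Integers and floats are upcast to a common numeric type.
numeric = pd.Series([10, 20.5])
print(numeric.dtype)
print(numeric.sum())

# Adding a string means no single numeric type fits, so pandas uses object.
mixed = pd.Series([10, 20.5, "hello"])
print(mixed.dtype)
```

Numeric dtypes are generally faster and support arithmetic directly, so an unexpected object dtype is often a sign that a column needs cleaning.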

DataFrame: The Two-Dimensional Powerhouse

A DataFrame is a two-dimensional labeled data structure, similar to a table with rows and columns. It is the most commonly used object in Pandas.


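Every DataFrame has an index, shown on the left side when it is printed. If you do not specify one, for example when loading a CSV without choosing an index column, it is a sequence of integers starting at 0:

    name  year major
0  Alice  2025    CS
1    Bob  2024  Math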

Spreadsheets also have row numbers, but Pandas treats the index as part of the data structure itself. That is what enables fast label-based lookups, automatic alignment of data by label, joins, slicing, and time-based operations.
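Alignment is the least obvious of these, so here is a minimal sketch of it, followed by a label-based row lookup. The stock figures and the roster DataFrame are made-up values for illustration only.

```python
import pandas as pd

# Two Series with the same labels in a different order.
stock_monday = pd.Series([45, 120], index=["Apple", "Banana"])
stock_tuesday = pd.Series([10, 5], index=["Banana", "Apple"])

# Addition matches values by label, not by position.
total = stock_monday + stock_tuesday
print(total)

# Using a column as the index allows lookups by a meaningful label.
roster = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "major": ["CS", "Math"],
}).set_index("name")
print(roster.loc["Bob", "major"])
```

If the addition worked by position instead, Apple would be paired with Banana's Tuesday figure; matching on the index avoids that class of mistake.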

Here are some examples:

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Paris', 'Tokyo']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    Diana   28     Tokyo

import pandas as pd

# Create a simple dataset
data = {
    'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
    'Price': [1.20, 0.50, 3.00, 2.50],
    'Stock': [45, 120, 15, 80]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display basic information
print("Our DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
print("\nBasic statistics:")
print(df.describe())

Output:

Our DataFrame:
  Product  Price  Stock
0   Apple    1.2     45
1  Banana    0.5    120
2  Cherry    3.0     15
3    Date    2.5     80

Data types:
Product        str
Price      float64
Stock        int64
dtype: object

Basic statistics:
         Price       Stock
count  4.00000    4.000000
mean   1.80000   65.000000
std    1.15181   45.276926
min    0.50000   15.000000
25%    1.02500   37.500000
50%    1.85000   62.500000
75%    2.62500   90.000000
max    3.00000  120.000000