What is Pandas?#

Pandas is a powerful open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. The name “pandas” comes from “panel data,” a term used in econometrics to describe multi-dimensional data. Developed by Wes McKinney in 2008, pandas has become the cornerstone of data manipulation and analysis in the Python ecosystem.

At its core, Pandas helps you:

  • Load, clean, and transform datasets.

  • Perform statistical operations efficiently.

  • Handle missing or inconsistent data.

  • Merge, reshape, and aggregate large datasets.

If you have ever worked with spreadsheets in Excel, Pandas offers similar functionality—but with far greater power, speed, and scalability.
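As a rough illustration of the tasks listed above, the following sketch builds a tiny table, fills in a missing value, and aggregates by group. The column names and values are invented for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [10, None, 7, 5],
})

# Handle missing data: treat the missing count as zero
sales["units"] = sales["units"].fillna(0)

# Aggregate: total units per region
totals = sales.groupby("region")["units"].sum()
print(totals)
```

Whether zero is the right replacement for a missing value depends on the data; it is used here only to keep the sketch short.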

Why Use Pandas?#

Before pandas, data analysis in Python was cumbersome and required jumping between different libraries. Python users relied heavily on lists, dictionaries, and NumPy arrays for handling structured data. While these tools are powerful, they lack built-in functionality for common tasks like handling missing values, grouping data, or joining tables. Pandas solved this by providing:

  • Intuitive data structures: DataFrames and Series feel familiar to users from various backgrounds and are suited to tabular and one-dimensional data, respectively.

  • Seamless integration: Works well with other Python data science libraries (NumPy, Matplotlib, etc.).

  • Powerful data manipulation: Easy filtering, grouping, and transformation of data.

  • Performance: Built on NumPy, with performance-critical routines implemented in optimized C and Cython code.

  • Time series functionality: Built-in support for working with time-based data.

  • Ease of use: Simplifies complex operations into a few lines of readable code.
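To make the comparison with plain Python concrete, here is a small sketch that computes a per-group average twice: once by hand with dictionaries, and once with pandas. The records are made up for the example.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical records, invented for illustration
records = [
    {"city": "Oslo", "temp": 4.0},
    {"city": "Rome", "temp": 15.0},
    {"city": "Oslo", "temp": 6.0},
    {"city": "Rome", "temp": None},  # missing reading
]

# Plain Python: group, skip missing values, and average by hand
groups = defaultdict(list)
for row in records:
    if row["temp"] is not None:
        groups[row["city"]].append(row["temp"])
manual_means = {city: sum(vals) / len(vals) for city, vals in groups.items()}

# pandas: the same result in one expression; missing values are skipped by default
pandas_means = pd.DataFrame(records).groupby("city")["temp"].mean()

print(manual_means)
print(pandas_means)
```

Both approaches give the same averages; the pandas version simply moves the grouping and missing-value handling into the library.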

Installing Pandas#

Before using Pandas, you need to install it. If you are using Anaconda, Pandas comes pre-installed. Otherwise, you can install it with:

pip install pandas

Or, to install or update it in a conda environment:

conda install pandas

To confirm the installation, open a Python shell and type:

import pandas as pd
print(pd.__version__)

Loading Data with Pandas#

One of Pandas’ biggest strengths is its ability to easily import/export datasets from multiple formats:

  • CSV: pd.read_csv("file.csv")

  • Excel: pd.read_excel("file.xlsx")

  • SQL Databases: pd.read_sql(query, connection)

  • JSON: pd.read_json("file.json")
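The readers listed above accept file paths, but they also accept file-like objects, which makes it possible to try them without any files on disk. The sketch below uses in-memory text with made-up contents, reads it as CSV, exports it to JSON, and reads that back.

```python
import io

import pandas as pd

# In-memory text stands in for a file on disk (contents are made up)
csv_text = "name,score\nAda,91\nLin,84\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Export to JSON text, then read it back into a new DataFrame
json_text = df_csv.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv)
print(df_json)
```

The Excel and SQL readers follow the same pattern but need extra dependencies (an Excel engine or a database connection), so they are left out of this sketch.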

#Example:
import pandas as pd

df = pd.read_csv("students.csv")
print(df.head())   # Displays the first 5 rows

   Unnamed: 0 default student      balance        income
0           1      No      No   729.526495  44361.625074
1           2      No     Yes   817.180407  12106.134700
2           3      No      No  1073.549164  31767.138947
3           4      No      No   529.250605  35704.493935
4           5      No      No   785.655883  38463.495879
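The `Unnamed: 0` column in the output above typically appears when a CSV file was saved together with its row index, leaving the first header cell empty. The sketch below reproduces this with made-up in-memory data (not the original students.csv) and shows one possible fix: passing `index_col=0` so the first column is used as the row index.

```python
import io

import pandas as pd

# Made-up CSV with an unlabeled first column, mimicking a saved index
csv_text = ",student,balance\n1,No,729.5\n2,Yes,817.2\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())    # includes 'Unnamed: 0'

# Use the first column as the row index instead of a data column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
print(clean.shape)             # (rows, columns)
```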

Core Data Structures#

The strength of Pandas lies in two core objects:

  1. Series: A one-dimensional labeled array

  2. DataFrame: A two-dimensional labeled data structure
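The two structures are closely related: selecting a single column from a DataFrame gives back a Series that shares the DataFrame's row labels. A minimal sketch, using made-up values:

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame(
    {"balance": [729.5, 817.2], "income": [44361.6, 12106.1]},
    index=["a", "b"],
)

column = df["balance"]                 # selecting one column yields a Series
print(type(column).__name__)
print(column.index.equals(df.index))   # the column keeps the frame's row labels
print(df.shape, column.shape)          # two dimensions versus one
```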

Series: The One-Dimensional Workhorse#

A Series is a one-dimensional labeled array that can hold any data type. Think of it as a single column in a spreadsheet.

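Every Series has a single dtype. When all elements share a type, pandas stores them in a compact, homogeneous array (for example, int64). A Series can also hold mixed values, such as numbers, text, and booleans together, but pandas then falls back to the generic object dtype, which is flexible but gives up much of the speed of typed arrays. Each value has a label called an index, which can consist of numbers, strings, or timestamps, and you can use it to look up or select values quickly.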
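Label-based lookup can be sketched as follows. The labels, values, and dates are invented for the example; `.loc` is the standard pandas accessor for selecting by label.

```python
import pandas as pd

# String labels
temps = pd.Series([22, 25, 18], index=["Mon", "Tue", "Wed"])
print(temps["Tue"])                  # one label
print(temps.loc[["Mon", "Wed"]])     # a list of labels

# Timestamp labels (dates are invented for the example)
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
by_date = pd.Series([1.5, 2.0, 2.5], index=dates)
print(by_date.loc[pd.Timestamp("2024-01-02")])
```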

How to Create a Series#

A Series can be created directly from a Python list, in which case pandas automatically assigns default numeric indexes (0, 1, 2, …) to each element.

You can also create a Series from a Python dictionary, where the dictionary keys become the index labels and the dictionary values become the Series values. In Python 3.7 and later, the order of the keys is preserved, so the Series keeps the same order as the dictionary.
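The default index and the dictionary ordering described above can be checked with a short sketch. The variable names here are illustrative only.

```python
import pandas as pd

# No index argument: pandas assigns 0, 1, 2, ...
default_idx = pd.Series([22, 25, 18])
print(default_idx.index.tolist())

# Dictionary keys become the labels, in insertion order
grade_series = pd.Series({"Math": 90, "English": 85, "Science": 95})
print(grade_series.index.tolist())

# Access by position (.iloc) and by label refer to the same data
print(grade_series.iloc[0], grade_series["Math"])
```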

import pandas as pd

# Creating a Series from a list
temperatures = pd.Series([22, 25, 18, 30, 27],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                        name='Daily_Temps')
print(temperatures)


# Creating a Series from a Dictionary

grades = {"Math": 90, "English": 85, "Science": 95}
dict_series = pd.Series(grades)

print(dict_series)

Mon    22
Tue    25
Wed    18
Thu    30
Fri    27
Name: Daily_Temps, dtype: int64
Math       90
English    85
Science    95
dtype: int64

import pandas as pd

# Homogeneous Series (all integers)
print("Homogeneous Series \n")
homo_series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
print(homo_series)
print(f"The data type is: {homo_series.dtype}\n")

# Heterogeneous Series (mix of int, float, string, bool)
print("Heterogeneous Series \n")
hetero_series = pd.Series([10, 20.5, 'hello', True])
print(hetero_series)
print(f"The data type is: {hetero_series.dtype}")

Homogeneous Series

A    10
B    20
C    30
D    40
dtype: int64
The data type is: int64

Heterogeneous Series 

0       10
1     20.5
2    hello
3     True
dtype: object
The data type is: object
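The object dtype shown above has practical consequences: numeric operations are only reliable once the values are actually numeric. One possible way to recover numbers from a mixed Series is `pd.to_numeric` with `errors="coerce"`, sketched here with made-up values.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])
mixed = pd.Series([10, "20.5", "hello"])   # made-up mixed values

print(ints.dtype, mixed.dtype)             # a numeric dtype versus object

# Vectorized arithmetic on the numeric Series
print((ints * 2).tolist())

# Coerce the mixed Series to numbers; entries that cannot be parsed become NaN
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric)
```

Coercion silently turns unparseable entries into NaN, so it is worth checking how many values were lost before relying on the result.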