What is Pandas?#
Pandas is a powerful open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. The name “pandas” comes from “panel data,” a term used in econometrics to describe multi-dimensional data. Developed by Wes McKinney in 2008, pandas has become the cornerstone of data manipulation and analysis in the Python ecosystem.
At its core, Pandas helps you:
Load, clean, and transform datasets.
Perform statistical operations efficiently.
Handle missing or inconsistent data.
Merge, reshape, and aggregate large datasets.
If you have ever worked with spreadsheets in Excel, Pandas offers similar functionality, but with far greater power, speed, and scalability.
Why Use Pandas?#
Before pandas, data analysis in Python was cumbersome and required jumping between different libraries. Python users relied heavily on lists, dictionaries, and NumPy arrays for handling structured data. While these tools are powerful, they lack built-in functionality for common tasks like handling missing values, grouping data, or joining tables. Pandas solved this by providing:
Intuitive data structures: DataFrames and Series that feel familiar to users from various backgrounds, for working with tabular and one-dimensional data.
Seamless integration: Works well with other Python data science libraries (NumPy, Matplotlib, etc.).
Powerful data manipulation: Easy filtering, grouping, and transformation of data.
Performance: Built on NumPy, with performance-critical parts implemented in C and Cython.
Time series functionality: Excellent support for working with time-based data.
Ease of use: Simplifies complex operations into a few lines of readable code.
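As a rough illustration of the tasks mentioned above (missing values, grouping, and joining tables), here is a minimal sketch. The sales and stores tables, their column names, and the choice to treat a missing count as zero are all made up for this example.

```python
import pandas as pd

# Hypothetical sales records; None marks a missing value.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10, None, 7, 5],
})

# Hypothetical lookup table to join against.
stores = pd.DataFrame({
    "store": ["A", "B"],
    "city": ["Lagos", "Nairobi"],
})

# Handle missing values: treat a missing count as zero units (an assumption).
sales["units"] = sales["units"].fillna(0)

# Group: total units per store.
totals = sales.groupby("store", as_index=False)["units"].sum()

# Join: attach each store's city to its total.
report = totals.merge(stores, on="store", how="left")
print(report)
```

Doing the same with plain lists and dictionaries would need explicit loops for each of the three steps.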
Installing Pandas#
Before using Pandas, you need to install it. If you are using Anaconda, Pandas comes pre-installed. Otherwise, you can install it with:
pip install pandas
Or, in a conda environment that does not already include it:
conda install pandas
To confirm the installation, open a Python shell and type:
import pandas as pd
print(pd.__version__)
Loading Data with Pandas#
One of Pandas’ biggest strengths is its ability to easily import/export datasets from multiple formats:
CSV: pd.read_csv("file.csv")
Excel: pd.read_excel("file.xlsx")
SQL Databases: pd.read_sql(query, connection)
JSON: pd.read_json("file.json")
Example:
import pandas as pd
df = pd.read_csv("students.csv")
print(df.head()) # Displays first 5 rows
   Unnamed: 0 default student      balance        income
0           1      No      No   729.526495  44361.625074
1           2      No     Yes   817.180407  12106.134700
2           3      No      No  1073.549164  31767.138947
3           4      No      No   529.250605  35704.493935
4           5      No      No   785.655883  38463.495879
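The "Unnamed: 0" column in the output above is the name pandas gives to a column whose header field is empty, which often happens when a file was saved together with its row index. The sketch below reproduces this with a small in-memory CSV (the csv_text contents are invented and stand in for a real file), shows how index_col=0 turns that column into the index, and writes the data back out with to_csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk. The empty first
# header field is what pandas reports as "Unnamed: 0".
csv_text = """,default,student,balance
1,No,No,729.5
2,No,Yes,817.2
"""

# Default read: the unnamed first column becomes an ordinary data column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 tells read_csv to use the first column as the row index instead.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())

# Exporting: to_csv with no path returns the CSV as a string.
# index=False leaves the row index out of the output.
out = df.to_csv(index=False)
print(out)
```

Whether to keep or drop such a column depends on whether the numbers carry meaning in your dataset.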
Core Data Structures#
The strength of Pandas lies in two core objects:
Series: A one-dimensional labeled array
DataFrame: A two-dimensional labeled data structure
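The two objects are closely related: selecting a single column from a DataFrame gives you a Series. A small sketch, using made-up student records:

```python
import pandas as pd

# Hypothetical student records used only for illustration.
df = pd.DataFrame({"name": ["Ada", "Ben", "Cy"], "score": [90, 85, 95]})

# Selecting one column returns a Series that shares the DataFrame's row index.
scores = df["score"]
print(type(df).__name__)      # DataFrame
print(type(scores).__name__)  # Series

# shape reports dimensions: (rows, columns) for a DataFrame, (rows,) for a Series.
print(df.shape, scores.shape)
```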

Series: The One-Dimensional Workhorse#
A Series is a one-dimensional labeled array that can hold any data type. Think of it as a single column in a spreadsheet.
A Series always has a single dtype. When every value shares one type (homogeneous), pandas stores the values in an efficient typed array such as int64. You can still mix numbers, text, and other values in one Series, but pandas then falls back to the generic object dtype, which is slower and uses more memory. Each value has a label, and together the labels form the index. Labels can be numbers, strings, or timestamps, and you can use them to look up or select values.
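To make label-based selection concrete, here is a short sketch. The temps and daily Series and their values are invented for illustration; loc selects by label and iloc selects by integer position.

```python
import pandas as pd

# Hypothetical readings labeled by day name.
temps = pd.Series([22, 25, 18], index=["Mon", "Tue", "Wed"])

# .loc selects by index label.
tuesday = temps.loc["Tue"]

# .iloc selects by integer position, regardless of the labels.
first = temps.iloc[0]

# A list of labels returns a smaller Series.
subset = temps.loc[["Mon", "Wed"]]

# Timestamps work as labels too.
daily = pd.Series(
    [1.5, 2.0, 2.5],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
second_day = daily.loc[pd.Timestamp("2024-01-02")]

print(tuesday, first, subset.tolist(), second_day)
```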
How to Create a Series#
A Series can be created directly from a Python list, in which case pandas automatically assigns default numeric indexes (0, 1, 2, …) to each element.
You can also create a Series from a Python dictionary, where the dictionary keys become the index labels and the dictionary values become the Series values. In Python 3.7 and later, dictionaries preserve insertion order, so the Series keeps the same order as the dictionary.
import pandas as pd
# Creating a Series from a list
temperatures = pd.Series([22, 25, 18, 30, 27],
                         index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                         name='Daily_Temps')
print(temperatures)
# Creating a Series from a Dictionary
grades = {"Math": 90, "English": 85, "Science": 95}
dict_series = pd.Series(grades)
print(dict_series)
Mon    22
Tue    25
Wed    18
Thu    30
Fri    27
Name: Daily_Temps, dtype: int64
Math       90
English    85
Science    95
dtype: int64
import pandas as pd
# Homogeneous Series (all integers)
print("Homogeneous Series \n")
homo_series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
print(homo_series)
print(f"The data type is: {homo_series.dtype}\n")
# Heterogeneous Series (mix of int, float, string, bool)
print("Heterogeneous Series \n")
hetero_series = pd.Series([10, 20.5, 'hello', True])
print(hetero_series)
print(f"The data type is: {hetero_series.dtype}")
Homogeneous Series

A    10
B    20
C    30
D    40
dtype: int64
The data type is: int64

Heterogeneous Series

0       10
1     20.5
2    hello
3     True
dtype: object
The data type is: object
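The outputs above show pandas choosing one dtype for the whole Series. The following sketch (with made-up values) looks at that choice a little more closely: mixing integers with a float still gives a numeric dtype, while adding a string forces the object dtype, in which each element keeps its original Python type.

```python
import pandas as pd

# Integers mixed with a float are upcast to one numeric dtype.
numeric = pd.Series([10, 20.5])
print(numeric.dtype)            # float64

# Numeric dtypes support vectorized arithmetic.
doubled = numeric * 2
print(doubled.tolist())         # [20.0, 41.0]

# A string leaves no common numeric dtype, so pandas uses object.
mixed = pd.Series([10, 20.5, "hello", True])
print(mixed.dtype)              # object

# In an object Series each element keeps its original Python type.
element_types = mixed.map(type).tolist()
print(element_types)
```

For numeric work it is usually worth keeping a Series homogeneous so that it stays on a fast numeric dtype.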