DataFrame: The Two-Dimensional Powerhouse

A DataFrame is a two-dimensional labeled data structure, similar to a table with rows and columns. It is the most commonly used object in Pandas.

Here are some examples:

import pandas as pd
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Paris', 'Tokyo']
}

df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    Diana   28     Tokyo

# Create a simple dataset
data = {
    'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
    'Price': [1.20, 0.50, 3.00, 2.50],
    'Stock': [45, 120, 15, 80]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display basic information
print("Our DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
print("\nBasic statistics:")
print(df.describe())

Our DataFrame:
  Product  Price  Stock
0   Apple    1.2     45
1  Banana    0.5    120
2  Cherry    3.0     15
3    Date    2.5     80

Data types:
Product     object
Price      float64
Stock        int64
dtype: object

Basic statistics:
         Price       Stock
count  4.00000    4.000000
mean   1.80000   65.000000
std    1.15181   45.276926
min    0.50000   15.000000
25%    1.02500   37.500000
50%    1.85000   62.500000
75%    2.62500   90.000000
max    3.00000  120.000000

Pandas Basic Operations

After loading tabular data into a DataFrame, a good first step is to get a quick look at it: how many rows and columns it has, what the columns are called, which data type each column holds, and what the summary statistics look like. Pandas provides convenient methods and attributes for each of these.

Viewing/Exploring Data

| Command | Description | Default Behavior |
| --- | --- | --- |
| `df.head()` | Displays the first rows of the DataFrame. Useful for quickly previewing the dataset. | Shows 5 rows |
| `df.tail()` | Displays the last rows of the DataFrame. Handy for checking the dataset’s ending records. | Shows 5 rows |
| `df.shape` | Returns a tuple `(rows, columns)` representing the dimensions of the DataFrame. | N/A |
| `df.info()` | Shows column names, data types, memory usage, and count of non-null values. | N/A |
| `df.dtypes` | Returns the data type of each column in the DataFrame. | N/A |
| `df.describe()` | Provides summary statistics (mean, std, min, max, quartiles) for numeric columns. | Includes numeric columns by default |

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000]
}
df = pd.DataFrame(data)

print("head")
print(df.head())
print("shape")
print(df.shape)
print("info")
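df.info()  # info is a method: call it with parentheses; it prints its own report

head
      Name  Age  Salary
0    Alice   24   50000
1      Bob   30   60000
2  Charlie   28   55000
shape
(3, 3)
info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    3 non-null      object
 1   Age     3 non-null      int64
 2   Salary  3 non-null      int64
dtypes: int64(2), object(1)
memory usage: 200.0+ bytes

The exact memory figure may differ slightly between pandas and Python versions.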
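
The remaining commands from the table can be tried in the same way. The following is a minimal sketch that reuses the same three-row data. The row-count argument to `head` and `tail` and the `include='all'` option of `describe` are standard pandas parameters, but they are shown here as optional extras and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000],
})

# head() and tail() accept a row count; without one they return 5 rows
first_two = df.head(2)
last_one = df.tail(1)
print(first_two)
print(last_one)

# dtypes is an attribute (no parentheses), like shape
print(df.dtypes)

# describe() summarizes only the numeric columns by default
numeric_summary = df.describe()
print(numeric_summary)

# include='all' also summarizes non-numeric columns such as Name
full_summary = df.describe(include='all')
print(full_summary)
```

Note that `shape` and `dtypes` are attributes, while `head`, `tail`, `info`, and `describe` are methods and need parentheses.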

Selecting and Indexing Data

After inspecting the structure of a DataFrame, the next step is often to select specific parts of the data. Pandas lets you select columns, rows, or both, using labels (`loc`), integer positions (`iloc`), or Boolean conditions.

1. Selecting Columns

| Command | Description |
| --- | --- |
| `df['col']` | Selects a single column as a Series. |
| `df[['col1','col2']]` | Selects multiple columns as a new DataFrame. |
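
The difference between the two forms is easiest to see by checking the type of the result. This is a small sketch on made-up sample data (the names and values are illustrative only):

```python
import pandas as pd

# Illustrative sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

ages = df['Age']               # single brackets with one name -> Series
subset = df[['Name', 'City']]  # a list of names -> DataFrame
one_col_frame = df[['Age']]    # a one-item list still gives a DataFrame

print(type(ages).__name__)           # Series
print(type(subset).__name__)         # DataFrame
print(type(one_col_frame).__name__)  # DataFrame
```

Passing a list, even a list with a single name, is a convenient way to keep the result two-dimensional.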

2. Selecting Rows

| Command | Description |
| --- | --- |
| `df.loc[row_label]` | Select row(s) by label (index name). |
| `df.iloc[row_index]` | Select row(s) by integer position. |
| `df.loc[0, 'col']` | Select a specific value by row & column. |
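
With the default index, labels and positions are both 0, 1, 2, ..., so `loc` and `iloc` can look interchangeable. The sketch below uses string row labels to make the distinction visible; the labels (`'alice'`, `'bob'`, `'charlie'`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [24, 30, 28], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['alice', 'bob', 'charlie'],  # hypothetical row labels
)

row_by_label = df.loc['bob']      # the row whose label is 'bob'
row_by_position = df.iloc[1]      # the second row (positions start at 0)
cell = df.loc['charlie', 'City']  # one value, by row label and column label

# Slices behave differently: loc includes the end label,
# while iloc excludes the end position
label_slice = df.loc['alice':'bob']  # 2 rows
position_slice = df.iloc[0:1]        # 1 row

print(row_by_label)
print(cell)
print(len(label_slice), len(position_slice))
```

Here `df.loc['bob']` and `df.iloc[1]` return the same row, reached by two different routes.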

3. Conditional Selection (Filtering)

| Command | Description |
| --- | --- |
| `df[df['col'] > 50]` | Returns rows where condition is True. |
| `df[(df['A'] > 50) & (df['B'] < 100)]` | Combine conditions with `&` (and), `\|` (or). |
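
A condition such as `df['col'] > 50` produces a Boolean Series, and indexing with it keeps the rows where the value is True. The sketch below shows this on illustrative sample data, with thresholds picked so that each filter returns at least one row. The `~` (not) operator is not listed in the table above; it is included as an additional standard pandas operator.

```python
import pandas as pd

# Illustrative sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

mask = df['Age'] > 25  # Boolean Series: False, True, True
older = df[mask]       # Bob and Charlie

# Wrap each comparison in parentheses: & and | bind more tightly than > or ==
both = df[(df['Age'] > 25) & (df['City'] == 'Chicago')]     # Charlie
either = df[(df['Age'] > 29) | (df['City'] == 'New York')]  # Alice and Bob
negated = df[~(df['Age'] > 25)]                             # Alice

print(older)
print(both)
print(either)
print(negated)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why `&` and `|` are used instead.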

The following example puts these together:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

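# Select columns
df['Age']             # Single column, returned as a Series
df[['Name', 'City']]  # Multiple columns, returned as a DataFrame

# Select rows
df.iloc[0]        # First row
df.iloc[1:3]      # Rows at positions 1 and 2 (the end position is excluded)
df.loc[0, 'Name'] # Specific cell: 'Alice'

# Conditional selection
df[df['Age'] > 30]  # Empty here: nobody in this data is older than 30
df[(df['Age'] > 30) & (df['City'] == 'Chicago')]  # Also empty, for the same reason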

      Name  Age         City
0    Alice   24     New York
1      Bob   30  Los Angeles
2  Charlie   28      Chicago

Empty DataFrame
Columns: [Name, Age, City]
Index: []