2. Basics in Pandas
Posted: Tue Jul 22, 2025 11:03 am
Pandas is a Python library that helps accountants and finance professionals handle large datasets with ease — just like Excel, but with more automation and processing power.
For CA students, mastering Pandas can speed up analysis of trial balances, ledgers, GST data, and financial reports.
---
A DataFrame is like an Excel sheet in Python — rows and columns of labeled data.
Example – Creating from a Dictionary:
Code: Select all
import pandas as pd
# Sample accounting-style dataset
data = {
'Account': ['Sales', 'Service Income', 'Rent', 'Salaries', 'Utilities'],
'Category': ['Revenue', 'Revenue', 'Expense', 'Expense', 'Expense'],
'Amount': [150000, 85000, 30000, 50000, 12000]
}
df = pd.DataFrame(data)
print(df)
Real-world CA use: Import trial balances or P&L data into a DataFrame for automated adjustments.
---
Quickly check what’s inside your DataFrame.
Code: Select all
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
print(df.shape) # Dimensions (rows, columns)
print(df.columns) # Column names
Real-world CA use: Quickly preview top/bottom transactions in a bank statement or sales register.
---
Choose specific data you need.
Code: Select all
# Select one column
account_column = df['Account']
print(account_column)
# Select multiple columns
subset = df[['Account', 'Amount']]
print(subset)
# Filter based on a condition
high_value = df[df['Amount'] > 40000]
print(high_value)
Real-world CA use: Filter entries above a certain value for audit checks, e.g., expenses > Rs. 50,000.
---
Arrange data for better analysis.
Code: Select all
# Sort by Amount ascending
df.sort_values(by='Amount', ascending=True, inplace=True)
print(df)
# Sort by Amount then Account
df.sort_values(by=['Amount', 'Account'], ascending=[True, False], inplace=True)
print(df)
Real-world CA use: Sort by amount or account name to check trends or identify anomalies.
---
Get quick stats from your data.
Code: Select all
# Mean of the Amount column
mean_amount = df['Amount'].mean()
print(mean_amount)
# Group and sum by Category
grouped_data = df.groupby('Category', as_index=False)['Amount'].sum()
print(grouped_data)
Real-world CA use: Group sales by GST rate or expenses by cost center.
---
Code: Select all
# Add a calculated column (e.g., GST @ 18%)
df['GST'] = df['Amount'] * 0.18
print(df)
# Drop a column
df.drop(columns=['GST'], inplace=True)
print(df)
Real-world CA use: Create computed fields like GST-inclusive values or profit margins.
---
Code: Select all
# Introduce a missing value for demo (use pd.NA; avoid chained assignment issues)
df = df.copy()
df.loc[2, 'Amount'] = pd.NA
# Check missing values
print(df.isna().sum())
# Option A: Fill missing values with 0 (safe assignment, no inplace)
df_filled = df.copy()
df_filled['Amount'] = df_filled['Amount'].fillna(0)
print(df_filled)
# Option B: Drop rows with missing 'Amount' (assign back; clean index)
df_dropped = df.dropna(subset=['Amount']).reset_index(drop=True)
print(df_dropped)
Real-world CA use: Clean incomplete bank imports or missing invoice amounts before reconciliation.
---
Code: Select all
# Another dataset
df2 = pd.DataFrame({
'Account': ['Commission', 'Interest Income'],
'Category': ['Revenue', 'Revenue'],
'Amount': [10000, 5000]
})
# Merge on Category
merged_df = pd.merge(df, df2, on='Category', suffixes=('_main', '_extra'))
print(merged_df)
# Concatenate vertically
combined_df = pd.concat([df, df2], axis=0)
print(combined_df)
Real-world CA use: Combine monthly GST returns or merge ledger balances with corresponding party details.
---
Code: Select all
# Save to CSV
df.to_csv('output.csv', index=False)
# Load from Excel
# (Ensure 'input.xlsx' exists in the same folder)
# df_loaded = pd.read_excel('input.xlsx')
# print(df_loaded)
Real-world CA use: Export cleaned GST data for filing or save adjusted trial balances for audit working papers.
---
By mastering DataFrames, you will bridge the gap between accounting knowledge and data analytics.