Page 1 of 1

2. Basics in Pandas

Posted: Tue Jul 22, 2025 11:03 am
by admin
📊 Pandas for Chartered Accountants – DataFrame Essentials

Pandas is a Python library that helps accountants and finance professionals handle large datasets with ease — just like Excel, but with more automation and processing power.
For CA students, mastering Pandas can speed up analysis of trial balances, ledgers, GST data, and financial reports.

---

1️⃣ Creating a DataFrame
A DataFrame is like an Excel sheet in Python — rows and columns of labeled data.

Example – Creating from a Dictionary:

Code: Select all

import pandas as pd

# Sample accounting-style dataset
data = {
    'Account': ['Sales', 'Service Income', 'Rent', 'Salaries', 'Utilities'],
    'Category': ['Revenue', 'Revenue', 'Expense', 'Expense', 'Expense'],
    'Amount': [150000, 85000, 30000, 50000, 12000]
}
df = pd.DataFrame(data)
print(df)


Real-world CA use: Import trial balances or P&L data into a DataFrame for automated adjustments.

---

2️⃣ Viewing Data
Quickly check what’s inside your DataFrame.

Code: Select all

print(df.head())     # First 5 rows
print(df.tail())     # Last 5 rows
print(df.shape)      # Dimensions (rows, columns)
print(df.columns)    # Column names


Real-world CA use: Quickly preview top/bottom transactions in a bank statement or sales register.

---

3️⃣ Indexing and Selecting Data
Choose specific data you need.

Code: Select all

# Select one column
account_column = df['Account']
print(account_column)

# Select multiple columns
subset = df[['Account', 'Amount']]
print(subset)

# Filter based on a condition
high_value = df[df['Amount'] > 40000]
print(high_value)


Real-world CA use: Filter entries above a certain value for audit checks, e.g., expenses > Rs. 50,000.

---

4️⃣ Sorting Data
Arrange data for better analysis.

Code: Select all

# Sort by Amount ascending
df.sort_values(by='Amount', ascending=True, inplace=True)
print(df)

# Sort by Amount then Account
df.sort_values(by=['Amount', 'Account'], ascending=[True, False], inplace=True)
print(df)


Real-world CA use: Sort by amount or account name to check trends or identify anomalies.

---

5️⃣ Aggregation & Summary Statistics
Get quick stats from your data.

Code: Select all

# Mean of the Amount column
mean_amount = df['Amount'].mean()
print(mean_amount)

# Group and sum by Category
grouped_data = df.groupby('Category', as_index=False)['Amount'].sum()
print(grouped_data)


Real-world CA use: Group sales by GST rate or expenses by cost center.

---

6️⃣ Adding & Dropping Columns

Code: Select all

# Add a calculated column (e.g., GST @ 18%)
df['GST'] = df['Amount'] * 0.18
print(df)

# Drop a column
df.drop(columns=['GST'], inplace=True)
print(df)


Real-world CA use: Create computed fields like GST-inclusive values or profit margins.

---

7️⃣ Handling Missing Data

Code: Select all

# Introduce a missing value for demo (use pd.NA; avoid chained assignment issues)
df = df.copy()
df.loc[2, 'Amount'] = pd.NA

# Check missing values
print(df.isna().sum())

# Option A: Fill missing values with 0 (safe assignment, no inplace)
df_filled = df.copy()
df_filled['Amount'] = df_filled['Amount'].fillna(0)
print(df_filled)

# Option B: Drop rows with missing 'Amount' (assign back; clean index)
df_dropped = df.dropna(subset=['Amount']).reset_index(drop=True)
print(df_dropped)



Real-world CA use: Clean incomplete bank imports or missing invoice amounts before reconciliation.

---

8️⃣ Merging & Concatenating

Code: Select all

# Another dataset
df2 = pd.DataFrame({
    'Account': ['Commission', 'Interest Income'],
    'Category': ['Revenue', 'Revenue'],
    'Amount': [10000, 5000]
})

# Merge on Category
merged_df = pd.merge(df, df2, on='Category', suffixes=('_main', '_extra'))
print(merged_df)

# Concatenate vertically
combined_df = pd.concat([df, df2], axis=0)
print(combined_df)


Real-world CA use: Combine monthly GST returns or merge ledger balances with corresponding party details.

---

9️⃣ Saving & Loading Data

Code: Select all

# Save to CSV
df.to_csv('output.csv', index=False)

# Load from Excel
# (Ensure 'input.xlsx' exists in the same folder)
# df_loaded = pd.read_excel('input.xlsx')
# print(df_loaded)


Real-world CA use: Export cleaned GST data for filing or save adjusted trial balances for audit working papers.

---

✅ Key takeaway for CA students: Pandas lets you process accounting data like Excel but with automation, making it easier to handle large, repetitive tasks such as reconciliations, MIS reporting, and compliance checks.
By mastering DataFrames, you will bridge the gap between accounting knowledge and data analytics.