CBSE · SUB CODE 843 · CLASS 11 · UNIT 3 & 5

Data Science with Python

NumPy · Pandas · Matplotlib · Statistics — with real-world examples


WHAT IS DATA SCIENCE?

Data Science is the process of collecting, cleaning, analysing and visualising data to find useful insights and make decisions. Python is one of the most popular languages for Data Science, largely because of powerful libraries such as NumPy, Pandas and Matplotlib.

LIBRARY 1

NumPy — Numerical Python

🌡️ DAILY LIFE EXAMPLE — Temperature Tracker

You recorded the temperature in your city for 7 days: 38, 40, 37, 42, 39, 41, 36°C. NumPy lets you find the average, maximum and minimum temperature with one short line of code each. Think of it as a super calculator for lists of numbers!

import numpy as np

# 7 days temperature data (in Celsius)
temperature = np.array([38, 40, 37, 42, 39, 41, 36])

print("All temperatures:", temperature)
print("Average temp:   ", np.mean(temperature))      # 39.0
print("Maximum temp:   ", np.max(temperature))       # 42
print("Minimum temp:   ", np.min(temperature))       # 36
print("Std Deviation:  ", np.std(temperature))       # 2.0 (spread of data)

# Array operations - add 2 degrees to all values
print("If +2 degrees:  ", temperature + 2)
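To see what np.std() actually measures, here is a minimal sketch that recomputes it step by step for the same seven temperatures. It assumes NumPy's default definition, which divides by the number of values (the population standard deviation); the intermediate variable names are chosen only for illustration.

```python
import numpy as np

temperature = np.array([38, 40, 37, 42, 39, 41, 36])

avg = temperature.mean()                       # 273 / 7 = 39.0
deviations = temperature - avg                 # distance of each day from the average
squared = deviations ** 2                      # squaring stops negatives cancelling positives
variance = squared.sum() / len(temperature)    # average squared distance: 28 / 7 = 4.0
std_by_hand = np.sqrt(variance)                # square root brings units back to °C: 2.0

print("Deviations:", deviations)
print("Variance:  ", variance)
print("Std (hand):", std_by_hand)
print("np.std():  ", np.std(temperature))      # same result, 2.0
```

A standard deviation of 2.0 means the daily temperatures typically sit about 2°C away from the 39°C average.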

IMPORTANT NUMPY FUNCTIONS — MUST KNOW

Function	What it does	Example
np.array()	Create an array	np.array([1,2,3])
np.mean()	Calculate average	np.mean([10,20,30]) → 20.0
np.median()	Find middle value	np.median([1,3,5]) → 3.0
np.std()	Standard deviation	np.std([2,4,4,4,5,5,7,9]) → 2.0
np.var()	Variance (square of the standard deviation)	np.var([2,4,4,4,5,5,7,9]) → 4.0
np.arange()	Create number sequence (stop value is excluded)	np.arange(1,10,2) → [1,3,5,7,9]
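A quick way to check the table is to run each example. This short sketch reuses the table's own inputs; the name `data` is just a label for the list from the np.std() row, which is also reused for np.var().

```python
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(np.mean([10, 20, 30]))    # 20.0 (NumPy returns a float)
print(np.median([1, 3, 5]))     # 3.0
print(np.std(data))             # 2.0
print(np.var(data))             # 4.0 (variance = std squared)
print(np.arange(1, 10, 2))      # [1 3 5 7 9]  (the stop value 10 is NOT included)
```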

LIBRARY 2

Pandas — Data Analysis Library

📊 DAILY LIFE EXAMPLE — Student Report Card

Think of Pandas as a super Excel sheet in Python. Your school has data on 500 students: name, marks, attendance and class. Pandas lets you load this data, filter the students who scored above 80%, find the class average and sort by marks, all with just a few lines of code!

import pandas as pd

# Create a student DataFrame (like a table)
data = {
    'Name':    ['Aarav','Priya','Rahul','Sneha','Arjun'],
    'Marks':   [85, 92, 78, 95, 88],
    'Subject': ['AI','AI','AI','AI','AI'],
    'Class':   [11, 11, 11, 11, 11]
}
df = pd.DataFrame(data)

print(df)                           # Show full table
print("\nAverage marks:", df['Marks'].mean())                # 87.6
print("Top scorer:", df.loc[df['Marks'].idxmax(), 'Name'])  # Sneha (row with the highest marks)

# Filter students with marks above 85
toppers = df[df['Marks'] > 85]
print("\nToppers:\n", toppers)
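The report-card example mentions sorting by marks, which the code above does not show. A minimal sketch with sort_values(), using a trimmed copy of the same five students:

```python
import pandas as pd

data = {
    'Name':  ['Aarav', 'Priya', 'Rahul', 'Sneha', 'Arjun'],
    'Marks': [85, 92, 78, 95, 88],
}
df = pd.DataFrame(data)

# Sort from highest to lowest marks (ascending=False reverses the default order)
ranked = df.sort_values('Marks', ascending=False)
print(ranked)

# After sorting, head(3) gives the top 3 students
print(ranked.head(3))
```

Note that sort_values() returns a new DataFrame; the original df keeps its order.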

READING CSV FILES WITH PANDAS

CSV (Comma Separated Values) is a simple file format for storing data, like a spreadsheet saved as plain text: each line is one row, and commas separate the columns. For example, a file named student_data.csv might hold hundreds of rows of student information.

import pandas as pd

# Load data from CSV file
df = pd.read_csv('student_data.csv')   # looks for the file in the current working folder

print(df.head())         # First 5 rows
print(df.tail())         # Last 5 rows
print(df.shape)          # (rows, columns)
print(df.describe())     # Summary statistics of numeric columns
print(df.isnull().sum()) # Count of missing values in each column

# Save to new CSV
df.to_csv('cleaned_data.csv', index=False)
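If student_data.csv is not on your computer, you can still practise. This sketch assumes hypothetical file contents (the column names and rows are made up) and passes them to pd.read_csv() through io.StringIO, which behaves like an open text file.

```python
import io
import pandas as pd

# Made-up stand-in for a small student_data.csv; Priya's attendance is left blank
csv_text = """Name,Marks,Attendance
Aarav,85,92
Priya,92,
Rahul,78,88
"""

# read_csv accepts any file-like object, so no real file is needed here
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (3, 3)
print(df.isnull().sum())    # Attendance has 1 missing value
```

The blank field after the last comma is read as a missing value (NaN), which is exactly what df.isnull() detects.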

MUST-KNOW PANDAS FUNCTIONS

Function	Purpose
df.head(n)	Show first n rows (default 5)
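df.describe()	Show count, mean, std, min, quartiles (25%, 50%, 75%) and max of the numeric columns
df.shape	Returns (number of rows, number of columns); it is an attribute, so no brackets
df.isnull()	Marks each cell True if its value is missing (NaN), otherwise False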
df.dropna()	Remove rows with missing values
df.fillna(value)	Fill missing values with a specific value
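The difference between dropna() and fillna() is easiest to see on a tiny table with one gap. In this sketch the names and marks are made up, and filling with the column average is shown only as one common choice, not the only correct one.

```python
import numpy as np
import pandas as pd

# Made-up table with one missing mark (np.nan means "no value")
df = pd.DataFrame({
    'Name':  ['Aarav', 'Priya', 'Rahul'],
    'Marks': [85, np.nan, 78],
})

print(df.isnull().sum())     # Marks: 1 missing value

dropped = df.dropna()        # removes Priya's row entirely
print(dropped)

filled = df.fillna(0)        # replaces the missing mark with 0
print(filled)

# Alternative: fill the Marks column with its average (mean of 85 and 78 = 81.5)
filled_avg = df.fillna({'Marks': df['Marks'].mean()})
print(filled_avg)
```

Filling with 0 would unfairly lower the class average, which is why the column mean is often a better placeholder.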

LIBRARY 3

Matplotlib — Data Visualization

📈 DAILY LIFE EXAMPLE — Class Performance Graph

Your teacher wants to show how the class performed in 5 subjects. Instead of showing a boring table of numbers, Matplotlib creates a bar graph or pie chart — making it easy to see which subject students did best in!

5 TYPES OF GRAPHS — CBSE SYLLABUS

1. LINE GRAPH

import matplotlib.pyplot as plt

months = ['Jan','Feb','Mar','Apr']
sales  = [200, 350, 300, 450]

plt.plot(months, sales, marker='o')
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

2. BAR GRAPH

import matplotlib.pyplot as plt

subjects = ['Maths','AI','English',
            'Science','Hindi']
marks    = [85, 92, 78, 88, 76]

plt.bar(subjects, marks, color='teal')
plt.title('Subject-wise Marks')
plt.ylabel('Marks')
plt.show()

3. PIE CHART

import matplotlib.pyplot as plt

labels  = ['Pass','Fail','Absent']
sizes   = [75, 15, 10]
colors  = ['green','red','gray']

plt.pie(sizes, labels=labels,
        colors=colors, autopct='%1.1f%%')
plt.title('Exam Results')
plt.show()
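The autopct='%1.1f%%' argument is a format string: Matplotlib works out each slice's percentage of the total and formats it with this pattern. A sketch of the same calculation in plain Python (no chart is drawn), assuming the labels and sizes from the example above:

```python
labels = ['Pass', 'Fail', 'Absent']
sizes = [75, 15, 10]

total = sum(sizes)
texts = []
for label, size in zip(labels, sizes):
    percent = size / total * 100          # this slice's share of the whole pie
    # '%1.1f' -> one digit after the decimal point; '%%' -> a literal % sign
    texts.append('%1.1f%%' % percent)
    print(label, texts[-1])
```

Because the sizes here already add up to 100, each label simply shows its own value with one decimal place.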

4. HISTOGRAM

import matplotlib.pyplot as plt

marks = [45,55,60,65,70,70,75,
         80,80,80,85,90,90,95]

plt.hist(marks, bins=5,
         color='steelblue')
plt.title('Marks Distribution')
plt.xlabel('Marks Range')
plt.ylabel('Number of Students')
plt.show()
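With bins=5, the range from the lowest mark (45) to the highest (95) is split into five equal-width bins of 10 marks each, and each bar's height is the number of marks that fall in that bin. Matplotlib uses NumPy for this counting, so np.histogram() can print the numbers behind the bars:

```python
import numpy as np

marks = [45, 55, 60, 65, 70, 70, 75,
         80, 80, 80, 85, 90, 90, 95]

# Same grouping that plt.hist(marks, bins=5) draws as bars
counts, edges = np.histogram(marks, bins=5)

print("Bin edges:", edges)    # [45. 55. 65. 75. 85. 95.]
print("Counts:   ", counts)   # [1 2 3 4 4]
```

Each bin includes its left edge but not its right edge, except the last bin, which also includes 95. So 55 is counted in the 55 to 65 bin, not the 45 to 55 bin.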

5. SCATTER PLOT

import matplotlib.pyplot as plt

study_hours = [2,3,4,5,6,7,8]
exam_marks  = [50,55,65,70,78,85,92]

plt.scatter(study_hours, exam_marks,
            color='purple')
plt.title('Study Hours vs Marks')
plt.xlabel('Hours Studied')
plt.ylabel('Marks Scored')
plt.show()

UNIT 5

Statistics for Data Science

🎯 DAILY LIFE EXAMPLE — Cricket Match Scores

Suppose Virat Kohli scored these runs in 7 matches: 45, 82, 67, 23, 95, 56, 38. Let's calculate the key statistics to understand his performance!

import numpy as np
from scipy import stats

# Virat's runs in 7 matches
runs = np.array([45, 82, 67, 23, 95, 56, 38])

mean     = np.mean(runs)      # Average: 58.0
median   = np.median(runs)    # Middle value: 56.0
mode_val = stats.mode(runs)   # Most frequent; no score repeats, so SciPy returns the smallest (23) with count 1
std_dev  = np.std(runs)       # How spread out: ~23.4
variance = np.var(runs)       # Variance: ~546.3

print(f"Mean (Average):      {mean:.1f} runs")
print(f"Median (Middle):     {median:.1f} runs")
print(f"Standard Deviation:  {std_dev:.1f} runs")
print(f"Variance:            {variance:.1f}")

# Interpretation:
# High std deviation = inconsistent player
# Low std deviation  = consistent player
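This sketch works out the variance and standard deviation of the runs by hand, using NumPy's default definition (divide by the number of matches, n). It also shows a point that often causes confusion: Pandas' .std(), which df.describe() uses, divides by n - 1 by default, so it reports a slightly larger value for the same data.

```python
import numpy as np
import pandas as pd

runs = np.array([45, 82, 67, 23, 95, 56, 38])

mean = runs.sum() / len(runs)              # 406 / 7 = 58.0
squared_dev = (runs - mean) ** 2           # 169, 576, 81, 1225, 1369, 4, 400
variance = squared_dev.sum() / len(runs)   # 3824 / 7, about 546.29
std_dev = np.sqrt(variance)                # about 23.37

print(f"Variance: {variance:.2f}")         # 546.29
print(f"Std dev:  {std_dev:.2f}")          # 23.37

# Pandas divides by (n - 1) by default (sample std), so this value is larger
print(f"Pandas std: {pd.Series(runs).std():.2f}")
```

np.std(runs, ddof=1) gives the same value as the Pandas version, if you need the two libraries to agree.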

➕ MEAN: Sum of all values ÷ count. Like sharing a pizza equally among friends.

🎯 MEDIAN: Middle value when the data is sorted (for an even number of values, the average of the two middle values). Used for salaries so that a few very high earners do not distort the typical value.

🔁 MODE: Most frequent value. Like the most popular shoe size in a shop.

📏 STD DEV (Standard Deviation): How spread out the data is around the mean. Low = consistent, high = spread out.
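The salary and shoe-size ideas above can be tried directly. This sketch uses Python's built-in statistics module with made-up numbers (hypothetical, for illustration only). Note how a single very high salary drags the mean far above what most people earn, while the median stays close to the typical value.

```python
import statistics

# Made-up monthly salaries (in thousands of rupees) for 6 people; one very high earner
salaries = [25, 28, 30, 32, 35, 400]

print("Mean:  ", statistics.mean(salaries))    # about 91.67, pulled up by the 400
print("Median:", statistics.median(salaries))  # 31.0 (average of the middle values 30 and 32)

# Made-up shoe sizes sold in a shop in one day
shoe_sizes = [7, 8, 8, 9, 8, 10, 7, 8]
print("Mode:  ", statistics.mode(shoe_sizes))  # 8, the most frequent size
```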

EXAM TIPS — DATA SCIENCE

NumPy = for numerical operations on arrays and matrices
Pandas = for tabular data (rows and columns) — like Excel in Python
Matplotlib = for creating graphs and charts
CSV = Comma Separated Values, one of the most common data file formats
df.head() shows first 5 rows | df.describe() shows statistics
Always check for missing values with df.isnull().sum() before analysis

AI Logic School · Data Science with Python · CBSE Class 11 · Sub Code 843 · Units 3 & 5
