AI Logic School

Empowering Students with AI & Computational Thinking

Data Science with Python

CBSE · SUB CODE 843 · CLASS 11 · UNIT 3 & 5

NumPy · Pandas · Matplotlib · Statistics — with real-world examples

WHAT IS DATA SCIENCE?

Data Science is the process of collecting, cleaning, analysing and visualising data to find useful insights and make decisions. Python is one of the most popular languages for Data Science because of powerful libraries such as NumPy, Pandas and Matplotlib.

LIBRARY 1

NumPy — Numerical Python

🌡️ DAILY LIFE EXAMPLE — Temperature Tracker

You recorded the temperature in your city for 7 days: 38, 40, 37, 42, 39, 41 and 36 °C. NumPy finds the average, maximum and minimum temperature with one line of code each, like a super calculator for lists of numbers!

import numpy as np

# 7 days temperature data (in Celsius)
temperature = np.array([38, 40, 37, 42, 39, 41, 36])

print("All temperatures:", temperature)
print("Average temp:   ", np.mean(temperature))      # 39.0
print("Maximum temp:   ", np.max(temperature))       # 42
print("Minimum temp:   ", np.min(temperature))       # 36
print("Std Deviation:  ", np.std(temperature))       # 2.0 (spread of data)

# Array operations - add 2 degrees to every value at once
print("If +2 degrees:  ", temperature + 2)           # [40 42 39 44 41 43 38]
IMPORTANT NUMPY FUNCTIONS — MUST KNOW
Function | What it does | Example
np.array() | Create an array | np.array([1,2,3])
np.mean() | Calculate average | np.mean([10,20,30]) → 20
np.median() | Find middle value | np.median([1,3,5]) → 3
np.std() | Standard deviation | np.std([2,4,4,4,5,5,7,9]) → 2
np.var() | Variance | np.var(data)
np.arange() | Create number sequence | np.arange(1,10,2) → [1,3,5,7,9]
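
The functions in the table that the temperature example did not use can be tried on the same data. This is a minimal sketch; the variable names are only for illustration.

```python
import numpy as np

# Same 7-day temperature data as in the example above (in Celsius)
temperature = np.array([38, 40, 37, 42, 39, 41, 36])

median_temp = np.median(temperature)   # middle value after sorting
variance = np.var(temperature)         # average squared distance from the mean
std_dev = np.std(temperature)          # square root of the variance
odd_numbers = np.arange(1, 10, 2)      # start at 1, stop before 10, step 2

print("Median temp:", median_temp)
print("Variance:   ", variance)
print("Std Dev:    ", std_dev)
print("Odd numbers:", odd_numbers)
```

Notice that the standard deviation is simply the square root of the variance, so you only need to remember one formula.
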
LIBRARY 2

Pandas — Data Analysis Library

📊 DAILY LIFE EXAMPLE — Student Report Card

Think of Pandas like a super Excel sheet in Python. Your school has data of 500 students — name, marks, attendance, class. Pandas lets you load this data, filter students who scored above 80%, find the class average, and sort by marks — all with just a few lines of code!

import pandas as pd

# Create a student DataFrame (like a table)
data = {
    'Name':    ['Aarav','Priya','Rahul','Sneha','Arjun'],
    'Marks':   [85, 92, 78, 95, 88],
    'Subject': ['AI','AI','AI','AI','AI'],
    'Class':   [11, 11, 11, 11, 11]
}
df = pd.DataFrame(data)

print(df)                           # Show full table
print("\nAverage marks:", df['Marks'].mean())               # 87.6
print("Top scorer:", df.loc[df['Marks'].idxmax(), 'Name'])  # Sneha

# Filter students with marks above 85
toppers = df[df['Marks'] > 85]
print("\nToppers:\n", toppers)
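
Sorting by marks is mentioned in the report card example but not shown in the code. A minimal sketch of how it could look, using the same five students (the variable names `ranked` and `top_name` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name':  ['Aarav', 'Priya', 'Rahul', 'Sneha', 'Arjun'],
    'Marks': [85, 92, 78, 95, 88],
})

# Sort from highest to lowest marks
ranked = df.sort_values('Marks', ascending=False)
print(ranked)

# Name of the top scorer, looked up by row label and column name
top_name = df.loc[df['Marks'].idxmax(), 'Name']
print("Top scorer:", top_name)
```

Leaving out `ascending=False` sorts from lowest to highest instead.
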
READING CSV FILES WITH PANDAS

CSV (Comma Separated Values) is a simple file format for storing data — like a spreadsheet saved as plain text. Example: student_data.csv contains hundreds of rows of student information.

import pandas as pd

# Load data from CSV file
df = pd.read_csv('student_data.csv')

print(df.head())         # First 5 rows
print(df.tail())         # Last 5 rows
print(df.shape)          # (rows, columns)
print(df.describe())     # Statistics summary
print(df.isnull().sum()) # Check missing values

# Save to new CSV
df.to_csv('cleaned_data.csv', index=False)
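
If you do not have a CSV file yet, you can still practise. The sketch below assumes a tiny, made-up CSV (the three rows and the Attendance column are hypothetical) and reads it from a text string instead of a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a file such as student_data.csv
csv_text = """Name,Marks,Attendance
Aarav,85,92
Priya,92,88
Rahul,78,95
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)     # (rows, columns)
print(df.head(2))   # first 2 rows

# to_csv without a path returns the CSV text; index=False leaves out the row numbers
csv_out = df.to_csv(index=False)
print(csv_out)
```

The first line of a CSV file becomes the column names, and every other line becomes one row of the DataFrame.
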
MUST-KNOW PANDAS FUNCTIONS
Function | Purpose
df.head(n) | Show first n rows (default 5)
df.describe() | Show count, mean, std, min, quartiles and max of the numeric columns
df.shape | Returns (number of rows, number of columns)
df.isnull() | Mark each missing/empty value as True
df.dropna() | Remove rows with missing values
df.fillna(value) | Fill missing values with a specific value
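
The three missing-value functions are easiest to understand side by side. This is a small sketch on a made-up table in which two students have no marks recorded; filling with the average is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical marks table with two missing values (np.nan)
df = pd.DataFrame({
    'Name':  ['Aarav', 'Priya', 'Rahul', 'Sneha'],
    'Marks': [85, np.nan, 78, np.nan],
})

missing_count = df['Marks'].isnull().sum()    # how many marks are missing
dropped = df.dropna()                         # keep only the complete rows
average = df['Marks'].mean()                  # mean() skips missing values
filled = df.fillna({'Marks': average})        # replace missing marks with the average

print("Missing marks:", missing_count)
print(dropped)
print(filled)
```

Use `dropna()` when incomplete rows are useless, and `fillna()` when you want to keep every student in the table.
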
LIBRARY 3

Matplotlib — Data Visualization

📈 DAILY LIFE EXAMPLE — Class Performance Graph

Your teacher wants to show how the class performed in 5 subjects. Instead of showing a boring table of numbers, Matplotlib creates a bar graph or pie chart — making it easy to see which subject students did best in!

5 TYPES OF GRAPHS — CBSE SYLLABUS
1. LINE GRAPH
import matplotlib.pyplot as plt

months = ['Jan','Feb','Mar','Apr']
sales  = [200, 350, 300, 450]

plt.plot(months, sales, marker='o')
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()
2. BAR GRAPH
import matplotlib.pyplot as plt

subjects = ['Maths','AI','English',
            'Science','Hindi']
marks    = [85, 92, 78, 88, 76]

plt.bar(subjects, marks, color='teal')
plt.title('Subject-wise Marks')
plt.ylabel('Marks')
plt.show()
3. PIE CHART
import matplotlib.pyplot as plt

labels  = ['Pass','Fail','Absent']
sizes   = [75, 15, 10]
colors  = ['green','red','gray']

plt.pie(sizes, labels=labels,
        colors=colors, autopct='%1.1f%%')
plt.title('Exam Results')
plt.show()
4. HISTOGRAM
import matplotlib.pyplot as plt

marks = [45,55,60,65,70,70,75,
         80,80,80,85,90,90,95]

plt.hist(marks, bins=5,
         color='steelblue')
plt.title('Marks Distribution')
plt.xlabel('Marks Range')
plt.ylabel('Number of Students')
plt.show()
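
To see what `bins=5` really does, you can ask NumPy for the numbers behind the bars. Matplotlib's `plt.hist` uses `np.histogram` for its binning, so this sketch shows the bar heights for the same marks without drawing anything.

```python
import numpy as np

marks = [45, 55, 60, 65, 70, 70, 75,
         80, 80, 80, 85, 90, 90, 95]

# Split the range 45 to 95 into 5 equal bins and count the marks in each
counts, edges = np.histogram(marks, bins=5)

print("Bin edges:", edges)    # where each bar starts and ends
print("Counts:   ", counts)   # height of each bar
```

Each bin is 10 marks wide here, and a mark that falls exactly on an edge (like 55) is counted in the bin that starts there. Only the last bin also includes its right edge.
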
5. SCATTER PLOT
import matplotlib.pyplot as plt

study_hours = [2,3,4,5,6,7,8]
exam_marks  = [50,55,65,70,78,85,92]

plt.scatter(study_hours, exam_marks,
            color='purple')
plt.title('Study Hours vs Marks')
plt.xlabel('Hours Studied')
plt.ylabel('Marks Scored')
plt.show()
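
A scatter plot shows a relationship by eye. If you also want a number for it, one option (an addition here, not part of the five graph types above) is the correlation coefficient from `np.corrcoef`, sketched below on the same data.

```python
import numpy as np

study_hours = [2, 3, 4, 5, 6, 7, 8]
exam_marks  = [50, 55, 65, 70, 78, 85, 92]

# corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation between the two lists
r = np.corrcoef(study_hours, exam_marks)[0, 1]
print(f"Correlation: {r:.3f}")
```

A value close to 1 means the points lie almost on a rising straight line, which matches what the scatter plot shows.
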
UNIT 5

Statistics for Data Science

🎯 DAILY LIFE EXAMPLE — Cricket Match Scores

Virat Kohli scored these runs in 7 matches: 45, 82, 67, 23, 95, 56, 38. Let's calculate the key statistics to understand his performance!

import numpy as np
from scipy import stats

# Virat's runs in 7 matches
runs = np.array([45, 82, 67, 23, 95, 56, 38])

mean     = np.mean(runs)      # Average: 58.0
median   = np.median(runs)    # Middle value: 56.0
mode_val = stats.mode(runs)   # Most frequent value; nothing repeats here, so SciPy reports just one value (the smallest, 23) with count 1
std_dev  = np.std(runs)       # How spread out: ~23.4
variance = np.var(runs)       # Variance: ~546.3

print(f"Mean (Average):      {mean:.1f} runs")
print(f"Median (Middle):     {median:.1f} runs")
print(f"Standard Deviation:  {std_dev:.1f} runs")
print(f"Variance:            {variance:.1f}")

# Interpretation:
# High std deviation = inconsistent player
# Low std deviation  = consistent player
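
By default `np.var` and `np.std` use the population formula, which divides by the number of values n. The sketch below checks this by hand for the same runs and also shows the sample version, which divides by n - 1 through the `ddof=1` argument.

```python
import numpy as np

runs = np.array([45, 82, 67, 23, 95, 56, 38])

# Population variance by hand: average of the squared distances from the mean
mean = runs.sum() / len(runs)
manual_var = ((runs - mean) ** 2).sum() / len(runs)
matches_numpy = np.isclose(manual_var, np.var(runs))
print("Same as np.var:", matches_numpy)

# ddof=1 divides by (n - 1) instead of n: the sample version
population_std = np.std(runs)
sample_std = np.std(runs, ddof=1)
print(f"Population std: {population_std:.1f}")
print(f"Sample std:     {sample_std:.1f}")
```

The sample value is always a little larger. Check which formula your textbook or exam question expects before comparing answers.
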
MEAN

Sum of all values ÷ count. Like sharing pizza equally among friends.

🎯 MEDIAN

Middle value when the data is sorted. Often used for salaries, because a few very high earners do not pull it up the way they pull up the mean.
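
A quick sketch of why the median suits salary data. The salaries below are made up for illustration; the last one is a single very high earner.

```python
import numpy as np

# Hypothetical monthly salaries (in rupees); the last value is one very high earner
salaries = np.array([20000, 22000, 25000, 27000, 30000, 500000])

mean_salary = np.mean(salaries)
median_salary = np.median(salaries)

print("Mean salary:  ", mean_salary)
print("Median salary:", median_salary)
```

The mean is pulled far above what most people in the list earn, while the median stays close to the typical salary.
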

🔁 MODE

Most frequent value. Like the most popular shoe size in a shop.
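
The shoe shop idea can be sketched with Python's built-in `statistics` module. The list of sizes sold is made up for illustration.

```python
from collections import Counter
from statistics import mode

# Hypothetical shoe sizes sold in one day
shoe_sizes = [6, 7, 7, 8, 7, 9, 8, 7, 6]

popular_size = mode(shoe_sizes)            # the value that appears most often
size_counts = Counter(shoe_sizes)          # how many times each size was sold

print("Most popular size:", popular_size)
print("Times sold:", size_counts[popular_size])
```
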

📏 STD DEV

How spread out data is from the mean. Low = consistent, High = spread out.
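
To see "low = consistent, high = spread out" in numbers, compare two players with the same average. Both score lists below are made up for illustration.

```python
import numpy as np

# Hypothetical scores for two players with the same average of 50
player_a = np.array([48, 52, 50, 49, 51])   # always close to 50
player_b = np.array([10, 95, 50, 5, 90])    # very high or very low

mean_a, mean_b = np.mean(player_a), np.mean(player_b)
std_a, std_b = np.std(player_a), np.std(player_b)

print(f"Player A: mean {mean_a:.1f}, std dev {std_a:.1f}")
print(f"Player B: mean {mean_b:.1f}, std dev {std_b:.1f}")
```

The averages are identical, so only the standard deviation reveals that Player A is the consistent one.
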

EXAM TIPS — DATA SCIENCE
  • NumPy = for numerical operations on arrays and matrices
  • Pandas = for tabular data (rows and columns) — like Excel in Python
  • Matplotlib = for creating graphs and charts
  • CSV = Comma Separated Values, one of the most common data file formats
  • df.head() shows first 5 rows | df.describe() shows statistics
  • Always check for missing values with df.isnull().sum() before analysis