LearnwithVishnu
Basics to Production to Architect
← Home
📊MIS & Data Analytics
BeginnerAnalystAdvanced
Excel · SQL · Python (Pandas) · Power BI · Airflow · Reporting Automation
OverviewExcelSQLPythonPower BIAirflowRoadmapInterview Q&As

📊 What is MIS and Why It Matters

MIS = Management Information System. It is the process of collecting, processing, and presenting data so managers can make better decisions. In practice it means: someone asks "what were our sales last month by region?" and you answer with a table, chart or dashboard — not a guess.

Without MISWith MIS
Managers guess based on intuitionDecisions backed by accurate data
Reports take hours to prepare manuallyAutomated reports delivered every morning
Data sits in silos across departmentsOne source of truth everyone accesses
Errors in manual Excel copy-pastePython scripts eliminate human error
Old data by the time report is readyLive dashboards updated in real time

The MIS toolkit — what each tool does and when you use it

ToolWhat it doesWhen to useReplaces
ExcelCalculations, charts, pivot tables, quick reportsOne-off analysis, small datasets (<100K rows), sharing with non-technical staffPaper reports, calculators
SQLQuery databases directly — filter, join, aggregate millions of rowsWhen data lives in a database and you need specific slices fastWaiting for IT to export data
Python (Pandas)Automate repetitive Excel/data tasks, process large files, send reportsSame task runs daily/weekly, large files slow Excel down, need automationManual Excel work, VBA macros
Power BILive interactive dashboards shared across the organisationManagement needs self-service reporting, data changes frequentlyStatic PowerPoint charts, emailed Excel files
AirflowSchedule and orchestrate data pipelines — run SQL + Python automaticallyPipelines that must run on a schedule without human interventionWindows Task Scheduler, manual running of scripts

📊 Excel — Pivot Tables, VLOOKUP, Advanced Formulas, Macros

Pivot Tables — the most important Excel skill

A pivot table summarises thousands of rows into a meaningful table in seconds. Insert → PivotTable. Drag fields: Rows (what you want to group by), Values (what you want to sum/count), Filters (what you want to narrow down), Columns (optional second grouping).

TaskRowsValuesFilter
Sales by regionRegionSum of SalesYear=2025
Count of patients by diagnosisDiagnosisCount of PatientIDMonth=Jan
Average salary by departmentDepartmentAverage of Salary

VLOOKUP vs HLOOKUP vs INDEX/MATCH

FunctionSyntaxWhenLimitation
VLOOKUP=VLOOKUP(lookup_value, table_array, col_index, FALSE)Look up a value in the leftmost column of a tableCan only look left-to-right; breaks if columns are inserted
HLOOKUP=HLOOKUP(lookup_value, table_array, row_index, FALSE)Look up across the top row of a table (horizontal)Less common; rarely used today
INDEX/MATCH=INDEX(return_range, MATCH(lookup_value, lookup_range, 0))Any direction lookup, more flexible than VLOOKUPSlightly more complex syntax but worth learning
XLOOKUP=XLOOKUP(lookup_value, lookup_array, return_array)Modern replacement for VLOOKUP — Excel 365 onlyNot available in older Excel versions

Essential formulas every MIS analyst must know

FormulaWhat it doesExample
SUMIFSum with one condition=SUMIF(A:A,"North",B:B) — sum column B where column A = "North"
SUMIFSSum with multiple conditions=SUMIFS(Sales,Region,"North",Month,"Jan")
COUNTIF/COUNTIFSCount with conditions=COUNTIF(Status,"Completed")
IFERRORCatch formula errors gracefully=IFERROR(VLOOKUP(A2,Sheet2!A:B,2,0),"Not Found")
TEXTFormat numbers/dates as text=TEXT(A2,"DD-MMM-YYYY") → "15-Jan-2025"
LEFT/RIGHT/MIDExtract text=LEFT(A2,3) → first 3 characters
TRIM/CLEANRemove extra spaces/characters=TRIM(A2) removes leading/trailing spaces
UNIQUE/FILTERDynamic array functions (Excel 365)=UNIQUE(A:A) — deduplicated list automatically

Macros and VBA — when to use them

Macros record your clicks and replay them. Use for: formatting reports the same way every time, copying data between sheets on a schedule, sending email alerts. Record with: Developer tab → Record Macro. Edit in VBA: Alt+F11. Simple VBA to loop through rows and colour cells based on value — saves hours of manual formatting.

Data Validation and Named Ranges

Data Validation: Data → Data Validation → List. Prevents wrong data entry — dropdown of valid choices. Named Ranges: Select a range → Name Box (top left) → type a name. Use =SalesData instead of =Sheet1!A2:A500. Makes formulas readable and maintainable.

🗄️ SQL — Querying Databases from Basic to Advanced

The 5 clauses you need for 90% of queries

ClausePurposeExample
SELECTChoose which columns to returnSELECT name, salary, department
FROMWhich table to queryFROM employees
WHEREFilter rows by conditionWHERE department = 'Sales' AND salary > 50000
GROUP BYAggregate rows by a categoryGROUP BY department
ORDER BYSort the resultsORDER BY salary DESC

JOINs — combining data from multiple tables

JOIN typeReturnsUse when
INNER JOINOnly matching rows in BOTH tablesYou want records that exist in both (most common)
LEFT JOINAll rows from left table + matching from rightKeep all left records even if no match (show NULLs)
RIGHT JOINAll rows from right table + matching from leftRare — can usually rewrite as LEFT JOIN
FULL OUTER JOINAll rows from both tablesFind records that exist in one table but not the other

Aggregate functions — the backbone of MIS reporting

FunctionWhat it doesExample query
COUNT(*)Count all rowsSELECT department, COUNT(*) as headcount FROM employees GROUP BY department
SUM(col)Total a numeric columnSELECT region, SUM(sales) as total_sales FROM orders GROUP BY region
AVG(col)Average valueSELECT AVG(salary) FROM employees WHERE department='Finance'
MAX/MIN(col)Highest/lowest valueSELECT MAX(order_date) as latest_order FROM orders

Date functions — essential for MIS time-based reporting

FunctionMySQL/SQL ServerResult
Today's dateGETDATE() / NOW()2025-06-05
Extract monthMONTH(date_col)6 (June)
Format dateFORMAT(date,'YYYY-MM')2025-06
Date differenceDATEDIFF(day, start, end)Number of days
Last N daysWHERE date >= DATEADD(day,-30,GETDATE())Filter last 30 days

Subqueries and CTEs — cleaner complex queries

-- CTE example: find departments above average salary
WITH dept_avg AS (
  SELECT department, AVG(salary) as avg_sal
  FROM employees
  GROUP BY department
)
SELECT * FROM dept_avg
WHERE avg_sal > (SELECT AVG(salary) FROM employees)
ORDER BY avg_sal DESC;

🐍 Python for MIS — Automate Excel, SQL, and Reports

6-step automation framework for any MIS task

StepCodeWhat it does
1. Importimport pandas as pdLoad the Pandas library
2. Read datadf = pd.read_excel("sales.xlsx")Load Excel/CSV into a DataFrame
3. Cleandf.dropna() / df.fillna(0)Remove or fill blank values
4. Transformdf.groupby("Region")["Sales"].sum()Summarise like a pivot table
5. Analysedf[df["Sales"] > 10000]Filter rows by condition
6. Outputdf.to_excel("output.xlsx", index=False)Save result

Complete automation script — monthly sales report

import pandas as pd
from datetime import datetime

# 1. Read source data
df = pd.read_excel("raw_sales.xlsx")

# 2. Clean - remove rows with no sales amount
df = df.dropna(subset=["Sales Amount"])
df["Date"] = pd.to_datetime(df["Date"])

# 3. Filter current month
this_month = datetime.now().month
df = df[df["Date"].dt.month == this_month]

# 4. Summarise by region
summary = df.groupby("Region").agg(
    Total_Sales=("Sales Amount", "sum"),
    Order_Count=("Order ID", "count"),
    Avg_Order=("Sales Amount", "mean")
).round(2).reset_index()

# 5. Sort by total
summary = summary.sort_values("Total_Sales", ascending=False)

# 6. Export
summary.to_excel(f"report_{datetime.now().strftime('%Y%m')}.xlsx", index=False)
print(f"Report saved: {len(summary)} regions")

Reading from SQL database with Python

import pandas as pd
import pyodbc  # or pymysql, psycopg2 depending on your DB

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
)
query = """
SELECT region, SUM(sales) as total
FROM sales_table
WHERE MONTH(sale_date) = MONTH(GETDATE())
GROUP BY region
"""
df = pd.read_sql(query, conn)
conn.close()
df.to_excel("monthly_by_region.xlsx", index=False)

Key Pandas operations every MIS analyst needs

OperationCodeExcel equivalent
Filter rowsdf[df["Status"]=="Active"]AutoFilter
Pivot/groupdf.groupby("Region")["Sales"].sum()Pivot Table
Merge tablespd.merge(df1, df2, on="ID")VLOOKUP
Add columndf["Tax"] = df["Sales"] * 0.18Formula column
Sortdf.sort_values("Sales", ascending=False)Sort A-Z
Remove duplicatesdf.drop_duplicates(subset=["Order ID"])Remove Duplicates
Count by groupdf.groupby("Dept").size()COUNTIF

📈 Power BI — Dashboards, DAX, and Live Reporting

Step-by-step: build your first Power BI dashboard

StepActionNotes
1Get Data → Excel / SQL Server / CSVConnect to your data source
2Power Query Editor → TransformClean data: remove blanks, rename columns, change types
3Model view → define relationshipsLike SQL JOINs — link tables by common column
4Report view → add visualsBar chart, line chart, card, table, map
5Add slicers for interactivityDate slicer, region slicer — user filters on click
6Publish to Power BI ServiceShare URL with team — live dashboard, no email needed

Essential DAX formulas

DAX FormulaWhat it doesExample
CALCULATEChange filter context for a measureSales YTD = CALCULATE(SUM(Sales), DATESYTD(Date[Date]))
SUMXIterate over rows and sum expressionTotal Revenue = SUMX(Orders, Orders[Qty] * Orders[Price])
RELATEDBring value from related tableCategory = RELATED(Products[Category])
IFConditional column or measureStatus = IF(Sales > Target, "Above","Below")
DISTINCTCOUNTCount unique valuesUnique Customers = DISTINCTCOUNT(Orders[CustomerID])
RANKXRank rows by a measureRank = RANKX(ALL(Products), [Total Sales])

Power BI vs Excel — when to use which

ScenarioUse ExcelUse Power BI
One-off analysis for yourself
Live dashboard for management
Share with non-technical usersEmail file (risky)✅ Share URL, always fresh
Data over 1 million rows❌ Slow✅ Handles billions
Quick ad-hoc calculation

⚙️ Apache Airflow — Schedule and Orchestrate MIS Pipelines

What Airflow does in plain English

Airflow is a scheduler for data pipelines. Instead of running your Python script manually every Monday morning, Airflow runs it automatically at 6am, sends you an alert if it fails, and keeps a log of every run. Think of it as Windows Task Scheduler but for data teams — far more powerful and with full visibility.

Key concepts

ConceptWhat it meansReal example
DAGDirected Acyclic Graph — your pipeline as a Python filemonthly_sales_report.py defines what runs and in what order
TaskOne step in the pipelineextract_data → clean_data → generate_report → send_email
OperatorType of taskPythonOperator (run Python), BashOperator (run shell), EmailOperator (send email)
ScheduleWhen to run — cron format"0 6 * * 1" = every Monday 6am
TriggerWhat starts the DAGSchedule, manual trigger, or when another DAG finishes

Simple MIS pipeline in Airflow

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

def extract(): 
    # query database, save to CSV
    pass

def transform():
    # pandas cleaning and aggregation
    pass

def load():
    # write to Excel, send email
    pass

with DAG("mis_monthly_report",
         schedule_interval="0 6 1 * *",  # 6am on 1st of every month
         start_date=datetime(2025,1,1),
         catchup=False) as dag:

    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3  # defines the order

🗺️ MIS Learning Roadmap — Month by Month

MonthFocusTarget outcomeDifficulty
Month 1-2Excel masteryPivot tables, VLOOKUP, SUMIFS, basic macros, clean formattingBeginner
Month 3-4SQL fundamentalsSELECT, WHERE, JOIN, GROUP BY — query any database independentlyBeginner-Medium
Month 5-6Python basics + PandasRead/write Excel, filter/group/merge data, automate one real taskMedium
Month 7-8Power BIBuild a live dashboard connected to SQL or Excel, share with teamMedium
Month 9-10Python advanced + SQL advancedETL pipelines, scheduled reports, complex SQL (CTEs, window functions)Medium-High
Month 11-12Airflow + Cloud basicsAutomated pipelines on a schedule, Azure Data Factory or AWS Glue basicsHigh
💡 Practical tip: After each month, build one real project using that skill at work. Theory without practice is forgotten in 2 weeks. The goal is not to finish the roadmap — it is to make your work 20% faster each month.

🎯 MIS Interview Questions

MIS · BEGINNER
What is a VLOOKUP and when would you use it?
VLOOKUP = Vertical Lookup. It searches for a value in the leftmost column of a table and returns a value in the same row from a specified column. Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]). Use FALSE for exact match. When to use: joining data from two sheets by a common ID. Example: you have employee IDs in Sheet1 and a salary table in Sheet2. =VLOOKUP(A2,Sheet2!A:C,3,FALSE) returns the salary. Limitation: VLOOKUP can only look to the RIGHT. If you need to look left, use INDEX/MATCH instead.
MIS · BEGINNER
What is the difference between COUNT, COUNTA, and COUNTIF in Excel?
COUNT counts only numeric cells. COUNTA counts ALL non-empty cells including text. COUNTIF counts cells that match a specific condition. Examples: COUNT(A:A) — how many cells in column A have numbers. COUNTA(A:A) — how many cells are not empty. COUNTIF(B:B,"Completed") — how many cells in column B say "Completed". COUNTIFS extends this to multiple conditions: COUNTIFS(Region,"North",Status,"Active") = count rows where both conditions are true.
MIS · ENGINEER
A manager asks for a report showing top 10 products by revenue this quarter. Walk through how you would build this in SQL.
SELECT product_name, SUM(revenue) as total_revenue, COUNT(order_id) as order_count FROM orders WHERE order_date >= DATEADD(quarter, DATEDIFF(quarter,0,GETDATE()),0) AND order_date < GETDATE() GROUP BY product_name ORDER BY total_revenue DESC LIMIT 10 (or TOP 10 in SQL Server before the SELECT). Walk through: SELECT the product name and aggregate revenue and order count. FROM the orders table. WHERE filter to current quarter — DATEDIFF/DATEADD calculates the first day of current quarter. GROUP BY product so aggregates are per product. ORDER BY DESC to get highest first. LIMIT/TOP 10 for only the top 10. In an interview, always explain WHY each clause is there, not just what it does.
MIS · ENGINEER
You have a 500MB Excel file that takes 10 minutes to open and crashes regularly. What do you do?
This is a Python/SQL problem, not an Excel problem. Step 1 — identify what is making it large: formulas that reference entire columns (=SUM(A:A) instead of =SUM(A2:A10000)), images or charts embedded, many pivot caches, no filters applied to limit data. Step 2 — migrate to Python: import pandas as pd; df = pd.read_excel("large.xlsx"). Pandas handles 500MB easily. Step 3 — if data comes from a database, connect Python directly to the DB with pd.read_sql() instead of exporting to Excel first. Step 4 — output only the summary, not raw data, back to Excel: summary.to_excel("report.xlsx"). The original 500MB becomes a 10KB summary report.
MIS · ARCHITECT
Design a monthly sales report automation for 50 regional managers. Data is in SQL Server, output is email with Excel attachment.
Architecture: Python script triggered by Airflow on the 1st of every month at 6am. Step 1 — Python reads regional manager list from SQL: SELECT manager_id, name, email, region FROM managers. Step 2 — Loop through each manager. Step 3 — For each manager, run SQL query filtered to their region: SELECT product, SUM(sales) FROM orders WHERE region = ? AND MONTH(date) = last_month GROUP BY product. Step 4 — Use Pandas + openpyxl to create a formatted Excel file per manager with their data only. Step 5 — Use smtplib or SendGrid to email each manager their personalised report. Airflow tracks success/failure per manager. If one email fails, only that one retries — other 49 already sent. Total time for 50 reports: under 2 minutes. Without automation: one person spending a full day each month.