1. Probability Theory
Concept of Probability:
Probability measures the likelihood of an event occurring.
It always lies between 0 and 1.
- 0 → Impossible event
- 1 → Certain event
Types of Probability:
- Classical – based on equally likely outcomes (e.g., coin toss).
- Empirical – based on past data (e.g., rainfall probability).
- Subjective – based on personal judgment.
🔹 Important Concepts:
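- Independent Events: Occurrence of one doesn’t affect the probability of the other, so P(A ∩ B) = P(A) · P(B).
- Mutually Exclusive Events: Cannot occur simultaneously, so P(A ∩ B) = 0.
- Conditional Probability: Probability of A given that B has occurred, P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
- Bayes’ Theorem: Used to revise probabilities in light of new information, P(A|B) = P(B|A) · P(A) / P(B).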
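A small Python sketch may help make the revision of probabilities concrete. The function name `bayes_posterior` and the numbers used (a 1% prior, a 95% chance of the evidence when A is true, a 5% chance when A is false) are illustrative assumptions, not values from these notes.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A) and P(B | not A)."""
    # Total probability of observing the evidence B
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: P(A) = 0.01, P(B | A) = 0.95, P(B | not A) = 0.05
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 4))
```

Even with strong evidence, the revised probability stays well below one half here because the prior is so small.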
2. Probability Distributions
Discrete Distributions:
- Binomial Distribution: Used for success-failure experiments with a fixed number of independent trials and a constant probability of success.
- Poisson Distribution: Used when events are rare and independent (e.g., accidents).
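The two discrete distributions can be evaluated directly from their probability mass functions. The sketch below uses only the standard library; the function names and the example inputs are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate lam per interval."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical inputs: 3 heads in 10 fair tosses,
# and 2 accidents in a period that averages 1.5 accidents
print(binomial_pmf(3, 10, 0.5))
print(poisson_pmf(2, 1.5))
```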
🔹 Continuous Distribution:
- Normal Distribution: Bell-shaped curve, symmetric around the mean, with Mean = Median = Mode. Used in sampling, hypothesis testing, etc.
3. Moments and Central Limit Theorem
🔹 Moments:
Moments describe the shape of a distribution.
- 1st moment (about the origin) → Mean
- 2nd central moment → Variance
- 3rd standardized moment → Skewness (asymmetry)
- 4th standardized moment → Kurtosis (peakedness and tail weight)
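As a rough illustration, skewness and kurtosis can be computed from central moments. The helper names and the sample data below are assumptions for the sketch, and the formulas use the population (divide by n) convention.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """3rd central moment scaled by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """4th central moment scaled by the squared variance."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(central_moment(sample, 2), skewness(sample), kurtosis(sample))
```

A positive skewness indicates a longer right tail; a normal distribution has kurtosis 3 under this convention.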
🔹 Central Limit Theorem (CLT):
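As the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution, provided the observations are independent and the population has a finite variance.
💡 This theorem justifies using the normal distribution for inference about means even when the population itself is not normal.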
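The theorem can be checked informally by simulation. The sketch below draws repeated samples from a strongly skewed exponential population (an assumption chosen for illustration, as are the sample sizes and the fixed seed) and reports the mean and spread of the resulting sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, repeats=2000):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

for n in (2, 30, 200):
    means = sample_means(n)
    # The centre stays near the population mean (1.0) while the
    # spread shrinks roughly like 1 / sqrt(n)
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each n would show the shape moving from skewed toward the familiar bell curve.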
4. Descriptive Statistics
🔹 Measures of Central Tendency:
- Mean (Average): Sum of all values divided by the number of values, x̄ = Σxᵢ / n.
- Median: Middle value when the data are arranged in order.
- Mode: Most frequent value.
🔹 Measures of Dispersion:
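Indicate how widely the data values are spread.
- Range: Largest value minus smallest value.
- Variance: Average squared deviation from the mean.
- Standard Deviation: Square root of the variance.
- Coefficient of Variation (CV): (Standard deviation / Mean) × 100.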
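The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. The data values are hypothetical, and the population versions of variance and standard deviation are used here as an assumption; `statistics.variance` and `statistics.stdev` give the sample versions.

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 24]  # hypothetical observations

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)
variance = statistics.pvariance(data)  # population variance (divide by n)
std_dev = statistics.pstdev(data)      # population standard deviation
cv = std_dev / mean * 100              # coefficient of variation, in percent

print(mean, median, mode, data_range, variance, std_dev, cv)
```

Because CV is unit-free, it is convenient for comparing variability across series measured in different units.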
🔹 Correlation:
Shows the strength and direction of the linear relationship between two variables.
Karl Pearson’s coefficient: r = Cov(X, Y) / (σ_X · σ_Y)
Values of r lie between -1 and +1.
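A minimal sketch of Pearson's coefficient, computed from deviations about the means. The function name `pearson_r` and the data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))
```

Values near +1 or -1 indicate a strong linear relationship; a value near 0 indicates little linear association, although a non-linear relationship may still exist.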
🔹 Index Numbers:
Measure changes in price, quantity, or value over time.
Types:
- Price Index (e.g., CPI, WPI)
- Quantity Index
- Value Index
Formulas:
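- Laspeyres Index: Base year weights, P_L = (Σp₁q₀ / Σp₀q₀) × 100
- Paasche Index: Current year weights, P_P = (Σp₁q₁ / Σp₀q₁) × 100
- Fisher’s Index: Geometric mean of the two (Ideal index), P_F = √(P_L × P_P)
Here p and q are prices and quantities; subscript 0 denotes the base year and 1 the current year.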
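The three price indices differ only in the quantity weights they use, which a short sketch makes visible. The function names and the two-commodity price and quantity data are hypothetical.

```python
from math import sqrt

def laspeyres(p0, p1, q0):
    """Price index weighted by base-year quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, p1, q1):
    """Price index weighted by current-year quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

# Hypothetical data for two commodities
p0, q0 = [10, 20], [5, 2]   # base-year prices and quantities
p1, q1 = [12, 25], [4, 3]   # current-year prices and quantities

print(laspeyres(p0, p1, q0), paasche(p0, p1, q1), fisher(p0, p1, q0, q1))
```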
5. Sampling Methods & Sampling Distribution
🔹 Sampling Methods:
- Random (Probability) Sampling – every unit has a known, non-zero chance of selection.
  - Simple Random
  - Stratified Random
  - Systematic Sampling
  - Cluster Sampling
- Non-random Sampling – convenience or judgment-based.
🔹 Sampling Distribution:
Distribution of a statistic (like mean) from repeated random samples.
Used to estimate population parameters.
Standard Error (SE) = Standard deviation of a sampling distribution.
6. Statistical Inference and Hypothesis Testing
🔹 Estimation:
- Point Estimate: single value (e.g., sample mean).
- Interval Estimate: range of values (confidence interval).
🔹 Hypothesis Testing Steps:
- State Null (H₀) and Alternative (H₁) hypotheses
- Choose significance level (α)
- Select appropriate test statistic (Z, t, χ², F)
- Define rejection region
- Calculate test statistic from the sample
- Reject or fail to reject H₀
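The steps above can be sketched for a two-sided one-sample Z-test. The function name, the assumption that the population standard deviation is known, and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 when the population sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))    # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # edge of the rejection region
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > z_crit                       # decision on H0
    return z, z_crit, p_value, reject

# Hypothetical data: n = 36, sample mean 52, H0: mu = 50, sigma = 6
z, z_crit, p_value, reject = one_sample_z_test(sample_mean=52, mu0=50, sigma=6, n=36)
print(z, round(z_crit, 3), round(p_value, 4), reject)
```

The decision is the same whether it is read from the rejection region (|z| against the critical value) or from the p-value compared with α.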
🔹 Common Tests:
- Z-test: Large samples (n > 30) or known population variance
- t-test: Small samples with unknown population variance
- χ²-test: Goodness of fit or independence
- F-test: Compare two variances
7. Linear Regression Models
🔹 Simple Linear Regression:
Y = β₀ + β₁X + u
where
- Y = Dependent variable
- X = Independent variable
- β₀, β₁ = Intercept and slope parameters
- u = Random error term
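For a single regressor, the OLS intercept and slope have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept makes the line pass through the point of means. The sketch below applies it to hypothetical data; the name `ols_fit` is an assumption.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = ols_fit(x, y)

# Residuals estimate the unobserved error term u
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print(intercept, slope)
```

With an intercept in the model, the OLS residuals sum to zero by construction.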
🔹 Properties of OLS (BLUE):
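Under the Gauss-Markov assumptions, OLS estimators are the Best Linear Unbiased Estimators:
- Linear in parameters
- Expected value of error = 0 (conditional on X)
- Homoscedasticity (constant error variance)
- No autocorrelation
- No perfect multicollinearity
Normality of the errors is not required for BLUE; it is needed only for exact small-sample t and F tests.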
8. Identification Problem
Occurs in simultaneous equation systems when parameters cannot be uniquely estimated.
Identification Types:
- Under-identified: Insufficient restrictions → No unique solution
- Exactly identified: Just enough restrictions → Unique solution
- Over-identified: More restrictions than needed → Multiple estimates
9. Simultaneous Equation Models
🔹 Recursive Models:
- Equations arranged in sequence
- No feedback
- Can be estimated equation by equation with OLS, provided the errors are uncorrelated across equations
🔹 Non-Recursive Models:
- Feedback present (mutual dependence)
- Require Two-Stage Least Squares (2SLS) or Instrumental Variables (IV) for estimation.
10. Discrete Choice Models
Used when dependent variable is categorical (0/1, yes/no).
Types:
- Logit Model – uses logistic function
- Probit Model – uses cumulative normal distribution
Example: Probability of employment, adoption of technology, etc.
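The two models differ only in the function that maps a linear index into a probability between 0 and 1. In the sketch below the coefficients `b0` and `b1` and the explanatory variable (years of schooling) are hypothetical, not estimates from any data.

```python
from math import exp
from statistics import NormalDist

def logit_prob(index):
    """Logistic function: 1 / (1 + e^(-index))."""
    return 1 / (1 + exp(-index))

def probit_prob(index):
    """Standard normal cumulative distribution function."""
    return NormalDist().cdf(index)

# Hypothetical coefficients for P(employed) as a function of schooling
b0, b1 = -2.0, 0.5
for years in (0, 4, 8):
    index = b0 + b1 * years
    print(years, round(logit_prob(index), 3), round(probit_prob(index), 3))
```

Both curves are S-shaped and equal 0.5 at an index of zero; the logistic curve has heavier tails, so the same coefficients give less extreme probabilities under logit. In practice each model is estimated separately by maximum likelihood, so the fitted coefficients differ in scale.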
11. Time Series Analysis
Components of Time Series:
- Trend (T): Long-term direction.
- Seasonal (S): Regular pattern within a year.
- Cyclical (C): Recurrent up and down movements lasting more than a year (business cycles).
- Irregular (I): Random variations.
Models:
- Additive Model: Y = T + S + C + I
- Multiplicative Model: Y = T × S × C × I
Stationarity:
A series is (weakly) stationary when its mean and variance are constant over time and the covariance between two observations depends only on the lag between them.
Autocorrelation:
Measures correlation between current and past values of a series.
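A minimal sketch of the sample autocorrelation at a given lag, using the common convention of dividing by the total sum of squared deviations. The function name and the two series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation between series[t] and series[t - lag]."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

trending = [1, 2, 3, 4, 5]             # hypothetical upward-trending series
alternating = [1, -1, 1, -1, 1, -1]    # hypothetical oscillating series

print(autocorrelation(trending, 1))
print(autocorrelation(alternating, 1))
```

A trending series shows positive autocorrelation at short lags, while a series that flips sign each period shows negative lag-1 autocorrelation.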
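AR, MA, ARMA, ARIMA Models:
Autoregressive (AR), moving average (MA), combined (ARMA), and integrated (ARIMA, for series made stationary by differencing) models are used for forecasting and economic time series modeling.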
🧾 Quick Summary Table
| Topic | Key Concept / Formula | Use / Importance |
|---|---|---|
| Probability | 0 ≤ P(A) ≤ 1 | Foundation of statistics |
| Normal Distribution | Bell-shaped curve | Basis for inference |
| CLT | Sample mean → normal | Enables hypothesis testing |
| Correlation | -1 ≤ r ≤ +1 | Strength of relationship |
| Regression | Y explained by X plus error u | Predictive analysis |
| BLUE | Best Linear Unbiased Estimator | Gauss-Markov theorem |
| Hypothesis Testing | Z, t, χ², F tests | Decision making |
| Identification | Unique estimation issue | Econometric modeling |
| Logit/Probit | Binary dependent variable | Discrete choice |
| Time Series | Trend, Seasonality, Cyclic | Forecasting |
