UGC NET Economics Unit 3: Statistics and Econometrics

1. Probability Theory

Concept of Probability:

Probability measures the likelihood of an event occurring.
It always lies between 0 and 1.

$P(A) = \frac{\text{Number of favourable outcomes}}{\text{Total number of outcomes}}$ (classical definition, valid when all outcomes are equally likely)

0 → Impossible event
1 → Certain event

Types of Probability:

Classical – based on equally likely outcomes (e.g., coin toss).
Empirical – based on past data (e.g., rainfall probability).
Subjective – based on personal judgment.

🔹 Important Concepts:

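Independent Events: Occurrence of one does not affect the probability of the other, so $P(A \cap B) = P(A) \, P(B)$.
Mutually Exclusive Events: Cannot occur simultaneously, so $P(A \cap B) = 0$.
Conditional Probability: Probability of A given that B has occurred.

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$
Bayes’ Theorem: Used to revise probabilities in the light of new information.

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$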
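A minimal Python sketch of Bayes' theorem, $P(A \mid B) = P(B \mid A)P(A)/P(B)$, for a two-event partition {A, not A}, where $P(B)$ comes from the total probability rule. The function name `bayes_posterior` and the audit-flag numbers are hypothetical, chosen only to show how a small prior can keep the revised probability low.

```python
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) by Bayes' theorem for the partition {A, not A}."""
    # Total probability of B: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    # Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
    return p_b_given_a * prior_a / p_b


# Hypothetical example: 1% of firms are loss-making (A); an audit flag (B)
# appears for 90% of loss-making firms and for 5% of profitable ones.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(posterior, 4))  # 0.009 / 0.0585 ≈ 0.1538
```

Even with a fairly accurate flag, the posterior is only about 15% because the prior probability of A is small.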

2. Probability Distributions

Discrete Distributions:

Binomial Distribution:

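$P(X = x) = \binom{n}{x} p^{x} q^{n - x}, \quad x = 0, 1, \dots, n, \quad q = 1 - p$

Used for a fixed number n of independent success/failure trials with a constant success probability p. Mean = np; Variance = npq.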
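The binomial formula can be evaluated directly; this sketch assumes Python 3.8+ for `math.comb`, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Probability of exactly 2 heads in 4 tosses of a fair coin: 6 / 16
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

Summing `x * binomial_pmf(x, n, p)` over x = 0, ..., n reproduces the mean np.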
Poisson Distribution:
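Used when events are rare and occur independently at a constant average rate λ (e.g., accidents). It approximates the binomial when n is large and p is small, with λ = np.

$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$ (Mean = Variance = λ)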
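A short sketch of the Poisson probability formula; the accident rate of 2 per day is a hypothetical figure and `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam (λ)."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical: on average 2 accidents per day; probability of no accident today
print(round(poisson_pmf(0, 2.0), 4))  # e^(-2) ≈ 0.1353
```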

🔹 Continuous Distribution:

Normal Distribution:
Bell-shaped curve, symmetric about the mean μ, with spread measured by the standard deviation σ.
Mean = Median = Mode.
Used in sampling theory, estimation, and hypothesis testing.
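The normal density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^{2} / (2\sigma^{2})}$. A minimal Python sketch of the density and of the cumulative probability (via the error function `math.erf`) follows; the helper names are illustrative.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


# Share of a normal population within one standard deviation of the mean
print(round(normal_cdf(1) - normal_cdf(-1), 4))  # about 0.6827
```

Symmetry shows up as `normal_pdf(a) == normal_pdf(-a)` and `normal_cdf(0) == 0.5` for the standard normal.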

3. Moments and Central Limit Theorem

🔹 Moments:

Moments describe the location, spread, and shape of a distribution.

1st moment (about the origin) → Mean
2nd central moment → Variance
3rd central moment → Skewness (asymmetry)
4th central moment → Kurtosis (peakedness)
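A sketch computing central moments $\mu_r = \frac{1}{n}\sum (X_i - \bar{X})^{r}$ and, as an assumption about what is useful here, Karl Pearson's standard measures $\beta_1 = \mu_3^{2}/\mu_2^{3}$ (skewness) and $\beta_2 = \mu_4/\mu_2^{2}$ (kurtosis, equal to 3 for a normal curve). The data set is hypothetical.

```python
def central_moment(data: list[float], r: int) -> float:
    """r-th central moment: the mean of (x - mean)^r."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


def pearson_betas(data: list[float]) -> tuple[float, float]:
    """Karl Pearson's beta_1 (skewness) and beta_2 (kurtosis) from central moments."""
    mu2 = central_moment(data, 2)
    mu3 = central_moment(data, 3)
    mu4 = central_moment(data, 4)
    beta1 = mu3**2 / mu2**3  # 0 for a symmetric distribution
    beta2 = mu4 / mu2**2     # 3 for a normal (mesokurtic) distribution
    return beta1, beta2


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_moment(data, 2))  # variance = 4.0
print(pearson_betas(data))
```

The first central moment is always zero, which is why the mean is taken as the first moment about the origin instead.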

🔹 Central Limit Theorem (CLT):

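As the sample size n increases, the sampling distribution of the sample mean approaches a normal distribution with mean μ and variance $\sigma^{2}/n$, whatever the shape of the population distribution (provided its variance is finite).

💡 This theorem justifies using the normal distribution for inference about means (e.g., Z-tests and confidence intervals) even when the population itself is not normal.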
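The theorem can be checked by simulation. This sketch assumes a strongly skewed exponential population with mean 1 and standard deviation 1, draws 5000 samples of size 50, and looks at the distribution of their means; the seed and sizes are arbitrary choices.

```python
import random
import statistics


def sample_means(draw, n: int, reps: int) -> list[float]:
    """Means of `reps` independent samples of size n, each value produced by draw()."""
    return [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]


random.seed(42)
# Skewed population: exponential with mean 1 and standard deviation 1
means = sample_means(lambda: random.expovariate(1.0), n=50, reps=5000)

print(round(statistics.fmean(means), 3))   # close to the population mean 1
print(round(statistics.pstdev(means), 3))  # close to sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141
```

A histogram of `means` would look close to a bell curve even though the exponential population is far from normal.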

4. Descriptive Statistics

🔹 Measures of Central Tendency:

Mean (Average):

$\bar{X} = \frac{\sum_{i=1}^{n} X_{i}}{n}$
Median: Middle value when the data are arranged in order (the average of the two middle values when n is even).
Mode: Most frequent value.
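Python's standard `statistics` module computes all three measures; the small data set below is hypothetical and has an even number of values, so the median averages the two middle ones.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # 30 / 6 = 5
print(statistics.median(data))  # even n: (3 + 5) / 2 = 4.0
print(statistics.mode(data))    # 3 appears most often
```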

🔹 Measures of Dispersion:

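Indicate how widely the data values are spread out, mostly measured around the mean.

Range = Largest value - Smallest value
Variance: $\sigma^{2} = \frac{\sum (X_{i} - \bar{X})^{2}}{n}$
Standard Deviation: $\sigma = \sqrt{\sigma^{2}}$
Coefficient of Variation (CV) = $\frac{\sigma}{\bar{X}} \times 100$ (relative dispersion, in percent)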
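A sketch of the four dispersion measures using the population (divide-by-n) formulas; `statistics.variance` and `statistics.stdev` would use n - 1 instead. The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)           # 9 - 2 = 7
variance = statistics.pvariance(data)        # sum((x - mean)^2) / n
std_dev = statistics.pstdev(data)            # square root of the variance
cv = std_dev / statistics.mean(data) * 100   # coefficient of variation, in percent

print(data_range, variance, std_dev, cv)
```

Here the mean is 5 and the standard deviation is 2, so the CV is 40%; the CV is useful for comparing variability across series measured in different units.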

🔹 Correlation:

Measures the strength and direction of the linear relationship between two variables.
Karl Pearson’s coefficient:

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^{2} \sum (Y - \bar{Y})^{2}}}$

Values of r lie between -1 and +1: +1 means perfect positive, -1 perfect negative, and 0 no linear correlation.
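Karl Pearson's formula translates directly into code using deviations from the means; the helper name and the five-point data set are illustrative.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Karl Pearson's correlation coefficient from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squared X deviations
    syy = sum(b * b for b in dy)              # sum of squared Y deviations
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 6 / sqrt(10 * 6) ≈ 0.7746
```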

🔹 Index Numbers:

Measure changes in price, quantity, or value over time.
Types:

Price Index (e.g., CPI, WPI)
Quantity Index
Value Index

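Formulas (p = price, q = quantity, subscript 0 = base year, 1 = current year):

Laspeyres Index: Base year weights, $P_{01}^{L} = \frac{\sum p_{1} q_{0}}{\sum p_{0} q_{0}} \times 100$
Paasche Index: Current year weights, $P_{01}^{P} = \frac{\sum p_{1} q_{1}}{\sum p_{0} q_{1}} \times 100$
Fisher’s Index: Geometric mean of the two, $P_{01}^{F} = \sqrt{P_{01}^{L} \times P_{01}^{P}}$ (Ideal index: it satisfies both the time reversal and factor reversal tests)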
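A sketch of the three price indices for a hypothetical two-commodity basket; prices and quantities are made up purely to show the arithmetic, and the helper names are illustrative.

```python
from math import sqrt


def weighted_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, p1, q0):
    """Laspeyres price index: base-year quantities q0 as weights."""
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100


def paasche(p0, p1, q1):
    """Paasche price index: current-year quantities q1 as weights."""
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100


def fisher(p0, p1, q0, q1):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical two-commodity basket
p0, q0 = [10, 20], [5, 2]  # base-year prices and quantities
p1, q1 = [12, 25], [4, 3]  # current-year prices and quantities

index_l = laspeyres(p0, p1, q0)   # 110 / 90 * 100 ≈ 122.22
index_p = paasche(p0, p1, q1)     # 123 / 100 * 100 = 123.0
index_f = fisher(p0, p1, q0, q1)  # lies between the two
print(round(index_l, 2), round(index_p, 2), round(index_f, 2))
```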

5. Sampling Methods & Sampling Distribution

🔹 Sampling Methods:

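Random (probability) Sampling – every unit has a known, non-zero chance of selection (an equal chance under simple random sampling).
- Simple Random
- Stratified Random
- Systematic Sampling
- Cluster Sampling
Non-random (non-probability) Sampling – convenience, judgment, or quota-based.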
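The four random sampling methods can be sketched with Python's `random` module on a hypothetical frame of 100 numbered units; the strata labels, the cluster size of 10, and the sample sizes are all assumptions for illustration.

```python
import random

random.seed(1)
population = list(range(1, 101))  # hypothetical sampling frame of 100 numbered units
n = 10                            # desired sample size

# Simple random sampling (without replacement): every unit equally likely
srs = random.sample(population, k=n)

# Systematic sampling: random start in the first interval, then every k-th unit
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified random sampling: SRS inside each stratum, proportional allocation
strata = {"small firms": population[:60], "large firms": population[60:]}
stratified = {name: random.sample(units, k=len(units) * n // len(population))
              for name, units in strata.items()}

# Cluster sampling: randomly pick whole clusters and keep every unit in them
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [unit for cluster in random.sample(clusters, k=2) for unit in cluster]
```

Stratification guarantees each group is represented in proportion to its size, while cluster sampling trades precision for the convenience of surveying whole groups.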

🔹 Sampling Distribution:

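Probability distribution of a statistic (such as the sample mean) over all possible random samples of the same size n.
Used to estimate population parameters and to test hypotheses about them.

Standard Error (SE) = Standard deviation of a sampling distribution; for the sample mean, $SE = \frac{\sigma}{\sqrt{n}}$.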

6. Statistical Inference and Hypothesis Testing

🔹 Estimation:

Point Estimate: single value (e.g., sample mean).
Interval Estimate: range of values (confidence interval).
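A minimal sketch of point and interval estimation for a mean. It assumes the population standard deviation is known (σ = 2, a hypothetical value), which is what makes the 1.96 normal critical value appropriate; with σ unknown and a small sample, a t critical value would be used instead.

```python
import statistics
from math import sqrt

sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]  # hypothetical observations
sigma = 2.0                                        # population SD, assumed known
z_crit = 1.96                                      # two-sided 95% standard normal value

point_estimate = statistics.mean(sample)  # sample mean as the point estimate of μ
se = sigma / sqrt(len(sample))            # standard error of the mean, σ / √n
interval = (point_estimate - z_crit * se, point_estimate + z_crit * se)

print(point_estimate, [round(v, 2) for v in interval])
```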

🔹 Hypothesis Testing Steps:

State Null (H₀) and Alternative (H₁) hypotheses
Choose significance level (α)
Select appropriate test statistic (Z, t, χ², F)
Define rejection region
Calculate test statistic
Reject H₀ if the test statistic falls in the rejection region; otherwise do not reject (accept) H₀

🔹 Common Tests:

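Z-test: Large samples (n > 30) or known population variance
t-test: Small samples (n ≤ 30) with unknown population variance
χ²-test: Goodness of fit or independence of attributes
F-test: Compare two variances (also used in ANOVA)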
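The testing steps can be traced with a two-sided one-sample Z-test. This is a sketch: the sample mean 52, hypothesised mean 50, known σ = 8 and n = 64 are hypothetical numbers, and the p-value is compared with α in place of an explicit rejection region (the two rules give the same decision).

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: mu = mu0 when the population sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))          # test statistic
    p_value = 2 * (1 - std_normal_cdf(abs(z)))    # probability of a more extreme z under H0
    return z, p_value, p_value < alpha            # True means reject H0


z, p, reject = one_sample_z_test(xbar=52, mu0=50, sigma=8, n=64)
print(z, round(p, 4), reject)  # z = 2.0, p ≈ 0.0455, so H0 is rejected at α = 0.05
```

Equivalently, |z| = 2.0 exceeds the 5% two-sided critical value 1.96.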

7. Linear Regression Models

🔹 Simple Linear Regression:

$Y = \alpha + \beta X + u$

where

Y = Dependent variable
X = Independent variable
u = Random error term
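The OLS estimates for this model are $\hat{\beta} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}}$ and $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$. A short sketch with a hypothetical five-point data set:

```python
def ols_simple(x, y):
    """OLS estimates (alpha_hat, beta_hat) for Y = alpha + beta X + u."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta_hat = sxy / sxx            # slope
    alpha_hat = my - beta_hat * mx  # intercept: line passes through the means
    return alpha_hat, beta_hat


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = ols_simple(x, y)
print(round(alpha_hat, 4), round(beta_hat, 4))  # 2.2 0.6
```

OLS residuals sum to zero, a quick check that the estimates were computed correctly.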

🔹 Properties of OLS (BLUE):

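Under the Gauss-Markov assumptions, OLS estimators are Best Linear Unbiased Estimators (BLUE):

Linear in parameters
Zero mean of errors: $E(u) = 0$
Homoscedasticity: constant error variance, $Var(u) = \sigma^{2}$
No autocorrelation: $Cov(u_{i}, u_{j}) = 0$ for $i \neq j$
Regressors non-stochastic (or uncorrelated with the error term)
No perfect multicollinearity
Normality of errors is not needed for BLUE; it is an extra assumption used for exact t and F tests in small samples.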
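Unbiasedness can be illustrated by Monte Carlo. In this sketch the true parameters (α = 1, β = 2), the fixed regressor 1 to 50, and the uniform error distribution are all hypothetical; uniform errors are chosen deliberately to show that the OLS slope is still centred on β without normality.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope estimate for a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


random.seed(0)
alpha, beta = 1.0, 2.0  # hypothetical true parameters
x = list(range(1, 51))  # fixed (non-stochastic) regressor
estimates = []
for _ in range(2000):
    # Errors: mean 0, constant variance, independent draws; uniform, not normal
    u = [random.uniform(-3, 3) for _ in x]
    y = [alpha + beta * xi + ui for xi, ui in zip(x, u)]
    estimates.append(ols_slope(x, y))

print(round(statistics.fmean(estimates), 3))  # close to the true beta = 2.0
```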

8. Identification Problem

Occurs in simultaneous equation systems when the structural parameters cannot be uniquely estimated (recovered) from the reduced-form coefficients.

Identification Types:

Under-identified: Insufficient restrictions → No unique solution
Exactly identified: Just enough restrictions → Unique solution
Over-identified: More restrictions than needed → More than one estimate of some parameters (e.g., by indirect least squares); 2SLS yields a single estimate
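A common textbook check is the order condition: with K predetermined variables in the model, k of them in the equation, and m endogenous variables in the equation, the equation is identified only if $K - k \geq m - 1$ (equality means exactly identified). This notation and the helper `order_condition` are assumptions for illustration; the order condition is necessary but not sufficient, and the rank condition must also hold.

```python
def order_condition(K: int, k: int, m: int) -> str:
    """Classify one equation by the order condition K - k versus m - 1.

    K: predetermined variables in the whole model
    k: predetermined variables included in this equation
    m: endogenous variables included in this equation
    """
    excluded = K - k  # predetermined variables excluded from this equation
    needed = m - 1
    if excluded < needed:
        return "under-identified"
    if excluded == needed:
        return "exactly identified"
    return "over-identified"


# Hypothetical market: demand Q = a0 + a1 P + a2 Y + u1, supply Q = b0 + b1 P + u2
# Endogenous: Q, P; predetermined: income Y (intercepts cancel in K - k)
print(order_condition(K=1, k=1, m=2))  # demand: under-identified
print(order_condition(K=1, k=0, m=2))  # supply: exactly identified
```

Intuitively, income shifts demand along a fixed supply curve, tracing out supply but not demand.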

9. Simultaneous Equation Models

🔹 Recursive Models:

Equations arranged in sequence
No feedback
Can be estimated equation by equation with OLS (provided the error terms of different equations are uncorrelated)

🔹 Non-Recursive Models:

Feedback present (mutual dependence)
OLS is biased and inconsistent here (simultaneity bias), so Two-Stage Least Squares (2SLS) or Instrumental Variables (IV) are used for estimation.
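A simulation sketch of why feedback breaks OLS. The data-generating process is hypothetical: X depends on the same shock u that enters Y, while the instrument Z moves X but is unrelated to u. With one endogenous regressor and one instrument, 2SLS reduces to the IV ratio Cov(Z, Y) / Cov(Z, X) used below.

```python
import random
import statistics


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


random.seed(7)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]         # instrument: moves X, unrelated to u
u = [random.gauss(0, 1) for _ in range(n)]         # structural error term
x = [zi + ui for zi, ui in zip(z, u)]              # X is endogenous: Cov(X, u) = 1
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true beta = 2

beta_ols = cov(x, y) / cov(x, x)  # inconsistent: tends to 2 + Cov(X, u) / Var(X) = 2.5
beta_iv = cov(z, y) / cov(z, x)   # IV estimate: tends to the true value 2
print(round(beta_ols, 2), round(beta_iv, 2))
```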

10. Discrete Choice Models

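Used when the dependent variable is categorical, typically binary (0/1, yes/no).

Types:

Logit Model – uses the logistic function: $P(Y = 1) = \frac{1}{1 + e^{-(\alpha + \beta X)}}$
Probit Model – uses the cumulative standard normal distribution: $P(Y = 1) = \Phi(\alpha + \beta X)$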

Example: Probability of employment, adoption of technology, etc.
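A sketch of how each model turns the index α + βX into a probability between 0 and 1. The coefficients α = -2 and β = 0.5 and the "years of schooling" values are hypothetical, and the same index is plugged into both links only to compare their shapes; in practice estimated logit coefficients are larger in scale than probit ones.

```python
from math import erf, exp, sqrt


def logit_prob(index: float) -> float:
    """P(Y = 1) under the logit model: logistic function of alpha + beta X."""
    return 1 / (1 + exp(-index))


def probit_prob(index: float) -> float:
    """P(Y = 1) under the probit model: standard normal CDF of alpha + beta X."""
    return 0.5 * (1 + erf(index / sqrt(2)))


alpha, beta = -2.0, 0.5  # hypothetical coefficients
for x in (0, 4, 8):      # e.g., years of schooling
    idx = alpha + beta * x
    print(x, round(logit_prob(idx), 3), round(probit_prob(idx), 3))
```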

11. Time Series Analysis

Components of Time Series:

Trend (T): Long-term direction.
Seasonal (S): Regular pattern within a year.
Cyclical (C): Wave-like up and down movements lasting more than a year (business cycles).
Irregular (I): Random variations.

Models:

Additive Model: $Y = T + S + C + I$
Multiplicative Model: $Y = T \times S \times C \times I$

Stationarity:

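A series is (weakly) stationary when its mean and variance are constant over time and the covariance between two observations depends only on the lag between them, not on the time at which it is measured.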

Autocorrelation:

Measures the correlation between current and past (lagged) values of the same series.
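A sketch of the sample autocorrelation at lag k, $r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^{2}}$, applied to two hypothetical series: a steady trend and a series that flips sign every period.

```python
def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (deviations from the overall mean)."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    # Numerator pairs each value with the one `lag` periods earlier
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    den = sum(d * d for d in dev)
    return num / den


trend = [1, 2, 3, 4, 5, 6, 7, 8]            # steadily rising series
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # sign flips every period

print(autocorrelation(trend, 1))        # 26.25 / 42 = 0.625
print(autocorrelation(alternating, 1))  # -7 / 8 = -0.875
```

A slowly decaying, strongly positive autocorrelation, as in the trending series, is a common sign of non-stationarity.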

AR, MA, ARMA, ARIMA Models:

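AR(p): current value depends on its own p past values.
MA(q): current value depends on the current and q past random shocks.
ARMA(p, q): combines AR and MA terms.
ARIMA(p, d, q): ARMA applied after differencing the series d times to make it stationary.
Used for forecasting and economic time series modeling.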
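A minimal AR(1) sketch: simulate $Y_t = \phi Y_{t-1} + e_t$ with a hypothetical φ = 0.7 (|φ| < 1 keeps the series stationary), estimate φ by OLS of $Y_t$ on $Y_{t-1}$ without an intercept, and form a one-step-ahead forecast. Helper names, seed, and series length are assumptions.

```python
import random


def simulate_ar1(phi: float, n: int, seed: int = 0) -> list[float]:
    """Generate y_t = phi * y_{t-1} + e_t with standard normal shocks e_t."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y


def fit_ar1(y: list[float]) -> float:
    """OLS estimate of phi from regressing y_t on y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = fit_ar1(series)
forecast = phi_hat * series[-1]  # one-step forecast: E[y_{T+1} | y_T] = phi * y_T
print(round(phi_hat, 3), round(forecast, 3))
```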

🧾 Quick Summary Table

Topic	Key Concept / Formula	Use / Importance
Probability	$P(A) = \frac{f}{n}$ (f favourable out of n equally likely outcomes)	Foundation of statistics
Normal Distribution	Bell-shaped curve	Basis for inference
CLT	Sample mean → normal	Enables hypothesis testing
Correlation	$r \in [-1, +1]$	Strength of linear relationship
Regression	$Y = \alpha + \beta X + u$	Predictive analysis
BLUE	Best Linear Unbiased Estimator	Gauss-Markov theorem
Hypothesis Testing	Z, t, χ², F tests	Decision making
Identification	Can structural parameters be uniquely estimated?	Econometric modeling
Logit/Probit	Binary dependent variable	Discrete choice
Time Series	Trend, Seasonal, Cyclical, Irregular	Forecasting