Statistics
Distributions
Binomial Distribution
Binomial Setting
1. Each observation is either a "success" or "failure"
2. Fixed number n of observations
3. The n observations are independent
4. Probability of success p is the same for each observation
Binomial Probability: P(X = k) = C(n, k) p^k (1-p)^(n-k), where C(n, k) = n!/(k!(n-k)!)
Binomial Mean: μ = np
Binomial Standard Deviation: σ = √(np(1-p))
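A minimal Python sketch of the binomial probability, mean and standard deviation, using only the standard library (`math.comb` needs Python 3.8+). The function names are illustrative and not taken from any statistics package.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean(n: int, p: float) -> float:
    """μ = np"""
    return n * p

def binomial_sd(n: int, p: float) -> float:
    """σ = √(np(1-p))"""
    return sqrt(n * p * (1 - p))

# Example: 10 fair coin flips, probability of exactly 3 heads
print(binomial_pmf(3, 10, 0.5))   # 120/1024 = 0.1171875
print(binomial_mean(10, 0.5))     # 5.0
print(binomial_sd(10, 0.5))       # √2.5 ≈ 1.5811
```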
Approximating Binomial distribution with Normal distribution
np ≥ 10 and n(1-p) ≥ 10
Accuracy improves as n increases and as p gets closer to 0.5
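The sketch below compares an exact binomial probability with its Normal approximation N(np, √(np(1-p))) after checking the np ≥ 10 and n(1-p) ≥ 10 rule of thumb. It assumes Python 3.8+ for `statistics.NormalDist`; no continuity correction is applied, to match the notes.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def normal_approx_cdf(k: float, n: int, p: float) -> float:
    """Approximate P(X <= k) using a Normal with mean np and SD √(np(1-p))."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k)

n, p, k = 100, 0.5, 55
print(normal_approx_ok(n, p))      # True: np = n(1-p) = 50
print(binomial_cdf(k, n, p))       # exact probability
print(normal_approx_cdf(k, n, p))  # Normal approximation
```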
Geometric Distribution
Geometric Setting
1. Each observation is either a "success" or "failure"
2. All observations are independent
3. Probability of success p is the same for each observation
4. Variable of interest is the number of trials required to obtain the first success
Geometric Probability: P(X = n) = (1-p)^(n-1) p
Geometric Mean: μ = 1/p
Geometric Standard Deviation: σ = √(1-p)/p
Probability that it takes more than n trials to see the first success: P(X > n) = (1-p)^n
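A small Python sketch of the geometric formulas, where X counts the trials up to and including the first success. The die-rolling example is illustrative only.

```python
from math import sqrt

def geometric_pmf(n: int, p: float) -> float:
    """P(X = n): first success on trial n, i.e. n-1 failures then a success."""
    return (1 - p) ** (n - 1) * p

def geometric_mean(p: float) -> float:
    """μ = 1/p"""
    return 1 / p

def geometric_sd(p: float) -> float:
    """σ = √(1-p)/p"""
    return sqrt(1 - p) / p

def geometric_more_than(n: int, p: float) -> float:
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

# Rolling a die until the first six (p = 1/6)
p = 1 / 6
print(geometric_pmf(3, p))         # (5/6)^2 (1/6) = 25/216 ≈ 0.1157
print(geometric_mean(p))           # ≈ 6
print(geometric_more_than(3, p))   # (5/6)^3 = 125/216 ≈ 0.5787
```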
Poisson Distribution
Poisson Setting
1. Events occur singly and randomly
2. Events occur uniformly (at a constant average rate)
3. Events occur independently
4. Probability of more than one occurrence within a very small interval is negligible
Poisson Probability: P(X = k) = e^(-λ) λ^k / k!, where λ is the mean number of events per interval
Poisson Mean: μ = λ
Poisson Variance: σ² = λ
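A Python sketch of the Poisson probability, with a numerical check that the mean and the variance both equal λ. The infinite sums are truncated at k = 59, an assumption that is harmless for λ = 2 because the remaining tail is negligible.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-λ) λ^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
print(poisson_pmf(0, lam))   # e^-2 ≈ 0.1353
print(poisson_pmf(3, lam))   # e^-2 · 8/6 ≈ 0.1804

# Mean and variance from a truncated sum over k = 0..59
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean, 6), round(var, 6))   # 2.0 2.0
```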
Approximating Binomial distribution with Poisson distribution
n is greater than 50, np is smaller than 5
λ = np
Accuracy improves as n increases and p decreases
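The sketch below compares Binomial(n, p) probabilities with Poisson(λ = np) probabilities for n = 200 and p = 0.01, which satisfy n > 50 and np < 5. The parameter values are illustrative choices.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

n, p = 200, 0.01   # n > 50 and np = 2 < 5
lam = n * p        # λ = np
for k in range(5):
    # Columns: k, exact binomial probability, Poisson approximation
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

Keeping λ = np fixed while increasing n (for example n = 2000, p = 0.001) brings the two columns closer together.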
Sampling Distribution
Parameter: Number that describes population
Statistic: Number that describes sample
Center of sampling distribution should be close to the true value of the parameter
Unbiased statistic: Mean of sampling distribution is equal to true value of parameter
Variability: Described by the spread of the sampling distribution
Larger samples give smaller spread
Sampling Distribution of Sample Proportions
Mean = p
Standard Deviation = √(p(1-p)/n)
Standard deviation formula is only valid when population is at least 10 times sample size
Normal approximation: np and n(1-p) are at least 10
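A Python sketch that returns the mean and standard deviation of the sample proportion p̂ together with the two rule-of-thumb checks. The function name and the population size used in the example are hypothetical.

```python
from math import sqrt
from typing import Optional

def phat_sampling_distribution(p: float, n: int, population_size: Optional[int] = None):
    """Mean and SD of the sample proportion p̂, plus the two rule-of-thumb checks."""
    mean = p                          # p̂ is an unbiased estimator of p
    sd = sqrt(p * (1 - p) / n)
    # The SD formula is trusted only if the population is at least 10 times the sample
    sd_ok = population_size is None or population_size >= 10 * n
    # Normal approximation: np and n(1-p) are at least 10
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, sd_ok, normal_ok

print(phat_sampling_distribution(0.6, 100, population_size=50_000))
# Quadrupling the sample size halves the spread
print(phat_sampling_distribution(0.6, 400)[1])
```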
Sampling Distribution of Sample Mean
Mean = μ
Standard Deviation = σ/√n
Standard deviation formula is only valid when population is at least 10 times sample size
Central Limit Theorem
For large n (rule of thumb: n ≥ 30), the sampling distribution of the sample mean is approximately Normal for any population with finite standard deviation σ
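A small simulation sketch of the Central Limit Theorem, assuming a strongly right-skewed Exponential(rate 1) population (mean 1, SD 1) as an illustrative choice. It draws many samples of size n = 40 and checks the center, spread and approximate Normality of the sample means.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)
mu, sigma = 1.0, 1.0   # Exponential(rate 1) population: mean 1, SD 1, right-skewed
n, reps = 40, 5000

# Draw many samples of size n and record each sample mean
means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(fmean(means), mu)                 # center of the sample means vs μ
print(pstdev(means), sigma / sqrt(n))   # spread vs σ/√n ≈ 0.158

# If the sampling distribution is close to Normal, about 95% of the sample
# means should land within 1.96 σ/√n of μ
inside = sum(abs(m - mu) <= 1.96 * sigma / sqrt(n) for m in means) / reps
print(inside)                           # roughly 0.95
```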
t Distribution
Spread is slightly greater than that of the standard Normal distribution (heavier tails)
Approaches the standard Normal distribution as the degrees of freedom increase
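The sketch below evaluates the Student's t density in the tail (x = 3) and compares it with the standard Normal density for several degrees of freedom. It uses `math.lgamma` to avoid overflow for large df; the function name `t_pdf` is illustrative.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom.
    Log-gamma keeps the constant finite even for large df."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

Z = NormalDist()
for df in (2, 5, 30, 1000):
    # In the tail the t density exceeds the Normal density (heavier tails),
    # and the gap shrinks as df grows
    print(df, round(t_pdf(3.0, df), 5), round(Z.pdf(3.0), 5))
```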
Confidence Intervals
Statistic ± Margin of Error
Confidence level C: In repeated samples, a proportion C of the intervals produced by the method capture the true parameter value
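A simulation sketch of what the confidence level means. It assumes a Normal population with known σ (illustrative values μ = 50, σ = 10) so that the interval is x̄ ± z* σ/√n, builds 2000 intervals from repeated samples, and counts how often they capture μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)
mu, sigma, n, C = 50.0, 10.0, 25, 0.95
z_star = NormalDist().inv_cdf((1 + C) / 2)   # ≈ 1.96 for C = 0.95
margin = z_star * sigma / sqrt(n)             # margin of error (σ known)

reps, hits = 2000, 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    # statistic ± margin of error: does this interval capture the true μ?
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

print(hits / reps)   # close to C = 0.95
```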
Population Mean
Procedure
Step 1: State parameter of interest
Step 2: Name inference procedure & check assumptions/conditions
1. Sample must be an SRS from population of interest
2. Sampling distribution of sample mean is at least approximately Normal (Normal population, or n ≥ 30 so the CLT applies)
3. Individual observations are independent; when sampling without replacement, population size is at least 10 times sample size
Step 3: Calculate confidence interval
Step 4: Interpret results
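σ is known: One-sample z interval, x̄ ± z* σ/√n
σ is unknown: Use a t interval, replacing σ with the sample standard deviation s
One-Sample t Interval: x̄ ± t* s/√n, with n - 1 degrees of freedom
Two-Sample t Interval: (x̄1 - x̄2) ± t* √(s1²/n1 + s2²/n2)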
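A Python sketch of the three intervals for means. The standard library has no t quantile function, so t* is passed in as an argument; 2.262 is the usual table value for 95% confidence with df = 9. For two samples, a common conservative choice is df = the smaller of n1 - 1 and n2 - 1 (software uses a more exact formula). The sample data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def z_interval(xbar, sigma, n, C=0.95):
    """σ known: x̄ ± z* σ/√n"""
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    m = z_star * sigma / sqrt(n)
    return xbar - m, xbar + m

def one_sample_t_interval(data, t_star):
    """σ unknown: x̄ ± t* s/√n; t* comes from a table or software with df = n - 1."""
    n = len(data)
    xbar, s = fmean(data), stdev(data)
    m = t_star * s / sqrt(n)
    return xbar - m, xbar + m

def two_sample_t_interval(data1, data2, t_star):
    """(x̄1 - x̄2) ± t* √(s1²/n1 + s2²/n2)"""
    diff = fmean(data1) - fmean(data2)
    se = sqrt(stdev(data1) ** 2 / len(data1) + stdev(data2) ** 2 / len(data2))
    return diff - t_star * se, diff + t_star * se

# Hypothetical measurements, n = 10, so df = 9
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]
print(one_sample_t_interval(sample, t_star=2.262))
print(z_interval(xbar=100, sigma=15, n=36))   # about (95.1, 104.9)
```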
Population Proportion
Procedure
Step 1: State parameter of interest
Step 2: Name inference procedure & check assumptions/conditions
1. Sample must be an SRS from population of interest
2. Sampling distribution of sample proportion is at least approximately Normal (np̂ and n(1-p̂) are at least 10)
3. Individual observations are independent; when sampling without replacement, population size is at least 10 times sample size
Step 3: Calculate confidence interval
Step 4: Interpret results
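One-Proportion z Interval: p̂ ± z* √(p̂(1-p̂)/n)
Two-Proportion z Interval: (p̂1 - p̂2) ± z* √(p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2)
Normality Condition: n1p̂1, n1(1-p̂1), n2p̂2, and n2(1-p̂2) are all at least 10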
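A Python sketch of the one- and two-proportion z intervals. It takes success counts x rather than p̂, so n·p̂ = x and the Normality check becomes "at least 10 successes and 10 failures". Function names and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_star(C: float) -> float:
    """Critical value z* for confidence level C."""
    return NormalDist().inv_cdf((1 + C) / 2)

def check_counts(x: int, n: int) -> None:
    # n·p̂ = x and n(1-p̂) = n - x must both be at least 10
    if x < 10 or n - x < 10:
        raise ValueError("Normal condition fails: need at least 10 successes and 10 failures")

def one_proportion_z_interval(x: int, n: int, C: float = 0.95):
    """p̂ ± z* √(p̂(1-p̂)/n)"""
    check_counts(x, n)
    p_hat = x / n
    m = z_star(C) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m

def two_proportion_z_interval(x1: int, n1: int, x2: int, n2: int, C: float = 0.95):
    """(p̂1 - p̂2) ± z* √(p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2)"""
    check_counts(x1, n1)
    check_counts(x2, n2)
    p1, p2 = x1 / n1, x2 / n2
    m = z_star(C) * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - m, (p1 - p2) + m

print(one_proportion_z_interval(520, 1000))          # about (0.489, 0.551)
print(two_proportion_z_interval(60, 200, 45, 200))   # centered at 0.075
```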
Significance Tests
Procedure
Step 1: State hypotheses
Null Hypothesis H0 - No effect or change in population
Alternative Hypothesis Ha - Claim about population we are trying to find evidence for
Step 2: Name inference procedure & check assumptions/conditions
1. Sample must be an SRS from population of interest
2. Sampling distribution of the statistic is at least approximately Normal
Means: population is Normal or sample size is large (n ≥ 30)
Proportions: np0 and n(1-p0) are at least 10
3. Individual observations are independent; when sampling without replacement, population size is at least 10 times sample size
Step 3: Calculate test statistic & find P-value
Test Statistic: (statistic - parameter) / (standard deviation of statistic)
P-values
Probability, assuming H0 is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed
Step 4: Interpret results
Statistical Significance
Data are statistically significant at level α if the P-value is as small as or smaller than α
2-Sided Tests
A level α two-sided test of H0: μ = μ0 rejects the null hypothesis exactly when μ0 falls outside a level (1-α) confidence interval for μ
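A Python sketch checking this duality for the one-sample z procedures (σ known). For a range of hypothesized values μ0, it compares the two-sided test decision with whether μ0 lies outside the (1-α) interval. The numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test_rejects(xbar, mu0, sigma, n, alpha=0.05):
    """Reject H0: μ = μ0 when the two-sided P-value is at most α."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def z_interval(xbar, sigma, n, C):
    m = NormalDist().inv_cdf((1 + C) / 2) * sigma / sqrt(n)
    return xbar - m, xbar + m

xbar, sigma, n, alpha = 103.0, 15.0, 36, 0.05
lo, hi = z_interval(xbar, sigma, n, 1 - alpha)   # about (98.1, 107.9)
for mu0 in (95, 98, 100, 103, 106, 108):
    outside = not (lo <= mu0 <= hi)
    # The two columns always agree
    print(mu0, two_sided_z_test_rejects(xbar, mu0, sigma, n, alpha), outside)
```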
Errors
Type I
Reject null hypothesis when it is actually true
P(Type I) = α
Type II
Do not reject null hypothesis when it is actually false
P(Type II) = β
Power
1-β
Increasing Power
Increase α
Consider an alternative farther away from μ0
Increase n
Decrease σ
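A Python sketch of power for a one-sided z test of H0: μ = μ0 against Ha: μ > μ0 with σ known, evaluated at a specific alternative μa. Power is P(reject H0 | μ = μa) = 1 - β. The baseline numbers are illustrative; each variation applies one of the four ways to increase power.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the level-α test of H0: μ = μ0 vs Ha: μ > μ0 at μ = mu_a (σ known)."""
    Z = NormalDist()
    z_crit = Z.inv_cdf(1 - alpha)              # reject H0 when z >= z_crit
    shift = (mu_a - mu0) / (sigma / sqrt(n))   # where z is centered under μ = mu_a
    return 1 - Z.cdf(z_crit - shift)

base = {"mu0": 100, "mu_a": 103, "sigma": 10, "n": 25, "alpha": 0.05}
print("base", round(power_one_sided_z(**base), 3))
# Increase α, move the alternative farther from μ0, increase n, decrease σ
for change in ({"alpha": 0.10}, {"mu_a": 105}, {"n": 100}, {"sigma": 5}):
    print(change, round(power_one_sided_z(**{**base, **change}), 3))
```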
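One-Sample z Statistic: z = (x̄ - μ0) / (σ/√n)
One-Sample t Statistic: t = (x̄ - μ0) / (s/√n), with n - 1 degrees of freedom
One-Proportion z Statistic: z = (p̂ - p0) / √(p0(1-p0)/n)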
np0 and n(1-p0) are both at least 10
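A Python sketch of the one-sample test statistics, plus a helper that turns a standard Normal statistic into a P-value computed under H0. The t P-value is left to a table or software because the standard library has no t distribution; all function names and example counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def p_value_from_z(z, alternative="two-sided"):
    """P-value, assuming H0 is true, for a standard Normal test statistic."""
    Z = NormalDist()
    if alternative == "greater":
        return 1 - Z.cdf(z)
    if alternative == "less":
        return Z.cdf(z)
    return 2 * (1 - Z.cdf(abs(z)))

def one_sample_z(xbar, mu0, sigma, n):
    """z = (x̄ - μ0) / (σ/√n)"""
    return (xbar - mu0) / (sigma / sqrt(n))

def one_sample_t(data, mu0):
    """Returns (t, df) with t = (x̄ - μ0) / (s/√n) and df = n - 1."""
    n = len(data)
    return (fmean(data) - mu0) / (stdev(data) / sqrt(n)), n - 1

def one_proportion_z(x, n, p0):
    """z = (p̂ - p0) / √(p0(1-p0)/n), after checking np0 and n(1-p0) are at least 10."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1-p0) >= 10")
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(x=560, n=1000, p0=0.5)   # p̂ = 0.56, Ha: p > 0.5
print(round(z, 3), p_value_from_z(z, "greater"))
```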
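Two-Sample z Statistic: z = ((x̄1 - x̄2) - (μ1 - μ2)) / √(σ1²/n1 + σ2²/n2)
Two-Sample t Statistic: t = ((x̄1 - x̄2) - (μ1 - μ2)) / √(s1²/n1 + s2²/n2)
Two-Proportion z Statistic: z = (p̂1 - p̂2) / √(p̂c(1-p̂c)(1/n1 + 1/n2)), where p̂c = (X1 + X2)/(n1 + n2) is the pooled proportion and X1, X2 are the success counts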
Normality Condition
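A Python sketch of the two-sample statistics. Here `delta0` is the hypothesized μ1 - μ2 (usually 0), and the conservative df = min(n1, n2) - 1 is one common convention rather than the only choice. The two-proportion statistic uses the pooled proportion for H0: p1 = p2.

```python
from math import sqrt
from statistics import fmean, stdev

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z = ((x̄1 - x̄2) - (μ1 - μ2)) / √(σ1²/n1 + σ2²/n2), σ1 and σ2 known."""
    return ((xbar1 - xbar2) - delta0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def two_sample_t(data1, data2, delta0=0.0):
    """Returns (t, conservative df) with t = ((x̄1 - x̄2) - δ0) / √(s1²/n1 + s2²/n2)."""
    se = sqrt(stdev(data1) ** 2 / len(data1) + stdev(data2) ** 2 / len(data2))
    t = ((fmean(data1) - fmean(data2)) - delta0) / se
    return t, min(len(data1), len(data2)) - 1

def two_proportion_z(x1, n1, x2, n2):
    """Test of H0: p1 = p2 using the pooled proportion p̂c = (x1 + x2)/(n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))

print(two_sample_t([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))   # t ≈ -1.0 with df = 3
print(two_proportion_z(60, 200, 45, 200))
```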
Chi-Square Procedures
Chi-Square Test Statistic: χ² = Σ (observed - expected)² / expected
Chi-Square Test for Goodness of Fit
Condition: All expected counts are at least 5
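A Python sketch of the goodness-of-fit computation for a hypothetical test of whether a die is fair, using 120 made-up rolls. The degrees of freedom for k categories are k - 1; converting χ² to a P-value needs a chi-square table or software.

```python
def chi_square_statistic(observed, expected):
    """χ² = Σ (observed - expected)² / expected"""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 120 rolls; under H0 each face is expected 20 times
observed = [25, 17, 15, 23, 24, 16]
expected = [120 / 6] * 6
assert all(e >= 5 for e in expected)   # condition: all expected counts at least 5

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                  # k - 1 for k categories
print(round(chi2, 6), df)               # 5.0 5
```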
Chi-Square Test for Homogeneity of Populations
Inference for Two-Way Tables
(r-1)(c-1) degrees of freedom
Condition: No more than 20% of expected counts are less than 5 and all individual expected counts are at least 1
Expected Count = (row total × column total) / table total
Chi-Square Test for Association/Independence
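A Python sketch of the two-way table calculation, which is shared by the homogeneity and association/independence tests (they differ in how the data are collected and how the hypotheses are stated). It computes expected counts, χ², the (r-1)(c-1) degrees of freedom, and the expected-count condition. The 2×2 table is made up for illustration.

```python
def expected_counts(table):
    """Expected count = row total × column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Returns (χ², degrees of freedom, whether the expected-count condition holds)."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    r, c = len(table), len(table[0])
    flat = [e for row in expected for e in row]
    # No more than 20% of expected counts below 5, and all at least 1
    conditions_ok = all(e >= 1 for e in flat) and sum(e < 5 for e in flat) <= 0.2 * len(flat)
    return chi2, (r - 1) * (c - 1), conditions_ok

table = [[30, 20],
         [20, 30]]
print(chi_square_two_way(table))   # (4.0, 1, True)
```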