Statistics

Distributions

Binomial Distribution

Binomial Setting

1. Each observation is either a "success" or "failure"

2. Fixed number n of observations

3. The n observations are independent

4. Probability of success p is the same for each observation

Binomial Probability

Binomial Mean

Binomial Standard Deviation
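
The three binomial quantities above can be sketched in Python. The notes list only their names, so the formulas used here are the standard textbook ones, filled in as an assumption: P(X = k) = C(n, k) p^k (1-p)^(n-k), mean np, and standard deviation √(np(1-p)). The function names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p) (standard formula, assumed)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean(n: int, p: float) -> float:
    """Mean of a Binomial(n, p) count: np."""
    return n * p


def binomial_sd(n: int, p: float) -> float:
    """Standard deviation of a Binomial(n, p) count: sqrt(np(1-p))."""
    return sqrt(n * p * (1 - p))


# Illustrative values: 10 independent observations, success probability 0.3.
n, p = 10, 0.3
print(binomial_pmf(3, n, p), binomial_mean(n, p), binomial_sd(n, p))
```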

Approximating Binomial distribution with Normal distribution

np ≥ 10 and n(1-p) ≥ 10

Accuracy improves as n increases and as p gets closer to 0.5
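
A rough sketch of the Normal approximation, assuming the approximating Normal distribution uses mean np and standard deviation √(np(1-p)) and applies no continuity correction (the notes do not mention one). The helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the notes: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: float, n: int, p: float) -> float:
    """Approximate P(X <= k) with a Normal(np, sqrt(np(1-p))) distribution."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k)


# Illustrative comparison for n = 100, p = 0.5.
if normal_approx_ok(100, 0.5):
    print(binomial_cdf(55, 100, 0.5), normal_approx_cdf(55, 100, 0.5))
```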

Geometric Distribution

Geometric Setting

1. Each observation is either a "success" or "failure"

2. All observations are independent

3. Probability of success p is the same for each observation

4. Variable of interest is number of trials required to obtain first success

Geometric Probability

Geometric Mean

Geometric Standard Deviation

Probability that it takes more than n trials to see the first success
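
The geometric quantities above can be sketched as follows. The formulas are the standard ones, assumed here because the notes give only the headings: P(X = k) = (1-p)^(k-1) p, mean 1/p, standard deviation √(1-p)/p, and P(X > n) = (1-p)^n. Function names are illustrative.

```python
from math import sqrt


def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def geometric_mean(p: float) -> float:
    """Expected number of trials needed to see the first success: 1/p."""
    return 1 / p


def geometric_sd(p: float) -> float:
    """Standard deviation of the number of trials: sqrt(1-p)/p."""
    return sqrt(1 - p) / p


def prob_more_than(n: int, p: float) -> float:
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


# Illustrative values: success probability 0.25 on each trial.
print(geometric_pmf(3, 0.25), geometric_mean(0.25), prob_more_than(3, 0.25))
```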

Poisson Distribution

Poisson Setting

1. Events occur singly and randomly

2. Events occur uniformly

3. Events occur independently

4. Probability of two or more occurrences within a sufficiently small interval is negligible

Poisson Probability

Poisson Mean

Poisson Variance
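
A minimal sketch of the Poisson quantities, assuming the standard formula P(X = k) = e^(-λ) λ^k / k! with mean and variance both equal to λ (the notes list only the headings). The numeric check truncates the infinite sum at a cutoff chosen for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam) (standard formula, assumed)."""
    return exp(-lam) * lam**k / factorial(k)


def poisson_mean_and_variance(lam: float, cutoff: int = 100):
    """Approximate mean and variance by summing the pmf up to `cutoff`."""
    mean = sum(k * poisson_pmf(k, lam) for k in range(cutoff))
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(cutoff))
    return mean, var


# Illustrative rate: on average 2 events per interval.
print(poisson_pmf(2, 2.0))
print(poisson_mean_and_variance(2.0))  # both values should be close to 2
```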

Approximating Binomial distribution with Poisson distribution

n is greater than 50, np is smaller than 5

λ = np

Accuracy improves as n increases and p decreases
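
The Poisson approximation can be checked numerically. This sketch assumes the standard binomial and Poisson formulas and uses the rule of thumb from the notes (n greater than 50, np smaller than 5); the helper names and example values are illustrative.

```python
from math import comb, exp, factorial


def poisson_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the notes: n > 50 and np < 5."""
    return n > 50 and n * p < 5


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Illustrative values: n = 100 and p = 0.02, so lambda = np = 2.
n, p = 100, 0.02
lam = n * p
for k in range(5):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```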

Sampling Distribution

Parameter: Number that describes population

Statistic: Number that describes sample

Center: Sampling distribution should be centered close to true value of parameter (e.g., p)

Unbiased statistic: Mean of sampling distribution is equal to true value of parameter

Variability: Described by spread

Larger samples give smaller spread

Sampling Distribution of Sample Proportions

Mean = p

Standard Deviation

Only used when population is at least 10 times sample size

Normal approximation: np and n(1-p) are at least 10
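
The conditions and the standard deviation for a sample proportion can be sketched as below. The formula √(p(1-p)/n) is the standard one and is an assumption here, since the notes leave the "Standard Deviation" entry blank; function names are hypothetical.

```python
from math import sqrt


def sample_proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


def ten_percent_condition(population_size: int, n: int) -> bool:
    """Population must be at least 10 times the sample size."""
    return population_size >= 10 * n


def normal_condition(p: float, n: int) -> bool:
    """Normal approximation is reasonable when np and n(1-p) are at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Illustrative values: p = 0.5, sample of 100 from a population of 5000.
print(sample_proportion_sd(0.5, 100))
print(ten_percent_condition(5000, 100), normal_condition(0.5, 100))
```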

Sampling Distribution of Sample Mean

Mean = μ

Standard Deviation

Only used when population is at least 10 times sample size

Central Limit Theorem

When n is large (rule of thumb: n is at least 30), the sampling distribution of the sample mean is approximately Normal for any population with finite standard deviation σ
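
A small simulation can illustrate the sample mean's standard deviation and the Central Limit Theorem. It assumes the standard formula σ/√n (the notes leave the entry blank) and draws from an exponential population with mean 1 and standard deviation 1, chosen here only because it is clearly skewed.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_mean_sd(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


# Illustrative run: 2000 samples of size 40.
means = simulate_sample_means(n=40, reps=2000)
print(mean(means))                             # should be near the population mean 1
print(stdev(means), sample_mean_sd(1.0, 40))   # the two should be close
```

A histogram of `means` should look roughly Normal even though the population is skewed.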

t Distribution

Spread is a bit greater than that of standard Normal distribution

Approaches the standard Normal distribution as degrees of freedom increase

Confidence Intervals

Statistic ± Margin of Error

Confidence level C gives the probability that the method produces an interval that captures the true parameter value in repeated samples
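
The "repeated samples" interpretation can be illustrated by simulation. This sketch assumes a Normal population with known σ and a z interval x̄ ± z*σ/√n; the population values and function name are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(mu, sigma, n, conf_level, reps, seed=0):
    """Fraction of simulated z intervals that capture the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Illustrative run: the rate should be close to the confidence level 0.95.
print(coverage_rate(mu=10.0, sigma=2.0, n=25, conf_level=0.95, reps=2000))
```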

Population Mean

Procedure

Step 1: State parameter of interest

Step 2: Name inference procedure & check assumptions/conditions

1. Sample must be an SRS from population of interest

2. Sampling distribution of sample mean is at least approximately Normal (by CLT when n is at least 30)

3. Individual observations are independent, population size at least 10 times sample size

Step 3: Calculate confidence interval

Step 4: Interpret results

σ is known

σ is unknown

One-Sample t Interval

Two-Sample t Interval
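
The intervals for a population mean can be sketched as follows. The formulas are assumptions based on standard practice, since the notes give only the names: x̄ ± z*σ/√n when σ is known, x̄ ± t*s/√n for one sample, and (x̄1 - x̄2) ± t*√(s1²/n1 + s2²/n2) for two samples (unpooled). The critical value t* is passed in from a t table because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import NormalDist


def z_star(conf_level: float) -> float:
    """Critical value z* for a given confidence level C."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)


def z_interval(xbar, sigma, n, conf_level):
    """Interval for mu when sigma is known."""
    margin = z_star(conf_level) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def one_sample_t_interval(xbar, s, n, t_star):
    """Interval for mu when sigma is unknown; t_star has n - 1 degrees of freedom."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin


def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Interval for mu1 - mu2 using the unpooled standard error (assumed)."""
    margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Illustrative values; t* = 2.064 is the 95% table value for 24 degrees of freedom.
print(z_interval(50, 10, 100, 0.95))
print(one_sample_t_interval(50, 10, 25, t_star=2.064))
```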

Population Proportion

Procedure

Step 1: State parameter of interest

Step 2: Name inference procedure & check assumptions/conditions

1. Sample must be an SRS from population of interest

2. Sampling distribution of sample proportion is at least approximately Normal (np̂ and n(1-p̂) are at least 10)

3. Individual observations are independent, population size at least 10 times sample size

Step 3: Calculate confidence interval

Step 4: Interpret results

One-Proportion z Interval

Two-Proportion z Interval

Normality Condition
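
A sketch of the proportion intervals, assuming the standard formulas p̂ ± z*√(p̂(1-p̂)/n) and (p̂1 - p̂2) ± z*√(p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2). The notes do not state the two-sample Normality condition, so the check below assumes that the success and failure counts in each sample must reach a threshold; the default of 10 matches the other rules in these notes, though some texts use 5.

```python
from math import sqrt
from statistics import NormalDist


def z_star(conf_level: float) -> float:
    """Critical value z* for a given confidence level C."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)


def one_proportion_z_interval(p_hat, n, conf_level):
    """Interval for a population proportion p."""
    margin = z_star(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def two_proportion_z_interval(p_hat1, n1, p_hat2, n2, conf_level):
    """Interval for the difference p1 - p2."""
    se = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    margin = z_star(conf_level) * se
    diff = p_hat1 - p_hat2
    return diff - margin, diff + margin


def two_sample_normality_ok(x1, n1, x2, n2, threshold=10):
    """Assumed condition: successes and failures in both samples >= threshold."""
    return min(x1, n1 - x1, x2, n2 - x2) >= threshold


# Illustrative values: 60 of 100 successes versus 40 of 100.
print(one_proportion_z_interval(0.6, 100, 0.95))
print(two_proportion_z_interval(0.6, 100, 0.4, 100, 0.95))
```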

Significance Tests

Procedure

Step 1: State hypotheses

Null Hypothesis H0 - No effect or change in population

Alternative Hypothesis Ha - Claim about population we are trying to find evidence for

Step 2: Name inference procedure & check assumptions/conditions

1. Sample must be an SRS from population of interest

2. Sampling distribution of the statistic (sample mean or sample proportion) is at least approximately Normal

Means: Normal distribution or large sample size (n is at least 30)

Proportions: np and n(1-p) are at least 10

3. Individual observations are independent, population size at least 10 times sample size

Step 3: Calculate test statistic & find P-value

Test Statistic

P-values

Probability, computed assuming H0 is true, that the test statistic would take a value as extreme as or more extreme than the value actually observed

Step 4: Interpret results

Statistical Significance

Data are statistically significant at level α if the P-value is less than or equal to α
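
For a z test statistic, the P-value and the significance decision can be sketched like this. The labels "greater", "less", and "two-sided" for the direction of Ha are naming choices made for this example, not taken from the notes.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """P-value of a z statistic under the standard Normal distribution."""
    phi = NormalDist().cdf
    if alternative == "greater":      # Ha: parameter > null value
        return 1 - phi(z)
    if alternative == "less":         # Ha: parameter < null value
        return phi(z)
    if alternative == "two-sided":    # Ha: parameter != null value
        return 2 * (1 - phi(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


def is_significant(p_value: float, alpha: float) -> bool:
    """Statistically significant at level alpha when P-value <= alpha."""
    return p_value <= alpha


# Illustrative value: z = 2.1 tested two-sided at alpha = 0.05.
p = p_value_from_z(2.1)
print(p, is_significant(p, 0.05))
```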

2-Sided Tests

A level α two-sided test rejects the null hypothesis H0: μ = μ0 exactly when μ0 falls outside a level (1-α) confidence interval for μ

Errors

Type I

Reject null hypothesis when it is actually true

P(Type I) = α

Type II

Do not reject null hypothesis when it is actually false

P(Type II) = β

Power

1-β

Increasing Power

Increase α

Consider alternative farther away from μ0

Increase n

Decrease σ
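
The four ways of increasing power can be checked with a small calculation. This sketch assumes a one-sided z test of H0: μ = μ0 against Ha: μ > μ0 with known σ, where the power at a specific alternative μa is 1 - Φ(z_α - (μa - μ0)/(σ/√n)); the function name and numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu0, mu_alt, sigma, n, alpha):
    """Power of the level-alpha z test of H0: mu = mu0 vs Ha: mu > mu0 at mu_alt."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)           # reject H0 when z > z_crit
    shift = (mu_alt - mu0) / (sigma / sqrt(n))       # alternative in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)


base = power_one_sided_z(mu0=0.0, mu_alt=0.5, sigma=1.0, n=25, alpha=0.05)
print("baseline power:", base, "beta:", 1 - base)
print("larger alpha:", power_one_sided_z(0.0, 0.5, 1.0, 25, 0.10))
print("farther alternative:", power_one_sided_z(0.0, 0.8, 1.0, 25, 0.05))
print("larger n:", power_one_sided_z(0.0, 0.5, 1.0, 50, 0.05))
print("smaller sigma:", power_one_sided_z(0.0, 0.5, 0.8, 25, 0.05))
```

Each of the four changes should print a power larger than the baseline.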

One-Sample z Statistic

One-Sample t Statistic

One-Proportion z Statistic

np0 and n(1-p0) are both at least 10
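
The one-sample statistics can be sketched as below. The formulas are the standard ones and are assumed here, since the notes list only the names: z = (x̄ - μ0)/(σ/√n), t = (x̄ - μ0)/(s/√n) with n - 1 degrees of freedom, and z = (p̂ - p0)/√(p0(1-p0)/n).

```python
from math import sqrt


def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for H0: mu = mu0 when sigma is known."""
    return (xbar - mu0) / (sigma / sqrt(n))


def one_sample_t(xbar, mu0, s, n):
    """t statistic for H0: mu = mu0; returns (t, degrees of freedom)."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0; the standard error uses p0, not p_hat."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def one_proportion_condition(p0, n):
    """Condition from the notes: np0 and n(1-p0) are both at least 10."""
    return n * p0 >= 10 and n * (1 - p0) >= 10


# Illustrative values.
print(one_sample_z(52, 50, 10, 25))
print(one_sample_t(52, 50, 10, 25))
print(one_proportion_z(0.6, 0.5, 100), one_proportion_condition(0.5, 100))
```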

Two-Sample z Statistic

Two-Sample t Statistic

Two-Proportion z Statistic

Normality Condition
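
A sketch of the two-sample statistics for testing no difference between groups. The notes give only the names, so these formulas are assumptions based on common textbook practice: the two-sample z and t statistics use the unpooled standard error √(s1²/n1 + s2²/n2) (with σ in place of s for z), and the two-proportion z statistic uses the pooled proportion p̂c = (x1 + x2)/(n1 + n2).

```python
from math import sqrt


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z statistic for H0: mu1 = mu2 with known standard deviations."""
    return (xbar1 - xbar2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def two_sample_t(xbar1, xbar2, s1, s2, n1, n2):
    """t statistic for H0: mu1 = mu2 using sample standard deviations (unpooled)."""
    return (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 from success counts x1, x2 (pooled proportion)."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / se


# Illustrative values.
print(two_sample_t(10, 8, 3, 4, 9, 16))
print(two_proportion_z(60, 100, 40, 100))
```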

Chi-Square Procedures

Chi-Square Test Statistic

Chi-Square Test for Goodness of Fit

Condition: Expected counts are at least 5
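
A goodness-of-fit calculation can be sketched as follows, assuming the standard statistic χ² = Σ (observed - expected)² / expected with (number of categories - 1) degrees of freedom. The notes do not give the formula, and the counts below are made up for illustration.

```python
def expected_counts(n: int, proportions: list) -> list:
    """Expected count in each category under the hypothesized proportions."""
    return [n * p for p in proportions]


def chi_square_statistic(observed: list, expected: list) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_condition(expected: list) -> bool:
    """Condition from the notes: all expected counts are at least 5."""
    return all(e >= 5 for e in expected)


# Illustrative data: 80 observations, hypothesized proportions 25%, 25%, 50%.
observed = [18, 22, 40]
expected = expected_counts(sum(observed), [0.25, 0.25, 0.5])
print(chi_square_statistic(observed, expected), len(observed) - 1)
print(goodness_of_fit_condition(expected))
```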

Chi-Square Test for Homogeneity of Populations

Inference for Two-Way Tables

(r-1)(c-1) degrees of freedom

Condition: No more than 20% of expected counts are less than 5 and all individual expected counts are at least 1

Expected Count
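
For a two-way table, the expected counts, the statistic, and the conditions above can be sketched as below. The expected count formula (row total × column total / table total) is the standard one, assumed here because the notes leave the entry blank; the table values are invented.

```python
def expected_table(table: list) -> list:
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_two_way(table: list) -> float:
    """Chi-square statistic summed over every cell of the table."""
    expected = expected_table(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table: list) -> int:
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


def two_way_condition(table: list) -> bool:
    """No more than 20% of expected counts below 5, and all at least 1."""
    cells = [e for row in expected_table(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) <= 0.20 and all(e >= 1 for e in cells)


# Illustrative 2 x 2 table of observed counts.
table = [[10, 20], [30, 40]]
print(expected_table(table))
print(chi_square_two_way(table), degrees_of_freedom(table), two_way_condition(table))
```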

Chi-Square Test for Association/Independence