Unit 1 Summary Notes

Activity 1: Exploring Data

Other Definitions

Population: A complete group that can be the subject of a study, in which all members share one common attribute.

Data: A physical representation of information from which a conclusion can be drawn.

Sample: A smaller, more manageable subset of a population, whose size is determined using a statistical calculation.

Frequency: The number of times a particular data value occurs.

Frequency Tables: A table that shows how many times each value in a data set occurs.
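A minimal sketch in Python, assuming a small made-up data set: the standard library's collections.Counter tallies how often each value occurs, which is exactly the information a frequency table displays.

```python
from collections import Counter

# Hypothetical data: number of pets reported by 12 students (made-up values)
responses = [0, 1, 2, 1, 0, 3, 1, 2, 1, 0, 2, 1]

# Counter tallies how many times each value occurs (its frequency)
frequency = Counter(responses)

# Print the table with values in increasing order
print("Value | Frequency")
for value in sorted(frequency):
    print(f"{value:5} | {frequency[value]}")
```

The frequencies always add up to the number of data values (12 here), which is a quick check that nothing was missed.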

Class Interval: A range of values of a set numerical width into which data can be grouped when recording frequencies.
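A minimal sketch of grouping data into class intervals, assuming made-up whole-number test scores, a class width of 10, and a first interval starting at 50 (all chosen only for illustration).

```python
# Hypothetical test scores (made-up values)
scores = [52, 67, 71, 74, 58, 89, 93, 77, 65, 81, 70, 99]
width = 10   # class width (assumed for this example)
start = 50   # lower bound of the first class interval (assumed)

counts = {}
for score in scores:
    # Integer division finds the lower bound of the interval the score falls into
    lower = start + (score - start) // width * width
    counts[lower] = counts.get(lower, 0) + 1

# Labels like 50-59 assume whole-number scores
for lower in sorted(counts):
    print(f"{lower}-{lower + width - 1}: {counts[lower]}")
```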

Causal Relationship: A relationship in which one variable directly influences the other in a set of data.

Statistics: The branch of science/mathematics concerned with conducting studies, collecting and organizing data, summarizing and analyzing it, and drawing conclusions.

Helpful Tips

A general rule for qualitative versus quantitative variables is: if math can be applied to it, then it is quantitative.

Even though numbers are occasionally assigned to qualitative data (for example, males being coded as “1” and females as “2”), the variable is still qualitative.

Discrete variables can be thought of as values like 1 and 2, while continuous variables also include every value in between (such as 1.5 or 1.999999).

Nominal sounds like “name”, which is a reminder that nominal values are just labels.


Types of Variables

Important Definitions

Qualitative Variables: A qualitative variable is sometimes referred to as a categorical variable, as its values fall into categories (which may or may not have a natural order). These variables are not numerical, and arithmetic cannot meaningfully be performed on them.

Quantitative Variables: A quantitative variable is sometimes referred to as a numerical variable, as it represents a measurable quantity. To create a scatterplot, both variables need to be quantitative.

Discrete Variables: A discrete variable is a numerical variable that can take only separate, countable values rather than every value in a range. If a quantity is found by counting, it is a discrete variable.

Continuous Variables: If a variable can take any value within a range, meaning there are infinitely many possible values, then it is a continuous variable. A continuous variable is found by measuring, and can take any value between two numbers.

Nominal Measurement: Labelling variables that do not have any quantitative value. Because of this, nominal scales can simply be referred to as labels. These labels are mutually exclusive, which means that they do not overlap.

Ordinal Measurement: The order of the values is important, but the differences between the values are not known. It is typically used to measure non-numerical concepts, such as feelings or levels of satisfaction.

Interval Measurement: In this numeric measurement, both the order of the values and the differences between them are known. A common example is temperature: the difference between 60 degrees and 50 degrees is 10 degrees, and the two temperatures can be ordered.

Activity 2: Sampling Techniques

Important Definitions

Simple Random Sampling: In this type of sampling, every individual in the population has an equal chance of being chosen, which helps make the sample representative of the population. A complete list of the population is needed: each individual is assigned a number, and numbers are then chosen at random to produce the sample. Because a full list is difficult to obtain, this method is hard to carry out for large populations, and it is used less often than other methods.
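A minimal sketch of the numbering step, assuming a hypothetical population of 500 people already numbered 1 to 500. Python's random.sample picks distinct numbers, each equally likely, which mirrors drawing numbers from a hat.

```python
import random

# Hypothetical population list: each individual has been assigned a number 1..500
population = list(range(1, 501))

random.seed(1)  # fixed seed only so the example is repeatable
# random.sample picks 25 distinct numbers; every individual is equally likely to be chosen
sample = random.sample(population, 25)
print(sorted(sample))
```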

Systematic Random Sampling: This type of sampling starts with the population arranged in an order, for example by last name. Once the ordered list is obtained, a random starting point is chosen and individuals are then selected at regular intervals. The interval is found by dividing the population size by the desired sample size; starting from the random starting point, count forward by the interval and select the individual at the end of each count until the sample is complete.
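A sketch of systematic sampling under stated assumptions: the population list is already ordered, the interval is the population size divided by the sample size rounded down, and the random start falls within the first interval. The function name systematic_sample and the student names are hypothetical.

```python
import random

def systematic_sample(ordered_population, sample_size):
    """Select every k-th individual from a random start, where k = population size // sample size."""
    # INTERVAL = population size / sample size, rounded down here (an assumption)
    interval = len(ordered_population) // sample_size
    # Random starting point inside the first interval
    start = random.randrange(interval)
    # Count forward by the interval from the starting point
    return [ordered_population[start + i * interval] for i in range(sample_size)]

# Hypothetical population, already ordered alphabetically
names = sorted(f"Student {i:03d}" for i in range(1, 101))
random.seed(7)  # fixed seed only so the example is repeatable
chosen = systematic_sample(names, 10)
print(chosen)
```

With 100 names and a sample of 10, the interval is 10, so the chosen names are spaced exactly 10 apart in the list.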

Stratified Random Sampling: For this type of sampling, the population is divided into groups that share a common characteristic; these groups are called strata. Once the strata are identified, a simple random sample is taken from each stratum, usually in proportion to the stratum's size. This method is helpful for large populations that contain distinct groups.
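A sketch of proportional stratified sampling, assuming a hypothetical school of 1000 students split into grade-level strata. The function stratified_sample is illustrative, not a standard library function.

```python
import random

def stratified_sample(strata, sample_size):
    """Take a simple random sample from each stratum, in proportion to its size."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum's share of the sample matches its share of the population
        n = round(sample_size * len(members) / total)
        sample[name] = random.sample(members, n)
    return sample

# Hypothetical school population split into strata by grade
strata = {
    "Grade 9": [f"G9-{i}" for i in range(300)],
    "Grade 10": [f"G10-{i}" for i in range(250)],
    "Grade 11": [f"G11-{i}" for i in range(250)],
    "Grade 12": [f"G12-{i}" for i in range(200)],
}
random.seed(3)  # fixed seed only so the example is repeatable
result = stratified_sample(strata, 100)
for grade, picked in result.items():
    print(grade, len(picked))
```

Because each stratum's share is rounded, the total can differ slightly from the target when the proportions do not divide evenly.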

Cluster Random Sampling: Like stratified random sampling, this kind of sampling divides the population into groups. Once the groups, or clusters, are identified, a random selection of whole clusters is made. The selected clusters make up the sample, and the study is conducted on all members of those clusters.
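A sketch of cluster sampling, assuming a hypothetical school with 20 homerooms of 30 students each, where each homeroom is a cluster. The key difference from stratified sampling is that whole clusters are chosen at random, and everyone in them is studied.

```python
import random

# Hypothetical clusters: 20 homerooms, each a list of 30 student IDs
homerooms = {f"Room {r}": [f"R{r}-S{s}" for s in range(30)] for r in range(1, 21)}

random.seed(5)  # fixed seed only so the example is repeatable
# Randomly choose whole clusters, not individuals
chosen_rooms = random.sample(list(homerooms), 3)

# Every member of each chosen cluster is included in the sample
sample = [student for room in chosen_rooms for student in homerooms[room]]
print(chosen_rooms, len(sample))
```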

Voluntary-Response Sampling: This type of sampling relies on volunteers choosing to answer a survey or questionnaire. The sample has not been chosen by the administrator of the survey; instead, it has been chosen by the individuals who decide to respond. This causes bias, as the people who choose to answer usually have strong opinions on the topic.

Convenience Sampling: In this type of sampling, the administrator of a survey picks the sample from whoever is nearby and easy to access. It is a non-probability method and is usually not representative of the entire population, since the administrator focuses on individuals who are easy to survey.

Quota Sampling: This is similar to stratified sampling in that the population is divided into strata, but individuals are then selected, often non-randomly, until a set quota for each stratum is met. The quotas make the sample proportionate to the population: the percentage of each group in the sample is the same as its percentage in the overall population.
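A sketch of quota sampling, assuming three hypothetical age groups with made-up population proportions and a list of respondents in the order they happen to be approached. The quotas force the sample's proportions to match the population's, but nothing in the selection within each group is random. The function quota_sample is illustrative only.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order they arrive until each group's quota is full."""
    filled = {group: [] for group in quotas}
    for person, group in respondents:
        # Only accept someone if their group still has room in its quota
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        # Stop once every quota has been met
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled

# Assumed population proportions for three hypothetical age groups
population_share = {"Under 30": 0.5, "30-59": 0.3, "60+": 0.2}
sample_size = 10
quotas = {g: round(sample_size * share) for g, share in population_share.items()}

# Hypothetical respondents in the order they were approached (not random)
groups_in_order = ["Under 30"] * 8 + ["30-59"] * 4 + ["60+"] * 3
respondents = [(f"Person {i}", group) for i, group in enumerate(groups_in_order)]

result = quota_sample(respondents, quotas)
print({g: len(p) for g, p in result.items()})
```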

Formulas

SAMPLE SIZE = Percent being sampled (as a decimal) x Population size

INTERVAL = Population size/Sample size

PERCENT OF POPULATION SAMPLED = (number of people chosen/total population) x 100%
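The three formulas can be written as small Python functions and checked with a hypothetical school of 1200 students where 5% are sampled. The function names are illustrative. The percent is entered as a whole number (5 for 5%) and divided by 100 inside the function, which is the same as using its decimal form.

```python
def compute_sample_size(percent, population_size):
    """SAMPLE SIZE = percent being sampled (as a decimal) x population size."""
    return percent * population_size / 100

def compute_interval(population_size, sample_size):
    """INTERVAL = population size / sample size."""
    return population_size / sample_size

def compute_percent_sampled(number_chosen, total_population):
    """PERCENT OF POPULATION SAMPLED = (number chosen / total population) x 100%."""
    return number_chosen * 100 / total_population

# Hypothetical school of 1200 students, sampling 5%
size = compute_sample_size(5, 1200)      # 60.0 people
step = compute_interval(1200, size)      # every 20th person
pct = compute_percent_sampled(60, 1200)  # 5.0 percent
print(size, step, pct)
```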

Sampling Techniques


Activity 3: Bias

Other Forms of Bias

Acquiescence Bias: This happens when survey respondents tend to agree with the questions or with whoever made the survey, regardless of their actual views, which leads to an excess of agreement and positive answers. This is also known as the friendliness bias.

Social Desirability Bias: This happens when respondents answer questions dishonestly so that they fit in with the norm, because they want to present themselves in the best light. This occurs frequently with questions on sensitive topics, where people want to give socially acceptable answers.

Habituation: This happens when questions become repetitive, which leads respondents to answer each of them in the same way out of habit rather than careful thought.

Types of Bias

Important Definitions

Sampling Bias: This type of bias occurs when the sample does not accurately represent the full population. If a sample has no sampling bias, any differences between the sample and the population are due only to random chance. Sampling bias is commonly caused by convenience sampling or by a study design that favours certain groups of individuals.

Non-response Bias: This type of bias occurs when the people who choose not to respond to a survey differ in important ways from those who do respond. It is most common in study methods with very low response rates, such as mail surveys. It can be reduced by keeping surveys interesting and brief, and by giving potential respondents an incentive to respond.

Household Bias: This type of bias occurs when the different groups in the sampling frame do not receive proportional representation. This can happen when the person conducting a survey overlooks the fact that the strata within the population are different sizes and samples the same number of people from each stratum. As a result, the composition of the sample does not match the composition of the population, which skews the results of the study.
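A small worked example of how this mismatch arises, assuming a hypothetical population of 800 people in apartments and 200 in houses, with 50 people sampled from each stratum. It compares each stratum's share of the population with its share of the sample, then shows the proportional allocation that would remove the mismatch.

```python
# Hypothetical population: sizes of two strata that differ greatly
population = {"Apartments": 800, "Houses": 200}
per_stratum = 50  # biased design: the same number sampled from every stratum

total_pop = sum(population.values())
total_sample = per_stratum * len(population)

# Compare each stratum's share of the population with its share of the sample (in %)
shares = {
    stratum: (size * 100 / total_pop, per_stratum * 100 / total_sample)
    for stratum, size in population.items()
}
for stratum, (pop_pct, sample_pct) in shares.items():
    print(f"{stratum}: {pop_pct:.0f}% of population, {sample_pct:.0f}% of sample")

# Proportional allocation keeps the same total sample size but matches the population mix
proportional = {
    stratum: size * total_sample // total_pop for stratum, size in population.items()
}
print(proportional)
```

Houses make up 20% of the population but 50% of the sample, so their opinions would be over-represented; sampling 80 and 20 instead keeps the same total of 100 while matching the population.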

Response Bias: This bias occurs when aspects of the survey (leading questions, the wording of questions, confusing questions or format, etc.) cause respondents to answer inaccurately or untruthfully. Respondents may be unaware that their answers are being swayed, yet this is still a major contributor to bias and unreliable answers in surveys.


Activity 4: Collecting Data


Types of Closed-Ended Questions

Dichotomous Question: These closed-ended questions only have two possible answers, such as yes/no, or true/false.

Multiple Choice Question: This type of closed-ended question presents the question followed by several possible answers for the respondent to pick from. These questions are easy to use in studies.

Rating Scale Multiple Choice Question: This question asks for a rating of a certain aspect of a service or product, and the respondent answers by choosing from a fixed set of values (usually numbers or stars).

Likert Scale Multiple Choice Question: This presents either a question or a statement, and the possible answers are arranged on a scale of varying levels of agreement (for example, from “strongly disagree” to “strongly agree”).

Checklist Multiple Choice Questions: These present the respondent with a list of items as possible answers to a question. The respondent picks one or more items from the list; how many can be chosen depends on the question asked.

Rank Order Multiple Choice Questions: These questions list multiple items that the respondent ranks in order of preference. Depending on the survey, the most preferred option may be given the highest number or the lowest (for example, 1 for first choice).

Confounding Triangle

Important Definitions

Primary Data: This is data that a researcher has directly obtained or observed.

Secondary Data: This is data that was collected by other parties and has previously been published.

Open-Ended Questions: These questions are meant to draw out free-form, thoughtful answers. They usually start with “how” or “what”.

Closed-Ended Questions: These are questions that have a set number of responses, whether that be a yes or no, or answers organized in a multiple choice form or a checklist. This type of question is ideal for surveys.

Confounding Variable: This is an outside influence that is not accounted for in a cause-and-effect relationship. It influences both the independent variable and the dependent variable, which can render a study useless: a confounding variable can make it appear that the independent variable causes changes in the dependent variable when no such cause-and-effect relationship actually exists.
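A small simulation, assuming a hypothetical confounder Z (labelled as daily temperature) that drives both X (ice cream sales) and Y (sunburn cases), while X and Y have no direct effect on each other. The numbers are randomly generated, not real data, and the pearson helper is written out here rather than taken from a library.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(42)  # fixed seed only so the example is repeatable
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]   # confounder, e.g. daily temperature
x = [zi + random.gauss(0, 0.5) for zi in z]  # e.g. ice cream sales, driven by z
y = [zi + random.gauss(0, 0.5) for zi in z]  # e.g. sunburn cases, also driven by z

# Strong correlation, even though x has no effect on y in this simulation
print(round(pearson(x, y), 2))

# Subtracting z's contribution leaves independent noise, so this is close to 0
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
print(round(pearson(x_rest, y_rest), 2))
```

In this setup the expected correlation between X and Y is about 0.8. Removing Z's contribution is only possible here because the simulation knows exactly how Z enters; in a real study the confounder has to be identified and measured.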

Activity 5: Collecting Data Using Technology

Using the Statscan Website

1. Hover over "browse by key resource" at the top of the page and select CANSIM

2. Find something that interests you by either searching for it or browsing the available subjects

3. Select a narrowed down topic within the subject

4. Select the chart of data within that topic that interests you

5. Select Add/Remove Data and refine the chart to what you are looking for

6. Select Manipulate and choose what your chart of data will contain

7. Choose Download, select CSV for file format, then download the data and save it

In Excel, you can manipulate the data as needed and use it to create graphs.
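As an alternative to Excel, a downloaded CSV can also be read with Python's built-in csv module. This sketch uses a made-up stand-in for the file; the column names REF_DATE, GEO and VALUE are an assumption about the download's layout, so check the header row of your own file.

```python
import csv
import io

def load_rows(file_obj):
    """Read a CSV file into a list of dictionaries, one per row, keyed by the header row."""
    return list(csv.DictReader(file_obj))

# Made-up stand-in for a downloaded file; real column names and values will differ
sample_csv = io.StringIO(
    "REF_DATE,GEO,VALUE\n"
    "2019,Ontario,10.5\n"
    "2020,Ontario,11.0\n"
    "2020,Quebec,9.5\n"
)
rows = load_rows(sample_csv)

# Filter to one region and average its values
ontario = [float(r["VALUE"]) for r in rows if r["GEO"] == "Ontario"]
print(sum(ontario) / len(ontario))
```

To read a real download instead, replace sample_csv with open("table.csv", newline="", encoding="utf-8-sig"), where table.csv is a hypothetical file name; the utf-8-sig encoding handles a byte-order mark in case the file starts with one.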

The Beginner's Guide to Excel

How to Find Old Stats

1. Hover over Browse by Key Resource and select Publication

2. In the text box, type in the code 11-516-x and hit search

3. Select the archived content that interests you

4. Download or view the stats for the topic

Statscan

Statscan satisfies the federal responsibility to give Canadian citizens access to statistics.

This data can be used by Canadians for a wide range of purposes, such as influencing the choices made by politicians.

The Census is conducted every five years, and the website also features approximately 350 active surveys that Canadians can view.

When searching for information on this website, there are two stages to consider: first, a broad search for trends in data that the searcher finds relevant and interesting; second, a further search for more statistical data to analyze in depth.

The first stage can be completed by looking at the news section of the website, which is accessed by selecting The Daily and browsing the recent and relevant articles there.

Jordan Aultman