QM Data Collection & Data Cleaning Flashcards

1
Q

 Used to filter individuals from a population and create samples

A

Probability sampling

2
Q
  • – random selection of elements for a sample; every element has an equal chance of being chosen. The most widely used method.
A

Simple random sampling
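A minimal Python sketch of simple random sampling, assuming the population is a plain list of hypothetical student IDs. `random.Random.sample` draws without replacement, so every element has the same chance of selection and none is picked twice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element has the same chance of being selected."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical sampling frame of 100 student IDs
students = [f"S{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(students, 10, seed=42)
print(sample)
```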

3
Q
  • – a large population is divided into smaller groups (strata), and elements are randomly sampled from each stratum
A

Stratified random sampling
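A sketch of stratified random sampling with proportional allocation, meaning each stratum contributes the same fraction of its members. The year-level strata and the 10% fraction are illustrative assumptions; other allocation rules, such as an equal count per stratum, are also used.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group the population into strata, then randomly sample the same fraction of each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # proportional allocation, at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (student_id, year_level) pairs in two strata
population = [(i, "first-year") for i in range(60)] + [(i, "senior") for i in range(60, 100)]
sample = stratified_sample(population, stratum_of=lambda p: p[1], fraction=0.1, seed=1)
print(sample)
```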

4
Q
  • – the main population is divided into clusters (e.g., geographic segments, or a university divided into colleges), and whole clusters are randomly selected
A

Cluster sampling
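A sketch of one-stage cluster sampling following the university-to-colleges example: whole colleges are drawn at random and every student in a chosen college is included. The college names and student IDs are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted names keep the draw reproducible
    return {name: clusters[name] for name in chosen}

# Hypothetical university: each college is one cluster of student IDs
university = {
    "Arts": ["A1", "A2"],
    "Business": ["B1", "B2"],
    "Engineering": ["E1", "E2", "E3"],
    "Nursing": ["N1", "N2", "N3", "N4"],
}
selected = cluster_sample(university, n_clusters=2, seed=7)
respondents = [student for members in selected.values() for student in members]
print(sorted(selected), respondents)
```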

5
Q
  • – the starting point of a sample is chosen randomly, and all the other elements are chosen at a fixed interval (population size / target sample size)
A

Systematic sampling
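A sketch of systematic sampling using the interval from the card, k = population size / target sample size (integer division here). The 1,000-unit sampling frame is hypothetical, and the sketch assumes the frame has no periodic ordering that happens to line up with k.

```python
import random

def systematic_sample(population, target_size, seed=None):
    """Random start, then every k-th element, with k = population size // target sample size."""
    k = len(population) // target_size  # fixed sampling interval
    start = random.Random(seed).randrange(k)  # random starting point inside the first interval
    return population[start::k][:target_size]

population = list(range(1, 1001))  # hypothetical sampling frame of 1,000 units
sample = systematic_sample(population, target_size=50, seed=3)
print(sample[:5])
```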

6
Q
  • – elements of a sample are chosen for one primary reason: their proximity (ease of access) to the researcher. It is also quick and easy to implement
A

Convenience sampling

7
Q
  • – similar to convenience sampling, but researchers can choose a single element or a group of elements and conduct research on them consecutively, one after another over a period of time
A

Consecutive sampling

8
Q
  • – researchers can select elements using their knowledge of target traits and personalities to form strata
A

Quota sampling

9
Q
  • – conducted with target audiences who are difficult to contact and get information from. Members of the target audience are rare and hard to bring together, so existing participants refer further participants
A

Snowball sampling

10
Q
  • – samples are created only based on the researcher’s experience and research skill.
A

Judgmental sampling

11
Q

  • – a research method used for collecting data from a predefined group of respondents

A

Survey

12
Q
    • Used to categorize data into mutually exclusive categories or groups.
A

Nominal

13
Q
    • Used to measure variables in a natural order, such as rating or ranking. They provide meaningful insights into attitudes, preferences, and behaviors by understanding the order of responses.
A

Ordinal

14
Q
    • Used to measure variables with equal intervals between values. Temperature and time often make use of this type of measurement, enabling precise comparisons and calculations.
A

Interval

15
Q
    • Has a true zero point, so it allows comparisons and computations such as ratios, percentages, and averages. Well suited to research in fields like science, engineering, and finance, where these computations are needed to understand the data.
A

Ratio

16
Q
  • – the most widely used and effective survey distribution method (you can use QuestionPro's email management feature to send out surveys and collect responses).
A

Email

17
Q
  • – Using social media to distribute the survey helps collect a higher number of responses from people who are aware of the brand.
A

Social distribution

18
Q

It stores the URL for the survey. You can print or publish this code in magazines, on signs and business cards, or on just about any other object or medium.

A

QR code

19
Q
  • is a quick and time-effective way to collect a high number of responses.
A

SMS survey

20
Q
  • _________ is a market research term for the number of individuals included in a research study. Researchers choose their sample based on demographics, such as age, gender, or physical location. It can be vague or specific.
A

Sample size

21
Q
  • is the process of choosing the right number of observations or people from a larger group to use in a sample. The goal of figuring out the sample size is to ensure that the sample is big enough to give statistically valid results and accurate estimates of population parameters but small enough to be manageable and cost-effective.
A

Sample size determination
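One common way to determine a sample size for estimating a proportion is Cochran's formula, n0 = z² · p(1 - p) / e², optionally adjusted with the finite population correction n = n0 / (1 + (n0 - 1) / N). The sketch below assumes simple random sampling and defaults to p = 0.5, the most conservative choice; it is one standard approach, not the only one.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin_of_error=0.05, p=0.5, population=None):
    """Cochran's sample size for a proportion, with an optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, about 1.96 at 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n0)  # round up so the target precision is met

print(sample_size())                 # large population
print(sample_size(population=1000))  # population of 1,000
```

With the defaults (95% confidence, ±5% margin of error) the function returns 385; for a population of 1,000 it returns 278, showing how the correction shrinks the requirement for small populations.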

22
Q
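  • The _________ tells you how sure you can be that the true population value lies within your result's margin of error. It is expressed as a percentage and tied to the confidence interval. For example, a 90% confidence level means that if the survey were repeated many times, about 90% of the resulting confidence intervals would contain the true population value.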
A

confidence level

23
Q
  • A _ describes how close you can reasonably expect a survey result to fall relative to the real population value.
A

margin of error
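For a proportion estimated from a simple random sample, the margin of error is commonly approximated as z · sqrt(p̂(1 - p̂) / n), where z is the critical value for the chosen confidence level. This normal-approximation sketch assumes that survey design; other sampling methods need other formulas.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a proportion from a simple random sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value for the confidence level
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

moe = margin_of_error(p_hat=0.5, n=400)
print(f"50% +/- {moe * 100:.1f} percentage points")
```

With p̂ = 0.5 and n = 400 at 95% confidence the result is about 0.049, roughly ±4.9 percentage points; lowering the confidence level or enlarging the sample narrows it.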

24
Q
  • is a measure of how widely the values in a data set are dispersed around their mean. It measures the absolute variability of a distribution: the greater the dispersion or variability, the greater the standard deviation.
A

Standard deviation
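A short sketch computing standard deviation from its definition: the square root of the average squared deviation from the mean. The `sample` flag chooses between dividing by n - 1 (sample standard deviation, Bessel's correction) and by n (population standard deviation); the scores are made-up illustration data.

```python
import math

def standard_deviation(data, sample=True):
    """Square root of the mean squared deviation from the mean."""
    mean = sum(data) / len(data)
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = len(data) - 1 if sample else len(data)  # n - 1 for a sample, n for a full population
    return math.sqrt(squared_deviations / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustration data
print(standard_deviation(scores, sample=False))
print(standard_deviation(scores))
```

For these scores the population standard deviation is exactly 2.0; the sample version is slightly larger because it divides by 7 instead of 8.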

25
Q
  • Also known as data scrubbing or data cleansing.
  • Huge impact on the reliability and validity of your final data.
  • Ensures the use of the highest-quality data to perform the analysis.
  • “Garbage in, garbage out” (George Fuechsel)
  • 80/20 dilemma: 80% of research time is spent finding, cleaning, and reorganizing huge amounts of data; only 20% is spent on actual data analysis.
A

Data Cleaning

26
Q

o Also known as Exploratory Data Analysis (EDA)
o Developed by John Tukey in the late 1970s
o The role of the data analyst is to listen to the data in as many ways as possible until a plausible ‘story’ of the data is apparent (Behrens, 1997).
o EDA is an approach used to better understand the data through quantitative and graphical methods.
o Quantitative methods summarize variable characteristics by using measures of central tendency, including mean, median, and mode.
o Exploring data through EDA techniques supports discovery of underlying patterns and anomalies, helps frame hypotheses, and verifies assumptions related to analysis.

A

1. Discovering Data
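A small quantitative EDA sketch using Python's `statistics` module: it reports the mean, median, and mode named on the card and flags values outside Tukey's fences (1.5 × IQR beyond the quartiles) as candidate anomalies. The ages are hypothetical, and a flagged value still needs human judgment before it is corrected or dropped.

```python
import statistics

def summarize(values):
    """Central tendency plus outlier candidates flagged with Tukey's 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "outliers": [v for v in values if v < low or v > high],
    }

ages = [21, 22, 22, 23, 24, 25, 26, 27, 95]  # hypothetical ages with one suspicious entry
report = summarize(ages)
print(report)
```

Here 95 is flagged, and it pulls the mean (about 31.7) well above the median (24), the kind of anomaly EDA is meant to surface before any modeling.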

27
Q

o Structuring is an important core data cleaning and preparation activity that focuses on reshaping data for a particular statistical analysis. Data can contain irregularities and inconsistencies, which can impact the accuracy of the researcher’s models.
o Depending on the research question(s), you may need to set up the data in different ways for different types of analyses. Repeated measures data, where each experimental unit or subject is measured at several points in time or at different conditions, can be used to illustrate this.

A

2. Structuring Data
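Repeated measures data are often stored wide (one row per subject, one column per time point), while many analyses expect them long (one row per subject per time point). A plain-Python sketch of that reshape, with hypothetical `pre`/`post` columns:

```python
def wide_to_long(rows, id_field, time_fields, value_name="score"):
    """Reshape one row per subject (wide) into one row per subject per time point (long)."""
    long_rows = []
    for row in rows:
        for time in time_fields:
            long_rows.append({id_field: row[id_field], "time": time, value_name: row[time]})
    return long_rows

# Hypothetical repeated measures: each subject is measured before and after an intervention
wide = [
    {"subject": "P1", "pre": 12, "post": 18},
    {"subject": "P2", "pre": 15, "post": 16},
]
long_data = wide_to_long(wide, "subject", ["pre", "post"])
for row in long_data:
    print(row)
```

Going from long back to wide is the same idea in reverse; libraries such as pandas provide `melt` and `pivot` for these reshapes.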

28
Q

o is central to ensuring you have high-quality data for analysis.

A

3. Cleaning Data

29
Q

TIPS FOR DATA CLEANING

A
  1. spell check
  2. duplicates
  3. find and replace
  4. letter case
  5. spaces and non-printing characters
  6. numbers and signs
  7. dates and time
  8. merge and split columns
  9. subset data
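A sketch applying several of these tips (2 duplicates, 3 find and replace, 4 letter case, 5 spaces and non-printing characters, 6 numbers and signs, 7 dates and time) to hypothetical survey rows. The replacement map and the accepted date formats are assumptions that would normally come from the study's own codebook, and `str.title()` is a blunt tool for names like McDonald, so case fixes still deserve a spell check.

```python
import re
from datetime import datetime

REPLACEMENTS = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}  # hypothetical coding scheme
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")  # assumed formats; %b expects English month names

def clean_text(value):
    """Tips 4-5: drop non-printing characters, collapse extra spaces, fix letter case."""
    kept = "".join(ch for ch in value if ch.isprintable() or ch.isspace())
    return " ".join(kept.split()).title()

def clean_number(value):
    """Tip 6: strip currency signs and thousands separators before converting."""
    return float(re.sub(r"[^0-9.\-]", "", value))

def clean_date(value):
    """Tip 7: convert mixed date formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable dates for manual review

def clean_rows(rows):
    cleaned, seen = [], set()
    for row in rows:
        record = (
            clean_text(row["name"]),
            REPLACEMENTS.get(row["sex"].strip().lower(), row["sex"]),  # Tip 3: find and replace
            clean_number(row["income"]),
            clean_date(row["date"]),
        )
        if record not in seen:  # Tip 2: drop duplicates that only differed in formatting
            seen.add(record)
            cleaned.append(record)
    return cleaned

raw = [
    {"name": "  maria\u200b santos ", "sex": "F", "income": "$1,250.50", "date": "03/15/2024"},
    {"name": "MARIA SANTOS", "sex": "female", "income": "1250.5", "date": "2024-03-15"},
    {"name": "juan dela cruz", "sex": "m", "income": "980", "date": "15 Mar 2024"},
]
rows = clean_rows(raw)
print(rows)
```

The first two raw rows only look different; after normalization they become the same record, so the duplicate is dropped.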
30
Q

o Sometimes a dataset may not have all the information needed to answer the research question. This means you need to find other datasets and merge them into the current one. This can be as easy as adding geographical data, such as a postal code or longitude and latitude coordinates; or demographic data, such as income, marital status, education, age, or number of children. Enriching data improves the potential for finding fuller answers to the research question(s) at hand.

A

4. Enriching Data
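A sketch of enriching survey records by joining a second dataset on a shared key (here a postal code), in the spirit of a left join: every respondent is kept, and records without a match stay unchanged so the gap remains visible. The postal codes, city names, and income figures are hypothetical placeholders.

```python
def enrich(records, lookup, key):
    """Left join: copy fields from a reference dataset that shares `key`; unmatched records stay as-is."""
    return [{**record, **lookup.get(record[key], {})} for record in records]

# Hypothetical survey responses and a hypothetical reference table keyed by postal code
respondents = [
    {"id": 1, "postal_code": "1000"},
    {"id": 2, "postal_code": "2000"},
    {"id": 3, "postal_code": "3000"},
]
postal_lookup = {
    "1000": {"city": "City A", "median_income": 52000},
    "2000": {"city": "City B", "median_income": 47000},
}
enriched = enrich(respondents, postal_lookup, key="postal_code")
for row in enriched:
    print(row)
```

In pandas the same operation is `DataFrame.merge(..., how="left", on="postal_code")`.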

31
Q

o Data validation is vital to ensure data are clean, correct, and useful. Remember Fuechsel's adage: “garbage in, garbage out.” If incorrect data are fed into a statistical analysis, the resulting answers will be incorrect too. A computer program has no common sense; it processes whatever data it is given, good or bad. Data validation takes time, but it helps maximize the potential for the data to answer the research question(s) at hand.

A

5. Validating Data
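A sketch of rule-based validation: each field gets a check (type, range, or allowed values), and every violation is reported rather than silently fixed. The field names and limits (adult ages 18 to 99, a 1 to 5 rating scale) are hypothetical rules, not standards.

```python
def validate(rows, rules):
    """Return (row index, field, problem) for every missing or invalid value."""
    problems = []
    for i, row in enumerate(rows):
        for field, is_valid in rules.items():
            if field not in row:
                problems.append((i, field, "missing"))
            elif not is_valid(row[field]):
                problems.append((i, field, f"invalid value {row[field]!r}"))
    return problems

# Hypothetical rules for a survey file
RULES = {
    "age": lambda v: isinstance(v, int) and 18 <= v <= 99,
    "rating": lambda v: v in {1, 2, 3, 4, 5},  # 1-5 ordinal rating scale
    "email": lambda v: isinstance(v, str) and "@" in v,  # deliberately loose format check
}
rows = [
    {"age": 34, "rating": 4, "email": "a@example.com"},
    {"age": 230, "rating": 6, "email": "b@example.com"},
    {"age": 27, "rating": 3},
]
for problem in validate(rows, RULES):
    print(problem)
```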

32
Q

o A key Research Data Management best practice is to ensure your data are available for appropriate use by others, which is embodied by the FAIR principles.
o Findable, Accessible, Interoperable, and Reusable
o Data should be converted to nonproprietary formats for publication, such as plain text or CSV (Comma-Separated Values).

A

6. Publishing Data
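A sketch of exporting cleaned records to CSV, the nonproprietary format named on the card, using Python's standard `csv` module. It writes to any text stream; the demo uses an in-memory buffer, and the field names are hypothetical. `csv.DictWriter` quotes values that contain commas, so a name like "Santos, Maria" stays in one column.

```python
import csv
import io

def publish_csv(rows, fieldnames, stream):
    """Write records as CSV with a header row; values containing commas are quoted automatically."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical cleaned records
rows = [
    {"id": 1, "name": "Santos, Maria", "rating": 4},
    {"id": 2, "name": "Dela Cruz, Juan", "rating": 5},
]
buffer = io.StringIO()  # in-memory stand-in for a file
publish_csv(rows, ["id", "name", "rating"], buffer)
print(buffer.getvalue())
```

To write a file instead, open it with `open("survey_clean.csv", "w", newline="", encoding="utf-8")` (a hypothetical file name) and pass the file object as the stream; `newline=""` is what the `csv` documentation recommends.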

33
Q
  • is an important task that improves the accuracy and quality of data ahead of data analysis.
A

Data cleaning

34
Q

- Six core data cleaning tasks

A
  1. discovering
  2. structuring
  3. cleaning
  4. enriching
  5. validating
  6. publishing