Definitions Flashcards
(31 cards)
What is a Categorical Variable?
A variable whose values reflect different categories. For example, the categorical variable TreatmentType has the nominal values Placebo, Existing_Treatment, and Experimental. In study design contexts, categorical variables may also be called factors, and their values levels.
What are Frequency Tables?
- A frequency table lists the categories in a categorical variable and gives the count of observations for each category
- “Frequency” is just another word for count
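As a rough illustration, a frequency table can be sketched in base R with table(). The data values below are invented for this example; only the category names come from the TreatmentType card.

```r
# Hypothetical data: treatment assigned to 8 participants (values invented for illustration)
treatment_type <- c("Placebo", "Experimental", "Placebo", "Existing_Treatment",
                    "Experimental", "Experimental", "Placebo", "Experimental")

# table() counts the observations in each category
freq <- table(treatment_type)
print(freq)

# prop.table() converts the counts to proportions of the total
props <- prop.table(freq)
print(round(props, 3))
```

Here the counts are 1 for Existing_Treatment, 4 for Experimental, and 3 for Placebo.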
What are examples of Categorical Data Visualization?
- Bar Graph
- Pie Chart
- Mosaic Plots
What is a Bar Chart or Bar Graph?
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
What is a pie chart?
A circular statistical graphic, which is divided into slices to illustrate numerical proportion
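How do you create a bar graph in RStudio?
library(mosaic) # bargraph() and tally() come from the mosaic package
bargraph( ~ variable_name_here,
data = survey_xlsx_cleaned,
horizontal = TRUE, # draw the bars horizontally
main = "TITLE OF GRAPH",
col = c("color-here"),
xlab = "Name of X-Axis",
ylab = "Name of Y-Axis")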
How do you create a tally in RStudio?
# format can be "count", "proportion", or "percent"
tally(~ variable_name_here, format = "count", data = survey_xlsx_cleaned,
margins = TRUE)
How do you create a pie chart in RStudio?
percents <- tally(~ variable_name_here, format = "percent", data = survey_xlsx_cleaned)
piepercent <- round(percents, 1)
pie(percents,
radius = 1, # Radius of the pie chart
labels = piepercent,
main = "TITLE OF PIE CHART",
col = c("turquoise", "yellow"), # Colors of the pie chart
cex = 0.75) # Label text size
legend("topright", # Position of the legend
c("No", "Yes"), # Data labels
cex = 0.7, # Size of the legend
fill = c("turquoise", "yellow")) # Colors of the legend
What are some things to avoid in Data Visualization?
- Biased labeling - Are labels empirically valid?
- Misleading scales
- No scale or labels
- Excessive Visualizations
- Unequal Areas (Violation of Area Principle)
What is a Contingency Table?
- A contingency table (or cross-tabulation) is a table of counts, proportions, or percentages from two categorical variables
- It’s called a contingency table because it can tell us how cases are distributed along each variable contingent (or conditional) on one or more categories of the other variable
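A contingency table like the one described above could be sketched in base R as follows. The two variables and their values are invented for illustration and are not from the course data.

```r
# Hypothetical data for two categorical variables (values invented for illustration)
smoker   <- c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No")
exercise <- c("Low", "High", "Low", "Low", "High", "High", "High", "Low")

# table() with two variables cross-tabulates the counts
counts <- table(smoker, exercise)
print(counts)

# Row proportions: distribution of exercise conditional on smoker status
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))

# addmargins() appends row and column totals to the counts
print(addmargins(counts))
```

Using margin = 1 makes each row sum to 1, which is the "contingent on" reading: the distribution of one variable within each category of the other.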
What is Statistics?
- Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data
- Can be applied to questions related to scientific, industrial, social or other settings
What is the process of investigation?
- Identify question/problem
- Collect relevant data
- Analyze data
- Make a conclusion
What is Data in Statistics?
A set of observations
What are some examples of Categorical - qualitative data?
Also called a factor; its levels are the different qualities
* Gender, yes/no answer, smoker or not…
Nominal - no inherent order to the categories
* Religious affiliation, political party…
Ordinal - expected or natural order
* Highest education, level of approval (strong to weak)…
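The nominal/ordinal distinction can be sketched in base R with factor(). The category names and values below are made up for illustration.

```r
# Nominal: categories with no inherent order (values invented for illustration)
party <- factor(c("Party B", "Party C", "Party B", "Party A"))
print(levels(party))

# Ordinal: categories with a natural order, declared from weakest to strongest
approval <- factor(c("Weak", "Strong", "Moderate", "Strong"),
                   levels = c("Weak", "Moderate", "Strong"),
                   ordered = TRUE)
print(approval)

# Order comparisons are meaningful only for ordered factors
at_least_moderate <- approval >= "Moderate"
print(at_least_moderate)
```

Declaring the levels explicitly matters for ordinal data; otherwise R sorts them alphabetically, which rarely matches the natural order.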
What are some examples of Numerical - quantitative data?
Take on values that are numbers
Arithmetic is meaningful: add, subtract, take averages
Continuous:
* Can take on any value within a range
Discrete - count:
* Only non-negative whole numbers
* Number of votes for a politician
What is meant by a Population?
An entire group that you want to draw conclusions about
What is Sampling?
The selection of a subset of the population of interest in a research study
What is a census?
Collecting data from every member of the population; a difficult and expensive way to collect data
What is Descriptive Statistics?
Organizing and presenting data from a population or sample
* Presenting data
* Summarizing data
What is Inferential statistics?
Making conclusions about population based on data from a sample
* Estimating population parameters
* Hypothesis testing
What is Parameter notation?
a. Statistic - numerical summary based on a sample
b. Parameter - numerical summary of a population
c. Keep track of sample vs. census/population
d. Lower case n for sample size, N for population size
e. Sample statistics are usually written with Latin letters (e.g. x̄, s)
f. Population parameters are usually written with Greek letters (e.g. μ, σ)
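To make the statistic vs. parameter distinction concrete, here is a small sketch in base R. The population is a made-up set of six numbers, and the sample is picked by hand instead of at random so the result is reproducible.

```r
# Hypothetical population of N = 6 values (invented for illustration)
population <- c(2, 4, 6, 8, 10, 12)
N  <- length(population)
mu <- mean(population)        # parameter: population mean (Greek letter mu)

# A sample of n = 3 observations, chosen by hand here instead of at random
sample_values <- population[c(1, 3, 6)]
n     <- length(sample_values)
x_bar <- mean(sample_values)  # statistic: sample mean (Latin letter x with a bar)

cat("N =", N, " mu =", mu, "\n")
cat("n =", n, " x_bar =", round(x_bar, 2), "\n")
```

The parameter mu is 7, while this particular sample gives x_bar of about 6.67; a different sample would generally give a different statistic.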
What is Anecdotal evidence?
a. Evidence based on a very limited sample size that is not representative
b. Usually composed of odd cases
What is a Study Design?
Process of investigation
* Identify
* Collect data
* Analyze data
* Draw conclusion
Explain Observational Data collection.
- Cross sectional
a. Data collected at one point in time on a set of individuals
- Longitudinal
a. Data collected over time on same individuals
- No attempt to intervene
- Typically cannot prove cause and effect
- A survey is a typical observational study
- Be careful about interpreting results
- There can always be confounding or lurking variables
a. Variable not included in the study that has an effect on the variables studied