Definitions Flashcards
What is a Categorical Variable?
A variable whose values are categories. For example, the categorical variable TreatmentType has the nominal values Placebo, Existing_Treatment, and Experimental. In study-design contexts, categorical variables are also called factors, and their values are called levels.
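Here is a minimal sketch in base R using hypothetical data for six participants. `factor()` stores a categorical variable and `levels()` lists its levels. The name `treatment_type` and the data values are made up for illustration.

```r
# Hypothetical data: treatment assigned to six study participants
treatment_type <- factor(
  c("Placebo", "Experimental", "Existing_Treatment",
    "Placebo", "Experimental", "Experimental"),
  levels = c("Placebo", "Existing_Treatment", "Experimental")  # the factor's levels
)

levels(treatment_type)   # "Placebo" "Existing_Treatment" "Experimental"
nlevels(treatment_type)  # 3 levels
```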
What are Frequency Tables?
- A frequency table lists the categories in a categorical variable and gives the count of observations for each category
- “Frequency” is just another word for count
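Here is a minimal base-R sketch of a frequency table. It assumes a hypothetical vector of eight yes/no survey answers, and `table()` counts the observations in each category.

```r
# Hypothetical yes/no answers from eight survey respondents
smoker <- c("No", "Yes", "No", "No", "Yes", "No", "No", "Yes")

freq <- table(smoker)  # count (frequency) of observations in each category
freq                   # No: 5, Yes: 3
```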
What are examples of Categorical Data Visualization?
- Bar Graph
- Pie Chart
- Mosaic Plot
What is a Bar Chart or Bar Graph?
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
What is a pie chart?
A circular statistical graphic divided into slices, where each slice's angle (and area) is proportional to the share of the total it represents.
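Each slice's angle equals its proportion of the total times 360 degrees. The following base-R sketch uses hypothetical counts for categories named A, B, and C:

```r
# Hypothetical counts for three categories
counts <- c(A = 10, B = 5, C = 5)

proportions <- counts / sum(counts)  # each category's share of the whole
angles <- proportions * 360          # slice angle in degrees
angles                               # A: 180, B: 90, C: 90
```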
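How do you create a bar graph in RStudio?
library(mosaic)  # bargraph() comes from the mosaic package
bargraph(~ variable_name,             # replace variable_name with a column of the data frame
         data = survey_xlsx_cleaned,
         horizontal = TRUE,           # optional: draw horizontal bars
         main = "TITLE OF GRAPH",
         col = "steelblue",           # bar color
         xlab = "Name of X-Axis",
         ylab = "Name of Y-Axis")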
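How do you create a tally in RStudio?
library(mosaic)  # tally() comes from the mosaic package
tally(~ variable_name, data = survey_xlsx_cleaned,  # variable_name: a column of the data frame
      format = "count",   # or "proportion" or "percent"
      margins = TRUE)     # add a total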
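For comparison, here is a sketch of the same three formats that uses only base R (`table()`, `prop.table()`, `addmargins()`) in place of mosaic's `tally()`. The `answer` vector is hypothetical data.

```r
# Hypothetical survey answers
answer <- c("Yes", "No", "Yes", "Yes", "No")

counts      <- table(answer)       # counts
proportions <- prop.table(counts)  # proportions (sum to 1)
percents    <- 100 * proportions   # percentages (sum to 100)

addmargins(counts)                 # counts plus a "Sum" entry (the margin)
```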
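How do you create a pie chart in RStudio?
library(mosaic)  # tally() comes from the mosaic package
# replace variable_name with a column of the data frame (here assumed to have levels No and Yes)
percents <- tally(~ variable_name, format = "percent", data = survey_xlsx_cleaned)
piepercent <- round(percents, 1)        # round percentages to 1 decimal place
pie(percents,
    radius = 1,                         # radius of pie chart
    labels = paste0(piepercent, "%"),   # slice labels
    main = "TITLE OF PIE CHART",
    col = c("turquoise", "yellow"),     # colors of the slices
    cex = 0.75)                         # label text size
legend("topright",                      # position of legend
       legend = c("No", "Yes"),         # data labels, in the same order as percents
       cex = 0.7,                       # size of the legend
       fill = c("turquoise", "yellow")) # colors of the legend boxes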
What are some things to avoid in data visualization?
- Biased labeling - Are labels empirically valid?
- Misleading scales
- No scale or labels
- Excessive Visualizations
- Unequal Areas (Violation of Area Principle)
What is a Contingency Table?
- A contingency table (or cross-tabulation) is a table of counts, proportions, or percentages from two categorical variables
- It’s called a contingency table because it can tell us how cases are distributed along each variable contingent (or conditional) on one or more categories of the other variable
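Here is a minimal base-R sketch of a contingency table using hypothetical data on treatment group and smoking status. `prop.table(..., margin = 1)` converts the counts into proportions within each row, which gives the distribution of smoking status conditional on group.

```r
# Hypothetical data: treatment group and smoking status for eight people
group  <- c("Placebo", "Placebo", "Placebo", "Placebo",
            "Experimental", "Experimental", "Experimental", "Experimental")
smoker <- c("Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes")

ct <- table(group, smoker)  # counts for each combination of categories
ct

# Row proportions: distribution of smoker, conditional on group (each row sums to 1)
prop.table(ct, margin = 1)
```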
What is Statistics?
- Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data
- It can be applied to questions in scientific, industrial, social, or other settings
What is the process of investigation?
- Identify question/problem
- Collect relevant data
- Analyze data
- Make a conclusion
What is Data in Statistics?
Set of observations
What are some examples of categorical (qualitative) data?
Factor: another name for a categorical variable
Levels: the different categories (qualities) a factor can take
Examples: gender, yes/no answers, smoker or not…
Nominal: no inherent order to the categories
* Religious affiliation, political party…
Ordinal: categories with an expected or natural order
* Highest education level, level of approval (strong to weak)…
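This base-R sketch shows the nominal/ordinal distinction using hypothetical data. An ordinal variable can be stored as an ordered factor, with levels listed from lowest to highest, which makes comparisons such as `<` meaningful. The level `Moderate` and the party labels are made up for illustration.

```r
# Ordinal: hypothetical approval ratings, levels listed from lowest to highest
approval <- factor(c("Weak", "Strong", "Moderate", "Strong"),
                   levels = c("Weak", "Moderate", "Strong"),
                   ordered = TRUE)

# Nominal: hypothetical party labels, no order (levels default to alphabetical)
party <- factor(c("Party_A", "Party_B", "Party_A"))

is.ordered(approval)       # TRUE
approval[1] < approval[2]  # TRUE: "Weak" comes before "Strong"
max(approval)              # highest level present: Strong
is.ordered(party)          # FALSE; using < on party would give NA with a warning
```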
What are some examples of numerical (quantitative) data?
Take on values that are numbers
You can add, subtract, and take averages
Continuous:
* Can take on any value within a range
Discrete (counts):
* Only non-negative whole numbers
* Number of votes for a politician
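The following base-R sketch uses hypothetical values. Arithmetic such as averages and differences makes sense for numerical data, and a discrete count variable holds only non-negative whole numbers.

```r
# Hypothetical measurements
heights_cm <- c(162.5, 170.2, 158.9)  # continuous: any value within a range
votes      <- c(1200L, 950L, 0L)      # discrete: non-negative whole counts

mean(heights_cm)                          # averages are meaningful for numerical data
votes[1] - votes[2]                       # so are differences: 250
all(votes >= 0 & votes == round(votes))   # TRUE: counts are whole and non-negative
```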