INTRO+DATASETS Flashcards

1
Q

What is data science?

A

The process of building, cleaning, and structuring datasets in order to analyse them and extract meaning.

2
Q

Process of Data Science

A
  1. Ask an interesting question
  2. Get the data
  3. Explore the data
  4. Model the data
  5. Visualize and communicate the results
3
Q

Key principles in data science

A
  • Draw on many data sources
  • Understand how the data were collected
  • Use statistical models
  • Understand correlations
  • Have good communication skills
4
Q

What does the discussion of probability include?

A

Random experiments that produce a set of possible outcomes (the number of outcomes can be infinite).

5
Q

Elements of a probability model (describes the uncertainty of an experiment)

A
  • Sample space (Ω, the omega symbol): the set that contains all possible outcomes. Outcomes are mutually exclusive and collectively exhaustive. An event is a collection of one or more outcomes, i.e. a subset of the sample space.
  • Probability function P(A): assigns event A a number between 0 and 1. The complement of event A is A^c, with P(A^c) = 1 - P(A).
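
A minimal sketch in R of the two elements above, assuming a fair six-sided die as the random experiment. The die, the event "roll is even", and the names omega, p and A are illustrative assumptions, not part of the cards.

```r
# Sample space for one roll of a fair die (an illustrative assumption)
omega <- 1:6

# Each outcome is equally likely, so the probabilities sum to 1
p <- rep(1 / 6, length(omega))

# Event A: the roll is even (a subset of the sample space)
A <- c(2, 4, 6)
p_A <- sum(p[omega %in% A])

# Probability of the complement, computed directly from the outcomes not in A
p_A_comp <- sum(p[!(omega %in% A)])

print(p_A)        # 0.5
print(p_A_comp)   # 0.5, which matches 1 - P(A)
```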
6
Q

conditional probability

A

Probability of event A given that event B has occurred: P(A | B) = P(A ∩ B) / P(B), so P(B) is the denominator.
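
As a small worked example, the conditional probability can be computed by counting outcomes. This sketch assumes a fair six-sided die; the events and the helper prob() are hypothetical names chosen for illustration.

```r
# Fair die: all outcomes equally likely (an illustrative assumption)
omega <- 1:6
A <- c(2, 4, 6)   # the roll is even
B <- c(4, 5, 6)   # the roll is greater than 3

# With equally likely outcomes, P(E) = (size of E) / (size of omega)
prob <- function(event) length(event) / length(omega)

p_B <- prob(B)                       # denominator
p_A_and_B <- prob(intersect(A, B))   # outcomes 4 and 6

# P(A | B) = P(A and B) / P(B)
p_A_given_B <- p_A_and_B / p_B
print(p_A_given_B)   # 0.6666667
```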

7
Q

independent

A

A and B are independent if the occurrence of B provides no information about A. Equivalently, the probability of the intersection of A and B is P(A ∩ B) = P(A) * P(B).
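
The product rule can be checked numerically. The sketch below again assumes a fair six-sided die; the events and the helpers prob() and is_independent() are hypothetical names used only for this illustration.

```r
# Fair die: all outcomes equally likely (an illustrative assumption)
omega <- 1:6
prob <- function(event) length(event) / length(omega)

A <- c(2, 4, 6)   # the roll is even
B <- c(1, 2)      # the roll is at most 2
C <- c(4, 5, 6)   # the roll is greater than 3

# Independent when P(E1 and E2) equals P(E1) * P(E2)
is_independent <- function(e1, e2) {
  isTRUE(all.equal(prob(intersect(e1, e2)), prob(e1) * prob(e2)))
}

print(is_independent(A, B))   # TRUE:  1/6 equals 1/2 * 1/3
print(is_independent(A, C))   # FALSE: 2/6 differs from 1/2 * 1/2
```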

8
Q

Variable?

A

A variable is any characteristic observed in a study. It summarises ALL outcomes of a random process.

9
Q

quantitative variable

A

There is a meaningful distance between any two data points.

10
Q

types of categorical variable

A
  • ordinal
  • nominal

11
Q

types of quantitative variable

A
  • discrete (separate numbers)
  • continuous (possible values form an interval)

12
Q

distribution of a variable (probability distribution)

A

A list of the possible outcomes together with their associated probabilities.

13
Q

Cumulative probability distribution

A

probability that the discrete variable is less than or equal to a particular value.
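
For a discrete variable, the cumulative distribution is a running sum of the probability distribution. A minimal sketch in R, assuming a fair six-sided die (the die and the names outcomes, pmf and cdf are illustrative):

```r
# Probability distribution of a fair die: outcomes and their probabilities
outcomes <- 1:6
pmf <- rep(1 / 6, length(outcomes))

# Cumulative distribution: P(X <= x) for each outcome x
cdf <- cumsum(pmf)

# P(X <= 4)
p_le_4 <- cdf[outcomes == 4]
print(p_le_4)   # 0.6666667
```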

14
Q

Probability density function (used for a continuous variable, as it is impossible to list all values and the probability of each value)

A

A probability density function (PDF) describes a continuous variable: the area under the PDF over an interval is the probability that the value of the variable falls within that interval.

15
Q

Cumulative distribution function

A

Cumulative distribution function (CDF) is the probability that the variable is less than or equal to a particular value.
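
The link between a PDF and a CDF can be illustrated with the standard normal distribution, which is an assumed example here rather than one named in the cards. The sketch uses base R's dnorm() (density), pnorm() (CDF) and integrate().

```r
# Probability that a standard normal variable falls between 0 and 1,
# computed as the area under the PDF over that interval
area <- integrate(dnorm, lower = 0, upper = 1)$value

# The same probability as a difference of two CDF values
from_cdf <- pnorm(1) - pnorm(0)

print(round(area, 4))       # 0.3413
print(round(from_cdf, 4))   # 0.3413
```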

16
Q

modal category?

A

category with the highest frequency

17
Q

Bar plot (common way to display categorical variable)

A

One vertical bar for each possible category that could occur,
with the height proportional to the frequency of that category.
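
A short R sketch of a frequency table, the modal category, and a bar plot. The data vector colours is made up for illustration.

```r
# Made-up categorical observations (illustrative only)
colours <- c("red", "blue", "red", "green", "red", "blue")

# Frequency of each category
counts <- table(colours)

# Modal category: the one with the highest frequency
modal <- names(counts)[which.max(counts)]
print(modal)   # "red"

# One vertical bar per category, height proportional to frequency
barplot(counts, xlab = "Category", ylab = "Frequency")
```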

18
Q

Histogram(quantitative variable)

A
  • Divide the range of data into intervals of equal width.
  • Count the number of observations that fall within each interval.
  • Label the intervals on the x-axis.
  • Draw a bar over each interval
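
The steps above can be sketched in R with hist(). The data and the interval width of 2 are assumptions chosen for illustration; note that hist() by default treats intervals as closed on the right, so a value of 2 is counted in the interval from 0 to 2.

```r
# Made-up quantitative observations (illustrative only)
x <- c(1, 2, 2, 3, 5, 6, 6, 7, 9, 10)

# Step 1: intervals of equal width covering the range of the data
breaks <- seq(0, 10, by = 2)

# Step 2: count the observations in each interval (no plot yet)
h <- hist(x, breaks = breaks, plot = FALSE)
print(h$counts)   # 3 1 3 1 2

# Steps 3 and 4: label the x-axis and draw a bar over each interval
plot(h, xlab = "x", main = "Histogram of x")
```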
19
Q

Weakness of range?

A

sensitive to extreme observations

20
Q

variance definition

A

The average of the squared deviations from the mean.
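
A worked example in R with made-up data. One caveat worth knowing: R's built-in var() divides by n - 1 (the sample variance), not by n, so it differs slightly from a plain average of the squared deviations.

```r
# Made-up data (illustrative only); the mean is 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Squared deviations from the mean
dev_sq <- (x - mean(x))^2

# Plain average of the squared deviations (divide by n)
pop_var <- mean(dev_sq)
print(pop_var)    # 4

# Sample variance (divide by n - 1), which is what var() returns
samp_var <- sum(dev_sq) / (length(x) - 1)
print(samp_var)   # 4.571429
print(var(x))     # 4.571429
```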

21
Q

empirical rules of SD

A
  • About 68% of observations fall within ±1 SD of the mean
  • About 95% fall within ±2 SD of the mean
  • Almost all fall within ±3 SD of the mean (use this to check for outliers)
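
These percentages can be checked against the normal distribution, for which the empirical rule is stated. A minimal sketch using base R's pnorm():

```r
# Number of standard deviations either side of the mean
k <- 1:3

# Probability that a normal variable falls within +-k SD of its mean
within <- pnorm(k) - pnorm(-k)
print(round(within, 4))   # 0.6827 0.9545 0.9973
```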
22
Q

interquartile range

A

range between upper and lower quartiles (robust to outliers)

23
Q

5 number summary

A

Minimum, lower quartile, median (X0.5), upper quartile, maximum (the minimum and maximum are taken NOT considering outliers)
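
A short R sketch with made-up data. Note, as a caveat, that base R's fivenum() reports the raw minimum and maximum and does not exclude outliers itself. The second vector y, which adds one extreme value, is an assumption used to compare the range with the interquartile range.

```r
# Made-up data (illustrative only)
x <- 1:9
print(fivenum(x))       # 1 3 5 7 9
print(IQR(x))           # 4

# Replace the largest value with an extreme observation
y <- c(1:8, 100)
print(diff(range(x)))   # 8
print(diff(range(y)))   # 99: the range is sensitive to the extreme value
print(IQR(y))           # 4: the interquartile range is unchanged
```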

24
Q

when does an association exist?

A

An association exists if particular values of one variable (the response/dependent variable) are more likely to occur with certain values of another variable (the explanatory/independent variable).

25
Q

covariance

A

measures the extent to which two variables move in the same direction

26
Q

correlation

A

covariance between two variables divided by the product of their standard deviations
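
The definition can be checked in R against the built-in cor(). The two vectors are made-up data for illustration.

```r
# Made-up paired observations (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Covariance: positive when the variables tend to move in the same direction
print(cov(x, y))    # 1.5

# Correlation: covariance divided by the product of the standard deviations
r <- cov(x, y) / (sd(x) * sd(y))
print(round(r, 4))           # 0.7746
print(round(cor(x, y), 4))   # 0.7746
```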

27
Q

To check your working directory

A

getwd()

28
Q

get data types

A

class(a) (where a is an object that has been assigned a value)

29
Q

Class of TRUE or FALSE?

A

logical

30
Q

Creating a vector of numbers and name it x

A

x=c(1,2,3,4)
x prints as 1 2 3 4; class(x) is numeric

31
Q

length of vector

A

length(x)
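
The last few cards can be tried together in an R session. A minimal sketch; the extra object flag is a hypothetical name used only to show the logical class.

```r
# Create a numeric vector and inspect it
x <- c(1, 2, 3, 4)
print(x)           # 1 2 3 4
print(class(x))    # "numeric"
print(length(x))   # 4

# TRUE and FALSE have class "logical"
flag <- TRUE
print(class(flag))   # "logical"
```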

32
Q

alternative ways to write

x = matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

A

x=matrix(c(1,2,3,4),2,2)

x=matrix(1:4,2,2)

33
Q

Fill the matrix by row first?

A

y=matrix(1:4,2,2,byrow=TRUE)

34
Q

class of matrix?

A

‘matrix’ ‘array’

35
Q

dimension of matrix

A

dim(x) returns 2 2 (rows, then columns)

36
Q

extract component from row 2, column 3 of matrix A

A

A[2,3]

37
Q

obtain the first row of A

A

A[1,]

38
Q

delete first row of A

A

A[-1,]
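
The matrix cards above can be tried on a small example. This sketch assumes a 2 x 3 matrix so that A[2,3] exists; the matrices A and B are illustrative. Note that A[-1,] returns a copy without the first row and leaves A itself unchanged unless the result is assigned back.

```r
# Fill column by column (the default): rows are 1 3 5 and 2 4 6
A <- matrix(1:6, 2, 3)

# Fill row by row instead: rows are 1 2 3 and 4 5 6
B <- matrix(1:6, 2, 3, byrow = TRUE)

print(dim(A))     # 2 3 (rows, then columns)
print(A[2, 3])    # 6
print(A[1, ])     # 1 3 5
print(A[-1, ])    # 2 4 6 (A without its first row)
print(B[1, ])     # 1 2 3
```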

39
Q

list out all objects?

A

ls()

40
Q

remove one or all objects?

A
  • rm(x)
  • rm(list=ls()) (the list argument must be a character vector of object names, which is what ls() returns)