General Flashcards

1
Q

Sample percentile

A

In a sample ranked from least to greatest, the sample 100p percentile is the data value such that

  1. at least 100p percent of the data are less than or equal to that value, and
  2. at least 100(1 − p) percent of the data are greater than or equal to it.

2
Q

A statistic

A

This is a numerical value that is derived from data.

3
Q

Bivariate Data Analysis

A

This is the analysis of two variables measured together, an independent variable (IV) and a dependent variable (DV), to investigate the relationship between them.

4
Q

Box Plots

A

This is a plot that shows the extreme values, the first quartile, median, and third quartile.

5
Q

Central Tendency Measures

A

Central tendency is described by the mean, median, and mode of the data set. The mean is influenced by extreme values, whereas the median is largely independent of them.
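To see the difference, here is a minimal Python sketch using the standard `statistics` module; the data values are hypothetical, chosen only so that one extreme value is added.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added
data = [2, 3, 3, 4, 5]
with_outlier = data + [50]

# The mean is pulled strongly toward the extreme value; the median barely moves
print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```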

6
Q

Chebyshev's inequality

A

For any k ≥ 1, where x̄ is the sample mean and s is the sample standard deviation, at least the following percentage of the data lies within x̄ ± ks:

% min = 100(1 − 1/k²)
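A minimal Python sketch that checks the bound on a hypothetical data set. Because Chebyshev's inequality holds for any data, the observed fraction within x̄ ± ks should never fall below 1 − 1/k². The function names are illustrative only.

```python
from statistics import mean, stdev


def fraction_within(data, k):
    """Fraction of observations with |x - x̄| <= k*s (s = sample std dev)."""
    xbar, s = mean(data), stdev(data)
    inside = [x for x in data if abs(x - xbar) <= k * s]
    return len(inside) / len(data)


def chebyshev_min(k):
    """Chebyshev lower bound 1 - 1/k**2 on that fraction (meaningful for k >= 1)."""
    return 1 - 1 / k**2


data = [2, 4, 4, 4, 5, 5, 7, 9, 21]  # hypothetical, with one extreme value
for k in (1.5, 2, 3):
    print(k, round(fraction_within(data, k), 3), round(chebyshev_min(k), 3))
```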

7
Q

Class Boundaries

A

These are the lower and upper endpoints (min/max) of the class intervals. We use the left-end inclusion rule, which says that a value equal to the left (lower) boundary is included in the bin and a value equal to the right (upper) boundary is not.
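A minimal Python sketch of the left-end inclusion rule, assuming hypothetical boundaries 0, 10, 20, 30 (classes [0, 10), [10, 20), [20, 30)). It uses the standard `bisect.bisect_right`, which places a value that equals a boundary to the right of it, so that value lands in the class whose left end it is.

```python
from bisect import bisect_right

boundaries = [0, 10, 20, 30]  # hypothetical class boundaries


def class_index(x, edges):
    """Index i of the class [edges[i], edges[i+1]) that contains x."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError(f"{x} is outside [{edges[0]}, {edges[-1]})")
    # A value equal to a boundary is counted in the class to its right
    return bisect_right(edges, x) - 1


data = [0, 3, 10, 15, 19.5, 20, 29]  # hypothetical observations
counts = [0] * (len(boundaries) - 1)
for x in data:
    counts[class_index(x, boundaries)] += 1
print(counts)
```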

8
Q

Class intervals

A

These are the bins for grouping observations in a reasonable way.

9
Q

Closed Data

A

This is data expressed as parts of a fixed total (e.g., percentages that must sum to 100%), so no value can exceed that total.

Examples include any data that add up to a constant sum, such as percentage compositions.

10
Q

Correlation Coefficient

A

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / [(n − 1)s_x s_y] = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

For a paired data set (xᵢ, yᵢ), i = 1, …, n, with sample means x̄ and ȳ and sample standard deviations s_x and s_y, this statistic indicates how closely the pairs follow a straight line y = mx + b. It always lies between −1 and 1.
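A minimal Python sketch of the second form of the formula (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations); the function name and test data are illustrative only.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # points exactly on a rising line
print(correlation([1, 2, 3], [1, 3, 2]))        # weaker positive relationship
```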

11
Q

Cumulative Frequency

A

This shows, for each bin, the running total of frequencies: the frequency of that bin plus the frequencies of all bins below it.

Plots of cumulative frequency are also called ogives.
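A minimal Python sketch of a cumulative frequency table using the standard `itertools.accumulate`; the classes and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical class intervals and their frequencies
classes = ["[0,10)", "[10,20)", "[20,30)", "[30,40)"]
freqs = [3, 7, 6, 4]

cum_freqs = list(accumulate(freqs))  # running totals of the frequencies
for c, f, cf in zip(classes, freqs, cum_freqs):
    print(f"{c:8} {f:3} {cf:4}")
```

Plotting `cum_freqs` against the upper class boundaries (10, 20, 30, 40) gives the ogive; the last cumulative value always equals the total number of observations.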

12
Q

Directional Data

A

This is data expressed in angles and can indicate how a vector is directed in space.

13
Q

Frequency Table

A

This is a table that displays the number of occurrences of each value of a characteristic of the sample being investigated. It works best when the data take relatively few distinct, discrete values.
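A minimal Python sketch of a frequency table built with the standard `collections.Counter`; the observations (for example, a count of fossils per sample) are hypothetical.

```python
from collections import Counter

# Hypothetical discrete observations, e.g., number of fossils found per sample
obs = [2, 0, 1, 2, 3, 2, 1, 0, 2, 1]

table = Counter(obs)  # maps each value to its number of occurrences
print("value  frequency")
for value in sorted(table):
    print(f"{value:5}  {table[value]:9}")
```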

14
Q

Gini Coefficient

A

The Gini coefficient (G) is twice the area between the line of equality, L(p) = p, and the Lorenz curve. That area has a maximum value of .5 and a minimum value of 0, so G ranges from 0 (perfect equality) to 1 (maximum inequality).

G = 1 − 2B, where B = area under the Lorenz curve, L(p)
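A minimal Python sketch of G = 1 − 2B for a finite sample, assuming non-negative incomes with a positive total and approximating B with the trapezoid rule over the Lorenz curve points. The function names and data are hypothetical.

```python
def lorenz_points(incomes):
    """Lorenz curve points (p, L(p)), starting at (0, 0).

    Assumes non-negative incomes with a positive total.
    """
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(incomes):
    """G = 1 - 2B, where B is the area under the Lorenz curve (trapezoid rule)."""
    pts = lorenz_points(incomes)
    B = sum((p1 - p0) * (l0 + l1) / 2
            for (p0, l0), (p1, l1) in zip(pts, pts[1:]))
    return 1 - 2 * B


print(gini([5, 5, 5, 5]))   # equal incomes
print(gini([0, 0, 0, 10]))  # one member holds everything
```

With this discrete version, a sample of n members in which one member holds everything gives G = 1 − 1/n, which approaches 1 as n grows.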

15
Q

Histograms

A

These are bar charts of class frequencies with no spaces between the bars, because adjacent class intervals share a boundary.

16
Q

Image Processing

A

This is an increasingly important form of analysis that involves converting signals into images, enhancing the signal-to-noise ratio, extracting features, and recognizing patterns.

17
Q

Inferential Statistics

A

This is the practice of using statistics computed from data to make inferences about an experiment or population.

18
Q

Interval Data

A

These are data measured on a scale with evenly spaced values but no true zero, so values can be less than zero (Ex: temperature in °C).

19
Q

Lorenz Curve

A

This is a cumulative curve showing the income distribution: L(p) is the fraction of total income held by the poorest fraction p of the population.

20
Q

mean

A

x̄ = Σxᵢ/n = Σvf/n

where v = each bin (distinct) value and f = its frequency
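A minimal Python sketch showing that Σvf/n on a frequency table gives the same mean as Σx/n on the raw data; the table values are hypothetical.

```python
# Hypothetical frequency table: bin value v -> frequency f
table = {1: 4, 2: 3, 3: 2, 4: 1}

n = sum(table.values())                          # total number of observations
xbar = sum(v * f for v, f in table.items()) / n  # Σ v*f / n

# Expand the table back into raw data and use Σx / n for comparison
raw = [v for v, f in table.items() for _ in range(f)]
print(xbar, sum(raw) / len(raw))
```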

21
Q

Mean influence by multiplication/addition

A

For a linear transformation yᵢ = axᵢ + b applied to every data value,

ȳ = ax̄ + b, so the mean is affected by both multiplication and addition in a linear way.
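A minimal Python sketch checking ȳ = ax̄ + b numerically; the data and the constants a and b are hypothetical.

```python
from statistics import mean

x = [2.0, 4.0, 7.0, 11.0]  # hypothetical data
a, b = 3.0, -5.0           # hypothetical constants

y = [a * xi + b for xi in x]   # transform every data value
print(mean(y), a * mean(x) + b)  # the two values agree
```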

22
Q

Median

A

This is the middle value of a sample when data is arranged from least to greatest

If n is odd, the median is the value in position (n + 1)/2.

If n is even, the median is the average of the values in positions n/2 and n/2 + 1.
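A minimal Python sketch of the position rules above. Positions on the card are 1-based, while Python lists are 0-based, hence the `- 1` offsets; the function name is illustrative.

```python
def median_by_position(data):
    """Median using the (n + 1)/2 and n/2, n/2 + 1 position rules."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # value in position (n + 1)/2
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # average of positions n/2 and n/2 + 1


print(median_by_position([7, 1, 3]))     # odd n
print(median_by_position([4, 1, 3, 2]))  # even n
```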

23
Q

Mode

A

This is the observed value that occurs most often within a data set. If two or more values tie for the highest number of occurrences, each of them is a modal value.
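Python's standard library provides `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. A minimal sketch with hypothetical data:

```python
from statistics import multimode

one_mode = [5, 1, 5, 2]
two_modes = [1, 2, 2, 3, 3, 4]  # 2 and 3 each occur twice

print(multimode(one_mode))
print(multimode(two_modes))
```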

24
Q

Nominal Data

A

This is data that is non-numerical in character (fossils, minerals, rocks…)

It is occasionally converted into binary (0=not present, 1 = present)

25
Q

Normal Data Set

A

This is a data set that is symmetric and bell-shaped, with mean = median = mode. Approximately 68% of the data lie within x̄ ± s,

approximately 95% lie within x̄ ± 2s,

and approximately 99.7% lie within x̄ ± 3s.
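These percentages are rounded areas under the standard normal curve. A minimal Python sketch using the standard `statistics.NormalDist` (Python 3.8+) to compute them:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # probability of falling within ±k sd
    print(f"within ±{k}s: {share:.1%}")
```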

26
Q

Ordinal Data

A

This is ranked data that can be numerical, but the intervals separating the ranks are not equal (Ex: Mohs scale of hardness). Values also cannot be negative.

27
Q

Paired Data Sets

A

These are data sets of (x, y) pairs measured on the same items, used to understand how one variable influences or relates to another.

28
Q

Population

A

This is the total collection of elements that we want to investigate. It is usually too large to examine every element it contains.

29
Q

Probability Models

A

These are models that help us judge the validity of our conclusions by assigning probabilities to obtaining our results. They are the basis of statistical inference: if an inference cannot be checked using a probability model, then we cannot conclude that the inference is legitimate.

30
Q

r meaning

A

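If the slope of the line relating y and x is negative, then r < 0; if the slope is positive, then r > 0. The absolute value of r indicates how strongly linear the relationship is.

If r is computed for (x, y), and w = a + bx and z = c + dy, where b and d have the same sign, then

r(x, y) = r(w, z) (if b and d have opposite signs, r(w, z) = −r(x, y))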
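A minimal Python sketch checking that linear rescaling leaves r unchanged when b and d share a sign and flips its sign otherwise. The data and constants are hypothetical, and `correlation` is a small helper written for this example.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical paired data
y = [2.0, 1.0, 4.0, 3.0, 6.0]

w = [10 + 2 * xi for xi in x]         # w = a + bx with b = 2 > 0
z = [-3 + 0.5 * yi for yi in y]       # z = c + dy with d = 0.5 > 0
z_flip = [-3 - 0.5 * yi for yi in y]  # d = -0.5 < 0

print(correlation(x, y), correlation(w, z), correlation(w, z_flip))
```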

31
Q

Ratio data

A

This is data measured on an evenly spaced scale with a true zero, so values cannot be negative and ratios between values are meaningful. Examples include weight or length.

32
Q

Relative frequency

A

This is f/n, where f is the number of occurrences of a given value (or class) and n is the total number of observations.

The relative frequencies sum to 1: Σ f/n = 1
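A minimal Python sketch computing relative frequencies with `collections.Counter` and `fractions.Fraction`; the mineral observations are hypothetical. Exact fractions are used so that the check Σ f/n = 1 holds without floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical observations of mineral type
obs = ["quartz", "feldspar", "quartz", "mica", "quartz", "feldspar"]

n = len(obs)
rel = {value: Fraction(f, n) for value, f in Counter(obs).items()}  # f/n
for value, r in rel.items():
    print(value, r)
print("sum:", sum(rel.values()))
```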

33
Q

Sample 100p percentile

A

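The data value at or below which at least 100p% of the data lie (counting that data value itself), with at least 100(1 − p)% of the data at or above it.

p = the desired percentile as a decimal (e.g., p = 0.9 for the 90th percentile)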

34
Q

Sample Variance

A

s² = Σ(xᵢ − x̄)²/(n − 1)

This is essentially the average squared distance of each data point from the mean, except that the sum is divided by n − 1 rather than n.

The deviations are squared so that positive and negative deviations do not cancel; the unsquared deviations xᵢ − x̄ always sum to zero.
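A minimal Python sketch of the formula, compared against the standard `statistics.variance` (which also divides by n − 1); the data are hypothetical.

```python
from statistics import variance


def sample_variance(data):
    """s² = Σ(xᵢ - x̄)² / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]
    # The raw deviations sum to (about) zero, so they are squared before summing
    return sum(d * d for d in deviations) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical
print(sample_variance(data), variance(data))
```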

35
Q

Samples

A

These are subgroups of populations which ideally represent the population

36
Q

Sampling Strategies (3 kinds for geology)

A

Regular sampling is where samples are taken at evenly spaced locations (e.g., on a grid).

Uniform sampling is where samples are taken at random locations within a defined area.

Clustered sampling is where samples are taken from an outcrop or another area of limited exposure.

37
Q

Scatter diagrams

A

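These are x vs. y plots of paired data, with the independent variable (IV) on the x-axis and the dependent variable (DV) on the y-axis.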

38
Q

signal processing

A

This includes techniques for manipulating a signal to reduce noise. They are most often used together with time series analyses to interpret data such as geophysical records.

39
Q

Spatial Analyses

A

This is a suite of techniques used to understand how observations relate to one another in two- or three-dimensional space.

40
Q

Spatial Data

A

This is data collected in two- or three-dimensional space that represents where something occurs in that space.

Ex: Spatial distribution of a tracer in water

41
Q

Spread influence

A

Adding or subtracting a constant shifts the measures of central tendency but does not change the spread. Multiplying every value by a constant a multiplies the variance by a² (and the standard deviation by |a|).
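A minimal Python sketch checking these rules with the standard `statistics` module; the data and the constants 100 and −3 are hypothetical.

```python
from statistics import mean, stdev, variance

x = [1.0, 3.0, 4.0, 8.0]           # hypothetical data
shifted = [xi + 100 for xi in x]   # add a constant
scaled = [-3 * xi for xi in x]     # multiply by a = -3

print("mean:    ", mean(x), mean(shifted))          # shifted by 100
print("variance:", variance(x), variance(shifted))  # unchanged by the shift
print("scaled:  ", variance(scaled), stdev(scaled)) # 9x variance, 3x sd
```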

42
Q

Statistics

A

This is the art of learning from data; it includes collecting data, describing data, and drawing inferences from data.

43
Q

Stem and leaf Plots

A

These are plots for small to medium data sets in which each value is split into two parts: a stem (the leading digit or digits) and a leaf (typically the last digit).
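A minimal Python sketch of a stem-and-leaf plot, assuming non-negative integer data with the last digit as the leaf and the remaining digits as the stem; the function names and data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)


def show(plot):
    """Print every stem from smallest to largest, including empty stems."""
    for stem in range(min(plot), max(plot) + 1):
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem:>3} | {leaves}")


data = [12, 15, 21, 27, 27, 33, 48]  # hypothetical
show(stem_and_leaf(data))
```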

44
Q

Time series Analysis

A

This is the analysis of data sequences as a function of time. It can also include periodic (oscillatory) data.

45
Q

Univariate Data Analysis

A

This is the analysis of a single variable, where the observations are treated as independent, meaning that no outcome under analysis influences another.

46
Q

Ways to Display a frequency table

A

These can be displayed using line graphs, bar graphs, or frequency polygons

47
Q

Ways to show relative frequency

A

This can be shown on a table, line graph, bar chart, relative frequency polygon, or a pie chart.

48
Q

What is a common need for descriptive statistics?

A

We need to be able to display data in a way that makes it interpretable and meaningful. It should be intuitive.

49
Q

When are normal distributions most common?

A

These are most common in very large datasets or the conglomeration of datasets

50
Q

Finding a sample percentile

A
  1. Arrange the data from least to greatest; let n be the number of data values.
  2. Compute np.
  3. Apply the rule below to find the position of the value that satisfies the 100p percentile criteria.

IF np is not an integer, ROUND UP to the next integer; the value in that position is the 100p percentile.

IF np is an integer, the 100p percentile is the average of the values in positions np and np + 1. This is like the median when n is even, where you use the average of the values in positions n/2 and n/2 + 1.
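A minimal Python sketch of the procedure above. Positions on the card are 1-based (hence the `- 1` offsets), and p is converted with `Fraction(str(p))` so that decimal inputs give an exact np, since floating-point products may not come out as exact integers. The function name is hypothetical.

```python
import math
from fractions import Fraction


def sample_percentile(data, p):
    """Sample 100p percentile by the np rule, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    xs = sorted(data)                 # step 1: least to greatest
    np_value = len(xs) * Fraction(str(p))  # step 2: compute np exactly
    if np_value.denominator == 1:     # np is an integer
        k = int(np_value)
        return (xs[k - 1] + xs[k]) / 2  # average of positions np and np + 1
    return xs[math.ceil(np_value) - 1]  # round np up; take that position


data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(sample_percentile(data, 0.25), sample_percentile(data, 0.5))
```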

51
Q

Quartiles

A

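These are the 25th, 50th, and 75th percentiles (Q1, Q2 = the median, and Q3). You find which values of the data set meet these criteria by computing np with p = .25, .5, and .75; if np is a whole number, use the average of the values in positions np and np + 1, otherwise round np up and take the value in that position.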
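A minimal Python sketch that applies the np rule at p = .25, .5, .75 and adds the minimum and maximum, giving the five values a box plot displays. The helper names are hypothetical, and the percentile helper follows the rounding rule from the sample percentile card.

```python
import math
from fractions import Fraction


def sample_percentile(data, p):
    """Sample 100p percentile by the np rule (1-based positions)."""
    xs = sorted(data)
    np_value = len(xs) * Fraction(str(p))
    if np_value.denominator == 1:
        k = int(np_value)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(np_value) - 1]


def five_number_summary(data):
    """(min, Q1, median, Q3, max): the values shown on a box plot."""
    return (min(data),
            sample_percentile(data, 0.25),
            sample_percentile(data, 0.5),
            sample_percentile(data, 0.75),
            max(data))


print(five_number_summary(list(range(1, 11))))  # hypothetical data 1..10
```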

52
Q

Mean vs. Median when to use

A

Generally, the mean gives a better overall description of the data set. The median should be used when probabilities are involved and/or when the value is being used to understand the ordering within a group.

Ex: Housing. The mean income would be best for determining what the average person in an area can spend on a home, but if we want to design housing that we could expect 50% of the population to afford (P(Affordable) = .5), then the median is more useful.

53
Q

Sample Space

A

S = sample space = the set of all possible outcomes of some experiment or occurrence

54
Q

Subset

A

An event is a subset of the sample space S: a collection of one or more outcomes in S.

55
Q

Intersection

A

This is the ∩ symbol, which is used interchangeably with “and”.

To say that both of two events, E and F, occur, we can write EF, E∩F, or E and F.

This is the event that E and F occur together. If E and F are mutually exclusive, then EF = ϕ, the null event.

56
Q

Null event

A

ϕ = null event = the event that contains no outcomes, so it cannot occur.

Ex: if E and F are mutually exclusive, then EF = ϕ because E and F have no outcomes in common.

57
Q

Union

A

E ∪ F = E or F, the event consisting of every outcome that is in E, in F, or in both.

58
Q

Complement

A

If E is an event within the sample space S, then Eᶜ, the complement of E, is the event consisting of every outcome in S that is not in E.

59
Q

Set Containment

A

If the occurrence of E means that F must also have occurred, then E is contained in F, written E ⊂ F.

60
Q

Commutative law for union/intersection

A

E ∪ F = F ∪ E

Event E or F = Event F or E

EF = FE

Event E and F = Event F and E

61
Q

Associative Law for Union and Intersection

A

(E ∪ F) ∪ G = E ∪ (F ∪ G)

(E or F) or G = E or (F or G)

(EF)G = E(FG)

(E and F) and G = E and (F and G)

62
Q

Distributive law for intersection and union

A

(E ∪ F)G = EG ∪ FG

(E or F) and G = (E and G) or (F and G)

EF ∪ G = (E ∪ G)(F ∪ G)

(E and F) or G = (E or G) and (F or G)

63
Q

De Morgan's Laws

A

(E ∪ F)ᶜ = EᶜFᶜ

Neither E nor F occurs = E does not occur and F does not occur

(EF)ᶜ = Eᶜ ∪ Fᶜ

E and F do not both occur = E does not occur or F does not occur
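A minimal Python sketch that checks De Morgan's laws and the distributive laws with built-in sets, where `|` is union, `&` is intersection, and the complement is taken relative to a hypothetical sample space S. The events E, F, G are arbitrary examples.

```python
S = set(range(1, 11))  # hypothetical sample space
E = {1, 2, 3, 4}
F = {3, 4, 5, 6}
G = {6, 7}


def complement(A):
    """Everything in S that is not in A."""
    return S - A


demorgan_union = complement(E | F) == (complement(E) & complement(F))
demorgan_inter = complement(E & F) == (complement(E) | complement(F))
distrib_1 = ((E | F) & G) == ((E & G) | (F & G))
distrib_2 = ((E & F) | G) == ((E | G) & (F | G))
print(demorgan_union, demorgan_inter, distrib_1, distrib_2)
```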

64
Q

Three axioms of Probability

A

1: 0 ≤ P(E) ≤ 1 for every event E
2: P(S) = 1
3: P(E₁ ∪ E₂ ∪ … ∪ Eₙ) = Σᵢ P(Eᵢ) = P(E₁) + P(E₂) + … + P(Eₙ)

Axiom 3 assumes the events are mutually exclusive, meaning EᵢEⱼ = ϕ whenever i ≠ j.

65
Q

Sample spaces with equally likely outcomes

A

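This refers to a sample space in which every outcome has the same probability of occurring, i.e., no outcome is weighted more heavily than another.

In this scenario, if S contains N outcomes, each outcome has probability p = 1/N, so for any event E, P(E) = (number of outcomes in E)/N.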
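A minimal Python sketch of P(E) = (number of outcomes in E)/N, using the 36 equally likely outcomes of rolling two fair dice as a hypothetical sample space. `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered outcomes
S = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(E) = (number of outcomes in E) / N for equally likely outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


p_sum_7 = prob(lambda s: s[0] + s[1] == 7)
print(p_sum_7)
```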

66
Q

Theory of counting

A

This is the basic principle of counting: if one experiment can result in m possible outcomes and, for each of those, a second experiment can result in n possible outcomes, then together the two experiments have m·n possible outcomes.

More generally, if we perform r experiments and experiment i has nᵢ possible outcomes, the total number of outcomes is n₁·n₂·…·nᵣ. If every experiment has n outcomes, the total is nʳ.
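A minimal Python sketch of the counting principle with `itertools.product`, which enumerates every combined outcome; the experiments (a coin toss and die rolls) are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical experiments: a coin toss (2 outcomes) and a die roll (6 outcomes)
experiments = [["H", "T"], [1, 2, 3, 4, 5, 6]]
outcomes = list(product(*experiments))
print(len(outcomes), prod(len(e) for e in experiments))

# r = 3 experiments with n = 6 outcomes each give n ** r combined outcomes
three_dice = list(product(range(1, 7), repeat=3))
print(len(three_dice), 6 ** 3)
```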

67
Q

Permutations

A

A permutation is a specific ordered arrangement of a set of objects. A set of n distinct objects has n! = n(n − 1)⋯(2)(1) different permutations.
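A minimal Python sketch comparing the count of arrangements produced by `itertools.permutations` with n!; the four items are hypothetical.

```python
from itertools import permutations
from math import factorial

items = ["a", "b", "c", "d"]             # hypothetical distinct objects
arrangements = list(permutations(items))  # every ordered arrangement
print(len(arrangements), factorial(len(items)))
```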

68
Q

number of unique groups in a set

A

If we have n distinct objects and want to know how many different groups of size r can be chosen from them (where order within a group does not matter), the number is

= n!/[(n − r)! r!]

This counts the distinct groups of r elements that can be selected from n elements; different orderings of the same r elements count as one group.
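A minimal Python sketch that evaluates n!/[(n − r)! r!] directly and compares it with the standard `math.comb` (Python 3.8+) and with an explicit enumeration from `itertools.combinations`. The function name `n_choose_r` is hypothetical.

```python
from itertools import combinations
from math import comb, factorial


def n_choose_r(n, r):
    """n! / [(n - r)! r!]: number of distinct groups of size r from n objects."""
    return factorial(n) // (factorial(n - r) * factorial(r))


n, r = 5, 2
print(n_choose_r(n, r), comb(n, r), len(list(combinations(range(n), r))))
```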

69
Q

Combinations notation

A

The number of unique combinations (groups) of size r that can be formed from n objects

(n r), read “n choose r”, = n!/[(n − r)! r!], where r ≤ n

70
Q

When is conditional probability particularly useful?

A

It is used when there is limited information within a problem (you are attempting to derive the probability of an event based on other events)

or

It is the easiest way to find the probability of a cause of, or input to, an event given new information (backward reasoning)

71
Q

P(E|F) = ?

A

P(E|F) = P(EF)/P(F), provided P(F) > 0

Probability of E occurring given F has occurred = the probability E and F occur divided by the probability F occurs
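A minimal Python sketch of P(E|F) = P(EF)/P(F) on the two-dice sample space. The events are hypothetical: E is "the total is 8" and F is "the first die is even".

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice, equally likely


def P(event):
    """Probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def E(s):
    return s[0] + s[1] == 8  # hypothetical event E: total is 8


def F(s):
    return s[0] % 2 == 0     # hypothetical event F: first die is even


p_E_given_F = P(lambda s: E(s) and F(s)) / P(F)  # P(EF) / P(F)
print(P(E), p_E_given_F)
```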

72
Q

P(E|Fc) = ? (expand)

A

P(E|Fᶜ) = P(EFᶜ)/P(Fᶜ)

This says that the probability of E occurring given that F does NOT occur equals the probability of E occurring and F not occurring, divided by the probability that F does not occur.

73
Q

P(E) Expansion using complements

A

P(E) = P(EF) + P(EFᶜ)

Probability of E = Prob of E and F + Probability of E and not F

74
Q

P(E) = ? as a weighted average

A

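P(E) = P(E|F)P(F) + P(E|Fᶜ)(1 − P(F))

P(E) is a weighted average of the probability of E given that F has occurred and the probability of E given that F has not occurred, with weights P(F) and 1 − P(F).

This is especially useful when E can be viewed as a consequence of whether or not F occurs.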
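A minimal Python sketch of the weighted-average formula, using exact fractions and hypothetical values P(F) = 3/10, P(E|F) = 9/10, and P(E|Fᶜ) = 2/10. The function name is illustrative.

```python
from fractions import Fraction


def total_probability(p_E_given_F, p_E_given_Fc, p_F):
    """P(E) = P(E|F)P(F) + P(E|Fc)(1 - P(F))."""
    return p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)


# Hypothetical inputs
p = total_probability(Fraction(9, 10), Fraction(2, 10), Fraction(3, 10))
print(p)
```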

75
Q

How to use the weighted probability of E?

A

If tasked with finding P(F|E), where E is the second (observed) event:

Then P(F|E) = P(FE)/P(E)

where P(FE) = P(F)P(E|F) and P(E) = P(F)P(E|F) + P(Fᶜ)P(E|Fᶜ). This is Bayes' formula.
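A minimal Python sketch of this backward reasoning with hypothetical numbers: a field test for a mineral that is present in 1% of samples (P(F) = 1/100), gives a positive result 95% of the time when it is present (P(E|F)), and gives a false positive 5% of the time when it is absent (P(E|Fᶜ)).

```python
from fractions import Fraction


def posterior(p_F, p_E_given_F, p_E_given_Fc):
    """P(F|E) = P(F)P(E|F) / [P(F)P(E|F) + P(Fc)P(E|Fc)]."""
    numerator = p_F * p_E_given_F                       # P(FE)
    return numerator / (numerator + (1 - p_F) * p_E_given_Fc)  # divide by P(E)


p = posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(p, float(p))
```

Even with an accurate test, a positive result here implies only about a 16% chance that the mineral is present, because F itself is rare.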

76
Q

Independent events

A

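If P(E|F) = P(E), then E and F are independent: knowing that F occurred does not change the probability of E. Equivalently, P(EF) = P(E)P(F).

Three events E, F, and G are independent if P(EFG) = P(E)P(F)P(G) and every pair is also independent: P(EF) = P(E)P(F), P(EG) = P(E)P(G), and P(FG) = P(F)P(G).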
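A minimal Python sketch that tests the product rule on small equally likely sample spaces; the helper `make_P` and all events are hypothetical. The coin example shows three events that are pairwise independent but fail the three-event product rule, which is why both conditions are needed.

```python
from fractions import Fraction
from itertools import product


def make_P(space):
    """Return P(*events): probability that all given events occur together."""
    def P(*events):
        hits = sum(1 for s in space if all(e(s) for e in events))
        return Fraction(hits, len(space))
    return P


# Two fair dice
P = make_P(list(product(range(1, 7), repeat=2)))
def first_even(s): return s[0] % 2 == 0
def second_six(s): return s[1] == 6
def total_12(s): return s[0] + s[1] == 12

print(P(first_even, second_six) == P(first_even) * P(second_six))  # independent
print(P(first_even, total_12) == P(first_even) * P(total_12))      # dependent

# Two fair coin tosses: A, B, C are pairwise independent but not independent
Q = make_P(list(product("HT", repeat=2)))
def A(s): return s[0] == "H"
def B(s): return s[1] == "H"
def C(s): return s[0] == s[1]

print(Q(A, B, C), Q(A) * Q(B) * Q(C))
```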

77
Q
A