Chapter 6 Basic Data Analysis Principles Flashcards

1
Q

________ data consists of a single variable, such as cost data for a single element or a set of historical cost growth factors for various programs in a given phase. It can be displayed graphically using histograms, bar graphs, or boxplots. It is rare in cost estimating.

A

Univariate

2
Q

________ data has one independent variable and one dependent variable. An example of ________ data is software development cost as a function of the number of lines of code.

A

Bivariate

3
Q

________ data has several independent variables and one dependent variable. An example of ________ data would be the cost of ship supplies (the dependent variable) as a function of two independent variables.

A

Multivariate

4
Q

________ data differs from univariate, bivariate, and multivariate data and requires a different approach. Some examples are cost growth as a function of the years since program initiation, or worker productivity measured by quarter.

A

Time series

5
Q

A ________ indicates a marked change in the nature of the data occurring at some point or over some period. An example of a ________ would be lower cost growth in programs entering production due to a specific change in acquisition law.

A

Paradigm shift

6
Q

________ are repeating periodic trends and can occur at any interval. They are often found in seasonal data. Maintenance actions such as ship overhauls are another example.

A

Cycles

7
Q

________ is present when the variable (e.g., health, weather) in time "t" is correlated to the variable in time "t−1", which is correlated to the variable in time "t−2". In other words, the value of the variable in the present is correlated to the value of the variable in the previous time period(s). ________ occurs due to dependencies within the data, usually when the data is from the same source.

A

Autocorrelation
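
As a rough illustration of the card above, the sketch below estimates how strongly each value in a series tracks the value one period earlier. The function name `lag_autocorrelation` and the quarterly productivity figures are hypothetical, and the estimator shown (lagged co-movement divided by total squared deviation from the mean) is one common convention, not necessarily the one the chapter uses.

```python
from statistics import mean


def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag (illustrative)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    avg = mean(series)
    # Denominator: total squared deviation of the series from its mean.
    denom = sum((x - avg) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: co-movement of each value with the value `lag` periods earlier.
    num = sum((series[t] - avg) * (series[t - lag] - avg) for t in range(lag, n))
    return num / denom


# Hypothetical quarterly productivity index with a steady upward drift.
productivity = [100, 102, 104, 107, 109, 112, 114, 117]
r1 = lag_autocorrelation(productivity, lag=1)
print(f"lag-1 autocorrelation: {r1:.2f}")
```

A value well above zero suggests that the present period depends on the previous one, which is the situation the card describes.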

8
Q

________ involves examining cost data descriptive statistics, assessing potential outliers, and comparing results to history.

A

Data validation

9
Q

________ characterize and describe the data by revealing its central tendency and its dispersion. They are calculated for each data group and especially for the cost data of the element in question.
They include:
the sample size (i.e., the number of data points selected for analysis),
mean,
standard deviation,
Coefficient of Variation (CV), and
specialized averages.

A

Descriptive statistics

10
Q

________ are either weighted averages, for cost data representative of different quantities, or moving averages, for time series data.

A

Specialized averages
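
The two kinds of specialized average named in this card can be sketched as follows. The helper names `weighted_average` and `moving_average`, and all figures, are made up for illustration; the moving average shown is a simple trailing window, which is only one of several variants.

```python
def weighted_average(values, weights):
    """Average of `values`, each weighted by the matching entry in `weights`."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive points."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical unit costs observed on two buys of different quantities.
unit_costs = [10.0, 8.0]
quantities = [100, 300]
avg_unit_cost = weighted_average(unit_costs, quantities)

# Hypothetical time series smoothed with a three-period window.
smoothed = moving_average([4, 6, 8, 10, 12], window=3)
print(avg_unit_cost, smoothed)
```

Weighting by quantity keeps the larger buy from being treated as equal to the smaller one, which a plain mean of the two unit costs would do.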

11
Q

________ are data points that fall far from the central mass of the data and may distort both descriptive and inferential statistics.

A

Outliers
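
The card does not say how "far from the central mass" should be measured. One widely used screening rule, assumed here purely for illustration, flags points lying more than 1.5 interquartile ranges outside the quartiles. The function name `iqr_outliers` and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed screening rule)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical cost observations with one value far from the rest.
costs = [12, 13, 14, 15, 16, 17, 18, 60]
outliers = iqr_outliers(costs)
print(outliers)
```

A flagged point is a candidate for closer review, not an automatic deletion.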

12
Q

____________: compare results to historical data whenever possible to confirm the data's validity.

A

Standard Factors

13
Q

The most popular measure of central tendency is the ________. The ________ of a data set is the sum of the data values divided by the number of data points.

A

Mean

14
Q

The ________ is the middle data point: exactly half of the remaining data points are lower than the ________ and half are higher than the ________.

A

Median

15
Q

The ______ is the most frequently occurring point and is the least used of the three measures of central tendency.

A

Mode

16
Q

If the mean and median of a distribution are not equal, it is said to be ________.

A

Skewed
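
The three measures of central tendency from the preceding cards, and the mean-versus-median comparison from this one, can be tried out with Python's standard `statistics` module. The data values and the helper `skew_direction` are hypothetical, and reading the direction of skew from the sign of mean minus median is a rule of thumb rather than a formal test.

```python
from statistics import mean, median, mode


def skew_direction(data):
    """Rule-of-thumb skew check: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed"  # a few high values pull the mean upward
    if m < med:
        return "left-skewed"   # a few low values pull the mean downward
    return "no skew indicated"


# Hypothetical cost growth figures with one large value.
growth = [2, 3, 3, 4, 18]
mean_growth = mean(growth)      # sum of the values divided by their count
median_growth = median(growth)  # middle value of the sorted data
mode_growth = mode(growth)      # most frequently occurring value
direction = skew_direction(growth)
print(mean_growth, median_growth, mode_growth, direction)
```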

17
Q

The _________ is the average squared distance of the data points from their mean; it is a measure of the spread of a distribution.

A

Variance

18
Q

A lower variance indicates less ________, or spread.

A

dispersion

19
Q

The ________ of a distribution is calculated as the square root of the variance and measures the spread of the data points around their mean, in the same units as the data.

A

standard deviation
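
A small worked example of the variance and standard deviation cards is sketched below. It follows the card's wording literally (the average squared distance from the mean, dividing by n); many texts divide by n − 1 when the data are a sample, so that variant is shown as well. The function names and data are made up for illustration.

```python
from math import sqrt


def population_variance(data):
    """Average squared distance of the points from their mean (divide by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    avg = sum(data) / n
    return sum((x - avg) ** 2 for x in data) / n


def sample_variance(data):
    """Same sum of squared distances, divided by n - 1 (sample convention)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    avg = sum(data) / n
    return sum((x - avg) ** 2 for x in data) / (n - 1)


# Hypothetical cost observations.
costs = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(costs)
std_dev = sqrt(variance)  # square root of the variance, in the data's own units
print(variance, std_dev, sample_variance(costs))
```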

20
Q

The ______________ is a measure of the size of the standard deviation relative to the mean and is expressed as a percentage. This descriptive statistic is unitless and therefore allows for comparison of the variability across distributions.

A

Coefficient of Variation (CV)
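
The sketch below computes a CV and checks the unitless property the card mentions: rescaling the data (say, from millions of dollars to thousands of dollars) leaves the CV unchanged. The function name `coefficient_of_variation` and the data are hypothetical, and the use of the sample standard deviation is an assumption, since the card does not say which form is intended.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Standard deviation relative to the mean, expressed as a percentage."""
    avg = mean(data)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(data) / avg * 100


# Hypothetical costs in $M, and the same costs restated in $K.
costs_m = [2, 4, 4, 4, 5, 5, 7, 9]
costs_k = [c * 1000 for c in costs_m]
cv_m = coefficient_of_variation(costs_m)
cv_k = coefficient_of_variation(costs_k)
print(f"{cv_m:.1f}% vs {cv_k:.1f}%")
```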

21
Q

_________ graph two variables along two axes and can illustrate attributes such as central tendency and the dispersion.

A

Scatter plots

22
Q

________ provide a common way to show the density of univariate data by grouping the data into several bins and plotting the bins on the horizontal axis, with the frequency or relative frequency on the vertical axis.

A

Histograms
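
The binning step described in this card can be sketched without a plotting library. The function `histogram_counts`, the choice of equal-width bins, and the data are assumptions made for illustration; the counts would go on the vertical axis and the bin edges on the horizontal axis.

```python
def histogram_counts(data, bins):
    """Group data into equal-width bins; return (edges, counts, relative freqs)."""
    if bins < 1 or not data:
        raise ValueError("need at least one bin and one data point")
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    relative = [c / len(data) for c in counts]
    return edges, counts, relative


# Hypothetical univariate cost data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts, relative = histogram_counts(data, bins=4)
for left, count in zip(edges, counts):
    print(f"{left:>4.1f}+ | {'#' * count}")
```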

23
Q

________ present categorical data as rectangles whose height or length corresponds to the associated value for each category.

A

Bar Charts