Chapter 6 Basic Data Analysis Principles Flashcards
________ data consists of a single variable, such as cost data for a single element or a set of historical cost growth factors for various programs in a given phase. It can be displayed graphically using histograms, bar graphs, or boxplots. It is rare in cost estimating.
Univariate
________ data has one independent variable and one dependent variable. An example of _______ data is software development cost as a function of the number of lines of code.
Bivariate
_________ data has several independent variables and one dependent variable. An example of _______ data would be the cost of ship supplies (the dependent variable) as a function of two independent variables.
Multivariate
__________ data differs from univariate, bivariate, and multivariate data and requires a different approach. Some examples are cost growth as a function of the years since program initiation or worker productivity measured by quarter.
Time series
________ indicates a marked change in the nature of the data occurring at some point or over some period. An example of a _________ would be lower cost growth in programs entering production due to a specific change in acquisition law.
Paradigm shift
_________ are repeating periodic trends and can occur at any interval. They are often found in seasonal data. Maintenance actions such as ship overhauls are another example.
Cycles
__________ is present when the variable (e.g., health, weather) in time “t” is correlated to the variable in time “t−1”, which is correlated to the variable in time “t−2”. In other words, the value of the variable in the present is correlated to the value of the variable in the previous time period(s). __________ occurs due to dependencies within the data, usually when the data is from the same source.
Autocorrelation
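The correlation between a series and its own earlier values can be checked numerically. The following is a minimal sketch, assuming the usual sample autocorrelation formula (lagged co-deviations divided by total squared deviation from the mean). The function name `lag_autocorrelation` and the quarterly productivity values are hypothetical and serve only as illustration.

```python
from statistics import mean


def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    avg = mean(series)
    # Denominator: total squared deviation from the mean.
    denom = sum((x - avg) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: co-deviation of each value with the value `lag` periods earlier.
    numer = sum(
        (series[t] - avg) * (series[t - lag] - avg) for t in range(lag, n)
    )
    return numer / denom


# Hypothetical quarterly productivity index (illustrative values only).
productivity = [100, 102, 104, 107, 109, 112, 114, 117]
r1 = lag_autocorrelation(productivity, lag=1)
print(f"lag-1 autocorrelation: {r1:.3f}")
```

A value near +1 suggests each period closely tracks the one before it, a value near 0 suggests little dependence, and a negative value suggests the series tends to alternate.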
__________ involves examining cost data descriptive statistics, assessing potential outliers, and comparing historical results.
Data validation
____________ characterize and describe the data by revealing the central tendency of the data as well as the dispersion. They are calculated for each data group and especially for the cost data of the element in question.
They include:
the sample size (i.e., the number of data points selected for analysis),
mean,
standard deviation,
Coefficient of Variation (CV), and
specialized averages.
Descriptive statistics
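The statistics listed on this card can be computed in a few lines. This is a minimal sketch, assuming the sample (n − 1) standard deviation and the CV defined as standard deviation divided by mean. The function name `describe` and the cost values are hypothetical.

```python
from statistics import mean, stdev


def describe(costs):
    """Return sample size, mean, sample standard deviation, and CV."""
    n = len(costs)
    avg = mean(costs)
    sd = stdev(costs)  # sample standard deviation (divides by n - 1)
    cv = sd / avg      # Coefficient of Variation: dispersion relative to the mean
    return {"n": n, "mean": avg, "std_dev": sd, "cv": cv}


# Hypothetical cost data for one element (illustrative values only).
costs = [10.0, 12.0, 14.0, 16.0, 18.0]
summary = describe(costs)
print(summary)
```

Because the CV is unitless, it allows dispersion to be compared across data groups with different cost magnitudes.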
______________ are either weighted averages for cost data representative of different quantities or moving averages for time series data.
Specialized averages
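Both kinds of specialized average are short calculations. The sketch below assumes a quantity-weighted average of unit costs and a simple (unweighted) moving average over a fixed window. The function names and all data values are hypothetical.

```python
def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive points."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical unit costs from three lots of different sizes.
unit_costs = [100, 90, 80]
quantities = [10, 20, 70]
avg_unit_cost = weighted_average(unit_costs, quantities)

# Hypothetical time series smoothed with a 3-period window.
smoothed = moving_average([10, 20, 30, 40, 50], window=3)
print(avg_unit_cost, smoothed)
```

The weighted average leans toward the unit cost of the largest lot, while the moving average smooths period-to-period noise at the cost of a shorter output series.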
___________ are data points that fall far from the central mass of the data and may distort both descriptive and inferential statistics.
Outliers
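One common way to screen for such points is to flag values that lie more than a chosen number of standard deviations from the mean. The card does not prescribe a method, so the rule, the threshold of 2.0, the function name `flag_outliers`, and the data below are all assumptions for illustration.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    avg = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be far from the center
    return [x for x in values if abs(x - avg) / sd > threshold]


# Hypothetical cost data with one suspicious point (illustrative values only).
data = [10, 11, 9, 10, 12, 11, 10, 30]
suspects = flag_outliers(data)
print(suspects)
```

A flagged point is a candidate for review, not automatic removal. Note that the outlier itself inflates the mean and standard deviation used to detect it, which is one way it distorts descriptive statistics.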
____________ Compare results to historical data whenever possible to confirm the data’s validity.
Standard Factors
The most popular measure of central tendency is the _______. The _________ of a data set is the sum of the data values divided by the number of data points.
Mean
The _______ is the middle data point such that exactly half of the remaining data points are lower than the ________ and half are higher than the ________.
Median
The ______ is the most frequently occurring point and is the least used of the three measures of central tendency.
Mode
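The three measures of central tendency can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical and chosen so that one large value pulls the mean away from the median and mode.

```python
from statistics import mean, median, mode

# Hypothetical data set with one large value (illustrative values only).
values = [3, 5, 5, 8, 29]

avg = mean(values)          # sum of the values divided by the number of points
middle = median(values)     # middle point of the sorted data
most_common = mode(values)  # most frequently occurring point

print(avg, middle, most_common)
```

Here the mean is twice the median because of the single large value, which illustrates why the median is often checked alongside the mean when data is skewed.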