Week 3 Flashcards
Basic descriptive stats ?
Maximum
Minimum
Mean
Median
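As a quick illustration of the four statistics above, here is a minimal sketch using Python's standard library. The `orders` list is made-up sample data, not taken from the course material.

```python
import statistics

# Hypothetical sample: number of orders per day for one week.
orders = [12, 7, 3, 9, 15, 7, 10]

summary = {
    "maximum": max(orders),
    "minimum": min(orders),
    "mean": statistics.mean(orders),      # sum of values / number of values
    "median": statistics.median(orders),  # middle value once sorted
}
print(summary)
```

The mean and median can differ a lot when the data is skewed, which is one reason to report both.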
Visualisation techniques ?
Regressions
Correlations
Standard Deviation
Histograms
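A rough sketch of two items from this list, standard deviation and a histogram, using only the standard library. The `scores` values and the bin width of 2 are assumptions chosen for illustration. A real histogram would normally be drawn with a plotting library.

```python
import statistics
from collections import Counter

# Hypothetical sample values.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: typical distance of values from the mean.
spread = statistics.pstdev(scores)

# Histogram: count how many values fall into each bin of width 2.
bin_width = 2
bins = Counter((value // bin_width) * bin_width for value in scores)

# Print a simple text histogram, one row per bin.
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```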
The Data Management Association (DAMA) outlines six elements that good quality data should have ?
Completeness
Uniqueness (no duplicates)
Consistency
Timeliness
Validity
Accuracy
Completeness ?
Completeness is when all records are present. Accuracy and completeness are not the same thing: complete records can still be inaccurate.
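One possible way to measure completeness is sketched below. It assumes completeness is checked at field level (every required field of every record is filled in), which is only one interpretation. The records and the `completeness` helper are hypothetical.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Ben", "email": None},
    {"id": 3, "name": None, "email": "cy@example.com"},
    {"id": 4, "name": "Dee", "email": "dee@example.com"},
]
required = ["id", "name", "email"]


def completeness(rows, fields):
    """Share of required cells that are filled in (0.0 to 1.0)."""
    total = len(rows) * len(fields)
    filled = sum(
        1 for row in rows for field in fields
        if row.get(field) not in (None, "")
    )
    return filled / total if total else 1.0


score = completeness(records, required)
print(f"Completeness: {score:.0%}")
```

A high score says nothing about accuracy: a filled-in email address can still be the wrong one.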
Uniqueness ?
Uniqueness is when there are no duplicate records: each entity is represented only once and each value is stored once. Some fields should always be unique, e.g. passport number, but other fields, such as DOB, are less likely to be unique.
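A minimal sketch of a duplicate check, assuming made-up passport numbers and dates of birth. The `duplicates` helper is hypothetical. It shows why a repeat in one field is a problem while a repeat in another is expected.

```python
from collections import Counter

# Hypothetical field values taken from four records.
passport_numbers = ["P100", "P200", "P100", "P300"]
dates_of_birth = ["1990-01-01", "1985-06-15", "1990-01-01", "1990-01-01"]


def duplicates(values):
    """Return each value that appears more than once, with its count."""
    return {value: n for value, n in Counter(values).items() if n > 1}


passport_dupes = duplicates(passport_numbers)  # should be empty in clean data
dob_dupes = duplicates(dates_of_birth)         # repeats here are normal
print(passport_dupes, dob_dupes)
```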
Consistency ?
Consistency is the degree to which data that is supposed to represent the same entity does not contradict itself. For example, data is inconsistent if the order date for goods is after the date the goods were received. Data is also consistent if it does not contradict another data set, e.g. the address of a supplier is the same on two different databases.
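The two examples above can be sketched as simple checks. All order data, supplier codes and addresses below are invented, and the two dictionaries stand in for two separate databases.

```python
from datetime import date

# Check 1: an order should not be dated after the goods were received.
orders = [
    {"order_id": "A1", "ordered": date(2024, 3, 1), "received": date(2024, 3, 5)},
    {"order_id": "A2", "ordered": date(2024, 3, 10), "received": date(2024, 3, 8)},
]
inconsistent_orders = [
    o["order_id"] for o in orders if o["ordered"] > o["received"]
]

# Check 2: the same supplier should have the same address in both systems.
purchasing_db = {"S01": "1 High Street, Leeds", "S02": "4 Mill Lane, York"}
finance_db = {"S01": "1 High Street, Leeds", "S02": "9 Dock Road, Hull"}
address_mismatches = [
    supplier for supplier in purchasing_db
    if supplier in finance_db and purchasing_db[supplier] != finance_db[supplier]
]

print(inconsistent_orders, address_mismatches)
```

Note that a consistency check only shows that two values disagree; it does not say which one is correct.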
Timeliness ?
Timeliness describes the degree to which data is an accurate reflection of the period it represents, and to which the data and its values are up to date. Some data is static, e.g. DOB, whereas other data, such as income, changes. Annual report data is often criticised for not being timely because there is a gap between data collection and reporting: most large PLCs have a lag of around 3 months between the end of the financial year and reporting.
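The reporting lag described above can be worked out as a simple date difference. This is a sketch only: the dates are hypothetical and the 30-day threshold in `is_timely` is an arbitrary assumption, since what counts as timely depends on how the data is used.

```python
from datetime import date


def reporting_lag_days(period_end, reported_on):
    """Number of days between the end of the period and publication."""
    return (reported_on - period_end).days


def is_timely(lag_days, max_days=30):
    """Hypothetical rule: data is timely if published within max_days."""
    return lag_days <= max_days


# Hypothetical company: year end 31 Dec 2023, annual report out 31 Mar 2024.
lag = reporting_lag_days(date(2023, 12, 31), date(2024, 3, 31))
print(lag, is_timely(lag))
```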
Validity ?
Validity describes the degree to which data is in the range and format expected. For example, a DOB is not after the present day and is within a reasonable range. Valid data is stored in a data set in the appropriate format for that data. For example, sales revenue is stored in currency format, not as plain text.
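A minimal sketch of range and format checks for the two examples above. The 120-year age limit and the "at most two decimal places" rule for currency are assumptions made for illustration, and both helper functions are hypothetical.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def valid_dob(dob, today, max_age_years=120):
    """Range check: DOB is not in the future and implies a plausible age."""
    if dob > today:
        return False
    return today.year - dob.year <= max_age_years


def valid_amount(text):
    """Format check: text is a non-negative number with at most 2 decimal places."""
    try:
        amount = Decimal(text)
    except InvalidOperation:
        return False
    return (
        amount.is_finite()
        and amount >= 0
        and amount.as_tuple().exponent >= -2
    )


today = date(2024, 1, 1)  # fixed date so the example is repeatable
print(valid_dob(date(1990, 5, 1), today), valid_dob(date(2030, 1, 1), today))
print(valid_amount("1250.50"), valid_amount("twelve pounds"))
```

A value can pass both checks and still be wrong, which is why validity and accuracy are listed as separate elements.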
Accuracy ?
Accuracy describes the degree to which the data matches reality. There may be bias within the data, which can affect its accuracy, and this bias needs to be communicated to the users. In a data set, accuracy can be measured for individual records or for the whole data set. Which you choose depends on the purpose of the data and your business needs.
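To show the difference between record-level and data-set-level accuracy, here is a small sketch. It assumes a trusted reference source exists to compare against, which is often the hard part in practice. The customer IDs and income figures are invented.

```python
# Hypothetical recorded incomes versus independently verified values.
recorded = {"C1": 52000, "C2": 31000, "C3": 47000, "C4": 28000}
verified = {"C1": 52000, "C2": 35000, "C3": 47000, "C4": 28000}

# Record level: is each individual record correct?
record_accuracy = {
    key: recorded[key] == verified[key]
    for key in recorded if key in verified
}

# Data set level: what share of the checked records is correct?
dataset_accuracy = sum(record_accuracy.values()) / len(record_accuracy)

print(record_accuracy)
print(f"Data set accuracy: {dataset_accuracy:.0%}")
```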