Data Management Part 1 Flashcards
- is a process by which information is acquired and processed to ensure the accessibility and reliability of the data for its users.
- One of the most important tools in processing and managing such information is statistics.
Data Management
- is a science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to yield more meaningful information.
- subdivided into two branches, namely: descriptive statistics and inferential statistics
Statistics
- refers to the collection, organization, summary, and presentation of data
- Examples are the measures of location, measures of variability, skewness and kurtosis.
Descriptive Statistics
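As a minimal sketch of measures of location and variability, the snippet below uses Python's standard `statistics` module. The test scores are made up for illustration and do not come from the flashcards.

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [70, 75, 80, 85, 90]

mean = statistics.mean(scores)           # measure of location
median = statistics.median(scores)       # measure of location
stdev = statistics.pstdev(scores)        # measure of variability (population SD)
value_range = max(scores) - min(scores)  # measure of variability

print(mean, median, round(stdev, 2), value_range)
```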
- deals with the interpretation and analysis of data, where conclusions about a population are drawn from a subset (sample) of it.
- Examples are hypothesis testing and regression analysis
Inferential statistics
5 stages in statistical investigation
- Collection of Data
- Organization of data
- Presentation of data
- Analysis of data
- Interpretation of data
- Is a characteristic or attribute that can assume different values in different persons, places, or things.
- includes age, race, gender, intelligence, personality type, attitudes, ethnic group or patients, height, weight, heart rate, marital status, eye color, etc.
Variable
- data which can assume values that manifest the concept of attributes.
- are sometimes called categorical data.
- e.g. person’s gender, home town, birthdate, post code, marital status, eye color, etc.
Qualitative variables
- data are obtained from counting or measuring.
- Numerical data that represent a quantity, i.e. how much, how often, or how many.
- Numerical data gives information about the quantities of a specific thing e.g. height, length, weight, test score, and so on.
Quantitative variables
- can take only a finite or countable number of possible values.
- this type of data can’t be measured but it can be counted, e.g. the number of students in a class
Discrete variables
- has an infinite number of possible values within a given range.
- This type of data can’t be counted but it can be measured, e.g. temperature
Continuous variable
Levels of measurement:
* values are used only to label or classify observations. They have no order.
* words, letters and alpha numeric symbols can be used.
Nominal
Levels of measurement:
* values represent discrete and ordered units. It follows a natural order
Ordinal
Levels of measurement:
* values tell the distances between the measurements in addition to the classification and ordering. It does not have a true zero point.
Interval
Levels of measurement:
* is the most informative level of measurement. It combines the properties of the first three levels: it classifies, orders, and has equal differences between units. Unlike the interval level, it also has a true zero point.
Ratio
the entire group that you want to draw conclusions about
population
is a way of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population.
sampling methods
the specific group of individuals that you will collect data from.
sample
means that every member of the population has a chance of being selected. It is mainly used in quantitative research
Probability sampling
involves non-random selection based on convenience or other criteria, allowing you to easily collect data. It is often used in exploratory and qualitative research
non-probability sampling
every member of the population has an equal chance of being selected. Your sampling frame should include the whole population. Two ways of doing this: the lottery (or fishbowl) technique and a table of random numbers.
simple random sampling
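A minimal sketch of simple random sampling, assuming the sampling frame is a plain list of ID numbers. The function name, the frame, and the seed are hypothetical and used only for illustration; `random.Random.sample` plays the role of the lottery draw.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers 1 to 50.
frame = list(range(1, 51))
chosen = simple_random_sample(frame, 5, seed=1)
print(chosen)
```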
is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals
systematic sampling
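Selection at regular intervals can be sketched as follows. This assumes the sample size is no larger than the population and that the starting point is chosen at random within the first interval; the function name and the frame are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first interval, then take every k-th member."""
    k = len(population) // n   # sampling interval (assumes n <= len(population))
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start among the first k members
    return population[start::k][:n]

# Hypothetical numbered list of 100 population members.
frame = list(range(1, 101))
chosen = systematic_sample(frame, 10, seed=0)
print(chosen)
```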
involves dividing the population into subgroups (clusters), each of which should have characteristics similar to the whole population. Entire clusters are then randomly selected. Sometimes referred to as “area sampling”
Cluster sampling
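A minimal sketch of cluster sampling, assuming one-stage sampling in which whole clusters are picked at random and every member of a picked cluster is kept. The school names and members are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in picked}

# Hypothetical clusters: students grouped by school.
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
}
sample = cluster_sample(schools, 2, seed=3)
print(sample)
```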
to use this sampling method, divide the population into subgroups (called strata) based on a relevant characteristic (e.g. gender, age range, income bracket, job role), then select a random sample from each subgroup.
stratified random sampling
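A minimal sketch of stratified random sampling, assuming proportional allocation (the same fraction is drawn from every stratum). That allocation rule, the function name, and the staff/manager data are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=None):
    """Group records into strata by key, then sample the same fraction of each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 staff and 40 managers, stratified by job role.
people = [("staff", i) for i in range(60)] + [("manager", i) for i in range(40)]
chosen = stratified_sample(people, key=lambda p: p[0], fraction=0.1, seed=7)
print(chosen)
```

With a 10% fraction, the sample keeps the 60:40 split of the population: 6 staff and 4 managers.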
simply includes the individuals who happen to be most accessible to the researcher
convenience sampling
are always at least somewhat biased, as some people will inherently be more likely to volunteer than others.
voluntary response sampling
also known as judgement sampling, involves the researcher using their expertise to select a sample that is most useful to the purposes of the research.
purposive sampling
can be used to recruit participants via other participants. The number of people you have access to “snowballs” as you get in contact with more people.
snowball sampling
provide raw information and first-hand evidence
primary sources
provide second-hand information and commentary from other researchers.
secondary sources
the researcher asks questions of a large sample of people, either by direct _____ or by means of mass communication such as phone or mail. This method is by far the most common means of data gathering.
interviews
this data-gathering method is an indirect interview, used when potential respondents know why they’re being asked questions and hesitate to answer. The interviewees get an incomplete question, and they must fill in the rest, using their opinions, feelings, and attitudes
projective technique
each expert answers questions in their field of specialty, and the replies are consolidated into a single opinion
delphi technique
like interviews, are a commonly used technique. The group consists of anywhere from a half-dozen to a dozen people, led by a moderator, brought together to discuss the issue
focus groups
are a simple, straightforward data collection method. Respondents get a series of questions, either open or close ended, related to the matter at hand
questionnaires
In this method, collected data are presented in narrative and paragraph forms. This mode of presentation combines text and figures in a statistical report.
textual presentation
This mode of presentation is better than textual form. The data are systematically presented through tables consisting of vertical columns and horizontal rows with headings for an easier and more comprehensible comparison of figures.
tabular presentation
Data gathered are presented in visual or pictorial form. This enables the researcher to get a clear view of the relationships in the data through pictures and colored maps.
graphical presentation
It is the most widely used practical device for showing a trend over a period of time.
line graph
It is the simplest form of graphic presentation. It is generally intended for comparison of simple magnitudes. It may be either horizontal or vertical.
bar graph
It is a circle divided into parts whose sizes are proportional to the magnitude or percentages they represent. It is used to show component parts of a whole.
Circle graph or pie chart
Graphical presentation that uses pictorial symbols to represent data, e.g. population figures.
pictograph
a tabular arrangement of data showing its classification or grouping according to magnitude or size.
frequency distribution (table)
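Grouping data by magnitude can be sketched as below. The scores, the starting limit, and the class width are made up for illustration, and the function assumes whole-number data with equal-width classes.

```python
from collections import Counter

def frequency_table(data, start, width, n_classes):
    """Tally whole-number data into equal-width classes.

    Returns a list of (lower limit, upper limit, frequency) rows.
    """
    counts = Counter((x - start) // width for x in data)
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1  # upper limit for integer data
        table.append((lower, upper, counts.get(i, 0)))
    return table

# Hypothetical scores, invented for illustration only.
scores = [12, 15, 21, 22, 25, 29, 31, 38, 18, 27]
table = frequency_table(scores, start=10, width=10, n_classes=3)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```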
The end numbers of a class. It is the highest and the lowest values that can go into each class
class limits
Are the “true” class limits defined by lower and upper boundaries
class boundaries
Also known as class midpoint. It is the average of the lower and upper limits or boundaries of each class. It may be represented by the letter x.
class mark
The range of values used in defining a class. It is simply the length of each class. It is the difference or distance between the upper and lower class boundaries of each class, and is affected by the nature of the data and by the number of classes.
class interval
The width of each class interval.
class size
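A short worked example ties together class limits, boundaries, mark, and size. The class 10-19 is hypothetical, and the half-unit adjustment for the boundaries is an assumed convention for whole-number data, not something the flashcards specify.

```python
# Hypothetical class with limits 10-19, for whole-number data.
lower_limit, upper_limit = 10, 19

# Assumed convention for integer data: extend each limit by half a unit.
lower_boundary = lower_limit - 0.5
upper_boundary = upper_limit + 0.5

# Class mark: average of the lower and upper limits.
class_mark = (lower_limit + upper_limit) / 2

# Class size: distance between the upper and lower boundaries.
class_size = upper_boundary - lower_boundary

print(lower_boundary, upper_boundary, class_mark, class_size)
```

The class mark is the same whether it is computed from the limits or from the boundaries, since both pairs are centered on the same point.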
This is derived by taking the ratio of the number of items in each class to the total frequency. It may be expressed as a percentage, in which case the values must sum to 100%.
relative frequency distribution
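The ratio described above can be sketched as follows, using made-up class frequencies for illustration.

```python
# Hypothetical class frequencies, invented for illustration only.
frequencies = {"10-19": 4, "20-29": 10, "30-39": 6}

total = sum(frequencies.values())

# Relative frequency of each class, expressed in percent.
relative = {label: f * 100 / total for label, f in frequencies.items()}

for label, pct in relative.items():
    print(f"{label}: {pct:.1f}%")
```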
is used to determine the number or percentage of values “greater than” or “less than” a specified value
cumulative frequency
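A minimal sketch of "less than" and "greater than" cumulative frequencies, assuming the class frequencies are listed from the lowest class to the highest. The frequencies are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, ordered from lowest class to highest.
freqs = [4, 10, 6]

# "Less than": running total from the lowest class upward.
less_than = list(accumulate(freqs))

# "Greater than": running total from the highest class downward.
greater_than = list(accumulate(freqs[::-1]))[::-1]

print(less_than)
print(greater_than)
```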
is a special bar graph constructed by plotting the class boundaries on the horizontal axis against the frequencies on the vertical axis. When the class intervals have uniform width, the width of the bars must also be uniform.
histogram
a closed broken line curve constructed by plotting the class marks on the horizontal or x-axis against the class frequencies which are plotted on the vertical y-axis.
frequency polygon
- is the graph of a cumulative frequency distribution. It is constructed by plotting the class boundaries on the horizontal or x-axis against the cumulative “less than” and “more than” frequencies plotted on the vertical or y-axis
- may look like an open pair of scissors
Ogive curve