Defining Data Science Flashcards
What is data science
Art of processing data to find answers to your question
Analyzing data and trying to get answers
What we can get from data science
1- It can be descriptive (summarize the characteristics of a data set, with no interpretation)
2- It can be exploratory (explore patterns, trends, and relationships within a data set; used to generate hypotheses for future investigation)
3- It can be predictive (use the data to predict an outcome)
4- It can be inferential (derive conclusions that go beyond the data set, for example about the wider population it was drawn from)
5- It can be "causal", not "casual" (if we change one factor, will it lead to a change in other factors?)
6- It can be mechanistic (explain the underlying mechanisms of the observed patterns)
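The first two kinds of question can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the study-hours data and the `pearson` helper are made up for the example, and only the standard library is assumed.

```python
from statistics import mean, median, stdev

# Hypothetical sample: hours studied and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 80]

# Descriptive: summarize the data set without interpreting it.
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "stdev": round(stdev(scores), 2),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Exploratory: look for a relationship that could become a hypothesis.
r = pearson(hours, scores)

print(summary)
print(f"correlation between hours and scores: {r:.3f}")
```

A strong correlation here would only suggest a hypothesis to test later; on its own it does not answer the predictive, inferential, or causal questions.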
Framework of doing data science:
1- Problem identification
2- Data discovery (through screening, inventory, or acquisition)
3- Data ingestion and governance
4- Data wrangling
5- Fitness for use
6- Statistical modelling and analysis
7- Communication and dissemination
8- Ethics review, which applies to all of the steps above
Factors affecting problem identification
1- Theories and hypotheses
2- Domain expertise
3- Domain knowledge
Factors affecting data discovery
1- Potential data sources
2- Data integration
Data collection types
1- Statistically designed (surveys, experiments, remote sensing)
2- Administrative (governmental agencies, registered student data)
3- Opportunity (from the internet, e.g. through an API)
4- Procedural (related to processes and policies, like a change in insurance policy)
Data Wrangling
Transforming raw data into an appropriate form for analysis
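As a rough sketch of what wrangling can look like in practice, the snippet below cleans a few raw records. The records, field names, and the `wrangle` helper are all hypothetical, chosen only to show typical steps (trimming whitespace, normalizing case, converting types, marking missing values).

```python
# Hypothetical raw records, as they might arrive from a form or an export.
raw_rows = [
    {"name": "  Alice ", "age": "34", "city": "boston"},
    {"name": "Bob", "age": "", "city": "Boston "},
    {"name": "carol", "age": "29", "city": "CHICAGO"},
]


def wrangle(row):
    """Return a cleaned copy of one raw record."""
    age_text = row["age"].strip()
    return {
        "name": row["name"].strip().title(),
        # A missing age becomes None instead of an empty string.
        "age": int(age_text) if age_text else None,
        "city": row["city"].strip().title(),
    }


clean_rows = [wrangle(row) for row in raw_rows]

for row in clean_rows:
    print(row)
```

After this step, the two spellings of Boston compare equal and ages can be used in arithmetic, which the raw strings did not allow.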
Data governance and ingestion
Data governance: establishment of, and adherence to, rules regarding data access, dissemination, and destruction
Data ingestion
Bringing data into data management platforms
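A small sketch of ingestion, under loud assumptions: an in-memory SQLite database stands in for the data management platform, and the CSV text, table name, and column names are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file delivered by a data provider.
csv_text = "student_id,program\n101,Biology\n102,Statistics\n103,Biology\n"

# An in-memory SQLite database plays the role of the data management platform.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, program TEXT)"
)

# Parse the CSV and convert each field to the type the table expects.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(int(record["student_id"]), record["program"]) for record in reader]

# Load all rows in one batch, then make the insert permanent.
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(f"ingested {count} rows")
```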
Fitness for use assessment
Assessing the constraints that the intended statistical methods place on the data
Data analysis types
1- Summarization
2- Visualization
3- Classification: predicting a category for new data
4- Regression: predicting a quantitative value for a new observation
5- Clustering: finding unlabeled subgroups
6- Estimation: taking measurements for a small number of members of a large group and making a good guess for the whole group
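Three of the analysis types above (regression, classification, estimation) can be sketched with the standard library alone. Everything here is an illustrative assumption: the toy data, the `fit_line` and `classify` helpers, and the choice of least squares and a nearest-neighbor rule as the specific techniques.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Regression: predict a quantitative value for a new observation.
# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept

# Classification: predict a category for a new observation.
# Assumed rule: copy the label of the nearest labeled point.
labeled = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]


def classify(value):
    """Return the label of the labeled point closest to value."""
    return min(labeled, key=lambda pair: abs(pair[0] - value))[1]


# Estimation: measure a small sample, then guess the value for the whole group.
random.seed(0)
population = [random.gauss(170, 10) for _ in range(10_000)]
sample = random.sample(population, 100)
estimate = mean(sample)

print(f"regression prediction at x=5: {predicted}")
print(f"class of 7.5: {classify(7.5)}")
print(f"estimated mean: {estimate:.1f}, population mean: {mean(population):.1f}")
```

Clustering differs from classification in that no labels like "small" or "large" are given in advance; the subgroups have to be discovered from the data.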
Variable types:
Quantitative and Qualitative
Quantitative: measurements are numerical in value and nature, like wind pressure
Qualitative: assumes values in a finite set; also known as categorical variables, discrete variables, or factors
Attribute:
A data field representing a characteristic of a data object
Qualitative attribute - Nominal
Like patient ID, occupation
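The difference between quantitative variables and nominal attributes shows up in which operations make sense on them. The sketch below uses invented patient records and a hypothetical `variable_type` helper. Its rule of thumb is deliberately crude: an ID stored as an integer would be misclassified as quantitative even though it is nominal.

```python
from collections import Counter
from statistics import mean

# Hypothetical patient records mixing both variable types.
patients = [
    {"patient_id": "P001", "occupation": "nurse", "systolic_bp": 118.0},
    {"patient_id": "P002", "occupation": "teacher", "systolic_bp": 131.5},
    {"patient_id": "P003", "occupation": "nurse", "systolic_bp": 124.0},
]


def variable_type(values):
    """Rough rule of thumb: numeric values are quantitative, others qualitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"


types = {key: variable_type([p[key] for p in patients]) for key in patients[0]}

# Quantitative: arithmetic such as a mean is meaningful.
mean_bp = mean(p["systolic_bp"] for p in patients)

# Nominal: only equality and counting are meaningful, so summarize with the mode.
occupation_counts = Counter(p["occupation"] for p in patients)
most_common_occupation = occupation_counts.most_common(1)[0][0]

print(types)
print(f"mean systolic pressure: {mean_bp}")
print(f"most common occupation: {most_common_occupation}")
```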