Chapter 3 Data Preprocessing Flashcards
What are the methods for imputation?
Imputation methods range from simple techniques such as the mean, median,
and mode to more complex techniques such as regression, interpolation, and K-Nearest Neighbors.
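The simple end of that range can be sketched in a few lines. This is only an illustration, assuming missing values are marked with None; the function name impute and the sample list are hypothetical and not part of the chapter.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    # Pick the summary statistic and compute the single fill value.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical sample with two missing readings.
readings = [2, None, 4, 4, None, 10]
filled_mean = impute(readings, "mean")
filled_median = impute(readings, "median")
filled_mode = impute(readings, "mode")
```

Regression or K-Nearest Neighbors imputation follows the same pattern, but predicts each fill value from other columns instead of using one statistic for the whole column.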
What is moving average?
It is a method for smoothing time-series data by calculating the average of a window of adjacent data points over a specified time period.
What are the techniques to calculate moving average?
Moving averages can be calculated using a variety of techniques, including simple moving average (SMA), weighted moving average (WMA), and exponential moving average (EMA).
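A minimal sketch of the three techniques, assuming a plain list of numbers. The linear weights in wma and the smoothing factor alpha in ema are common choices made here for illustration, not something the chapter specifies.

```python
def sma(data, window):
    """Simple moving average: unweighted mean of each full window."""
    return [sum(data[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(data))]

def wma(data, window):
    """Weighted moving average with linear weights 1..window (newest point weighs most)."""
    weights = list(range(1, window + 1))
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, data[i - window + 1 : i + 1])) / total
            for i in range(window - 1, len(data))]

def ema(data, alpha):
    """Exponential moving average: each output blends the new point with the previous output."""
    out = [data[0]]  # seed with the first observation (assumes data is non-empty)
    for x in data[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

series = [1, 2, 3, 4, 5]
sma_3 = sma(series, 3)
wma_3 = wma(series, 3)
ema_half = ema(series, 0.5)
```

SMA and WMA return one value per full window, so their output is shorter than the input; EMA returns one value per input point.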
What is Gaussian smoothing?
It is a data-smoothing method that convolves the data with a
Gaussian kernel, a bell-shaped curve that assigns weights to neighboring
data points, with the largest weights going to the nearest points.
The method is also known as Gaussian blur or Gaussian filtering.
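The convolution can be sketched for a one-dimensional series as follows. The parameter names sigma and radius, the default radius of about three standard deviations, and the edge handling (repeating the first and last values) are assumptions made for this illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Bell-shaped weights centered on 0, normalized so they sum to 1."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(data, sigma=1.0, radius=None):
    """Convolve data with a Gaussian kernel, clamping indices at the edges."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # cover roughly three standard deviations
    kernel = gaussian_kernel(sigma, radius)
    n = len(data)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat edge values past the ends
            acc += w * data[j]
        smoothed.append(acc)
    return smoothed

# A single spike is spread over its neighbors.
spike = gaussian_smooth([0, 0, 10, 0, 0], sigma=1.0)
```

A larger sigma widens the bell curve, so each output point averages over more neighbors and the result is smoother.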
Describe the different characteristics of ETL!
Maintenance
It requires more maintenance and more knowledge
Processing time
Processing time increases as the data volume increases, because all transformations must take place before the data is loaded
Infrastructure
It requires an on-premises environment, which is expensive and difficult to scale
Costs
High initial and running costs
Describe the different characteristics of ELT!
Maintenance
Virtually maintenance-free as we move raw data
Processing time
Processing time is significantly less dependent on the amount of data, because we migrate raw data
Infrastructure
It uses cloud services such as SaaS or PaaS, which do not need to be installed. They enable dynamic scalability.
Costs
Low start-up costs, downstream costs depending on data volume
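The two approaches differ mainly in where the transformation step sits. The sketch below is a toy illustration only: the records, the dictionary standing in for a data warehouse, and all function names are hypothetical.

```python
def extract():
    """Return raw records as they might arrive from a source system (made-up data)."""
    return [{"name": " Ada ", "amount": "10"}, {"name": "Bob", "amount": "5"}]

def transform(rows):
    """Clean the records: trim names and convert amounts to integers."""
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def run_etl(warehouse):
    # ETL: transform first, then load only the cleaned data.
    warehouse["sales"] = transform(extract())

def run_elt(warehouse):
    # ELT: load the raw data first, then transform it inside the target.
    warehouse["raw_sales"] = extract()
    warehouse["sales"] = transform(warehouse["raw_sales"])

etl_warehouse, elt_warehouse = {}, {}
run_etl(etl_warehouse)
run_elt(elt_warehouse)
```

Both runs end with the same cleaned table; the ELT run also keeps the raw data in the target, where it can be transformed again later.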
How is data integration helpful in business?
In the business world, data integration helps organizations to gain a unified view of their operations, customers, and markets, which can then be used for reporting, analysis, and decision-making purposes. Data integration can help businesses to streamline their processes, reduce costs, and improve their overall performance.
How is data integration helpful in healthcare?
In healthcare, data integration helps to combine data from various sources such as electronic health records, lab reports, and medical imaging, to provide a comprehensive view of a patient’s health history. This can help healthcare providers to make more informed decisions about patient care, improve patient outcomes, and reduce healthcare costs.
How is data integration helpful in manufacturing?
In manufacturing, data integration is used to combine data from various
sources such as production systems, sensors, and supply chain systems, to provide a unified view of the manufacturing process. This can help manufacturers to optimize their
production processes, reduce costs, and improve product quality.