Project Cycle: Data Acquisition Flashcards
define data
Data can be a piece of information, or facts and statistics collected together for reference or analysis.
define training data
It is the primary dataset that is fed into the system for the purpose of developing and training it.
give some examples where the machine takes in different training data
- text categorization: the input is a sentence and the target gives the topic of the sentence.
- image recognition: the input is an image, which is analysed.
- sentiment analysis: the input is a sentence or phrase from social media feeds like Twitter or Facebook, or from customer reviews on websites or surveys.
- spam detection: the input is an email or text message, which is classified as spam or not.
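A minimal sketch of what such training data could look like in Python, assuming each example is stored as an (input, target) pair. The sample sentences, label names and the helper `split_inputs_and_targets` are made up for illustration.

```python
# Hypothetical training examples: each one pairs an input with its target (label).
spam_training_data = [
    ("Win a free prize now, click here!", "spam"),
    ("Are we still meeting at 5 pm today?", "not spam"),
]

sentiment_training_data = [
    ("I love this phone, the camera is great", "positive"),
    ("The delivery was late and the box was damaged", "negative"),
]

def split_inputs_and_targets(examples):
    """Separate a list of (input, target) pairs into two parallel lists."""
    inputs = [text for text, _ in examples]
    targets = [label for _, label in examples]
    return inputs, targets

inputs, targets = split_inputs_and_targets(spam_training_data)
print(inputs)   # the messages the machine takes in
print(targets)  # the answers it is expected to learn
```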
what is the validating dataset
Also called the secondary dataset.
This data is used to check whether the newly developed model identifies the data correctly when making predictions.
what does the validating step ensure
This step makes sure that the new model has not become too specific to the values in the primary dataset when making predictions (that is, it has not overfitted).
If it has, corrections and tweaks are made to the project.
The primary and secondary datasets are then re-run through the model until the desired accuracy is achieved.
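The check can be sketched roughly as follows, using only the Python standard library. The helpers `split_dataset` and `accuracy`, the even/odd data and the memorising "model" are all hypothetical stand-ins, chosen to show a model that has become specific to the primary dataset.

```python
import random

def split_dataset(examples, validation_fraction=0.25, seed=0):
    """Shuffle labelled examples and split them into primary (training)
    and secondary (validating) datasets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def accuracy(predict, examples):
    """Fraction of examples for which predict(input) equals the label."""
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)

# Hypothetical labelled data: a number and whether it is even or odd.
data = [(n, "even" if n % 2 == 0 else "odd") for n in range(20)]
training, validating = split_dataset(data)

# A deliberately bad "model" that only memorises the training examples.
memory = dict(training)

def memorising_model(x):
    return memory.get(x, "even")  # guesses "even" for anything unseen

print("training accuracy:  ", accuracy(memorising_model, training))
print("validating accuracy:", accuracy(memorising_model, validating))
```

The memorising model is always right on the data it has already seen, so only the score on the validating dataset says anything about how it handles new data. A large gap between the two scores is the signal to tweak the project and run both datasets again.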
define testing data
It is the final dataset, which paves the way for the machine model to enter the real world and start making predictions.
how does testing data differ from training and validating data
All primary and secondary data come with the relevant labels (tags) attached.
The testing data is the final dataset, and it gives the model no help in the form of tags.
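A small sketch of the difference, assuming labelled data is stored as (input, tag) pairs and testing data as bare inputs. The messages and `toy_model` are hypothetical; `toy_model` only stands in for a trained model.

```python
# Hypothetical labelled data (primary/secondary): every input carries a tag.
labelled_data = [
    ("Win a free prize now", "spam"),
    ("Lunch at noon tomorrow?", "not spam"),
]

# Hypothetical testing data: inputs only, with no tags to help the model.
testing_data = [
    "Claim your free prize today",
    "See you at the meeting",
]

def toy_model(message):
    """Stand-in for a trained model: flags messages containing 'free'."""
    return "spam" if "free" in message.lower() else "not spam"

# The model has to produce the tags for the testing data by itself.
predictions = [toy_model(message) for message in testing_data]
print(predictions)
```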
define data warehousing
Data is collected in bulk from various sources and in various formats. Storing this data is called data warehousing.
define data features
Data features are the factors and parameters that affect the problem directly or indirectly.
what should be the characteristics of training data
For an AI project to work well, the training data needs to be relevant and authentic. Data plays an important part in the AI project because it forms the base on which the project is built. Therefore, the data acquired should be authentic, reliable and correct.
what should be the characteristics of our data sources
It is necessary to find a reliable source of data from which authentic information can be taken. At the same time, we should make sure that the data we collect is open-sourced and not someone’s property. Extracting private data can be an offence.
what are the most reliable and authentic data sources
Among the most reliable and authentic sources of information are the open-sourced websites hosted by the government. These government portals have general information collected in a suitable format, which can be downloaded and used wisely.
Examples: data.gov.in, india.gov.in
examples of some data sources
- cameras
- sensors
- surveys
- observations
- web scraping
- application programming interfaces (APIs)
define system map
It is a tool used to infer relationships between the different data features.
- the data features are put in circles
- the direction of the relationship is conveyed by the direction of the arrowhead
- the nature of the relationship is conveyed by the + or - sign: a ‘+’ sign indicates that two features are directly related, while a ‘-’ sign indicates that two features are inversely related.
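One possible way to write a system map down in Python is as a dictionary of arrows, each with its sign. This is only a sketch: the features, the signs and the helper `describe` are hypothetical and chosen purely for illustration.

```python
# A system map as a dictionary: (from feature, to feature) -> sign of the arrow.
# The features and signs below are hypothetical examples.
system_map = {
    ("hours studied", "exam score"): "+",
    ("hours of screen time", "hours studied"): "-",
}

def describe(system_map):
    """Turn each arrow in the system map into a plain-English sentence."""
    meaning = {"+": "directly related", "-": "inversely related"}
    return [
        f"{source} -> {target}: {meaning[sign]}"
        for (source, target), sign in system_map.items()
    ]

for line in describe(system_map):
    print(line)
```

The order of each pair carries the direction of the arrowhead, and the stored sign carries the nature of the relationship.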