Data Science Flashcards

1
Q

What is Data Science?

A

Data Science is a concept that unifies statistics, data analysis, machine learning, and their related methods in order to understand and analyze real-world phenomena using data. It employs techniques and theories drawn from many fields within mathematics, statistics, computer science, and information science. In practice, it combines programming (often in Python) with mathematical concepts such as statistics, data analysis, and probability.

2
Q

List some applications of data science.

A

Data Science is not a new field. It is mainly concerned with analyzing data; in AI, that analysis helps make a machine intelligent enough to perform tasks by itself. Some applications are:
- Fraud and risk detection
- Genetics and genomics
- Internet search
- Targeted advertising
- Website recommendations
- Air route planning

3
Q

What is the importance of data collection? Examples of datasets?

A

Data Science not only gives us a clearer picture of a dataset but also supports deeper analysis of it. Collected data is what makes the machine's predictions and suggestions possible. Examples of places where datasets are collected:
- Banks
- ATMs
- Movie theatres

4
Q

Sources of data?

A

Data of almost any required type can be collected from a variety of sources, which may be offline or online:
OFFLINE
- Sensors
- Surveys
- Interviews
- Observations
ONLINE
- Open-sourced government portals
- Reliable websites (e.g., Kaggle)
- World organizations’ open-sourced statistical websites

5
Q

Points to be kept in mind for data collection?

A

- Only data that is available for public use should be collected.
- Personal datasets can be used only with the consent of the owner.
- Never breach someone’s privacy to collect data.
- Data should be taken only from reliable sources, because data collected from random sources can be wrong or unusable.
- Reliable sources ensure the authenticity of the data, which helps in training the AI model properly.

6
Q

What are the types of data?

A

Datasets can be stored in different formats. Some of the commonly used formats are:
- CSV: CSV stands for comma-separated values. It is a simple file format used to store tabular data. Each line of the file is a data record, and each record consists of one or more fields separated by commas, hence the name.
- Spreadsheet: A spreadsheet is a sheet of paper or a computer program used for accounting and recording data in rows and columns into which information can be entered. Microsoft Excel is a program for creating spreadsheets.
- SQL: SQL stands for Structured Query Language. It is a domain-specific language designed for managing data held in a relational database management system (RDBMS).
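
To make the CSV and SQL formats above concrete, here is a minimal sketch using only Python's standard library. The sample records, the `accounts` table, and the helper function names are all hypothetical and exist only for illustration; a real dataset would normally be read from a file or an existing database.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would come from a .csv file.
CSV_TEXT = """name,city,balance
Asha,Delhi,1200
Ravi,Mumbai,800
Meera,Delhi,1500
"""


def read_csv_records(text):
    """Parse CSV text into a list of dicts, one per data record."""
    return list(csv.DictReader(io.StringIO(text)))


def load_into_sqlite(records):
    """Store the records in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT, city TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (:name, :city, :balance)", records
    )
    return conn


def total_balance_by_city(conn):
    """Use an SQL query to add up balances for each city."""
    rows = conn.execute(
        "SELECT city, SUM(balance) FROM accounts GROUP BY city ORDER BY city"
    )
    return dict(rows.fetchall())


records = read_csv_records(CSV_TEXT)
conn = load_into_sqlite(records)
totals = total_balance_by_city(conn)
print(totals)
```

Each line after the header becomes one record, and the SQL query then summarizes those records without any manual looping.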

7
Q

List the important Python packages.

A

These Python packages help us access and work with data inside code:
1) NumPy: short for Numerical Python, the fundamental Python package for mathematical and logical operations on arrays of numbers.
2) Matplotlib: a Python visualization library for 2D plots of arrays.
3) Pandas: a Python library for data manipulation and analysis.
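
As a rough illustration of how the three packages listed above divide the work, the sketch below runs one small task through each of them. The monthly sales figures are made up for this example, and it assumes NumPy, pandas, and Matplotlib are installed.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly sales figures, used only for illustration.
sales = np.array([120, 135, 150, 160])

# NumPy: simple statistics and arithmetic on a numeric array.
mean_sales = sales.mean()
growth = np.diff(sales)  # month-over-month change

# pandas: label the same numbers and filter them as a table.
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"], "sales": sales})
above_average = df[df["sales"] > mean_sales]

# Matplotlib: draw a 2D line plot and save it to an in-memory PNG.
fig, ax = plt.subplots()
ax.plot(df["month"], df["sales"], marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(mean_sales)
print(above_average)
```

To write the plot to disk instead of memory, a file path can be passed to `fig.savefig` in place of the buffer.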
