C1: What is Data Science? Flashcards
Understand introductory concepts.
What is MLOps?
- Machine learning operations.
- Tools that provide ongoing monitoring of models and automated retraining of drifted models.
What is an Algorithm?
A set of step-by-step instructions to solve a problem or complete a task.
What is a Model?
A representation of the relationships and patterns found in data.
* They are useful for making predictions or when analyzing complex systems.
* They retain the essential elements of the data needed for analysis.
What’s an Outlier?
A data point that differs significantly from other observations, potentially indicating anomalies, errors, or unique phenomena that could impact statistical analysis or modeling.
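Example: one common way to flag outliers is the interquartile range (IQR) rule. This is a minimal sketch, not the only definition of an outlier. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample readings are assumptions chosen for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))  # prints [95]
```

Whether a flagged value is an error or a genuine phenomenon still needs a human judgment; the rule only marks candidates.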
What is Structured Data?
Data that is organized and formatted into a predictable schema, usually related tables with rows and columns.
What is Unstructured Data?
- Unorganized data that lacks a predefined data model.
- It is harder to analyze using traditional methods.
- This data type often includes text, images, videos, and other content that doesn’t fit neatly into rows and columns like structured data.
What does .CSV stand for?
Comma-separated values.
What does .XLSX stand for?
Microsoft Excel Open XML Spreadsheet.
What does .XML stand for?
Extensible Markup Language.
What does .PDF stand for?
Portable Document Format (developed by Adobe).
What does .JSON stand for?
JavaScript Object Notation.
What does .TSV stand for?
Tab-separated values.
What are some of the benefits of the .JSON file format?
- Language-independent data format.
- It is widely used for sharing data of many sizes and types; binary content such as audio and video must first be encoded as text (for example, Base64).
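Example: a minimal sketch of a JSON round trip using Python's built-in `json` module. The `record` contents are made up for illustration. The resulting text could equally be read by any other language that has a JSON parser, which is what "language-independent" means here.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"name": "Ada", "scores": [91, 88], "active": True}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)
print(restored == record)
```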
What are some of the benefits of the .XLSX file format?
- XLSX uses an open file format (Office Open XML).
- It can use and save all functions available in Excel.
- It is considered one of the more secure Excel formats because it cannot store macros (macro-enabled workbooks use .XLSM instead).
What are some of the benefits of the .XML file format?
- Readable by humans and machines.
- It is a self-descriptive language.
- It does not use predefined tags like .HTML does.
- XML is platform independent.
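Example: a small sketch showing self-descriptive, author-defined tags being read with Python's built-in `xml.etree.ElementTree`. The tag names (`library`, `book`, `title`, `year`) and the data are invented for illustration; XML itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name here was chosen by the author.
document = """<library>
  <book id="1"><title>Data Basics</title><year>2020</year></book>
  <book id="2"><title>Cloud Notes</title><year>2022</year></book>
</library>"""

root = ET.fromstring(document)

# Collect the text of each <title> element under each <book>.
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)
```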
What is a Data Visualization?
A visual way of representing data and its trends that is easily comprehensible.
What defines a Delimited Text File?
It is a plain text file where a specific character separates the data values.
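Example: a minimal sketch showing that CSV and TSV differ only in the separating character. It uses Python's built-in `csv` module; the helper name `parse_delimited` and the sample rows are assumptions for illustration.

```python
import csv
import io

def parse_delimited(text, delimiter):
    """Split delimited text into a list of rows (each row a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# The same hypothetical data, once comma-separated and once tab-separated.
csv_rows = parse_delimited("name,age\nAda,36\n", ",")
tsv_rows = parse_delimited("name\tage\nAda\t36\n", "\t")

print(csv_rows)
print(csv_rows == tsv_rows)
```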
What is Hadoop?
An open-source framework designed to store and process large datasets across clusters of computers.
What are Jupyter Notebooks?
A type of computational notebook, used as an interactive development environment, that allows researchers to create and share documents combining code, equations, visualizations, and explanatory text.
(Also known as Python notebooks.)
What is the Nearest Neighbor algorithm?
An algorithm that uses proximity to make classifications or predictions about how to group an individual data point.
Also known as KNN or k-NN (k-nearest neighbors).
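Example: a minimal sketch of k-nearest-neighbor classification in plain Python, assuming Euclidean distance and a simple majority vote. The function name `knn_classify`, the choice `k=3`, and the training points are illustrative assumptions.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k closest training points."""
    # Sort training points by Euclidean distance to the query, keep the first k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points forming two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_classify(train, (1.2, 1.4)))
print(knn_classify(train, (6.4, 6.6)))
```

In practice the value of k and the distance measure are tuned to the data; libraries such as scikit-learn provide optimized versions.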
What is a Neural Network?
A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using previous learning, and produces an output.
What is Pandas?
- An open-source Python library that provides tools for working with structured data.
- It is often used for data manipulation and analysis.
What is R?
An open-source programming language used for statistical computing, data analysis, and data visualization.
What is a recommendation engine?
A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.
What is regression?
A statistical method that estimates the strength and direction of the relationship between one or more inputs and an output.
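Example: a minimal sketch of simple linear regression (one input, one output) fitted by ordinary least squares in plain Python. The function name `fit_line` and the data points are assumptions for illustration; regression also covers multiple inputs and non-linear forms.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The slope describes how much the output changes per unit change in the input, and the intercept is the predicted output when the input is zero.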
What defines Tabular Data?
Data that is organized into rows and columns.
What are the five characteristics of Cloud Computing?
- On-demand self-service.
- Broad network access.
- Resource pooling.
- Rapid elasticity.
- Measured service.
What is on-demand self-service in cloud computing?
The ability to access cloud resources such as processing power, storage, and network without requiring human interaction with the service provider.
What is broad network access in cloud computing?
When cloud computing resources can be accessed via the network through standard mechanisms and platforms such as mobile phones, tablets, laptops, and workstations.
What is resource pooling in cloud computing?
A scheme that gives cloud providers economies of scale.
* Cloud resources are dynamically assigned and reassigned according to demand, without customers needing to know the physical location of these resources.
What is rapid elasticity in cloud computing?
A characteristic of cloud computing whereby organizations are able to access more cloud resources when they need them, and scale back when they don’t.
What is measured service in cloud computing?
A scheme by which an organization only pays for what it uses or reserves as it goes.
* Resource usage is monitored, measured, and reported transparently based on an organization’s utilization.
* If they’re not using resources, they’re not paying.
What are the three Cloud Deployment Models?
- Public Cloud.
- Hybrid Cloud.
- Private Cloud.
What is a public cloud in cloud computing?
When an organization leverages cloud services over the open internet on hardware that is owned by the cloud provider and shared with other companies.
What is private cloud in cloud computing?
Infrastructure provisioned for exclusive use by a single organization. It could run on-premises or it could be owned, managed, and operated by a service provider.
What is hybrid cloud in cloud computing?
When an organization is leveraging a mix of public cloud(s) and private cloud(s) that are configured to work together seamlessly.
What are the three cloud service models?
- IaaS (Infrastructure as a Service).
- PaaS (Platform as a Service).
- SaaS (Software as a Service).