ML Concepts Flashcards
What is the data network effect?
A property of a product that improves as more data becomes available to it, because of emergent relationships between segments of the data.
What does a sandboxed pilot involve? And what does it replicate?
It isolates untested code changes and outright experimentation from the production environment or repository.
Sandboxing protects “live” servers and their data, vetted source code distributions, and other collections of code, data, and/or content, proprietary or public.
Sandboxes replicate at least the minimal functionality needed to accurately test the programs or other code under development.
What is data exhaust? And what type is it usually? And what form can it be in?
The trail of data left by users of the Internet or another computer system through their online activity, behavior, and transactions.
Every website visit, link click, and even mouse hover is collected, leaving behind a trail of data.
An enormous amount of data is created, often raw, in the form of cookies, temporary files, log files, storable choices, and more.
What is a convertible note? And what provisions are often included?
A short-term loan that converts into equity when a startup raises its next round of funding, often within 12 to 18 months.
The note defers the company’s valuation to this next round of funding.
Other provisions in most convertible notes include:
- An interest rate (typically 8 percent to 10 percent) and maturity date
- A cap on the valuation price for note holders when the loan converts to equity
- A discount rate on the share price when the note converts.
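To make the cap and the discount concrete, here is a minimal sketch of how a note might convert. The function names, the example figures, the use of simple interest, and the rule that the holder converts at the lower of the capped price and the discounted price are all assumptions for illustration; actual terms vary from note to note.

```python
def conversion_price(round_price, discount, valuation_cap, pre_money_valuation):
    """Price per share at which the note converts (illustrative only).

    round_price: share price paid by new investors in the priced round
    discount: discount rate on that price, e.g. 0.20 for 20 percent
    valuation_cap: highest valuation used to price the note holder's shares
    pre_money_valuation: valuation agreed in the priced round
    """
    discounted = round_price * (1 - discount)
    # The cap only matters when the round is valued above it.
    capped = round_price * min(1.0, valuation_cap / pre_money_valuation)
    # Assumption: the note holder gets whichever price is lower.
    return min(discounted, capped)


def shares_on_conversion(principal, annual_rate, years, price):
    # Assumption: simple (non-compounding) interest converts with the principal.
    owed = principal * (1 + annual_rate * years)
    return owed / price


price = conversion_price(round_price=1.00, discount=0.20,
                         valuation_cap=5_000_000, pre_money_valuation=10_000_000)
shares = shares_on_conversion(principal=100_000, annual_rate=0.08,
                              years=1.5, price=price)
```

In this made-up case the cap halves the round price, which beats the 20 percent discount, so the cap sets the conversion price.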
What is involved in cloud deployment? And what does it include?
Enabling SaaS, PaaS, or IaaS solutions that end users or consumers can access on demand.
It includes all of the installation and configuration steps that must be completed before user provisioning can occur.
What is a data moat? And what does it need?
A proprietary data set.
It’s a modern extension of traditional business “moats” such as vendor lock-in, branding, trade secrets, efficiencies of scale, and regulatory capture.
A data moat needs 1) an accurate collection of relevant data and 2) the culture to use it effectively.
What does active learning involve in ML?
And what is it called in statistics?
A learning algorithm interactively queries a user (or some other information source) to label new data points with the desired outputs.
In the statistics literature, it is sometimes also called optimal experimental design. The information source is also called the teacher or oracle.
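The query loop can be sketched with a toy one-dimensional classifier. This is only an illustration under stated assumptions: the `oracle` function stands in for the human teacher, the boundary at 62 is made up, and the query rule shown (ask about the point closest to the current decision boundary) is uncertainty sampling, one of several possible strategies.

```python
def oracle(x):
    # Stand-in for the teacher/oracle; the true boundary (62) is hypothetical.
    return 1 if x >= 62 else 0


def bounds(labeled):
    # Highest point known to be class 0 and lowest point known to be class 1.
    highest_zero = max(x for x, y in labeled if y == 0)
    lowest_one = min(x for x, y in labeled if y == 1)
    return highest_zero, lowest_one


pool = list(range(1, 100))                 # unlabeled data points
labeled = [(0, oracle(0)), (100, oracle(100))]  # two seed labels

queries = 0
while True:
    lo, hi = bounds(labeled)
    # Only points between the bounds are still uncertain for this model.
    candidates = [p for p in pool if lo < p < hi]
    if not candidates:
        break
    threshold = (lo + hi) / 2
    # Uncertainty sampling: query the point nearest the decision boundary.
    x = min(candidates, key=lambda p: abs(p - threshold))
    labeled.append((x, oracle(x)))
    pool.remove(x)
    queries += 1
```

The learner pins down the boundary after labeling only a handful of the 99 pooled points, which is the appeal of letting the algorithm choose what to ask.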
What is transfer learning in ML? And what's a good example?
A research problem that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
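A rough numeric analogue of the cars-to-trucks idea: train a one-parameter model on a source task, then reuse the learned weight as the starting point for a related target task. The tasks, data, learning rate, and step counts are invented for illustration, and warm-starting a weight is only one simple form of transfer.

```python
def train(w, data, steps, lr=0.01):
    # Plain gradient descent on mean squared error for the model y = w * x.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


xs = [1.0, 2.0, 3.0]
source_task = [(x, 2.0 * x) for x in xs]   # the "cars" problem (hypothetical)
target_task = [(x, 2.2 * x) for x in xs]   # the related "trucks" problem

# Knowledge gained on the source task is stored in w_source.
w_source = train(0.0, source_task, steps=200)

# Same small training budget on the target, with and without that knowledge.
w_transfer = train(w_source, target_task, steps=5)
w_scratch = train(0.0, target_task, steps=5)
```

With the same five training steps, the model that starts from `w_source` ends with a lower target error than the one that starts from zero.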
What is synthetic data? (And what does it not reveal?)
Artificial data created by algorithms that mirror the statistical properties of the original data.
It does not reveal any information about real people.
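A minimal sketch of the idea, assuming a single numeric column and a normal distribution: estimate the mean and spread of the original values, then sample new values from that fitted distribution. The `real_ages` list is made up, and this toy approach carries no formal privacy guarantee; production generators are considerably more careful.

```python
import random
import statistics

real_ages = [23, 35, 41, 29, 52, 47, 38, 31, 44, 36]  # made-up "real" records

# Capture the statistical properties we want the synthetic data to mirror.
mu = statistics.mean(real_ages)
sigma = statistics.stdev(real_ages)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic_ages = [rng.gauss(mu, sigma) for _ in range(1000)]

synthetic_mu = statistics.mean(synthetic_ages)
synthetic_sigma = statistics.stdev(synthetic_ages)
```

The synthetic values track the original mean and standard deviation without copying any single original record.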
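What is adversarial training? And what is the most common reason for an adversarial attack?
Adversarial machine learning is a technique that attempts to fool models by supplying deceptive input.
The most common reason is to cause a malfunction in a machine learning model.
Adversarial training is the matching defense: the model is trained on such deceptive inputs so that it becomes harder to fool.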
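A deceptive input can be sketched for a toy linear classifier. The weights, the input, and `epsilon` are invented for illustration; the perturbation rule (nudge each feature by a small fixed amount in the direction that hurts the true class) is in the spirit of the fast gradient sign method, not a faithful reproduction of any particular attack library.

```python
def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon, true_label):
    # For a linear score, the gradient with respect to the input is the
    # weight vector, so stepping along its sign moves the score fastest.
    direction = -1 if true_label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


weights, bias = [2.0, -1.0, 0.5], -0.2   # hypothetical trained model
x = [0.3, 0.1, 0.2]                      # correctly classified as class 1
x_adv = perturb(weights, x, epsilon=0.2, true_label=1)

original_label = predict(weights, bias, x)
adversarial_label = predict(weights, bias, x_adv)
```

No feature moves by more than `epsilon`, yet the predicted class flips. Adding inputs like `x_adv`, with their correct labels, to the training set is the basic move behind training a model to resist them.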
What is the federated model in ML? And what does it stand in contrast to?
An ML technique that trains an algorithm across multiple decentralized edge devices or servers, each holding local data samples, without exchanging those samples. Each device periodically sends a summary of its learning to the cloud as an encrypted message.
This approach stands in contrast to traditional centralized machine learning techniques, where all the local datasets are uploaded to one server, as well as to more classical decentralized approaches, which often assume that local data samples are identically distributed.
Also known as collaborative learning.
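The train-locally, share-only-a-summary loop can be sketched as weighted averaging of client weights (often called federated averaging). The three clients, their data, and the hyperparameters are made up, the model is a single weight for y = w * x, and encryption of the messages is left out entirely.

```python
def local_update(w, data, lr=0.05, steps=20):
    # Gradient descent on this client's own data. Only the resulting
    # weight leaves the device, never the raw (x, y) samples.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical local datasets, roughly following y = 3x, of unequal size.
clients = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.5, 4.4), (3.0, 9.2), (2.5, 7.4)],
    [(0.5, 1.6)],
]

global_w = 0.0
for _ in range(10):  # communication rounds
    # Each client starts from the current global weight and trains locally.
    updates = [(local_update(global_w, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    # The server averages the summaries, weighted by local sample count.
    global_w = sum(w * n for w, n in updates) / total
```

The server ends up with a weight close to 3 even though it never sees a single data sample.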
What are GANs? And why are they exciting?
Generative adversarial networks (GANs) are an exciting recent innovation in machine learning.
GANs are generative models: they create new data instances that resemble your training data. For example, GANs can create images that look like photographs of human faces, even though the faces don’t belong to any real person.
What is approximate computing? What's an example?
An emerging paradigm for energy-efficient and/or high-performance design.
It includes a plethora of computation techniques that return a possibly inaccurate result rather than a guaranteed accurate one, and that can be used in applications where an approximate result is sufficient.
One example is a search engine, where no exact answer may exist for a given query and hence many answers may be acceptable.
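A small software-level illustration of trading accuracy for work: estimate an average from a random sample instead of scanning every value. The data, sample size, and seed are arbitrary, and sampling is just one of many approximation techniques (others operate at the hardware or compiler level).

```python
import random


def exact_mean(values):
    # Touches every element.
    return sum(values) / len(values)


def approximate_mean(values, sample_size, seed=0):
    # Touches only sample_size elements; the answer is close, not exact.
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


values = [i % 100 for i in range(100_000)]   # made-up workload
exact = exact_mean(values)
approx = approximate_mean(values, sample_size=1_000)
error = abs(approx - exact)
```

Here the approximate version reads 1 percent of the data. Whether the resulting error is acceptable depends on the application.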
What is function approximation?
A technique for estimating an unknown underlying function using historical or available observations from the domain.
Artificial neural networks learn to approximate a function.
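As a much simpler stand-in for a neural network, the sketch below approximates an unknown function with a straight line fitted by ordinary least squares. The observations are invented (they roughly follow y = 2x + 1), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Available observations from the domain (made up, with a little noise).
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)


def approximation(x):
    # The learned estimate of the unknown underlying function.
    return slope * x + intercept
```

The fitted line can then be evaluated at inputs that were never observed, such as `approximation(5)`.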
What is the difference between Type I and Type II errors?
What is each equivalent to?
And what is an example of each?
A Type I error is the rejection of a hypothesis that ought to be accepted.
- False positive
- An innocent person goes to jail
A Type II error is the acceptance of a hypothesis that ought to be rejected.
- False negative
- A guilty person goes free
Let's take an example from biometrics, where the hypothesis being tested is that the person is authorized. When someone scans their finger for a biometric scan, a Type I error is the possibility of rejection even with an authorized match. A Type II error is the possibility of acceptance even with a wrong/unauthorized match.
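The two error types in the biometric example can be counted from a log of scan attempts. The records below are made up, and the variable names are chosen for this sketch only.

```python
# Each record is (is_authorized, was_accepted); the data is hypothetical.
attempts = [
    (True, True), (True, False), (True, True), (False, False),
    (False, True), (False, False), (True, True), (False, False),
]

# Type I: an authorized person is rejected.
type_1 = sum(1 for authorized, accepted in attempts
             if authorized and not accepted)
# Type II: an unauthorized person is accepted.
type_2 = sum(1 for authorized, accepted in attempts
             if not authorized and accepted)

authorized_total = sum(1 for authorized, _ in attempts if authorized)
unauthorized_total = len(attempts) - authorized_total

false_rejection_rate = type_1 / authorized_total
false_acceptance_rate = type_2 / unauthorized_total
```

Tightening the matching threshold usually lowers one rate at the cost of raising the other.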