Network, Graph and Data Science Flashcards
Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods.
What is topic modeling?
Topic modeling is a method for unsupervised classification of documents, similar to clustering on numeric data, which finds some natural groups of items (topics) even when we’re not sure what we’re looking for.
https://towardsdatascience.com/latent-dirichlet-allocation-lda-9d1cd064ffa2
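To make the idea concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python. The toy documents, the function name `lda_gibbs`, and the hyperparameter values (`alpha`, `beta`, the iteration count) are all assumptions for illustration; real work would normally use a library implementation.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. Returns the count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # words per topic, per doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                        # words per topic overall
    assignments = []

    # Start by giving every word a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: topics popular in this doc and fond of this word win.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Hypothetical pre-tokenized documents.
docs = [
    ["cat", "dog", "pet", "cat"],
    ["dog", "pet", "vet"],
    ["stock", "market", "trade"],
    ["market", "trade", "stock", "bank"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
```

`doc_topic[d]` counts how many words of document `d` were assigned to each topic, and `topic_word[k]` counts how often each word was assigned to topic `k`. No labels were supplied at any point, which is what makes this unsupervised.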
Supervised Machine Learning - When you provide the algorithm/model/etc. with pre-made categories and examples from those categories. Used in classification tasks - for example, finding which photos are cats, which are dogs, which are birds, etc.
https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/
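As a rough illustration of learning from pre-made categories, the sketch below labels a new animal by copying the label of its closest labeled example (a 1-nearest-neighbor classifier). The features (weight in kg, height in cm) and all the numbers are made up for the example.

```python
import math

def nearest_neighbor_label(examples, query):
    """Return the label of the labeled example closest to the query point."""
    _features, label = min(examples, key=lambda ex: math.dist(ex[0], query))
    return label

# Hypothetical labeled training data: ((weight_kg, height_cm), label).
training = [
    ((4.0, 25.0), "cat"),
    ((4.5, 23.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((0.03, 12.0), "bird"),
    ((0.05, 14.0), "bird"),
]

big_animal = nearest_neighbor_label(training, (28.0, 58.0))
small_animal = nearest_neighbor_label(training, (0.04, 13.0))
```

The categories and the examples come from a person, not from the algorithm; the model only generalizes from them.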
Unsupervised Machine Learning - the algorithm attempts to automatically find structure in the data by extracting useful features and analyzing its structure.
Clustering:
Without being an expert ornithologist, it’s possible to look at a collection of bird photos and separate them roughly by species, relying on cues like feather color, size or beak shape. That’s how the most common application for unsupervised learning, clustering, works: the model looks for data points that are similar to each other and groups them together.
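A minimal sketch of clustering, assuming k-means as the algorithm (the card does not name one) and made-up bird measurements. The starting centroids are picked by hand to keep the example deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical measurements: (wingspan in cm, beak length in cm).
birds = [(20, 1.0), (22, 1.2), (21, 1.1), (60, 4.0), (62, 4.2), (58, 3.9)]
clusters, centroids = kmeans(birds, centroids=[birds[0], birds[3]])
```

The two groups that come out correspond to the small and large birds, even though no species labels were given.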
Anomaly detection:
Banks detect fraudulent transactions by looking for unusual patterns in customers’ purchasing behavior. For instance, if the same credit card is used in California and Denmark within the same day, that’s cause for suspicion. Similarly, unsupervised learning can be used to flag outliers in a dataset.
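One simple way to flag outliers is a z-score test: mark any value that sits far from the mean, measured in standard deviations. The sketch below uses purchase amounts rather than locations, and both the amounts and the threshold of 2.0 are assumptions chosen for illustration.

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card purchases in dollars; one is very unlike the rest.
amounts = [12, 15, 14, 13, 16, 15, 14, 13, 950]
suspicious = flag_outliers(amounts)
```

A single extreme value inflates the mean and standard deviation it is measured against, so production systems tend to prefer more robust statistics; this is only the basic idea.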
Association: Fill an online shopping cart with diapers, applesauce and sippy cups and the site just may recommend that you add a bib and a baby monitor to your order. This is an example of association, where certain features of a data sample correlate with other features. By looking at a couple key attributes of a data point, an unsupervised learning model can predict the other attributes with which they’re commonly associated.
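A small sketch of the idea behind association rules: given past shopping baskets, estimate how often a basket containing one set of items also contains another (the rule's confidence). The baskets and the helper name `confidence` are hypothetical.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent <= b)
    return hits / len(with_antecedent)

# Hypothetical past orders.
baskets = [
    {"diapers", "applesauce", "bib"},
    {"diapers", "sippy cup", "bib"},
    {"diapers", "baby monitor"},
    {"coffee", "bread"},
]

diapers_to_bib = confidence(baskets, {"diapers"}, {"bib"})
```

Here two of the three diaper baskets also contain a bib, so a shop could reasonably suggest a bib to the next customer who adds diapers.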
Autoencoders:
Autoencoders take input data, compress it into a code, then try to recreate the input data from that summarized code. It’s like starting with Moby Dick, creating a SparkNotes version and then trying to rewrite the original story using only SparkNotes for reference. While a neat deep learning trick, there are fewer real-world cases where a simple autoencoder is useful. But add a layer of complexity and the possibilities multiply: by using both noisy and clean versions of an image during training, autoencoders can remove noise from visual data like images, video or medical scans to improve picture quality.
https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/
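Semi-Supervised Machine Learning - training on a dataset that contains both labeled and unlabeled examples, typically a small labeled set alongside a much larger unlabeled one.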
https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/
Homophily - The tendency of nodes to connect to other nodes that are similar to them (for example, in age, interests or group membership).
?
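One simple way to put a number on homophily is the fraction of edges that join two nodes from the same group. This is only one of several possible measures, and the network and group labels below are invented for the example.

```python
def same_group_edge_fraction(edges, group):
    """Fraction of edges whose two endpoints belong to the same group."""
    same = sum(1 for u, v in edges if group[u] == group[v])
    return same / len(edges)

# Hypothetical network: three "red" nodes and two "blue" nodes.
group = {"a": "red", "b": "red", "e": "red", "c": "blue", "d": "blue"}
edges = [("a", "b"), ("b", "e"), ("c", "d"), ("e", "c")]

homophily_score = same_group_edge_fraction(edges, group)
```

A score near 1 means most ties stay within a group; a score near 0 means most ties cross group lines.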
Stylometric analysis - identifying the author of a text from measurable patterns in their language, such as word choice and sentence length
?
Causal inference - determining whether one thing actually causes another, and by how much, rather than merely observing that the two are correlated
?
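Automorphic equivalence - Two nodes are automorphically equivalent if the network can be relabeled so that they swap places without changing its structure. They occupy the same kind of position without necessarily sharing the same neighbors; for example, the managers of different stores in a chain can be considered automorphically equivalent based on their role.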
?
Structural equivalence - Nodes that have identical ties to the same other nodes, which makes them completely interchangeable and substitutable for each other in the network
?
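A short sketch of a structural equivalence check on an undirected network: two nodes pass if they have the same neighbors, ignoring any tie between the two of them. The helper names and the example network are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def structurally_equivalent(adj, u, v):
    """True if u and v are tied to exactly the same other nodes."""
    return adj[u] - {v} == adj[v] - {u}

# Hypothetical network: ann and bob both report only to hq.
edges = [("hq", "ann"), ("hq", "bob"), ("hq", "cy"), ("cy", "dee")]
adj = build_adjacency(edges)

ann_bob = structurally_equivalent(adj, "ann", "bob")
ann_cy = structurally_equivalent(adj, "ann", "cy")
```

`ann` and `bob` could trade places without anyone else noticing, while `cy` has an extra tie to `dee` and so is not interchangeable with `ann`.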
Eigenvector centrality - A ranking of a node’s influence in which a node scores highly when it is connected to other high-scoring nodes, so ties to central nodes count for more than ties to peripheral ones
?
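Eigenvector centrality can be approximated with power iteration: repeatedly set each node's score to the sum of its neighbors' scores, then rescale. The sketch below assumes a small, connected, undirected network that contains a triangle (plain power iteration can oscillate on networks with no odd cycles); the network itself is made up.

```python
import math

def eigenvector_centrality(adj, iterations=100):
    """Approximate eigenvector centrality by power iteration."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbors' current scores.
        new = {node: sum(scores[nb] for nb in adj[node]) for node in adj}
        # Rescale so the scores do not grow without bound.
        norm = math.sqrt(sum(v * v for v in new.values()))
        scores = {node: v / norm for node, v in new.items()}
    return scores

# Hypothetical network: a triangle a-b-c with d hanging off c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
centrality = eigenvector_centrality(adj)
```

Node `c` comes out on top because it has the most ties, and `d` scores lowest even though its only neighbor is the most central node.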
PageRank - A ranking of a page or article based on the links or mentions it receives from others, where a link from a highly ranked source counts for more than a link from an obscure one
?
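A minimal sketch of PageRank by repeated redistribution of rank along links. The damping factor of 0.85 is the conventional choice and is an assumption here, as are the four-page link structure and the handling of pages with no outgoing links (their rank is spread evenly).

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict of page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share.
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
```

`C` ranks highest because three pages link to it, and `A` beats `B` and `D` because its single incoming link comes from the top-ranked page.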
Entrainment - How the movement or actions of one party pull other parties into step with it, for example, being ‘drawn into their wake.’
?
Map generalization - simplifying map data and removing detail so that the map stays clear at the scale it is displayed
?
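Directed / Undirected Networks - Whether the direction of a connection matters. In a directed network an edge runs from one node to another (A follows B); in an undirected network the connection is mutual (A and B are friends).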
?
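The difference shows up directly in how a network is stored. The sketch below builds an adjacency map from the same edge list twice, once treating edges as one-way and once as two-way; the function name and the names in the edge list are hypothetical.

```python
def build_adjacency(edges, directed):
    """Map each node to the set of nodes it points to."""
    adj = {}
    for source, target in edges:
        adj.setdefault(source, set()).add(target)
        adj.setdefault(target, set())
        if not directed:
            # In an undirected network every edge counts in both directions.
            adj[target].add(source)
    return adj

# Hypothetical edge list.
edges = [("alice", "bob"), ("bob", "carol")]

directed_net = build_adjacency(edges, directed=True)
undirected_net = build_adjacency(edges, directed=False)
```

In `directed_net`, `bob` points only to `carol`; in `undirected_net`, `bob` is connected to both `alice` and `carol`.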
Edges - AKA Links or Lines
Different terminology is prevalent in different communities. “Graph theory” people tend to prefer “vertices and edges”, but “network science” people tend to prefer “nodes and links”. Early 20th century topologists called them “points and lines”. The definitions strongly suggest “things and pairs of things”.
https://academia.stackexchange.com/questions/52659/vertices-edges-vs-nodes-links