Lecture 11: Graphs Flashcards
Benefits of graphs
- Capture naturally heterogeneous data
- No specific schema
- Shows how different parts connect to each other
- Hard to model with relational joins
- Very important for more informative insights
Property Graph Model
- Nodes
* Represent objects
* Can be labeled
- Relationships
* Relate nodes
* Have a label
* Have direction
- Properties
* <name: value> pairs
* Describe characteristics
* Can be on nodes or on edges
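The three building blocks above can be sketched as plain data classes. This is only an illustrative in-memory model, not how any particular database stores them; the class names, field names, and the sample movie title and role are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name: value pairs


@dataclass
class Relationship:
    start: int       # id of the node the relationship leaves from
    end: int         # id of the node it points to (direction: start -> end)
    rel_type: str    # the relationship's label
    properties: dict = field(default_factory=dict)   # properties can sit on edges too


# Hypothetical sample data for illustration only.
tom = Node(1, {"Person"}, {"name": "Tom Hanks"})
movie = Node(2, {"Movie"}, {"title": "Example Movie"})
acted_in = Relationship(tom.node_id, movie.node_id, "ACTED_IN", {"role": "Lead"})
```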
Types of graphs
Social networks, media networks, information networks, transportation networks, communication networks
Neo4j
Neo4j is whiteboard-friendly, relationship-focused, and scalable.
Neo4j is a graph database that prioritizes relationships over indexes, storing data natively as nodes, relationships, and properties across separate store files. Nodes and relationships are managed as fixed-size records with specific bits for reclaiming and flags for characteristics like dense connectivity. Relationships in Neo4j have types and are organized in a doubly linked list, connecting nodes through relationship records. Properties are stored as key-value pairs.
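One consequence of fixed-size records is that a record can be located by multiplying its id by the record size, with no index lookup. The sketch below illustrates that idea only. The record layout (an in-use flag, a first-relationship id, a first-property id) and the names RECORD, write_record, and read_record are assumptions for the example, not Neo4j's actual on-disk format.

```python
import struct

# Hypothetical layout: 1-byte in-use flag, 8-byte first relationship id,
# 8-byte first property id. "<" disables padding, so every record is 17 bytes.
RECORD = struct.Struct("<?qq")


def write_record(store, record_id, in_use, first_rel, first_prop):
    """Write a record at offset record_id * record size, growing the store if needed."""
    offset = record_id * RECORD.size
    missing = offset + RECORD.size - len(store)
    if missing > 0:
        store.extend(bytes(missing))  # zero-filled slots read back as "not in use"
    RECORD.pack_into(store, offset, in_use, first_rel, first_prop)


def read_record(store, record_id):
    """Jump straight to the record: no search, just arithmetic on the id."""
    return RECORD.unpack_from(store, record_id * RECORD.size)


store = bytearray()
write_record(store, 3, True, 42, 7)
record = read_record(store, 3)
```

Here the in-use flag plays the role of the bit that marks a slot as reclaimable.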
Gremlin features
- Designed to work with various graph databases and frameworks, providing a powerful and flexible way to query and manipulate graph data
- Java, Rest, Cypher, Gremlin
- Gremlin: Graph traversal language
- g: variable that represents a graph
- Profiles
- Returns always a se
Graph Searches
- Breadth first: explore nearest neighbors first (application: find the nearest bank branches that have an ATM)
- Depth first: explore as far as possible, then backtrack until you can start exploring again (application: trace root of a problem)
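The two strategies differ only in which discovered node is visited next: a queue gives breadth first, a stack gives depth first. A minimal sketch, assuming the graph is an adjacency dict; the function names and the sample graph are made up for illustration.

```python
from collections import deque


def bfs_order(graph, start):
    """Visit nodes level by level: nearest neighbors first."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs_order(graph, start):
    """Follow one path as far as possible, then backtrack."""
    order = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first listed neighbor is explored first.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C', 'E']
```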
Degree centrality
Measures the number of relationships a node has.
Used to estimate the popularity of a service or to tell fraudsters apart from legitimate users (fraudsters tend to have a higher centrality, which they build up in order to inflate prices).
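As a rough sketch, degree centrality can be computed by counting how often each node appears as an endpoint of an edge. This version returns the raw count, with no normalization, and ignores direction; the function name and the sample edges are illustrative assumptions.

```python
from collections import Counter


def degree_centrality(edges):
    """Count the relationships touching each node (direction ignored)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


# Hypothetical sample data.
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
degrees = degree_centrality(edges)
print(degrees)  # {'alice': 3, 'bob': 2, 'carol': 2, 'dave': 1}
```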
Community detection
- Find groups of users such that relationships between members of the group are stronger than relationships across groups
- Helps identify clusters of nodes, isolated groups, or network structure
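Full community detection weighs how strong relationships are inside a group versus across groups. The simplest special case, finding isolated groups, can be sketched with connected components, shown below. This is a deliberately weaker stand-in for a real community detection algorithm, and the function name and sample data are assumptions for illustration.

```python
def connected_components(edges, nodes=()):
    """Group nodes that can reach each other; each group is isolated from the rest."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical sample data: two linked groups and one node with no relationships.
groups = connected_components([("a", "b"), ("b", "c"), ("x", "y")], nodes=["z"])
```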
Cypher
A pattern matching query language made for graphs: declarative, expressive, pattern-matching.
Example:
// Find all of the co-actors Tom Hanks has worked with
MATCH (:Person {name: "Tom Hanks"})-->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name
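To make the pattern concrete, the same two-hop lookup can be sketched imperatively over an in-memory edge list: first any relationship from the named person to a movie, then an ACTED_IN relationship from someone else into the same movie. This is only an illustration of what the declarative query expresses, not how a query engine evaluates it. Labels are ignored for brevity, and the function name, the edge format, and all sample names are assumptions.

```python
def co_actors(edges, name):
    """edges: list of (person, relationship_type, movie) triples."""
    results = []
    for i, (person, _rel_type, movie) in enumerate(edges):
        if person != name:
            continue  # first hop: any relationship starting at the named person
        for j, (other, rel_type, other_movie) in enumerate(edges):
            # Second hop: an ACTED_IN edge into the same movie. Skipping j == i
            # keeps one edge from being used for both hops.
            if j != i and other_movie == movie and rel_type == "ACTED_IN":
                results.append(other)
    return results


# Hypothetical sample data.
edges = [
    ("Tom Hanks", "ACTED_IN", "Movie A"),
    ("Actor X", "ACTED_IN", "Movie A"),
    ("Actor Y", "ACTED_IN", "Movie B"),
    ("Tom Hanks", "ACTED_IN", "Movie C"),
    ("Actor Z", "ACTED_IN", "Movie C"),
    ("Director Q", "DIRECTED", "Movie A"),
]
print(co_actors(edges, "Tom Hanks"))  # ['Actor X', 'Actor Z']
```

The nested loops spell out by hand what the one-line pattern declares, which is the point of a declarative, pattern-matching language.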
Neo4j Strengths and Weaknesses
Strengths:
* Open-source.
* Scales well with large datasets (up to 34.4 billion nodes and relationships).
* Typeless and schemaless, though type simulation is available through plugins.
Weaknesses:
* Replicates the entire graph rather than subgraphs.
* Graph size is limited to millions despite high scalability.
Neo4j internals
Data Storage: Separate files for nodes, relationship labels, and properties.
Memory Management: Uses Least Recently Used (LRU) cache.
Transactions: Managed in-memory with write-ahead logs, also acting as lock managers.
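The LRU policy mentioned above evicts whichever entry was used longest ago once the cache is full. A minimal sketch of the policy itself, built on Python's OrderedDict; this is not Neo4j's cache implementation, and the class name, capacity, and keys are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, most recent last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(2)
cache.put("node:1", "a")
cache.put("node:2", "b")
cache.get("node:1")       # node:1 is now the most recently used
cache.put("node:3", "c")  # cache is full, so node:2 is evicted
```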
Scalability and Performance
Scale: Supports tens of billions of nodes, properties, and relationships due to its efficient data model.
Latency: Low response times due to traversal-based querying.
Throughput: High read and write throughput due to data locality.
High Availability and Load Balancing
High Availability: Multiple synchronized nodes with a write-master and read-slave architecture. Writes through slaves are not immediately synchronized.
Load Balancing: Separates read and write operations and supports cache sharding.