College 11: Graphs Flashcards

1
Q

Benefits of graphs

A
  • Capture naturally heterogeneous data
  • No specific schema
  • Shows how different parts connect to each other
  • Model connections that are hard to express with relational joins
  • Very important for more informative insights
2
Q

Property Graph Model

A
  1. Nodes
    * Represent Objects
    * Can be labeled
  2. Relationships
    * Relate nodes
    * Have a label
    * Have direction
  3. Properties
    * < Name:value > pairs
    * Describe characteristics
    * Can be on nodes or on edges
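
The three building blocks above can be sketched as plain data structures. This is only an illustration of the model, not how any particular database stores it; the class names, field names, and sample values are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An object in the graph; labels and properties are optional."""
    node_id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A labeled, directed edge from `start` to `end`."""
    start: int          # id of the node the relationship leaves
    end: int            # id of the node the relationship points to
    rel_type: str       # the relationship's label
    properties: Dict[str, Any] = field(default_factory=dict)


# Illustrative sample data: properties are name:value pairs and
# can sit on nodes as well as on relationships.
tom = Node(1, {"Person"}, {"name": "Tom Hanks"})
movie = Node(2, {"Movie"}, {"title": "Cast Away"})
acted_in = Relationship(tom.node_id, movie.node_id, "ACTED_IN",
                        {"role": "Chuck Noland"})
```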
3
Q

Different kinds of graphs

A

Social networks, media networks, information nets, transportation networks, communication nets

4
Q

Neo4J

A

Neo4J is Whiteboard friendly, Relationship-focused, and Scalable.
Neo4j is a graph database that prioritizes relationships over indexes, storing data natively as nodes, relationships, and properties across separate store files. Nodes and relationships are managed as fixed-size records with specific bits for reclaiming and flags for characteristics like dense connectivity. Relationships in Neo4j have types and are organized in a doubly linked list, connecting nodes through relationship records. Properties are stored as key-value pairs.
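
A rough sketch of two ideas from this card: fixed-size records make a record's position computable from its id, and a node's relationships can be walked by following the linked list of relationship records. The record size, field names, and sample records below are assumptions for illustration, not Neo4j's actual on-disk layout.

```python
RECORD_SIZE = 16  # hypothetical record size in bytes, not Neo4j's real value


def record_offset(record_id, record_size=RECORD_SIZE):
    """With fixed-size records, the byte offset follows directly from the id."""
    return record_id * record_size


# Hypothetical relationship records forming a doubly linked list.
# `prev` and `next` hold relationship ids; None marks an end of the chain.
relationships = {
    0: {"type": "ACTED_IN", "prev": None, "next": 1},
    1: {"type": "DIRECTED", "prev": 0, "next": 2},
    2: {"type": "ACTED_IN", "prev": 1, "next": None},
}

# The node record only needs a pointer to its first relationship.
node = {"first_rel": 0}


def relationship_types(node, relationships):
    """Walk the chain from the node's first relationship to the end."""
    types = []
    rel_id = node["first_rel"]
    while rel_id is not None:
        record = relationships[rel_id]
        types.append(record["type"])
        rel_id = record["next"]
    return types
```

Because every hop is a pointer lookup rather than an index search, the cost of walking a node's relationships depends on how many relationships it has, not on the size of the whole graph.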

5
Q

Gremlin features

A
  • Designed to work with various graph databases and frameworks, providing a powerful and flexible way to query and manipulate graph data
  • Java, Rest, Cypher, Gremlin
  • Gremlin: Graph traversal language
  • g: variable that represents a graph
  • Profiles
  • Returns always a se
6
Q

Graph Searches

A
  • Breadth first: explore nearest neighbors first (application: find nearest bank branches that have ATM)
  • Depth first: explore as far as possible, then backtrack until you can start exploring again (application: trace root of a problem)
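
The two strategies can be compared on a small graph. This is a minimal sketch over an adjacency-list dictionary; the graph and the function names are made up for the example.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def bfs(graph, start):
    """Breadth first: visit all direct neighbors before going deeper."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def dfs(graph, start, visited=None):
    """Depth first: follow one path as far as possible, then backtrack."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited


print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Breadth first reaches both neighbors of A before D, while depth first goes A, B, D and only then backtracks to C.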
7
Q

Degree centrality

A

Measures number of relationships a node has.
Used to estimate the popularity of a service or to tell fraudsters apart from legitimate users (the centrality of fraudsters tends to be higher, for example because they create many relationships to inflate prices).
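
A minimal sketch of the measure, assuming an edge list where each relationship counts once for both of its endpoints (incoming and outgoing are not distinguished). The user names and edges are invented for the example.

```python
from collections import Counter

# Hypothetical relationships between users.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
]


def degree_centrality(edges):
    """Count how many relationships touch each node."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


degrees = degree_centrality(edges)
print(degrees.most_common(1))  # [('alice', 3)]
```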

8
Q

Community detection

A
  1. Find groups of users such that relationships between members of the group are stronger than relationships across groups
  2. Helps identify clusters of nodes, isolated groups, or network structure
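
The criterion in point 1 can be checked for a candidate grouping by counting relationships inside groups versus across groups. This sketch only evaluates a given grouping; it is not a detection algorithm, and the edges and group assignment are assumptions for illustration.

```python
# Hypothetical graph: two triangles joined by a single edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

# Candidate assignment of each node to a community id.
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}


def count_edges(edges, community):
    """Return (edges within a community, edges across communities)."""
    within = sum(1 for s, t in edges if community[s] == community[t])
    across = len(edges) - within
    return within, across


print(count_edges(edges, community))  # (6, 1)
```

Six relationships stay inside a group and only one crosses, so this grouping fits the definition of a community well.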
9
Q

Cypher

A

A declarative, expressive pattern-matching query language made for graphs.
Example:
// Find all of the co-actors Tom Hanks has worked with
MATCH (:Person {name:"Tom Hanks"})-->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name

10
Q

Neo4J Strengths and Weaknesses

A

Strengths:
* Open-source.
* Scales well with large datasets (up to 34.4 billion nodes and relationships).
* Typeless and schemaless, though type simulation is available through plugins.
Weaknesses:
* Replicates the entire graph rather than subgraphs.
* Graph size is limited to millions despite high scalability.

11
Q

Neo4J internals

A

Data Storage: Separate files for nodes, relationships, labels, and properties.
Memory Management: Uses Least Recently Used (LRU) cache.
Transactions: Managed in-memory with write-ahead logs, also acting as lock managers.
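
The LRU policy mentioned above evicts whichever entry was used longest ago once the cache is full. A minimal sketch of the idea, not Neo4j's actual cache; the class name, capacity, and keys are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("page1", "node records")
cache.put("page2", "relationship records")
cache.get("page1")                      # page1 is now the most recent
cache.put("page3", "property records")  # evicts page2
```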

Scalability and Performance

Scale: Supports tens of billions of nodes, properties, and relationships due to its efficient data model.
Latency: Low response times due to traversal-based querying.
Throughput: High read and write throughput due to data locality.

High Availability and Load Balancing

High Availability: Multiple synchronized nodes with a write-master and read-slave architecture. Writes through slaves are not immediately synchronized.
Load Balancing: Separates read and write operations and supports cache sharding.
