Database Specialty - Neptune Flashcards

Question 1

Q

Amazon Neptune – Overview

Answer

A

Fully managed graph database service (non-relational)
Relationships are first-class citizens
Can quickly navigate relationships and retrieve complex relations between highly
connected datasets
Can query billions of relationships with millisecond latency
ACID compliant with immediate consistency
Supports transaction semantics for highly concurrent OLTP workloads (ACID transactions)
Supported graph query languages – Apache TinkerPop Gremlin and RDF/SPARQL
Supports up to 15 low-latency read replicas (Multi-AZ)
Use cases:
- Social graph / Knowledge graph
- Fraud detection
- Real-time big data mining
- Customer interests and recommendations (Recommendation engines)

Question 2

Q

Graph Database

Answer

A

Models relationships between data
- e.g. Subject / predicate / object / graph (quad)
- Joe likes pizza
- Sarah is friends with Joe
- Sarah likes pizza too
- Joe is a student and lives in London
- Lets you ask questions like “identify Londoners who
  like pizza” or “identify friends of Londoners who like
  pizza”
Uses nodes (vertices) and edges (actions) to
describe the data and relationships between
them
DB stores – person / action / object (and a graph
ID or edge ID)
Can filter or discover data based on strength,
weight, or quality of relationships
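
The example statements on this card can be sketched as a tiny in-memory triple store. This is only an illustration in plain Python, not how Neptune stores or queries data. The predicate names (`likes`, `friends_with`, `lives_in`, `is_a`) and the `subjects` helper are made up for the example.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Joe", "likes", "pizza"),
    ("Sarah", "friends_with", "Joe"),
    ("Sarah", "likes", "pizza"),
    ("Joe", "is_a", "student"),
    ("Joe", "lives_in", "London"),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


# "Identify Londoners who like pizza"
londoners_who_like_pizza = subjects("lives_in", "London") & subjects("likes", "pizza")

# "Identify friends of Londoners who like pizza"
friends = set()
for person in londoners_who_like_pizza:
    friends |= subjects("friends_with", person)

print(sorted(londoners_who_like_pizza))  # ['Joe']
print(sorted(friends))                   # ['Sarah']
```

Each question is answered by following edges rather than joining tables, which is the point of the graph model.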

Question 3

Q

Graph query languages

Answer

A

Neptune supports two popular modeling frameworks – Apache
TinkerPop and RDF/SPARQL
TinkerPop uses Gremlin traversal language
RDF (W3C standard) uses SPARQL
SPARQL is great for working across multiple data sources and has a
large variety of datasets available
We can use Gremlin or SPARQL to load data into Neptune and
then to query it
You can store both Gremlin and SPARQL graph data on the same
Neptune cluster
It gets stored separately on the cluster
Graph data inserted using one query language can only be queried
with that query language (and not with the other)

Question 4

Q

Neptune Architecture

Answer

A

6 copies of your data across 3 AZ (distributed design)
- Lock-free optimistic algorithm (quorum model)
- 4 copies out of 6 needed for writes (4/6 write quorum - data
  considered durable when at least 4/6 copies acknowledge the write)
- 3 copies out of 6 needed for reads (3/6 read quorum)
- Self-healing with peer-to-peer replication
- Storage is striped across 100s of volumes
One Neptune Instance takes writes (master)
Compute nodes on replicas do not need to write/replicate
(=improved read performance)
Log-structured distributed storage layer – passes incremental
log records from compute to storage layer (=faster)
Master + up to 15 Read Replicas serve reads
Data is continuously backed up to S3 in real time, using
storage nodes (compute node performance is unaffected)
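
The quorum numbers on this card (6 copies, 4/6 for writes, 3/6 for reads) can be checked with a little arithmetic. The sketch below is a generic quorum check, not Neptune code; the function names are made up for the example.

```python
N_COPIES = 6      # copies of the data
WRITE_QUORUM = 4  # acknowledgements needed for a durable write
READ_QUORUM = 3   # copies needed for a read


def quorums_overlap(n, w, r):
    """True if every read quorum shares at least one copy with every write quorum."""
    return w + r > n


def write_quorums_overlap(n, w):
    """True if any two write quorums share at least one copy."""
    return 2 * w > n


def quorum_available(copies_lost, quorum, n=N_COPIES):
    """True if enough copies remain to form the given quorum."""
    return n - copies_lost >= quorum


print(quorums_overlap(N_COPIES, WRITE_QUORUM, READ_QUORUM))  # True (4 + 3 > 6)
print(write_quorums_overlap(N_COPIES, WRITE_QUORUM))         # True (2 * 4 > 6)
print(quorum_available(2, WRITE_QUORUM))                     # True: writes survive losing 2 copies
print(quorum_available(3, WRITE_QUORUM))                     # False: writes need 4 of 6
print(quorum_available(3, READ_QUORUM))                      # True: reads survive losing 3 copies
```

Because a read quorum always overlaps the latest write quorum, a read sees at least one copy that holds the most recent durable write.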

Question 5

Q

Neptune Cluster

Answer

A

Loader endpoint – to load the data into Neptune (say, from S3)
- e.g. https://<cluster_endpoint>:8182/loader
Gremlin endpoint – for Gremlin queries
- e.g. https://<cluster_endpoint>:8182/gremlin
SPARQL endpoint – for SPARQL queries
- e.g. https://<cluster_endpoint>:8182/sparql
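
A small helper makes the URL pattern on this card explicit. It is a sketch only: `neptune_endpoint` is a made-up function, and the hostname in the usage line is a placeholder, not a real cluster.

```python
NEPTUNE_PORT = 8182  # port used in the URLs on this card

# Path for each kind of endpoint listed on this card.
ENDPOINT_PATHS = {"loader": "loader", "gremlin": "gremlin", "sparql": "sparql"}


def neptune_endpoint(cluster_endpoint, kind):
    """Build the HTTPS URL for the loader, Gremlin, or SPARQL endpoint."""
    if kind not in ENDPOINT_PATHS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    return f"https://{cluster_endpoint}:{NEPTUNE_PORT}/{ENDPOINT_PATHS[kind]}"


# Placeholder hostname for illustration only.
print(neptune_endpoint("my-cluster.example.com", "gremlin"))
# https://my-cluster.example.com:8182/gremlin
```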

Question 6

Q

Bulk loading data into Neptune

Answer

A

Use the loader endpoint (HTTP POST to the loader endpoint)
- e.g. curl -X POST -H 'Content-Type: application/json'
  https://<cluster_endpoint>:8182/loader -d
  '{
  "source": "s3://bucket_name/key_name",
  …
  }'
S3 data can be accessed using an S3 VPC endpoint (allows access to
S3 resources from your VPC)
Neptune cluster must assume an IAM role with S3 read access
S3 VPC endpoint can be created using the VPC management console
S3 bucket must be in the same region as the Neptune cluster
Load data formats
- csv (for Gremlin), ntriples / nquads / rdfxml / turtle (for SPARQL)
All files must be UTF-8 encoded
Multiple files can be loaded in a single job
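
The JSON body for the loader POST can be assembled and sanity-checked before sending. This is a rough sketch: `build_loader_request` is a made-up helper, the role ARN and region are placeholders, and the field names `format`, `iamRoleArn`, and `region` are given from memory of the loader request, so check them against the Neptune loader reference. Only `source` appears on the card itself.

```python
import json

# Formats listed on this card, grouped by query language.
GREMLIN_FORMATS = {"csv"}
SPARQL_FORMATS = {"ntriples", "nquads", "rdfxml", "turtle"}


def build_loader_request(source, data_format, iam_role_arn, region):
    """Return the loader request body as a dict, after basic checks."""
    if not source.startswith("s3://"):
        raise ValueError("source must be an S3 URI")
    if data_format not in GREMLIN_FORMATS | SPARQL_FORMATS:
        raise ValueError(f"unsupported format: {data_format!r}")
    return {
        "source": source,
        "format": data_format,
        "iamRoleArn": iam_role_arn,  # role the cluster assumes to read from S3
        "region": region,            # must match the cluster's region
    }


payload = build_loader_request(
    "s3://bucket_name/key_name",
    "csv",
    "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",  # placeholder ARN
    "us-east-1",                                          # placeholder region
)
body = json.dumps(payload)  # string to send as the POST body
print(body)
```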

Question 7

Q

Neptune Workbench

Answer

A

Lets you query your Neptune cluster
using notebooks
Notebooks are Jupyter notebooks
hosted by Amazon SageMaker
Available within AWS console
Notebook runs behind the scenes on
an EC2 host in the same VPC and has
IAM authentication
The security group that you attach in
the VPC where Neptune is running
must have an additional rule that allows
inbound connections from itself

Question 8

Q

Neptune Replication

Answer

A

Up to 15 read replicas
ASYNC replication
Replicas share the same underlying
storage layer
Replication lag is typically 10s of
milliseconds
Minimal performance impact on the
primary due to replication process
Replicas double up as failover targets
(standby instance is not needed)

Question 9

Q

Neptune High Availability

Answer

A

Failovers occur automatically
A replica is automatically promoted to be the
new primary during DR
Neptune flips the CNAME of the DB instance
to point to the replica and promotes it
Failover to a replica typically takes 30-120
seconds (minimal downtime)
Creating a new instance takes about 15 minutes (post failover)
Failover to a new instance happens on a best-effort basis and can take longer

Question 10

Q

Neptune Backup and Restore

Answer

A

Supports automatic backups
Continuously backs up your data to S3 for
PITR (max retention period of 35 days)
Latest restorable time for a PITR can be up
to 5 mins in the past (RPO = 5 minutes)
The first backup is a full backup.
Subsequent backups are incremental
Take manual snapshots to retain beyond 35 days
Backup process does not impact cluster performance
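
The PITR window on this card (35-day maximum retention, latest restorable time up to 5 minutes in the past) can be expressed as a simple range check. This is a sketch of the arithmetic only: `can_restore_to` is a made-up helper, and it assumes the worst case of a full 5-minute lag.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION = timedelta(days=35)  # maximum automated backup retention
RESTORE_LAG = timedelta(minutes=5)  # worst-case gap before the latest restorable time


def can_restore_to(target, now, retention=MAX_RETENTION):
    """True if target falls inside the point-in-time restore window."""
    earliest = now - retention
    latest = now - RESTORE_LAG
    return earliest <= target <= latest


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(minutes=1), now))   # False: too recent
print(can_restore_to(now - timedelta(minutes=10), now))  # True
print(can_restore_to(now - timedelta(days=36), now))     # False: beyond retention
```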

Question 11

Q

Neptune Backup and Restore

Answer

A

Can only restore to a new cluster
Can restore an unencrypted snapshot to an
encrypted cluster (but not the other way
round)
To restore a cluster from an encrypted
snapshot, you must have access to the KMS
key
Can only share manual snapshots (to share an
automated snapshot, copy it first and share the copy)
Can’t share a snapshot encrypted using the
default KMS key of the account
Snapshots can be shared across accounts, but
within the same region

Question 12

Q

Neptune Scaling

Answer

A

Vertical scaling (scale up / down) – by resizing instances
Horizontal scaling (scale out / in) – by adding / removing up to 15 read replicas
Automatic scaling storage – 10 GB to 64 TB (no manual intervention needed)

Question 13

Q

Database Cloning in Neptune

Answer

A

Different from creating read replicas – clones
support both reads and writes
Different from replicating a cluster – clones use
same storage layer as the source cluster
Requires only minimal additional storage
Quick and cost-effective
Only within region (can be in different VPC)
Can be created from existing clones
Uses a copy-on-write protocol
- both source and clone share the same data initially
- data that changes is copied at the time it changes, either on the source or on
  the clone (i.e. stored separately from the shared data)
- delta of writes after cloning is not shared
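
The copy-on-write behaviour described above can be modelled with a toy volume that shares read-only layers and keeps a private layer for new writes. This illustrates the idea only and is not Neptune's storage implementation; the `Volume` class and its methods are made up for the example.

```python
class Volume:
    """Toy copy-on-write volume built from layers of pages."""

    def __init__(self, layers=None):
        # layers[0] is private and writable; the rest are shared and read-only.
        self._layers = layers if layers is not None else [{}]

    def write(self, page_id, data):
        self._layers[0][page_id] = data

    def read(self, page_id):
        # The first layer that holds the page wins (newest first).
        for layer in self._layers:
            if page_id in layer:
                return layer[page_id]
        return None

    def clone(self):
        shared = self._layers          # everything written so far becomes shared
        self._layers = [{}] + shared   # source gets a fresh private layer
        return Volume([{}] + shared)   # clone gets its own private layer

    def private_pages(self):
        return len(self._layers[0])


source = Volume()
source.write(1, "a")
source.write(2, "b")

clone = source.clone()   # no page data is copied here
clone.write(2, "B")      # change is stored only on the clone
source.write(3, "c")     # change is stored only on the source

print(clone.read(1), clone.read(2), clone.read(3))     # a B None
print(source.read(1), source.read(2), source.read(3))  # a b c
print(clone.private_pages(), source.private_pages())   # 1 1
```

Only the pages written after cloning take extra space, which is why a clone needs minimal additional storage at first.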

Question 14

Q

Neptune Security – IAM

Answer

A

Uses IAM for authentication and
authorization to manage Neptune
resources
Supports IAM Authentication (with AWS
SigV4)
You use temporary credentials using an
assumed role
Create an IAM role
Setup trust relationship
Retrieve temp creds
Sign the requests using the creds
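
The last step, signing with SigV4, starts by deriving a signing key from the secret access key. The sketch below shows only that key derivation with the standard library. In practice an AWS SDK signer would normally handle the full signature. The service name `neptune-db` is given from memory and should be verified, and the credentials shown are dummy values.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="neptune-db"):
    """Derive the SigV4 signing key: a chained HMAC over date, region, service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Dummy secret and placeholder region, for illustration only.
signing_key = derive_signing_key("dummy-secret-key", "20240131", "us-east-1")
print(len(signing_key))  # 32 (one SHA-256 digest)
```

The key is scoped to one day, region, and service, so a leaked signing key is far less useful than a leaked secret key.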

Question 15

Q

Neptune Security – Encryption & Network

Answer

A

Encryption in transit – using SSL / TLS
- Cluster parameter neptune_enforce_ssl = 1 (the default)
Encryption at rest – with AES-256 using KMS
- encrypts data, automated backups, snapshots, and replicas in the same cluster
Neptune clusters are VPC-only (use private subnets)
Clients can run on EC2 in public subnets within VPC
Can connect to your on-premises IT infra via VPN
Use security groups to control access

Question 16

Q

Neptune Monitoring

Answer

A

Integrated with CloudWatch
Can use audit log files by enabling the DB cluster parameter neptune_enable_audit_log
- Must restart the DB cluster after enabling audit logs
- Audit log files are rotated beyond 100 MB (not configurable)
- Audit logs are not stored in sequential order
  (can be ordered using the timestamp value of each record)
- Audit log data can be published (exported) to a CloudWatch Logs log group by enabling Log exports for your cluster
API calls logged with CloudTrail

Question 17

Q

Query Queuing in Neptune

Answer

A

Max 8192 queries can be queued up per Neptune instance
Queries beyond 8192 will result in ThrottlingException
Use CloudWatch metric
MainRequestQueuePendingRequests to get number of queries queued (5 min
granularity)
Get acceptedQueryCount value using Query Status API
- For Gremlin, acceptedQueryCount = current count of queries queued
- For SPARQL, acceptedQueryCount = all queries accepted since the server started
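
A client that hits the queue limit usually retries with backoff instead of failing outright. The sketch below shows one common pattern, exponential backoff with jitter. `ThrottlingError` and `fake_query` are made-up stand-ins, not Neptune client classes, and the retry parameters are arbitrary.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a ThrottlingException response."""


def run_with_backoff(send_query, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send_query(), retrying with exponential backoff when throttled."""
    for attempt in range(max_attempts):
        try:
            return send_query()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Add jitter so many clients do not retry at the same instant.
            sleep(delay + random.uniform(0, delay))


# Demo with a fake query that is throttled twice before succeeding.
attempts = []


def fake_query():
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottlingError("queue full")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = run_with_backoff(fake_query, sleep=delays.append)
print(result, len(attempts), len(delays))  # ok 3 2
```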

Question 18

Q

Neptune Service Errors

Answer

A

Graph engine errors
- Errors related to cluster endpoints are returned as HTTP error codes
- Query errors – QueryLimitException /
  MemoryLimitExceededException / TooManyRequestsException etc.
- IAM Auth errors – Missing Auth / Missing token / Invalid Signature /
  Missing headers / Incorrect Policy etc
API errors
- HTTP errors related to APIs (CLI / SDK)
- InternalFailure / AccessDeniedException / MalformedQueryString /
  ServiceUnavailable etc
Loader errors
- LOAD_NOT_STARTED / LOAD_FAILED /
  LOAD_S3_READ_ERROR / LOAD_DATA_DEADLOCK etc

Question 19

Q

SPARQL federated query

Answer

A

Query across multiple Neptune clusters or external data sources that
support the protocol, and aggregate the results
Supports only read operations
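
In SPARQL 1.1, a federated query delegates part of its pattern to another endpoint with the `SERVICE` keyword. The sketch below only builds such a query string. The remote endpoint URL and the `http://example.org/...` predicates are placeholders, and `federated_query` is a made-up helper.

```python
def federated_query(remote_endpoint):
    """Return a SPARQL query that joins local data with a remote endpoint."""
    return f"""
SELECT ?person ?food WHERE {{
  ?person <http://example.org/livesIn> <http://example.org/London> .
  SERVICE <{remote_endpoint}> {{
    ?person <http://example.org/likes> ?food .
  }}
}}
""".strip()


# Placeholder endpoint for illustration only.
query = federated_query("https://other-cluster.example.com:8182/sparql")
print(query)
```

The pattern outside the `SERVICE` block runs against the local cluster, while the block itself is sent to the remote endpoint and the results are joined on `?person`.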

Question 20

Q

Neptune Streams

Answer

A

Capture changes to your graph (change logs)
Similar to DynamoDB streams
Can be processed with Lambda (use Neptune Streams API)
SPARQL
- https://<cluster_endpoint>:8182/sparql/stream
Gremlin
- https://<cluster_endpoint>:8182/gremlin/stream
Only GET method is allowed

Use cases
- Amazon ES integration
  - To perform full-text search queries on Neptune data
  - Uses Streams + federated queries
  - Supported for both Gremlin and SPARQL
- Neptune-to-Neptune replication

Question 21

Q

Neptune Pricing

Answer

A

You only pay for what you use
On-demand instances – per hour pricing
IOPS – per million IO requests
- Every DB page read operation = one IO
- Each page is 16 KB in Neptune
- Write IOs are counted in 4KB units
DB Storage – per GB per month
Backups (automated and manual) – per GB per month
Data transfer – per GB
Neptune Workbench – per instance hour
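
The IO rules on this card can be turned into a rough estimate. This is a sketch under stated assumptions: a partial 4 KB write unit is assumed to round up to a whole IO, the price per million IOs is a made-up number, and the helper names are illustrative.

```python
import math

READ_PAGE_BYTES = 16 * 1024  # one DB page read = one IO
WRITE_UNIT_BYTES = 4 * 1024  # write IOs are counted in 4 KB units


def read_ios(pages_read):
    """Each page read counts as one IO."""
    return pages_read


def write_ios(bytes_written):
    """Count write IOs in 4 KB units (assuming partial units round up)."""
    return math.ceil(bytes_written / WRITE_UNIT_BYTES)


def io_cost(total_ios, price_per_million):
    """Cost of the given number of IOs at a per-million price."""
    return total_ios / 1_000_000 * price_per_million


print(write_ios(10 * 1024))       # 3 (10 KB spans three 4 KB units)
print(io_cost(2_500_000, 0.20))   # 0.5, using a hypothetical price of 0.20 per million
```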

(21 cards)