🗄️Database

Amazon Neptune

Fully managed graph database for connected datasets

Neptune is a database designed for relationships: it not only stores data but also models how things connect. Imagine Facebook's friend network, or Amazon's product recommendations ('customers who bought X also bought Y'). Relational databases can answer these 'who knows whom' or 'what's related to what' questions, but each extra hop usually adds another self-join, so cost grows quickly with depth. Neptune is built for this: it stores data as a graph (nodes and edges), so a query follows stored connections directly instead of recomputing them. A question like 'show me all friends of friends of friends', which needs several SQL self-joins, becomes a short traversal that on highly connected data often returns far faster. It's like having a map of connections on which you can trace any path.

Neptune is a fully managed graph database supporting three query languages: Gremlin (Apache TinkerPop), SPARQL (RDF), and openCypher. Data is stored as vertices (nodes) and edges (relationships), optimized for traversal queries. Neptune replicates data across 3 AZs with 6 copies, provides read replicas (up to 15), and supports point-in-time recovery.
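To make the vertex-and-edge model concrete, here is a minimal in-memory sketch in plain Python. It does not use Neptune or any Neptune API; the sample people, the `knows` label, and the helper names are all hypothetical. It only illustrates why a traversal is cheap once edges are indexed by their source vertex.

```python
from collections import defaultdict

# Hypothetical sample data: vertices carry a label and properties,
# edges are (source, label, target) triples.
vertices = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
    "carol": {"label": "person", "name": "Carol"},
    "dave": {"label": "person", "name": "Dave"},
}
edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "knows", "dave"),
    ("carol", "knows", "dave"),
]


def build_adjacency(edge_list):
    """Index outgoing edges by (source, label) so each hop is a dict lookup."""
    adjacency = defaultdict(set)
    for source, label, target in edge_list:
        adjacency[(source, label)].add(target)
    return adjacency


def friends_of_friends(adjacency, start):
    """Follow two 'knows' hops from start and de-duplicate the results.

    Roughly what the Gremlin traversal
    g.V('alice').out('knows').out('knows').dedup() expresses.
    """
    result = set()
    for friend in adjacency.get((start, "knows"), ()):
        result |= adjacency.get((friend, "knows"), set())
    return result


adjacency = build_adjacency(edges)
print(sorted(friends_of_friends(adjacency, "alice")))  # ['carol', 'dave']
```

Each hop here is a lookup on an already-built index rather than a join over a whole table, which is the intuition behind graph stores being well suited to multi-hop queries.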

Key Capabilities

Supports two graph models: Property Graph (queried with Gremlin or openCypher) and RDF (queried with SPARQL), within the same managed service
Stores relationships as first-class data, making multi-hop traversal queries (friends-of-friends, shortest paths, influence graphs) significantly faster than equivalent relational JOINs
Distributed storage layer replicates 6 copies across 3 Availability Zones with up to 15 read replicas and automatic failover
Neptune Serverless: automatically scales compute capacity up and down based on graph query load, removing the need to provision fixed instance sizes
Neptune Analytics: in-memory graph engine optimized for running algorithms (PageRank, community detection, shortest path) across large graphs at high speed
Neptune Streams: change log of all graph mutations, enabling event-driven pipelines and downstream data synchronization
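As an illustration of the kind of algorithm listed under Neptune Analytics, the following is a textbook power-iteration PageRank in plain Python. It is a sketch only: it is not Neptune Analytics' implementation or API, and the function name, parameters, and four-node sample graph are assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of out-neighbors."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every other node splits its damped rank across its out-neighbors.
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: 'c' receives links from three of the four nodes.
sample_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(sample_graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In this sample, `c` ends up with the highest score because most links point to it, and the scores sum to 1. A managed engine would run the same kind of computation over far larger graphs without the caller writing the iteration by hand.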

Gotchas & Constraints

Gotcha #1: Graph databases require different thinking than relational databases: you model relationships explicitly as edges rather than implicitly through foreign keys.
Gotcha #2: Neptune pricing is based on instance hours and storage, so it is typically more expensive than DynamoDB for simple key-value lookups. Use Neptune only when you need graph traversals.
Constraints: Cluster storage tops out at 128 TiB in most regions. A standard cluster lives in a single region; for cross-region DR, use Neptune Global Database or copy snapshots to another region. Queries must be written in Gremlin, SPARQL, or openCypher; SQL is not supported.

A fraud detection team analyzes transaction networks to identify suspicious patterns. In their relational database, finding 'all accounts connected within 3 hops of a flagged account' requires recursive joins and takes 10 minutes. They migrate to Neptune, modeling accounts as vertices and transactions as edges, and the same query in Gremlin completes in 200ms. When a transaction is flagged, Neptune traverses the graph to find connected accounts and surface fraud rings (multiple accounts controlled by the same person). They use Neptune ML to train a model that predicts fraud likelihood from graph features such as number of connections and transaction patterns. For social features, they implement 'friend recommendations' by finding common connections: 'users who know your friends but aren't your friends yet.' Neptune handles 10,000 graph queries per second with sub-second latency.
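The 'within 3 hops of a flagged account' query from this scenario can be sketched as a bounded breadth-first search. The code below is plain Python over hypothetical account IDs and transactions, not the team's actual Gremlin query. A comparable Gremlin traversal might look like `g.V().has('account', 'accountId', 'A1').repeat(both('TRANSFER').simplePath()).emit().times(3).dedup()`, where the `account` and `TRANSFER` labels and the `accountId` property are assumed names.

```python
from collections import defaultdict, deque

# Hypothetical transactions: (from_account, to_account).
transactions = [
    ("A1", "A2"),
    ("A2", "A3"),
    ("A3", "A4"),
    ("A4", "A5"),
    ("B1", "B2"),
]


def build_neighbors(transaction_list):
    """Treat each transaction as an undirected edge between two accounts."""
    neighbors = defaultdict(set)
    for src, dst in transaction_list:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


def accounts_within_hops(neighbors, flagged, max_hops=3):
    """Return {account: hop_distance} for accounts within max_hops of flagged."""
    distance = {flagged: 0}
    queue = deque([flagged])
    while queue:
        account = queue.popleft()
        if distance[account] == max_hops:
            continue  # do not expand past the hop limit
        for other in neighbors.get(account, ()):
            if other not in distance:
                distance[other] = distance[account] + 1
                queue.append(other)
    del distance[flagged]  # report only the connected accounts
    return distance


neighbors = build_neighbors(transactions)
print(accounts_within_hops(neighbors, "A1"))  # {'A2': 1, 'A3': 2, 'A4': 3}
```

`A5` is four hops away and `B1`/`B2` are not connected to `A1`, so none of them appear. The work done scales with the size of the neighborhood explored rather than with the total number of accounts, which is why bounded traversals suit this kind of investigation.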

The Result

Fraud detection accuracy improves by 40%, false positives drop by 60%, and investigation time drops from hours to minutes.
