Your nightly job replicates three terabytes from the data center to S3. For two years a site-to-site VPN carried it without complaint. Then the business grew, the backup window tightened, and one quarter-end the job that used to finish in four hours took fourteen — the tunnel saturated, latency wandered between 40 ms and 300 ms, and packets retried over a public-internet path nobody at your company controls. Meanwhile the data-transfer line on the AWS bill kept climbing, because every byte left over the internet at internet egress rates.

The VPN wasn't misconfigured. It was doing the only thing a tunnel over the public internet can do: best-effort, shared, and variable. AWS Direct Connect is the other option — a private, dedicated physical line into AWS, with consistent latency and a cheaper rate for moving data. This is the long version: what it is, how a connection physically comes to exist, how traffic actually flows over it, how to make it survive a failure, and when its cost and lead time are worth it.

The problem Direct Connect was built to solve

Before Direct Connect, connecting a data center to AWS meant a Site-to-Site VPN: an IPsec tunnel from your edge router to a virtual private gateway, riding the public internet the whole way. That works, and for modest traffic it's still the right answer. But the public internet gives you three things you can't design around: variable latency (your packets take whatever path BGP on the open internet chooses, hop to hop), jitter (that path changes, so latency wanders), and throughput that tops out at what a single IPsec tunnel and the congested middle can carry. On top of that, every gigabyte you pull out of AWS over the internet is billed at the internet data-transfer-out rate — the most expensive egress AWS sells.

For a chatty management connection none of that matters. For three terabytes a night, a latency-sensitive database replica, or a trading feed, all of it matters. Direct Connect removes the public internet from the path entirely.

What Direct Connect is — and what it is not

A Direct Connect is a physical cross-connect: a fibre cable, inside a data center AWS calls a Direct Connect location, running from your router (or a partner's) to an AWS router in that same building. You light the fibre, bring up BGP across it, and now there is a private Layer-2/Layer-3 path between your network and AWS that never touches the public internet.

Being precise about what it is not saves you from three common misconceptions:

  • It is not a VPN. There is no tunnel and no encryption by default — it's a dedicated link, not an encrypted one.
  • It is not the internet. There is no shared path and no route flapping; latency is consistent because the path is fixed.
  • It is not, by itself, "private and secure." It is private in the sense that no one else is on your fibre. It is not confidential unless you add encryption on top.

Mental Model: A VPN is the public road with an armored car: encrypted, but stuck in the same traffic as everyone else, and the route changes daily. Direct Connect is a dedicated lane you lease into AWS's building: consistent and yours, but you pay for the lane whether or not a car is on it, and the lane itself isn't armored unless you add the armor.

How a connection physically comes to exist

Direct Connect is one of the few AWS services with a hard dependency on the physical world, so it's worth tracing how a port actually appears.

[Figure: Physical path, on-prem to AWS. On-prem router (your data center) → Direct Connect location (your cage and gear, cross-connect, AWS cage and router) → AWS backbone (private global network) → VPC (your workloads). The only new physical thing is the cross-connect inside the DX location; everything to the right of it is AWS's private network.]
The physical path. The one cable you provision is the cross-connect; from the AWS router on, it's AWS's backbone.

Two port models decide your lead time and your floor:

Model | Speeds | How you get it | Fits when
Dedicated | 1, 10, 100, 400 Gbps | Order the port from AWS; AWS issues an LOA-CFA; you/your colo run the cross-connect | You have gear in a DX location and weeks to provision
Hosted | 50 Mbps – 25 Gbps | An AWS Direct Connect Partner already in the location carves you a slice of their port | You have no presence there and want it live in days

The provisioning ceremony for a dedicated connection runs like this: you request the connection in the console (choosing the location and speed); AWS issues a Letter of Authorization and Connecting Facility Assignment (LOA-CFA) — the document that authorizes a cross-connect to a specific AWS port; you hand that LOA-CFA to the colocation provider that runs the building; and they physically patch a fibre from your equipment to AWS's. Then you create virtual interfaces and bring up BGP. The LOA-CFA is the hinge of the whole process, and not knowing what it is marks someone who has never actually ordered a circuit.
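
Before requesting a port, it can help to confirm which location codes and port speeds are actually on offer. The following is a minimal sketch, assuming an AWS provider is already configured (as in the Terraform section of this article); the location code is only an example and should be replaced with your own.

```hcl
# List the Direct Connect location codes visible from the provider's Region.
data "aws_dx_locations" "available" {}

# Inspect one location in detail. "EqDC2" is an example code, not a recommendation.
data "aws_dx_location" "candidate" {
  location_code = "EqDC2"
}

output "location_codes" {
  value = data.aws_dx_locations.available.location_codes
}

# Port speeds offered at the candidate location.
output "candidate_port_speeds" {
  value = data.aws_dx_location.candidate.available_port_speeds
}

# Port speeds at the candidate location that support MACsec, if any.
output "candidate_macsec_port_speeds" {
  value = data.aws_dx_location.candidate.available_macsec_port_speeds
}
```

The LOA-CFA itself is not created by this configuration; once the connection request exists, it is downloaded from the console (or through the Direct Connect API) and handed to the colocation provider.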

In Practice: Teams reach the wire three ways, trading control for speed. (1) A dedicated port if they already have equipment in a Direct Connect location. (2) A hosted connection through a partner (Megaport, Equinix, a telco) when they don't; the partner owns the port and the floor space. (3) Fully partner-managed, where the provider runs the entire path. The decision is whether you have a footprint in a DX location at all, and how fast you need it live.
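
For the hosted route, the partner provisions the connection into your account and you accept it before creating virtual interfaces. A minimal sketch of that acceptance step, assuming the partner has already handed you a connection ID; the variable name is hypothetical.

```hcl
# ID of the hosted connection the partner created for your account.
# Hypothetical variable; the value comes from the partner or the console.
variable "hosted_connection_id" {
  type        = string
  description = "ID of the hosted connection awaiting acceptance"
}

# Accept (confirm) the hosted connection so VIFs can be created on it.
resource "aws_dx_connection_confirmation" "accept" {
  connection_id = var.hosted_connection_id
}
```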

Virtual interfaces: one wire, three doors

The physical port is just glass and light. What you route over it are logical virtual interfaces (VIFs), each an 802.1Q VLAN with its own BGP session, and there are three kinds.

[Figure: Virtual interface types. Private VIF: reaches one VPC, via a virtual private gateway or a Direct Connect gateway (private IPs in a VPC). Public VIF: reaches AWS public services over the private line, in any Region (S3, DynamoDB, ...). Transit VIF: reaches a Transit Gateway through a Direct Connect gateway (many VPCs, multi-Region).]
Three VIF types on one port — private to a VPC, public to AWS services, transit to a Transit Gateway.

A private VIF carries traffic to private IP addresses in a VPC, attached either to that VPC's virtual private gateway or to a Direct Connect gateway. A public VIF reaches AWS's public service endpoints — S3, DynamoDB, public APIs — in any Region, over the private line instead of the internet (you advertise your public prefixes and receive AWS's). A transit VIF lands on a Direct Connect gateway and, through it, an AWS Transit Gateway — the path to many VPCs across many Regions. A single dedicated connection carries up to 51 virtual interfaces (including transit VIFs), so one port can serve all three patterns at once.
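
A public VIF differs from a private one mainly in that you supply public peering addresses and the public prefixes you will advertise. The sketch below assumes an existing connection and uses documentation-range addresses (203.0.113.0/24) purely as placeholders; real values must be public address space you own, and the variable name is hypothetical.

```hcl
# ID of an existing Direct Connect connection (hypothetical variable).
variable "dx_connection_id" {
  type = string
}

# Public VIF: reaches AWS public service endpoints over the private line.
resource "aws_dx_public_virtual_interface" "public" {
  connection_id  = var.dx_connection_id
  name           = "vif-public"
  vlan           = 4093
  address_family = "ipv4"
  bgp_asn        = 65000 # your on-prem ASN

  # Peering addresses for the BGP session. Placeholders only.
  customer_address = "203.0.113.1/30"
  amazon_address   = "203.0.113.2/30"

  # Public prefixes you will advertise to AWS. Placeholder only.
  route_filter_prefixes = ["203.0.113.0/24"]
}
```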

BGP: how routing actually runs over the wire

Every VIF runs a BGP session between your router and AWS's. You bring your own ASN — a public ASN you own, or a private one in the 64512–65534 range — and AWS uses its own on its side. You advertise the on-prem prefixes you want AWS to reach; AWS advertises the VPC (or public) prefixes back. BGP is also how failover works: when you run two connections, the routes learned over both let traffic shift automatically if one drops.

Deep dive: BGP tuning that separates a working link from a resilient one

Three knobs matter in production. MD5 authentication on the BGP session is optional but standard practice; it stops a misconfigured neighbor from forming a session. BFD (Bidirectional Forwarding Detection) is the one that earns its keep: default BGP hold timers take ~90 seconds to notice a dead peer, but BFD detects a failed path in well under a second, so with two connections your failover is sub-second instead of a minute-and-a-half outage. Enable it on both ends. Finally, route preference: AWS evaluates longest-prefix-match first, then its own routing policy. To make one path primary and another standby, advertise more-specific prefixes on the path you prefer, or prepend your AS path on the path you want as the standby. Active/active needs equal advertisements; active/passive needs an intentional tiebreaker.

The Direct Connect gateway: one wire, many VPCs and Regions

A private VIF reaches exactly one VPC, which is a dead end the moment you have more than one. The Direct Connect gateway (DXGW) is the global join that fixes it: a Region-agnostic object you associate your gateways with, so a single physical connection reaches VPCs anywhere.

[Figure: Direct Connect gateway, global reach. One on-prem connection lands a VIF on a Region-agnostic DX gateway, which fans out to Transit Gateways (and their VPCs) in us-east-1, eu-west-1, and ap-southeast-2. A DXGW associates up to 6 Transit Gateways (or up to 20 virtual private gateways) across Regions.]
A Direct Connect gateway turns one physical connection into reach across Regions.

Two patterns: associate the DXGW directly with virtual private gateways (up to 20) to reach VPCs one-to-one, or — the scalable choice — land a transit VIF on the DXGW and associate up to 6 Transit Gateways, one per Region, each fanning out to its Region's VPCs. The physical port and VIF stay single; the gateway provides the geography. (A Transit Gateway association advertises up to 200 prefixes to the DXGW.)
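
The transit pattern can be sketched as three objects: the gateway, a transit VIF landing on it, and an association to a Transit Gateway. This is a single-Region illustration under stated assumptions; the resource names, VLAN, ASNs, and the 10.10.0.0/16 prefix are all placeholders, and additional Regions would each need their own provider alias, Transit Gateway, and association.

```hcl
# ID of an existing Direct Connect connection (hypothetical variable).
variable "dx_connection_id" {
  type = string
}

# Region-agnostic gateway.
resource "aws_dx_gateway" "global" {
  name            = "global-dxgw"
  amazon_side_asn = "64512"
}

# Transit VIF: the VLAN and BGP session that land on the gateway.
resource "aws_dx_transit_virtual_interface" "transit" {
  connection_id  = var.dx_connection_id
  dx_gateway_id  = aws_dx_gateway.global.id
  name           = "tvif-core"
  vlan           = 4000
  address_family = "ipv4"
  bgp_asn        = 65000 # your on-prem ASN
}

# One Transit Gateway in the provider's Region.
# Its ASN must differ from the Direct Connect gateway's ASN.
resource "aws_ec2_transit_gateway" "regional" {
  amazon_side_asn = 64513
}

# Associate the Transit Gateway with the Direct Connect gateway and
# list the prefixes AWS should advertise on-prem for it.
resource "aws_dx_gateway_association" "regional" {
  dx_gateway_id         = aws_dx_gateway.global.id
  associated_gateway_id = aws_ec2_transit_gateway.regional.id
  allowed_prefixes      = ["10.10.0.0/16"]
}
```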

Traffic flow, end to end

Tracing a single packet from an on-prem host to an EC2 instance is the fastest way to see where each component sits.

[Figure: Packet path, host to EC2. On-prem host → your router → cross-connect (physical fibre) → AWS DX router → DX gateway → Transit Gateway → VPC → EC2 instance, with everything past the cross-connect on the AWS backbone and routes learned via BGP. The return path is symmetric; BGP advertised the VPC CIDR to your router and your prefixes to AWS.]
One packet, every hop: the only physical segment is the cross-connect; the rest is BGP-routed over AWS's backbone.

The host sends to a VPC private IP; your router has a BGP-learned route for the VPC CIDR pointing at the Direct Connect; the packet crosses the fibre to the AWS router, which hands it to the Direct Connect gateway, then the Transit Gateway, then into the VPC subnet and the instance's ENI. The return trip is symmetric because BGP advertised routes in both directions. No NAT, no tunnel, no internet — just routing over a private path.

One link is not a plan: resilience

A single Direct Connect is a single point of failure at several layers at once — the cross-connect can be cut, the AWS device it lands on can fail, the whole Direct Connect location can go dark — and because provisioning a physical port takes weeks to months, you cannot launch a replacement when one dies. The wire that fixed your throughput becomes your biggest availability risk.

[Figure: Resilience models. Non-redundant (SPOF, dev/test only): 1 connection, 1 device, 1 location; any failure is an outage, for weeks. High resiliency (survives a location loss): 2 connections in 2 separate locations; covers a fibre cut or single-site failure. Maximum resiliency (AWS-recommended): separate connections on separate devices in 2+ locations; survives device, cut, and site failure.]
AWS's Resiliency Toolkit, left to right by what each survives. Production starts at the middle.

AWS frames the choices as a Resiliency Toolkit. Maximum Resiliency uses separate connections terminating on separate AWS devices in two or more locations; it is the recommended posture for critical workloads, surviving the loss of a device, a cross-connect, or an entire location. High Resiliency uses two connections across two locations. The non-redundant tier (one connection, or two on separate devices in a single location) is for dev and test only. Two more tools matter. A link aggregation group (LAG) bundles up to four sub-100 Gbps connections (or two at 100/400 Gbps) into one logical link for capacity and link-level redundancy; because every member of a LAG terminates on the same AWS device, it does not protect against a device failure. And a Site-to-Site VPN, cheap and live in an hour, makes an honest cold backup for the day the fibre is cut.
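
A LAG is declared once and member connections are then associated with it. The sketch below bundles two 10 Gbps connections at one location; the names are placeholders and the location is passed in as a hypothetical variable.

```hcl
# Direct Connect location code for the LAG (hypothetical variable).
variable "dx_location" {
  type = string
}

# The logical bundle. Every member must match this bandwidth and location.
resource "aws_dx_lag" "bundle" {
  name                  = "dc1-lag"
  connections_bandwidth = "10Gbps"
  location              = var.dx_location
}

# Two member connections at the same location and speed.
resource "aws_dx_connection" "member" {
  count     = 2
  name      = "dc1-lag-member-${count.index}"
  bandwidth = "10Gbps"
  location  = var.dx_location
}

# Attach each member connection to the LAG.
resource "aws_dx_connection_association" "member" {
  count         = 2
  connection_id = aws_dx_connection.member[count.index].id
  lag_id        = aws_dx_lag.bundle.id
}
```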

Gotcha: Two connections into the same AWS device in the same location is not redundancy; it survives a single fibre cut and nothing else. "Two connections" is necessary but not sufficient: resilience comes from separate devices and separate locations. AWS's Resiliency Toolkit will only certify the configuration, and AWS will only back it with an SLA, if the topology actually separates the failure domains.
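
Separating the failure domains starts with where the connections are ordered. The sketch below requests one connection per location from a map and puts a private VIF from each onto a shared Direct Connect gateway; the map keys, names, VLAN, and ASNs are placeholders, and the variable is hypothetical. Which AWS device each port lands on is decided on AWS's side, so this expresses only the location separation.

```hcl
# Map of path name to Direct Connect location code, two distinct locations.
# Hypothetical variable; supply real location codes for your sites.
variable "dx_locations" {
  type = map(string)
}

# One connection per location.
resource "aws_dx_connection" "path" {
  for_each  = var.dx_locations
  name      = "dc1-${each.key}"
  bandwidth = "10Gbps"
  location  = each.value
}

# A single gateway shared by both paths.
resource "aws_dx_gateway" "shared" {
  name            = "shared-dxgw"
  amazon_side_asn = "64512"
}

# One private VIF (VLAN plus BGP session) per connection.
resource "aws_dx_private_virtual_interface" "path" {
  for_each       = aws_dx_connection.path
  connection_id  = each.value.id
  dx_gateway_id  = aws_dx_gateway.shared.id
  name           = "vif-${each.key}"
  vlan           = 101
  address_family = "ipv4"
  bgp_asn        = 65000 # your on-prem ASN
}
```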

One non-obvious capability: with SiteLink enabled on VIFs at two or more Direct Connect locations, traffic can flow directly between those locations over the AWS global backbone, bypassing AWS Regions entirely. It turns Direct Connect into a wide-area network for your own data centers — data center A in Frankfurt to data center B in Singapore, riding AWS's backbone rather than your carrier's MPLS. It's billed separately (per-hour plus data), and it's the feature that lets teams retire expensive private WAN circuits.
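
In Terraform terms, SiteLink is a flag on the virtual interface. The following sketch assumes each data center already has its own connection and that both VIFs attach to the same Direct Connect gateway; the variable shape, names, and VLAN are illustrative assumptions, and each site is given its own ASN so that routes exchanged between sites are not dropped by BGP loop prevention.

```hcl
# Per-site inputs. Hypothetical variable, for example:
#   { frankfurt = { connection_id = "<id>", bgp_asn = 65001 }, ... }
variable "sites" {
  type = map(object({
    connection_id = string
    bgp_asn       = number
  }))
}

# Gateway shared by the participating sites.
resource "aws_dx_gateway" "wan" {
  name            = "wan-dxgw"
  amazon_side_asn = "64512"
}

# One SiteLink-enabled private VIF per site.
resource "aws_dx_private_virtual_interface" "site" {
  for_each         = var.sites
  connection_id    = each.value.connection_id
  dx_gateway_id    = aws_dx_gateway.wan.id
  name             = "vif-${each.key}"
  vlan             = 200
  address_family   = "ipv4"
  bgp_asn          = each.value.bgp_asn
  sitelink_enabled = true # billed separately, as noted above
}
```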

Encryption: private is not confidential

Because the line is private, it's tempting to treat it as secure. It isn't, on its own — the bytes cross the fibre in the clear. Two ways to add confidentiality:

  • MACsec: IEEE 802.1AE Layer-2 encryption, near line rate, on dedicated 10 and 100 Gbps connections at supporting locations. Low overhead, but tied to specific ports and locations.
  • IPsec VPN over a public VIF: run a Site-to-Site VPN across the Direct Connect's public VIF. Works anywhere, but you pay tunnel overhead and a throughput ceiling.

Gotcha: Claiming "Direct Connect is encrypted" is the fastest way to fail a security review, and an interview. It's a private path, not an encrypted one. If the data is regulated, the encryption story (MACsec or IPsec over the link) is a design-time decision, not an afterthought.
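
If MACsec is the chosen route, the request is made when the connection is ordered and the key pair is associated afterwards. A minimal sketch, assuming a MACsec-capable port at the chosen location; the variable names are hypothetical, and the CKN/CAK values are expected to be generated and stored outside this configuration.

```hcl
# Location that offers MACsec-capable ports (hypothetical variable).
variable "dx_location" {
  type = string
}

# Connectivity association key name and key, supplied out of band.
variable "macsec_ckn" {
  type      = string
  sensitive = true
}

variable "macsec_cak" {
  type      = string
  sensitive = true
}

# Request a MACsec-capable dedicated port.
resource "aws_dx_connection" "secure" {
  name           = "dc1-macsec"
  bandwidth      = "10Gbps"
  location       = var.dx_location
  request_macsec = true

  # "should_encrypt" encrypts when both ends can; "must_encrypt" takes the
  # link down if encryption fails. Starting permissive is an assumption here.
  encryption_mode = "should_encrypt"
}

# Associate the CKN/CAK pair with the connection.
resource "aws_dx_macsec_key_association" "primary" {
  connection_id = aws_dx_connection.secure.id
  ckn           = var.macsec_ckn
  cak           = var.macsec_cak
}
```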

The cost model, with the break-even math

Two charges: port-hours for the connection, billed every hour whether it's busy or idle, and data transfer out per gigabyte — but at a rate materially lower than internet egress (the exact rate depends on the Direct Connect location and the source Region; data transfer in is free, as on the internet). The Direct Connect gateway itself is free; you pay the port and the egress.

The decision is arithmetic, not faith. The port is a fixed monthly cost; the saving is the gap between internet egress and Direct Connect egress on the volume you actually move:

break-even GB/month  =  monthly port cost
                        ─────────────────────────────────────
                        (internet $/GB)  −  (Direct Connect $/GB)

Below that volume, a VPN over the internet is cheaper. Above it — sustained, predictable egress — Direct Connect is both faster and less expensive, and the financial case writes itself. (Plug your location's published rates into the formula; the per-GB figures vary by location and Region, so verify them on the pricing page rather than memorizing a number.)
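
The formula can also live next to the infrastructure as plain Terraform locals. Every number below is an illustrative placeholder rather than a quoted AWS price; substitute the published rates for your location and Region.

```hcl
locals {
  # Placeholder inputs. Replace with rates from the pricing page.
  port_usd_per_hour   = 2.25
  hours_per_month     = 730
  internet_usd_per_gb = 0.09
  dx_usd_per_gb       = 0.02

  # Fixed monthly cost of the port.
  monthly_port_usd = local.port_usd_per_hour * local.hours_per_month

  # Break-even volume: port cost divided by the per-GB saving.
  break_even_gb = local.monthly_port_usd / (local.internet_usd_per_gb - local.dx_usd_per_gb)
}

output "break_even_gb_per_month" {
  value = floor(local.break_even_gb)
}
```

With these placeholder inputs the break-even lands on the order of 23 TB of egress a month; the point is the shape of the calculation, not the figure.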

Standing it up: Terraform and monitoring

The dedicated connection and its cross-connect happen partly in the physical world, but the AWS-side objects — the connection request, the VIF, the gateway — are all API-driven and belong in code.

terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.40" }
  }
}

# 1. Request the dedicated connection (AWS then issues the LOA-CFA).
resource "aws_dx_connection" "primary" {
  name      = "dc1-primary"
  bandwidth = "10Gbps"
  location  = "EqDC2"            # a Direct Connect location code
}

# 2. A Region-agnostic gateway, so one connection reaches many VPCs/Regions.
resource "aws_dx_gateway" "core" {
  name            = "core-dxgw"
  amazon_side_asn = 64512        # AWS-side BGP ASN for this gateway
}

# 3. A private VIF: your VLAN + BGP session into the gateway.
resource "aws_dx_private_virtual_interface" "vif" {
  connection_id    = aws_dx_connection.primary.id
  dx_gateway_id    = aws_dx_gateway.core.id
  name             = "vif-prod"
  vlan             = 4094
  address_family   = "ipv4"
  bgp_asn          = 65000       # YOUR on-prem ASN (private 64512–65534, or public)
  # bgp_auth_key   = "..."       # optional MD5 auth — set it in production
}

After the cross-connect is patched and the VIF is up, confirm the BGP session is established and that you're learning the VPC CIDR on-prem and advertising your prefixes to AWS. Then wire monitoring: Direct Connect publishes CloudWatch metrics per connection and per VIF — ConnectionState, ConnectionBpsEgress/Ingress, and the BGP state — and the alarm that actually pages you is on BGP peer state dropping from up to down. A connection that's "up" at the physical layer but down at BGP carries no traffic; alarm on the session, not just the light.
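
As a starting point, the physical-layer alarm can be expressed as below. It is a sketch that watches only ConnectionState (1 is up, 0 is down) for one connection; the variable names and thresholds are assumptions, and an alarm on the BGP session should be added alongside it using whichever per-VIF metric the current CloudWatch documentation lists.

```hcl
# Connection to watch and the SNS topic that pages on-call.
# Both are hypothetical variables.
variable "dx_connection_id" {
  type = string
}

variable "alarm_topic_arn" {
  type = string
}

# Fire when the connection reports down for two consecutive minutes.
resource "aws_cloudwatch_metric_alarm" "dx_connection_down" {
  alarm_name        = "dx-connection-down"
  alarm_description = "Direct Connect connection is down at the physical layer"

  namespace   = "AWS/DX"
  metric_name = "ConnectionState"
  dimensions = {
    ConnectionId = var.dx_connection_id
  }

  statistic           = "Minimum"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "LessThanThreshold"
  threshold           = 1

  # Missing data points are treated as a failure rather than ignored.
  treat_missing_data = "breaching"

  alarm_actions = [var.alarm_topic_arn]
}
```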

Enterprise Angle: For leadership the pitch is three sentences: Direct Connect replaces an unpredictable internet path with consistent, low latency and a lower egress rate; the cost is a fixed monthly port plus weeks of lead time; and a single connection is not production, so budget for two, in two locations. The TCO case against VPN turns on sustained egress volume and whether the workload needs a latency floor a best-effort tunnel can't promise. For HIPAA / PCI / financial workloads, pair it with MACsec or IPsec and the private path becomes a compliant one.

Direct Connect, VPN, or both

Dimension | Site-to-Site VPN | Direct Connect
Path | IPsec tunnel over the public internet | Private dedicated fibre
Latency | Variable, best-effort | Consistent, low
Bandwidth | Per-tunnel ceiling | 1–400 Gbps
Encryption | Built in (IPsec) | None by default (add MACsec/IPsec)
Lead time | Minutes | Weeks to months
Egress cost | Internet rate | Lower per-GB rate

Direct Connect earns its cost in three situations: sustained bandwidth a tunnel can't hold; workloads that need a consistent latency floor rather than best-effort; and large, ongoing egress where the lower per-GB rate pays back the port. Outside those, a VPN is cheaper, encrypted by default, and live in an hour. The common production answer is both — Direct Connect for the steady load, a Site-to-Site VPN as the automatic backup — which also neatly covers the weeks-long gap if a Direct Connect ever has to be reprovisioned. Whatever you choose, treat a lone Direct Connect as a single link, never a finished design: pair it with a second path before you call it production.
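
The VPN half of the "both" design is small. The sketch below terminates a BGP-routed Site-to-Site VPN on an existing Transit Gateway, assuming that same Transit Gateway also receives the Direct Connect routes; the variable names and the ASN are placeholders.

```hcl
# Existing Transit Gateway and the public IP of the on-prem VPN device.
# Both are hypothetical variables.
variable "transit_gateway_id" {
  type = string
}

variable "onprem_public_ip" {
  type = string
}

# The on-prem end of the tunnel.
resource "aws_customer_gateway" "dc1" {
  bgp_asn    = 65000 # your on-prem ASN
  ip_address = var.onprem_public_ip
  type       = "ipsec.1"
}

# Site-to-Site VPN to the Transit Gateway, using BGP rather than
# static routes so failover follows route withdrawal.
resource "aws_vpn_connection" "backup" {
  customer_gateway_id = aws_customer_gateway.dc1.id
  transit_gateway_id  = var.transit_gateway_id
  type                = "ipsec.1"
  static_routes_only  = false
}
```

Advertising the same prefixes over both paths lets BGP move traffic to the tunnel when the Direct Connect routes are withdrawn; confirm which path each side prefers with a deliberate failover test.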

System design scenarios

1. Reach VPCs in five Regions over one connection
A private VIF reaches one VPC — a dead end at five. Land a transit VIF on a Direct Connect gateway and associate up to six Transit Gateways (one per Region); each TGW fans out to its Region's VPCs. The physical port stays single; the DXGW is the global join. For VPC-direct without TGWs, the same gateway can associate up to 20 virtual private gateways, but the TGW path scales further and keeps routing centralized.
2. A connection you can lose without an outage
Maximum Resiliency: two connections on separate AWS devices in two Direct Connect locations, BGP plus BFD for sub-second failover, and a LAG within a location where you also need capacity. Keep a Site-to-Site VPN as a cold backup for total Direct Connect loss. Make one path primary with AS-path prepending if you want active/passive; advertise equally for active/active. Test by draining one path on purpose — untested failover is a hypothesis.
3. Migrate from VPN to Direct Connect with no cutover outage
Stand the Direct Connect up alongside the existing VPN. Bring up BGP on the new private VIF but keep the VPN advertising the same prefixes; with both paths live, make Direct Connect preferred (more-specific routes or higher local preference) and let traffic shift gradually. Watch CloudWatch egress move from the VPN to the connection, soak it for a few days, then demote the VPN to backup rather than deleting it. The VPN you migrated off of is the backup you wanted anyway.
4. Encrypt a regulated workload over the link
If the location and port support it, MACsec gives near-line-rate Layer-2 encryption on a dedicated 10/100 Gbps connection with negligible throughput cost. Where MACsec isn't available, run an IPsec VPN over a public VIF — encryption everywhere, at the cost of tunnel overhead and a throughput ceiling. Decide by whether the location offers MACsec and whether the workload can absorb IPsec's overhead.
5. Connect two data centers across continents without MPLS
Enable SiteLink on a VIF at each data center's Direct Connect location; traffic between them rides the AWS backbone instead of a carrier's private WAN, often at lower cost and with AWS's global reach. Size each connection for the inter-site load, and remember SiteLink is billed separately (per-hour + data). This is the pattern that lets teams retire legacy MPLS while keeping a private, predictable path.

Interview questions

The questions an interviewer actually probes on hybrid connectivity, and what a strong answer covers.

Direct Connect or Site-to-Site VPN — when each?
VPN is an IPsec tunnel over the public internet: cheap, encrypted by default, live in minutes, but variable in latency and capped in throughput. Direct Connect is a private physical line: consistent latency, up to 400 Gbps, cheaper sustained egress — but it costs port-hours, takes weeks to provision, and isn't encrypted by default. Use Direct Connect for steady high-volume or latency-sensitive traffic; use a VPN for modest/bursty traffic or as Direct Connect's backup. Production designs commonly run both.
Explain the three virtual interface types.
Private VIF → a VPC (via a virtual private gateway or a Direct Connect gateway). Public VIF → AWS public services (S3, DynamoDB, public endpoints) in any Region, over the private line. Transit VIF → a Transit Gateway through a Direct Connect gateway, reaching many VPCs across Regions. One dedicated connection carries up to 51 VIFs.
What is the LOA-CFA and where does it fit?
The Letter of Authorization and Connecting Facility Assignment is the document AWS issues after you request a dedicated connection; it authorizes a cross-connect to a specific AWS port in the Direct Connect location. You hand it to the colo provider, who patches the fibre from your cage to AWS's. No LOA-CFA, no cross-connect — it's the hinge of provisioning a dedicated connection.
How do you reach VPCs in multiple Regions over one connection?
A Direct Connect gateway. Land a transit VIF on it and associate up to six Transit Gateways — one per Region — each fanning out to its VPCs; or associate up to 20 virtual private gateways for VPC-direct. The port and VIF stay single; the gateway provides the geography.
Is Direct Connect encrypted? How would you encrypt it?
No — private but unencrypted by default. Use MACsec (Layer-2, near line rate) on a dedicated 10/100 Gbps connection at a supporting location, or run IPsec over a public VIF where MACsec isn't available. "Private path" and "encrypted path" are different guarantees; for regulated data you need the second explicitly.
A single Direct Connect — failure modes and how you design around them?
It's a SPOF at the cross-connect, the AWS device, and the whole location, and reprovisioning takes weeks, so a failure is a prolonged outage. Design with the Resiliency Toolkit: Maximum Resiliency (separate devices, two locations) for critical workloads, High Resiliency otherwise; LAG for capacity within a location (its members land on the same AWS device, so it is not device redundancy); BGP with BFD for fast detection; and a Site-to-Site VPN as a cheap backup path.
Why BFD, and what does it change?
Default BGP hold timers take roughly 90 seconds to declare a dead peer; BFD detects a failed path in well under a second. With two connections, that's the difference between a sub-second failover and a minute-and-a-half outage. Enable it on both ends of each session.
Active/active vs active/passive across two connections?
BGP picks the path. Advertise equally from both for active/active (load shares, both carry traffic). For active/passive, make one path less preferred with AS-path prepending or by advertising more-specific routes on the primary. Active/active uses the bandwidth you're paying for; active/passive gives deterministic failover behavior — choose by whether you need predictable path selection or maximum throughput.
Dedicated vs hosted — how do you choose?
Dedicated: you order a 1/10/100/400 Gbps port from AWS and run your own cross-connect — right when you have gear in a DX location and need high bandwidth or MACsec. Hosted: a partner carves you 50 Mbps–25 Gbps from their port — right when you have no presence in the location and want it live in days. Hosted is partner-managed; dedicated is yours end to end on the AWS side.
When would you reach for SiteLink?
When you need to connect your own data centers to each other — not just to AWS — and want to ride AWS's global backbone instead of a carrier MPLS. Enable SiteLink on VIFs at two or more DX locations and traffic flows between them bypassing AWS Regions, billed separately. It's the lever for retiring expensive private WAN circuits.