Understanding and reducing AWS data transfer costs (pt1)
AWS data transfer costs can become a massive part of your monthly infrastructure bill if left unchecked. Often, some of these costs can be reduced or eliminated entirely by understanding what is charged and where, and then slightly modifying your infrastructure or application logic.
This blog post is meant to be a quick introduction for novice AWS users and a refresher for more experienced ones. We will briefly cover the main points you should consider when building your application and infrastructure on AWS, and explore the AWS bill cost categories that relate to data transfer. In the follow-up posts, we will introduce some alternatives to help reduce or eliminate some of those costs.
Quick disclaimer: there are no silver bullets in this post. A solution that works for some can make things worse for others. Therefore, it is essential to first gain insight into the AWS data transfer pricing model and your workload's traffic patterns before applying any suggestions.
To keep things relatively simple and readable, this topic is organized into a series of posts. We invite you to check back periodically and subscribe to our newsletter so that you don't miss new posts in this series.
Introduction to data transfer
All services on the Internet use data transfer to communicate between end users and servers. In some scenarios, services talk directly to other services and still need to transfer data between them. AWS is an excellent platform for delivering your data to end users across the globe at large scale, using optimal routes and low latencies. However, the AWS data transfer pricing model can be confusing at first, which is why we first need to define some terms used throughout this series.
Without further ado, let us first establish a baseline in understanding how the traffic is classified and billed in your AWS environment.
Direction of data flow
The first two terms we will define describe the direction of data flow relative to services running within your AWS VPC.
Data flow can be either:
- Ingress – data flowing from some source toward your infrastructure (for example, EC2)
- Egress – data flowing from your infrastructure (for example, EC2) to some destination
As a rule of thumb, Ingress traffic on AWS is free, while Egress traffic is subject to data transfer pricing. There are some exceptions to the rule, which we will cover later.
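As a minimal sketch of that rule of thumb, the snippet below sums only the egress volume from a list of traffic records. The `Flow` record and its fields are hypothetical (not an AWS API), and the exceptions mentioned above, such as NAT Gateway data processing, are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical traffic record; these field names are not an AWS API."""
    direction: str    # "ingress" or "egress", relative to your infrastructure
    gigabytes: float


def billable_gigabytes(flows: list[Flow]) -> float:
    """Sum egress volume only, following the 'ingress is free' rule of thumb.

    Exceptions such as NAT Gateway data processing are ignored here.
    """
    return sum(f.gigabytes for f in flows if f.direction == "egress")


flows = [Flow("ingress", 500.0), Flow("egress", 120.0), Flow("egress", 30.0)]
print(billable_gigabytes(flows))  # 150.0
```

The 500 GB of ingress contributes nothing; only the two egress records count toward the billable volume.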
Data flow source/destination
The second thing we should consider is where the data flows to or from.
In this classification, the source or destination can be:
- Internet – data is flowing to/from the Internet (usually end users or some third-party services not hosted on AWS)
- Region – data is flowing to/from or within an AWS Region (a Region is a physical location where AWS availability zones are clustered)
- Availability zone – data is flowing to/from or within an availability zone (an AZ is a logical cluster of data centers working as a stand-alone unit within an AWS Region)
- On-premises data center – data is flowing to/from the on-premises data center (either via a Site-to-Site VPN tunnel or AWS Direct Connect)
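The classification above can be sketched as a small function that maps a source/destination pair to one of these categories. The `Location` type and category strings are hypothetical names for illustration. The sketch compares AZ IDs (such as `use1-az1`) rather than AZ names, because AZ names like `us-east-1a` can map to different physical zones in different AWS accounts.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Hypothetical endpoint description used only for this sketch."""
    kind: str                    # "aws", "internet" or "on_premises"
    region: Optional[str] = None
    az_id: Optional[str] = None  # AZ ID such as "use1-az1"


def classify(src: Location, dst: Location) -> str:
    """Map a source/destination pair onto the traffic categories."""
    kinds = {src.kind, dst.kind}
    if "internet" in kinds:
        return "internet"
    if "on_premises" in kinds:
        return "on_premises"
    if src.region != dst.region:
        return "inter_region"
    if src.az_id != dst.az_id:
        return "inter_az"
    return "intra_az"


web = Location("aws", "us-east-1", "use1-az1")
db = Location("aws", "us-east-1", "use1-az2")
dr = Location("aws", "eu-west-1", "euw1-az1")
user = Location("internet")

print(classify(web, web))   # intra_az
print(classify(web, db))    # inter_az
print(classify(web, dr))    # inter_region
print(classify(user, web))  # internet
```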
Depending on the sources and destinations described above, data transfer carries different price tags.
For example, pricing differs based on the region where your infrastructure is located and where data flows to or from:
| Region \ Destination | Inter-region traffic | Inter-AZ traffic | Intra-AZ traffic |
|---|---|---|---|
| us-east-1 (N. Virginia) | $0.02 per GB | $0.01 per GB (each direction) | Free |
Most regions have similar data transfer costs. However, some regions charge more for data transfer, which should be considered when planning your infrastructure.
A keen eye might have caught that traffic within the availability zone is free and that cross-availability zone traffic is less expensive than cross-region traffic.
Service data transfer costs
We also need to distinguish between data transfer costs and data processing costs.
A data transfer cost applies to raw data flowing in one or both directions between two or more locations. A data processing cost is added on top of the data transfer cost when that data passes through an AWS-managed service on its way.
A good example is the NAT Gateway. Even though ingress traffic from the Internet is free in terms of data transfer costs, if that traffic flows through the NAT Gateway service, it is subject to additional data processing costs.
Things get worse if you provision the NAT gateway in a different availability zone than your workload. In that scenario, you end up paying for inter-AZ traffic on top of the data processing charge.
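To make the NAT gateway scenario concrete, here is a rough estimator. The rates are assumptions based on us-east-1 list prices ($0.045 per GB of NAT gateway data processing, $0.01 per GB for inter-AZ transfer); check the VPC and EC2 pricing pages for your region. For simplicity, the sketch counts the inter-AZ hop once at the listed rate and leaves out any Internet egress charge beyond the NAT gateway.

```python
# Assumed us-east-1 rates; verify on the AWS pricing pages for your region.
NAT_PROCESSING_PER_GB = 0.045
INTER_AZ_PER_GB = 0.01  # listed per GB, per direction


def nat_path_cost(gigabytes: float, same_az: bool) -> float:
    """Estimate the cost of pushing `gigabytes` through a NAT gateway.

    Data processing is charged regardless of direction; the inter-AZ
    charge is added only when the workload and the NAT gateway sit in
    different availability zones. Internet egress is not included.
    """
    cost = gigabytes * NAT_PROCESSING_PER_GB
    if not same_az:
        cost += gigabytes * INTER_AZ_PER_GB
    return cost


print(round(nat_path_cost(1000, same_az=True), 2))   # 45.0
print(round(nat_path_cost(1000, same_az=False), 2))  # 55.0
```

Placing one NAT gateway per availability zone and routing each private subnet to its local gateway removes the inter-AZ component, at the price of paying for additional NAT gateway hours.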
On the other hand, services like RDS, which can be deployed as Multi-AZ (spanning multiple availability zones), have no data transfer costs between availability zones for database replication traffic. However, they still incur data transfer costs when communicating with EC2 instances in different availability zones.
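A minimal sketch of how that plays out for an RDS Multi-AZ deployment, assuming the same $0.01 per GB listed inter-AZ rate as above: replication volume between the primary and standby adds nothing, while application-to-database traffic is charged only when the EC2 instance and the primary sit in different AZs. The function and parameter names are hypothetical.

```python
# Assumed rates for illustration; check the RDS and EC2 pricing pages.
REPLICATION_PER_GB = 0.00  # Multi-AZ primary <-> standby replication
INTER_AZ_PER_GB = 0.01     # EC2 <-> RDS traffic that crosses AZs


def rds_transfer_cost(app_db_gb: float, replication_gb: float,
                      app_same_az: bool) -> float:
    """Estimate monthly data transfer cost around a Multi-AZ database."""
    cost = replication_gb * REPLICATION_PER_GB
    if not app_same_az:
        cost += app_db_gb * INTER_AZ_PER_GB
    return cost


print(rds_transfer_cost(2000, 5000, app_same_az=True))             # 0.0
print(round(rds_transfer_cost(2000, 5000, app_same_az=False), 2))  # 20.0
```

Keep in mind that after a failover the standby becomes the primary in another AZ, so application traffic that used to stay within one AZ may start crossing AZs.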
Each managed service has its own data transfer policy, so before choosing a service, it is best to consult its pricing page for additional details.
Data transfer volume cost
In most cases, pricing changes based on monthly traffic volume. A simple example is EC2 Internet egress in us-east-1, where a standard price of $0.09 per GB applies to the first 10 TB transferred each month. The next 40 TB is priced at a reduced $0.085 per GB.
If our EC2-based application transfers 50 TB to the Internet in a month, tiered pricing applies: the first 10 TB is billed at $0.09 per GB and the next 40 TB at $0.085 per GB, for a total of $4,403.20 (counting 1 TB as 1,024 GB).
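The tiered calculation above can be reproduced with a short function that bills each GB at the price of the tier it falls into. The tier list holds only the two tiers from the example (1 TB taken as 1,024 GB), and the free-tier allowance is ignored, so treat it as a sketch rather than a full price model.

```python
# Tiers from the example: (tier size in GB, price per GB).
EC2_EGRESS_TIERS = [
    (10 * 1024, 0.09),   # first 10 TB
    (40 * 1024, 0.085),  # next 40 TB
]


def tiered_cost(total_gb: float, tiers: list[tuple[float, float]]) -> float:
    """Charge each GB at the price of the tier it falls into."""
    cost, remaining = 0.0, total_gb
    for size_gb, price_per_gb in tiers:
        billed = min(remaining, size_gb)
        cost += billed * price_per_gb
        remaining -= billed
    if remaining > 0:
        raise ValueError("usage exceeds the tiers defined in this sketch")
    return cost


print(f"${tiered_cost(50 * 1024, EC2_EGRESS_TIERS):,.2f}")  # $4,403.20
```

Note that the discounted rate applies only to the GB inside the second tier; the first 10 TB is always billed at the higher rate.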
Volume discounts vary by managed service. CloudFront, for example, charges lower rates for Internet egress than EC2, which lets us use CloudFront as a reverse proxy in front of our applications to reduce data transfer costs. Since data transfer from EC2 to CloudFront is free of charge, we are billed only for traffic from CloudFront to the Internet.
In addition to CloudFront's lower base prices, Sysbee can offer additional CloudFront savings based on traffic commitment, with an extra discount of up to 60% on egress data traffic.
For low-volume traffic, free-tier allowances are also worth noting. EC2 includes 100 GB of free egress per month, while CloudFront provides 1 TB of free egress data transfer. Some other managed services offer their own free-tier benefits, so before committing to a service, please consult its pricing page.
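Here is a rough comparison of serving 50 TB per month directly from EC2 versus through CloudFront. The EC2 tiers come from the example above; the CloudFront tiers ($0.085 per GB for the first 10 TB, $0.080 per GB for the next 40 TB) are assumed North America list prices, so check the CloudFront pricing page. Free tiers and CloudFront's per-request charges are left out of this sketch.

```python
def tiered_cost(total_gb: float, tiers: list[tuple[float, float]]) -> float:
    """Charge each GB at the price of the tier it falls into."""
    cost, remaining = 0.0, total_gb
    for size_gb, price_per_gb in tiers:
        billed = min(remaining, size_gb)
        cost += billed * price_per_gb
        remaining -= billed
    if remaining > 0:
        raise ValueError("usage exceeds the tiers defined in this sketch")
    return cost


# EC2 Internet egress tiers from the text (1 TB taken as 1,024 GB).
EC2_TIERS = [(10 * 1024, 0.09), (40 * 1024, 0.085)]
# Assumed CloudFront North America tiers; verify before relying on them.
CLOUDFRONT_TIERS = [(10 * 1024, 0.085), (40 * 1024, 0.080)]


def direct_from_ec2(gb: float) -> float:
    return tiered_cost(gb, EC2_TIERS)


def via_cloudfront(gb: float) -> float:
    origin_fetch_cost = 0.0  # EC2 -> CloudFront transfer is free of charge
    return origin_fetch_cost + tiered_cost(gb, CLOUDFRONT_TIERS)


monthly_gb = 50 * 1024
print(f"EC2 direct:     ${direct_from_ec2(monthly_gb):,.2f}")  # $4,403.20
print(f"Via CloudFront: ${via_cloudfront(monthly_gb):,.2f}")   # $4,147.20
```

At list prices the gap is modest at this volume; the larger savings tend to come from higher-volume tiers, commitment discounts, and caching that reduces load on the origin.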
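For small workloads, the free allowance can matter more than the rate itself. The sketch below applies a flat rate to usage above the free allowance, assuming usage stays within the first paid tier; the prices are the EC2 and assumed CloudFront first-tier rates used earlier, and the function name is hypothetical.

```python
def cost_after_free_tier(usage_gb: float, free_gb: float,
                         price_per_gb: float) -> float:
    """Flat-rate estimate: only usage above the free allowance is billed.

    Assumes usage stays within the first paid pricing tier.
    """
    billable_gb = max(0.0, usage_gb - free_gb)
    return billable_gb * price_per_gb


# 500 GB of monthly egress, served directly from EC2 or through CloudFront.
print(round(cost_after_free_tier(500, 100, 0.09), 2))    # 36.0
print(round(cost_after_free_tier(500, 1024, 0.085), 2))  # 0.0
```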
Data transfer charges should be addressed while architecting a solution on AWS, since factoring them into architectural decisions can help save costs. Based on the examples above, consider which type of traffic is predominant in your infrastructure (ingress or egress), which region to choose for your primary infrastructure, and whether to use multiple regions, availability zones, or other AWS-managed services to reduce your data transfer costs.
As a best practice, use compression in data transfer streams where possible, and reduce unnecessary chatter and other anti-patterns when communicating between parts of your infrastructure or microservices.
In the following posts, we will dive deeper into potential savings from using NAT instances instead of NAT gateways, introducing CloudFront for cost- and latency-optimized content delivery to end users, and using endpoint services and VPC peering. Stay tuned, and as always, drop us a line if you need an infrastructure review and assessment.
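As an illustration of why compression pays off, the snippet below gzips a repetitive JSON payload with Python's standard library and projects the size ratio onto a monthly bill. The payload, the 1,000 GB monthly volume, and the flat $0.09 per GB rate are all assumptions chosen for illustration; real ratios depend heavily on the data.

```python
import gzip
import json

# Hypothetical API response: repetitive JSON compresses well.
payload = json.dumps(
    [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(1000)]
).encode("utf-8")
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")

# Assumed monthly volume and flat egress rate, only to show the effect on cost.
MONTHLY_GB = 1000
PRICE_PER_GB = 0.09
print(f"${MONTHLY_GB * PRICE_PER_GB:,.2f} -> ${MONTHLY_GB * ratio * PRICE_PER_GB:,.2f}")
```

Text formats such as JSON, HTML, and logs usually shrink considerably, while already-compressed content such as images or video gains little, and compression costs CPU time on both ends.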