Heavily regulated enterprises involved with sensitive commercial and customer data often treat public cloud services as low-trust and enforce inspection, filtering and data loss prevention on data transfer to services such as S3. This introduces complexity and performance overheads for development teams looking to build applications in AWS that rely on S3 for storage.
AWS has released a number of features to support private access to S3, including VPC Endpoints and a variety of directives and conditions for bucket policies. There is also a well-defined pattern for private connectivity to S3 from corporate data centres and a blog post on how to add domain whitelisting for Internet-bound traffic from inside a VPC using Squid proxies. In this article, I will explore how these solutions can work together to enable private and secure data transfers from on-premises data centres to S3 over a DirectConnect link or a trusted VPN connection.
The solution enables access from corporate data centres to a set of approved S3 buckets, allowing teams to privately and securely upload data to S3.
The diagram below describes a network zone design for a VPC used for transferring data communicating with S3 VPC Endpoints. Working from the Internet Gateway inwards:
- The Public Subnet layer contains NAT gateways for routing traffic to an Internet Gateway
- The Proxy Subnet contains outbound proxies for filtering outbound traffic from private instances in AWS
- The Private Subnet contains private proxies that only allow access to the regional S3 endpoint and uses upstream outbound proxies for S3 data transfers outside of the current region.
The Private S3 Proxy Layer
The underlying architecture for the private S3 proxies farm consists of:
- An auto-scaling group spanning across multiple Availability Zones able to scale to meet utilisation requirements with scaling metrics
- A hardened AMI with all the binaries and configuration files embedded to ensure failed instances are replaced in minimal time
- A load-balancer with health checks for the auto-scaling group instances to ensure maximum uptime for served traffic
- A set of Route 53 records are used for DNS resolution and to enable blue-green re-deployment of the stateless proxies
S3 Bucket Policies
S3 supports the creation of bucket policies with object-level controls that restrict access exclusively from designated VPC Endpoints.
- Take care when restricting bucket actions to the VPC Endpoints. If locked out, use the account’s root user as a break-glass procedure to reset the bucket policy.
- Additional condition keys should be used for securing data at rest and in transit.
- Block Public Access settings limit public access to S3 resources.
Figure 3: Example bucket policy for restricted object access
S3 VPC Endpoints and Endpoint Policies
Some AWS services can be accessed privately through VPC Endpoints. In the case of S3, it can be accessed through gateway VPC Endpoints that are exposed through VPC route tables. Unlike interface VPC Endpoints that are available for some services, the gateway endpoints used for accessing S3 and DynamoDB services cannot be used from outside of their immediate VPC.
Below is an example of how the S3 VPC Endpoints are exposed via a VPC route table.
|pl-1a2b3c4d (com.amazonaws.ap-southeast-2.s3, 18.104.22.168/22, 22.214.171.124/24, 126.96.36.199/21)||vpce-01abc23456de78f9g||active||No|
Figure 4: Example Route Table entry for S3 Endpoint
Multiple S3 VPC Endpoints can be created, however these will need to be attached to different route tables and subnets. This is important as it allows granular control of S3 VPC Endpoint access from different network zones within your environment. In this topology, it is implemented with the following outcomes;
- The VPC Endpoint attached to the Proxy Subnet zone allows retrieval (read-only) of objects from private or public buckets within AWS
- The VPC Endpoint attached to Private Subnet zone allows read and write access only to specific private whitelisted buckets
S3 VPC Endpoint Policy for Proxy Subnet Zone
The sample endpoint policy below is used to restrict S3 read-only access of the outbound proxies originating from instances deployed in the VPC.
Figure 6: Example endpoint policy for S3 read-only access
S3 VPC Endpoint Policy for Private Subnet Zone
The sample endpoint policy below is used to restrict access to whitelisted S3 buckets by IAM users and roles within the same AWS Organization, using aws:PrincipalOrgId global IAM condition key.
Figure 7: Example endpoint policy for S3 read-write access within the same AWS Organization
Network Access Control Lists (NACLs)
VPC network security configuration also needs to be considered if an organisation uses NACLs as an additional network control. As the S3 VPC Endpoint functions by creating routes to public IP address ranges of the regional S3 service, it is important to ensure that the NACL rules allow the appropriate traffic.
AWS public IP ranges for the S3 service can be obtained here and you can also use the AWS Tools for Windows PowerShell cmdlet “Get-AWSPublicIpAddressRange”.
Note: These IP ranges are susceptible to change and notifications are sent through SNS.
A sample NACL configurations to allow use of S3 VPC Endpoint would look like:
|Rule #||Type||Protocol||Port Range||Source||Allow / Deny|
|10000||Custom TCP Rule||TCP (6)||1024-65535||188.8.131.52/22||ALLOW|
|10001||Custom TCP Rule||TCP (6)||1024-65535||184.108.40.206/24||ALLOW|
|10002||Custom TCP Rule||TCP (6)||1024-65535||220.127.116.11/22||ALLOW|
|10003||Custom TCP Rule||TCP (6)||1024-65535||18.104.22.168/21||ALLOW|
Figure 8. Sample Ingress NACL rules for S3 VPC Endpoint access
|Rule #||Type||Protocol||Port Range||Source||Allow / Deny|
|10000||HTTPS (443)||TCP (6)||443||22.214.171.124/22||ALLOW|
|10001||HTTPS (443)||TCP (6)||443||126.96.36.199/24||ALLOW|
|10002||HTTPS (443)||TCP (6)||443||188.8.131.52/22||ALLOW|
|10003||HTTPS (443)||TCP (6)||443||184.108.40.206/21||ALLOW|
Figure 9. Sample Egress NACL rules for S3 VPC Endpoint access
Squid Proxy Domain Whitelist
In order to restrict proxy access exclusively to the S3 VPC Endpoints, the regional S3 domains have to be added to the Squid whitelist configuration. It is important to note that locking down access in this way enforces explicit assignment of the region parameter for AWS CLI or other AWS SDK based tools.
Sample Squid whitelist file
Figure 10. Sample Squid whitelist file
In the provided example, the STS service has also been whitelisted for tools that require generation of tokens against Simple Token Service (STS). For example, the AWS Tools for Windows PowerShell may have profiles mapped to use SAML federated roles.
Testing the Solution
The following test cases demonstrate the functionality of the solution. In the first example, we attempt to write to a bucket whitelisted in the S3 VPC Endpoint policy:
In this example, we attempt to write to a bucket that is not whitelisted in the S3 VPC Endpoint policy:
In this example, we attempt to write to a bucket in another region:
And finally, we attempt to write to a private whitelisted bucket from a public IP address:
When building a solution using this approach, it is important to keep in mind the following:
If you are building with CloudFormation, consider using the AWS::Include Transform to maintain static mappings. For example, the public IP ranges allocated for the S3 service.
The mappings file can be stored in S3 in the following format:
These mappings can be referenced in a CloudFormation template using the following syntax:
This approach is particularly useful if deploying different network stacks across multiple accounts that need to maintain the same set of IP addresses. When using this approach, ensure the S3 bucket used for storing these boiler-plate files have cross-account access enabled.
Note: Additional automation can be added to respond to changes in these public ranges.
Extending the Solution to Other AWS Services
At the time of writing, the only other gateway-type endpoint is for the DynamoDB service, which would look almost identical in terms of implementation.
The interface endpoints can be attached to multiple subnets, are exposed as ENIs deployed directly in the subnet with private IP addresses and use security groups to whitelist allowed IP ranges. If private DNS is enabled for the interface-type endpoints, the default endpoint is used for requests. If private DNS is not used, the VPC Endpoint identifier has to be whitelisted in Squid and specified via CLI or AWS SDK using service endpoint-url parameter.
Note: Multiple interface endpoints can be attached to the same subnet, but only one can have private DNS enabled.
Integrating the Solution with Data Monitoring
Data Loss Prevention (DLP) is a common consideration for enterprises who want to protect their (or their customer’s) sensitive data. The outlined solution can be integrated with active data monitoring tools over ICAP (supported in Squid 3.0 and above) to inspect and filter HTTP/HTTPS traffic based on policies defined by the enterprise.
As demonstrated, VPC Gateway Endpoints in combination with a well-defined proxy layer can be used to privately access AWS services such as S3. With a strong understanding of AWS networking and supporting services, it is possible to architect a highly-governed, private access path to AWS services from on-premise data centres and secure data transfers to meet compliance demands of the most regulated enterprises.