Many organizations begin their cloud journey to AWS by moving a few applications to demonstrate the power and flexibility of AWS. This initial application architecture includes building security groups that control the network ports, protocols, and IP addresses that govern access and traffic to their AWS Virtual Private Cloud (VPC). When the architecture process is complete and an application is fully functional, some organizations forget to revisit their security groups to optimize rules and help ensure the appropriate level of governance and compliance. Security groups that are never optimized can weaken security by leaving ports open that are not needed or by allowing source IP ranges that are broader than required.
Last year, I published an AWS Security Blog post that showed how to optimize and visualize your security groups. Today’s post continues in the vein of that post by using Amazon Kinesis Firehose and AWS Lambda to enrich the VPC Flow Logs dataset and enhance your ability to optimize security groups. The capabilities in this post’s solution are based on the Lambda functions available in this VPC Flow Log Appender GitHub repository.
Removing unused rules or limiting source IP addresses requires either an in-depth knowledge of an application’s active ports on Amazon EC2 instances or analysis of active network traffic. In this blog post, I discuss a method to:
- Use VPC Flow Logs to capture information about the IP traffic in an Amazon VPC.
- Enrich the VPC Flow Logs dataset with security group IDs by using Firehose and Lambda.
- Demonstrate how to visualize and analyze network traffic from VPC Flow Logs by using Amazon Elasticsearch Service (Amazon ES).
Using this approach can help you narrow security group rules to only the necessary source IPs, ports, and nested security groups, helping to improve the security of your AWS resources while minimizing the potential risk to production environments.
As illustrated in the preceding diagram, this is how the data flows in this model:
- The VPC posts its flow log data to Amazon CloudWatch Logs.
- CloudWatch Logs streams the data to the Lambda ingestor function, which passes the data to Firehose.
- Firehose then passes the data to the Lambda decorator function.
- The Lambda decorator function performs a number of lookups for each record and returns the data to Firehose with additional fields.
- Firehose then posts the enhanced dataset to the Amazon ES endpoint and any errors to Amazon S3.
Step 1: Set up your Amazon ES cluster and VPC Flow Logs
Create an Amazon ES cluster
The first step in this solution is to create an Amazon ES cluster. Do this first because it takes some time for the cluster to become available. If you are new to Amazon ES, you can learn more about it in the Amazon ES documentation.
To create an Amazon ES cluster:
- In the AWS Management Console, choose Elasticsearch Service under Analytics.
- Choose Create a new domain or Get started.
- Type es-flowlogs for the Elasticsearch domain name.
- Set Version to 1 in the drop-down list. Choose Next.
- Set Instance count to 2 and select the Enable zone awareness check box. (This ensures cluster stability in the event of an Availability Zone outage.) Accept the defaults for the rest of the page.
- [Optional] If you use this domain for production purposes, I recommend using dedicated master nodes. Select the Enable dedicated master check box and select medium.elasticsearch from the Instance type drop-down list. Leave the Instance count at 3, which is the default.
- Choose Next.
- From the Set the domain access policy to drop-down list on the next page, select Allow access to the domain from specific IP(s). In the dialog box, type or paste the comma-separated list of valid IPv4 addresses or Classless Inter-Domain Routing (CIDR) blocks you would like to be able to access the Amazon ES domain.
- For more information about enabling access for specific AWS Identity and Access Management (IAM) users or roles, see Configuring Access Policies. Also, see How to Control Access to Your Amazon Elasticsearch Service Domain for an in-depth treatment of security with Amazon ES. See Set Access Control for Amazon Elasticsearch Service and Secure Your Elasticsearch Development Domain Using Amazon WorkSpaces for lighter treatments focused on getting started quickly.
- Choose Next.
- On the next page, choose Confirm and create.
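The IP-based option in the console generates a resource policy for the domain. As a rough sketch of what such a policy looks like, the following Python builds an equivalent document and validates each address with the standard library. The account ID, Region, and addresses are hypothetical placeholders, and the helper name `build_ip_access_policy` is my own; compare the result with the policy the console actually produces for your domain.

```python
import ipaddress
import json


def build_ip_access_policy(domain_arn, allowed_sources):
    """Return an IP-based access policy document for an Amazon ES domain."""
    # ip_network() raises ValueError for malformed input, so typos fail early.
    # strict=False lets a host address such as 203.0.113.10 become a /32.
    cidrs = [
        str(ipaddress.ip_network(source.strip(), strict=False))
        for source in allowed_sources
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


# Hypothetical account ID and documentation-range addresses.
policy = build_ip_access_policy(
    "arn:aws:es:us-east-1:123456789012:domain/es-flowlogs",
    ["203.0.113.10", "198.51.100.0/24"],
)
print(json.dumps(policy, indent=2))
```

Keeping the list of source addresses as short as possible matters here for the same reason it matters in a security group: anyone who can reach the endpoint from an allowed address can read the flow log data.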
It will take a few minutes for the cluster to be available. In the meantime, you can begin enabling VPC Flow Logs.
Enable VPC Flow Logs
VPC Flow Logs is a feature that lets you capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data is stored using Amazon CloudWatch Logs. For more information about VPC Flow Logs, see VPC Flow Logs and CloudWatch Logs.
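To make the dataset concrete, here is a minimal sketch that parses one flow log record in the default space-separated format (version, account ID, interface ID, source and destination address, source and destination port, protocol, packets, bytes, start, end, action, and log status). The sample line uses made-up values, and `parse_flow_log` is an illustrative helper rather than code from the solution.

```python
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
NUMERIC = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")


def parse_flow_log(line):
    """Split a default-format flow log line into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for key in NUMERIC:
        # NODATA and SKIPDATA records carry "-" in place of traffic fields.
        if record[key] != "-":
            record[key] = int(record[key])
    return record


sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_log(sample)
print(record["interface_id"], record["dstport"], record["action"])
```

Note that the record identifies the network interface but not the security groups attached to it, which is the gap the rest of this post fills.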
To enable VPC Flow Logs:
- In the AWS Management Console, choose CloudWatch under Management Tools.
- Click Logs in the navigation pane.
- From the Actions drop-down list, choose Create log group.
- Type Flowlogs as the Log Group Name.
- In the AWS Management Console, choose VPC under Networking & Content Delivery.
- Choose Your VPCs in the navigation pane, and select the VPC you would like to analyze. (You can also enable VPC Flow Logs on only a subnet if you do not want to enable it on the entire VPC.)
- Choose the Flow Logs tab in the bottom pane, and then choose Create Flow Log.
- In the text beneath the Role box, choose Set Up Permissions (this will open an IAM management page).
- Choose Allow on the IAM management page. Return to the VPC Flow Logs setup page.
- Choose All from the Filter drop-down list.
- Choose flowlogsRole from the Role drop-down list (you created this role in the previous two steps of this procedure, when you chose Set Up Permissions and then Allow).
- Choose Flowlogs from the Destination Log Group drop-down list.
- Choose Create Flow Log.
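If you prefer to script the console steps above, the following sketch shows the request I would expect to send with the AWS SDK for Python. It assumes boto3 is installed and credentials are configured; the VPC ID and role ARN are hypothetical placeholders. The SDK call is wrapped in a function so that nothing is created until you invoke it yourself.

```python
def flow_log_request(vpc_id, log_group, role_arn):
    """Build the parameters for the EC2 CreateFlowLogs call."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",       # "Subnet" limits capture to one subnet
        "TrafficType": "ALL",        # matches the All filter chosen in the console
        "LogGroupName": log_group,
        "DeliverLogsPermissionArn": role_arn,
    }


def create_flow_log(params):
    """Create the log group and the flow log. Requires boto3 and credentials."""
    import boto3  # imported lazily so the sketch loads without the SDK

    boto3.client("logs").create_log_group(logGroupName=params["LogGroupName"])
    return boto3.client("ec2").create_flow_logs(**params)


# Hypothetical identifiers; replace them with your own before calling create_flow_log.
params = flow_log_request(
    "vpc-0123456789abcdef0",
    "Flowlogs",
    "arn:aws:iam::123456789012:role/flowlogsRole",
)
print(params)
```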
Step 2: Set up AWS Lambda to enrich the VPC Flow Logs dataset with security group IDs
If you completed Step 1, VPC Flow Logs data is now streaming to CloudWatch Logs. Next, you will deploy two Lambda functions. The first, the ingestor function, moves the data into Firehose, and the second, the decorator function, adds three new fields to the VPC Flow Logs dataset and returns records to Firehose for delivery to Amazon ES.
The new fields added by the decorator function are:
- Direction – By comparing the primary IP address of the elastic network interface (ENI) with the destination IP address, you can set the direction for the IP connection.
- Security group IDs – Each ENI can be associated with as many as five security groups. The security group IDs are added as an array in the record.
- Source – This includes a number of fields that result from looking up srcaddr from a free service for geographical lookups.
- The Source includes:
- source-location, latitude, and longitude.
Follow the instructions in this GitHub repository to deploy the two Lambda functions and the associated permissions that are required.
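The enrichment logic described above can be sketched in a few lines. This is not the code from the repository: the field names `direction` and `security-group-ids`, the shape of the `eni` lookup result, and the `decorate` helper are assumptions for illustration, and the geographical lookup is omitted because it depends on an external service.

```python
def decorate(record, eni):
    """Return a copy of a flow log record with direction and security groups added.

    `eni` is a hypothetical lookup result holding the interface's primary
    private IP address and the IDs of its attached security groups.
    """
    enriched = dict(record)
    # Traffic addressed to the ENI's own primary IP is inbound; anything else
    # seen on the interface is treated as outbound.
    if record["dstaddr"] == eni["primary_ip"]:
        enriched["direction"] = "inbound"
    else:
        enriched["direction"] = "outbound"
    # Every security group on the ENI is attached to the record as an array.
    enriched["security-group-ids"] = list(eni["security_group_ids"])
    return enriched


eni = {"primary_ip": "172.31.16.21", "security_group_ids": ["sg-11111111", "sg-22222222"]}
record = {"interface_id": "eni-1235b8ca123456789", "srcaddr": "172.31.16.139",
          "dstaddr": "172.31.16.21", "dstport": 22, "action": "ACCEPT"}
enriched = decorate(record, eni)
print(enriched["direction"], enriched["security-group-ids"])
```

Because the security group IDs come from the ENI and not from the rule that matched, every record for an interface carries all of that interface's groups. This detail becomes important when you filter the dashboard later.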
Step 3: Set up Firehose
Firehose is a fully managed service that allows you to transform flow log data and stream it into Amazon ES. The service scales automatically with load, and you only pay for the data transmitted through the service.
To create a Firehose delivery stream:
- In the AWS Management Console, choose Kinesis under Analytics.
- Choose Go to Firehose and then choose Create Delivery Stream.
Step 3.1: Define the destination
- Choose Amazon Elasticsearch Service from the Destination drop-down list.
- For Delivery stream name, type VPCFlowLogsToElasticSearch (the name must match the default environment variable in the ingestion Lambda function).
- Choose es-flowlogs from the Elasticsearch domain drop-down list. (The Amazon ES cluster configuration state needs to be Active for es-flowlogs to be available in the drop-down list.)
- For Index, type cwl.
- Choose OneDay from the Index rotation drop-down list.
- For Type, type log.
- For Backup mode, select Failed Documents Only.
- For S3 bucket, select New S3 bucket in the drop-down list and type a bucket name of your choice. Choose Create bucket.
- Choose Next.
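With OneDay rotation, Firehose appends a date suffix to the index name, so the cwl index becomes a series of daily indexes. The sketch below shows the naming I would expect (index name, a hyphen, and the UTC date) and checks it against the cwl-* pattern used later in Kibana. The `daily_index_name` helper is illustrative only.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def daily_index_name(prefix, when):
    """Return the index name expected for OneDay rotation at time `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC
    before the date portion is taken.
    """
    return f"{prefix}-{when.astimezone(timezone.utc):%Y-%m-%d}"


name = daily_index_name("cwl", datetime(2017, 3, 9, 12, 0, tzinfo=timezone.utc))
print(name, fnmatch(name, "cwl-*"))
```

Daily indexes also make cleanup simpler, because you can delete old data one index at a time.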
Step 3.2: Configure Lambda
- Choose Enable for Data transformation.
- Choose vpc-flow-log-appender-dev-FlowLogDecoratorFunction-xxxxx from the Lambda function drop-down list (make sure you select the Decorator function).
- Choose Create/Update existing IAM role, Firehose delivery IAM role from the IAM role drop-down list.
- Choose Allow. This takes you back to the Firehose Configuration.
- Choose Next and then choose Create Delivery Stream.
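When data transformation is enabled, Firehose invokes the Lambda function with a batch of base64-encoded records and expects each one back with the same recordId, a result of Ok, Dropped, or ProcessingFailed, and the transformed data. The following is a minimal sketch of that contract, not the decorator from the repository; the `enrich` step is a hypothetical stand-in for the real lookups.

```python
import base64
import json


def enrich(payload):
    """Hypothetical placeholder for the decorator's lookups."""
    payload["direction"] = "unknown"
    return payload


def handler(event, context=None):
    """Transform each Firehose record and report a per-record result."""
    output = []
    for rec in event["records"]:
        try:
            payload = enrich(json.loads(base64.b64decode(rec["data"])))
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
            output.append({"recordId": rec["recordId"], "result": "Ok", "data": data})
        except ValueError:
            # Malformed base64 or JSON: hand the record back unchanged and
            # marked as failed so Firehose can route it to the S3 backup.
            output.append({"recordId": rec["recordId"],
                           "result": "ProcessingFailed", "data": rec["data"]})
    return {"records": output}


good = base64.b64encode(json.dumps({"dstport": 22}).encode("utf-8")).decode("ascii")
bad = base64.b64encode(b"not json").decode("ascii")
result = handler({"records": [{"recordId": "1", "data": good},
                              {"recordId": "2", "data": bad}]})
print([r["result"] for r in result["records"]])
```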
Step 4: Stream data to Firehose
The next step is to enable the data to stream from CloudWatch Logs to Firehose. You will use the Lambda ingestion function you deployed earlier: vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx.
- In the AWS Management Console, choose CloudWatch under Management Tools.
- Choose Logs in the navigation pane, and select the check box next to Flowlogs under Log Groups.
- From the Actions menu, choose Stream to AWS Lambda. Choose vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx (select the Ingestion function). Choose Next.
- Choose Amazon VPC Flow Logs from the Log Format drop-down list. Choose Next.
- Choose Start Streaming.
VPC Flow Logs will now be forwarded to Firehose, capturing information about the IP traffic going to and from network interfaces in your VPC. Firehose appends additional data fields and forwards the enriched data to your Amazon ES cluster.
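It can help to know what the ingestion function receives. CloudWatch Logs delivers each batch to Lambda as a base64-encoded, gzip-compressed JSON document under awslogs.data, and Firehose accepts at most 500 records per PutRecordBatch call. The sketch below decodes such an event and splits the messages into batches. It is an illustration of the mechanics rather than the repository's code, and the sample event is constructed locally.

```python
import base64
import gzip
import json


def extract_messages(event):
    """Decode a CloudWatch Logs subscription event into raw log lines."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # Control messages are health checks from CloudWatch Logs and carry no data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [log_event["message"] for log_event in payload["logEvents"]]


def batches(items, size=500):
    """Split items into lists no longer than the PutRecordBatch limit."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Build a sample event the same way CloudWatch Logs would encode it.
sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "Flowlogs",
    "logEvents": [{"id": "1", "timestamp": 1418530010000,
                   "message": "2 123456789010 eni-1235b8ca123456789 172.31.16.139 "
                              "172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"}],
}
sample_event = {"awslogs": {"data": base64.b64encode(
    gzip.compress(json.dumps(sample_payload).encode("utf-8"))).decode("ascii")}}

messages = extract_messages(sample_event)
print(len(messages), len(batches(messages)))
```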
Data is now flowing to your Amazon ES cluster, but be patient because it can take up to 30 minutes for the data to begin appearing in your Amazon ES cluster.
Step 5: Verify that the flow log data is streaming through Firehose to the Amazon ES cluster
You should see VPC Flow Logs with ENI IDs under Log Streams (see the following screenshot) and Stored Bytes greater than zero in the CloudWatch log group.
Do you have logs from the Lambda ingestion function in the CloudWatch log group? As shown in the following screenshot, you should see START, END and REPORT records. These show that the ingestion function is running and streaming data to Firehose.
Do you have logs from the Lambda decorator function in the CloudWatch log group? You should see START, END, and REPORT records as well as entries similar to: “Processing completed. Successful records XXX, Failed records 0.”
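If you export the decorator's log lines, a small script can total the counts for you. This sketch assumes the message format quoted above; the sample lines and the `summarize` helper are made up for illustration.

```python
import re

PATTERN = re.compile(
    r"Processing completed\.\s+Successful records (\d+), Failed records (\d+)"
)


def summarize(lines):
    """Total the successful and failed record counts found in log lines."""
    ok = failed = 0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            ok += int(match.group(1))
            failed += int(match.group(2))
    return ok, failed


# Made-up log lines in the format described in the text.
log_lines = [
    "START RequestId: example-1",
    "Processing completed. Successful records 250, Failed records 0.",
    "Processing completed. Successful records 120, Failed records 3.",
    "END RequestId: example-1",
]
print(summarize(log_lines))
```

A failed count that stays above zero is a signal to check the S3 bucket you configured for failed documents.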
Do you have cwl-* indexes in the Amazon ES dashboard, as shown in the following screenshot? If you do, you are successfully streaming through Firehose and populating the Amazon ES cluster, and you are ready to proceed to Step 6. Remember, it can take up to 30 minutes for the flow logs from your workloads to begin flowing to the Amazon ES cluster.
Step 6: Using the SGDashboard to analyze VPC network traffic
You now need to set up a Kibana dashboard to monitor the traffic in your VPC.
To find the Kibana URL:
- In the AWS Management Console, click Elasticsearch Service under Analytics.
- Choose es-flowlogs under Elasticsearch domain name.
- Click the link next to Kibana, as shown in the following screenshot.
The first time you access Kibana, you will be asked to set the default index. To set the default index in the Amazon ES cluster:
- Set the Index name or pattern to cwl-*.
- For Time-field name, type @timestamp.
- Choose Create.
Load the SGDashboard:
- Download this JSON file and save it to your computer. The file includes a dashboard and visualizations I created for this blog post’s purposes.
- In Kibana, choose Management in the navigation pane, choose Saved Objects, and then import the file you just downloaded.
- Choose Dashboard and Open to load the SGDashboard you just imported. (You might have to press Enter in the top search box to have the dashboard load the first time.)
The following screenshot shows the SGDashboard after it has loaded.
The SGDashboard is composed of a set of visualizations. Each visualization contains a view or summary of the underlying data contained in the Amazon ES cluster, as shown in the preceding screenshot. You can control the timeframe for the dashboard in the upper right corner. When you click the timeframe, the dashboard exposes alternative timeframes that you can select.
The SGDashboard includes a list of security groups, destination ports, source IP addresses, actions, protocols, and connection directions as well as raw VPC Flow Log records. This information is useful because you can compare this to your security group configurations. Ports might be open in the security group but have no network traffic flowing to the instances on those ports, which means the corresponding rules can probably be removed. Also, by evaluating IP ranges in use, you can narrow the ranges to only those IP addresses required for the application. The following screenshot on the left shows a view of the SGDashboard for a specific security group. By comparing its accepted inbound IP addresses with the security group rules in the following screenshot on the right, you can ensure the source IP ranges are sufficiently restrictive.
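The same comparison can be done offline once you have a list of accepted source addresses for a security group. The sketch below checks which observed addresses fall inside a rule's CIDR range and then collapses them into the smallest set of networks that covers exactly those addresses. The addresses come from documentation ranges and the helper names are mine.

```python
import ipaddress


def observed_within(rule_cidr, observed_ips):
    """Return the observed addresses that a rule's source range actually admitted."""
    network = ipaddress.ip_network(rule_cidr)
    return sorted({ip for ip in observed_ips if ipaddress.ip_address(ip) in network},
                  key=ipaddress.ip_address)


def tightest_cidrs(addresses):
    """Collapse individual addresses into the fewest networks covering only them."""
    networks = (ipaddress.ip_network(address) for address in addresses)
    return [str(network) for network in ipaddress.collapse_addresses(networks)]


# Example data: a wide-open rule and the sources that really used it.
observed = ["203.0.113.4", "203.0.113.5", "198.51.100.7", "203.0.113.4"]
admitted = observed_within("0.0.0.0/0", observed)
proposal = tightest_cidrs(admitted)
print(admitted)
print(proposal)
```

Treat the output as a starting point for review. A week of traffic may not include every legitimate client, so confirm the proposed ranges with the application owner before tightening a production rule.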
Analyze VPC Flow Logs data
Amazon ES allows you to quickly view and filter VPC Flow Logs data to determine what network traffic is flowing in your VPC. This analysis requires an understanding of security groups and elastic network interfaces (ENIs). Let’s say you have two security groups associated with the same ENI, and only the first security group’s rules allow a given flow. Because flow logs are recorded per ENI and each enriched record lists every security group on that ENI, the traffic registers for both groups, so you will still see it listed under the second security group. Therefore, when you click a security group that you want to filter, additional groups might still be on the list because they are included in the VPC Flow Logs records.
The following screenshot on the left is a view of the SGDashboard with a security group selected (sg-978414e8). Even though that security group has a filter, two additional security groups remain in the dashboard. The following screenshot on the right shows the raw log data where each record contains all three security groups and demonstrates that all three security groups share a common set of flow log records.
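A small example shows why the extra groups stay on the list. Each enriched record carries an array of security group IDs, so filtering on one ID returns whole records, including the other IDs in the same array. In this sketch, sg-978414e8 is the group from the screenshot and the other two IDs and the `security-group-ids` field name are hypothetical.

```python
from collections import Counter


def groups_seen_with(records, selected_sg):
    """Count every security group appearing in records that match the filter."""
    counts = Counter()
    for record in records:
        if selected_sg in record["security-group-ids"]:
            counts.update(record["security-group-ids"])
    return counts


records = [
    {"dstport": 443, "security-group-ids": ["sg-978414e8", "sg-aaaa1111", "sg-bbbb2222"]},
    {"dstport": 22, "security-group-ids": ["sg-978414e8", "sg-aaaa1111", "sg-bbbb2222"]},
    {"dstport": 80, "security-group-ids": ["sg-cccc3333"]},
]
counts = groups_seen_with(records, "sg-978414e8")
print(dict(counts))
```

The filter removes the unrelated record, but the two groups that share an ENI with the selected group appear with the same counts as the selected group itself.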
Also, note that security groups are stateful, so if the instance itself is initiating traffic to a different location, the return traffic will be displayed in the Kibana dashboard. The best example of this is Network Time Protocol (NTP) traffic on port 123. This type of traffic can be easily removed from the display by choosing the port on the right side of the dashboard, and then reversing the filter, as shown in the following screenshot. By reversing the filter, you can exclude data from the view.
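Reversing a filter corresponds to a must_not clause in an Elasticsearch bool query. The sketch below builds such a query body and applies the same exclusion to a local list of records. The field name dstport is an assumption about how the records are indexed, so check the field list in Kibana before reusing it.

```python
def exclude_port_query(port, field="dstport"):
    """Build an Elasticsearch query body that excludes one port."""
    return {"query": {"bool": {"must_not": [{"term": {field: port}}]}}}


def exclude_port(records, port, field="dstport"):
    """Apply the same exclusion to records already held in memory."""
    return [record for record in records if record.get(field) != port]


records = [{"dstport": 123, "protocol": 17}, {"dstport": 443, "protocol": 6}]
query = exclude_port_query(123)
remaining = exclude_port(records, 123)
print(query)
print(remaining)
```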
Example: Unused security groups
Let’s say that some security groups are no longer in use. First, I change the time range by clicking the current time range in the top right corner of the dashboard, as shown in the following screenshot. I select Week to date.
As the following screenshot shows, the dashboard has identified five security groups that have had traffic during the week to date.
As you can see in the following screenshot, I have many security groups in my test account that are not in use. Any security groups not in the SGDashboard are candidates for removal.
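Finding removal candidates is a set difference: every security group in the account minus the groups that appear in any flow log record for the period. Here is a minimal sketch with hypothetical group IDs and the assumed `security-group-ids` field.

```python
def unused_security_groups(all_groups, records):
    """Return groups that never appear in the enriched flow log records."""
    seen = {sg for record in records for sg in record["security-group-ids"]}
    return sorted(set(all_groups) - seen)


# Hypothetical inventory and a week of enriched records.
all_groups = ["sg-63ed8c1c", "sg-aaaa1111", "sg-bbbb2222", "sg-cccc3333"]
records = [
    {"security-group-ids": ["sg-63ed8c1c"]},
    {"security-group-ids": ["sg-aaaa1111", "sg-63ed8c1c"]},
]
candidates = unused_security_groups(all_groups, records)
print(candidates)
```

A group with no traffic in the selected window is only a candidate. Groups attached to instances that run monthly jobs, or groups referenced by other groups' rules, need a longer window or a manual check before you delete them.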
Example: Unused inbound rules
Let’s take a look at security group sg-63ed8c1c from the preceding screenshot. When I click sg-63ed8c1c (the security group ID) in the dashboard, a filter is applied that reduces the security groups displayed to only the records with that security group included. We can compare the traffic associated with this security group in the SGDashboard (shown in the following screenshot) to the security group rules in the EC2 console.
As the following screenshot of the EC2 console shows, this security group has only two inbound rules: one for HTTP on port 80 and one for RDP. The SGDashboard shows that traffic is not flowing on port 80, so I can safely remove that rule from the security group.
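The comparison I just made by eye can be sketched as a function that checks each inbound rule against the accepted inbound traffic. The rule and record shapes here are simplified assumptions (protocol as an IANA number, a port range per rule, and `direction` and `action` fields on each record), and the data mirrors the HTTP and RDP example above.

```python
def unused_inbound_rules(rules, records):
    """Return inbound rules whose protocol and port range saw no accepted traffic."""
    used = {
        (record["protocol"], record["dstport"])
        for record in records
        if record["direction"] == "inbound" and record["action"] == "ACCEPT"
    }
    return [
        rule for rule in rules
        if not any(
            protocol == rule["protocol"]
            and rule["from_port"] <= port <= rule["to_port"]
            for protocol, port in used
        )
    ]


rules = [
    {"name": "HTTP", "protocol": 6, "from_port": 80, "to_port": 80},
    {"name": "RDP", "protocol": 6, "from_port": 3389, "to_port": 3389},
]
records = [
    {"direction": "inbound", "action": "ACCEPT", "protocol": 6, "dstport": 3389},
    {"direction": "inbound", "action": "REJECT", "protocol": 6, "dstport": 80},
    {"direction": "outbound", "action": "ACCEPT", "protocol": 17, "dstport": 123},
]
unused = unused_inbound_rules(rules, records)
print([rule["name"] for rule in unused])
```

Rejected traffic on port 80 does not count as use of the rule, because a record with a REJECT action was not admitted by any security group rule.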
It can be challenging to help ensure that your AWS Cloud environment allows only intended traffic and is as secure and manageable as possible. In this post, I have shown how to enable VPC Flow Logs. I then showed how to use Firehose and Lambda to add security group IDs, directions, and locations to the VPC Flow Logs dataset. The SGDashboard then enables you to analyze the flow log data and compare it with your security group configurations to improve your cloud security.
If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation or troubleshooting questions about the solution in this post, please start a new thread on the AWS WAF forum.