My first Terraform AWS open-source contribution

Hello! It's been some time since I last posted on my old blog, so I decided to write again to improve my writing and communication skills.
I want to show you a problem that I faced at my current job.
The problem
I'm working on a data engineering project whose infrastructure, including the Lake Formation permissions, is managed with Terraform. We were facing a really strange problem: every time terraform plan ran, it forced the replacement of the aws_lakeformation_permissions resource, even though the configuration had not changed.

A few other people had reported the same problem:
https://github.com/hashicorp/terraform-provider-aws/issues/22570
https://github.com/hashicorp/terraform-provider-aws/issues/31096
The Lake Formation permission is the guardrail that controls who can access the Glue tables and which permissions they have on them. When the pipeline replaces it on every run, any job running at that moment can lose access until the permission is recreated. Without a proper retry mechanism, that job fails to complete and you end up with missing data.
Workaround on the Terraform configuration side
Working around the problem in your Terraform code is pretty simple: list the permissions in alphabetical order, like this:
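# Triggers the bug: permissions not in alphabetical order
resource "aws_lakeformation_permissions" "permission" {
  permissions = ["SELECT", "DESCRIBE"]
  # principal, table, etc. omitted for brevity
}

# Avoids the bug: permissions in alphabetical order
resource "aws_lakeformation_permissions" "permission" {
  permissions = ["DESCRIBE", "SELECT"]
  # principal, table, etc. omitted for brevity
}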
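The alphabetical-order workaround above amounts to writing the list in the same canonical order that AWS hands back. The Go sketch below models that normalization step. It assumes the returned order is plain lexicographic order, which is consistent with the ["DESCRIBE", "SELECT"] that AWS returns in this case. canonicalOrder is a hypothetical helper, not part of the provider.

```go
package main

import (
	"fmt"
	"sort"
)

// canonicalOrder returns a lexicographically sorted copy of the permissions.
// Assumption: this matches the order in which AWS returns them.
// The input slice is left untouched.
func canonicalOrder(perms []string) []string {
	out := append([]string(nil), perms...)
	sort.Strings(out)
	return out
}

func main() {
	configured := []string{"SELECT", "DESCRIBE"}
	fmt.Println(canonicalOrder(configured)) // [DESCRIBE SELECT]
	fmt.Println(configured)                 // [SELECT DESCRIBE]
}
```

In HCL, Terraform's built-in sort() function also sorts a list of strings lexicographically, so permissions = sort(["SELECT", "DESCRIBE"]) should presumably have the same effect as writing the list by hand in order.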
Even though this workaround is simple, the underlying problem shouldn't exist: the order of the permissions carries no meaning here. So I decided to read the AWS provider's code and contribute a fix to the project.
Understanding the Terraform AWS Provider
The Terraform AWS Provider's contributor docs are pretty good! Here are a few useful links in case you want to contribute to the project as well:
https://hashicorp.github.io/terraform-provider-aws/running-and-writing-acceptance-tests/
https://hashicorp.github.io/terraform-provider-aws/raising-a-pull-request/
https://hashicorp.github.io/terraform-provider-aws/changelog-process/
https://developer.hashicorp.com/terraform/plugin/sdkv2/testing/acceptance-tests/teststep
When you start working on a new project, it's important to learn the conventions it uses. Look at previous commits and merged pull requests, and see how other modules in the repository implement the same things.
Here is an example of a merged pull request and its files: https://github.com/hashicorp/terraform-provider-aws/pull/18203/files
Steps to Fix the Problem
1. Write a minimal configuration that reproduces the problem;
2. Check the project's issue tracker for similar reports;
3. Read the code for the aws_lakeformation_permissions resource and understand its structure;
4. Write a failing acceptance test (acceptance tests create real resources in an AWS environment);
5. Fix the bug;
6. Make sure the acceptance test passes.
To create the failing acceptance test, I only needed to put the permissions in the test configuration in non-alphabetical order. The test then failed because the plan after apply was not empty:
=== RUN TestAccLakeFormation_diogenes/PermissionsBasic/sequentialPlan
    permissions_test.go:818: Step 1/2 error: After applying this test step, the non-refresh plan was not empty.
        stdout:

        Terraform used the selected providers to generate the following execution
        plan. Resource actions are indicated with the following symbols:
        -/+ destroy and then create replacement

        Terraform will perform the following actions:

          # aws_lakeformation_permissions.test must be replaced
        -/+ resource "aws_lakeformation_permissions" "test" {
              ~ id                            = "403332802" -> (known after apply)
              ~ permissions                   = [ # forces replacement
                  - "DESCRIBE",
                    "SELECT",
                  + "DESCRIBE",
                ]
              ~ permissions_with_grant_option = [] -> (known after apply)
                # (2 unchanged attributes hidden)

              ~ table_with_columns {
                  ~ catalog_id = "705532038218" -> (known after apply)
                    name       = "tf-acc-test-7670134761911835478"
                    # (2 unchanged attributes hidden)
                }
            }

        Plan: 1 to add, 0 to change, 1 to destroy.
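The root cause is that the resource's schema declares permissions as a list, so the order of its elements matters. When AWS returns the permissions and Terraform compares them with your configuration, the two lists contain the same values in a different order. For a list, that counts as a change, and because this attribute forces replacement, Terraform plans to destroy and recreate the resource:
your configuration: ["SELECT", "DESCRIBE"]
returned by AWS: ["DESCRIBE", "SELECT"]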
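To make the list-versus-set distinction above concrete, here is a simplified Go model of the two comparisons. It is an illustration only, not the provider's or Terraform's actual diffing code. listEqual and setEqual are hypothetical helpers.

```go
package main

import "fmt"

// listEqual compares element by element, so order matters (like a list attribute).
func listEqual(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// setEqual ignores order and duplicates (like a set attribute).
func setEqual(a, b []string) bool {
	toSet := func(xs []string) map[string]struct{} {
		m := make(map[string]struct{}, len(xs))
		for _, x := range xs {
			m[x] = struct{}{}
		}
		return m
	}
	sa, sb := toSet(a), toSet(b)
	if len(sa) != len(sb) {
		return false
	}
	for k := range sa {
		if _, ok := sb[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	config := []string{"SELECT", "DESCRIBE"}
	remote := []string{"DESCRIBE", "SELECT"}
	fmt.Println(listEqual(config, remote)) // false: a diff, which forces replacement
	fmt.Println(setEqual(config, remote))  // true: no diff
}
```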
The fix is quite simple: use a set instead of a list in the resource's schema, so the order of the elements is no longer considered. You can read the code here:
https://github.com/hashicorp/terraform-provider-aws/pull/38047/files
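Switching to a set does not hide real changes. It only stops order from counting as one. The Go sketch below is a simplified model of an order-insensitive diff, not the code from the pull request. setDiff is a hypothetical helper, and ALTER is used only as an example of a genuinely different permission.

```go
package main

import (
	"fmt"
	"sort"
)

// setDiff reports which elements would be added to or removed from state
// to match config, ignoring order. Both results are sorted for stable output.
func setDiff(config, state []string) (added, removed []string) {
	inConfig := map[string]bool{}
	for _, p := range config {
		inConfig[p] = true
	}
	inState := map[string]bool{}
	for _, p := range state {
		inState[p] = true
	}
	for p := range inConfig {
		if !inState[p] {
			added = append(added, p)
		}
	}
	for p := range inState {
		if !inConfig[p] {
			removed = append(removed, p)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	state := []string{"DESCRIBE", "SELECT"}

	// Reordered only: nothing to add or remove, so nothing to replace.
	a, r := setDiff([]string{"SELECT", "DESCRIBE"}, state)
	fmt.Println(len(a), len(r)) // 0 0

	// A genuine membership change is still detected.
	a, r = setDiff([]string{"SELECT", "ALTER"}, state)
	fmt.Println(a, r) // [ALTER] [DESCRIBE]
}
```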
Conclusion
Contributing to open-source projects is a great way to improve your skills. You learn new ways to write code, explore new tools (like this one), and expand your knowledge.
It also motivated me to send more PRs to the AWS provider and to other projects. Some of them have already been accepted, but unfortunately this one hasn't been merged yet.