Cleaning up code using ast-grep

There exists a neat tool called ast-grep that finds and replaces code. What distinguishes it from other tools like sed is that instead of merely matching on the text, ast-grep looks for a match based on abstract syntax tree, aka AST. This makes ast-grep much more expressive than sed when it comes to finding and modifying program source code, opening up possibilities like cleaning up a large codebase according to some set of rules.

Let's take a look at a few examples of using ast-grep on source code written in Elixir programming language.

In front of me there's a reasonably large Elixir codebase that survived years of development. Many people with varying degrees of attention to detail worked on it throughout the years, which resulted in the codebase being full of various small inconsistencies.

I was looking to brush up a test suite a little, when code like below caught my eye:

test "returns 401 for unauthenticated request", %{conn: conn} do
conn = get(conn, category_path(conn, :index))
assert json_response(conn, 401)
end

There's couple of tiny, stylistic issues with this code that I am not particular fan of:

Ideally, I wish the above code to be re-written to this:

test "returns 401 for unauthenticated request", %{conn: conn} do
conn
|> get(category_path(conn, :index))
|> json_response(401)
end

There's plenty of places like this in the code, and it would be nice to standardize by making them all look alike. We could write a sed like this:

rg -F -l "conn = get(conn, category_path(conn, :index))" | xargs sed -i -e '/conn = get(conn, category_path(conn, :index))/{N;s/.*conn = get(conn, category_path(conn, :index))\n.*assert json_response(conn, 401)/ conn\n |> get(category_path(conn, :index))\n |> json_response(401)/;}'

It has the following downsides:

Let's see if ast-grep can achieve the result while addressing sed's shortcomings.

First, in the root folder of my project I will create a file called rules.yaml with the following contents:

id: conn-pipe-syntax-1
language: elixir
rule:
pattern: |
conn = get(conn, category_path(conn, :index))
assert json_response(conn, 401)
fix:
template: |
conn
|> get(category_path(conn, :index))
|> json_response(conn, 401)

Then attempt to apply this "rule" by calling ast-grep in the root folder of the project:

ast-grep scan --rule rules.yaml --update-all

Sadly, the command fails with the following output:

Error: Cannot parse rule rules.yaml
Help: The file is not a valid ast-grep rule. Please refer to doc and fix the error.
See also: https://ast-grep.github.io/guide/rule-config.html
Caused by
╰▻ Fail to parse yaml as Rule.
╰▻ `rule` is not configured correctly.
╰▻ Rule contains invalid pattern matcher.
╰▻ Multiple AST nodes are detected. Please check the pattern source `conn = get(conn, category_path(conn, :index))
assert json_response(conn, 401)
`.

The error says something about "multiple AST nodes". Turns out that we are specifying multiple AST nodes at once - I am not a tree-sitter expert (ast-grep uses tree-sitter to parse the code), but will assume that each line is parsed as a separate AST node here.

Instead, if we want to tell ast-grep to look at multiple nodes, we should go for expressing how they are positioned relevant to each other. For example, by expressing that the target piece of code precedes some other piece of code like this:

id: conn-pipe-syntax-1
language: elixir
rule:
pattern: conn = get(conn, category_path(conn, :index))
precedes:
pattern: assert json_response(conn, 401)
fix:
template: |
conn
|> get(category_path(conn, :index))
|> json_response(conn, 401)

By using precedes we can say our rule just became relational.

Re-running the command no longer raises an error:

ast-grep scan --rule rules.yaml --update-all .
# Applied 1 changes

Wonderful, something worked! Let's check what it did:

diff --git a/test/api/controllers/category_controller_test.exs b/test/api/controllers/category_controller_test.exs
index 7f3809922..6511b39f7 100644
--- a/test/api/controllers/category_controller_test.exs
+++ b/test/api/controllers/category_controller_test.exs
@@ -53,7 +53,10 @@ defmodule Api.CategoryControllerTest do
end
test "returns 401 for unauthenticated request", %{conn: conn} do
- conn = get(conn, category_path(conn, :index))
+ conn
+ |> get(category_path(conn, :index))
+ |> json_response(conn, 401)
+
assert json_response(conn, 401)
end
end

Kind of nice!

Before we move on, I should mention that ast-grep offers an online playground, where one can play around with different snippets of code and matching rules. Conveniently, playground will also show the AST for the code snippets - an indispensable tool for building and debugging rules. Here's how the playground looks like for the code snippet, as well as the rule above:

You can even play with this yourself by using the following link: playground. Click on the "Diff" tab in the top left corner to see the result!

We're now facing another problem: assert json_response(conn, 401) line was not removed. Apparently, only code that matches the pattern gets removed. But how do we tell ast-grep to also remove assert json_response(conn, 401)? To do this, we can use an option called extendEnd in the fix section like this:

diff --git a/rules.yaml b/rules.yaml
index 23a11fb42..0cfef889a 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -10,3 +10,6 @@ fix:
conn
|> get(category_path(conn, :index))
|> json_response(conn, 401)
+ expandEnd:
+ pattern: |
+ assert json_response(conn, 401)

This will expand the fix to also grab and replace assert json_response(conn, 401). Let's reset the changes on the file made after our first attempt, and re-run the ast-grep command. We will see this:

diff --git a/test/api/controllers/category_controller_test.exs b/test/api/controllers/category_controller_test.exs
index 7f3809922..dc24111df 100644
--- a/test/api/controllers/category_controller_test.exs
+++ b/test/api/controllers/category_controller_test.exs
@@ -53,8 +53,10 @@ defmodule Api.CategoryControllerTest do
end
test "returns 401 for unauthenticated request", %{conn: conn} do
- conn = get(conn, category_path(conn, :index))
- assert json_response(conn, 401)
+ conn
+ |> get(category_path(conn, :index))
+ |> json_response(conn, 401)
+
end
end
end

Much better! However, now there's another problem with it - for some reason there the replacement produced an unnecessary newline. When dealing with Elixir code, a common way to remove newlines is to run mix format command. However, perhaps there's a way ast-grep could be asked to not produce the newline in the first place?

Turns out there is such a way. This time the modification we need to do in rules.yaml is not related to ast-grep itself, but rather to yaml - we want to use what's called a block chomping indicator, that will help remove the newline character:

diff --git a/rules.yaml b/rules.yaml
index 0cfef889a..84b0a5381 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -6,7 +6,7 @@ rule:
precedes:
pattern: assert json_response(conn, 401)
fix:
- template: |
+ template: |-
conn
|> get(category_path(conn, :index))
|> json_response(conn, 401)

After resetting the changes on the file and re-running the ast-grep we get this:

diff --git a/test/api/controllers/category_controller_test.exs b/test/api/controllers/category_controller_test.exs
index 7f3809922..4c9d11d93 100644
--- a/test/api/controllers/category_controller_test.exs
+++ b/test/api/controllers/category_controller_test.exs
@@ -53,8 +53,9 @@ defmodule Api.CategoryControllerTest do
end
test "returns 401 for unauthenticated request", %{conn: conn} do
- conn = get(conn, category_path(conn, :index))
- assert json_response(conn, 401)
+ conn
+ |> get(category_path(conn, :index))
+ |> json_response(conn, 401)
end
end
end

Finally, a nice and clean change!

There's another problem with it - it only fixes a single place in a single test file. There's so many more places in the tests that look very similar to the pattern above. For example:

test "returns 404 for unauthenticated request", %{conn: conn} do
contract = insert(:contract)
conn = get(conn, contract_path(conn, :show, contract.id))
assert json_response(conn, 404)
end

If only we could somehow explain to ast-grep that we'd like to perform find-and-replace disregarding the method, path or response code used.

Turns out there's just a way to do that - by parametrizing parts of the patterns. Let's proceed with modifying our rules.yaml like this:

diff --git a/rules.yaml b/rules.yaml
index 0cfef889a..f6531e9c0 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -2,14 +2,14 @@ id: conn-pipe-syntax-1
language: elixir
rule:
- pattern: conn = get(conn, category_path(conn, :index))
+ pattern: conn = $METHOD(conn, $PATH)
precedes:
- pattern: assert json_response(conn, 401)
+ pattern: assert json_response(conn, $STATUS)
fix:
- template: |
+ template: |-
conn
- |> get(category_path(conn, :index))
- |> json_response(conn, 401)
+ |> $METHOD($STATUS)
+ |> json_response(conn, $STATUS)
expandEnd:
pattern: |
- assert json_response(conn, 401)
+ assert json_response(conn, $STATUS)

After re-run the ast-grep command. We will see this output:

Applied 37 changes

Now we're getting somewhere! Reviewing the code reveals various fixes that looks like this:

diff --git a/test/api/controllers/contract_controller_test.exs b/test/api/controllers/contract_controller_test.exs
index cfabef8f2..0996690d3 100644
--- a/test/api/controllers/contract_controller_test.exs
+++ b/test/api/controllers/contract_controller_test.exs
@@ -658,9 +658,9 @@ defmodule Api.ContractControllerTest do
test "returns 401 for unauthenticated request", %{conn: conn} do
contract = insert(:contract)
- conn = get(conn, contract_path(conn, :show, contract.id))
-
- assert json_response(conn, 401)
+ conn
+ |> get(contract_path(conn, :show, contract.id))
+ |> json_response(conn, 401)
end
test "validates UUID", %{conn: conn} do
@@ -791,9 +791,9 @@ defmodule Api.ContractControllerTest do
test "returns 401 for unauthenticated request", %{conn: conn} do
contract = insert(:contract)
- conn = get(conn, contract_path(conn, :preview, contract.id))
-
- assert json_response(conn, 401)
+ conn
+ |> get(contract_path(conn, :preview, contract.id))
+ |> json_response(conn, 401)
end
end
@@ -1055,8 +1055,9 @@ defmodule Api.ContractControllerTest do
end
test "returns 401 for unauthenticated request", %{conn: conn} do
- conn = get(conn, contract_path(conn, :index))
- assert json_response(conn, 401)
+ conn
+ |> get(contract_path(conn, :index))
+ |> json_response(conn, 401)
end

There's however another problem. Upon inspecting the code I see patterns like this:

conn =
conn
|> authenticate_conn(account)
|> patch(contract_path(conn, :update, contract.id))
assert json_response(conn, 403)

The only difference this code has to the pattern we've been targeting so far is the presence of this line:

|> authenticate_conn(account)

In order to take this one into account too, let's add another rule:

diff --git a/rules.yaml b/rules.yaml
index b711ffbc3..309c15d7a 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -13,3 +13,23 @@ fix:
expandEnd:
pattern: |
assert json_response(conn, $STATUS)
+---
+id: conn-pipe-syntax-2
+language: elixir
+rule:
+ pattern: |
+ conn =
+ conn
+ |> authenticate_conn($ACCOUNT)
+ |> $METHOD($PATH)
+ precedes:
+ pattern: assert json_response(conn, $STATUS)
+fix:
+ template: |-
+ conn
+ |> authenticate_conn($ACCOUNT)
+ |> $METHOD($PATH)
+ |> json_response($STATUS)
+ expandEnd:
+ pattern: |
+ assert json_response(conn, $STATUS)

After re-running ast-grep we get various changes that look like this:

- conn =
- conn
- |> authenticate_conn(account)
- |> patch(contract_path(conn, :update, contract.id))
-
- assert json_response(conn, 403)
+ conn
+ |> authenticate_conn(account)
+ |> patch(contract_path(conn, :update, contract.id))
+ |> json_response(403)
end

Nice, another stray code pattern tamed. But there's another problem™ now - the code like this didn't get replaced:

conn =
conn
|> get(user_path(conn, :return_url))
assert json_response(conn, 401)

No wonder, it looks a tiny bit different to the pattern we've seen before, but in the end it's but a variation:

# previous
conn = get(conn, some_path(conn, :return_url))
# new
conn = conn |> get(some_path(conn, :return_url))

Let's cover this one too. This time, instead of adding a whole new rule, let's modify our very first rule by turning it into what's known as a composite rule:

diff --git a/rules.yaml b/rules.yaml
index 309c15d7a..5b31bdaa1 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -1,8 +1,9 @@
id: conn-pipe-syntax-1
language: elixir
rule:
- pattern: |
- conn = $METHOD(conn, $PATH)
+ any:
+ - pattern: conn = conn |> $METHOD($PATH)
+ - pattern: conn = $METHOD(conn, $PATH)
precedes:
pattern: assert json_response(conn, $STATUS)
fix:

Note how I didn't really need to format the pattern in the way that the code in question looks like. In this context pattern doesn't really care about extra spaces or newlines. That's the power of matching on AST right there. But let's move on.

Upon reviewing more tests, I see another recurring pattern that hasn't been covered so far. Here's an example:

conn =
conn
|> post(contract_signature_path(conn, :request_changes, contract.id), comment: "x", token: "invalid_token")
assert json_response(conn, 403)

It kind of looks like what we've dealt with already, but there's a key distinction: a call to path helper function is followed by another parameter, a map. The patterns we've written so far don't really know how to match this case.

We could cover problematic code patterns by adding two more rules - alternatives of rule 1 and 2, this time with $PARAMS. The rules could look like this:

id: conn-pipe-syntax-3
language: elixir
rule:
any:
- pattern: conn = conn |> $METHOD($PATH, $PARAMS)
- pattern: conn = $METHOD(conn, $PATH, $PARAMS)
precedes:
pattern: assert json_response(conn, $STATUS)
fix:
template: |-
conn
|> $METHOD($PATH, $PARAMS)
|> json_response(conn, $STATUS)
expandEnd:
pattern: |
assert json_response(conn, $STATUS)
---
id: conn-pipe-syntax-4
language: elixir
rule:
pattern: |
conn =
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($PATH, $PARAMS)
precedes:
pattern: assert json_response(conn, $STATUS)
fix:
template: |-
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($PATH, $PARAMS)
|> json_response($STATUS)
expandEnd:
pattern: |
assert json_response(conn, $STATUS)

However, this is a 100% increase in the rule footprint we need to support. If only there could be a way to express "all parameters" in ast-grep, we could cover both $PATH, $PARAMS or other optional parameters, if any. Turns out there is and it is called multi meta variable. Let's see it in action. I'll replace $PATH with a new variable I called $$$ALL_PARAMS:

diff --git a/rules.yaml b/rules.yaml
index 5b31bdaa1..e495b5e58 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -2,14 +2,14 @@ id: conn-pipe-syntax-1
language: elixir
rule:
any:
- - pattern: conn = conn |> $METHOD($PATH)
- - pattern: conn = $METHOD(conn, $PATH)
+ - pattern: conn = conn |> $METHOD($$$ALL_PARAMS)
+ - pattern: conn = $METHOD(conn, $$$ALL_PARAMS)
precedes:
pattern: assert json_response(conn, $STATUS)
fix:
template: |-
conn
- |> $METHOD($PATH)
+ |> $METHOD($$$ALL_PARAMS)
|> json_response(conn, $STATUS)
expandEnd:
pattern: |
@@ -22,14 +22,14 @@ rule:
conn =
conn
|> authenticate_conn($ACCOUNT)
- |> $METHOD($PATH)
+ |> $METHOD($$$ALL_PARAMS)
precedes:
pattern: assert json_response(conn, $STATUS)
fix:
template: |-
conn
|> authenticate_conn($ACCOUNT)
- |> $METHOD($PATH)
+ |> $METHOD($$$ALL_PARAMS)
|> json_response($STATUS)
expandEnd:
pattern: |

By utilizing a way to express rules more precisely, we just saved ourselves from the need to maintain more rule code.

Upon reviewing more code, I discovered that we have a test like this:

conn = put(conn, contract_path(conn, :move, contract.id, %{target_folder_id: folder.id}))
assert response(conn, 401)

This looks almost exactly like what we've dealt with before. However, instead of json_response helper there is now one called response. Let's use parametrisation again - this time we'll replace the name of helper function:

diff --git a/rules.yaml b/rules.yaml
index e495b5e58..b5e08456b 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -5,15 +5,15 @@ rule:
- pattern: conn = conn |> $METHOD($$$ALL_PARAMS)
- pattern: conn = $METHOD(conn, $$$ALL_PARAMS)
precedes:
- pattern: assert json_response(conn, $STATUS)
+ pattern: assert $RESPONSE_HELPER(conn, $STATUS)
fix:
template: |-
conn
|> $METHOD($$$ALL_PARAMS)
- |> json_response(conn, $STATUS)
+ |> $RESPONSE_HELPER(conn, $STATUS)
expandEnd:
pattern: |
- assert json_response(conn, $STATUS)
+ assert $RESPONSE_HELPER(conn, $STATUS)
---
id: conn-pipe-syntax-2
language: elixir
@@ -24,13 +24,13 @@ rule:
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
precedes:
- pattern: assert json_response(conn, $STATUS)
+ pattern: assert $RESPONSE_HELPER(conn, $STATUS)
fix:
template: |-
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
- |> json_response($STATUS)
+ |> $RESPONSE_HELPER($STATUS)
expandEnd:
pattern: |
- assert json_response(conn, $STATUS)
+ assert $RESPONSE_HELPER(conn, $STATUS)

Nice, re-running ast-grep now finds many more instances of the pattern and makes it consistent everywhere.

Finally, upon reviewing even more tests I find this:

assert response =
conn
|> authenticate_conn(account)
|> post(document_path(conn, :create, draft.id), %{
title: "My draft",
content: "This is the content"
})
|> json_response(201)
assert response == %{"document" => %{"id" => "3ec51098-d00c-496b-bfbf-f90cb4e1a4ba"}}

The issue here is we've seen before: there's no real need assert on the body of response. This doesn't really look like a pattern we've been fighting so far, but close enough to show another cool feature of ast-grep.

First, let's try to fix this with the knowledge we already have, e.g. by introducing another rule:

diff --git a/rules.yaml b/rules.yaml
index b5e08456b..365714b45 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -34,3 +34,20 @@ fix:
expandEnd:
pattern: |
assert $RESPONSE_FUNCTION(conn, $STATUS)
+---
+id: conn-pipe-syntax-6
+language: elixir
+rule:
+ pattern: |
+ assert $VARIABLE =
+ conn
+ |> authenticate_conn($ACCOUNT)
+ |> $METHOD($$$ALL_PARAMS)
+ |> $RESPONSE_FUNCTION($STATUS)
+fix:
+ template: |-
+ $VARIABLE =
+ conn
+ |> authenticate_conn($ACCOUNT)
+ |> $METHOD($$$ALL_PARAMS)
+ |> $RESPONSE_FUNCTION($STATUS)

This successfully fixes the problem above. However, this inadvertently also "fixes" code that looks like this:

-assert %{"drafts" => draft} =
- conn
- |> authenticate_conn(owner)
- |> get(draft_path(conn, :index))
- |> json_response(200)
+%{"drafts" => draft} =
+ conn
+ |> authenticate_conn(owner)
+ |> get(draft_path(conn, :index))
+ |> json_response(200)

There's a subtle change here. While the test semantics didn't change, it's possible that in future, in case it happens that the code under test makes the test fail, the error message will be a not-so-nice-looking MatchError, as opposed to a nicely formatted diff produced by ExUnit's assert macro.

So how do we make ast-grep understand that it should only ever match code that contains a variable, and not a pattern? We can do so by introducing a constraints filter on the type of variable - the variable should be an identifier:

diff --git a/rules.yaml b/rules.yaml
index f49c23a57..0c5fd1c6e 100644
--- a/rules.yaml
+++ b/rules.yaml
@@ -44,6 +44,9 @@ rule:
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
|> $RESPONSE_FUNCTION($STATUS)
+constraints:
+ VARIABLE:
+ kind: identifier
fix:
template: |-
$VARIABLE =

After resetting changes on the modified test files and re-running the ast-grep command above, the unnecessary change is visible no longer. Lovely :)

In the end, we ended up with rules.yaml file that looks like this:

id: conn-pipe-syntax-1
language: elixir
rule:
any:
- pattern: conn = conn |> $METHOD($$$ALL_PARAMS)
- pattern: conn = $METHOD(conn, $$$ALL_PARAMS)
precedes:
pattern: assert $RESPONSE_FUNCTION(conn, $STATUS)
fix:
template: |-
conn
|> $METHOD($$$ALL_PARAMS)
|> $RESPONSE_FUNCTION($STATUS)
expandEnd:
pattern: |
assert $RESPONSE_FUNCTION(conn, $STATUS)
---
id: conn-pipe-syntax-2
language: elixir
rule:
pattern: |
conn =
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
precedes:
pattern: assert $RESPONSE_FUNCTION(conn, $STATUS)
fix:
template: |-
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
|> $RESPONSE_FUNCTION($STATUS)
expandEnd:
pattern: |
assert $RESPONSE_FUNCTION(conn, $STATUS)
---
id: conn-pipe-syntax-3
language: elixir
rule:
pattern: |
assert $VARIABLE =
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
|> $RESPONSE_FUNCTION($STATUS)
constraints:
VARIABLE:
kind: identifier
fix:
template: |-
$VARIABLE =
conn
|> authenticate_conn($ACCOUNT)
|> $METHOD($$$ALL_PARAMS)
|> $RESPONSE_FUNCTION($STATUS)

This can set of rule can now be added to the CI script:

ast-grep scan --rule rule.yaml

In case there are violations, the following output will be produced:

test/api/controllers/category_controller_test.exs
help[conn-pipe-syntax-1]:
@@ -52,8 +52,9 @@
53 53│ end
54 54│
55 55│ test "returns 401 for unauthenticated request", %{conn: conn} do
56 │- conn = get(conn, category_path(conn, :index))
57 │- assert json_response(conn, 401)
56│+ conn
57│+ |> get(category_path(conn, :index))
58│+ |> json_response(401)
58 59│ end
59 60│ end
60 61│ end
test/api/controllers/category_controller_test.exs
help[conn-pipe-syntax-5]:
@@ -10,12 +10,11 @@
11 11│ insert(:category, owner: account_1)
12 12│ insert(:category, owner: account_2)
13 13│
14 │- conn =
14│+ assert %{"categories" => categories} =
15 15│ conn
16 16│ |> authenticate_conn(account_1)
17 17│ |> get(category_path(conn, :index))
18 │-
19 │- assert %{"categories" => categories} = json_response(conn, 200)
18│+ |> json_response(200)
20 19│
21 20│ assert Enum.count(categories) == 2
22 21│ end

This concludes our short journey into grep-ast. Doing all these substitutions manually would be a chore, to put mildly.

We've looked at any, precedes, expandEnd, multi metavariable and constraints (using kind). But we've barely scratched the surface of what's possible, with ast-grep offering many more options to build rules, and of abitrary depth.

The website offers comprehensive documentation with many examples: ast-grep.github.io. The author is very helpful too, and is quite active with the tool's official Discord channel.

Happy find-and-replacing!