Today I Learned: RegexBuilder

Here’s a quick little “Today I Learned” about the new RegexBuilder framework in Swift. I haven’t paid much attention to it so far, as it requires iOS 16 / macOS 13. It also turns out that I’m pretty good at writing cryptic regular expression syntax, so I haven’t felt a huge need to change.

However, it turns out the new RegexBuilder can do a cool trick: It can transform a matched substring into some other type for you.

Recently, I’ve been dusting off my Captain’s Log project. The core of that app is just a text file with a bunch of lines that look like this:

2023-02-16 📖 Read (20.0 min)

I parsed that line with this regular expression:

private let lineRegex = try! NSRegularExpression(pattern: #"^(\d{4}-\d{2}-\d{2}) (.*?)(\(.*\))?$"#, options: [])

And part of parsing involved transforming data from one type to another. For example, I don’t want to deal with the string 2023-02-16, I want to deal with a Day struct that contains a year/month/day. So in my parsing logic, I have to check to make sure I can build a valid Day from the string, like so:

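   guard
       let result = lineRegex.matches(in: line, options: [], range: NSRange(location: 0, length: line.utf16.count)).first,
       // NSRange can't index a String directly, so convert it to a Range<String.Index> first
       let dayRange = Range(result.range(at: 1), in: line),
       // It's not enough to parse the string; it needs to be a valid Day
       let day = Day(line[dayRange])
   else { return nil }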

Now I admit that the regex ^(\d{4}-\d{2}-\d{2}) (.*?)(\(.*\))?$ was easier to write than to read, and I first wrote it over two years ago. So when I was looking to add some features to the project, I thought I’d try the new RegexBuilder to see if it would make the regular expression easier to read and maintain. That’s when I discovered the trick: RegexBuilder lets you put the string matching and the data transformation in one place. For example, I now have the following code:

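import RegexBuilder

private enum LogEntryRegex {
  // Must be `static`: an enum can't have stored instance properties,
  // and the regex is used as LogEntryRegex.day.
  static let day = Regex {
    TryCapture {
      Regex {
        Repeat(count: 4) {
          One(.digit)
        }
        "-"
        Repeat(count: 2) {
          One(.digit)
        }
        "-"
        Repeat(count: 2) {
          One(.digit)
        }
      }
    } transform: { dateString in
      // Returning nil here makes the whole match fail
      Day(dateString)
    }
  }
}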

Now, together in one place, I get to say that “a day regex is supposed to parse a string of this particular format and produce a Day struct.” If it can’t make the Day, it doesn’t parse. When I match something against LogEntryRegex.day, the capture in the resulting output is a Day struct, not a substring.
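
Here’s a minimal, self-contained sketch of what that looks like at the call site. The Day struct below is a hypothetical stand-in (the real one from Captain’s Log isn’t shown in this post), its validation is deliberately crude, and parseDay is a helper name made up for illustration. I’ve also written the regex as a computed property with an explicit type so you can see that the match output is a tuple: the whole matched substring first, then the transformed Day.

```swift
import RegexBuilder

// Hypothetical stand-in for the post's Day type; the real one isn't shown.
struct Day: Equatable {
    let year: Int
    let month: Int
    let day: Int

    // Failable on purpose: returns nil unless the text splits into three
    // integers with a plausible month (1...12) and day (1...31).
    init?(_ text: Substring) {
        let parts = text.split(separator: "-").compactMap { Int($0) }
        guard parts.count == 3,
              (1...12).contains(parts[1]),
              (1...31).contains(parts[2])
        else { return nil }
        year = parts[0]
        month = parts[1]
        day = parts[2]
    }
}

enum LogEntryRegex {
    // Output is (whole match, transformed capture).
    static var day: Regex<(Substring, Day)> {
        Regex {
            TryCapture {
                Repeat(count: 4) { One(.digit) }
                "-"
                Repeat(count: 2) { One(.digit) }
                "-"
                Repeat(count: 2) { One(.digit) }
            } transform: { Day($0) } // nil here means "no match"
        }
    }
}

// Hypothetical helper: pulls a Day off the front of a log line, if there is one.
func parseDay(_ line: String) -> Day? {
    line.prefixMatch(of: LogEntryRegex.day)?.output.1
}

if let day = parseDay("2023-02-16 📖 Read (20.0 min)") {
    print(day.year, day.month, day.day) // prints "2023 2 16"
}
print(parseDay("2023-13-45 not a real date") == nil) // prints "true"
```

The second call is the interesting one: the text has the right shape, but the transform returns nil, so the regex as a whole doesn’t match. There’s no separate “did the conversion work?” check to forget.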

This is definitely something I’ll remember on any projects that do a lot of text processing!