Here’s a quick little “today I learned” about the new RegexBuilder framework in Swift. I haven’t paid much attention to it so far, as it requires iOS 16 / macOS 13. It also turns out that I’m pretty good at writing cryptic regular expression syntax, so I haven’t felt a huge need to change.
However, it turns out the new RegexBuilder can do a cool trick: it can transform a matched substring into some other type for you.
Recently, I’ve been dusting off my Captain’s Log project. The core of that app is just a text file with a bunch of lines that look like this:
2023-02-16 📖 Read (20.0 min)
I parsed that line with this regular expression:
private let lineRegex = try! NSRegularExpression(pattern: #"^(\d{4}-\d{2}-\d{2}) (.*?)(\(.*\))?$"#, options: [])
Part of parsing involved transforming data from one type to another. For example, I don’t want to deal with the string 2023-02-16; I want to deal with a Day struct that contains a year/month/day. So in my parsing logic, I have to check that I can build a valid Day from the string, like so:
guard
    let result = lineRegex.matches(in: line, options: [], range: NSRange(location: 0, length: line.utf16.count)).first,
    // Convert the capture group's NSRange into a Swift string range
    let dayRange = Range(result.range(at: 1), in: line),
    // It's not enough to parse the string; it needs to be a valid Day
    let day = Day(line[dayRange])
else { return nil }
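To make the two-step nature of this approach concrete, here is a minimal, self-contained sketch. The Day type below is a hypothetical stand-in (the real one from Captain’s Log isn’t shown here, and its validation rules are an assumption), and parseDay(from:) is a name invented for illustration. The point is that the regex only checks the shape of the text, and a separate step decides whether it is a valid Day.

```swift
import Foundation

// Hypothetical stand-in for the post's Day type; the real one is not shown.
struct Day: Equatable {
    let year: Int
    let month: Int
    let day: Int

    // Fails unless the text is three integers separated by "-" with a
    // plausible month and day. (Assumed validation rules, for illustration.)
    init?(_ text: Substring) {
        let parts = text.split(separator: "-").compactMap { Int($0) }
        guard parts.count == 3,
              (1...12).contains(parts[1]),
              (1...31).contains(parts[2])
        else { return nil }
        year = parts[0]
        month = parts[1]
        day = parts[2]
    }
}

let lineRegex = try! NSRegularExpression(
    pattern: #"^(\d{4}-\d{2}-\d{2}) (.*?)(\(.*\))?$"#,
    options: []
)

// Returns nil when the line does not match OR the date is not a valid Day.
func parseDay(from line: String) -> Day? {
    let fullRange = NSRange(line.startIndex..., in: line)
    guard
        let result = lineRegex.firstMatch(in: line, options: [], range: fullRange),
        // Bridge the NSRange of capture group 1 back to a Swift string range.
        let dayRange = Range(result.range(at: 1), in: line),
        // Step two: the matched text still has to become a valid Day.
        let day = Day(line[dayRange])
    else { return nil }
    return day
}

// The pattern accepts "2023-13-45", but Day rejects it.
print(parseDay(from: "2023-02-16 📖 Read (20.0 min)") != nil)  // prints "true"
print(parseDay(from: "2023-13-45 oops") != nil)                // prints "false"
```

Notice that the pattern and the validation live in different places, so nothing in the regex itself tells you a Day is expected to come out of it.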
Now I admit that the regex ^(\d{4}-\d{2}-\d{2}) (.*?)(\(.*\))?$ was easier to write than to read, and I first wrote it over two years ago. So when I was looking to add some features to the project, I thought I’d also try the new RegexBuilder to see if it would make the regular expression easier to read and maintain. That’s when I discovered the cool trick: RegexBuilder lets you put the string matching and the data transformation in one place. For example, I now have the following code:
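private enum LogEntryRegex {
    // `static` so it can be referenced as LogEntryRegex.day
    // (an enum cannot hold stored instance properties).
    static let day = Regex {
        TryCapture {
            Regex {
                Repeat(count: 4) {
                    One(.digit)
                }
                "-"
                Repeat(count: 2) {
                    One(.digit)
                }
                "-"
                Repeat(count: 2) {
                    One(.digit)
                }
            }
        } transform: { dateString in
            // Returning nil here makes the whole match fail
            Day(dateString)
        }
    }
}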
Now, together in one place, I get to say that “a day regex is supposed to parse a string of this particular format and produce a Day struct.” If it can’t make the Day, it doesn’t parse. When I match something against LogEntryRegex.day, the captured value in the resulting output is a Day struct, not a substring.
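Here is a small self-contained sketch of what using such a regex can look like. As before, Day is a hypothetical stand-in with assumed validation rules, and dayRegex is a top-level constant named for this example rather than the LogEntryRegex.day from my project. It needs iOS 16 / macOS 13 or a Swift 5.7+ toolchain.

```swift
import RegexBuilder

// Hypothetical stand-in for the post's Day type; the real one is not shown.
struct Day {
    let year: Int
    let month: Int
    let day: Int

    // Assumed validation rules, for illustration only.
    init?(_ text: Substring) {
        let parts = text.split(separator: "-").compactMap { Int($0) }
        guard parts.count == 3,
              (1...12).contains(parts[1]),
              (1...31).contains(parts[2])
        else { return nil }
        year = parts[0]
        month = parts[1]
        day = parts[2]
    }
}

// Matches text shaped like yyyy-MM-dd and converts it to a Day in one step.
// The regex's output is a tuple: (whole match as Substring, captured Day).
let dayRegex = Regex {
    TryCapture {
        Repeat(count: 4) { One(.digit) }
        "-"
        Repeat(count: 2) { One(.digit) }
        "-"
        Repeat(count: 2) { One(.digit) }
    } transform: { dateString in
        // Returning nil makes the match itself fail.
        Day(dateString)
    }
}

let line = "2023-02-16 📖 Read (20.0 min)"
if let match = line.prefixMatch(of: dayRegex) {
    // Element 1 of the output is already a Day; no second parsing step.
    let day: Day = match.output.1
    print(day.year, day.month, day.day)  // prints "2023 2 16"
}

// Right shape, impossible date: the transform returns nil, so no match.
print("2023-13-45 oops".prefixMatch(of: dayRegex) == nil)  // prints "true"
```

Element 0 of the output is still the matched Substring; it’s the capture that comes back as a Day.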
This is definitely something I’ll remember on any projects that do a lot of text processing!