Regular Expressions in Julia

Julia provides built-in support for regular expressions. Here are some examples of common regexp-related tasks in Julia.

# Regex support is built into Base, so no `using` statement is needed.

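# occursin tests whether a pattern matches a string.
# (The result is not named `match`, because that would
# shadow the match function used below.)
matched = occursin(r"p([a-z]+)ch", "peach")
println(matched)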

# The r"..." literal above already creates a Regex object.
# You can also build one from an ordinary string with
# Regex(), which is useful when the pattern is assembled
# at runtime.
r = Regex("p([a-z]+)ch")

# Many functions accept Regex objects. Here's the
# same match test we saw earlier.
println(occursin(r, "peach"))

# match returns the first match as a RegexMatch;
# its match field holds the matched text.
println(match(r, "peach punch").match)

# findfirst also locates the first match, but returns
# the start and end indexes of the match (as a range)
# instead of the matching text.
println("idx: ", findfirst(r, "peach punch"))

# A RegexMatch holds information about both the
# whole-pattern match and the submatches within it.
# For example, this returns the text for both
# p([a-z]+)ch and ([a-z]+). String converts each
# SubString so the vector prints as plain strings.
m = match(r, "peach punch")
println(String.([m.match, m.captures...]))

# Similarly, offset and offsets give the start index
# of the match and of each submatch.
m = match(r, "peach punch")
println([m.offset, m.offsets...])

# The eachmatch function iterates over all matches
# in the input, not just the first. For example,
# to collect the text of every match:
println([String(m.match) for m in eachmatch(r, "peach punch pinch")])

# Each RegexMatch yielded by eachmatch has the same
# fields, so the start indexes of every match and
# submatch can be collected too.
println("all: ", [[m.offset, m.offsets...] for m in eachmatch(r, "peach punch pinch")])

# eachmatch takes no limit argument; to cap the number
# of matches, take only the first n from its iterator.
println([String(m.match) for m in Iterators.take(eachmatch(r, "peach punch pinch"), 2)])

# Our examples above used string arguments. The regex
# functions operate on strings, so byte data stored as
# Vector{UInt8} must first be converted with String.
println(occursin(r, String(Vector{UInt8}("peach"))))

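# The r"..." literal is shorthand for the @r_str macro.
# It produces the same kind of object as Regex(), but
# builds it once, when the macro is expanded, rather than
# each time the line runs. That makes it the usual way to
# define regexes held in global variables.
r = r"p([a-z]+)ch"
println("regexp: ", r.pattern)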

# The replace function substitutes another value
# for every match of the pattern.
println(replace("a peach", r => "<fruit>"))

# Passing a function instead of a replacement string
# transforms each matched text with that function.
bytes = Vector{UInt8}("a peach")
out = replace(String(bytes), r => uppercase)
println(out)

To run the program, save it as regular_expressions.jl and run it with julia.

$ julia regular_expressions.jl
true
true
peach
idx: 1:5
["peach", "ea"]
[1, 2]
["peach", "punch", "pinch"]
all: [[1, 2], [7, 8], [13, 14]]
["peach", "punch"]
true
regexp: p([a-z]+)ch
a <fruit>
a PEACH
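The program prints only the start index of each submatch (through offsets). If you also need end indexes, one option is to derive them from each captured substring's length in code units. The sketch below assumes a hypothetical helper, `capture_range`, which is not part of Base; `findall`, by contrast, is a Base function that returns the index range of every whole-pattern match.

```julia
# Hypothetical helper (not part of Base): the index range of the
# i-th capture group of match `m` within the original string `s`.
# prevind steps back from the first code unit after the capture to
# the start of its last character, so multi-byte characters work too.
function capture_range(s::AbstractString, m::RegexMatch, i::Integer)
    cap = m.captures[i]
    cap === nothing && return nothing  # the group did not participate
    start = m.offsets[i]
    return start:prevind(s, start + ncodeunits(cap))
end

s = "peach punch pinch"
r = r"p([a-z]+)ch"

# Whole-match ranges for every match, straight from Base.
println(findall(r, s))

# Whole-match and first-capture ranges for each match.
for m in eachmatch(r, s)
    whole = m.offset:prevind(s, m.offset + ncodeunits(m.match))
    println(whole, " ", capture_range(s, m, 1))
end
```

Working in code units and stepping back with prevind, rather than adding a character count, keeps the ranges valid as string indexes even when the text contains non-ASCII characters.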

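For a complete reference, see the "Regular Expressions" section of the Strings chapter in the Julia manual. There is no separate Regex module: the Regex type and the functions above live in Base, and patterns follow the Perl-compatible syntax of the PCRE library.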
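The replacements in the program discard the submatches. To reuse captured text in a replacement, Julia's `s"..."` substitution-string literal can refer to capture groups as `\1`, `\2`, and so on. A minimal sketch, reusing the same pattern:

```julia
r = r"p([a-z]+)ch"

# \1 in a substitution string inserts the text of capture group 1.
println(replace("a peach", r => s"<\1>"))

# The substitution is applied to every match in the input.
println(replace("peach punch pinch", r => s"p<\1>ch"))

# A replacement function receives only the matched text, so the
# capture group has to be recovered by matching that text again.
println(replace("peach punch", r => t -> uppercase(match(r, t)[1])))
```

The substitution string is usually the simpler choice when the new text is a fixed template around the groups; the function form is for cases where the captured text itself needs to be computed on.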