A Guide to Ruby Pattern Matching

Pattern matching is the big new feature coming to Ruby 2.7. It has been committed to trunk, so anyone who is interested can install Ruby 2.7.0-dev and try it out. Please bear in mind that none of this is finalized and the dev team is looking for feedback, so if you have any, let the committers know before the feature is released.

I hope you will understand what pattern matching is and how to use it in Ruby after reading this article.

What Is Pattern Matching?

Pattern matching is a feature commonly found in functional programming languages. According to the Scala documentation, pattern matching is “a mechanism for checking a value against a pattern. A successful match can also deconstruct a value into its constituent parts.”

This is not to be confused with regular expressions, string matching, or pattern recognition. Pattern matching has nothing to do with strings; it works on data structures. The first time I encountered pattern matching was around two years ago when I tried out Elixir. I was learning Elixir by solving algorithm problems with it. When I compared my solutions to other people’s, I realized they used pattern matching, which made their code a lot more succinct and easier to read.

Because of that, pattern matching really made an impression on me. This is what pattern matching in Elixir looks like:

[a, b, c] = [:hello, "world", 42]
a #=> :hello
b #=> "world"
c #=> 42

The example above looks very much like a multiple assignment in Ruby. However, it is more than that. It also checks whether or not the values match:

[a, b, 42] = [:hello, "world", 42]
a #=> :hello
b #=> "world"

In the example above, the number 42 on the left-hand side isn’t a variable being assigned. It is a value used to check that the element at the same index on the right-hand side matches.

[a, b, 88] = [:hello, "world", 42]
** (MatchError) no match of right hand side value

In this example, no values are assigned and a MatchError is raised instead. This is because the number 88 does not match the number 42.

It also works with maps (which are similar to hashes in Ruby):

%{"name": "Zote", "title": title} = %{"name": "Zote", "title": "The mighty"}
title #=> "The mighty"

The example above checks that the value of the key name is Zote, and binds the value of the key title to the variable title.

This concept works very well when the data structure is complex. You can assign your variable and check for values or types all in one line.

Furthermore, it gives a dynamically typed language like Elixir a form of method overloading:

def process(%{"animal" => animal}) do
  IO.puts("The animal is: #{animal}")
end

def process(%{"plant" => plant}) do
  IO.puts("The plant is: #{plant}")
end

def process(%{"person" => person}) do
  IO.puts("The person is: #{person}")
end

Depending on the key of the map passed as the argument, a different definition of process gets executed.

Hopefully, that shows you how powerful pattern matching can be. There have been many attempts to bring pattern matching into Ruby with gems such as noaidi, qo, and egison-ruby.

Ruby 2.7 also has its own implementation not too different from these gems, and this is how it’s being done currently.

Ruby Pattern Matching Syntax

Pattern matching in Ruby is done through a case statement. However, instead of the usual when, the keyword in is used. It also supports guard conditions with if or unless:

case [variable or expression]
in [pattern]
  ...
in [pattern] if [expression]
  ...
end

The case statement accepts a variable or an expression, which is matched against the patterns provided in the in clauses. An if or unless guard can follow a pattern. The equality check here uses ===, like the normal case statement. This means you can match subsets and instances of classes. The sections below show how to use it.
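To make the guard clause concrete, here is a minimal sketch. The method name classify_pair and its return strings are made up for illustration; only the case/in syntax with if and unless comes from the description above.

```ruby
# Hypothetical helper: classifies a two-element array of integers.
def classify_pair(pair)
  case pair
  in [a, b] if a == b
    "equal"
  in [a, b] unless a < b
    # Reached only when the guard condition (a < b) is false.
    "descending"
  in [a, b]
    "ascending"
  else
    # Anything that is not a two-element array ends up here.
    "not a pair"
  end
end

puts classify_pair([2, 2])   # prints "equal"
puts classify_pair([5, 1])   # prints "descending"
puts classify_pair([1, 5])   # prints "ascending"
puts classify_pair(:nope)    # prints "not a pair"
```

The pattern is tried first, and the guard is evaluated only if the pattern matched, so the variables bound by the pattern are available inside the guard.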

Matching Arrays

translation = ['th', 'เต้', 'ja', 'テイ']

case translation
in ['th', orig_text, 'en', trans_text]
  puts "English translation: #{orig_text} => #{trans_text}"
in ['th', orig_text, 'ja', trans_text]
  # this will get executed
  puts "Japanese translation: #{orig_text} => #{trans_text}"
end

In the example above, the variable translation gets matched against two patterns: ['th', orig_text, 'en', trans_text] and ['th', orig_text, 'ja', trans_text]. The case statement checks whether the literal values in the pattern match the values in translation at each index. If they do, it assigns the remaining values in translation to the variables at the corresponding positions in the pattern.
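As a small runnable sketch of the same idea, the method below wraps the match in a hypothetical helper called translation_label and adds an else branch for inputs that match neither pattern. The ASCII sample strings are placeholders, not data from the article.

```ruby
# Hypothetical helper: labels a [from, text, to, translated_text] array.
def translation_label(translation)
  case translation
  in ['th', orig_text, 'en', trans_text]
    "English translation: #{orig_text} => #{trans_text}"
  in ['th', orig_text, 'ja', trans_text]
    "Japanese translation: #{orig_text} => #{trans_text}"
  else
    "no matching pattern"
  end
end

puts translation_label(['th', 'tae', 'en', 'Tae'])
puts translation_label(['th', 'tae', 'ja', 'tei'])
# A three-element array matches neither four-element pattern.
puts translation_label(['th', 'tae', 'ja'])
```

An array pattern without a splat only matches arrays of the same length, which is why the last call falls through to else.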

Matching Hashes

translation = {orig_lang: 'th', trans_lang: 'en', orig_txt: 'เต้', trans_txt: 'tae'}

case translation
in {orig_lang: 'th', trans_lang: 'en', orig_txt: orig_txt, trans_txt: trans_txt}
  puts "#{orig_txt} => #{trans_txt}"
end

In the example above, the translation variable is now a hash. It gets matched against the hash pattern in the in clause. The case statement checks that all the keys in the pattern are present in translation and that the value for each key matches. It then assigns the values to the variables in the pattern.
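One detail worth seeing in code: a hash pattern only requires the keys it names, so extra keys in the matched hash do not prevent a match. The sketch below uses a hypothetical helper, language_pair, and a made-up :note key to show this.

```ruby
# Hypothetical helper: extracts the language pair from a translation hash.
def language_pair(translation)
  case translation
  in {orig_lang: orig, trans_lang: trans}
    "#{orig} -> #{trans}"
  else
    "unknown"
  end
end

# The extra :note key is ignored by the pattern.
puts language_pair({orig_lang: 'th', trans_lang: 'en', note: 'draft'})  # prints "th -> en"
# A key named in the pattern is missing, so the pattern does not match.
puts language_pair({orig_lang: 'th'})                                   # prints "unknown"
```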

Matching Subsets

The equality check used in pattern matching follows the logic of ===, so a pattern can be a class, a range, or a regular expression as well as a literal value.
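Because each value pattern is compared with ===, ranges, classes, and regular expressions all work as patterns. The following is a minimal sketch; the describe method and its labels are hypothetical.

```ruby
# Hypothetical helper: describes a value using range, class, and regexp patterns.
def describe(value)
  case value
  in 0..9
    "between 0 and 9"      # (0..9) === value
  in Integer
    "other integer"        # Integer === value
  in /\A\d+\z/
    "numeric string"       # the regexp is compared with === as well
  in String
    "other string"
  else
    "something else"
  end
end

puts describe(7)        # prints "between 0 and 9"
puts describe(42)       # prints "other integer"
puts describe("2019")   # prints "numeric string"
puts describe("ruby")   # prints "other string"
puts describe(nil)      # prints "something else"
```

Patterns are tried from top to bottom, so the more specific range has to come before the more general Integer class.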

Multiple Patterns

| can be used to define multiple patterns for one block. Note that Ruby does not allow alternative patterns to bind variables (only names starting with _ are permitted), so the patterns below check values without capturing them. Here, _ matches any value without binding it:

translation = ['th', 'เต้', 'ja', 'テイ']

case translation
in {orig_lang: 'th', trans_lang: 'ja'} | ['th', _, 'ja', _]
  puts 'Thai to Japanese translation'
end

In the example above, the translation variable is matched against both the {orig_lang: 'th', trans_lang: 'ja'} hash pattern and the ['th', _, 'ja', _] array pattern. The array pattern matches, so the block runs.

This is useful when you have slightly different types of data structures that represent the same thing and you want both data structures to execute the same block of code.
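Alternatives also work with plain values. As a small sketch (the day_type helper and the symbols it accepts are made up for illustration), several values can share one branch:

```ruby
# Hypothetical helper: groups day symbols into two categories.
def day_type(day)
  case day
  in :saturday | :sunday
    "weekend"
  in :monday | :tuesday | :wednesday | :thursday | :friday
    "weekday"
  else
    "unknown"
  end
end

puts day_type(:sunday)    # prints "weekend"
puts day_type(:tuesday)   # prints "weekday"
puts day_type(:someday)   # prints "unknown"
```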

Arrow Assignment

=> can be used to assign the matched value to a variable.

case ['I am a string', 10]
in [Integer, Integer] => a
  # not reached
in [String, Integer] => b
  p b #=> ["I am a string", 10]
end

This is useful when you want to check values inside the data structure but also bind these values to a variable.
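The arrow can also be used on individual elements inside a pattern, not only on the whole structure. The sketch below assumes a hypothetical summarize helper that takes a [name, age] pair; it binds each element and the whole array in one pattern.

```ruby
# Hypothetical helper: checks the shape of a [name, age] pair and binds
# both the individual elements and the whole array.
def summarize(record)
  case record
  in [String => name, Integer => age] => whole
    "#{name} (#{age}), from #{whole.inspect}"
  else
    "unrecognized"
  end
end

puts summarize(['Tae', 30])   # prints: Tae (30), from ["Tae", 30]
puts summarize([30, 'Tae'])   # prints: unrecognized
```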

Pin Operator

The pin operator ( ^ ) prevents variables from being reassigned. Consider first what happens without it:

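case [1,2,2]
in [a,a,a]
  puts a #=> 2
end

In the example above, variable a in the pattern is matched against 1, 2, and then 2. In the 2.7.0-dev build used for this article, it is bound to 1, then 2, then 2 again, so the pattern says nothing about whether the values are equal. Released versions of Ruby go further and reject a variable name repeated within one pattern with a syntax error ("duplicated variable name"). Either way, this is not what you want if you need to check that all the values in the array are the same.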

case [1,2,2]
in [a,^a,^a]
  # not reached
in [a,b,^b]
  puts a #=> 1
  puts b #=> 2
end

When the pin operator is used, it evaluates the variable instead of reassigning it. In the example above, [1,2,2] doesn’t match [a,^a,^a] because at the first index, a is bound to 1. At the second and third, ^a evaluates to 1 but is matched against 2.

However, [a,b,^b] matches [1,2,2]: a is bound to 1 at the first index, b is bound to 2 at the second, and ^b, which is now 2, is matched against 2 at the third, so it passes.

a = 1
case [2,2]
in [^a,^a]
  # not reached
in [b,^b]
  puts b #=> 2
end

Variables from outside the case statement can also be used as shown in the example above.
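Both uses of the pin operator can be wrapped in small methods. This is only a sketch: all_same? and status_for are hypothetical helpers, and the :id and :status keys are invented for the example.

```ruby
# Hypothetical helper: true only when all three elements are equal.
# `first` is bound at index 0, then pinned at the other two positions.
def all_same?(triple)
  case triple
  in [first, ^first, ^first]
    true
  else
    false
  end
end

# Hypothetical helper: pins a value that comes from outside the pattern
# (here, a method parameter) and binds the status only when the id matches.
def status_for(expected_id, record)
  case record
  in {id: ^expected_id, status: status}
    status
  else
    "no match"
  end
end

puts all_same?([2, 2, 2])                         # prints "true"
puts all_same?([1, 2, 2])                         # prints "false"
puts status_for(7, {id: 7, status: 'active'})     # prints "active"
puts status_for(7, {id: 8, status: 'active'})     # prints "no match"
```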

Underscore ( _ ) Operator

Underscore ( _ ) is used to ignore values. Let’s see it in a couple of examples:

case ['this will be ignored',2]
in [_,a]
  puts a #=> 2
end

case ['a',2]
in [_,a] => b
  puts a #=> 2
  p b #=> ["a", 2]
end

In the two examples above, any value matched against _ passes. In the second case statement, the => operator captures the whole array, including the value that _ ignored.
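Unlike ordinary variable names, _ can appear more than once in the same pattern. A minimal sketch, using a hypothetical third helper:

```ruby
# Hypothetical helper: returns the last element of a three-element array,
# ignoring the first two. Returns nil for any other input.
def third(triple)
  case triple
  in [_, _, last]
    last
  else
    nil
  end
end

p third([:a, :b, :c])   # prints :c
p third([:a, :b])       # prints nil
```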

Use Cases for Pattern Matching in Ruby

Imagine that you have the following JSON data:

  {
    "nickname": "Tae",
    "realname": {"first": "Noppakun", "last": "Wongsrinoppakun"},
    "username": "tae8838"
  }

In your Ruby project, you want to parse this data and display the name with the following conditions:

  1. If the username exists, return the username.
  2. If the nickname, first name, and last name exist, return the nickname, first name, and then the last name.
  3. If the nickname doesn’t exist, but the first and last name do, return the first name and then the last name.
  4. If none of the conditions apply, return “New User.”

This is how I would write this program in Ruby right now:

def display_name(name_hash)
  if name_hash[:username]
    name_hash[:username]
  elsif name_hash[:nickname] && name_hash[:realname] && name_hash[:realname][:first] && name_hash[:realname][:last]
    "#{name_hash[:nickname]} #{name_hash[:realname][:first]} #{name_hash[:realname][:last]}"
  elsif name_hash[:realname] && name_hash[:realname][:first] && name_hash[:realname][:last]
    "#{name_hash[:realname][:first]} #{name_hash[:realname][:last]}"
  else
    'New User'
  end
end

Now, let’s see what it looks like with pattern matching:

def display_name(name_hash)
  case name_hash
  in {username: username}
    username
  in {nickname: nickname, realname: {first: first, last: last}}
    "#{nickname} #{first} #{last}"
  in {realname: {first: first, last: last}}
    "#{first} #{last}"
  else
    'New User'
  end
end

Syntax preference can be a little subjective, but I do prefer the pattern matching version. This is because pattern matching allows us to write out the hash we expect, instead of describing and checking the values of the hash. This makes it easier to visualize what data to expect:

`{nickname: nickname, realname: {first: first, last: last}}` 

Instead of:

`name_hash[:nickname] && name_hash[:realname] && name_hash[:realname][:first] && name_hash[:realname][:last]`.
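The method can be written a little more compactly with the shorthand key syntax, where username: on its own binds the value to a local variable of the same name. The sketch below assumes the hash uses symbol keys and that the first and last names live under :realname; the sample calls exercise each of the four rules.

```ruby
def display_name(name_hash)
  case name_hash
  in {username:}
    username
  in {nickname:, realname: {first:, last:}}
    "#{nickname} #{first} #{last}"
  in {realname: {first:, last:}}
    "#{first} #{last}"
  else
    'New User'
  end
end

real = {first: 'Noppakun', last: 'Wongsrinoppakun'}

puts display_name({username: 'tae8838', nickname: 'Tae'})  # prints "tae8838"
puts display_name({nickname: 'Tae', realname: real})       # prints "Tae Noppakun Wongsrinoppakun"
puts display_name({realname: real})                        # prints "Noppakun Wongsrinoppakun"
puts display_name({})                                      # prints "New User"
```

One difference from the if/elsif version is worth keeping in mind: a hash pattern checks that the key is present, so a username key whose value is nil would still match the first branch.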

Deconstruct and Deconstruct_keys

Two new special methods are being introduced in Ruby 2.7: deconstruct and deconstruct_keys. When an instance of a class is matched against an array pattern or a hash pattern, deconstruct or deconstruct_keys is called, respectively.

The results from these methods will be used to match against patterns. Here is an example:

class Coordinate
  attr_accessor :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def deconstruct
    [@x, @y]
  end

  def deconstruct_keys(keys)
    {x: @x, y: @y}
  end
end

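The code defines a class called Coordinate with x and y as its attributes. It also defines the deconstruct and deconstruct_keys methods. deconstruct_keys receives the keys the pattern asks for as its argument; this simple version ignores it and returns every attribute.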

c = Coordinate.new(32,50)

case c
in [a,b]
  p a #=> 32
  p b #=> 50
end

Here, an instance of Coordinate is created and pattern matched against an array. Coordinate#deconstruct is called and its result is matched against the array [a,b] defined in the pattern.

case c
in {x:, y:}
  p x #=> 32
  p y #=> 50
end

In this example, the same instance of Coordinate is pattern matched against a hash. In this case, the Coordinate#deconstruct_keys result is matched against the hash pattern {x:, y:}, which is shorthand for {x: x, y: y}.
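Putting the pieces together, here is a self-contained sketch that matches one object against both array and hash patterns. The describe_point helper and its labels are hypothetical; the class mirrors the Coordinate example, with deconstruct_keys taking the keys argument that Ruby passes in.

```ruby
class Coordinate
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called for array patterns such as `in [a, b]`.
  def deconstruct
    [x, y]
  end

  # Called for hash patterns such as `in {x:, y:}`. `keys` holds the keys
  # the pattern asks for (or nil when the pattern uses **rest). This
  # simple version ignores it and always returns both attributes.
  def deconstruct_keys(keys)
    {x: x, y: y}
  end
end

# Hypothetical helper: describes where a point sits.
def describe_point(point)
  case point
  in [0, 0]
    "origin"
  in {x:, y: 0}
    "on the x-axis at #{x}"
  in {x:, y:}
    "at (#{x}, #{y})"
  else
    # Objects without deconstruct/deconstruct_keys match no pattern above.
    "not a point"
  end
end

puts describe_point(Coordinate.new(0, 0))     # prints "origin"
puts describe_point(Coordinate.new(3, 0))     # prints "on the x-axis at 3"
puts describe_point(Coordinate.new(32, 50))   # prints "at (32, 50)"
puts describe_point("hello")                  # prints "not a point"
```

A class with many attributes could use the keys argument to build only the entries the pattern needs, but returning everything is the simplest correct choice for a two-field object.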

An Exciting Experimental Feature

Having first experienced pattern matching in Elixir, I had thought this feature might include method overloading and be implemented with a syntax that only requires one line. However, Ruby isn’t a language that was built with pattern matching in mind, so this is understandable.

Using a case statement is probably a very lean way of implementing this and also does not affect existing code (apart from deconstruct and deconstruct_keys methods). The use of the case statement is actually similar to that of Scala’s implementation of pattern matching.

Personally, I think pattern matching is an exciting new feature for Ruby developers. It has the potential to make code a lot cleaner and make Ruby feel a bit more modern and exciting. I would love to see what people make of this and how this feature evolves in the future.