Why are emoji characters like 👩โ€👩โ€👧โ€👦 treated so strangely in Swift strings?

Cover Image for Why are emoji characters like 👩โ€👩โ€👧โ€👦 treated so strangely in Swift strings?
Matheus Mello
Matheus Mello
published a few days ago. updated a few hours ago

Why are emoji characters like ๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ treated so strangely in Swift strings?

๐Ÿค” Have you ever wondered why emoji characters like ๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ are not behaving as expected in Swift strings? It can be quite frustrating when you try to perform operations like checking if a string contains a certain emoji character, and Swift gives unexpected results. ๐Ÿ˜ฉ

In this blog post, we will explore why Swift treats emoji characters with zero-width joiners (ZWJ) like ๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ in such a strange manner. We will also provide easy solutions and tips to work around this issue. So, let's dive in! ๐ŸŠโ€โ™€๏ธ

The Encoding Mystery:

First, let's examine how the emoji character ๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ is encoded. It consists of the following Unicode characters:

  1. U+1F469 WOMAN

  2. U+200D ZWJ (Zero-Width Joiner)

  3. U+1F469 WOMAN

  4. U+200D ZWJ

  5. U+1F467 GIRL

  6. U+200D ZWJ

  7. U+1F466 BOY

This character encoding is interesting, but unfortunately, Swift doesn't handle it as expected. Let's take a look at some examples that illustrate this behavior.

Unexpected Results in Swift:

Consider the following Swift code snippets:

"๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ".contains("๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ") // true
"๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ".contains("๐Ÿ‘ฉ") // false
"๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ".contains("\u{200D}") // false
"๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ".contains("๐Ÿ‘ง") // false
"๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ".contains("๐Ÿ‘ฆ") // true

๐Ÿ’ญ It's confusing, right? Swift claims that the string "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ" contains itself and the boy emoji "๐Ÿ‘ฆ", but not the woman emoji "๐Ÿ‘ฉ", girl emoji "๐Ÿ‘ง", or the ZWJ character "โ€". ๐Ÿค”

The Culprit: Grapheme Clusters:

To understand this strange behavior, we need to dive into the concept of grapheme clusters in Swift. A grapheme cluster is the smallest unit of text that is perceived as a single unit by users. In simple terms, emojis combined with ZWJ are treated as a single grapheme cluster in Swift.

When you call the contains method on a string, Swift looks for complete grapheme clusters. But in the case of characters like "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ", Swift treats it as a single grapheme cluster, and hence, it searches for the entire cluster in the string. This explains why the search for individual components like "๐Ÿ‘ฉ" or "๐Ÿ‘ง" fails. ๐Ÿ˜ฑ

Solutions and Workarounds:

Now that we understand why Swift behaves strangely, let's explore some solutions and workarounds to deal with this issue.

  1. Splitting the String: To work with individual components, you can split the string into an array using the characters property. Let's take a look at an example:

let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
Array(manual.characters) // ["๐Ÿ‘ฉโ€", "๐Ÿ‘ฉโ€", "๐Ÿ‘งโ€", "๐Ÿ‘ฆ"]

By splitting the string, you can access individual components, but remember that the ZWJ characters might not be reflected in the resulting array. So, searching for individual components using methods like contains may still give unexpected results.

  1. Using Unicode Scalars: Another approach is to work with Unicode scalars directly. You can access individual components using Unicode scalar representations. Here's an example:

let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let unicodeScalars = manual.unicodeScalars.map { String($0) }
unicodeScalars.contains("๐Ÿ‘ฉ") // true
unicodeScalars.contains("๐Ÿ‘ง") // true
unicodeScalars.contains("๐Ÿ‘ฆ") // true

By accessing the individual scalars, you can perform operations more accurately.

  1. Using Regular Expressions: If you need more complex operations on emoji strings, you can leverage regular expressions. Regular expressions offer powerful pattern matching capabilities to search for specific components or patterns in a string. Here's an example using regular expressions:

let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let pattern = "๐Ÿ‘ฉ|๐Ÿ‘ง|๐Ÿ‘ฆ"
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.numberOfMatches(in: manual, range: NSRange(location: 0, length: manual.utf16.count))

Regular expressions give you flexibility and control over matching specific components or patterns in emoji strings.

Join the Conversation:

๐ŸŽ‰ Now that you have discovered why Swift treats emoji characters with ZWJ strangely, it's time to join the conversation! Share your thoughts, experiences, and any other workarounds you have found in the comments section below. Let's help each other make the most out of Swift and emoji characters!

Remember to spread the word by sharing this blog post with your fellow Swift enthusiasts and developers. Together, we can conquer the emoji encoding mysteries! ๐Ÿ’ช

Stay tuned for more exciting and informative blog posts on our tech blog. Happy coding! ๐Ÿ˜Šโœจ


More Stories

Cover Image for How can I echo a newline in a batch file?

How can I echo a newline in a batch file?

updated a few hours ago
batch-filenewlinewindows

๐Ÿ”ฅ ๐Ÿ’ป ๐Ÿ†’ Title: "Getting a Fresh Start: How to Echo a Newline in a Batch File" Introduction: Hey there, tech enthusiasts! Have you ever found yourself in a sticky situation with your batch file output? We've got your back! In this exciting blog post, we

Matheus Mello
Matheus Mello
Cover Image for How do I run Redis on Windows?

How do I run Redis on Windows?

updated a few hours ago
rediswindows

# Running Redis on Windows: Easy Solutions for Redis Enthusiasts! ๐Ÿš€ Redis is a powerful and popular in-memory data structure store that offers blazing-fast performance and versatility. However, if you're a Windows user, you might have stumbled upon the c

Matheus Mello
Matheus Mello
Cover Image for Best way to strip punctuation from a string

Best way to strip punctuation from a string

updated a few hours ago
punctuationpythonstring

# The Art of Stripping Punctuation: Simplifying Your Strings ๐Ÿ’ฅโœ‚๏ธ Are you tired of dealing with pesky punctuation marks that cause chaos in your strings? Have no fear, for we have a solution that will strip those buggers away and leave your texts clean an

Matheus Mello
Matheus Mello
Cover Image for Purge or recreate a Ruby on Rails database

Purge or recreate a Ruby on Rails database

updated a few hours ago
rakeruby-on-railsruby-on-rails-3

# Purge or Recreate a Ruby on Rails Database: A Simple Guide ๐Ÿš€ So, you have a Ruby on Rails database that's full of data, and you're now considering deleting everything and starting from scratch. Should you purge the database or recreate it? ๐Ÿค” Well, my

Matheus Mello
Matheus Mello