Converting data to a string is a common task in the context of networking or file input/output. In this episode, you learn how to convert a Data object to a String object in Swift.
What Are Data and String?
Before we write code, I want you to understand what Data and String (both structs) are. Understanding how these types are designed makes it easier to follow along. A Data object represents a collection of bytes and is commonly used to work with binary data. A String object represents a collection of characters, that is, Character objects.
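To make this concrete, here is a minimal sketch. The byte values and the text are just illustrative; they show the same two letters once as raw bytes and once as characters.
import Foundation

// Two bytes, 72 and 105, the ASCII/UTF-8 codes for "H" and "i".
let bytes = Data([72, 105])
print(bytes.count) // 2

// The same text as a String, a collection of two Character values.
let text = "Hi"
print(text.count) // 2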
Creating a Data Object
You are interested in converting a Data object to a String object, but let me start by showing you how you can convert a String object to a Data object. Fire up Xcode and create a playground. Add an import statement for the Foundation framework.
import Foundation
Declare a string.
import Foundation
let string = "Hello from Cocoacasts!"
You can convert the String object to a Data object by invoking the data(using:) method on the String object. The data(using:) method defines one parameter of type String.Encoding. We pass in utf8, but note that there are plenty more options.
import Foundation
let string = "Hello from Cocoacasts!"
let data = string.data(using: .utf8)
The data(using:) method returns an optional Data object. If the conversion was successful, then the Data object contains a representation of the String object encoded using the encoding you defined, utf8 in this example.
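Why optional? If the string contains characters the chosen encoding cannot represent, data(using:) returns nil. Here is a small sketch, with an illustrative greeting string, that compares the ascii and utf8 encodings.
import Foundation

let greeting = "Hello 👋"

// ASCII cannot represent the emoji, so the conversion fails.
let asciiData = greeting.data(using: .ascii)
print(asciiData == nil) // true

// UTF-8 can represent every Unicode character, so the conversion succeeds.
let utf8Data = greeting.data(using: .utf8)
print(utf8Data != nil) // true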
How to Convert a Data to a String in Swift
Let's now focus on converting the Data object back to a String object. We create a String object from a Data object by invoking one of the initializers of the String struct, init(data:encoding:). It is important that we use the same encoding we used to encode the string.
import Foundation
let string = "Hello from Cocoacasts!"
let data = string.data(using: .utf8)
if let data, let string = String(data: data, encoding: .utf8) {
print(string)
}
There are a few things I want to point out. We use optional binding to access the Data object because the data(using:) method returns an optional Data object. We also need to safely unwrap the result of the initializer because the initializer is failable.
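If you prefer a non-optional result, the standard library also offers String(decoding:as:). That initializer is not failable; instead of returning nil, it replaces any invalid byte sequences with the Unicode replacement character. A brief sketch, repeating the earlier setup:
import Foundation

let string = "Hello from Cocoacasts!"
let data = string.data(using: .utf8)

if let data {
    // Never fails; invalid bytes become the replacement character.
    let decoded = String(decoding: data, as: UTF8.self)
    print(decoded) // Hello from Cocoacasts!
}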
Why UTF-8?
You may be wondering why we opted for the UTF-8 encoding. There are several reasons for choosing UTF-8.
Backward Compatibility with ASCII: UTF-8 is backward-compatible with ASCII. In the past, ASCII was widely used to represent English characters. Any file that is valid ASCII is also a valid UTF-8 file.
Unicode Support: UTF-8 can represent any character in the Unicode standard. This is important if you consider the wide range of characters used in different languages around the globe.
Efficiency for English Text: English is still the most common language used on the web. For English text, UTF-8 uses the same amount of space as ASCII, one byte per character. One byte per character is more space-efficient compared to other Unicode encodings, such as UTF-16 or UTF-32.
Self-Synchronization: UTF-8 has a self-synchronization property. Euh ... what? This simply means that if bytes are lost due to errors or corruption, it is easier to find the start of the next character compared to fixed-size encodings. Self-synchronization makes UTF-8 more resilient in environments where errors can occur.
Widespread Adoption: The widespread adoption of UTF-8 in various systems and technologies (e.g., operating systems, programming languages) has strengthened its position as the most common encoding. For example, it is the default charset for XML and HTML 5.
Variable Width Character Encoding: UTF-8 uses a variable-width character encoding that can represent every character in the Unicode standard, yet is backward-compatible with ASCII. The benefit is that it can represent characters from a wide range of scripts and languages without wasting space for scripts with fewer characters. The sketch after this list shows what that looks like in practice.
Standardization: Many, many standards and protocols have adopted UTF-8 as the default (or recommended) encoding. As you may have guessed, that further promotes or encourages its use, making it a de facto standard in many areas of computing and technologies.
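The following playground sketch, with an illustrative pure-ASCII string, shows the backward compatibility with ASCII in practice. Every byte of the string's UTF-8 representation matches the corresponding ASCII code, so the byte count equals the character count.
import Foundation

let ascii = "Swift"

// The UTF-8 bytes of a pure-ASCII string are the ASCII codes themselves.
print(Array(ascii.utf8)) // [83, 119, 105, 102, 116]
print(ascii.utf8.count)  // 5, one byte per character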
For the Curious ...
UTF-8 stands for Unicode Transformation Format - 8-bit. Unicode refers to the Unicode standard, a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
Transformation Format indicates that the encoding is a transformation format. What does that mean? It simply means that it is a way of transforming a set of Unicode code points into a series of bytes.
The 8 in UTF-8 means that it uses 8-bit blocks to represent a character. UTF-8 is a variable-width character encoding that can represent every character in the Unicode standard. It does this by using one to four 8-bit bytes to encode characters.
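Here is a small sketch of that variable width in action. The characters are just illustrative examples, each needing a different number of UTF-8 bytes.
import Foundation

// One to four bytes per character, depending on the code point.
for character in "aé€😀" {
    print(character, String(character).utf8.count)
}
// a 1
// é 2
// € 3
// 😀 4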