Strings

While Move does not have a built-in type to represent strings, it does offer two standard implementations for strings:

  • The std::string module defines a String type and methods for working with UTF-8 encoded strings.
  • The std::ascii module provides an ASCII String type along with its methods.
Tip

The IOTA execution environment automatically converts byte vectors into String types in transaction inputs. Therefore, in many cases, you don't need to manually construct a String in the Transaction Block.
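
As a rough illustration of the tip above, a function can declare a String parameter directly. The module and function names below are hypothetical, and the conversion from bytes happens in the execution environment, not in this code.

```move
module book::string_input {
    use std::string::String;

    /// Hypothetical function: the caller supplies bytes as a transaction
    /// input, and the function receives them as a `String`.
    public fun name_length(name: String): u64 {
        // `length()` counts bytes, not characters
        name.length()
    }
}
```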

Strings Are Bytes

Regardless of the string type you choose, it's important to understand that strings are fundamentally just bytes. The string and ascii modules provide wrappers for these bytes, offering safety checks and methods to work with strings. However, at the core, they remain vectors of bytes.

    module book::custom_string {
        /// Anyone can implement a custom string-like type by wrapping a vector.
        public struct MyString {
            bytes: vector<u8>,
        }

        /// Implement a `from_bytes` function to convert a vector of bytes to a string.
        public fun from_bytes(bytes: vector<u8>): MyString {
            MyString { bytes }
        }

        /// Implement a `bytes` function to convert a string to a vector of bytes.
        public fun bytes(self: &MyString): &vector<u8> {
            &self.bytes
        }
    }

Working with UTF-8 Strings

Although the standard library includes two types of strings, the string module should generally be considered the default choice. It includes native implementations of many common operations, making it more efficient than the ascii module, which is fully implemented in Move.

Definition

The String type in the std::string module is defined as follows:

    public struct String has copy, drop, store {
        bytes: vector<u8>,
    }

Creating a String

To create a new UTF-8 String instance, use the string::utf8 function.

    // the module is `std::string` and the type is `String`
    use std::string::{Self, String};

    // strings are normally created using the `utf8` function
    // type declaration is not necessary, we put it here for clarity
    let hello: String = string::utf8(b"Hello");

    // The `.to_string()` alias on the `vector<u8>` is more convenient
    let hello = b"Hello".to_string();

Common Operations

The UTF-8 String type provides several methods for working with strings. The most common operations are append, sub_string, length, and is_empty. For custom string operations, the as_bytes() method gives access to the underlying byte vector.

    // example values, chosen so that the results in the comments below hold
    let mut str = b"Hello,".to_string();
    let another = b" World".to_string();

    // `append(String)` adds the content to the end of the string
    str.append(another); // "Hello, World"

    // `sub_string(start, end)` copies a slice of the string
    str.sub_string(0, 5); // "Hello"

    // `length()` returns the number of bytes in the string
    str.length(); // 12 (bytes)

    // methods can also be chained! Get the length of a substring
    str.sub_string(0, 5).length(); // 5 (bytes)

    // check if the string is empty
    str.is_empty(); // false

    // get the underlying byte vector for custom operations
    let bytes: &vector<u8> = str.as_bytes();

Safe UTF-8 Operations

The default utf8 function aborts if the bytes passed into it are not valid UTF-8. If you're unsure whether the bytes are valid, use the try_utf8 function instead. It returns an Option<String>, which holds a String if the bytes are valid UTF-8 and no value otherwise.

Try

Functions that start with try_* typically return an Option with the expected result or none if the operation fails. This naming convention is borrowed from Rust.

    let hello = b"Hello".try_to_string();

    assert!(hello.is_some(), 0); // abort if the value is not valid UTF-8

    // this is not a valid UTF-8 string
    let invalid = b"\xFF".try_to_string();

    assert!(invalid.is_none(), 0); // abort if the value is valid UTF-8
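
In practice, the Option returned by try_utf8 usually needs to be unwrapped or replaced with a fallback. The following is a minimal sketch of one way to do that. The module name and the `utf8_or_empty` helper are hypothetical and not part of the standard library, and falling back to an empty string is just one possible policy.

```move
module book::safe_utf8 {
    use std::string::{Self, String};

    /// Hypothetical helper: decode `bytes`, falling back to an empty
    /// string when they are not valid UTF-8.
    public fun utf8_or_empty(bytes: vector<u8>): String {
        // `destroy_with_default` unwraps the Option or returns the default
        string::try_utf8(bytes).destroy_with_default(string::utf8(b""))
    }

    #[test]
    fun valid_bytes_are_decoded() {
        assert!(utf8_or_empty(b"Hello") == b"Hello".to_string(), 0);
    }

    #[test]
    fun invalid_bytes_fall_back() {
        assert!(utf8_or_empty(b"\xFF").is_empty(), 0);
    }
}
```

A caller that must distinguish invalid input from a genuinely empty string should keep the Option and check it with is_some() or is_none() instead.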

UTF-8 Limitations

The string module does not provide a way to access individual characters in a string. This limitation arises because UTF-8 is a variable-length encoding, and the length of a character can range from 1 to 4 bytes. Similarly, the length() method returns the number of bytes in the string, not the number of characters.

However, methods like sub_string and insert ensure character boundaries are respected and will abort if the index is in the middle of a character.
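
The two points above can be sketched with a string that contains a multi-byte character. The module and test names are hypothetical. The example uses a hex literal because the bytes of `é` (0xC3 0xA9) fall outside the ASCII range.

```move
#[test_only]
module book::utf8_limits_tests {
    use std::string::String;

    #[test]
    fun length_counts_bytes() {
        // "héllo": `é` is encoded as two bytes (0xC3 0xA9)
        let s: String = x"68c3a96c6c6f".to_string();
        assert!(s.length() == 6, 0); // 5 characters, 6 bytes

        // indices 1 and 3 both fall on character boundaries
        assert!(s.sub_string(1, 3) == x"c3a9".to_string(), 1);
    }

    #[test]
    #[expected_failure]
    fun sub_string_aborts_mid_character() {
        let s = x"68c3a96c6c6f".to_string();
        // index 2 points at the second byte of `é`, so this call aborts
        s.sub_string(0, 2);
    }
}
```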
