std::mblen

Defined in header <cstdlib>

int mblen( const char* s, std::size_t n );

Determines the size, in bytes, of the multibyte character whose first byte is pointed to by s.

If s is a null pointer, determines if the current locale's multibyte character encoding is state-dependent.

This function is equivalent to the call std::mbtowc((wchar_t*)0, s, n), except that the conversion state of std::mbtowc is unaffected.
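The equivalence can be checked with a small sketch. The helper name same_as_mbtowc is hypothetical and not part of the standard library; the sketch assumes a single-threaded program, because both functions keep internal conversion state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a standard function): returns true when
// std::mblen and std::mbtowc with a null destination report the same
// result for the multibyte character starting at s.
bool same_as_mbtowc(const char* s, std::size_t n)
{
    assert(s != nullptr);             // a null s is a state query, not a length query
    std::mblen(nullptr, 0);           // reset mblen's internal conversion state
    std::mbtowc(nullptr, nullptr, 0); // reset mbtowc's internal conversion state
    const int via_mblen  = std::mblen(s, n);
    const int via_mbtowc = std::mbtowc(nullptr, s, n);
    return via_mblen == via_mbtowc;
}
```

In the default "C" locale, same_as_mbtowc("a", 1) is expected to hold, since both calls see a single-byte character.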

Parameters

s - pointer to the multibyte character
n - limit on the number of bytes in s that can be examined

Return value

If s is not a null pointer, returns the number of bytes contained in the multibyte character, or -1 if the first bytes pointed to by s do not form a valid multibyte character, or 0 if s points at the null character '\0'.

If s is a null pointer, returns 0 if multibyte encoding is not state-dependent or a non-zero value if multibyte encoding is state-dependent.
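A minimal sketch of how a caller might tell these return values apart. The helper names describe_mblen and encoding_is_state_dependent are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: maps the three kinds of result for a non-null s to text.
std::string describe_mblen(const char* s, std::size_t n)
{
    assert(s != nullptr);
    std::mblen(nullptr, 0);            // reset the internal conversion state
    const int len = std::mblen(s, n);
    if (len == -1)
        return "invalid";              // no valid character within the first n bytes
    if (len == 0)
        return "null character";       // s points at '\0'
    return std::to_string(len) + " byte(s)";
}

// Hypothetical helper: the null-pointer form queries state dependence.
bool encoding_is_state_dependent()
{
    return std::mblen(nullptr, 0) != 0;
}
```

In the default "C" locale, describe_mblen("a", 1) yields "1 byte(s)" and describe_mblen("", 1) yields "null character". Results for bytes outside ASCII depend on the locale selected with std::setlocale.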

Example

#include <clocale>
#include <string>
#include <iostream>
#include <cstdlib>
#include <stdexcept>
 
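// the number of characters in a multibyte string is the number of
// mblen() steps needed to walk through it
std::size_t strlen_mb(const std::string& s)
{
    std::size_t result = 0;
    const char* ptr = s.data();
    const char* end = ptr + s.size();
    std::mblen(nullptr, 0); // reset mblen()'s internal conversion state
    while(ptr < end)
    {
        int next = std::mblen(ptr, end-ptr);
        if(next == -1)
            throw std::runtime_error("strlen_mb(): conversion error");
        if(next == 0)
            next = 1; // embedded '\0' occupies one byte; also avoids an endless loop
        ptr += next;
        ++result;
    }
    return result;
}
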
int main()
{
    // allow mblen() to work with UTF-8 multibyte encoding
    std::setlocale(LC_ALL, "en_US.utf8");
    // UTF-8 narrow multibyte encoding, spelled out as bytes so that it also
    // compiles in C++20, where u8"..." no longer initializes a std::string
    std::string str = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9d\x84\x8b"; // "zß水𝄋"
    std::cout << str << " is " << str.size() << " bytes, but only "
              << strlen_mb(str) << " characters\n";
}

Output:

zß水𝄋 is 10 bytes, but only 4 characters

See also

std::mbtowc - converts the next multibyte character to wide character (function)
std::mbrlen - returns the number of bytes in the next multibyte character, given state (function)
do_length [virtual] - calculates the length of the externT string that would be consumed by conversion into given internT buffer (virtual protected member function of std::codecvt<wchar_t, char, std::mbstate_t>)