Completely confused trying to render UTF-8/unicode strings


#1

Working in C++, Visual Studio 2015, ImGui version 1.67

I have text strings coming from a database, and these strings may contain non-ASCII characters encoded as UTF-8 (Unicode). The strings are names of people and locations.

At this point, I’m only trying to get the extended Western characters for the Latin languages working. I’m loading the glyphs correctly: ImGui text-entry fields correctly display UTF-8 umlauts and so on when entered interactively. But when I render text acquired from the database using ImGui::Text(), or display it inside an ImGui::ListBox(), I am not seeing the correct characters.

My first bit of confusion is Omar’s advice on this topic to use ImTextStrToUtf8() to convert std::strings to UTF-8, yet none of the UTF-8 utilities are exposed by the public header. Is one supposed to add externs for these utilities to their local source files? (https://github.com/ocornut/imgui/issues/2046)

The strings I am trying to render begin as std::string variables. I’d show the multiple versions I’ve tried, but I’m all confused now and don’t know whether I need to be using ImTextStrFromUtf8() or ImTextStrToUtf8().

What I’d like to see is a simple example that demonstrates a std::string containing UTF-8 text being rendered. Preferably, the std::string itself ends up holding the converted representation. I’d rather not switch from std::string, which holds all the display strings in the application, to something like ImVector member variables.

Part of my confusion is that the strings render correctly inside some widgets, appearing correct inside ImGui::InputText() and ImGui::InputTextMultiline(), but not when rendered by ImGui::Text() nor when inside an ImGui::ListBox(). In all these cases, the same string bytes are passed to the widgets…

My last failing version just renders a single character. I was trying to put the converted representation back in a std::string with this:

const int stdstring_byte_count = safe_version.size() + 1;
const int utf8_char_count      = ImTextCountCharsFromUtf8(safe_version.c_str(), NULL) + 1;
std::string rendering_text;
rendering_text.resize( stdstring_byte_count * 2 ); // no idea what this should be
ImTextStrFromUtf8( (ImWchar *)rendering_text.c_str(), utf8_char_count, safe_version.c_str(), NULL, NULL);

// at some later point same render:
ImGui::Text(rendering_text.c_str());

#2

[quote=“bsenftner, post:1, topic:144”]
use ImTextStrToUtf8() to convert std::strings to UTF-8,[/quote]

This is not what that function does: it converts UTF-16 (16-bit wchar) to UTF-8.
It is basically the equivalent of WideCharToMultiByte() on Windows.
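To illustrate that direction of conversion, here is a minimal self-contained sketch of what encoding a 16-bit code unit to UTF-8 looks like. This is my own simplified helper, not the ImGui function itself, and it ignores surrogate pairs for brevity (it only handles code points in the Basic Multilingual Plane):

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ImGui): encodes one 16-bit code unit
// as UTF-8 bytes, roughly what ImTextStrToUtf8() does per character.
// Surrogate pairs are ignored for brevity.
std::string EncodeUtf8(uint16_t c)
{
    std::string out;
    if (c < 0x80) {
        out += (char)c;                          // 1 byte:  0xxxxxxx
    } else if (c < 0x800) {
        out += (char)(0xC0 | (c >> 6));          // 2 bytes: 110xxxxx
        out += (char)(0x80 | (c & 0x3F));        //          10xxxxxx
    } else {
        out += (char)(0xE0 | (c >> 12));         // 3 bytes: 1110xxxx
        out += (char)(0x80 | ((c >> 6) & 0x3F)); //          10xxxxxx
        out += (char)(0x80 | (c & 0x3F));        //          10xxxxxx
    }
    return out;
}
```

For example, EncodeUtf8(0x00E9) (U+00E9, ‘é’) yields the two bytes C3 A9 — which is the representation ImGui expects to find in the char buffer you pass it.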

I may be wrong, but I believe this statement is incorrect: the difference doesn’t lie in the widget (all the render code is the same) but in the fact that with InputText your live keyboard input is correctly encoded, whereas your own data sourced from the database is not UTF-8 encoded. If you feed your database data into InputText(), it’ll likely also look incorrect.

std::string holds char and COULD already be UTF-8, but it doesn’t mandate any particular encoding AFAIK.
If your data is not UTF-8 (which would be surprising, to be honest), maybe it is using a legacy local codepage encoding.
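If it does turn out to be a Western single-byte codepage such as Latin-1 (ISO-8859-1), conversion to UTF-8 is simple, because each Latin-1 byte equals its Unicode code point. This is a sketch under that assumption only; Windows-1252 differs in the 0x80–0x9F range, so for the general case you’d use MultiByteToWideChar()/WideCharToMultiByte():

```cpp
#include <string>

// Hypothetical helper: converts a Latin-1 (ISO-8859-1) string to UTF-8.
// In Latin-1 every byte equals its Unicode code point, so bytes >= 0x80
// become a two-byte UTF-8 sequence. (Windows-1252 differs in the
// 0x80..0x9F range; this sketch does not handle that.)
std::string Latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += (char)c;
        } else {
            out += (char)(0xC0 | (c >> 6)); // 110xxxxx
            out += (char)(0x80 | (c & 0x3F)); // 10xxxxxx
        }
    }
    return out;
}
```

The result stays in a std::string, so you can pass it straight to ImGui::Text("%s", s.c_str()) without touching the rest of your string handling.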

You should first PRINT OUT the hexadecimal value of every byte of an example string containing one of the faulty accents to see how it is encoded! Loop over every byte and use printf("%02X", the_byte) and see what you get. Most likely your database doesn’t give you UTF-8 data. From a hexadecimal dump of each byte we can easily tell whether it looks like UTF-8, UTF-16, or a local codepage.
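A sketch of such a dump helper (my own example, not ImGui code) — note the cast to unsigned char, which matters because with a plain signed char, bytes >= 0x80 sign-extend and "%02X" prints FFFFFFC3 instead of C3:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats every byte of a string as two hex digits.
// The unsigned char cast prevents sign extension of bytes >= 0x80.
std::string DumpHex(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof(buf), "%02X ", c);
        out += buf;
    }
    return out;
}
```

For an ‘é’, a UTF-8 string dumps as C3 A9, Latin-1 as E9, and UTF-16LE as E9 00 — so the dump immediately tells you which encoding your database is actually handing you.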

You may also use the Memory Editor to inspect your string data and understand how encodings work: