MP3::Info and Unicode
We figured we could just identify which strings are UTF-16 (the default Unicode encoding for ID3v2; UTF-8 isn't supported until ID3v2.4.0, which most software doesn't support yet) and convert them to UTF-8.
if ($uniconvert && ($encoding eq "\001" || $encoding eq "\002")) { # UTF-16, UTF-16BE
    my $u = Unicode::String::utf16($data);
    $data = $u->utf8;
}
That worked fine, until we realized that Unicode::String was leaving in the byte-order mark (BOM), and we don't want that. So we strip it out after the fact:
$data =~ s/^\xEF\xBB\xBF//; # strip BOM
Hopefully, that's the right thing. And it seems to work.
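For comparison, here is a rough sketch of the same conversion done with the core Encode module (which ships with Perl 5.8 and later) instead of Unicode::String. This is not what MP3::Info does; the helper name id3_text_to_utf8 is made up for illustration. The idea is that Encode's 'UTF-16' decoder reads the BOM and drops it, so there is nothing to strip afterward, while 'UTF-16BE' is for data that carries no BOM.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of MP3::Info.
# $encoding is the ID3v2 text-encoding byte, $data the raw tag bytes.
sub id3_text_to_utf8 {
    my ($encoding, $data) = @_;
    my $text;
    if ($encoding eq "\001") {          # UTF-16 with a leading BOM
        # The 'UTF-16' decoder uses the BOM to pick the byte order
        # and does not include it in the decoded string.
        $text = decode('UTF-16', $data);
    }
    elsif ($encoding eq "\002") {       # UTF-16BE, no BOM expected
        $text = decode('UTF-16BE', $data);
    }
    else {
        return $data;                   # leave other encodings untouched
    }
    return encode('UTF-8', $text);      # UTF-8 bytes, no BOM
}
```

Note that the 'UTF-16' decoder complains if the BOM is missing, so real-world tags with a bad encoding byte would need extra handling that this sketch leaves out.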
But then we realized that some tags might be Latin-1 and others might be UTF-8; so what to do? Well, we can convert everything to UTF-8, which will be fine, except that it will break things that expect everything to be in Latin-1.
Bah.
I think we're going to make a switch of some kind to tell MP3::Info to convert everything to UTF-8. Bah, again, I say!
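Something like the following is roughly what such a switch could look like. It is only a sketch: the variable $want_utf8 and the function tag_text are invented names, not MP3::Info's actual interface, and it only handles the Latin-1 case (ID3v2 encoding byte 0). With the switch off, Latin-1 callers get their bytes back unchanged; with it on, Latin-1 text is re-encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical switch; the name is an assumption, not MP3::Info's API.
our $want_utf8 = 0;

# Hypothetical helper: return tag text, converted to UTF-8 only on request.
sub tag_text {
    my ($encoding, $data) = @_;

    # Switch off: hand back the bytes as stored, so code that expects
    # Latin-1 keeps working.
    return $data unless $want_utf8;

    if ($encoding eq "\000") {          # ISO-8859-1 (Latin-1)
        return encode('UTF-8', decode('ISO-8859-1', $data));
    }
    return $data;                       # other encodings not handled here
}
```

Defaulting the switch to off keeps existing Latin-1 users from being surprised; anyone who wants UTF-8 everywhere sets it once before reading tags.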