MP3::Info and Unicode

So Che_Fox wants MP3::Info to handle Unicode strings. Well, he and others had recently helped me fix some problems with MP3::Info on ID3v2 tags and encoding bytes, so sure, let's look at it.

We figured we could just identify which strings are UTF-16 (the Unicode encoding ID3v2 uses; UTF-8 is not supported until ID3v2.4.0, which most software doesn't support yet) and convert them to UTF-8.

if ($uniconvert && ($encoding eq "\001" || $encoding eq "\002")) {  # UTF-16, UTF-16BE
    my $u = Unicode::String::utf16($data);
    $data = $u->utf8;
}

That worked fine, until we realized that Unicode::String was leaving in the byte-order mark (BOM), which we don't want. So we strip it out after the fact:

    $data =~ s/^\xEF\xBB\xBF//;    # strip BOM

Hopefully, that's the right thing. And it seems to work.
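
For comparison, here is a minimal sketch of the same conversion done with the Encode module instead of Unicode::String. This is not what MP3::Info does; utf16_to_utf8 is a made-up helper name, and the sketch assumes a Perl that ships Encode (5.8 or later). Encode's 'UTF-16' decoder reads the BOM to pick the byte order and leaves it out of the result, so no separate strip step should be needed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of MP3::Info: turn UTF-16 tag bytes
# into UTF-8 bytes. $encoding is the ID3v2 text-encoding byte.
sub utf16_to_utf8 {
    my ($encoding, $data) = @_;

    # "\001" is UTF-16 with a BOM; "\002" is UTF-16BE without one.
    my $name = $encoding eq "\001" ? 'UTF-16' : 'UTF-16BE';

    # For 'UTF-16', Encode uses the BOM to choose the byte order and
    # does not include it in the decoded string.
    my $chars = decode($name, $data);

    return encode('UTF-8', $chars);
}

# BOM (little-endian) followed by "A" and e-acute
my $tag  = "\xFF\xFE" . "A\x00\xE9\x00";
my $utf8 = utf16_to_utf8("\001", $tag);
```

One caveat: with the 'UTF-16' name, Encode refuses input that has no BOM at all, so a tag that claims encoding "\001" but omits the BOM would need its own handling.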

But then we realized that some tags might be Latin-1 and others might be UTF-8; so what to do? Well, we can convert everything to UTF-8, which will be fine, except that it will break things that want everything to be in Latin-1.

Bah.

I think we're going to add a switch of some kind to tell MP3::Info to convert everything to UTF-8. Bah, again, I say!
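
A rough sketch of what such a switch could look like, again using Encode rather than Unicode::String. The names $convert_to_utf8 and tag_text are hypothetical and are not MP3::Info's actual interface; the byte-to-encoding table follows the ID3v2 text-encoding byte, where "\003" (UTF-8) only exists in ID3v2.4. With the switch off, the bytes pass through untouched, so anything expecting Latin-1 keeps working.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# ID3v2 text-encoding bytes mapped to Encode names.
my %ENCODING_NAME = (
    "\000" => 'ISO-8859-1',
    "\001" => 'UTF-16',      # with BOM
    "\002" => 'UTF-16BE',    # no BOM (ID3v2.4)
    "\003" => 'UTF-8',       # ID3v2.4 only
);

# Hypothetical package-level switch; off by default so callers that
# expect the raw bytes see no change.
our $convert_to_utf8 = 0;

# Hypothetical helper: returns $data untouched unless the switch is on.
sub tag_text {
    my ($encoding, $data) = @_;
    return $data unless $convert_to_utf8;

    my $name = $ENCODING_NAME{$encoding};
    return $data unless defined $name;    # unknown encoding byte: leave as-is

    return encode('UTF-8', decode($name, $data));
}
```

Defaulting to off is a design guess: it keeps existing callers from suddenly getting UTF-8 where they used to get Latin-1.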


About this Entry

This page contains a single entry by pudge published on February 15, 2002 8:58 AM.
