Unless it's UTF-16. Or UTF-32. Or UTF-7. And so on. Even disregarding that, your assumption is awfully shaky - the file can easily have comments, a <title> tag, and whatever else before that.
Also, using a module to decode entities is ridiculous overkill if all you need is numeric character references (&#233;, &#xE9;), which can be done in two regexps. Named entities like &amp; are another matter.
$text=~s/&#([0-9]+);/chr($1)/ge;
$text=~s/&#[xX]([0-9a-fA-F]+);/chr(hex($1))/ge;
Or just one regexp, if you want to be tricky:
$text=~s/&#(([0-9]+)|[xX]([0-9a-fA-F]+));/chr(defined $2 ? $2 : hex($3))/ge;
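For what it's worth, here's the one-regexp version wrapped in a small script you can actually run. The sub name decode_numeric_refs is just something made up for illustration, and it uses a non-capturing group so the captures are $1 and $2 instead of $2 and $3. The defined check matters: a plain `or` treats &#0; as false and falls through to hex() on an undefined capture. It only handles decimal and hex references (either case of x), leaves named entities like &amp; alone, and makes no attempt to reject invalid code points such as surrogates or values above 0x10FFFF.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes numeric character references only
# (&#65;, &#x41;, &#X41;). Named entities such as &amp; are left as-is.
sub decode_numeric_refs {
    my ($text) = @_;
    # $1 = decimal digits, $2 = hex digits; exactly one is defined per match.
    $text =~ s/&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));/chr(defined $1 ? $1 : hex($2))/ge;
    return $text;
}

my $sample  = 'caf&#233; &#x263A; &amp; &#X41;';
my $decoded = decode_numeric_refs($sample);

# Output may contain non-ASCII characters, so encode STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

The &amp; in the sample survives untouched, which is the point: decoding that too needs an entity table, and that's where a module starts to earn its keep.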