UTF-8 all the way through
I’m setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.
Where exactly do I need to set the encoding/charsets? I’m aware that I need to configure Apache, MySQL, and PHP to do this — is there some standard checklist I can follow, or perhaps troubleshoot where the mismatches occur?
This is for a new Linux server, running MySQL 5, PHP 5, and Apache 2.
Unicode support in PHP is still a huge mess. While it's capable of converting an ISO-8859-1 string to UTF-8, it has no native Unicode string type: strings are plain byte sequences, which means the byte-oriented string functions (strlen, substr, strrev, and so on) can mangle and corrupt multibyte UTF-8 strings. So you have to either use a separate library or extension for proper UTF-8 support (mbstring, for example), or rewrite the string handling functions yourself.
The easy part is specifying the charset in the HTTP headers, in the database, and so on, but none of that matters if your PHP code doesn't output valid UTF-8. That's the hard part, and PHP gives you virtually no help there. (I think PHP 6 is supposed to fix the worst of this, but that's still a while away.)
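To make the mangling concrete, here is a minimal sketch, assuming the mbstring extension is loaded. The sample string and variable names are made up for illustration. Byte-oriented functions see é as two bytes, while the mb_* functions see one character.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$text = "\xC3\xA9";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($text);             // 2
$charLength = mb_strlen($text, 'UTF-8'); // 1

// strrev() reverses bytes, which splits the multibyte sequence
// and leaves a string that is no longer valid UTF-8.
$reversed = strrev($text);
$stillValid = mb_check_encoding($reversed, 'UTF-8'); // false

var_dump($byteLength, $charLength, $stillValid);
```

Running mb_check_encoding() on output before sending it is one cheap way to find where a string got corrupted.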
If you want MySQL server to decide character set, and not PHP as a client (old behaviour; preferred, in my opinion), try adding skip-character-set-client-handshake to your my.cnf, under [mysqld], and restart mysql.
This may cause trouble if you're using anything other than UTF-8.
The top answer is excellent. Here is what I had to do on a regular Debian/PHP/MySQL setup:
// storage
// debian. apparently already utf-8

// retrieval
// the mysql database was stored in utf-8,
// but apparently php was requesting iso. this worked:
// ***notice "utf8", without dash, this is a mysql encoding***
mysql_set_charset('utf8');

// delivery
// php.ini did not have a default charset,
// (it was commented out, shared host) and
// no http encoding was specified in the apache headers.
// this made apache send out a utf-8 header
// (and perhaps made php actually send out utf-8)
// ***notice "utf-8", with dash, this is a php encoding***
ini_set('default_charset','utf-8');

// submission
// this worked in all major browsers once apache
// was sending out the utf-8 header. i didnt add
// the accept-charset attribute.

// processing
// changed a few commands in php, like substr,
// to mb_substr
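The delivery and processing steps above could be collected into a small bootstrap like the following. This is only a sketch, assuming the mbstring extension is available. The sample string is invented for illustration.

```php
<?php
// Delivery: have PHP append "charset=UTF-8" to its default Content-Type header.
ini_set('default_charset', 'UTF-8');

// Processing: make the mb_* functions assume UTF-8 when no encoding is passed.
mb_internal_encoding('UTF-8');

// "naïve" with the ï written as its UTF-8 bytes (0xC3 0xAF).
$word = "na\xC3\xAFve";

// substr() cuts after 3 bytes, slicing the ï in half.
$broken = substr($word, 0, 3);

// mb_substr() cuts after 3 characters and keeps the ï intact.
$intact = mb_substr($word, 0, 3);

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump($intact === "na\xC3\xAF");            // bool(true)
```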
That was all!
If you want a MySQL solution: I had similar issues with two of my projects after a server migration. After searching and trying a lot of solutions, I came across this one (nothing before it worked):
mysqli_set_charset($con,"utf8");
After adding this line to my config file everything works fine!
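For context, here is a slightly fuller sketch of where that call might sit in a config file. The function name and the credentials in the usage comment are placeholders, not taken from any real setup. The charset is a parameter so the same helper would also accept 'utf8mb4', which MySQL 5.5.3 and later provide for characters outside the Basic Multilingual Plane.

```php
<?php
// Hypothetical helper: open a connection and switch it to the given
// charset before any query runs.
function open_utf8_connection($host, $user, $password, $database, $charset = 'utf8')
{
    $con = mysqli_connect($host, $user, $password, $database);
    if (!$con) {
        throw new RuntimeException('Connection failed: ' . mysqli_connect_error());
    }

    // mysqli_set_charset() returns false if the charset could not be set.
    if (!mysqli_set_charset($con, $charset)) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($con));
    }

    return $con;
}

// Usage (placeholder credentials):
// $con = open_utf8_connection('localhost', 'app_user', 'secret', 'app_db');
// echo mysqli_character_set_name($con); // reports the charset now in effect
```

Setting the charset right after connecting, in one place, avoids the situation where some code paths query the database before the charset has been switched.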
I found this solution at https://www.w3schools.com/PHP/func_mysqli_set_charset.asp when I was trying to fix an insert from an HTML query.
Good luck!
Just a note:
You are facing the problem of your non-Latin characters showing as ?????????, you asked a question, and it got closed with a reference to this canonical question. You tried everything, and no matter what you do you still get ?????????? from MySQL.
That is mostly because you are testing on old data that was inserted into the database using the wrong charset, so it was converted to and stored as actual question mark characters (?). This means you have lost your original text forever, and no matter what you try you will get ???????.
Reapplying what you have learned from the answers to this question on fresh data could solve your problem.
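The one-way nature of that loss can be imitated in plain PHP. This is an analogy, not what MySQL runs internally: it assumes the mbstring extension, and uses ISO-8859-1 as a stand-in for a wrongly configured column or connection.

```php
<?php
// Make the substitute character explicit; "?" (0x3F) is also mbstring's default.
mb_substitute_character(0x3F);

// Two Japanese characters as UTF-8 bytes; neither exists in ISO-8859-1.
$original = "\xE6\x97\xA5\xE6\x9C\xAC";

// Converting to a charset that cannot represent them substitutes "?".
$stored = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Converting back cannot recover anything: the bytes really are 0x3F now.
$readBack = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');

var_dump($stored);            // string(2) "??"
var_dump(bin2hex($readBack)); // string(4) "3f3f"
```

If a hex dump of a stored value shows 3f bytes where the text should be, that row is in this state and only re-inserting the original text will fix it.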