Updated the default character set on mysql server to utf8mb4 💩

I had to update the default character set  to utf8mb4 on my mysql server.  I was getting quite a few submissions on my tomcat hosted systems with emojies like the 💩 symbol being inserted.  These always failed with a database error.  Reading up about this it appears the utf8 support of mysql was for up to a 3 byte utf character.  Full support requires 4 bytes, and this was a late addition to mysql.

I dutifully converted the database, and tables to utf8mb4, running into issues with the referential integrity I use.  Database, tables and columns all converted, set the default client and mysql to utf8mb4 and problem solved.  I could now add the 💩 into the application.

Along came the boss, who now complained about all the capital As, with a hat on top, in this the blasdale.com blog.  Arrgh.  Yes in the blasdale database there were some latin1 tables from a really old install of wordpress.  So a conversion of this database and tables to utf8mb4 but still the capital A with a hat appeared.  More googling, yes I had UTF data stored into a latin1 column, so had to run some sql to convert the data. 

update wp_posts SET post_content=convert(cast(convert(post_content using  latin1) as binary) using utf8mb4);

In total I ran the following SQL statements on the server:

ALTER DATABASE blasdale_blog CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE wp_blc_filters CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_blc_links CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_blc_synch CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_commentmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_email_list CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_email_list_config CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_email_list_future CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_hl_twitter_replies CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_hl_twitter_tweets CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_hl_twitter_users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_links CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_ngg_album CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_ngg_gallery CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_ngg_pictures CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_options CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_postmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_subscribe2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_term_relationships CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_term_taxonomy CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_termmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_terms CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_usermeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfBadLeechers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfBlockedIPLog CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfBlocks CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfBlocksAdv CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfConfig CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfCrawlers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfFileMods CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfHits CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfHoover CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfIssues CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfKnownFileList CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfLeechers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfLockedOut CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfLocs CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfLogins CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfNet404s CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfNotifications CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfReverseCache CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfSNIPCache CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfScanners CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfStatus CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfThrottleLog CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
ALTER TABLE wp_wfVulnScanners CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

SELECT column_name,character_set_name FROM information_schema.`COLUMNS` 
WHERE table_schema = "blasdale_blog"
 AND table_name = "wp_hl_twitter_tweets";
 
create table wp_posts_bkp LIKE wp_posts; 
insert wp_posts_bkp select * from wp_posts;
 
update wp_posts SET post_content=convert(cast(convert(post_content using latin1) as binary) using utf8mb4);
update wp_posts SET post_title=convert(cast(convert(post_title using latin1) as binary) using utf8mb4);
update wp_posts SET post_content_filtered=convert(cast(convert(post_content_filtered using latin1) as binary) using utf8mb4);

I also updated the WordPress configuration to include utf8mb4 instead of utf8:

/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8mb4');

 

The problem was not caused by the initial conversion, but by the statements I had placed in the my,cnf configuration file which caused WordPress to assume all connections are utf8mb4.

[mysqld]
local-infile=0
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci


[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

I think it is all working now, tomcat is happy and so is my WordPress install.   I do though have many tables in other databases which are only utf8.  I expect they will function unless someone tries to insert the 💩 into a page or post.  I expect I will get around to converting those databases,  should be straightforward…..

 


Leave a Reply

Your email address will not be published. Required fields are marked *