Category: unicode

Python: replace parts of a string using a Dictionary/Hash mapping

When we have a dictionary mapping and want to replace every occurrence of its keys in a string with the corresponding values, we can use the code below.

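# Map Unicode code points (Telugu letters) to their roman equivalents.
my_dict = {
    "\u0c1c": "ja",
    "\u0c15": "ka",
}

string = "కరజ"

# Replace each key found in the string with its mapped value.
for code_point, roman in my_dict.items():
    string = string.replace(code_point, roman)

print(string)  # prints: kaరja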

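The above code replaces each dictionary key found in the string with its mapped value. Note that the keys are Unicode code points written as escape sequences (Telugu letters here), each mapped to a roman value; characters that have no entry in the dictionary, such as ర, are left unchanged. This is useful in natural language processing when we deal with ranges of Unicode code points.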
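
The loop above runs one replace per dictionary entry, so text produced by an earlier replacement can be changed again by a later one. As a minimal sketch of an alternative, assuming a single pass is wanted, the keys can be joined into one regular expression. The helper name replace_all is hypothetical and not part of the original code.

```python
import re

# Same mapping as in the post: Telugu code points to roman text.
my_dict = {
    "\u0c1c": "ja",
    "\u0c15": "ka",
}

def replace_all(text, mapping):
    """Replace every key of `mapping` found in `text` in a single pass.

    Hypothetical helper for illustration only.
    """
    if not mapping:
        return text
    # Longest keys first, so a longer key wins over a shorter key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(replace_all("కరజ", my_dict))  # prints: kaరja
```

For the two-entry mapping in this post both approaches give the same result; the difference only shows up when one replacement value contains another key.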

Saving Unicode (UTF-8) data using PHP-MySQL

Saving data in MySQL is common to almost every website. When it comes to Unicode data, there are a few extra settings that need to be taken care of. I am listing those settings step by step.

1. Set the table’s character set to “utf8” and its collation to “utf8_unicode_ci”

 ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

2. Set the column’s character set to “utf8” and its collation to “utf8_unicode_ci”

 ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

3. In PHP, run the query below on the connection before inserting data into the table, so that the connection also uses utf8.

 mysqli_query($conn,"SET names 'utf8'");
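
One thing worth checking before relying on these settings: MySQL's utf8 character set stores at most three bytes per character, which covers scripts such as Telugu but not characters that need four bytes in UTF-8 (emoji, for example); those need utf8mb4 instead. The sketch below is a rough pre-check written in Python to stay consistent with the earlier example. The function name fits_mysql_utf8 is hypothetical, and the check only looks at UTF-8 byte widths; it does not talk to a database.

```python
def fits_mysql_utf8(text):
    """Return True if every character of `text` needs at most 3 bytes in UTF-8.

    Hypothetical helper: a rough check for whether MySQL's 3-byte `utf8`
    character set can hold the text, or whether `utf8mb4` is needed.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

print(fits_mysql_utf8("కరజ"))  # Telugu letters take 3 bytes each: prints True
print(fits_mysql_utf8("😀"))    # this emoji takes 4 bytes: prints False
```

If the check returns False for data you expect to store, the same three steps apply with utf8mb4 and a matching collation such as utf8mb4_unicode_ci in place of utf8 and utf8_unicode_ci.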