Friday, September 14, 2018

Text originating from Microsoft software may contain smart quotes or other characters that are not escaped by standard PHP functions.

Question/Problem: 
 
Text originating from Microsoft software may contain smart quotes or other characters that are not escaped by standard PHP functions.

Answer/Solution:
 
Windows-1252, Microsoft's extension of the ISO-8859-1 character set, adds several typographic characters (curly quotes, en and em dashes, the ellipsis) that you may want to convert to their ASCII equivalents. Here is sample code that does so for both Windows-1252 and UTF-8 encoded strings:

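// First, replace the UTF-8 encoded characters: left/right single quote,
// left/right double quote, en dash, em dash, ellipsis.
$text = str_replace(
    array("\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x93", "\xe2\x80\x94", "\xe2\x80\xa6"),
    array("'", "'", '"', '"', '-', '--', '...'),
    $text);
// Next, replace their Windows-1252 equivalents. Skip this pass when the text
// is valid UTF-8: the bytes 0x85 and 0x91-0x97 also occur as UTF-8 continuation
// bytes, so replacing them would corrupt characters such as "Ñ" (0xC3 0x91).
if (!preg_match('//u', $text)) {
    $text = str_replace(
        array(chr(145), chr(146), chr(147), chr(148), chr(150), chr(151), chr(133)),
        array("'", "'", '"', '"', '-', '--', '...'),
        $text);
}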
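
The two passes above can also be collapsed into one by converting the input to UTF-8 first. The following is only a sketch of that idea, not part of the original sample: the function name normalize_smart_chars is hypothetical, it relies on the iconv extension being available, and it assumes (rather than detects) that any input that is not valid UTF-8 is Windows-1252.

```php
<?php
// Hypothetical helper: normalize the input to UTF-8, then replace the
// typographic characters with ASCII in a single pass.
function normalize_smart_chars($text)
{
    // preg_match() with the /u modifier fails on invalid UTF-8, so it
    // doubles as a validity check. Anything that fails the check is
    // ASSUMED to be Windows-1252; this is a guess, not real detection.
    if (preg_match('//u', $text) !== 1) {
        $converted = iconv('Windows-1252', 'UTF-8', $text);
        if ($converted !== false) {
            $text = $converted;
        }
    }

    // Same characters and ASCII replacements as the sample code above.
    $map = array(
        "\xe2\x80\x98" => "'",   // left single quote
        "\xe2\x80\x99" => "'",   // right single quote
        "\xe2\x80\x9c" => '"',   // left double quote
        "\xe2\x80\x9d" => '"',   // right double quote
        "\xe2\x80\x93" => '-',   // en dash
        "\xe2\x80\x94" => '--',  // em dash
        "\xe2\x80\xa6" => '...', // ellipsis
    );

    return strtr($text, $map);
}
```

Calling normalize_smart_chars() on either encoding of a string should give the same ASCII result, and valid UTF-8 text containing other multibyte characters passes through unchanged.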
 
If you want to preserve the Microsoft special characters, you can instead replace them with their HTML entity equivalents. For example, the left double curly quote (“) is &ldquo;.
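
As a rough illustration of this approach, the sketch below maps the same seven characters to named HTML entities instead of ASCII. The function name smart_chars_to_entities is hypothetical, and the code assumes the input is already UTF-8 encoded; Windows-1252 input would need to be converted first.

```php
<?php
// Hypothetical helper: replace UTF-8 smart quotes, dashes, and the
// ellipsis with named HTML entities so the characters survive in HTML.
// Assumes $text is UTF-8 encoded.
function smart_chars_to_entities($text)
{
    $map = array(
        "\xe2\x80\x98" => '&lsquo;',  // left single quote
        "\xe2\x80\x99" => '&rsquo;',  // right single quote
        "\xe2\x80\x9c" => '&ldquo;',  // left double quote
        "\xe2\x80\x9d" => '&rdquo;',  // right double quote
        "\xe2\x80\x93" => '&ndash;',  // en dash
        "\xe2\x80\x94" => '&mdash;',  // em dash
        "\xe2\x80\xa6" => '&hellip;', // ellipsis
    );

    // strtr() with an array replaces every key in one pass and never
    // rescans text it has already replaced.
    return strtr($text, $map);
}
```

For instance, passing a string wrapped in curly double quotes returns the same string wrapped in &ldquo; and &rdquo;.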
 
Here are a few sites that have additional code, which may be useful:
 
http://www.toao.net/48-replacing-smart-quotes-and-em-dashes-in-mysql
 
http://www.joelonsoftware.com/articles/Unicode.html
 
http://shiflett.org/blog/2005/oct/convert-smart-quotes-with-php
