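I want to get the most frequently used words from an array. The only problem is that the Swedish characters (Å, Ä, and Ö) only show up as �.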
$string = 'This is just a test post with the Swedish characters Å, Ä, and Ö. Also as lower cased characters: å, ä, and ö.';
echo '<pre>';
print_r(array_count_values(str_word_count($string, 1, 'àáãâçêéíîóõôúÀÁÃÂÇÊÉÍÎÓÕÔÚ')));
echo '</pre>';
This code outputs the following:
Array
(
[This] => 1
[is] => 1
[just] => 1
[a] => 1
[test] => 1
[post] => 1
[with] => 1
[the] => 1
[Swedish] => 1
[characters] => 2
[�] => 1
[�] => 1
[and] => 2
[�] => 1
[Also] => 1
[as] => 1
[lower] => 1
[cased] => 1
[�] => 1
[�] => 1
[�] => 1
)
How can I make it "see" the Swedish characters and other special characters?
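Solution:
str_word_count() works on bytes rather than on UTF-8 characters, which is the likely cause of the broken output above. Here is a solution that uses a Unicode-aware regular expression (the /u modifier) to split the string into "words" on punctuation and whitespace, followed by an ordinary count of array values.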
print_r(array_count_values(preg_split('/[[:punct:]\s]+/u', $string, -1, PREG_SPLIT_NO_EMPTY)));
This produces:
Array
(
[This] => 1
[is] => 1
[just] => 1
[a] => 1
[test] => 1
[post] => 1
[with] => 1
[the] => 1
[Swedish] => 1
[characters] => 2
[Å] => 1
[Ä] => 1
[and] => 2
[Ö] => 1
[Also] => 1
[as] => 1
[lower] => 1
[cased] => 1
[å] => 1
[ä] => 1
[ö] => 1
)
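Since the question asks for the most frequently used words, the counts can then be sorted. The following is only a minimal sketch, not part of the original answer: `mostCommonWords` is a hypothetical helper name, and it assumes the input is valid UTF-8, that the mbstring extension is loaded, and that "Å" and "å" should count as the same word.

```php
<?php
// Hypothetical helper (not from the original answer): returns the $limit most
// frequent words in $text, counted case-insensitively.
// Assumes $text is valid UTF-8 and the mbstring extension is available.
function mostCommonWords(string $text, int $limit = 5): array
{
    // Lower-case first so that "Å" and "å" are counted as one word.
    $lower = mb_strtolower($text, 'UTF-8');

    // Same split as in the answer: runs of punctuation or whitespace separate words.
    $words = preg_split('/[[:punct:]\s]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // preg_split() returns false on failure, e.g. invalid UTF-8
    }

    $counts = array_count_values($words);
    arsort($counts); // highest count first, word keys preserved

    return array_slice($counts, 0, $limit, true);
}

$string = 'This is just a test post with the Swedish characters Å, Ä, and Ö. Also as lower cased characters: å, ä, and ö.';
print_r(mostCommonWords($string, 5));
```

The order of words that share the same count depends on the PHP version, so it should not be relied on. If upper- and lower-case words should stay separate, the mb_strtolower() call can simply be dropped.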
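The preg_split() solution was tested in a Unicode console. If you are viewing the output in a browser, you may need to sort out the encoding: add a <meta> tag, set the encoding in the browser, or send a PHP header.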
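For the browser case, sending the header could look roughly like this. It is a sketch under assumptions rather than the original author's code: it assumes the script file itself is saved as UTF-8 and that no output has been sent before header() is called.

```php
<?php
// Tell the browser that the response is UTF-8.
// header() has to be called before any output is sent.
header('Content-Type: text/html; charset=utf-8');

$string = 'Å, Ä, and Ö';
$counts = array_count_values(
    preg_split('/[[:punct:]\s]+/u', $string, -1, PREG_SPLIT_NO_EMPTY)
);

echo '<pre>';
// Escape the dump because it is embedded in an HTML page.
echo htmlspecialchars(print_r($counts, true), ENT_QUOTES, 'UTF-8');
echo '</pre>';
```

The alternative mentioned above is to declare the encoding in the page itself, for example with `<meta charset="utf-8">` inside the document's `<head>`.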