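Background
The project needs to convert ANSI-encoded text to UTF-8, so the icu4c library was brought in. The .dat data file produced by a default build is over 30 MB. Because the only requirement is converting Windows ANSI text (here windows-936, i.e. GBK) to UTF-8 on macOS, the data file needs to be trimmed.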
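Building the icu4c project
Source download: https://github.com/unicode-org/icu. This article builds version 71.1. ICU ships in a C version and a Java version; everything below uses the C version.
1. In a terminal, change to the icu4c/source directory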
cd icu4c/source
2. Make the build scripts executable
chmod +x runConfigureICU configure install-sh
3. Create a build directory under source and enter it
mkdir buildMac && cd buildMac
4. Run the pre-build configuration, targeting macOS
../runConfigureICU MacOSX
5. Build
gnumake
The icudtl.dat file produced by the default build is 33.4MB
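The five build steps above could be wrapped in one small shell function so they are repeatable. This is only a sketch: the function name build_icu4c, its arguments, and the MAKE override are hypothetical names introduced here, and the build directory defaults to buildMac to match the paths used later in the article.

```shell
#!/bin/sh
# Sketch only: build_icu4c and the MAKE override are hypothetical names.

# Usage: build_icu4c <path-to-icu4c/source> [build-dir-name]
build_icu4c() {
    src_dir=$1
    build_name=${2:-buildMac}
    make_cmd=${MAKE:-gnumake}   # the article uses gnumake

    # Step 2: make the helper scripts executable.
    chmod +x "$src_dir/runConfigureICU" "$src_dir/configure" "$src_dir/install-sh" || return 1

    # Step 3: create the build directory inside source.
    mkdir -p "$src_dir/$build_name" || return 1

    # Steps 4 and 5 run in a subshell so the caller's working directory is unchanged.
    (
        cd "$src_dir/$build_name" || exit 1
        ../runConfigureICU MacOSX || exit 1
        "$make_cmd"
    )
}
```

It would be called as build_icu4c icu4c/source from the directory that contains the checkout.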
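Trimming icudtl.dat
1. Create a filters.json file in the buildMac directory. It excludes every data category and keeps only conversion_mappings, restricted to the ANSI converter (windows-936-2000). The contents are as follows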
{
  "featureFilters": {
    // Based on the ICU63 version of
    "brkitr_dictionaries": {
      "filterType": "exclude"
    },
    // # List of break iterator files (brk).
    "brkitr_rules": {
      "filterType": "exclude"
    },
    // Need to explicitly add "root"
    "brkitr_tree": { "filterType": "exclude" },
    "conversion_mappings": {
      "includelist": [
        // UCM_SOURCE_CORE=...
        "windows-936-2000"
      ]
    },
    "coll_tree": { "filterType": "exclude" },
    "coll_ucadata": { "filterType": "exclude" },
    "confusables": { "filterType": "exclude" },
    "curr_tree": { "filterType": "exclude" },
    "lang_tree": { "filterType": "exclude" },
    "locales_tree": { "filterType": "exclude" },
    "misc": { "filterType": "exclude" },
    "normalization": { "filterType": "exclude" },
    "rbnf_tree": { "filterType": "exclude" },
    "rbnf_index": { "filterType": "exclude" },
    "region_tree": { "filterType": "exclude" },
    "stringprep": { "filterType": "exclude" },
    "translit": { "filterType": "exclude" },
    "unames": { "filterType": "exclude" },
    "unit_tree": { "filterType": "exclude" },
    "zone_tree": { "filterType": "exclude" }
  },
  "resourceFilters": [
    {
      "categories": [
        "brkitr_tree",
        "coll_tree",
        "curr_tree",
        "lang_tree",
        "region_tree",
        "unit_tree",
        "zone_tree"
      ],
      "rules": [ "-/Version" ]
    }
  ]
}
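2. Install the hjson parser library. If you would rather not use JSON with comments, you can instead delete the // lines above, in which case hjson is not needed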
pip3 install --user hjson jsonschema
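For the no-hjson route, the comment lines can be removed mechanically rather than by hand. The sketch below assumes, as in the filters.json above, that every comment sits on its own line; strip_json_comments is a hypothetical helper name, and the python3 check is an optional extra that only runs if python3 is on the PATH.

```shell
#!/bin/sh
# Sketch only: strip_json_comments is a hypothetical helper name.

# Usage: strip_json_comments <input.json> <output.json>
strip_json_comments() {
    # Drop every line whose first non-blank characters are "//".
    # Lines that merely contain "//" inside a string (e.g. a URL) are kept.
    grep -v '^[[:space:]]*//' "$1" > "$2"

    # Optional sanity check: if python3 is available, confirm the result parses as JSON.
    if command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$2" >/dev/null
    fi
}
```

For example, strip_json_comments filters.json filters.plain.json writes a comment-free copy that a plain JSON parser should accept.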
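3. Delete everything under icu4c/source/buildMac/data and keep the rest of the build tree, so the other modules are not rebuilt and only the data module is recompiled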
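One cautious way to do this deletion from the shell is sketched below. The helper name clean_icu_data_dir is hypothetical; it takes the build directory as its argument, refuses to run if there is no data subdirectory, and keeps the (now empty) data directory itself.

```shell
#!/bin/sh
# Sketch only: clean_icu_data_dir is a hypothetical helper name.

# Usage: clean_icu_data_dir <build-dir>   (e.g. icu4c/source/buildMac)
clean_icu_data_dir() {
    data_dir=$1/data

    # Refuse to run if the directory is missing, so a wrong path cannot delete anything.
    [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }

    # Remove everything below data/ but keep the data directory itself.
    find "$data_dir" -mindepth 1 -delete
}
```

Sibling directories of data in the build tree are never touched, which is what keeps the other modules from being rebuilt.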
4. Point a temporary environment variable, ICU_DATA_FILTER_FILE, at the filters.json file
export ICU_DATA_FILTER_FILE="/Users/nickname/icu-release-71-1/icu4c/source/buildMac/filters.json"
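A typo in this path is easy to miss, so it may be worth checking the variable before reconfiguring. The following is a minimal sketch; check_filter_env is a hypothetical helper name, and it only verifies that the variable is set and the file is readable, not that the file's contents are valid.

```shell
#!/bin/sh
# Sketch only: check_filter_env is a hypothetical helper name.

check_filter_env() {
    # Fail if the variable is unset or empty.
    if [ -z "${ICU_DATA_FILTER_FILE:-}" ]; then
        echo "ICU_DATA_FILTER_FILE is not set" >&2
        return 1
    fi
    # Fail if the path does not point at a readable file.
    if [ ! -r "$ICU_DATA_FILTER_FILE" ]; then
        echo "cannot read filter file: $ICU_DATA_FILTER_FILE" >&2
        return 1
    fi
    echo "using filter file: $ICU_DATA_FILTER_FILE"
}
```

Running check_filter_env && ../runConfigureICU MacOSX would then skip the configure step when the path is wrong.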
5. Re-run the build configuration
../runConfigureICU MacOSX
# Seeing the following line in the output means the filters.json file was applied successfully
#Note: Applying filters from /Users/nickname/icu-release-71-1/icu4c/source/buildMac/filters.json.
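Since the configure output is long, that note can scroll past unnoticed. A possible approach is to save the output to a log file and search it afterwards. In this sketch, applied_filter_file is a hypothetical helper name and the log file name is up to you; the pattern relies only on the message text quoted above.

```shell
#!/bin/sh
# Sketch only: applied_filter_file is a hypothetical helper name.

# Usage: applied_filter_file <configure-log>
# Prints the filter path named in the "Applying filters from" note;
# returns non-zero if the note is absent from the log.
applied_filter_file() {
    path=$(sed -n 's/.*Applying filters from \(.*\)\.$/\1/p' "$1" | head -n 1)
    [ -n "$path" ] || return 1
    printf '%s\n' "$path"
}
```

For instance, run ../runConfigureICU MacOSX 2>&1 | tee configure.log and then applied_filter_file configure.log before starting the rebuild.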
6. Rebuild
gnumake
7. The build succeeds, and the trimmed icudtl.dat file is only 133KB
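To compare the data file size before and after trimming without relying on a file browser, something like the sketch below could be used. file_size_kb is a hypothetical helper name; it reports whole kilobytes (1024 bytes), rounded down, and the location of the .dat file inside the build tree is passed in rather than assumed.

```shell
#!/bin/sh
# Sketch only: file_size_kb is a hypothetical helper name.

# Usage: file_size_kb <file>
file_size_kb() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # wc may pad its output with spaces on some systems, so strip them.
    bytes=$(wc -c < "$1" | tr -d ' ')
    # Integer division: the result is rounded down to whole KB.
    echo $(( bytes / 1024 ))
}
```

If the exact path is unknown, find . -name '*.dat' run from the build directory should help locate candidate files to pass in.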
Listing all encodings supported by icudtl.dat
Available converters: 4
0 name:UTF-8 alias: 0: UTF-8 1: unicode-1-1-utf-8 2: utf8
1 name:utf-16be alias: 0: utf-16be
2 name:utf-16le alias: 0: utf-16le 1: utf-16
3 name:windows-936-2000 alias: 0: windows-936-2000 1: GBK 2: chinese 3: iso-ir-58 4: GB2312 5: GB_2312-80 6: gb_2312 7: csGB2312 8: csiso58gb231280 9: x-gbk