
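Preface
About a year ago I started following Rasa NLU (see the reference article "The Open-Source Natural Language Understanding Framework Rasa NLU"). While keeping an eye on the project, I noticed that someone had already trained a Chinese language model based on spaCy. Following that lead, I found Rasa NLU Chi and the blog post "Building Your Own Chinese NLU System with Rasa_NLU", and decided to set up a Chinese NLU service on Windows. Below are the detailed steps, along with fixes for the problems I ran into along the way.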
Environment
Operating system: Windows 10
WSL version: Ubuntu 18.04 LTS
Python version: 3.6.7
Installation
1. Install WSL
Open the Microsoft Store, search for WSL, and install Ubuntu 18.04 LTS.
2. Change the package sources
https://www.linuxidc.com/Linux/2018-08/153709.htm
3. Install python and pip
In my setup, Ubuntu 18.04 LTS did not have python available by default, so it has to be installed manually.
Install python with:
sudo apt-get install python3
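To install pip, see https://www.linuxidc.com/Linux/2018-05/152390.htm
4. Switch pip to a mirror in China (the Tsinghua mirror is used here)
On Linux, edit ~/.pip/pip.conf (create it if it does not exist) and set index-url to the TUNA mirror, as follows:
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple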
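The same edit can be scripted. The following is a minimal sketch using only the Python standard library; `write_pip_conf` is a hypothetical helper name, and it assumes pip.conf is a plain INI file that `configparser` can read and write. Existing settings in the file are kept and only `index-url` under `[global]` is changed.

```python
import configparser
from pathlib import Path

# Mirror URL taken from step 4 of this post.
TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def write_pip_conf(path="~/.pip/pip.conf", index_url=TUNA_INDEX):
    """Create or update pip.conf so that [global] index-url points at index_url."""
    conf_path = Path(path).expanduser()
    conf_path.parent.mkdir(parents=True, exist_ok=True)

    config = configparser.ConfigParser()
    # read() silently skips a file that does not exist yet.
    config.read(str(conf_path), encoding="utf-8")
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)

    with conf_path.open("w", encoding="utf-8") as handle:
        config.write(handle)
    return conf_path
```

Calling `write_pip_conf()` with no arguments targets ~/.pip/pip.conf; passing another path is handy for trying the helper without touching the real configuration.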
5. Install rasa_core, sklearn, and MITIE
Install rasa_core. This also installs rasa_nlu, which now supports Chinese.
pip3 install rasa_core
Install scikit-learn and MITIE:
pip3 install -U scikit-learn sklearn-crfsuite
pip3 install git+https://github.com/mit-nlp/MITIE.git
6. Clone rasa_nlu_chi
git clone https://github.com/crownpku/rasa_nlu_chi
7. Download the pre-trained model file and copy it to the ~/rasa_nlu_chi/data directory
Link: https://pan.baidu.com/s/1kNENvlHLYWZIddmtWJ7Pdg Password: p4vx
8. Train on the NLU data
python3 -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
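Training prints the warning below. From what I could find, it appears to be harmless: it only says that some labels received no predicted samples during scoring.
/home/sh/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)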
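For context, the file passed through --data in step 8 is a JSON training file. The sketch below builds a single example in the `rasa_nlu_data` / `common_examples` layout used by Rasa NLU releases of this era. The sentence, intent, and entity label simply mirror the query used later in this post; they are not claimed to be copied from demo-rasa_zh.json, and `make_example` is a hypothetical helper.

```python
import json


def make_example(text, intent, entities):
    """Build one training example; entities is a list of (value, label) pairs.

    Offsets are character positions, with "end" exclusive, so that
    text[start:end] == value.
    """
    spans = []
    for value, label in entities:
        start = text.find(value)
        if start < 0:
            raise ValueError("entity value %r not found in %r" % (value, text))
        spans.append({
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": label,
        })
    return {"text": text, "intent": intent, "entities": spans}


training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            make_example("我发烧了该吃什么药?", "medical", [("发烧", "disease")]),
        ]
    }
}

# ensure_ascii=False keeps the Chinese text readable in the output file.
training_json = json.dumps(training_data, ensure_ascii=False, indent=2)
```

Computing the offsets from the text rather than typing them by hand avoids off-by-one mistakes, which are easy to make with Chinese text.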
9. Start the server
python3 -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
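The requests in the following steps need the name of a trained model, such as model_20181212-141830. Below is a minimal sketch for looking up the most recent one. It assumes that training with --path models writes each model to models/<project>/model_<timestamp>, with default as the project name; `latest_model_name` is a hypothetical helper, not part of Rasa NLU.

```python
from pathlib import Path


def latest_model_name(models_dir="models", project="default"):
    """Return the newest model directory name, or None if there is none.

    Names of the form model_YYYYMMDD-HHMMSS sort chronologically when
    sorted as plain strings, so the last one is the most recent.
    """
    project_dir = Path(models_dir) / project
    names = sorted(
        entry.name for entry in project_dir.glob("model_*") if entry.is_dir()
    )
    return names[-1] if names else None
```

Run from the rasa_nlu_chi directory, `latest_model_name()` gives the value to put in the "model" field of the request.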
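10. The request fails. After some searching, the cause turned out to be a conflict between the newer scikit-learn release and the Twisted framework
Fix it with the workaround described in this issue: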
https://github.com/crownpku/Rasa_NLU_Chi/issues/73
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "default", "model": "model_20181212-141830"}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138    0    44  100    94     22     47  0:00:02  0:00:01  0:00:01    69
{
  "error": "bad value(s) in fds_to_keep"
}
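The same request can also be sent from Python instead of curl. This is a minimal sketch using only the standard library: the URL and the payload fields come from the curl command above, while the helper names `build_parse_request` and `send_parse_request` are made up for illustration. The Content-Type header is my own addition (curl -d sends a form content type by default).

```python
import json
import urllib.request


def build_parse_request(text, model, project="default",
                        url="http://localhost:5000/parse"):
    """Build the POST request for the /parse endpoint without sending it."""
    payload = {"q": text, "project": project, "model": model}
    # ensure_ascii=False sends the Chinese text as UTF-8 instead of \u escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_parse_request(request, timeout=30):
    """Send the request and decode the JSON reply.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so a
    server-side failure surfaces as an exception here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from step 9 running, `send_parse_request(build_parse_request("我发烧了该吃什么药?", "model_20181212-141830"))` issues the equivalent of the curl call.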
11. Retrain on the data and send the request again
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "default", "model": "model_20181212-144335"}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   813    0   719  100    94     89     11  0:00:08  0:00:08 --:--:--   188
{
  "intent": {
    "name": "medical",
    "confidence": 0.500349594001549
  },
  "entities": [
    {
      "entity": "disease",
      "value": "发烧",
      "start": 1,
      "end": 3,
      "confidence": null,
      "extractor": "ner_mitie"
    }
  ],
  "intent_ranking": [
    {
      "name": "medical",
      "confidence": 0.500349594001549
    },
    {
      "name": "restaurant_search",
      "confidence": 0.1943808602354033
    },
    {
      "name": "affirm",
      "confidence": 0.1207785366984987
    },
    {
      "name": "goodbye",
      "confidence": 0.11322856287182206
    },
    {
      "name": "greet",
      "confidence": 0.07126244619272717
    }
  ],
  "text": "我发烧了该吃什么药?"
}
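A caller usually needs only the top intent and the extracted entities from this reply. The sketch below shows one possible way to pull them out. The helper names and the 0.6 threshold are assumptions chosen for illustration; the field names are the ones visible in the response above, and `sample` is a trimmed copy of it.

```python
def top_intent(response, threshold=0.6):
    """Return the intent name, or None when confidence is below threshold."""
    intent = response.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence") or 0.0
    return name if name and confidence >= threshold else None


def entity_pairs(response):
    """Return the extracted entities as (entity label, value) tuples."""
    return [(item["entity"], item["value"])
            for item in response.get("entities", [])]


# Trimmed copy of the response shown in step 11.
sample = {
    "intent": {"name": "medical", "confidence": 0.500349594001549},
    "entities": [
        {"entity": "disease", "value": "发烧", "start": 1, "end": 3,
         "confidence": None, "extractor": "ner_mitie"},
    ],
    "text": "我发烧了该吃什么药?",
}
```

With a confidence of about 0.50, this reply is rejected at the 0.6 threshold and accepted at 0.4, which shows why the cut-off has to be tuned against the small demo dataset rather than taken as given.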