Derivation problem
- List all variables in Figure 1 that are conditionally independent of variable a given Z = {e}.
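A question of this kind can be checked mechanically with a d-separation test. The sketch below uses the ancestral moral graph criterion: keep only the ancestors of the variables involved, connect ("marry") parents that share a child, drop edge directions, remove the observed set Z, and test whether the two variables are still connected. The edge list `EDGES` is a hypothetical DAG chosen for illustration, not the graph of Figure 1, and the function names are likewise assumptions; substitute the edges of Figure 1 to answer the problem itself.

```python
from itertools import combinations


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(edges, x, y, z):
    """True if x and y are d-separated given the set z in the DAG `edges`."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # 1. Restrict the graph to x, y, z and their ancestors.
    keep = ancestors_of(parents, {x, y} | set(z))
    # 2. Moralize: undirected parent-child links plus links between co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = parents.get(child, set())
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Remove the observed nodes and test whether y is reachable from x.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node in seen or node in z:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return y not in seen


def independent_of(edges, x, z):
    """All variables that are conditionally independent of x given z."""
    nodes = {n for edge in edges for n in edge}
    return sorted(n for n in nodes - {x} - set(z) if d_separated(edges, x, n, z))


# Hypothetical DAG (NOT Figure 1): a -> b <- c, b -> e -> d <- f
EDGES = [("a", "b"), ("c", "b"), ("b", "e"), ("e", "d"), ("f", "d")]
print(independent_of(EDGES, "a", {"e"}))
```

In this hypothetical graph, observing e blocks the chain from a to d, while it activates the collider at b, so a and c become dependent even though they are independent when nothing is observed.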
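Programming problem
- A doctor in the United States uses a Bayes network to diagnose chest diseases. The available data are shown in Figure 2 and include:
- 50% of patients smoke (Smoking), 1% have tuberculosis (Tuberculosis), 5.5% have lung cancer (Lung Cancer), and 45% have some degree of bronchitis (Bronchitis).
- 1% of incoming patients have visited Asia.
- The probabilistic relationships among the factors are as follows:
Requirement: Following Figure 2, use Python and the pgmpy package to build a Bayesian network (BN) model. For a patient A who has visited Asia and smokes, infer the probability that the X-ray (XRay) result is abnormal and the probability that the patient has dyspnea, i.e. shortness of breath (Dyspnea).
Figure 2. Bayes network for chest diagnosis
Reference program: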
# Requires pgmpy (pip install pgmpy).
# BayesianModel is the older name for the same class; importing BayesianNetwork alone is enough.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianNetwork([
    ('VA-Visit To Asia', 'T-Tuberculousis'),
    ('S-Smoking', 'L-Lung Cancer'),
    ('S-Smoking', 'B-Bronchitis'),
    ('T-Tuberculousis', 'TC- Tuberculousis or Cancer'),
    ('L-Lung Cancer', 'TC- Tuberculousis or Cancer'),
    ('TC- Tuberculousis or Cancer', 'X-XRay'),
    ('TC- Tuberculousis or Cancer', 'D-Dyspnea'),
    ('B-Bronchitis', 'D-Dyspnea'),
])
# Defining individual CPDs. Every variable has two states: row 0 of a table is state 0
# and row 1 is state 1. For 'VA-Visit To Asia', state 0 is "visited" (probability 0.01).
# Columns enumerate the evidence states in order, with the last evidence variable varying fastest.
cpd_v = TabularCPD(variable='VA-Visit To Asia', variable_card=2, values=[[0.01], [0.99]])
cpd_s = TabularCPD(variable='S-Smoking', variable_card=2, values=[[0.5], [0.5]])
cpd_t = TabularCPD(variable='T-Tuberculousis', variable_card=2,
                   values=[[0.0104, 0.001],
                           [0.9896, 0.999]],
                   evidence=['VA-Visit To Asia'],
                   evidence_card=[2])
cpd_l = TabularCPD(variable='L-Lung Cancer', variable_card=2,
                   values=[[0.055, 0.02],
                           [0.945, 0.98]],
                   evidence=['S-Smoking'],
                   evidence_card=[2])
cpd_b = TabularCPD(variable='B-Bronchitis', variable_card=2,
                   values=[[0.45, 0.30],
                           [0.55, 0.70]],
                   evidence=['S-Smoking'],
                   evidence_card=[2])
cpd_tc = TabularCPD(variable='TC- Tuberculousis or Cancer', variable_card=2,
                    values=[[0.0648, 0.05, 0.06, 0.02],
                            [0.9352, 0.95, 0.94, 0.98]],
                    evidence=['T-Tuberculousis', 'L-Lung Cancer'],
                    evidence_card=[2, 2])
cpd_x = TabularCPD(variable='X-XRay', variable_card=2,
                   values=[[0.45, 0.60],
                           [0.55, 0.40]],
                   evidence=['TC- Tuberculousis or Cancer'],
                   evidence_card=[2])
cpd_d = TabularCPD(variable='D-Dyspnea', variable_card=2,
                   values=[[0.436, 0.30, 0.80, 0.65],
                           [0.564, 0.70, 0.20, 0.35]],
                   evidence=['TC- Tuberculousis or Cancer', 'B-Bronchitis'],
                   evidence_card=[2, 2])
# Associating the CPDs with the network
model.add_cpds(cpd_v, cpd_s, cpd_t, cpd_l, cpd_b, cpd_tc,cpd_x,cpd_d)
# check_model checks for the network structure and CPDs and verifies that the CPDs are correctly
# defined and sum to 1.
print(model.check_model())
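# List the CPDs attached to the model, then show the Dyspnea table in full.
print(model.get_cpds())
print(cpd_d)
# Get the cardinality (number of states) of T-Tuberculousis.
print(model.get_cardinality('T-Tuberculousis'))
# Get the local independencies of every node in the network.
print(model.local_independencies(['VA-Visit To Asia', 'T-Tuberculousis', 'S-Smoking', 'L-Lung Cancer', 'B-Bronchitis', 'TC- Tuberculousis or Cancer', 'X-XRay', 'D-Dyspnea']))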
# Infer the posterior probabilities.
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
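# State 0 is the first row of each CPD, i.e. "visited Asia" (0.01 in cpd_v) and "smoker",
# so a patient who visited Asia and smokes is encoded with evidence state 0, not 1.
posterior_x = infer.query(['X-XRay'], evidence={'VA-Visit To Asia': 0, 'S-Smoking': 0})  # posterior of the X-ray result
print(posterior_x)
posterior_d = infer.query(['D-Dyspnea'], evidence={'VA-Visit To Asia': 0, 'S-Smoking': 0})  # posterior of dyspnea
print(posterior_d)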
Program output:
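The two posteriors can be cross-checked without pgmpy by brute-force enumeration of the joint distribution. The sketch below copies the numbers from the CPD tables of the reference program; the short node names (`VA`, `S`, `T`, ...) and the helper functions are shorthand introduced here, not part of pgmpy. It assumes that state 0 (the first row of each table) means "yes", which matches `cpd_v`, where 0.01 is the probability of having visited Asia, so the patient who visited Asia and smokes is encoded as `VA = 0, S = 0`.

```python
from itertools import product

# Parents of each node, in the same order as the `evidence` lists of the CPDs.
PARENTS = {
    "VA": (), "S": (), "T": ("VA",), "L": ("S",), "B": ("S",),
    "TC": ("T", "L"), "X": ("TC",), "D": ("TC", "B"),
}
# P(node = 0 | parent states): the first row of each CPD table.
# P(node = 1 | parent states) is the complement.
P_STATE0 = {
    "VA": {(): 0.01},
    "S": {(): 0.5},
    "T": {(0,): 0.0104, (1,): 0.001},
    "L": {(0,): 0.055, (1,): 0.02},
    "B": {(0,): 0.45, (1,): 0.30},
    "TC": {(0, 0): 0.0648, (0, 1): 0.05, (1, 0): 0.06, (1, 1): 0.02},
    "X": {(0,): 0.45, (1,): 0.60},
    "D": {(0, 0): 0.436, (0, 1): 0.30, (1, 0): 0.80, (1, 1): 0.65},
}
NODES = list(PARENTS)


def joint(assign):
    """Joint probability of one full assignment via the chain-rule factorization."""
    prob = 1.0
    for node in NODES:
        p0 = P_STATE0[node][tuple(assign[p] for p in PARENTS[node])]
        prob *= p0 if assign[node] == 0 else 1.0 - p0
    return prob


def posterior(target, evidence):
    """Return [P(target=0 | evidence), P(target=1 | evidence)] by enumeration."""
    weights = [0.0, 0.0]
    for states in product((0, 1), repeat=len(NODES)):
        assign = dict(zip(NODES, states))
        if all(assign[k] == v for k, v in evidence.items()):
            weights[assign[target]] += joint(assign)
    total = sum(weights)
    return [w / total for w in weights]


evidence = {"VA": 0, "S": 0}  # visited Asia and smokes, under the state-0-is-yes assumption
p_xray = posterior("X", evidence)
p_dyspnea = posterior("D", evidence)
print("X-XRay:", p_xray)
print("D-Dyspnea:", p_dyspnea)
```

With only eight binary variables the enumeration visits 256 assignments, so it is cheap enough to serve as a sanity check on the `VariableElimination` result for the same evidence states.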