( A, B )---2*30*2---( 1, 0 )( 0, 1 )
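A network is used to classify A and B. Let A be (0,0), (1,1) and let B be (1,0), (1,1). The test set for both classes is (0,0), (0,1), (1,0), (1,1). This network is denoted 0323.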
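The setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reading of a name such as 0323 (first two digits index the rows of A, last two the rows of B, with 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1)) is inferred from the row indices used in the tally tables further down, and the sigmoid activation, the random initial weights, the omitted biases and the seed are all hypothetical, since the post only gives the layer sizes 2*30*2.

```python
import math
import random

# Row indices as used in the tally tables: 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1).
ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def dataset(name):
    """Decode a name such as "0323" into (rows of A, rows of B).

    Assumption: the first two digits index A and the last two index B.
    """
    idx = [int(ch) for ch in name]
    return [ROWS[i] for i in idx[:2]], [ROWS[i] for i in idx[2:]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Forward pass 2 -> 30 -> 2 (sigmoid assumed, biases omitted for brevity)."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, unit))) for unit in w1]
    return [sigmoid(sum(h * w for h, w in zip(hidden, unit))) for unit in w2]

rng = random.Random(0)  # hypothetical seed, for reproducibility only
w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(30)]  # 30 hidden units
w2 = [[rng.uniform(-1, 1) for _ in range(30)] for _ in range(2)]  # 2 output units

a_rows, b_rows = dataset("0323")
for row in ROWS:
    # An output near (1, 0) would mean A, near (0, 1) would mean B.
    print(row, forward(row, w1, w2))
```

With untrained weights the outputs are meaningless; the sketch only fixes the shapes and the label convention, A as (1, 0) and B as (0, 1).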
Class | x1 | x2 |
A | 0 | 0 |
A | 1 | 1 |
B | 1 | 0 |
B | 1 | 1 |
Test x1 | Test x2 | A column 1 | A column 2 | B column 1 | B column 2 |
0 | 0 | 50% | 50% | 0 | 50% |
0 | 1 | 50% | 50% | 0 | 50% |
1 | 0 | 50% | 50% | 100% | 50% |
1 | 1 | 50% | 50% | 100% | 50% |
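Compare the column-wise similarities. For (0,0) and (0,1), the similarity to the two columns of A is 50%, 50%, while the similarity to the two columns of B is 0, 50%. On this basis (0,0) and (0,1) are judged to be closer to A. Likewise, for (1,0) and (1,1) the similarity to the two columns of A is 50%, 50% and the similarity to the two columns of B is 100%, 50%, so (1,0) and (1,1) should be classified as B.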
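The column comparison can be reproduced with a short sketch. It assumes that "similarity to a column" means the fraction of a class's training rows whose entry in that column equals the test point's entry, which is consistent with the percentages in the table. The decision rule in `predict` (larger summed similarity wins) is an assumption made for illustration; the text only says the columns are compared.

```python
A = [(0, 0), (1, 1)]
B = [(1, 0), (1, 1)]
TEST = [(0, 0), (0, 1), (1, 0), (1, 1)]

def column_similarity(point, rows):
    """Per column: fraction of rows whose entry equals the point's entry."""
    return [
        sum(r[c] == point[c] for r in rows) / len(rows)
        for c in range(len(point))
    ]

def predict(point):
    # Assumed rule: the class with the larger summed column similarity wins.
    sim_a = sum(column_similarity(point, A))
    sim_b = sum(column_similarity(point, B))
    return "A" if sim_a > sim_b else "B"

for p in TEST:
    print(p, column_similarity(p, A), column_similarity(p, B), predict(p))
```

Under these assumptions (0,0) and (0,1) come out as A, and (1,0) and (1,1) come out as B, matching the prediction in the text.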
To check experimentally whether this holds, the following table was obtained.
0 | 0 | 1 | 0 | 1b | 0 | ||||
1 | 1 | 1 | 1 | k | k | ||||
0323 | 30 | 200 | |||||||
f2[0] | f2[1] | iterations n | average accuracy p-ave | 1-0 | 0-1 | δ | time ms/run | time ms/199 runs | time min/199 runs |
0.5019 | 0.4975 | 15.2563 | 0.5 | 0.60427 | 0.39573 | 0.5 | 1.678392 | 337 | 0.005617 |
0.60254 | 0.3973 | 1032.32 | 0.5 | 0.82915 | 0.17085 | 0.4 | 9.5527638 | 1901 | 0.031683 |
0.70278 | 0.29722 | 1323.87 | 0.5 | 0.5 | 0.5 | 0.3 | 11.19598 | 2230 | 0.037167 |
0.80209 | 0.19798 | 1621.6 | 0.5 | 0.5 | 0.5 | 0.2 | 13.592965 | 2713 | 0.045217 |
0.90084 | 0.09913 | 2213.42 | 0.5 | 0.5 | 0.5 | 0.1 | 21.693467 | 4319 | 0.071983 |
0.99005 | 0.00995 | 7464.56 | 0.5 | 0.59296 | 0.40704 | 0.01 | 86.155779 | 17147 | 0.285783 |
0.98396 | 0.01604 | 45216.2 | 0.5 | 0.74623 | 0.25377 | 0.001 | 523.64322 | 104206 | 1.736767 |
0.99409 | 0.00591 | 50207.6 | 0.5 | 0.74874 | 0.25126 | 9.00E-04 | 417.44221 | 83073 | 1.38455 |
0.98415 | 0.01585 | 55872.3 | 0.5 | 0.74623 | 0.25377 | 8.00E-04 | 475.22111 | 94573 | 1.576217 |
0.97923 | 0.02077 | 62169.8 | 0.5 | 0.74497 | 0.25503 | 7.00E-04 | 486.32663 | 96780 | 1.613 |
0.96929 | 0.03071 | 71936.5 | 0.5 | 0.74246 | 0.25754 | 6.00E-04 | 682.67839 | 135853 | 2.264217 |
0.96938 | 0.03062 | 84370.5 | 0.5 | 0.74246 | 0.25754 | 5.00E-04 | 672.8794 | 133903 | 2.231717 |
0.93935 | 0.06065 | 103430 | 0.5 | 0.73492 | 0.26508 | 4.00E-04 | 690.85427 | 137480 | 2.291333 |
0.89424 | 0.10576 | 135549 | 0.5 | 0.72362 | 0.27638 | 3.00E-04 | 901.55779 | 179425 | 2.990417 |
0.86418 | 0.13582 | 195461 | 0.5 | 0.71608 | 0.28392 | 2.00E-04 | 1589.7688 | 316364 | 5.272733 |
0.69343 | 0.30657 | 372835 | 0.5 | 0.67337 | 0.32663 | 1.00E-04 | 3523.1809 | 701113 | 11.68522 |
0.68841 | 0.31159 | 409775 | 0.5 | 0.67211 | 0.32789 | 9.00E-05 | 3061.809 | 609301 | 10.15502 |
0.64319 | 0.35681 | 461388 | 0.5 | 0.6608 | 0.3392 | 8.00E-05 | 3055.7437 | 608093 | 10.13488 |
0.63817 | 0.36183 | 520512 | 0.5 | 0.65955 | 0.34045 | 7.00E-05 | 3445.4623 | 685647 | 11.42745 |
0.55778 | 0.44222 | 603678 | 0.5 | 0.63945 | 0.36055 | 6.00E-05 | 3988.9799 | 793807 | 13.23012 |
0.57788 | 0.42212 | 715163 | 0.5 | 0.64447 | 0.35553 | 5.00E-05 | 4725.7588 | 940426 | 15.67377 |
0.48241 | 0.51759 | 888578 | 0.5 | 0.6206 | 0.3794 | 4.00E-05 | 6663.2915 | 1325995 | 22.09992 |
0.35679 | 0.64321 | 1161299 | 0.5 | 0.5892 | 0.4108 | 3.00E-05 | 8755.2764 | 1742300 | 29.03833 |
0.31157 | 0.68843 | 1705598 | 0.5 | 0.57789 | 0.42211 | 2.00E-05 | 11343.045 | 2257266 | 37.6211 |
0.23116 | 0.76884 | 3305165 | 0.5 | 0.55779 | 0.44221 | 1.00E-05 | 22443.578 | 4466272 | 74.43787 |
0.24624 | 0.75376 | 3633986 | 0.5 | 0.56156 | 0.43844 | 9.00E-06 | 26965.653 | 5366168 | 89.43613 |
0.22111 | 0.77889 | 4069937 | 0.5 | 0.55528 | 0.44472 | 8.00E-06 | 30685.492 | 6106414 | 101.7736 |
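Observe the classification accuracies in the 1-0 and 0-1 columns.
Although they never actually reach 50%, 50%, that is clearly the direction in which they converge. The classification outcomes at a convergence error of 8e-6 are tallied below.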
Input | Classified as A | Classified as B | A/B ratio |
(0,0) | 199 | | |
(1,1) | 44 | 155 | 0.28387 |
Total runs | 199 | | |
Runs | Rows classified as A | Rows classified as B |
155 | 0, 1 | 2, 3 |
44 | 0, 1, 3 | 2 |
Row | x1 | x2 |
0 | 0 | 0 |
1 | 0 | 1 |
2 | 1 | 0 |
3 | 1 | 1 |
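In 155 runs, rows 0 and 1 were classified as A and rows 2 and 3 as B; in 44 runs, rows 0, 1 and 3 were classified as A and row 2 as B. Row 0, (0,0), was always classified as A. (1,1) was classified as A 44 times and as B 155 times, so 78% of the runs put (1,1) in B.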
Next, the classification outcomes at a convergence error of 4e-5:
Input | Classified as A | Classified as B | A/B ratio |
(0,0) | 199 | | |
(1,1) | 96 | 103 | 0.93204 |
Total runs | 199 | | |
Runs | Rows classified as A | Rows classified as B |
103 | 0, 1 | 2, 3 |
96 | 0, 1, 3 | 2 |
Row | x1 | x2 |
0 | 0 | 0 |
1 | 0 | 1 |
2 | 1 | 0 |
3 | 1 | 1 |
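In 103 runs, rows 0 and 1 were classified as A and rows 2 and 3 as B; in 96 runs, rows 0, 1 and 3 were classified as A and row 2 as B. (1,1) is split almost evenly between the two classes.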
Finally, the classification outcomes at a convergence error of 5e-4:
Input | Classified as A | Classified as B | A/B ratio |
(0,0) | 199 | | |
(1,1) | 193 | 6 | 32.1667 |
Total runs | 199 | | |
Runs | Rows classified as A | Rows classified as B |
6 | 0, 1 | 2, 3 |
193 | 0, 1, 3 | 2 |
Row | x1 | x2 |
0 | 0 | 0 |
1 | 0 | 1 |
2 | 1 | 0 |
3 | 1 | 1 |
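In 6 runs, rows 0 and 1 were classified as A and rows 2 and 3 as B; in 193 runs, rows 0, 1 and 3 were classified as A and row 2 as B. (1,1) is almost always classified as A.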
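The three tallies can be tied back to the 1-0 and 0-1 columns of the 0323 table. The sketch below assumes that the 1-0 value is the share of the four test rows labelled A and the 0-1 value the share labelled B, each averaged over the 199 runs. That reading is an inference, but it reproduces the tabulated values to five decimals (for example 0.55528 and 0.44472 at 8e-6).

```python
TEST_ROWS = [0, 1, 2, 3]  # 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1)

def accuracies(tally):
    """tally: list of (runs, set of rows classified as A).

    Returns the run-averaged (1-0, 0-1) values under the assumption that
    they are the shares of test rows labelled A and B respectively.
    """
    total_runs = sum(runs for runs, _ in tally)
    share_a = sum(runs * len(rows_a) / len(TEST_ROWS) for runs, rows_a in tally)
    share_a /= total_runs
    return share_a, 1.0 - share_a

# Counts copied from the three tallies above, keyed by convergence error.
tallies = {
    8e-6: [(155, {0, 1}), (44, {0, 1, 3})],
    4e-5: [(103, {0, 1}), (96, {0, 1, 3})],
    5e-4: [(6, {0, 1}), (193, {0, 1, 3})],
}

for delta, tally in tallies.items():
    acc_a, acc_b = accuracies(tally)
    print(f"delta = {delta:g}: 1-0 = {acc_a:.5f}, 0-1 = {acc_b:.5f}")
```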
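So, from the way the classification accuracy changes, one can guess that network 0323, before its convergence error reached 9e-4, held that (1,1) should be classified as A. Once it had converged to 9e-4 it began to doubt this, and it eventually came to hold that (1,1) should rather be classified as B. This view then became dominant, so the share of runs classifying (1,1) as B kept rising and the classification accuracy eventually approached 0.5, 0.5. The process is very slow, however, which suggests that the two views were in constant conflict and the network was struggling throughout.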
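The shift described here can be read directly off the 1-0 column. As a rough sketch, assume every run ends in one of the two patterns seen in the tallies: rows 0 and 1 as A (1-0 value 0.5) or rows 0, 1 and 3 as A (1-0 value 0.75). Then the averaged 1-0 value p satisfies p = 0.5 + 0.25 q, where q is the share of runs that put (1,1) in A, so the share that put it in B is 1 - q. The assumption does not hold at every δ: the 1-0 value of 0.82915 at δ = 0.4 exceeds 0.75, so other patterns must occur there.

```python
def share_11_as_b(acc_1_0):
    """Share of runs classifying (1,1) as B, derived from the 1-0 value.

    Assumes only two outcome patterns occur: rows {0,1} as A (0.5) or
    rows {0,1,3} as A (0.75).
    """
    share_as_a = (acc_1_0 - 0.5) / 0.25
    return 1.0 - share_as_a

# (delta, 1-0 value) pairs copied from the 0323 table
samples = [(9e-4, 0.74874), (5e-4, 0.74246), (1e-4, 0.67337),
           (4e-5, 0.6206), (8e-6, 0.55528)]

for delta, acc in samples:
    print(f"delta = {delta:g}: (1,1) -> B in {share_11_as_b(acc):.1%} of runs")
```

For the three δ values that were tallied explicitly, the derived shares agree with the counts 6/199, 103/199 and 155/199.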
Now swap A and B and observe the classification process of network 2303.
1 | 0 | 0 | 0 | 1 | 0 | ||||||||||
1 | 1 | 1 | 1 | k | k | ||||||||||
2303 | 30 | 200 | |||||||||||||
f2[0] | f2[1] | iterations n | average accuracy p-ave | 1-0 | 0-1 | δ | time ms/run | time ms/199 runs | time min/199 runs |
0.501228 | 0.50048 | 14.206 | 0.5 | 0.55025 | 0.44975 | 0.5 | 1.25628 | 250 | 0.00417 | ||||||
0.397344 | 0.60233 | 1046.24 | 0.5 | 0.5 | 0.5 | 0.4 | 8.03518 | 1599 | 0.02665 | ||||||
0.297413 | 0.70296 | 1317.6 | 0.5 | 0.5 | 0.5 | 0.3 | 9.49246 | 1889 | 0.03148 | ||||||
0.198032 | 0.80197 | 1613.86 | 0.5 | 0.5 | 0.5 | 0.2 | 11.4573 | 2280 | 0.038 | ||||||
0.099153 | 0.90074 | 2210.3 | 0.5 | 0.5 | 0.5 | 0.1 | 15.7688 | 3138 | 0.0523 | ||||||
0.009951 | 0.99004 | 7440.99 | 0.5 | 0.5 | 0.5 | 0.01 | 50.8492 | 10119 | 0.16865 | ||||||
9.95E-04 | 0.999 | 45645.6 | 0.5 | 0.5 | 0.5 | 0.001 | 304.884 | 60672 | 1.0112 | ||||||
0.015944 | 0.98406 | 49974.7 | 0.5 | 0.5 | 0.5 | 9.00E-04 | 330.925 | 65854 | 1.09757 | ||||||
7.96E-04 | 0.9992 | 55332.2 | 0.5 | 0.5 | 0.5 | 8.00E-04 | 365.332 | 72701 | 1.21168 | ||||||
0.005714 | 0.99429 | 62510.5 | 0.5 | 0.5 | 0.5 | 7.00E-04 | 414.709 | 82527 | 1.37545 | ||||||
0.010635 | 0.98936 | 71881.4 | 0.5 | 0.5 | 0.5 | 6.00E-04 | 480.246 | 95569 | 1.59282 | ||||||
0.035639 | 0.96436 | 84247.8 | 0.5 | 0.5 | 0.5 | 5.00E-04 | 554.266 | 110299 | 1.83832 | ||||||
0.060651 | 0.93935 | 103318 | 0.5 | 0.5 | 0.5 | 4.00E-04 | 680.648 | 135449 | 2.25748 | ||||||
0.090697 | 0.9093 | 134168 | 0.5 | 0.5 | 0.5 | 3.00E-04 | 911.065 | 181302 | 3.0217 | ||||||
0.145869 | 0.85413 | 196150 | 0.5 | 0.5 | 0.5 | 2.00E-04 | 1288.62 | 256436 | 4.27393 | ||||||
0.321644 | 0.67836 | 372731 | 0.5 | 0.5 | 0.5 | 1.00E-04 | 2442.13 | 485984 | 8.09973 | ||||||
0.316616 | 0.68338 | 414450 | 0.5 | 0.5 | 0.5 | 9.00E-05 | 2712.61 | 539810 | 8.99683 | ||||||
0.457293 | 0.54271 | 462558 | 0.5 | 0.5 | 0.5 | 8.00E-05 | 3028.06 | 602583 | 10.0431 | ||||||
0.402024 | 0.59798 | 524995 | 0.5 | 0.5 | 0.5 | 7.00E-05 | 3529.6 | 702391 | 11.7065 | ||||||
0.497488 | 0.50251 | 604700 | 0.5 | 0.5 | 0.5 | 6.00E-05 | 4499.12 | 895326 | 14.9221 | ||||||
Runs | Rows classified as A | Rows classified as B |
199 | 2, 3 | 0, 1 |
Row | x1 | x2 |
0 | 0 | 0 |
1 | 0 | 1 |
2 | 1 | 0 |
3 | 1 | 1 |
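Network 2303 reaches its peak performance very quickly. In the tally at a convergence error of 6e-5, rows 2 and 3 were classified as A and rows 0 and 1 as B in all 199 runs. Network 2303 reached its peak after only about 1046 iterations (δ = 0.4), whereas 0323 needed 4069937 iterations (δ = 8e-6) to come close, about 3890 times as many. For the same convergence error, however, the two networks need nearly the same number of iterations: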
Network | δ | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 | 0.01 | 0.001 | 9.00E-04 | 8.00E-04 | 7.00E-04 | 6.00E-04 | 5.00E-04 | 4.00E-04 | 3.00E-04 | 2.00E-04 | 1.00E-04 | 9.00E-05 | 8.00E-05 | 7.00E-05 | 6.00E-05 |
0323 | iterations n | 15.256 | 1032.3 | 1323.9 | 1621.603 | 2213.4 | 7464.6 | 45216 | 50207.56 | 55872.31 | 62169.84 | 71936.55 | 84370.46 | 103430 | 135549 | 195460.8 | 372835.2 | 409774.9 | 461388 | 520512.5 | 603678.1 |
2303 | iterations n | 14.206 | 1046.2 | 1317.6 | 1613.859 | 2210.3 | 7441 | 45646 | 49974.73 | 55332.19 | 62510.52 | 71881.37 | 84247.81 | 103317.8 | 134168.3 | 196150 | 372731.1 | 414450 | 462557.6 | 524994.6 | 604700.3 |
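The peak classification results of networks 0323 and 2303 are the same, and their iteration counts at each convergence error are nearly the same as well. But the classification process of network 0323 is clearly far longer. The order in which the samples are presented therefore has a large effect on how the network converges, and a suitable order can greatly speed up convergence without sacrificing performance.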
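Both halves of this conclusion can be checked against the numbers in the tables. The sketch below copies the iteration counts of the comparison table and computes the relative gap between the two networks at each δ, then the ratio of the iteration counts each network needed to reach (or approach) the 0.5, 0.5 split. The variable names are illustrative only.

```python
DELTAS = [0.5, 0.4, 0.3, 0.2, 0.1, 0.01, 0.001, 9e-4, 8e-4, 7e-4,
          6e-4, 5e-4, 4e-4, 3e-4, 2e-4, 1e-4, 9e-5, 8e-5, 7e-5, 6e-5]
N_0323 = [15.256, 1032.3, 1323.9, 1621.603, 2213.4, 7464.6, 45216,
          50207.56, 55872.31, 62169.84, 71936.55, 84370.46, 103430,
          135549, 195460.8, 372835.2, 409774.9, 461388, 520512.5, 603678.1]
N_2303 = [14.206, 1046.2, 1317.6, 1613.859, 2210.3, 7441, 45646,
          49974.73, 55332.19, 62510.52, 71881.37, 84247.81, 103317.8,
          134168.3, 196150, 372731.1, 414450, 462557.6, 524994.6, 604700.3]

# Relative gap between the two networks at the same convergence error.
rel_gap = [abs(a - b) / max(a, b) for a, b in zip(N_0323, N_2303)]
worst_gap, worst_delta = max(zip(rel_gap, DELTAS))
print(f"largest gap: {worst_gap:.1%} at delta = {worst_delta:g}")
print(f"largest gap excluding delta = 0.5: {max(rel_gap[1:]):.1%}")

# Iterations to reach the 0.5 / 0.5 split: 2303 at delta = 0.4,
# 0323 (approximately) at delta = 8e-6, both taken from the tables.
ratio = 4069937 / 1046.24
print(f"0323 needs about {ratio:.0f} times as many iterations")
```

On these figures the per-δ gap stays below roughly 1.4% everywhere except δ = 0.5, where the counts are tiny and the gap is about 7%, while the ratio comes out at about 3890.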