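Verifying in MATLAB That a Partial Sample Space Can Predict the Properties of Its Whole Population
While working on a recent paper I ran into a question: can a partial sample space predict the properties of its whole population?
So I wrote a small MATLAB program to check.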
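The overall idea of the main verification program is:
1 Randomly generate a large number of exponentially distributed random numbers;
2 Randomly pick some of those points and compute both their theoretical mean and their actual mean;
3 Collect the experimental results, plot them, and compare.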
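When averaging a large amount of data, the mean can be computed from the fact that the expectation equals the integral of (1 - CDF) over the sample space. For a non-negative random variable $X$ with CDF $F$, this is $E[X] = \int_0^\infty (1 - F(x))\,dx$.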
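As a sanity check, the (1 - CDF) identity can be worked out by hand for the exponential case used in this experiment. The symbols below are introduced here for illustration only: $\mu$ stands for meanVal, $x_1 \le \dots \le x_k$ for the selected points, and $\hat F$ for the CDF values estimated at them. The last line is the trapezoidal sum that trapz evaluates over the selected points.

```latex
\begin{align*}
F(x) &= 1 - e^{-x/\mu}, \qquad x \ge 0,\\
\int_0^\infty \bigl(1 - F(x)\bigr)\,dx &= \int_0^\infty e^{-x/\mu}\,dx = \mu,\\
\int_{x_1}^{x_k} \bigl(1 - \hat F(x)\bigr)\,dx &\approx
  \sum_{i=1}^{k-1} \frac{x_{i+1} - x_i}{2}
  \Bigl[\bigl(1 - \hat F(x_i)\bigr) + \bigl(1 - \hat F(x_{i+1})\bigr)\Bigr].
\end{align*}
```

Note that the trapezoidal sum only covers the interval from $x_1$ to $x_k$, so the contributions of $[0, x_1]$ and of the tail beyond $x_k$ are left out. Both shrink as more points are selected.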

First, the verification program:
clear;
% --- Control parameter (set manually)
CNT_Selected_Pnts = 20;
% --- Fixed parameters (not manually controlled)
meanVal = 100;
CNT_numbers = 10000;
% 1 Create random numbers following an exponential distribution.
rdmNumbers = exprnd(meanVal, CNT_numbers, 1);
% 2 Calculate the CDF of the numbers at randomly selected points.
min_x = min(rdmNumbers)
max_x = max(rdmNumbers)
[x, cdf_x] = funcCDF_rdm(CNT_Selected_Pnts, min_x, max_x, rdmNumbers);
plot(x, cdf_x, 'ob')
% 3 Theoretical mean: integrate (1 - CDF) over the selected points.
JF_Func = 1 - cdf_x;
AvgVal_ana = trapz(x, JF_Func)
% 4 Actual mean of the selected points, via the library function mean().
AvgVal_true_lib = mean(x)
It uses a helper function, funcCDF_rdm(...), which randomly selects a number of points, computes the CDF value at each of them, and returns the results.
% ----
% para@1: CNT_pnts, the number of points used to represent the CDF;
% para@2: Range_low, the lower bound of the variable (unused, kept for the call signature);
% para@3: Range_up, the upper bound of the variable (unused, kept for the call signature);
% para@4: arr_Vals, array of the values to be processed.
% ----
% Return@ x : the array of selected point values.
% Return@ CDF_Vals : the array of CDF values at the selected points.
% ----
function [x, CDF_Vals] = funcCDF_rdm(CNT_pnts, Range_low, Range_up, arr_Vals)
% -- 1 Prepare the original data
data = sort( arr_Vals' ); % Transpose the column vector, then sort ascending.
sumPnts = length(data);
Counter_every_pnt = zeros(1,CNT_pnts);
% -- 2 Count the samples at or below every selected point
% -- 2.1 Select random points from the original data.
idxs_rdm = randperm(sumPnts);
idx_selected = idxs_rdm(1:CNT_pnts); % !! Take the first CNT_pnts random indices.
idx_selected = sort( idx_selected ); % !! Must be sorted.
Selected_data = data(idx_selected);
% -- 2.2 For every selected point, count the samples that do not exceed it.
for i = 1:1:sumPnts
    for selected_id = 1:1:CNT_pnts
        Current_selected_pntVal = Selected_data(selected_id);
        % The empirical CDF at a point is the share of samples <= that point.
        if ( data(1,i) <= Current_selected_pntVal )
            Counter_every_pnt(1,selected_id) = Counter_every_pnt(1,selected_id) + 1; % Increment the counter.
        end
    end
end
% -- 3 Calculate the CDF value of every point.
CDF = Counter_every_pnt(1,:)./sumPnts;
% -- 4 Return the values.
x = Selected_data;
CDF_Vals = CDF';
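The double loop in the helper does about sumPnts * CNT_pnts comparisons. A more compact way to get the same CDF values might look like the sketch below. It is not the code used for the results in this post: the inverse-transform draw -meanVal * log(rand(...)) is an assumption standing in for exprnd so that no toolbox is needed, and the names N, K, idx and AvgVal_sel are introduced here for illustration.

```matlab
% Sketch: empirical CDF at randomly selected sample points, without loops.
% Assumption: inverse-transform sampling replaces exprnd (no toolbox needed).
rng(0);                                   % fixed seed, only for repeatability
meanVal = 100;
N = 10000;                                % total number of samples
K = 20;                                   % number of selected points
data = sort(-meanVal * log(rand(N, 1)));  % exponential samples, ascending

idx = sort(randperm(N, K));               % K distinct indices, ascending
x = data(idx);                            % selected points (K-by-1)

% Share of samples <= each selected point. Since data is sorted and the
% draws are distinct with probability 1, this matches idx / N.
cdf_x = arrayfun(@(v) sum(data <= v), x) / N;

AvgVal_ana = trapz(x, 1 - cdf_x);         % integral of (1 - CDF) over [x(1), x(K)]
AvgVal_sel = mean(x);                     % plain mean of the selected points
```

Because the data is already sorted, the CDF value at the i-th selected point is simply its index divided by N, so the counting step could be dropped entirely when speed matters.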
Next, the experimental results:
1. With CNT_Selected_Pnts = 10:
min_x = 0.0208
max_x = 810.8252
AvgVal_ana = 86.5878
AvgVal_true_lib = 85.4083

2. With CNT_Selected_Pnts = 20:
AvgVal_ana = 94.2869
AvgVal_true_lib = 97.8661

3. With CNT_Selected_Pnts = 200:
AvgVal_ana = 98.9270
AvgVal_true_lib = 102.1692

4. With CNT_Selected_Pnts = 1000:
AvgVal_ana = 99.0635
AvgVal_true_lib = 100.2150

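Conclusions:
1 The more points are selected, the closer both the theoretical value (AvgVal_ana) and the experimental value (AvgVal_true_lib) come to the true mean (meanVal = 100).
2 A partial sample space can be used to predict the mean of the whole sample space for a large set of exponentially distributed numbers.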
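Each result above comes from a single random draw, so the trend could be checked more robustly by repeating the selection many times and averaging the error. The following is a minimal sketch of such a check, not part of the original experiment. It assumes inverse-transform sampling in place of exprnd, uses the index / N shortcut for the CDF of sorted data, and the names nTrials, Ks, errAna and errSel are made up for illustration.

```matlab
% Sketch: average absolute error of both estimates over repeated selections.
% Assumption: inverse-transform sampling replaces exprnd (no toolbox needed).
rng(1);                                   % fixed seed, only for repeatability
meanVal = 100;
N = 10000;
nTrials = 200;                            % repetitions per setting
Ks = [10 20 200 1000];                    % numbers of selected points to try
data = sort(-meanVal * log(rand(N, 1)));  % exponential samples, ascending

errAna = zeros(size(Ks));                 % error of the (1 - CDF) integral
errSel = zeros(size(Ks));                 % error of the plain mean
for k = 1:numel(Ks)
    K = Ks(k);
    ana = zeros(nTrials, 1);
    sel = zeros(nTrials, 1);
    for t = 1:nTrials
        idx = sort(randperm(N, K));       % K distinct indices, ascending
        x = data(idx);
        cdf_x = idx(:) / N;               % empirical CDF of sorted data
        ana(t) = trapz(x, 1 - cdf_x);
        sel(t) = mean(x);
    end
    errAna(k) = mean(abs(ana - meanVal));
    errSel(k) = mean(abs(sel - meanVal));
end
disp([Ks; errAna; errSel]);               % rows: K, error of integral, error of mean
```

If the conclusions hold, both error rows should shrink as K grows.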