Hi,
Thanks for creating the package!
I'm running some tests with the AD test and have a question about how the AD statistic is calculated.
According to `README.md` or the `ad_test.Rd` file, the AD statistic is calculated as follows:
$AD = \sum_{x \in k} \left( \frac{|E(x)-F(x)|}{\sqrt{2G(x)(1-G(x))/n}} \right)^p$
It seems to me that there may be two issues: 1) the formula assumes the two sample sizes are equal; and 2) the integral is not approximated correctly.
Let the sample sizes be n1 and n2, with corresponding ECDFs E and F in your notation, let n = n1 + n2, and let G be the ECDF of the joint sample. When p = 2, the statistic is
$AD = \frac{n1\times n2}{n} \int (E(x)-F(x))^2 / (G(x)(1-G(x))) d G(x)$
See F. W. Scholz and M. A. Stephens (1987), K-Sample Anderson-Darling Tests.
Let $x_i$ denote the points of the joint sample. The integral should then be approximated by
$\frac{1}{n} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{(G(x_i)(1-G(x_i)))}.$
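Recall that there is an extra factor of $n1*n2/n$ in front of the integral. If you set $n1=n2=n/2$, that factor becomes $n/4$, and combined with the $1/n$ from the approximation it gives
$AD = \frac{1}{4} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{(G(x_i)(1-G(x_i)))},$
which is different from your formula (an extra factor of $n$ is multiplied in there).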
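To make the comparison concrete, here is a minimal Python sketch of both expressions for p = 2. It is not the package's code. The function names are hypothetical, the sum in the documented formula is read as running over the joint sample (my interpretation of $x \in k$), and the largest joint observation is skipped because its term is 0/0 (an assumption made here to keep the sum finite).

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of `sample` that is <= x."""
    s = sorted(sample)
    return lambda x: bisect_right(s, x) / len(s)


def ad_terms(a, b):
    """Yield (E(x_i) - F(x_i))^2 / (G(x_i) (1 - G(x_i))) over the joint sample."""
    E, F, G = ecdf(a), ecdf(b), ecdf(a + b)
    for x in sorted(a + b):
        g = G(x)
        # Assumption: skip the joint maximum, where G = 1 and the term is 0/0.
        if g < 1.0:
            yield (E(x) - F(x)) ** 2 / (g * (1.0 - g))


def ad_integral_form(a, b):
    """(n1 * n2 / n) times the (1 / n) * sum approximation of the integral."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    return (n1 * n2 / n) * sum(ad_terms(a, b)) / n


def ad_documented_form(a, b):
    """Documented formula with p = 2: each term is divided by 2 G (1 - G) / n."""
    n = len(a) + len(b)
    return (n / 2) * sum(ad_terms(a, b))


if __name__ == "__main__":
    a = [1, 3, 5, 7]
    b = [2, 4, 6, 8]
    print(ad_integral_form(a, b), ad_documented_form(a, b))
```

Under this reading, with $n1=n2$ the documented form equals $2n$ times the integral form, so the gap between the two grows with the total sample size.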
Also, when I tried some simple datasets, the `Test Stat` returned by `ad_test` depended on the total sample size.
Please let me know if this makes sense, or if I am wrong.
Thanks!