S-Lang Statistics Module Reference: Using the Module

2. Using the Module

To use the module in an S-Lang script, it must first be loaded into the interpreter. The standard way to do this is with the require function, e.g.,


   require ("stats");

To load it into a specific namespace, e.g., "S", use


   require ("stats", "S");
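
Functions loaded this way are then referenced through the namespace using the -> operator. As a minimal sketch (the two small arrays are made-up illustrative data, and the output is omitted):


   slsh> require ("stats", "S");
   slsh> S->ks_test2 ([1.0, 2.5, 4.0], [0.5, 3.0, 3.5, 5.0]);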

Most of the stats module's functions provide a brief usage message when called without arguments, e.g.,


   slsh> chisqr_test;
   Usage: p=chisqr_test(X,Y,...,Z [,&T])

More detailed help is available using the help function:


   slsh> help chisqr_test
   chisqr_test
   
    SYNOPSIS
     Apply the Chi-square test to a two or more datasets
   
    USAGE
     prob = chisqr_test (X_1, X_2, ..., X_N [,&t])
   
    DESCRIPTION
    This function applies the Chi-square test to the N datasets
       .
       .

To illustrate the use of the module, consider the task of comparing Gaussian-distributed random numbers to a uniform distribution of numbers. In the following, the ran_gaussian function from the GNU Scientific Library module will be used to generate the Gaussian-distributed random numbers.

First, start by loading the stats and gslrand modules into slsh:


   slsh> require ("gslrand");
   slsh> require ("stats");

Now generate 10 random numbers with a standard deviation (and hence a variance) of 1.0 using the ran_gaussian function, and assign the resulting array to the variable g:


   slsh> g = ran_gaussian (1.0, 10);

Similarly, assign to u an array of 10 evenly spaced numbers covering the range from -3 to 3:


   slsh> u = [-3:3:#10];
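
The [a:b:#n] form produces n evenly spaced values from a to b with both endpoints included, so u is a deterministic grid rather than a random sample. A quick way to check this at the prompt (a sketch; output omitted):


   slsh> length (u);      % number of points, here 10
   slsh> u[0];            % first point, the lower endpoint -3
   slsh> u[-1];           % last point, the upper endpoint 3
   slsh> u[1] - u[0];     % spacing between neighbors, 6/9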

These two datasets may be compared using the stats module's two-sample non-parametric tests. First the Kolmogorov-Smirnov test may be applied using ks_test2:


   slsh> ks_test2 (g,u);
   0.78693
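
The p-value alone does not say how far apart the two empirical distributions are. The sketch below assumes that ks_test2 accepts an optional reference argument that receives the test statistic (the maximum deviation between the two cumulative distributions), much as the [,&T] in the chisqr_test usage message above indicates for that function. Calling ks_test2 without arguments shows its actual usage message:


   slsh> variable p, d;
   slsh> p = ks_test2 (g, u, &d);
   slsh> d;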

This shows a p-value of about 0.79, so the test finds no significant difference between the two distributions. Similarly, the Kuiper and Mann-Whitney-Wilcoxon tests yield p-values of 0.46 and 0.97, respectively:


   slsh> mw_test (g,u);
   0.970512
   slsh> kuiper_test2 (g,u);
   0.462481

Instead of 10 points per dataset, perform the tests using 100 points:


   slsh> g = ran_gaussian (1.0, 100);
   slsh> u = [-3:3:#100];
   slsh> ks_test2 (g,u);
   0.00613403
   slsh> mw_test (g,u);
   0.741508
   slsh> kuiper_test2 (g,u);
   1.38757e-06
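
The same comparison can also be run non-interactively. The following is a minimal sketch of an slsh script; the function name compare_samples and the choice of sample sizes are illustrative assumptions, and because the Gaussian samples are random the p-values will differ from run to run:


   require ("gslrand");
   require ("stats");

   % Compare n Gaussian random numbers with n evenly spaced points
   % on [-3,3] and print the p-value from each two-sample test.
   define compare_samples (n)
   {
      variable g = ran_gaussian (1.0, n);   % Gaussian sample, sigma = 1.0
      variable u = [-3:3:#n];               % evenly spaced grid from -3 to 3
      vmessage ("n=%d: ks=%g mw=%g kuiper=%g", n,
                ks_test2 (g, u), mw_test (g, u), kuiper_test2 (g, u));
   }

   variable n;
   foreach n ([10, 100, 1000])
     compare_samples (n);

Saved under a name such as compare.sl (a hypothetical file name), the script would be run with slsh compare.sl.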

As this example shows, both the Kolmogorov-Smirnov and Kuiper tests found significant differences between the datasets, whereas the Mann-Whitney-Wilcoxon test did not. The Mann-Whitney-Wilcoxon test failed to find a difference because it assumes that the underlying distributions have the same shape and may differ only in location. The distributions represented by g and u clearly violate this assumption: both are centered on zero but differ in shape.
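
The location sensitivity of the Mann-Whitney-Wilcoxon test can be seen by shifting the Gaussian sample before comparing. In this sketch the shift of 1.0 is an arbitrary illustrative choice, and the output is omitted because it depends on the random sample. With 100 points per dataset, the resulting p-value should typically be far smaller than the one obtained without the shift:


   slsh> mw_test (g + 1.0, u);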
