Saturday 31 August 2013

Two-factor (two-way) ANOVA

SciStatCalc version 1.3 can carry out a two-factor ANOVA.

Consider the example reproduced from
http://www-stat.stanford.edu/~susan/courses/s141/exanova.pdf

Example: Suppose you want to determine whether the brand of laundry detergent used and the temperature affect the amount of dirt removed from your laundry. To this end, you buy two different brands of detergent (“Super” and “Best”) and choose three different temperature levels (“cold”, “warm”, and “hot”). Then you divide your laundry randomly into 6 × r piles of equal size and assign r piles to each combination of detergent (“Super” and “Best”) and temperature (“cold”, “warm”, and “hot”). In this example, we are interested in testing the null hypotheses
H0D : The amount of dirt removed does not depend on the type of detergent
H0T : The amount of dirt removed does not depend on the temperature
One says the experiment has two factors (Factor Detergent, Factor Temperature) at a = 2 (Super and Best) and b = 3 (cold, warm and hot) levels. Thus there are ab = 2 × 3 = 6 different combinations of detergent and temperature. With each combination you wash r = 4 loads; r is called the number of replicates. This sums up to n = abr = 24 loads in total. The amounts Yijk of dirt removed when washing sub-pile k (k = 1, 2, 3, 4) with detergent i (i = 1, 2) at temperature j (j = 1, 2, 3) are recorded in Table 1.

Table 1:

         Cold       Warm           Hot
Super    4,5,6,5    7,9,8,12       10,12,11,9
Best     6,6,4,4    13,15,12,12    12,13,10,13
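
For readers who want to check the numbers outside the app, the sums of squares and F statistics for Table 1 can be worked out in a few lines of Python. This is only a minimal sketch of the textbook balanced two-factor ANOVA with interaction, and it is not SciStatCalc's own code. The function name two_factor_anova and the dictionary layout are made up for illustration, and the p-values (which need the F distribution) are left out.

```python
# Table 1, stored as data[detergent][temperature] -> list of r replicates.
data = {
    "Super": {"Cold": [4, 5, 6, 5], "Warm": [7, 9, 8, 12], "Hot": [10, 12, 11, 9]},
    "Best": {"Cold": [6, 6, 4, 4], "Warm": [13, 15, 12, 12], "Hot": [12, 13, 10, 13]},
}


def two_factor_anova(data):
    """Balanced two-factor ANOVA with interaction (hypothetical helper)."""
    rows = list(data)                # levels of factor A (detergent)
    cols = list(data[rows[0]])       # levels of factor B (temperature)
    a, b = len(rows), len(cols)
    r = len(data[rows[0]][cols[0]])  # replicates per cell, assumed equal
    n = a * b * r

    values = [y for i in rows for j in cols for y in data[i][j]]
    cf = sum(values) ** 2 / n        # correction factor: (grand total)^2 / n

    ss_total = sum(y * y for y in values) - cf
    ss_a = sum(sum(sum(data[i][j]) for j in cols) ** 2 for i in rows) / (b * r) - cf
    ss_b = sum(sum(sum(data[i][j]) for i in rows) ** 2 for j in cols) / (a * r) - cf
    ss_cells = sum(sum(data[i][j]) ** 2 for i in rows for j in cols) / r - cf
    ss_ab = ss_cells - ss_a - ss_b   # interaction
    ss_err = ss_total - ss_cells     # within-cell (error)

    ms_err = ss_err / (a * b * (r - 1))
    return {
        "SS": {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_err, "total": ss_total},
        "F": {
            "A": (ss_a / (a - 1)) / ms_err,
            "B": (ss_b / (b - 1)) / ms_err,
            "AB": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
        },
    }


result = two_factor_anova(data)
for name, f_value in result["F"].items():
    print(name, round(f_value, 2))
```

Here A is the detergent factor, B the temperature factor and AB their interaction. The F values come out at roughly 9.81, 48.73 and 3.97 respectively, on (1, 18), (2, 18) and (2, 18) degrees of freedom.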

The above information can be stored in a CSV file, and this file can be sent as an email attachment. The contents of the file are as below:-

# 3 2 4
4,7,10
5,9,12
6,8,11
5,12,9
6,13,12
6,15,13
4,12,10
4,12,13

Note that the first line is a comment line containing three numbers. The first number is the number of categories of the first factor (temperature level: Cold, Warm, Hot), which is the number of dataset columns in Table 1. The second number is the number of categories of the second factor (brand of detergent: Super, Best), which is the number of rows in Table 1. The third number is the number of samples per dataset, known in the example above as the number of replicates (4 here). The number of rows with data in the CSV file is (number of second factor categories * number of samples per dataset), which is 2 * 4 = 8 in this example: the first four rows hold the Super readings and the last four rows hold the Best readings.
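
The file layout can be made concrete with a short Python sketch that reads such a file back into cells. This is an illustration only, not the parser used inside SciStatCalc. The function name parse_two_factor_csv is hypothetical, and it assumes the third number on the comment line is the number of replicates and that the rows are grouped by second-factor category.

```python
CSV_TEXT = """# 3 2 4
4,7,10
5,9,12
6,8,11
5,12,9
6,13,12
6,15,13
4,12,10
4,12,13
"""


def parse_two_factor_csv(text):
    """Return cells[i][j] = replicates for row category i, column category j."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("first line must be a comment line starting with #")

    # Header: columns (first factor), rows (second factor), replicates (assumed).
    n_cols, n_rows, n_reps = (int(tok) for tok in lines[0][1:].split())

    body = [[float(x) for x in ln.split(",")] for ln in lines[1:]]
    if len(body) != n_rows * n_reps:
        raise ValueError("expected %d data rows" % (n_rows * n_reps))
    if any(len(row) != n_cols for row in body):
        raise ValueError("every data row needs %d values" % n_cols)

    # Data rows i*n_reps .. (i+1)*n_reps - 1 belong to row category i.
    return [
        [[body[i * n_reps + k][j] for k in range(n_reps)] for j in range(n_cols)]
        for i in range(n_rows)
    ]


cells = parse_two_factor_csv(CSV_TEXT)
print(cells[0][1])  # Super, Warm
print(cells[1][2])  # Best, Hot
```

The two printed lists should match the Super/Warm and Best/Hot entries of Table 1.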

A video of carrying out the two-factor ANOVA using SciStatCalc is accessible at link http://www.youtube.com/watch?v=Tji0ZTTyTsQ.

Friday 30 August 2013

Multiple datasets test for SciStatCalc 1.3

The following statistical tests require more than 2 datasets, which need to be entered in the left box:-
  1. Bartlett's test
  2. Kruskal-Wallis test
  3. One-way ANOVA test
  4. Two-factor ANOVA test
Data are entered with the datasets separated by semi-colons, each dataset being a list of comma-separated numbers.
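
As a rough illustration of this entry format, the Python sketch below builds the text for the left box from a list of datasets and splits it back again. The helper names format_entry and parse_entry are made up for this post, and whether the app tolerates spaces around the separators is not something the sketch assumes either way, so it emits none.

```python
def format_entry(datasets):
    """Join datasets with semi-colons, and the numbers in each with commas."""
    return ";".join(",".join(str(v) for v in ds) for ds in datasets)


def parse_entry(entry):
    """Split a semi-colon/comma separated entry back into lists of floats."""
    return [[float(v) for v in ds.split(",")] for ds in entry.split(";")]


# Three datasets of three samples each (the crash-test figures from the
# one-way ANOVA example in this post).
datasets = [[643, 655, 702], [469, 427, 525], [484, 456, 402]]

entry = format_entry(datasets)
print(entry)
print(parse_entry(entry))
```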

Consider the one-way ANOVA example in http://cba.ualr.edu/smartstat/topics/anova/example.pdf, which poses the following problem:-

Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of the treatments (car types). Using the hypothetical data provided below, test whether the mean pressure applied to the driver’s head during a crash test is equal for each type of car.

Table ANOVA.1
Compact cars  Midsize cars Full-size cars

643                  469              484 
655                  427              456 
702                  525              402

The above data should be entered in the One-way ANOVA test panel as shown:-


[Screenshot: one-way ANOVA]

Pressing the calculate button populates the right (textview) box with a full breakdown of the ANOVA results, together with a post-hoc analysis should the results be significant. This is best illustrated by the video on YouTube accessible at:-

The post-hoc analysis is based on Tukey's HSD test, and is restricted to the scenario where all the populations have the same number of samples.
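
The main numbers in such a breakdown can be reproduced by hand for the crash-test data. The Python sketch below follows the standard one-way ANOVA formulas and is not taken from the app. The function name one_way_anova is hypothetical, and the p-value and the Tukey HSD step are left out because they need the F and studentized range distributions.

```python
groups = {
    "Compact": [643, 655, 702],
    "Midsize": [469, 427, 525],
    "Full-size": [484, 456, 402],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F (hypothetical helper)."""
    samples = list(groups.values())
    k = len(samples)                           # number of groups
    n = sum(len(s) for s in samples)           # total number of samples
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_value,
    }


anova = one_way_anova(groups)
print(round(anova["F"], 2))
```

For this data the F statistic works out at about 25.18 on (2, 6) degrees of freedom.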

The above datasets can be input as a CSV file attachment - the file should have contents as below:-

# 3
643,469,484 
655,427,456 
702,525,402

Note that the first line is a comment line, starting with the character # and followed by number 3, which is the number of groups/datasets/populations. Also, each column of the CSV file corresponds to a particular group/dataset - this restricts all the groups/datasets to have the same number of samples, for file import mode.
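
To make the column-per-group layout concrete, here is a small Python sketch that reads such a file and returns one list per group. It is illustrative only and is not how SciStatCalc itself imports the file; the function name parse_one_way_csv is made up.

```python
CSV_TEXT = """# 3
643,469,484
655,427,456
702,525,402
"""


def parse_one_way_csv(text):
    """Return one list of samples per group (one group per CSV column)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("first line must be a comment line starting with #")
    n_groups = int(lines[0][1:].strip())

    rows = [[float(x) for x in ln.split(",")] for ln in lines[1:]]
    if any(len(row) != n_groups for row in rows):
        raise ValueError("every data row needs %d values" % n_groups)

    # Transpose: rows of the file become per-group columns.
    return [list(col) for col in zip(*rows)]


columns = parse_one_way_csv(CSV_TEXT)
print(columns[0])  # compact cars
```

Because every row must supply a value for every column, the equal-sample-size restriction mentioned above falls straight out of this layout.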

SciStatCalc version 1.3 released

SciStatCalc v1.3 is now available on the iTunes App Store. The new version includes the following updates:-

Mean and variance estimates for each of the 19 probability distributions added to the blue CDF/inverse CDF panel in landscape mode.

Eleven extra statistical tests/capabilities added to the pink panel - all accessible with a left/right swipe, making the switch between tests so much easier! Note: the terms datasets and populations are used interchangeably below.
  • Mann-Whitney U-test: tests if two datasets/populations are the same - can be regarded as the non-parametric analogue to the Unpaired Student's t-test.
  • Wilcoxon Signed Rank test: tests if the paired differences between two datasets/populations are symmetric about zero - can be regarded as the non-parametric analogue to the Paired Student's t-test.
  • Linear Regression: Calculates values of $a$ and $b$ for the model $y[n]=ax[n] + b$ - the entries for $x[n]$ and $y[n]$ need to populate the left and right boxes of the test panel respectively. Also calculates various other parameters, including the coefficient of determination.
  • Spearman's Rank Correlation test: A non-parametric test to test if two populations have a monotonic relationship
  • Pearson Correlation test: Parametric test for a linear relationship between two populations
  • Shapiro-Wilk test: Calculates the W statistic to test for Gaussianity of two datasets - the closer this statistic is to 1, the more consistent the data are with a Gaussian distribution
  • Bartlett's test: tests for equality of variance amongst (more than two) datasets - multiple populations can be entered in the left hand text box as semi-colon separated datasets - each dataset consists of comma separated numbers  
  • Kruskal-Wallis test: the non-parametric Analysis of Variance (ANOVA) test
  • One-Way ANOVA test: tests equality of means for three or more populations - assumes population variances are the same - a full breakdown of the results is displayed in the right text box. Also, post-hoc analysis using Tukey's Honestly Significant Difference (HSD) test is carried out, should the results be significant.
  • Two-factor ANOVA test: tests the effect of two independent factors on multiple datasets and their interaction - data needs to be entered as bar (|) separated groups of semi-colon separated datasets, each dataset comprising comma separated numbers, in the left text box. More on this in an upcoming posting.
  • Single dataset analysis: Simply enter comma separated numbers in the left text box - results such as number of samples, minimum, maximum, mean, variance, standard deviation, kurtosis, median and mode (for integer values) are displayed in the right text box.
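
As an illustration of the linear regression entry in the list above, the Python sketch below fits the straight line $y[n] = ax[n] + b$ by ordinary least squares and reports the coefficient of determination. It is a sketch under stated assumptions rather than the app's code: the function name fit_line is hypothetical and the x and y values are invented purely for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centred sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared


# Invented data: x would go in the left box, y in the right box.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r_squared = fit_line(x, y)
print(round(a, 3), round(b, 3), round(r_squared, 4))
```

For a straight-line fit like this one, the coefficient of determination equals the square of the Pearson correlation coefficient between the two datasets.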

Logistic Regression Calculator and ROC Curve Plotter

This blog post implements a Logistic Regression calculator for a binary output. Consider a binary outcome response variable \(Y\...