|
Chi-Square 2 x 2 Contingency Table

The proportions of two dichotomous, nominal-scaled variables are represented in a contingency table. The Chi-Square test examines whether the two variables are related.

Requirements:
- Every observation must be assignable unambiguously to exactly one cell.
- The expected frequency should not fall below 5 in any cell. If any expected frequency is below 5, Fisher's exact test should be used instead.

Hypotheses:
- H0: The two variables are independent.
- H1: The two variables are dependent.

Under the assumption that the two variables are independent (H0), the following holds for the probabilities of the four cells:

$$p(A_i B_j) = p(A_i) \cdot p(B_j), \qquad i, j \in \{1, 2\}$$

The expected frequencies under H0 are obtained by multiplying the cell probability by the total number of observations.

Probability Table:

| Variable A | Variable B: Category 1 | Variable B: Category 2 | Row total |
|---|---|---|---|
| Category 1 | $p(A_1 B_1)$ | $p(A_1 B_2)$ | $p(A_1)$ |
| Category 2 | $p(A_2 B_1)$ | $p(A_2 B_2)$ | $p(A_2)$ |
| Column total | $p(B_1)$ | $p(B_2)$ | |

Frequency Table:

| Variable A | Variable B: Category 1 | Variable B: Category 2 | Row total |
|---|---|---|---|
| Category 1 | $f_o(A_1 B_1)$, $f_e(A_1 B_1)$ | $f_o(A_1 B_2)$, $f_e(A_1 B_2)$ | $f_o(A_1)$ |
| Category 2 | $f_o(A_2 B_1)$, $f_e(A_2 B_1)$ | $f_o(A_2 B_2)$, $f_e(A_2 B_2)$ | $f_o(A_2)$ |
| Column total | $f_o(B_1)$ | $f_o(B_2)$ | $f_{total}$ |

For each cell, the expected frequency is estimated as follows:

$$f_e(A_i B_j) = \frac{f_o(A_i) \cdot f_o(B_j)}{f_{total}}$$

where $f_e$ denotes the expected frequency and $f_o$ the observed frequency.

The following term measures the deviation between the observed and expected frequencies, and it is approximately Chi-Square distributed:

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left(f_o(A_i B_j) - f_e(A_i B_j)\right)^2}{f_e(A_i B_j)}$$

Chi-Square is the sum over all cells of each cell's squared residual, $(f_o - f_e)^2$, divided by the cell's expected frequency $f_e$.

Degrees of freedom are determined as follows:
- If $p(A_1)$, $p(A_2)$, $p(B_1)$ and $p(B_2)$ are known: df = number of cells - 1 = 3
- If $p(A_1)$, $p(A_2)$, $p(B_1)$ and $p(B_2)$ are estimated from the sample: df = (number of rows - 1) * (number of columns - 1) = 1

Continuity Correction after Yates:

The continuity correction after Yates accounts for the fact that observed frequencies are discrete, whereas the Chi-Square distribution is continuous. The Chi-Square value is corrected as follows:

$$\chi^2_{corr} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left(\left|f_o(A_i B_j) - f_e(A_i B_j)\right| - 0.5\right)^2}{f_e(A_i B_j)}$$

Interpretation of a significant result:

The relationship between the two variables is expressed by the deviation of the cells' observed percentages from the marginal row or column percentages $p(A_i)$ and $p(B_j)$. The standardized residual is another measure of how strongly a cell's observed frequency deviates from its expected frequency. For each cell it is calculated as follows:

$$\text{standardized residual} = \frac{f_o - f_e}{\sqrt{f_e}}$$

For sufficiently large samples, the standardized residual is comparable to a z-value. As a rule of thumb, a standardized residual of -2 or less indicates that the cell's observed frequency is significantly lower than its expected frequency, and a standardized residual of +2 or more indicates that it is significantly higher.
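
The calculation described above can be sketched in Python. This is a minimal illustration, not a reference implementation, and it rests on a few assumptions:

- The function name `chi_square_2x2` and the example counts are hypothetical.
- The Yates correction is taken in its usual form, subtracting 0.5 from each absolute residual.
- The standardized residual is computed as (fo - fe) / sqrt(fe).
- The p-value assumes df = 1, that is, marginal probabilities estimated from the sample.
- All row and column totals are assumed to be positive.

```python
import math


def chi_square_2x2(observed):
    """Chi-Square test for a 2 x 2 table of observed frequencies.

    observed: [[fo(A1B1), fo(A1B2)], [fo(A2B1), fo(A2B2)]]
    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected frequency under H0: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    chi2 = 0.0
    chi2_yates = 0.0
    residuals = []
    for fo_row, fe_row in zip(observed, expected):
        res_row = []
        for fo, fe in zip(fo_row, fe_row):
            chi2 += (fo - fe) ** 2 / fe
            # Yates: shrink each absolute residual by 0.5 before squaring
            chi2_yates += (abs(fo - fe) - 0.5) ** 2 / fe
            # Standardized residual, comparable to a z-value in large samples
            res_row.append((fo - fe) / math.sqrt(fe))
        residuals.append(res_row)

    # Upper-tail probability of a Chi-Square distribution with df = 1:
    # P(X > x) = erfc(sqrt(x / 2)), since X is a squared standard normal
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {
        "expected": expected,
        "chi2": chi2,
        "chi2_yates": chi2_yates,
        "residuals": residuals,
        "p_value": p_value,
        "min_expected": min(min(row) for row in expected),
    }


# Hypothetical counts, chosen only for illustration
result = chi_square_2x2([[30, 10], [20, 40]])
for name, value in result.items():
    print(name, value)
```

The returned `min_expected` value can be compared against the minimum expected frequency named in the requirements to decide whether Fisher's exact test is the better choice.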
|