Chi-Square 2 x 2 Contingency Table

The joint frequencies of two dichotomous, nominal-scaled variables are arranged in a 2 x 2 contingency table. The Chi-Square test examines whether the two variables are associated.

Requirements:
Every observation must be assignable unambiguously to exactly one cell.
The expected frequency should not fall below 5 in any cell. If any expected frequency is below 5, Fisher's exact test should be used instead.

Hypotheses:
H0: The two variables are independent
H1: The two variables are dependent

Under the assumption that the two variables are independent (H0), the probability of each of the four cells is the product of its marginal probabilities:

$$p(A_iB_j) = p(A_i)\,p(B_j), \quad i, j \in \{1, 2\}$$

The expected frequencies under H0 are obtained by multiplying each cell probability by the total number of observations.
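The step from marginal probabilities to expected frequencies can be sketched in a few lines of Python. This is only an illustration: the function name `expected_from_probabilities` and the input values (p(A1) = 0.4, p(B1) = 0.5, 100 observations) are made up for the example and do not come from the article.

```python
def expected_from_probabilities(p_a1, p_b1, n_total):
    """Cell probabilities and expected frequencies of a 2 x 2 table under H0."""
    p_a = [p_a1, 1.0 - p_a1]  # p(A1), p(A2)
    p_b = [p_b1, 1.0 - p_b1]  # p(B1), p(B2)
    # Under independence each cell probability is the product of its marginals.
    cell_p = [[pa * pb for pb in p_b] for pa in p_a]
    # Expected frequency = cell probability * total number of observations.
    expected = [[p * n_total for p in row] for row in cell_p]
    return cell_p, expected


# Hypothetical input values, chosen only for illustration.
cell_p, expected = expected_from_probabilities(0.4, 0.5, 100)
for row in expected:
    print(row)
```

The four cell probabilities sum to 1, so the expected frequencies sum to the total number of observations.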

Probability Table:

                          Variable B
                          Category 1   Category 2   Total
Variable A   Category 1   p(A1B1)      p(A1B2)      p(A1)
             Category 2   p(A2B1)      p(A2B2)      p(A2)
             Total        p(B1)        p(B2)


Frequency Table (each cell shows the observed frequency fo above the expected frequency fe):

                          Variable B
                          Category 1   Category 2   Total
Variable A   Category 1   fo(A1B1)     fo(A1B2)     fo(A1)
                          fe(A1B1)     fe(A1B2)
             Category 2   fo(A2B1)     fo(A2B2)     fo(A2)
                          fe(A2B1)     fe(A2B2)
             Total        fo(B1)       fo(B2)       ftotal


For each cell the expected frequency is estimated from the observed row and column totals:

$$f_e(A_iB_j) = \frac{f_o(A_i)\,f_o(B_j)}{f_{total}}$$

where fe denotes the expected frequency and fo the observed frequency.
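As a minimal sketch, the expected frequencies can be estimated from an observed table as follows. The helper name `expected_frequencies` and the observed counts are hypothetical and serve only as an illustration.

```python
def expected_frequencies(observed):
    """Estimate expected cell frequencies from the observed marginal totals."""
    row_totals = [sum(row) for row in observed]          # fo(A1), fo(A2)
    col_totals = [sum(col) for col in zip(*observed)]    # fo(B1), fo(B2)
    n_total = sum(row_totals)                            # ftotal
    # fe = row total * column total / grand total
    return [[r * c / n_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (rows: Variable A, columns: Variable B).
observed = [[30, 10],
            [20, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```

The expected frequencies always reproduce the observed row and column totals, which is a quick way to check the computation.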

The following term measures the deviation between the observed and expected frequencies and is approximately Chi-Square distributed:

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\left(f_o(A_iB_j) - f_e(A_iB_j)\right)^2}{f_e(A_iB_j)}$$

Chi-Square is the sum over all cells of the squared residual (fo - fe) of each cell divided by that cell's expected frequency fe.
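The sum over all cells can be written out as a short Python sketch. The function names and the observed counts are assumptions made for this example. The expected frequencies are estimated from the sample margins.

```python
def expected_frequencies(observed):
    """Expected frequencies estimated from the observed marginal totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (fo - fe)^2 / fe over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Hypothetical observed counts, for illustration only.
observed = [[30, 10],
            [20, 40]]
chi2 = chi_square(observed)
print(round(chi2, 4))
```

A table whose observed counts equal the expected counts yields a value of 0. The value grows as the observed counts move away from independence.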

Degrees of freedom are determined as follows:

  • If the marginal probabilities of both variables are known in advance:
    df = number of cells - 1 = 3
  • If the marginal probabilities are estimated from the sample:
    df = (number of rows - 1)*(number of columns - 1) = 1
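For the two cases above, the upper-tail probability of a Chi-Square value has a closed form that needs only the Python standard library. The sketch below is an illustration under that restriction. The function name `chi2_p_value` is hypothetical, and other degrees of freedom would need a statistics library.

```python
import math


def chi2_p_value(chi2, df):
    """Upper-tail probability of a Chi-Square value for df = 1 or df = 3."""
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    # For df = 1 the statistic is a squared standard normal variable,
    # so the tail probability is erfc(sqrt(chi2 / 2)).
    tail_1df = math.erfc(math.sqrt(chi2 / 2.0))
    if df == 1:
        return tail_1df
    if df == 3:
        # Closed form of the regularized upper incomplete gamma Q(3/2, chi2/2).
        return tail_1df + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 3")


# The usual 5 % critical values are about 3.841 (df = 1) and 7.815 (df = 3).
p_df1 = chi2_p_value(3.841, 1)
p_df3 = chi2_p_value(7.815, 3)
print(round(p_df1, 3), round(p_df3, 3))
```

The same Chi-Square value is judged more leniently with df = 3 than with df = 1, so choosing the correct case matters for the test decision.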

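Yates' Continuity Correction:

Yates' continuity correction accounts for the fact that the observed frequencies are discrete, whereas the Chi-Square distribution used to approximate the test statistic is continuous.

The Chi-Square value is corrected by reducing the absolute value of each residual by 0.5 before squaring:

$$\chi^2_{corr} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\left(\left|f_o(A_iB_j) - f_e(A_iB_j)\right| - 0.5\right)^2}{f_e(A_iB_j)}$$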
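A minimal sketch of the corrected statistic is shown below. It assumes the common form of the correction, in which 0.5 is subtracted from the absolute residual of every cell. The function names and the observed counts are hypothetical.

```python
def expected_frequencies(observed):
    """Expected frequencies estimated from the observed marginal totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


def chi_square_yates(observed):
    """Chi-Square with 0.5 subtracted from each absolute residual."""
    expected = expected_frequencies(observed)
    return sum(
        (abs(fo - fe) - 0.5) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Hypothetical observed counts, for illustration only.
observed = [[30, 10],
            [20, 40]]
chi2_corrected = chi_square_yates(observed)
print(round(chi2_corrected, 4))
```

For these counts the uncorrected value is about 16.67, so the correction makes the test more conservative.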

 

 

Interpretation of a significant result:

The interrelation between the two variables is expressed by the deviation of the cells' observed percentages from the corresponding row or column percentages.

The standardized residual is another measure of how strongly a cell's observed frequency deviates from its expected frequency. For each cell it is calculated as follows:

$$\frac{f_o(A_iB_j) - f_e(A_iB_j)}{\sqrt{f_e(A_iB_j)}}$$

For sufficiently large samples the standardized residual is comparable to a z-value. As a rule of thumb, a standardized residual of -2 or less indicates that the cell's observed frequency is significantly lower than its expected frequency, and a standardized residual of +2 or more indicates that it is significantly higher.
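The rule of thumb can be applied cell by cell, as in the following sketch. The function names, the labels "higher" and "lower", and the observed counts are assumptions made for this illustration.

```python
import math


def expected_frequencies(observed):
    """Expected frequencies estimated from the observed marginal totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


def standardized_residuals(observed):
    """(fo - fe) / sqrt(fe) for every cell of the table."""
    expected = expected_frequencies(observed)
    return [
        [(fo - fe) / math.sqrt(fe) for fo, fe in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


def flag_cells(residuals, threshold=2.0):
    """Label the cells whose residual reaches the +/- threshold."""
    flags = {}
    for i, row in enumerate(residuals):
        for j, r in enumerate(row):
            if r >= threshold:
                flags[(i, j)] = "higher"
            elif r <= -threshold:
                flags[(i, j)] = "lower"
    return flags


# Hypothetical observed counts, for illustration only.
observed = [[30, 10],
            [20, 40]]
residuals = standardized_residuals(observed)
flags = flag_cells(residuals)
print(flags)
```

The squared standardized residuals add up to the uncorrected Chi-Square value, so they show how much each cell contributes to the overall result.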





