Chapter 11 - Goodness-of-Fit and Contingency Tables

11-1 Goodness-of-Fit

  • A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution

  • Notation for testing for goodness-of-fit:

    • O: observed frequency of an outcome

    • E: expected frequency of an outcome

    • k: number of different categories or cells

    • n: total number of trials (or total of observed sample values)

    • p: probability that a sample value falls within a particular category

  • Requirements for testing for goodness-of-fit:

    • The data have been randomly selected

    • The sample data consist of frequency counts for each of the different categories

    • For each category, the expected frequency is at least 5

  • If the expected frequencies are all equal: Calculate E = n/k

  • If the expected frequencies are NOT all equal: Calculate E = np for each individual category

  • The observed frequencies are all whole numbers because they represent actual counts, but the expected frequencies need not be whole numbers

  • "If the P is low, the null must go" (If the p-value is small, reject the null hypothesis that the distribution is as claimed)

  • The test statistic X^2 = sum[(O-E)^2/E] is a measure of the discrepancy between observed and expected frequencies; large values indicate a poor fit, so the test is right-tailed

  • The theoretical distribution of sum[(O-E)^2/E] is a discrete distribution because the number of possible values is finite. The distribution can be approximated by a chi-square distribution, which is continuous. This approximation is generally considered acceptable provided that all expected values satisfy E >= 5

  • The number of degrees of freedom is k-1, which reflects the fact that we can freely assign frequencies to k-1 categories before the frequency for every category is determined (the frequencies must add up to n)
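A minimal Python sketch of the goodness-of-fit calculation described above, using only the standard library. The die-roll counts are made-up illustration data, and the function names (expected_frequencies, chi_square_statistic, chi_square_right_tail) are hypothetical helpers, not part of the chapter. The right-tail helper assumes whole-number degrees of freedom and uses the closed-form expression for the chi-square tail area; in practice a statistics package or a chi-square table would normally supply the p-value.

```python
import math

def expected_frequencies(n, probabilities):
    """E = n * p for each category (equals n / k when every p is 1 / k)."""
    return [n * p for p in probabilities]

def chi_square_statistic(observed, expected):
    """X^2 = sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_right_tail(x, df):
    """Right-tail area P(X^2 >= x) for a chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum when df >= 3
    tail = math.erfc(math.sqrt(half))
    if df > 1:
        term, total = 1.0, 1.0
        for j in range(1, (df - 1) // 2):
            term *= x / (2 * j + 1)
            total += term
        tail += math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return tail

# Hypothetical data: 60 rolls of a die that is claimed to be fair.
observed = [8, 9, 12, 11, 6, 14]
n, k = sum(observed), len(observed)
expected = expected_frequencies(n, [1 / k] * k)  # all equal, so each E is n / k

# Requirement from the notes: every expected frequency is at least 5.
requirement_met = all(e >= 5 for e in expected)

x2 = chi_square_statistic(observed, expected)
df = k - 1
p_value = chi_square_right_tail(x2, df)
print(f"E >= 5 everywhere: {requirement_met}")
print(f"X^2 = {x2:.3f}, df = {df}, p-value = {p_value:.3f}")
```

With these made-up counts the statistic is 4.2 with 5 degrees of freedom, and the p-value is well above 0.05, so the claim of a fair die would not be rejected.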

11-2 Contingency Tables

  • A contingency table (or two-way frequency table) is a table consisting of frequency counts of categorical data corresponding to two different variables (one variable used to categorize rows, the second variable used to categorize columns)

  • In a test of independence, we test the null hypothesis that in a contingency table, the row and column variables are independent

  • Notation for contingency table:

    • O: observed frequency in a cell

    • E: expected frequency in a cell

    • r: number of rows in a contingency table

    • c: number of columns in a contingency table

  • Requirements for contingency table:

    • Sample data are randomly selected

    • Sample data are represented as frequency counts in a two-way table

    • For every cell in the contingency table, the expected frequency E is at least 5

  • Degrees of freedom = (r-1)(c-1)

  • Tests of independence with a contingency table are always right-tailed

  • The distribution of the test statistic X^2 can be approximated by the chi-square distribution

  • E = (row total * column total) / (grand total)

  • In a chi-square test of homogeneity, samples are randomly selected from different populations and we want to determine whether those populations have the same proportions of some characteristic being considered

  • A chi-square test of homogeneity is a test of the claim that different populations have the same proportions of some characteristic

  • Fisher's exact test is often used for a 2 x 2 contingency table in which one or more expected frequencies are below 5; it provides an exact p-value and does not require an approximation technique
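The contingency-table steps above can be sketched in Python with the standard library alone. Both 2 x 2 tables are made-up illustration data, and every function name is a hypothetical helper rather than anything from the chapter. Two assumptions are worth flagging: the p-value shortcut erfc(sqrt(X^2 / 2)) holds only for 1 degree of freedom, and the two-sided Fisher p-value here follows one common convention (summing the probabilities of all tables no more likely than the observed one, with row and column totals held fixed).

```python
import math

def expected_table(observed):
    """E = (row total * column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """X^2 = sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

def fisher_exact_two_sided(table):
    """Two-sided exact p-value for a 2 x 2 table (hypergeometric probabilities)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given the fixed margins
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

def p_value_2x2(table):
    """Use the chi-square approximation if every E >= 5, else Fisher's exact test."""
    expected = expected_table(table)
    if min(min(row) for row in expected) < 5:
        return "Fisher's exact test", fisher_exact_two_sided(table)
    x2 = chi_square_statistic(table, expected)
    # For a 2 x 2 table df = 1, and for df = 1 only the right-tail area is erfc(sqrt(X^2 / 2))
    return "chi-square test", math.erfc(math.sqrt(x2 / 2))

# Hypothetical tables: rows are two groups, columns are yes/no outcomes.
large_counts = [[30, 20], [20, 30]]
small_counts = [[3, 1], [1, 3]]

for table in (large_counts, small_counts):
    method, p_value = p_value_2x2(table)
    print(table, "df =", degrees_of_freedom(table), method, f"p-value = {p_value:.4f}")
```

For the first made-up table every expected frequency is 25, X^2 = 4 with 1 degree of freedom, and the p-value is about 0.046. The second table has expected frequencies of 2, so the sketch falls back to Fisher's exact test. A test of homogeneity uses the same expected-frequency and X^2 calculations; only the sampling design and the wording of the hypotheses differ.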
