Two Statistical Methods
Apr 27th, 2009 by Scott Hebert
Contingency tables are used to analyze the relationship between two categorical variables. The table is arranged so that the categories of one variable form the rows and the categories of the other form the columns, with each cell holding the frequency count for that combination. The primary purpose of contingency tables as a statistical method is to determine whether the two variables are independent (Triola, 2008). Contingency tables are beneficial thanks to the relative simplicity of chi-square computations. Unfortunately, a contingency table can only indicate whether the two variables being analyzed appear to be dependent. It offers no insight into how the variables are related or how a change in one variable is associated with a change in the other. That information can be gained through the use of correlation and regression.
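The chi-square computation mentioned above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `chi_square_statistic` and the counts in `observed` are invented for the example, and the sketch assumes every expected count is greater than zero.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, made up for illustration only.
# Rows: less experienced, more experienced drivers.
# Columns: had an accident, had no accident.
observed = [[30, 70],
            [15, 85]]

stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

For a table with two rows and two columns there is one degree of freedom, and the commonly tabulated critical value at the 0.05 significance level is 3.841. A statistic larger than that would lead to rejecting the assumption that the two variables are independent.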
Correlation is a statistical method that seeks to describe the strength and direction of the relationship between two variables. The correlation coefficient is a number between -1 and +1 that summarizes how closely that relationship follows a straight line. If the correlation coefficient is positive, then the two variables tend to move in the same direction. In other words, relatively high values for one variable accompany relatively high values for the other. A negative correlation coefficient describes a scenario where the variables are inversely related. In this case, a high value for one variable tends to accompany a low value for the other (Moles & Terry, 1997). Regression is the process of fitting a line to the data, which means estimating the slope and intercept of the line that best describes how one variable changes with the other.
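To make the two ideas concrete, here is a small Python sketch that computes a Pearson correlation coefficient and a least-squares regression line by hand. The function names `pearson_r` and `least_squares_line` and the sample data are hypothetical, chosen only for illustration, and the sketch assumes neither variable is constant.

```python
from math import sqrt


def _deviation_sums(xs, ys):
    """Return the sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient, a value between -1 and +1."""
    _, _, sxy, sxx, syy = _deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _deviation_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for illustration only.
years_experience = [1, 3, 5, 7, 9]
accidents = [9, 7, 6, 4, 2]

r = pearson_r(years_experience, accidents)
slope, intercept = least_squares_line(years_experience, accidents)
print(f"r = {r:.3f}")
print(f"accidents = {slope:.2f} * years + {intercept:.2f}")
```

With this made-up data the coefficient comes out negative, matching the inverse relationship described above, and the slope estimates how much the second variable changes for each additional unit of the first.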
Correlation builds on contingency tables by accepting that variables are dependent and describing that dependence. For example, a contingency table might indicate that years of driving experience and frequency of automobile accidents are not independent. A correlation and regression analysis can determine just how those two variables are related. Unfortunately, correlation cannot explain why those two variables are related. In the case of driver experience and traffic accidents, the analysis may find that increased driver experience is associated with fewer accidents. As the data are plotted, it might emerge that after a certain number of years of experience, the frequency of traffic accidents begins to increase. This indicates that the relationship between experience and accidents changes at some point. It’s more likely, of course, that the increase in accidents is actually related to the age of the driver. Therefore, correlation can describe how two variables are related, but cannot explain why.
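The driving example can be mimicked with a short Python sketch. The numbers below are entirely invented and deliberately symmetric, and `pearson_r` is a hypothetical helper written for this illustration. The point is only that a single coefficient over the whole range can hide a relationship that changes direction partway through.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: accidents fall with experience, then rise again.
years_experience = [0, 10, 20, 30, 40, 50, 60]
accidents = [9, 4, 1, 0, 1, 4, 9]

overall = pearson_r(years_experience, accidents)
early = pearson_r(years_experience[:4], accidents[:4])  # 0 to 30 years
late = pearson_r(years_experience[3:], accidents[3:])   # 30 to 60 years

print(f"overall r = {overall:.3f}")
print(f"early r   = {early:.3f}")
print(f"late r    = {late:.3f}")
```

In this contrived case the early and late segments show strong correlations of opposite sign, while the coefficient over the full range is close to zero. None of the three numbers says anything about why the pattern reverses, which is the limitation described above.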
References
Moles, P. & Terry, N. (1997). Correlation. The Handbook of International Financial Terms. Retrieved April 27, 2009.
Triola, M. F. (2008). Elementary statistics (10th ed.). Boston: Pearson.