PUB 550 Application of the Pearson Correlation and Chi-Square Test
Determining the relationship between variables is an essential step in data analysis, and the Pearson correlation is commonly used for this purpose. In this case, the Pearson correlation was used to examine the relationship between age and annual income. The Pearson correlation measures the linear association between two sets of data: it is the covariance of the two variables divided by the product of their standard deviations. Knowing how two variables are related makes it possible to predict one of them from the other. The Pearson correlation requires both variables to be continuous, and in this case both age and annual income are continuous variables.
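The definition above can be written out as a formula. This is the standard sample form of the coefficient; the symbols (x for age, y for annual income, s for the sample standard deviation) are notation introduced here for illustration and do not appear in the SPSS output.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The coefficient always lies between -1 and 1. The sign gives the direction of the relationship and the absolute value gives its strength.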
Pearson correlation is one of the most widely used methods of measuring the association between continuous variables of interest because it is based on covariance. It gives both the magnitude and the direction of the relationship. The Pearson coefficient is appropriate when both variables are measured on an interval or ratio scale.
In the analysis, the Pearson correlation between age and annual income was computed using SPSS, and the results are reported in the tables that follow. In the dataset provided, age is a continuous variable, so the Pearson correlation can be used to determine the correlation coefficient. As a rule of thumb used in this analysis, a coefficient greater than 0.5 in absolute value indicates a strong relationship between the two continuous variables, in this case age and annual income. A coefficient below 0.5 indicates that the relationship is not strong. A coefficient between 0.4 and 0.5 suggests that some relationship exists, but not one strong enough to treat one variable as a reliable predictor of the other. The significance test for the Pearson correlation is a parametric inferential test, so it can also be used to test a hypothesis about the relationship between two variables.
The use of the Pearson correlation in this analysis is appropriate because the two variables, age and annual income, are continuous. In most cases, parametric tests are applied to continuous variables. There is also a need to check whether the two variables are normally distributed. In this case, age and annual income are treated as normally distributed, with means of 39.93 and 34766.67 respectively. No outliers were identified in either variable, which supports the use of the Pearson correlation to examine the relationship between age and annual income. Both variables are also measured on a ratio scale, which is a further requirement for the Pearson correlation. In any correlation analysis the observations must form related pairs, so every case needs a value for both variables, and missing values should be addressed before the analysis to ensure accurate outcomes.
The null hypothesis states that there is no relationship between the variables, while the alternative hypothesis states that a relationship exists. In this case, the hypotheses concern whether there is a relationship between age and annual income. The strength of the relationship is judged from the coefficient: a value above 0.5 indicates a strong correlation between age and annual income. Whether the null hypothesis is rejected is judged from the p-value: the null hypothesis is rejected in favor of the alternative only when the p-value is below the chosen significance level. In this case, the correlation coefficient obtained was 0.139, which is far less than 0.5, and the p-value was .463, which is greater than 0.05. Hence, the null hypothesis is not rejected and there is no strong correlation between age and annual income. In this sample, age does not appear to be a determinant of an individual's annual income. Annual income may instead be associated with the level of work, the type of profession, and the amount of time an individual spends working.
Decision rules are important when examining the outcomes of data analysis, and interpreting the outcomes is one of the most important steps in reporting the results. The null hypothesis here is that there is no linear relationship between age and annual income, and the alternative hypothesis is that there is one. When the null hypothesis is rejected at the significance level α = 0.05, there is sufficient evidence to conclude that there is a significant linear relationship between the variables under consideration. In this case, the p-value of .463 is greater than 0.05, so the null hypothesis is not rejected and there is no significant relationship between age and annual income. In other words, the correlation coefficient is not significantly different from zero. Reporting both the significance level and the correlation coefficient makes the outcomes easier to understand: the coefficient of 0.139 is far less than 0.5, so the relationship is weak as well as non-significant.
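The decision rule at α = 0.05 can be checked by hand from the reported coefficient. The sketch below is a minimal illustration, not the SPSS procedure: it converts r = .139 with N = 30 into a t statistic and compares it with the two-tailed critical value for 28 degrees of freedom (2.048, taken from a standard t table). The function name `t_statistic` is a hypothetical helper introduced here.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given a sample correlation r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values reported in the SPSS correlation output for age and annual income.
r, n = 0.139, 30

# Assumption: two-tailed critical t for df = 28 at alpha = 0.05, from a standard t table.
t_critical = 2.048

t = t_statistic(r, n)
reject_null = abs(t) > t_critical

print(f"t = {t:.3f}, df = {n - 2}, reject H0: {reject_null}")
```

The t statistic comes out at roughly 0.74, well below the critical value, so the null hypothesis of no linear relationship is not rejected. This agrees with the reported two-tailed significance of .463.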
Table 1: Descriptive Statistics
| | Mean | Std. Deviation | N |
| Age | 39.93 | 11.902 | 30 |
| Annual_Income* | 34766.67 | 22875.500 | 30 |
From Table 1, the mean age of the participants was 39.93 with a standard deviation of 11.902, while the mean annual income was 34766.67 with a standard deviation of 22875.500.
Descriptive statistics are important when analyzing the correlation between two variables. They help in assessing the normality of the variables under consideration and can reveal outliers that may interfere with the accuracy of the outcomes. In this case, the mean and standard deviation were reported alongside the correlation test. The means obtained for the two variables, age and annual income, were 39.93 and 34766.67 respectively. The mean and standard deviation are consistent with continuous data, so correlation analysis can be used to examine the relationship between the two variables. In general, descriptive statistics are used to describe the attributes of the variables and to guide the choice of statistical approach. In this case, the descriptive statistics indicated that correlation was an appropriate method for comparing the two variables, age and annual income.
Table 2: Correlations
| | | Age | Annual_Income* |
| Age | Pearson Correlation | 1 | .139 |
| | Sig. (2-tailed) | | .463 |
| | N | 30 | 30 |
| Annual_Income* | Pearson Correlation | .139 | 1 |
| | Sig. (2-tailed) | .463 | |
| | N | 30 | 30 |
Table 2 shows the correlation between age and annual income. The Pearson correlation coefficient is 0.139, which is less than the threshold of 0.5 used in this analysis, and the two-tailed significance value of .463 is greater than 0.05. We therefore conclude that there is no strong or statistically significant correlation between age and annual income. Had the coefficient been greater than 0.5, we would have concluded that there is a strong relationship between age and income. The outcome suggests that, in this sample, age is not a determinant of annual income. In other words, age alone cannot be used to predict an individual's income. Other factors, such as level of experience, amount of time worked, professional level, and type of employment, may instead be associated with an individual's annual income.
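One way to see how weak this relationship is, offered here as a supplementary illustration rather than part of the SPSS output, is to square the coefficient. The result, often called the coefficient of determination, gives the share of variance in one variable that is linearly associated with the other.

```latex
r^2 = (0.139)^2 \approx 0.0193
```

Under this reading, age accounts for roughly 1.9% of the variation in annual income in this sample, which is consistent with the conclusion that other factors matter far more.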
In interpreting the outcomes, both the p-value and the correlation coefficient were considered. The p-value is used to test the hypothesis, while the coefficient describes the strength and direction of the relationship. The patterns in this data set appear comparable to those in similar data sets available elsewhere. When using SPSS, the analyst needs to work with complete variables; in other words, there should be no missing values. Before undertaking the analysis, data cleaning is needed to ensure that the data set is complete and ready for analysis. Data cleaning may involve handling outliers, checking that correct entries are made under each variable, and confirming that every variable used has complete data. The knowledge gained from these data analysis processes can be applied in healthcare research to support accurate outcomes in clinical practice.
Chi-Square Test
The chi-square test is one of several ways of examining the relationship between two variables, both of which must be categorical. The chi-square statistic is a single number that shows how much the observed counts differ from the counts that would be expected if there were no relationship at all in the population under study. In this case, the chi-square test was used to examine the relationship between sex and smoking status.
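The comparison of observed and expected counts described above corresponds to the standard Pearson chi-square formula. The symbols below (O for an observed cell count, E for an expected count, R and C for row and column totals, N for the number of cases) are notation introduced here for illustration.

```latex
\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```

For example, using the totals in the cross tabulation for this analysis, the expected count of female non-smokers is 15 × 19 / 32 ≈ 8.91, against an observed count of 9.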
Table 3: Sex * Smoker Cross tabulation (Count)
| Sex | Smoker: (blank) | Smoker: No | Smoker: Yes | Total |
| (blank) | 2 | 0 | 0 | 2 |
| Female | 0 | 9 | 6 | 15 |
| Male | 0 | 10 | 5 | 15 |
| Total | 2 | 19 | 11 | 32 |
Table 4: Chi-Square Tests
| | Value | df | Asymp. Sig. (2-sided) |
| Pearson Chi-Square | 32.153 (a) | 4 | .000 |
| Likelihood Ratio | 15.106 | 4 | .004 |
| N of Valid Cases | 32 | | |
a. 5 cells (55.6%) have expected count less than 5. The minimum expected count is .13.
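From Table 4, the significance value for the Pearson chi-square is reported as .000, which is less than 0.05, so the null hypothesis of no association would ordinarily be rejected. This result should be interpreted with caution, however. Table 3 includes a blank category containing 2 cases with no recorded sex or smoking status, and 5 cells (55.6%) have an expected count below 5, which violates a key assumption of the chi-square test. Among the 30 participants with recorded values, 6 of 15 females and 5 of 15 males smoke, so the proportions are similar. The data therefore give little indication that smoking status depends on sex: both males and females smoke.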
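The chi-square value in Table 4 can be reproduced from the counts in Table 3, which also shows how much of it comes from the blank category. The sketch below is a plain-Python illustration, not the SPSS procedure. The function name `chi_square` is a hypothetical helper, and treating the blank row and column as missing data to be dropped is an assumption made here.

```python
import math


def chi_square(observed):
    """Return the Pearson chi-square statistic, its df, and the smallest expected count."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, min_expected


# Table 3 as reported. Rows: blank, Female, Male. Columns: blank, No, Yes.
full_table = [[2, 0, 0], [0, 9, 6], [0, 10, 5]]
full_stat, full_df, full_min = chi_square(full_table)

# Assumption: the blank row and column are missing data, so only Female/Male by No/Yes is kept.
valid_table = [[9, 6], [10, 5]]
valid_stat, valid_df, valid_min = chi_square(valid_table)

# For df = 1 the chi-square p-value has a closed form via the complementary error function.
valid_p = math.erfc(math.sqrt(valid_stat / 2))

print(f"With blanks:    chi2 = {full_stat:.3f}, df = {full_df}, min expected = {full_min:.3f}")
print(f"Without blanks: chi2 = {valid_stat:.3f}, df = {valid_df}, p = {valid_p:.3f}")
```

With the blank category included, the statistic matches the 32.153 on 4 degrees of freedom reported in Table 4, and almost all of it comes from the single cell holding the 2 blank cases. With the blanks removed, the statistic drops to about 0.14 on 1 degree of freedom, with a p-value well above 0.05 (no continuity correction is applied in this sketch).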