MATH 225N Correlation and Regression

I made up some data about predicting Total Cholesterol from BMI so that we have an example of simple linear regression, linear correlation, and the coefficient of determination to get our Week 8 graded Posting area started.

Please see the attached Excel spreadsheet with the data and results from the simple linear regression procedure.

PLEASE do not attach any real world significance to the regression equation or the values for r, r², etc.

Remember please, I MADE THE DATA UP !!  ( LOL )      😉

I used BMI as the independent variable and Total Cholesterol as the dependent variable.

PLEASE note that in my example here in this Post I am treating BMI as the independent variable. If you carefully read your Posting assignment for this Week 8 near the very top of this web page, though, the way that the assignment questions / prompts are written strongly suggests that you all treat BMI as the dependent variable in your Posts !!  I won’t absolutely require you all to do that, but to hopefully avoid a lot of confusion about this point in the Week 8 class member Posts as a whole, I definitely wanted to point out this difference here and make a big deal out of it.        😉

 

The prediction equation was    Total Cholesterol  =  6.25BMI  +  33

 

The meaning of 6.25BMI is 6.25 times BMI .

 

So, everyone, please find predicted y-values for the following BMI values using the prediction equation.

 

( OK  LOL  not everyone but hopefully at least 2 or 3 class members      😉        )

 

Use BMI values of 21   22   23   24   25   26   27   28   29   30   31

 

What you will be calculating there are predicted values for Total Cholesterol.
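
For anyone who would rather check the arithmetic with a few lines of code than with a calculator, here is a minimal Python sketch. It simply plugs each BMI value from the list above into the prediction equation Total Cholesterol = 6.25 times BMI + 33. The function name predict_cholesterol is only an illustrative label; it does not come from the Excel file.

```python
# Slope and intercept taken from the prediction equation in this Post
SLOPE = 6.25
INTERCEPT = 33.0


def predict_cholesterol(bmi):
    """Return the predicted Total Cholesterol for one BMI value."""
    return SLOPE * bmi + INTERCEPT


# BMI values 21 through 31, as listed in the Post
bmi_values = list(range(21, 32))
predictions = [predict_cholesterol(b) for b in bmi_values]

for bmi, y_hat in zip(bmi_values, predictions):
    print(f"BMI {bmi}: predicted Total Cholesterol = {y_hat:.2f}")
```

Keeping the slope and intercept as named constants makes it easy to swap in a different prediction equation for your own data set.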

 

Predicting inside the range of the original BMI values like this is called interpolation.

 

Notice that we did not plug in a BMI value such as 15 or 37, because those values for BMI would be outside of the range of the original BMI values for the original ordered pairs.

So in other words, we are avoiding the practice of “extrapolation” here.

 

Many textbooks and authors caution against doing extrapolation.

 

You will see Folks saying that when using a regression equation to calculate a predicted value, stay between the lowest and highest of the values of the independent variable used to calculate the regression equation ( prediction equation ) to begin with…
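
The "stay between the lowest and highest" advice can be sketched as a small guard in Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the bounds 21 and 31 are ASSUMED here (they are the lowest and highest BMI values from the exercise list, not necessarily the lowest and highest BMI values in the spreadsheet data).

```python
SLOPE = 6.25
INTERCEPT = 33.0


def predict_within_range(bmi, lowest_bmi, highest_bmi):
    """Predict Total Cholesterol, refusing to extrapolate.

    lowest_bmi and highest_bmi are the smallest and largest BMI values
    in the data used to fit the equation (supplied by the caller).
    """
    if bmi < lowest_bmi or bmi > highest_bmi:
        raise ValueError(
            f"BMI {bmi} is outside {lowest_bmi} to {highest_bmi}; "
            "predicting here would be extrapolation."
        )
    return SLOPE * bmi + INTERCEPT


# ASSUMED bounds for illustration only; take the real ones from the data
LOWEST_BMI, HIGHEST_BMI = 21, 31

print(predict_within_range(25, LOWEST_BMI, HIGHEST_BMI))
try:
    predict_within_range(37, LOWEST_BMI, HIGHEST_BMI)
except ValueError as err:
    print("Refused:", err)
```

Raising an error is just one design choice; printing a warning and still returning the value would also be reasonable if you only want a reminder.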

 

The value for the linear correlation coefficient r was approximately 0.77
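
For anyone curious how a value like this can be computed from the ordered pairs, below is a minimal Python sketch of the standard Pearson formula for r. The function name and the tiny check data are hypothetical; they are NOT the data from the attached spreadsheet, so this sketch will not reproduce 0.77.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient for paired lists xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL check data (not the course data): points exactly on a line
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # y = 2x + 1, a perfect positive linear relationship
print(pearson_r(xs, ys))
```

Points that fall exactly on an upward-sloping line give the largest possible r, which makes this a handy sanity check before trying real data.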

 

The value for the coefficient of determination r² was approximately 0.59

 

That means that ( for these particular sample data ) approximately 59% of the variation in Total Cholesterol was explained by the variation in BMI.
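
The link between the two values above is just squaring: the coefficient of determination is r times r. Here is a quick Python check using the rounded r of 0.77 from this Post (the variable names are illustrative only).

```python
r = 0.77                       # rounded value reported in the Post
r_squared = r ** 2             # coefficient of determination
explained_pct = round(r_squared * 100)

print(f"r squared = {r_squared:.4f}")
print(f"About {explained_pct}% of the variation is explained")
```

Because the r used here is already rounded, the result is approximate, which is why the Post says "approximately".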

 

Question for everyone:

 

( only 3 or 4 class members literally need to Post an answer for this – NOT literally everyone !  LOL )

 

So for these data ( as a percent ) what was the approximate percent of variation in Total Cholesterol that was NOT explained by the variation in BMI ??

 

😉

 

The linear correlation coefficient was positive and was moderately strong.

 

The positive linear association is not too surprising.  A link between BMI and Total Cholesterol ( them both increasing together or decreasing together ) seems fairly reasonable.

 

It is important in Week 8 to learn that “correlation does not imply causation.”

 

So based on these sample data alone we cannot say that a higher BMI “causes” a higher Total Cholesterol.

 

But we can note the positive linear association ( “link” or “connection” ) in these particular sample data.

 

Thanks Friends and Best Wishes !

 

Enjoy Friends and Best Wishes and please don’t forget that Week 8 ends on a SATURDAY and NOT on a Sunday !!

 

Thanks and take good care !!

 

Please see attached for the data and simple linear regression results and linear correlation results.

I know you are busy in Week 8 so attached are some optional data sets that you can use for your Week 8 graded Posting assignment in case you don’t feel like you want to look around all over the internet for a data set.

 

Thanks Friends and try hard not to repeat a Post about a data set that another class member has already Posted on.

 

And note some of the data sets are large ( more than 50 ordered pairs ) so be sure you capture and analyze all the data if you choose one of the larger data sets !!

 

You have to please be careful though because I bet the Week 8 Excel spreadsheet can only accommodate a fairly small data set ( fairly small number of ordered pairs ) .  So if you pick one of the data sets from this attached spreadsheet here, PLEASE be sure that it does not have more ordered pairs than what the Excel spreadsheet that you will use for the analysis can handle / accommodate !!!        😉

 

Thanks Friends and Best Wishes !!

 

You are encouraged to find your own data sets that interest you but if you are really pressed for time you can choose and use a data set from this attachment.

 

Some of these data sets have more to do with nursing / healthcare / nutrition / medicine than others so if you pick a data set from the attached try to pick one that ties in with your interests but do feel free to select any of them, including data sets like the house sizes / housing prices or the basketball bouncing data sets…

 

Thanks for your hard work and Best Wishes Friends !!

I used my normal calculator and came up with the following values for your first question, about predicted y values: 164.25, 170.5, 176.75, 183, 189.25, 195.5, 201.75, 208, 214.25, 220.5 and 226.75.  These numbers are quite different from the y values in your Excel sheet.

Why?

Is it also safe to assume that if 59% of the variation in total cholesterol was explained by the variation in BMI then 41% of the variation in total cholesterol was not explained by the variation in BMI?

You did a very nice job of emphasizing that there is a difference between correlation and causation.  That is extremely important because it is important for the “consumer of the research” to know what the results of scientific studies and undertakings mean and what the results do not mean.

 

How many of you have ever watched a television newscast where the reporter / anchor says something like:

 

“A new study shows that…”

or

“A new research report proves that…”

 

These types of statements and assertions could not be more incorrect and could not be more absurd / ridiculous.      😉

 

Quantitative research does not PROVE anything ( there are some people alive on this planet who probably disagree with me on this very strong statement here ).

 

So Brennaa as you said there is a big difference between coincidence and a true causal link between two quantitative variables.

 

If anyone cares to Google “correlation versus causation” you will likely find some funny stories and some humorous examples that highlight the big difference between coincidence and where there is perhaps a true, distinguishable, relevant cause and effect link…

 

Thanks Brennaa and be well !  Wonderful work and results in the course !!