Tuesday, November 6, 2012

Presentations & Study Guide Notes!

Statistics Unit: Week 3

For our final week in our Statistics unit, we focused on the presentations/lesson plans that were given in class, as well as the study guide assigned.

Part 1: Presentations

While all of the presentations were well thought-out and executed, they all focused on "Mean/Median/Mode" - this gave us little help with the other components that will be on the exam such as Standard Deviation, Outliers, and Box & Whisker plots.

That being said, there were a couple of presentations that stood out most to me:
  1. Presentation #1 (Hailey): This presentation focused on finding the mean/median/mode of a large set of data by putting it into a dot plot. The data was based on the ages of the presenter's Facebook friends and, as such, there were several numbers that repeated multiple times. What stood out to me was that I struggled to add all 49 numbers and calculate the mean/median/mode in a strictly mathematical way. What I failed to realize until afterward was that I could have organized them into the dot plot FIRST. This would have made it easier to find all three M's without doing as much work. Overall, I found this lesson helpful because it made me see that there are much easier ways to find answers to mathematical questions than listing out every single data point.
  2. Presentation #3 (Kaycie): This was the presentation where we found mean/median/mode by using different color-coded candy. Many in class found this confusing and somewhat tricky, but I feel as though the point of the lesson was to show us that sometimes, we get tricked by multiple different factors. There was a large mix of candy (from Skittles, to M&Ms, to Mike&Ikes) and it confused people because they did not know whether to sort the groups by color or type of candy. While it seems trivial, this is an important concept because oftentimes in math exams or homework, we are given multiple variables in one problem. What is important is that we distinguish what is important information and what is not.
  3. Presentation #6: The last presentation was also on mean/median/mode, but was different from the others in that it incorporated the Box & Whiskers plot and outliers. This was a nice refresher because it reminded us how to create a Box & Whiskers plot, as well as how to find the high and low outliers by using the IQR equation.
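
The dot plot idea from Presentation #1 can be sketched in Python. This is only an illustration: the ages below are made up (they are not the presenter's real data), and the function names are my own. The point is that once the data is tallied (value -> number of dots), all three M's come from the tallies.

```python
from collections import Counter

# Hypothetical ages, NOT the real data from the presentation.
ages = [19, 19, 20, 20, 20, 21, 21, 22, 25, 30]

# A dot plot is really just a tally: each value mapped to its number of dots.
dot_plot = Counter(ages)

def mean_from_counts(counts):
    """Mean = sum of (value x number of dots) / total number of dots."""
    total = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total

def modes_from_counts(counts):
    """Mode(s) = the value(s) with the tallest stack of dots."""
    tallest = max(counts.values())
    return sorted(value for value, n in counts.items() if n == tallest)

def median_from_counts(counts):
    """Median = middle value (or average of the two middle values) in order."""
    ordered = sorted(counts.elements())
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(mean_from_counts(dot_plot))    # about 21.7
print(median_from_counts(dot_plot))  # 20.5
print(modes_from_counts(dot_plot))   # [20]
```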

Part 2: Study Guide

While I have not yet fully completed the study guide, there were three things that I found extremely useful/important to know for tomorrow's exam:

The 5-Number Summary: One of the problems in the study guide was to find the 5-number summary of a group of data and/or a graph. I was unsure of what this meant until I looked it up. A 5-number summary is this:

Minimum - Lower Quartile - Median - Upper Quartile - Maximum

This summary is basically the five main, definitive points in a Box & Whiskers Plot. This is important because a 5-number summary shows us the extremes of the data set, its middle, and how spread out the middle half of the data is.
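
Here is a minimal sketch of finding a 5-number summary in Python. It assumes the "median of each half" way of finding quartiles (leaving the median itself out of both halves when there is an odd number of values); other textbooks and programs use slightly different rules. The function name and the scores are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumes at least two data points. Quartiles are the medians of the
    lower and upper halves; the overall median is excluded from both
    halves when the number of values is odd.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )

# Hypothetical data set, given out of order on purpose.
scores = [7, 3, 9, 1, 5, 8, 2, 6, 4]
print(five_number_summary(scores))  # (1, 2.5, 5, 7.5, 9)
```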

Bell Curve: Another thing I found to be important was the set of problems concerning the bell curve (Section 13.3, #11 A-C). These are especially tricky for me because I often confuse myself and make little mistakes when it comes to calculations. These problems helped reinforce my knowledge of the percentages that go into the bell curve (68%, 95%, 99.7%) as well as how to compute standard deviation when given a mean and an interval.
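
As a rough sketch of the two kinds of bell curve calculations mentioned above: about 68% of the data falls within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. The numbers below (a mean of 70 and a standard deviation of 5) are made up for illustration and are not taken from the Section 13.3 problems; the function names are my own.

```python
# Approximate percentage of data within k standard deviations of the mean
# on a bell curve.
PERCENT_WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def interval_around_mean(mean, sd, k):
    """Interval that holds about PERCENT_WITHIN[k] percent of the data."""
    return (mean - k * sd, mean + k * sd)

def sd_from_interval(low, high, k):
    """Work backwards: the interval is 2*k standard deviations wide."""
    return (high - low) / (2 * k)

# Hypothetical example: mean = 70, standard deviation = 5.
print(interval_around_mean(70, 5, 2))  # (60, 80) holds about 95%
print(sd_from_interval(60, 80, 2))     # 5.0
```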

Outliers: The study guide also helped me practice with outliers, which I feel wasn't covered as much as I would've liked in class. If you want help on how to calculate the IQR and find the positive and negative outliers (if any), please refer to my second blog post below this.

Closing

I hope my blog has helped clarify some concepts we learned during this short Statistics unit in Math 252. I wish everyone good luck on the exam!

Sunday, November 4, 2012

Outliers and Misleading Graphs!

Statistics Unit: Week 2

Hello, all! We had only one lesson this week in Math 252, considering this previous Monday was a lab day, so this post will be on the short side.

Part 1: Outliers

By definition, an outlier is a "data point on a graph or in a set of results that is much bigger or smaller than the next nearest data point." That being said, while outliers may sound pretty obvious and easy to find, there is an equation for deciding what counts as one. It starts with finding what is called the interquartile range (IQR), which is the distance between the lower and upper quartiles.

The equation is simple enough: you subtract the lower quartile from the upper quartile, then multiply that difference by 1.5.

An example would look like this:
Upper Quartile = 30, Lower Quartile = 23

First, subtract the lower quartile from the upper quartile to get the IQR:
30 - 23 = 7 --> IQR

You would then multiply the IQR by 1.5, like this:
7 x 1.5 = 10.5

Then, you would use this number to find the cutoffs for outliers by adding it to the upper quartile and subtracting it from the lower quartile:
30 + 10.5 = 40.5 --> An outlier would be ABOVE 40.5
23 - 10.5 = 12.5 --> An outlier would be BELOW 12.5

** REMEMBER! An outlier is NOT just some random number on a graph/plot, or simply the biggest or smallest number you see when looking at a set of data. A value only counts as an outlier if it falls outside the range you get by following the equation steps above.
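
The outlier steps can be sketched in Python using the same quartiles as the example (23 and 30). The function names and the sample data list are hypothetical; the rule itself is the 1.5 x IQR rule described above.

```python
def outlier_cutoffs(lower_quartile, upper_quartile):
    """Return (low cutoff, high cutoff) using the 1.5 x IQR rule."""
    iqr = upper_quartile - lower_quartile  # upper minus lower
    step = 1.5 * iqr
    return (lower_quartile - step, upper_quartile + step)

def find_outliers(data, lower_quartile, upper_quartile):
    """Values below the low cutoff or above the high cutoff."""
    low, high = outlier_cutoffs(lower_quartile, upper_quartile)
    return [x for x in data if x < low or x > high]

print(outlier_cutoffs(23, 30))  # (12.5, 40.5)

# Hypothetical data: only 10 and 45 fall outside the cutoffs.
print(find_outliers([10, 20, 25, 31, 45], 23, 30))  # [10, 45]
```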

Part 2: Misconstrued Graphs/Data

In class this past week, we also learned how graphs and statistics can sometimes be misleading. This happens for a number of reasons: the graph has no labels, the distance between intervals is uneven, et cetera. There are also many reasons why someone would make a graph this way, but the main one is that those who construct the graphs only want the audience to see a certain perspective.

An example of a misconstrued graph would look like this:


The graph is made to show the difference in tax rate from now to early 2013. While the difference seems large and staggering, if someone were to look closely at this graph they would see that the difference is actually not that big. The "now" shows a 35% tax rate, while "2013" shows a rate of 39.6%. That is only an increase of 4.6 percentage points, despite the graph depicting it to be much larger.

This is done through the scale of the graph. If you look at the scale to the right of the graph, you will see that it starts at 34 (instead of 0) and ends at 42 - this makes it look as though there is a huuuuuge difference because 35 is close to the bottom and 39.6 is close to the top.

While it is easy to miss these small details, they are also just as easy to fix. To make this graph more accurate, all you would have to do is readjust the scale (perhaps starting at 0 and ending at 50), so that it would show that the difference in tax rate is not that big.
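
To put a number on how much the scale exaggerates things, here is a small sketch that compares how tall the two bars look when the axis starts at 34 versus when it starts at 0. The function name is my own, and it assumes the bars are drawn from the bottom of the axis up to 35 and 39.6.

```python
def bar_height_ratio(small, large, axis_start):
    """How many times taller the larger bar LOOKS for a given axis start."""
    return (large - axis_start) / (small - axis_start)

now, later = 35.0, 39.6

truncated = bar_height_ratio(now, later, axis_start=34)
honest = bar_height_ratio(now, later, axis_start=0)

print(f"Axis starting at 34: {truncated:.1f} times taller")  # 5.6
print(f"Axis starting at 0:  {honest:.2f} times taller")     # 1.13
```

So with the axis starting at 34, the 2013 bar looks more than five times as tall, even though the rate itself is only about 1.13 times as large.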

Closing

Next week's blog post will center around the presentations in class, as well as the review homework we were assigned. Until next time!

Thursday, October 25, 2012

Graphs, correlations, and deviations!

Statistics Unit: Week 1

This week in Math 252 we definitely went through a plethora of new information and that means we have a lot to cover today, so be prepared for a very long post! I will be breaking them up in sections for easy reading.

Part 1: Box & Whiskers Plot

A Box & Whisker plot is a perfect way to look at a set of data in a broader sense by only highlighting the 5 main numerical values. This eliminates the extra numbers that may not have any prominent significance. The lower extreme is the minimum or lowest number used in the sample as a whole, while the upper extreme is the highest. 

Ex.) If you were doing a data set of the number of cookies someone has, with 1 being
the lowest and 100 being the highest, 1 = Lower Extreme and 100 = Upper Extreme.

Then, we have the quartiles and the median, which are a lot of what a Box & Whiskers Plot relies on to represent its data. The best way to describe this would be to show you:

Ex.) Take this set of numbers: 1  2  3  4  5  6  7  8  9

First you would find the median of the set, which is the middle number.
 1  2  3  4  5  6  7  8  9
In this case, the median would be the number 5.

Then you find the lower quartile, which is found by looking at the left side of 
the median, or the lower set of numbers and finding the median in that set. 
 1  2  3  4  5  6  7  8  9

Because that lower set (1  2  3  4) has an even number of values, you would have 
to find the average of the two middle numbers. In this case, those numbers would be 2 and 3.
2+3 = 5 --> 5/2 = 2.5

The lower quartile is 2.5

Finally, you would find the upper quartile by looking at the set of numbers 
to the right of the median (6  7  8  9) and finding the median of that set.
 1  2  3  4  5  6  7  8  9

Again, you would find the average of the two middle numbers, which are 7 and 8.
7+8 = 15 --> 15/2 = 7.5

The upper quartile is 7.5
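
As a cross-check on the hand calculation, Python's standard `statistics.quantiles` function (available in Python 3.8 and later) gives the same quartiles for this particular set. This is only a sketch: its default "exclusive" method is not guaranteed to match the median-of-each-half method for every data set, so small differences on other data would not mean your hand work is wrong.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four parts and returns the three cut points:
# lower quartile, median, upper quartile.
lower_quartile, middle, upper_quartile = quantiles(data, n=4)

print(lower_quartile)  # 2.5
print(middle)          # 5.0
print(upper_quartile)  # 7.5
```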

Now, it's time to construct your graph! 


Ultimately, a Box & Whiskers plot, when drawn out, is supposed to look like the image above. The lower quartile, upper quartile, and median should each be marked with a long, vertical line, and the quartile lines are then connected together in order to make a "box" shape. The purpose of this is to show the viewer that the middle half of the numeric values given falls somewhere within that box.

**IMPORTANT: When creating a Box & Whiskers Plot, always keep the scale of your graph in mind. For example, if your lower quartile is 4 but it is drawn to look like it is closer to your upper extreme of 30, then your scale is completely unbalanced. Always remember that a unit = 1 UNIT, and to draw your points as you would expect to see them on a regular number line.

Part 2: Causation v. Correlation

We also learned the differences between causation and correlation in class in reference to scatter plots and graphs in general. 
  • Causation means that one thing directly makes another thing happen: when the cause is there, the effect follows. 
    • A non-mathematical example of this would be if you fail to feed your pet - ultimately, your pet will die. 
  • Correlation, on the other hand, means that while there is a relationship between two things, it is not always correct to assume that one of them causes the other. 
    • A non-mathematical example would be the relationship between hair length and how tall you are (as seen in the activity we did in class). Students found that there is no direct correlation between how tall you are and your hair length, despite both being related in that they are a part of your body.
That being said, there are three types of correlation:
  • Positive Correlation: When one set of data increases, the other set of data also increases. Correspondingly, if one set of data decreases, so does the other set of data. An example of positive correlation in the form of a graph would look like this:
  • Negative Correlation: When one set of data increases, the other set of data decreases. An example of a negative correlation in the form of a graph would look like this:


  • Zero Correlation: This means there is no discernable relationship between the two sets of data. An example of zero correlation in the form of a graph would look like this:
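
For anyone curious how the three types look in numbers, here is a small sketch. We did not use a formula for this in class; the Pearson correlation coefficient (usually written r) is one common measure, and I am using it here as an assumption. It is close to +1 for a strong positive correlation, close to -1 for a strong negative one, and close to 0 when there is no linear relationship. All of the data below is made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists.

    Assumes neither list is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two sets move together around their means.
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each set moves on its own.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
going_up = [2, 4, 6, 8, 10]    # increases as xs increases
going_down = [10, 8, 6, 4, 2]  # decreases as xs increases
no_pattern = [3, 1, 4, 1, 3]   # no overall upward or downward trend

print(round(pearson_r(xs, going_up), 2))    # 1.0
print(round(pearson_r(xs, going_down), 2))  # -1.0
print(abs(pearson_r(xs, no_pattern)) < 1e-9)  # True (r is essentially 0)
```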

Part 3: Standard Deviation

Finally, we learned about standard deviation. While we didn't go too in-depth, we did learn a couple of basics. For example, when data is bell-shaped, about 68% of it falls within 1 standard deviation of the mean, about 95% falls within 2 standard deviations, and about 99.7% falls within 3. When drawn out, data like this creates a curved line. This is called the Bell Curve and it looks like this:

While it is not necessary to memorize the calculation for standard deviation (for it's quite long and takes a tedious amount of work), I find that it's good to know what it is - if only for future reference. There are two formulas to calculate SD, but mathematicians tend to use the simpler equation. An example of using this equation, which was done in class during our activity, would look like this:


I apologize for the poor quality and if it looks confusing, but I personally have trouble doing this equation so I'm definitely going to look into practicing it many more times!
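
For practice, here is a minimal sketch of one common way to calculate standard deviation step by step. I am assuming the "population" version of the formula (divide by the number of data points); the version used in our class activity may have been different, and the data below is made up rather than taken from class.

```python
from math import sqrt
from statistics import pstdev

def population_sd(data):
    """Population standard deviation, one step at a time."""
    n = len(data)
    mean = sum(data) / n                             # 1. find the mean
    squared_gaps = [(x - mean) ** 2 for x in data]   # 2. square each distance from the mean
    variance = sum(squared_gaps) / n                 # 3. average the squares
    return sqrt(variance)                            # 4. take the square root

# Hypothetical data set with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))  # 2.0
print(pstdev(data))         # 2.0 (Python's built-in version agrees)
```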

Closing
Whelp, that's all for now! Hope I helped in reiterating some of the lessons we learned in class this week. Until next week!