Got data? Let’s dive into it using statistics and Excel. Although we’re discussing this in OneNote, Excel will be our primary tool. If you have access to OneNote, you can follow along with the presentation, but it’s not required.
Getting Started with Data
Our dataset involves daily calorie counts. Here are a few examples:
- 3/12/2016: 2993 calories
- 3/13/2016: 1777 calories
- 3/14/2016: 2480 calories
We’ll treat this dataset as our entire population for statistical analysis. Then, we’ll take samples from this data to run statistical tests and see if the sample information reflects the entire population.
Basic Statistical Measures
- Mean (Average):
  - Sum all the calorie counts.
  - Divide by the number of entries.
  - Formula in Excel: =AVERAGE(range)
- Median:
  - Arrange the data in ascending or descending order.
  - Pick the middle value. If the number of entries is even, average the two middle values.
  - Formula in Excel: =MEDIAN(range)
- Maximum and Minimum:
  - Maximum: The highest value.
  - Minimum: The lowest value.
  - Formulas in Excel: =MAX(range) and =MIN(range)
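For readers who want to check the Excel results outside a spreadsheet, here is a minimal Python sketch of the same four measures. It uses only the three example rows listed earlier, not the full dataset, so the numbers it prints describe those three days only. The variable names are illustrative.

```python
from statistics import mean, median

# Only the three example rows shown earlier; the full dataset is not reproduced here.
calories = [2993, 1777, 2480]

summary = {
    "mean": mean(calories),      # counterpart of =AVERAGE(range)
    "median": median(calories),  # counterpart of =MEDIAN(range)
    "max": max(calories),        # counterpart of =MAX(range)
    "min": min(calories),        # counterpart of =MIN(range)
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With three entries the median is simply the middle value after sorting; with an even number of entries, `median` averages the two middle values, as Excel's MEDIAN does.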
Creating a Histogram
To visualize the distribution, we create a histogram:
- Select the entire dataset.
- Use Excel’s histogram feature to categorize the data into ranges.
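To make the idea of "categorizing the data into ranges" concrete, the sketch below counts how many values fall into each range and prints a text histogram. The bin width of 500 calories is an assumption chosen for illustration, not a setting from the presentation, and the data is again just the three example rows.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each range [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Only the three example rows shown earlier.
calories = [2993, 1777, 2480]

# Assumed bin width of 500 calories, chosen for illustration only.
bins = histogram(calories, 500)

for start, count in bins.items():
    print(f"{start}-{start + 499}: {'#' * count}")
```

A narrower bin width shows more detail but leaves more empty ranges; a wider one smooths the shape. Excel makes a similar choice for you unless you set the bin width yourself.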
Sampling from the Population
To simulate random sampling:
- Random Number Generation:
  - Use =RAND() in Excel to generate random numbers.
  - Attach these random numbers to each calorie count.
- Shuffle and Sample:
  - Sort the data by the random numbers.
  - Select the first 10 entries as your sample.
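The same shuffle-and-take-the-top procedure can be sketched in Python. The population below is a placeholder generated at random, not the real calorie data, and the function name `sample_by_random_key` is hypothetical. The fixed seed only makes the run repeatable.

```python
import random

def sample_by_random_key(values, sample_size, rng):
    """Mimic the spreadsheet steps: attach random numbers, sort, take the top rows."""
    # Pair each value with a random number, like a helper column filled with =RAND().
    keyed = [(rng.random(), v) for v in values]
    # Sorting by the random number shuffles the values.
    keyed.sort(key=lambda pair: pair[0])
    # Keep the first sample_size values as the sample.
    return [v for _, v in keyed[:sample_size]]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Placeholder population (synthetic values, NOT the real calorie data).
population = [rng.randint(1500, 3500) for _ in range(100)]

sample = sample_by_random_key(population, 10, rng)
print(sample)
```

Because every row gets its own random number and the rows are then sorted, each row is equally likely to land in the top 10, and no row can be picked twice.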
Repeated Sampling
For robust analysis:
- Create multiple random samples.
- Repeat the sampling process (e.g., 75 times).
- Analyze each sample separately and compare.
Example in Excel
Here’s a step-by-step process for creating multiple samples:
- Generate random numbers and attach them to the calorie data.
- Sort and select the top entries for each sample.
- Repeat this to create several samples.
- Calculate the mean of each sample and compare to the population mean.
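The steps above can be sketched in Python as a loop that draws many samples and records each sample's mean. The population here is synthetic placeholder data, not the real calorie counts, and the function name is hypothetical. `random.sample` draws without replacement, which has the same effect as sorting by random numbers and taking the top rows.

```python
import random
from statistics import mean

def repeated_sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    # rng.sample picks sample_size distinct rows, like shuffle-then-take-top.
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Placeholder population (synthetic values, NOT the real calorie data).
population = [rng.randint(1500, 3500) for _ in range(365)]
population_mean = mean(population)

# 75 samples of 10 entries each, matching the numbers used in the text.
sample_means = repeated_sample_means(population, 10, 75, rng)

print(f"population mean: {population_mean:.1f}")
for i, m in enumerate(sample_means[:5], start=1):
    print(f"sample {i}: mean {m:.1f}, difference {m - population_mean:+.1f}")
```

Individual sample means will land above and below the population mean; how far they stray is the sampling variability this exercise is meant to show.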
Visualizing with Histograms
- Create histograms for each sample.
- Compare these histograms to the population histogram.
Comparing Averages
Calculate the average of the sample means:
- Compare the average of the sample means to the population mean.
- Check whether the individual sample means are close to the population mean.
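As a rough illustration of this comparison, the sketch below averages a list of sample means, reports how far that average is from the population mean, and uses the standard deviation as one possible measure of how tightly the sample means cluster. The three sample means and the population mean are made-up numbers for illustration only, and `compare_to_population` is a hypothetical helper.

```python
from statistics import mean, stdev

def compare_to_population(sample_means, population_mean):
    """Summarize how a set of sample means relates to the population mean."""
    mean_of_means = mean(sample_means)
    return {
        "mean_of_sample_means": mean_of_means,
        # Positive if the samples run high on average, negative if they run low.
        "difference": mean_of_means - population_mean,
        # Standard deviation of the sample means: how much they vary from sample to sample.
        "spread": stdev(sample_means),
    }

# Made-up numbers for illustration only; they are not results from the calorie dataset.
result = compare_to_population([2400, 2500, 2600], 2480)

for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

In Excel, the same summary could be built with =AVERAGE and =STDEV.S over the column of sample means.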
Practical Use
By using Excel to handle large datasets, we can better understand statistical concepts. It allows for hands-on practice with data manipulation and visualization.
Summary
Playing with datasets in Excel provides valuable insights into statistical analysis. It helps develop an intuitive understanding of data distributions and sampling variability. In future presentations, we’ll dive deeper into mathematical descriptions and more complex statistical tests.