In this blog, we will dive into the world of statistics and Excel to compare batting averages in baseball. We will use Excel to collect data from an online resource, analyze it, and create bell curve distributions to understand the distribution of batting averages. Our goal is to compare two different years, 1920 and 2022, and see if the batting averages conform to a bell curve distribution.
- Gathering Data: To begin, we’ll collect data from an online resource, such as baseball-reference.com. We’ll focus on player standard batting statistics for the years 1920 and 2022. You can download the data as a CSV file and then import it into Excel.
- Data Cleanup: After importing the data, you may need to clean it by using the Text to Columns feature to separate the values properly. Make sure to filter out irrelevant data and eliminate outliers, such as players with zero batting averages.
- Create Tables: Create tables for both years, 1920 and 2022. This will help us organize and manipulate the data effectively.
Calculating Basic Statistics:
- Calculate the Mean: Compute the mean (average) of batting averages for both 1920 and 2022. The mean is a measure of central tendency.
- Calculate the Standard Deviation: Determine the standard deviation for both datasets. The standard deviation measures the spread or dispersion of the data.
- Calculate the Median: Find the median (middle value) of batting averages for both years. It provides insight into the central value of the data.
- Calculate the Mode (Optional): Optionally, calculate the mode, which represents the most frequently occurring batting average.
Building Bell Curve Distributions:
- Creating the X-Axis: Set up a table with values for the x-axis. We recommend using a range from -5 to 5 standard deviations for both datasets, so you capture most of the data.
- Calculating Probability Density: Use the NORM.DIST function in Excel to calculate the probability density function (PDF) for each x-value. This function gives you the probability of a data point occurring at a given x-value.
- Converting to Percentages: Convert the probability density values to percentages so that they sum up to 100%. This allows you to visualize the distribution effectively.
Creating Bell Curve Graphs:
- Create Histograms: Insert histograms for both datasets using Excel’s charting features. The histograms will represent the bell curve distribution of batting averages.
- Label the Charts: Ensure your charts are well-labeled with titles, axis labels, and legends. This will make them more informative and easy to understand.
Comparing the Bell Curves:
- Analyze the Bell Curves: Examine the shape of the bell curves for 1920 and 2022. Do they appear to follow a normal distribution?
- Differences and Insights: Look for differences and trends between the two years. Are there any significant variations in the distribution of batting averages? Analyze why these differences might exist.
In this blog, we’ve explored how to collect, clean, and analyze baseball batting average data using Excel. We’ve created bell curve distributions to understand the data’s distribution characteristics for the years 1920 and 2022. By comparing these distributions, we can gain insights into the changes in batting averages over time and identify any interesting trends or outliers.
In the next part of this blog, we will continue building the bell curves and performing a more in-depth comparison of the two datasets. Stay tuned for more statistical insights and data analysis!