When you conduct data analysis, sample size matters. A dataset might be too large to analyze in full, or you might have only a small amount of data available. Another important factor is the time period the data covers, because data can change over time. How much the time period matters depends on the type of data. Population growth might be relatively consistent across time periods, with few factors affecting trends. Stock market and other financial data, on the other hand, may vary widely across time periods, with many factors affecting pricing and trends.
With small samples, predictions are highly variable. Random chance or a few outliers can heavily skew the results.
With large samples, predictions converge toward the “true” value. Random errors tend to cancel out, and individual outliers have less influence.
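The contrast between small and large samples can be seen in a short simulation. This is a minimal sketch, not a definitive method: the population (values drawn uniformly between 0 and 100), the function name spread_of_sample_means, and the trial count are all assumptions made for illustration. It draws many samples of a given size and measures how much the sample averages differ from one another.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=500, seed=0):
    """Return the standard deviation of the sample mean across repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        # Hypothetical population: values uniformly distributed on [0, 100].
        sample = [rng.uniform(0, 100) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"spread of the average with n=10:   {small:.2f}")
print(f"spread of the average with n=1000: {large:.2f}")
```

The averages from samples of 10 should scatter far more widely than the averages from samples of 1,000, which is the variability described above.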
The true value is the actual underlying quantity in the entire population (all possible observations), not just in your sample. The true value itself does NOT change with sample size; it is fixed (but usually unknown). What depends on sample size is how accurate your estimate of the true value is.
With small samples, the sample average may be far from the true average (lots of variability).
With larger samples, the sample average converges toward the true average. This is formalized in the Law of Large Numbers: as sample size grows, the sample average tends to get closer to the true value.
For example, a fair coin’s true probability of heads is 0.5, whether you flip it 10 times or 10 million times.
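The coin example is easy to try for yourself. The sketch below simulates a fair coin with Python's random module; the function name estimate_heads_probability and the flip counts are assumptions chosen for illustration. The true probability stays at 0.5 throughout, and only the estimate changes with the number of flips.

```python
import random

TRUE_P = 0.5  # the fixed true probability of heads for a fair coin


def estimate_heads_probability(num_flips, seed=42):
    """Simulate num_flips fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < TRUE_P)
    return heads / num_flips


for n in (10, 1_000, 100_000):
    estimate = estimate_heads_probability(n)
    error = abs(estimate - TRUE_P)
    print(f"{n:>7} flips: estimate = {estimate:.4f}, error = {error:.4f}")
```

The error typically shrinks as the number of flips grows, although any single run can be lucky or unlucky, especially at small sizes.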
The true value is a fixed population parameter (the real underlying average, probability, or growth rate).
Sample size doesn’t change the true value.
Sample size changes how close your estimate is to the true value.
Another factor that influences the reliability of a data sample is the time period. The importance of the time period depends on the type of data. Whether you need many samples in a short time period or a few samples over a long time interval depends on your objective.
Population growth of living things is relatively predictable. In most cases, there are few major sudden interruptions to trends. Relatively few variables influence population growth.
Financial markets such as the stock market are harder to predict. Many samples from a trading strategy taken during a relatively short time period may not predict future long-term results. There may be a bull or a bear market in a different time period. Many variables affect stock prices, including company earnings, price-to-earnings ratio, industry outlook, expectations, news, new government rules, interest rates and more. To complicate matters, large purchases by individuals or institutions, short covering and wide use of a trading strategy also affect stock prices.
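To see why many samples from one short period can mislead, consider the sketch below. It uses synthetic numbers, not real market data: the drift and volatility values, the 500-day period lengths, and the function name simulate_daily_returns are all assumptions chosen for illustration. It compares the average daily return measured only during a simulated bull period with the average over a longer span that also includes a bear period.

```python
import random
import statistics


def simulate_daily_returns(mean, volatility, days, rng):
    """Generate synthetic daily returns from a normal distribution."""
    return [rng.gauss(mean, volatility) for _ in range(days)]


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical regimes: the same volatility, but opposite average drift.
bull = simulate_daily_returns(0.002, 0.01, 500, rng)
bear = simulate_daily_returns(-0.002, 0.01, 500, rng)
full_period = bull + bear

bull_only_mean = statistics.fmean(bull)
full_mean = statistics.fmean(full_period)

print(f"average daily return, bull period only: {bull_only_mean:.5f}")
print(f"average daily return, full period:      {full_mean:.5f}")
```

Even with 500 observations, the bull-only average overstates the longer-run average, because every observation came from the same kind of market. More samples from the same short period would not fix this; covering a longer time period would.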
We plan to feature and explain samples in a future post, using actual stock option trades from a trading strategy that returned excellent results.