Talk:Central limit theorem

I removed this:
An interesting illustration of the central tendency, or Central Limit Theorem, is to compare, for a number of lifts (elevators for those on the left-hand side of the Atlantic), the maximum load and the maximum number of people. For small lifts holding only a few people, the maximum load divided by maximum number of people is usually greater than it is in large lifts holding a larger number of people. This is necessary because some small groups of people who fill the lift may well have several people who are above average weight (just as, on other occasions, other small groups may have several who are well below average weight), whereas the larger the sample (the number of people in the large lift) the nearer the proportion of overweight people will be to the norm for the whole population.

While it is a nice example, it doesn't illustrate the Central limit theorem, whose gist is that the sum is normally distributed. I don't quite know where to put this example though. Maybe in standard deviation or normal distribution? AxelBoldt 21:02 Oct 14, 2002 (UTC)


I've encountered another definition of "the" central limit theorem.

My statistics textbook (Mathematical Statistics with Applications, 6th edition, by Wackerly, Mendenhall III, and Scheaffer) defines it in this way:

If Y1, Y2, ..., Yn are iid with mean μ and standard deviation σ, then √n·(Ȳ − μ)/σ, where Ȳ is the sample mean, converges to a standard normal distribution as n goes to infinity. (my paraphrase)

The HyperStat on-line basic statistics text (http://davidmlane.com/hyperstat/sampling_dist) says

The central limit theorem states that given a distribution with a mean m and variance s², the sampling distribution of the mean approaches a normal distribution with a mean (m) and a variance s²/N as N, the sample size, increases. (quoted directly)

I suppose this follows from the definition given in this article. Nonetheless, it is not identical to the one given in the article.

Is there a general trend for more basic/applied statistics books to use this mean-centric definition, while more advanced/theoretical ones use the definition given in the article? Is the definition given in the article better somehow? (I assume the mean-centric definition can be derived from it, but not vice versa.) Should the article also mention the mean-centric definition, since it seems to be somewhat popular?

--Ryguasu 10:52 Dec 2, 2002 (UTC)

No --- the "mean-centric" version and the "sum-centric" version are trivially exactly the same thing; either can be derived from the other, and it's completely trivial: Just multiply both the numerator and the denominator by the same thing; you need to figure out which thing. Michael Hardy 04:34 Feb 21, 2003 (UTC)

Right. This became obvious to me sometime after posting the question. Nonetheless, I think I'm going to stick in the mean-based formulation at some point; I've found more books using only the mean-based definition, and I imagine that some not so mathematically inclined people who nonetheless have to brush up against the CLT (certain social scientists come to mind) might like having what is not trivial to them pointed out. I agree, however, that unless proofs of the CLT typically involve the mean-based formulation, the one currently given on this page should be presented as more fundamental. --Ryguasu
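
For readers to whom the step is not obvious, the algebra is just the following (a spelled-out sketch, writing Ȳ for the sample mean Sn/n):

\[
Z_n \;=\; \frac{S_n - n\mu}{\sigma\sqrt{n}}
\;=\; \frac{(S_n - n\mu)/n}{\sigma\sqrt{n}/n}
\;=\; \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}
\;=\; \frac{\sqrt{n}\,(\bar{Y} - \mu)}{\sigma},
\]

so the "sum-centric" and "mean-centric" statements concern the same random variable, and each form of the theorem follows immediately from the other.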


Maybe I'm getting in over my head, but do you really need to normalize Sn to say anything precise here? Can't we clarify the first "informal" claim of convergence of Sn by saying, parallel to what AxelBoldt has said for the normalized (i.e. Zn) case:

The distribution of Sn converges towards the normal distribution N(nμ, σ²n) as n approaches ∞. This means: if F(z) is the cumulative distribution function of N(nμ, σ²n), then for every real number z, we have

lim(n→∞) Pr(Sn ≤ z) = F(z).

Is there a lurking desire here to state the non-standard normal part as a corollary, rather than as central to the CLT? That might be ok, although the general-purpose version looks more useful to me.

--Ryguasu 01:18 Dec 11, 2002 (UTC)

The problem is that one side of your equality is a limit as n approaches infinity, so its value does not depend on anything called n, while the CDF you've got on the other side does depend on the value of n. -- Mike Hardy

Actually, the CDF on the right-hand side depends on z, not on n. There are no free ns anywhere. --Ryguasu

It does depend on n, but your notation inappropriately suppresses that dependency. You defined F(z) as the cumulative distribution function of N(nμ, σ²n). AxelBoldt 02:23 Dec 14, 2002 (UTC)

Excellent point. Nonetheless, I find it suspicious that someone with more mathematical experience than me can't express the "informal" claim in a rigorous manner. At Talk:Normal distribution, you mentioned "goodness of fit" tests. Couldn't you express the informal version formally, through some limit statement about the results of such a test as the number of samples/trials goes to infinity? --Ryguasu 02:11 Jan 30, 2003 (UTC)

Probably, I don't know. But the version given in the article is also a rigorous statement of the "informal" claim you have in mind. AxelBoldt 00:55 Jan 31, 2003 (UTC)
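
For reference, spelling out the dependency the exchange above turns on (assuming σ > 0 and writing Φ for the standard normal CDF): the cumulative distribution function of N(nμ, σ²n) is

\[
F_n(z) \;=\; \Phi\!\left(\frac{z - n\mu}{\sigma\sqrt{n}}\right),
\]

which is a different function for every n, so there is no single F(z) to put on the right-hand side of the proposed limit. The normalized version in the article, lim(n→∞) Pr((Sn − nμ)/(σ√n) ≤ z) = Φ(z), avoids the problem because its right-hand side is one fixed function of z.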


How about adding some examples? (This is something most of the math pages are lacking.) How about an illustration involving coin flips? I.e., X_n is defined on the probability space [0, 1] so that X_n is 1 with probability 1/2 and -1 with probability 1/2. A series of graphs and equations could be given.
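
A minimal simulation sketch of the suggested coin-flip illustration, in case it helps whoever writes the example (this assumes Python with NumPy; the sample sizes and names are placeholders, and it prints a small table instead of the suggested graphs):

    import numpy as np
    from math import erf, sqrt

    # Coin-flip illustration of the CLT: each X_i is +1 or -1 with probability 1/2,
    # so E[X_i] = 0 and Var(X_i) = 1.  The normalized sum Z_n = S_n / sqrt(n)
    # should then be approximately standard normal once n is large.

    rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
    n = 1000                         # flips per sum
    trials = 20000                   # number of independent sums

    flips = rng.choice([-1, 1], size=(trials, n))
    z = flips.sum(axis=1) / np.sqrt(n)   # Z_n = (S_n - n*mu) / (sigma*sqrt(n)) with mu=0, sigma=1

    def std_normal_cdf(x):
        """CDF of N(0, 1), via the error function."""
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    # Compare empirical probabilities Pr(Z_n <= c) with the standard normal CDF.
    for cutoff in (-2.0, -1.0, 0.0, 1.0, 2.0):
        empirical = np.mean(z <= cutoff)
        print(f"Pr(Z_n <= {cutoff:+.1f})  empirical: {empirical:.4f}   N(0,1): {std_normal_cdf(cutoff):.4f}")

With n around 1000 the empirical and theoretical columns typically agree to about two decimal places, which is the kind of agreement the proposed graphs would show visually.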


