You're right! --LMS
I don't like the examples at all. If a species shows sexual dimorphism, the size of specimens won't be a gaussian, just like the text points out about human blood pressure. Also, test scores are basically an example of the Gaussian limit of Binomials, and GPAs certainly do not follow a gaussian distribution because of grade inflation and limited range of grade points.
To me, these examples smack of the fallacy that "everything is gaussian". See Zipf's law.
My rewrite of this page is still under way, in any case. -- Miguel
You could add those counterexamples to the list of variables that don't follow the Normal distribution.
I don't understand the point about IQ scores and binomials, though. Which binomials are you thinking of? AxelBoldt
It's a bit more complicated than I made it sound, but the point is that the test score is basically a count of the number of correct answers, and therefore is a discrete variable like a Binomial and not a continuous variable like a Normal, so there is a Binomial-to-Normal limit involved. Here's the actual argument, written out:
Take a test that is composed of N True/False questions. Characterize a test-taker by their probability p of getting a right answer. Then their score will be a binomial B(N,p).
Now, consider a population of test takers. There will be a function F(p) on [0,1] which is the probability density that a randomly selected test-taker will have probability p of getting the right answer. Then, when you administer the test to a sample of test-takers, the probability distribution of the number of correct answers will be the mixture of the binomials B(N,p) weighted by F(p), or
P(n) = int_[0,1] P(B(N,p)=n) F(p) dp
-- Miguel
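A minimal numerical sketch of the integral above, assuming a midpoint rule for the integration and a uniform F(p) purely for illustration (the function names and the choice of F are hypothetical, not part of the argument):

```python
import math

def binomial_pmf(n, N, p):
    """P(B(N, p) = n)."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def score_distribution(N, F, steps=2_000):
    """Approximate P(n) = int_[0,1] P(B(N,p)=n) F(p) dp with a midpoint rule.

    F is the (assumed) population density of p on [0, 1].
    """
    h = 1.0 / steps
    midpoints = [(k + 0.5) * h for k in range(steps)]
    return [
        h * sum(binomial_pmf(n, N, p) * F(p) for p in midpoints)
        for n in range(N + 1)
    ]

def uniform_density(p):
    """Illustrative assumption: every p in [0, 1] is equally likely."""
    return 1.0

P = score_distribution(20, uniform_density)
```

With a uniform F every score from 0 to N comes out equally likely, with probability 1/(N+1), so the shape of the observed scores depends entirely on F and need not look like a bell at all.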
If N is big enough, then the distribution of n/N should be pretty much the same as the distribution of p though (is that right?). So the truly interesting question is then: Why is p approximately normally distributed (or is it?). I would claim that it is, because of the central limit theorem (p pretty much describes the "intelligence" of a person, which is the result of many small mostly independent additive effects). AxelBoldt
If N is big enough we can use the central limit theorem to replace P(B(N,p)=n) by the density of a N(Np,Np(1-p)). In other words,
P(n/N=x) = int_[0,1] exp(-N(p-x)^2/(2p(1-p))) (N/(2πp(1-p)))^(1/2) F(p) dp
This is not exactly a gaussian convolution, but it comes close. The observed P(n/N) is a smoothed-out version of F(p). The three features of this that I want to stress are 1) we did use the theorem of de Moivre-Laplace; 2) the gaussian has variance p(1-p)/N, which means that the test performs best with very good or very bad test-takers, for whom the test is unnecessary anyway; 3) in the limit of infinite N you recover F(p) exactly.
Now, I don't know what the distribution of F(p) should be, but it has to be on [0,1]. The natural family of distributions on [0,1] is the Beta distribution, but that doesn't mean that it has to be a Beta.
I'll create an entry for the Beta shortly. -- Miguel
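To get a feel for the de Moivre-Laplace step used above, here is a rough check that compares the exact binomial probabilities with the N(Np,Np(1-p)) density and prints the standard deviation sqrt(p(1-p)/N) of n/N. The test length N = 400 and the sample values of p are arbitrary choices for illustration.

```python
import math

def binomial_pmf(n, N, p):
    """P(B(N, p) = n)."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def normal_density(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_approximation_error(N, p):
    """Largest gap between P(B(N,p)=n) and the N(Np, Np(1-p)) density at n."""
    mean, var = N * p, N * p * (1 - p)
    return max(
        abs(binomial_pmf(n, N, p) - normal_density(n, mean, var))
        for n in range(N + 1)
    )

def score_std(N, p):
    """Standard deviation of n/N for a test-taker with success probability p."""
    return math.sqrt(p * (1 - p) / N)

N = 400  # arbitrary illustrative test length
for p in (0.5, 0.9, 0.99):
    print(p, max_approximation_error(N, p), score_std(N, p))
```

The standard deviation shrinks as p approaches 0 or 1, which is point 2) above: the smoothing is narrowest for the very good and the very bad test-takers.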
Isn't it true that the normal distribution N(μ,σ^2) is the distribution with the largest entropy among all distributions with mean μ and variance σ^2? That would make it the "default choice" in a certain sense. AxelBoldt
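On the entropy question, a quick comparison using the standard closed-form differential entropies (in nats) of three distributions, all scaled to the same standard deviation. This only illustrates the claim for these three families; it is not a proof.

```python
import math

def entropy_normal(sigma):
    """Differential entropy of a normal: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def entropy_uniform(sigma):
    """A uniform of width w has variance w^2/12 and entropy ln(w)."""
    width = sigma * math.sqrt(12)
    return math.log(width)

def entropy_laplace(sigma):
    """A Laplace with scale b has variance 2*b^2 and entropy 1 + ln(2b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

for name, h in (("normal", entropy_normal(1.0)),
                ("laplace", entropy_laplace(1.0)),
                ("uniform", entropy_uniform(1.0))):
    print(name, h)
```

At equal variance the normal comes out ahead of both the Laplace and the uniform, consistent with the maximum-entropy property.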
I have a question: is the sum of two normal variables always normal, even if the two variables are not independent? (Let's treat constant variables as normal for now, with σ=0.) --AxelBoldt
No. One can find two normals that are uncorrelated, i.e., their covariance is zero, the sum of which is not normal. Let X be a standard normal. Let Y be X or -X according as |X| is small or large; "large" means bigger than a specified number c. If c is big enough then cov(X,Y) > 0; if c is close to 0 then cov(X,Y) < 0; since cov(X,Y) depends continuously on c, there is an intermediate value of c for which cov(X,Y) = 0. X and Y are both standard normals, but X+Y equals 0 whenever |X| > c, which happens with positive probability, and equals 2X otherwise, so X+Y is not normal.
Two random variables X, Y have a joint normal distribution if their joint distribution is such that for any two reals a and b the linear combination aX+bY is normal. A similar definition applies to more than two normals. The distribution of a vector of joint normals is determined by the means and the covariances (including the variances). The whole matrix of covariances is sometimes called the covariance matrix, but I prefer to call it the variance, since it's the natural higher-dimensional analog of the variance. It is always a non-negative-definite matrix. Michael Hardy 23:44 Jan 15, 2003 (UTC)
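The counterexample with Y = X when |X| ≤ c and Y = -X otherwise can be checked numerically. A sketch, assuming bisection to locate the c with zero covariance and a seeded simulation of X+Y; the function names are made up for illustration.

```python
import math
import random

def truncated_second_moment(c):
    """E[X^2 ; |X| <= c] for a standard normal X (closed form)."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return (2 * Phi - 1) - 2 * c * phi

def covariance(c):
    """cov(X, Y) = E[X^2 ; |X| <= c] - E[X^2 ; |X| > c]."""
    return 2 * truncated_second_moment(c) - 1

def zero_covariance_threshold(lo=0.0, hi=5.0, tol=1e-12):
    """Bisection for the c with cov(X, Y) = 0 (covariance increases with c)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if covariance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulate_sum(c, n=10_000, seed=0):
    """Draw n samples of X + Y."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x if abs(x) <= c else -x
        sums.append(x + y)
    return sums

c = zero_covariance_threshold()
sums = simulate_sum(c)
zero_fraction = sum(1 for s in sums if s == 0.0) / len(sums)
print(c, zero_fraction)
```

The threshold comes out near c = 1.54, and roughly 12% of the simulated sums are exactly 0 while the rest never exceed 2c in absolute value, which no normal distribution allows.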
I just got the answer from the EFnet irc channel #math: the sum does not have to be normal. To quote:
<not_cub> How about two normals dependent in the following way, (means are 0). If X<0,
choose Y in the middle 50-th percentiles. If X>0 choose Y outside. Then
clearly X+Y is not even symmetric
Didn't you guys notice that I already gave a counterexample right here on this page? Michael Hardy 02:53 Jan 20, 2003 (UTC)
Huxley, Julian: Problems of Relative Growth (1932)
and the overwhelming biological evidence is that (as pointed out in the text) growth processes proceed by multiplicative increments, and that therefore body size should follow a lognormal rather than normal distribution. The size of plants and animals is approximately lognormal.
Of course, a sum of very many small lognormal increments approaches a normal, but except in the growth of inert appendages such as shells, hair, nails, claws and teeth, growth is best modeled by a multitude of random multiplicative increments.
One should not expect height distributions in humans to be normal, but lognormal; the usual statement that they are normally distributed is not supported by an application of the Central Limit Theorem.
--Miguel
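A small simulation of the multiplicative-growth argument above, under assumptions chosen only for illustration (200 independent growth steps, each multiplying the size by a factor uniform on [0.9, 1.15]): the resulting sizes are right-skewed, while their logarithms are close to symmetric, as one would expect from a lognormal.

```python
import math
import random

def grow(steps, rng):
    """Final size after `steps` random multiplicative increments."""
    size = 1.0
    for _ in range(steps):
        size *= rng.uniform(0.9, 1.15)  # hypothetical growth factor
    return size

def skewness(values):
    """Sample skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
sizes = [grow(200, rng) for _ in range(5_000)]
logs = [math.log(s) for s in sizes]
print(skewness(sizes), skewness(logs))
```

Replacing the product in `grow` with a sum of increments would give sizes that are themselves close to symmetric, which is the distinction being drawn between inert appendages and body size.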
Every reference describing normal distributions ultimately says this, and yet I have never seen a single normal distribution that I would describe first and foremost as bell-shaped. Some of them do perhaps mildly resemble some quite odd-shaped bells, but I think even these cases are a stretch. Does anyone else find this terminology silly, or, even better, know its historical development? --Ryguasu 10:22 Dec 2, 2002 (UTC)
The answer is that until someone comes up with a causation model that explains why it should be normal, it is just a guess. It is a common fallacy that "everything is Gaussian".
I changed "parameters, commonly called the mean and standard deviation", because these terms already have a general meaning; they are not just names.
Perhaps it is better to remove widths and heights from the HTML of the table: currently there is no space between columns in a large font. - Patrick 00:35 Jan 16, 2003 (U
QUESTION: can someone give me an example of a non-normal distribution and explain why it is not normal? (mia)
There are infinitely many such distributions. You might check out the binomial distribution. (It's not the best example, because with some parameters it can resemble a normal distribution. But it's relatively easy to understand.) Note that you should take care that "normal" here has two meanings:
The "Normal distribution" is the only one that is normal in the first sense. Several different distributions (including the binomial distribution) can be considered normal in the second sense. I don't know if this is clear in the article. --Ryguasu 02:42 Jan 30, 2003 (UTC)
Thank you for your reply. I need an example of a non-normal distribution for a college project. We are on a chapter in Statistics about Gauss, how to get from discrete to continuous variables, the bell curve and the normal distribution. Our teacher said that there are only 3 cases of non-normal distributions and that we shouldn't look in astronomy, because there everything is normally distributed. To help you understand better what I need: IQ scores, for example, are normally distributed on the bell curve. When we put the mean in the middle, 50% is on the left side and 50% on the right, and the bell curve is symmetrical. Well, I need exactly the opposite. I need to write about something which is not normally distributed and why it is not. From what I understood it will be a thing like temperature, height, IQ scores, etc. The page about the normal distribution lists 3 non-normal cases, but I don't understand why they are not normal and I don't know if they are correct. I hope I have made clearer what I need and I hope you can help me with that. Thank you. (mia)
Some context would be needed before that "3 cases" comment can be understood. The waiting time until the arrival of the next phone call at a switchboard is usually modelled by a memoryless exponential distribution. That can serve as another example. An exponentially distributed random variable is always positive and is memoryless; normally distributed r.v.s do not have those properties. Michael Hardy 18:47 Jan 30, 2003 (UTC)
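The memoryless property mentioned above, P(T > s+t | T > s) = P(T > t), can be written out directly. A sketch using the closed-form survival functions, with an arbitrary rate of 2 calls per minute and arbitrary times s and t (all assumptions for illustration):

```python
import math

def exp_survival(t, rate):
    """P(T > t) for an exponential waiting time; T is always positive."""
    return math.exp(-rate * t) if t > 0 else 1.0

def normal_survival(t, mean=0.0, sd=1.0):
    """P(X > t) for a normal variable."""
    return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2)))

def conditional_survival(survival, s, t):
    """P(T > s + t | T > s) for a given survival function."""
    return survival(s + t) / survival(s)

rate = 2.0  # hypothetical: 2 calls per minute
s, t = 1.5, 0.7

def call_survival(u):
    return exp_survival(u, rate)

# Zero (up to rounding) for the exponential: waiting s minutes changes nothing.
memoryless_gap = conditional_survival(call_survival, s, t) - call_survival(t)
# Clearly nonzero for a standard normal, which is not memoryless.
normal_gap = conditional_survival(normal_survival, s, t) - normal_survival(t)
print(memoryless_gap, normal_gap)
```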
Funny you should ask. I have exactly the same question. Every statistics textbook claims this, but they never seem to back it up with references. I posted the same question to usenet, and here (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=40200384.0302071614.438fe80c%40posting.google.com&rnum=1&prev=/groups%3Fq%3Dauthor:axel%2Bauthor:boldt%26hl%3Den%26lr%3D%26ie%3DUTF-8%26scoring%3Dd%26selm%3D40200384.0302071614.438fe80c%2540posting.google.com%26rnum%3D1) is what I got. AxelBoldt 21:55 Feb 10, 2003 (UTC)
Somebody pointed me to History of Statistics books by Stigler, and they look promising. AxelBoldt 15:51 Feb 12, 2003 (UTC)
Why is there a table of the Gaussian cumulative distribution function here?