
Talk:Normal distribution

Should "Bell curve" be capitalized? I think not, because it is not the curve of Graham Bell, but it is a curve that looks like a bell. --AxelBoldt

You're right! --LMS

I don't like the examples at all. If a species shows sexual dimorphism, the size of specimens won't be Gaussian, just as the text points out about human blood pressure. Also, test scores are basically an example of the Gaussian limit of binomials, and GPAs certainly do not follow a Gaussian distribution because of grade inflation and the limited range of grade points.

To me, these examples smack of the fallacy that "everything is Gaussian". See Zipf's law.

My rewrite of this page is still under way, in any case. -- Miguel

You could add those counterexamples to the list of variables that don't follow the Normal distribution.

I don't understand the point about IQ scores and binomials, though. Which binomials are you thinking of? AxelBoldt

It's a bit more complicated than I made it sound, but the point is that the test score is basically a count of the number of correct answers, and therefore is a discrete variable like a Binomial and not a continuous variable like a Normal, so there is a Binomial-to-Normal limit involved. Here's the actual argument, written out:

Take a test that is composed of N True/False questions. Characterize a test-taker by their probability p of getting a right answer. Then their score will be a binomial B(N,p).

Now, consider a population of test-takers. There will be a function F(p) on [0,1] which is the probability density that a randomly selected test-taker will have probability p of getting the right answer. Then, when you administer the test to a sample of test-takers, the probability distribution of the number of correct answers will be the mixture of B(N,p) over F(p) (loosely, a convolution of the two), or

$$P(n) = \int_0^1 P(B(N,p) = n)\, F(p)\, dp$$
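A minimal numerical sketch of the mixture integral above, assuming (purely for illustration) that F(p) is a Beta(a, b) density with hypothetical parameters a = 5, b = 3 and a hypothetical test length N = 20. The integral is evaluated with a midpoint rule and compared against the closed-form beta-binomial probabilities that this particular choice of F happens to give. The helper names `beta_pdf`, `binom_pmf`, `mixture_pmf` and `beta_binomial_pmf` are made up for this example.

```python
import math


def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density at p, for 0 < p < 1 (assumed choice of F)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))


def binom_pmf(n: int, N: int, p: float) -> float:
    """P(B(N, p) = n)."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def mixture_pmf(n: int, N: int, a: float, b: float, steps: int = 5000) -> float:
    """P(n) = integral over [0,1] of P(B(N,p)=n) F(p) dp with F = Beta(a, b),
    evaluated with the midpoint rule (midpoints never touch p = 0 or p = 1)."""
    h = 1.0 / steps
    return h * sum(
        binom_pmf(n, N, (i + 0.5) * h) * beta_pdf((i + 0.5) * h, a, b)
        for i in range(steps)
    )


def beta_binomial_pmf(n: int, N: int, a: float, b: float) -> float:
    """Closed form of the same mixture when F is a Beta(a, b) density."""
    def log_beta(x: float, y: float) -> float:
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return math.comb(N, n) * math.exp(log_beta(n + a, N - n + b) - log_beta(a, b))


N, a, b = 20, 5.0, 3.0  # hypothetical test length and Beta parameters
numeric = [mixture_pmf(n, N, a, b) for n in range(N + 1)]
exact = [beta_binomial_pmf(n, N, a, b) for n in range(N + 1)]
print(max(abs(u - v) for u, v in zip(numeric, exact)))  # quadrature error only
print(sum(numeric))  # total probability, close to 1
```

The Beta family is used here only because it is the natural family on [0,1] and gives a closed form to check against; nothing in the argument requires F to be a Beta.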

-- Miguel

If N is big enough, though, the distribution of n/N should be pretty much the same as the distribution of p (is that right?). So the truly interesting question is then: why is p approximately normally distributed (or is it)? I would claim that it is, because of the central limit theorem (p pretty much describes the "intelligence" of a person, which is the result of many small, mostly independent additive effects). AxelBoldt

If N is big enough we can use the central limit theorem to replace P(B(N,p)=n) by the density of a N(Np,Np(1-p)). In other words,

$$P(n/N = x) \approx \int_0^1 \left(\frac{N}{2\pi p(1-p)}\right)^{1/2} \exp\!\left(-\frac{N(p-x)^2}{2p(1-p)}\right) F(p)\, dp$$

This is not exactly a Gaussian convolution, but it comes close: the observed distribution of n/N is a smoothed-out version of F(p). The three features of this that I want to stress are 1) we did use the de Moivre-Laplace theorem; 2) the Gaussian has variance p(1-p)/N, which is smallest for very good or very bad test-takers, so the test performs best exactly for those for whom it is unnecessary anyway; 3) in the limit of infinite N you recover F(p) exactly.
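Point 3 can be checked numerically. The sketch below evaluates the normal-approximation integral above at one point x for increasing N and compares it with F(x). As before, F is assumed to be a Beta(5, 3) density purely for illustration, the evaluation point x = 0.6 is arbitrary, and the function names are hypothetical.

```python
import math


def beta_pdf(p: float, a: float = 5.0, b: float = 3.0) -> float:
    """Illustrative choice of F(p): a Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))


def smoothed_density(x: float, N: int, steps: int = 20000) -> float:
    """Normal-approximation density of n/N at x:
    integral over [0,1] of sqrt(N / (2 pi p(1-p))) exp(-N (p-x)^2 / (2 p(1-p))) F(p) dp,
    computed with the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        var = p * (1 - p) / N  # kernel variance p(1-p)/N, smallest near p = 0 or p = 1
        kernel = math.exp(-((p - x) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        total += kernel * beta_pdf(p)
    return total * h


x = 0.6
target = beta_pdf(x)
errors = {}
for N in (10, 100, 10_000):
    approx = smoothed_density(x, N)
    errors[N] = abs(approx - target)
    print(N, round(approx, 4), round(errors[N], 4))
```

The error shrinks as N grows, which is the numerical face of "in the limit of infinite N you recover F(p) exactly".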

Now, I don't know what F(p) should be, but it has to be a distribution on [0,1]. The natural family of distributions on [0,1] is the Beta distribution, but that doesn't mean that it has to be a Beta.

I'll create an entry for the Beta shortly. -- Miguel

Isn't it true that the normal distribution N(μ,σ²) is the distribution with the largest entropy among all distributions with mean μ and variance σ²? That would make it the "default choice" in a certain sense. AxelBoldt
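The maximum-entropy property can be illustrated (not proved) by comparing closed-form differential entropies, in nats, of three familiar distributions scaled to the same variance σ². The choice of the uniform and Laplace distributions as competitors is just an example; the helper names are hypothetical.

```python
import math


def normal_entropy(sigma: float) -> float:
    """Differential entropy (nats) of N(mu, sigma^2): 0.5 * ln(2 pi e sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def uniform_entropy(sigma: float) -> float:
    """Uniform on an interval of width w has variance w^2 / 12 and entropy ln(w)."""
    return math.log(sigma * math.sqrt(12))


def laplace_entropy(sigma: float) -> float:
    """Laplace with scale s has variance 2 s^2 and entropy 1 + ln(2 s)."""
    s = sigma / math.sqrt(2)
    return 1 + math.log(2 * s)


sigma = 1.0
for name, h in [("normal", normal_entropy(sigma)),
                ("uniform", uniform_entropy(sigma)),
                ("laplace", laplace_entropy(sigma))]:
    print(f"{name:8s} {h:.4f}")
```

All three entropies grow like ln σ, so the ordering (normal largest) is the same for every σ.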

I have a question: is the sum of two normal variables always normal, even if the two variables are not independent? (Let's treat constant variables as normal for now, with σ=0.) --AxelBoldt

No. One can find two normals that are uncorrelated, i.e., their covariance is zero, whose sum is not normal. Let X be a standard normal. Let Y be X or -X according as |X| is small or large, where "large" means bigger than a specified number c. If c is close to 0 then Y is almost always -X, so cov(X,Y) < 0; if c is big enough then Y is almost always X, so cov(X,Y) > 0; since cov(X,Y) depends continuously on c, there is an intermediate value of c for which cov(X,Y) = 0. X and Y are both standard normals (Y because the distribution of X is symmetric about 0). But X+Y = 0 whenever |X| > c, so X+Y has a point mass at 0 and is not normal.
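The intermediate value of c can be found explicitly. Writing φ for the standard normal density, cov(X,Y) = E[X²; |X| ≤ c] − E[X²; |X| > c] = 2E[X²; |X| ≤ c] − 1, and integration by parts gives E[X²; |X| ≤ c] = P(|X| ≤ c) − 2cφ(c). The sketch below solves cov(X,Y) = 0 by bisection; the function names are hypothetical.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cov_xy(c: float) -> float:
    """cov(X, Y) for X ~ N(0,1) and Y = X if |X| <= c, else Y = -X."""
    inner = math.erf(c / math.sqrt(2)) - 2 * c * phi(c)  # E[X^2; |X| <= c]
    return 2 * inner - 1


def find_c(lo: float = 1e-6, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Bisection for cov_xy(c) = 0. cov_xy is increasing in c (its derivative
    is 4 c^2 phi(c) > 0), negative near 0 and positive for large c."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cov_xy(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = find_c()
p_zero = 1 - math.erf(c / math.sqrt(2))  # P(|X| > c) = P(X + Y = 0)
print(c, cov_xy(c), p_zero)
```

The root is near c ≈ 1.54, and at that c the sum X+Y equals 0 with probability of roughly 12%, which no normal distribution with positive variance can do.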

Two random variables X, Y have a joint normal distribution if their joint distribution is such that for any two reals a and b the linear combination aX+bY is normal. A similar definition applies to more than two normals. The distribution of a vector of joint normals is determined by the means and the covariances (including the variances). The whole matrix of covariances is sometimes called the covariance matrix, but I prefer to call it the variance, since it's the natural higher-dimensional analog of the variance. It is always a non-negative-definite matrix. Michael Hardy 23:44 Jan 15, 2003 (UTC)

Since mean values are linear and are not affected by correlations, we can reduce this to the case where all variables involved have zero means.

If the two variables have a joint normal distribution with density proportional to

$$\exp\left(-\tfrac{1}{2}Q(x,y)\right)$$

where Q(x,y) is a positive-definite quadratic form on (x,y), then the answer is definitely yes.
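For the jointly normal case, aX+bY is normal with mean 0 and variance a²σx² + 2abρσxσy + b²σy². A quick Monte Carlo sketch, with hypothetical parameters σx = 1, σy = 2, ρ = 0.5 and a = b = 1 (theoretical variance 7), builds correlated pairs from two independent standard normals and compares the simulated sum with that normal law.

```python
import math
import random
from statistics import NormalDist, pvariance


def sample_joint_normal(rng, sx, sy, rho):
    """One draw (X, Y) from a zero-mean bivariate normal with standard deviations
    sx, sy and correlation rho, built from two independent standard normals."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return sx * z1, sy * (rho * z1 + math.sqrt(1 - rho**2) * z2)


sx, sy, rho = 1.0, 2.0, 0.5  # hypothetical parameters
a, b = 1.0, 1.0              # coefficients of the linear combination aX + bY
rng = random.Random(0)
pairs = (sample_joint_normal(rng, sx, sy, rho) for _ in range(200_000))
sums = [a * x + b * y for x, y in pairs]

# For a jointly normal pair, aX + bY ~ N(0, a^2 sx^2 + 2 a b rho sx sy + b^2 sy^2)
theory_var = a**2 * sx**2 + 2 * a * b * rho * sx * sy + b**2 * sy**2
predicted = NormalDist(0, math.sqrt(theory_var)).cdf(1)
empirical = sum(s <= 1 for s in sums) / len(sums)
print(pvariance(sums), theory_var)
print(empirical, predicted)
```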

However, there are other meanings you may want to give to "always normal":

The conditional probability densities p(x|y₀) and p(y|x₀) are always normal.

The marginal probability distributions p(x) and p(y) are normal.

In the first case, let z=ax+by. Then, the conditional density p(z|y₀) is also normal, and integrating with respect to y gives a Gaussian density.

In the second case, I think the answer is "no" but I need to find a counterexample. -- Miguel

Yes, I meant the second interpretation about marginal distributions. If it were true, it would be a good argument for the ubiquity of the normal distribution, since in the real world nothing is truly independent. But I doubt it too. AxelBoldt

I just got the answer from the EFnet IRC channel #math: the sum does not have to be normal. To quote:

  <not_cub> How about two normals dependent in the following way, (means are 0). If X<0,
            choose Y in the middle 50-th percentiles. If X>0 choose Y outside. Then
            clearly X+Y is not even symmetric
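A simulation sketch of this construction, reading "the middle 50-th percentiles" as |Y| below the 75th percentile of a standard normal. Because the middle and the outside regions each carry probability 1/2, and X < 0 also has probability 1/2, Y is marginally standard normal; yet the sample skewness of X+Y comes out clearly positive. Sample sizes, seeds and helper names are hypothetical choices.

```python
import random
from statistics import NormalDist

Q = NormalDist().inv_cdf(0.75)  # |y| < Q is the "middle 50%" of a standard normal


def draw_y(rng, middle):
    """Rejection-sample a standard normal conditioned on |y| < Q (middle=True)
    or on |y| >= Q (middle=False)."""
    while True:
        y = rng.gauss(0, 1)
        if (abs(y) < Q) == middle:
            return y


def skewness(values):
    """Sample skewness m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2**1.5


rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100_000)]
ys = [draw_y(rng, middle=(x < 0)) for x in xs]  # Y depends on X only through its sign
sums = [x + y for x, y in zip(xs, ys)]
print(skewness(xs), skewness(ys))  # both near 0
print(skewness(sums))              # clearly positive: X + Y is not symmetric
```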

AxelBoldt

Didn't you guys notice that I already gave a counterexample right here on this page? Michael Hardy 02:53 Jan 20, 2003 (UTC)

I was just reading

Huxley, Julian: Problems of Relative Growth (1932)

and the overwhelming biological evidence is that (as pointed out in the text) growth processes proceed by multiplicative increments, and that therefore body size should follow a lognormal rather than normal distribution. The size of plants and animals is approximately lognormal.

Of course, a sum of very many small lognormal increments approaches a normal, but except in the growth of inert appendages such as shells, hair, nails, claws and teeth, growth is best modeled by a multitude of random multiplicative increments.
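A toy sketch of the multiplicative-growth argument. Each simulated organism starts at size 1 and is multiplied by (1 + u) at each of 50 steps, with u uniform on [0, 0.2]; the step count and rate range are arbitrary assumptions, and this is not a model of any real biological data. The log of the final size is a sum of many small independent terms, so it comes out nearly symmetric, while the size itself is visibly right-skewed, as a lognormal would be.

```python
import math
import random


def skewness(values):
    """Sample skewness m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2**1.5


def grow(rng, steps=50, max_rate=0.2):
    """Hypothetical growth model: starting from size 1, multiply the size by
    (1 + u) at each step, with u drawn uniformly from [0, max_rate]."""
    size = 1.0
    for _ in range(steps):
        size *= 1 + rng.uniform(0, max_rate)
    return size


rng = random.Random(2)
sizes = [grow(rng) for _ in range(20_000)]
logs = [math.log(s) for s in sizes]  # log size = sum of many small independent terms
print(skewness(sizes))  # noticeably right-skewed
print(skewness(logs))   # close to 0, as for a normal
```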

One should not expect human height to be normally distributed but lognormally distributed, and the usual statement that it is normally distributed is not supported by an application of the Central Limit Theorem.

--Miguel

its shape resembles a bell, which has led to it being called the bell curve

Every reference describing normal distributions ultimately says this, and yet I have never seen a single normal distribution that I would describe first and foremost as bell-shaped. Some of them do perhaps mildly resemble some quite odd-shaped bells, but I think even these cases are a stretch. Does anyone else find this terminology silly, or, even better, know its historical development? --Ryguasu 10:22 Dec 2, 2002 (UTC)

How is it determined that IQ scores, heights, etc. are "approximately normal"? Does someone just collect a very large sample, plot a histogram, and go "wow - it looks like a bell!" I assume there are more formal methods. Also, does anyone have references for the studies concluding the approximate normality of the variables discussed in the article? --Ryguasu 15:35 Dec 10, 2002 (UTC)

The answer is that until someone comes up with a causal model that explains why it should be normal, it is just a guess. It is a common fallacy that "everything is Gaussian".

The tests to check whether a given distribution is normal (or lognormal etc.) are called "goodness of fit" tests. One simple-minded approach is to divide the variable's range into subintervals, let the theoretical distribution predict the probabilities of the various subintervals, and compare those predictions to the observed frequencies with a chi-square test. I don't have references for the relevant studies. Once you have empirically verified that a given distribution is approximately normal, you can of course dream up all sorts of explanations, typically that the given variable can be seen as the result of many small additive influences. AxelBoldt 23:36 Dec 10, 2002 (UTC)
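A sketch of the simple-minded chi-square approach described above, using only Python's standard library. It fits a normal by the sample mean and standard deviation, splits the line into k = 10 intervals that the fitted normal considers equally likely, and computes Pearson's statistic. The function name, the choice of equal-probability bins, and the two synthetic samples are assumptions for illustration.

```python
import bisect
import random
from statistics import NormalDist, mean, stdev


def chi_square_normal_gof(data, k=10):
    """Pearson chi-square statistic for a normal fit: k bins that are equally
    likely under the fitted normal, observed counts vs expected n / k."""
    fitted = NormalDist(mean(data), stdev(data))
    cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]  # k - 1 interior bin edges
    observed = [0] * k
    for x in data:
        observed[bisect.bisect_right(cuts, x)] += 1
    expected = len(data) / k
    return sum((o - expected) ** 2 / expected for o in observed)


rng = random.Random(3)
normal_sample = [rng.gauss(10, 2) for _ in range(2000)]
exp_sample = [rng.expovariate(1.0) for _ in range(2000)]
stat_normal = chi_square_normal_gof(normal_sample)
stat_exp = chi_square_normal_gof(exp_sample)
print(stat_normal, stat_exp)
```

The statistic is compared with a chi-square distribution. A common rule of thumb subtracts one degree of freedom per estimated parameter (10 − 1 − 2 = 7 here, whose 95% point is about 14.07), although that rule is strictly exact only when the parameters are estimated from the binned counts.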

I changed "parameters, commonly called the mean and standard deviation", because those terms already have a general meaning; they are not just names.

Perhaps it is better to remove widths and heights from the HTML of the table: currently there is no space between columns in a large font. - Patrick 00:35 Jan 16, 2003 (U

QUESTION: can someone give me an example of a non-normal distribution and explain why it is not normal? (mia)

There are infinitely many such distributions. You might check out the binomial distribution. (It's not the best example, because with some parameters it can resemble a normal distribution. But it's relatively easy to understand.) Take care: "normal" here has two meanings:

pertaining to the "Normal distribution"
typical, standard, normal, etc.

The "Normal distribution" is the only one that is normal in the first sense. Several different distributions (including the binomial distribution) can be considered normal in the second sense. I don't know if this is clear in the article. --Ryguasu 02:42 Jan 30, 2003 (UTC)

Thank you for your reply. I need an example of a non-normal distribution. It is for a project at college. We are on a chapter in Statistics about Gauss, about how to get from discrete to continuous variables, and about the bell curve and the normal distribution. Our teacher said that there are only 3 cases of non-normal distribution and that we shouldn't look in astronomy, because there everything is normally distributed. To help you understand better what I need: IQ scores, for example, are normally distributed on the bell curve. When we put the mean in the middle, 50% is on the left side and 50% on the right, and the bell curve is symmetrical. I need exactly the opposite: I need to write about something which is not normally distributed, and why it is not. From what I understood, it will be something like temperature, height, IQ scores, etc. The page about the normal distribution lists 3 non-normal cases, but I don't understand why they are not normal and I don't know if they are correct. I hope I made clearer what I need and I hope you can help me with that. Thank you. (mia)

Some context would be needed before that "3 cases" comment can be understood. The waiting time until the arrival of the next phone call at a switchboard is usually modelled by a memoryless exponential distribution. That can serve as another example. An exponentially distributed random variable is always positive and is memoryless; normally distributed r.v.s do not have those properties. Michael Hardy 18:47 Jan 30, 2003 (UTC)
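The two properties can be checked directly from survival functions. Memorylessness means P(T > s + t | T > s) = P(T > t). The sketch below assumes a hypothetical call rate of one call per 5 minutes and, for contrast, a hypothetical normal model with the same mean 5 and standard deviation 2; the names are illustrative only.

```python
import math
from statistics import NormalDist

RATE = 1 / 5  # hypothetical: on average one call every 5 minutes
normal = NormalDist(mu=5, sigma=2)  # hypothetical normal model with the same mean


def exp_surv(t):
    """P(T > t) for an exponential waiting time with rate RATE (t >= 0)."""
    return math.exp(-RATE * t)


def norm_surv(t):
    """P(T > t) under the normal model."""
    return 1 - normal.cdf(t)


def conditional_surv(surv, s, t):
    """P(T > s + t | T > s) = P(T > s + t) / P(T > s)."""
    return surv(s + t) / surv(s)


s, t = 10.0, 3.0
cond_exp = conditional_surv(exp_surv, s, t)
cond_norm = conditional_surv(norm_surv, s, t)
print(cond_exp, exp_surv(t))    # equal: having waited s minutes changes nothing
print(cond_norm, norm_surv(t))  # very different: the normal model is not memoryless
print(normal.cdf(0))            # > 0: the normal model allows negative waiting times
```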

Several people have indicated to me that there are indeed articles and/or books out there somewhere that go through real-life data in a number of domains, showing many instances of the normal distribution being a good approximate fit. Unfortunately, nobody seems to remember where they might have seen these discussions. Has anyone seen one? I would really appreciate seeing some real data, rather than just being told what seems to be roughly normal and what doesn't. --Ryguasu 21:14 Feb 10, 2003 (UTC)

Funny you should ask. I have exactly the same question. Every statistics textbook claims this, but they never seem to back it up with references. I posted the same question to usenet, and here (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=40200384.0302071614.438fe80c%40posting.google.com&rnum=1&prev=/groups%3Fq%3Dauthor:axel%2Bauthor:boldt%26hl%3Den%26lr%3D%26ie%3DUTF-8%26scoring%3Dd%26selm%3D40200384.0302071614.438fe80c%2540posting.google.com%26rnum%3D1) is what I got. AxelBoldt 21:55 Feb 10, 2003 (UTC)

Somebody pointed me to History of Statistics books by Stigler, and they look promising. AxelBoldt 15:51 Feb 12, 2003 (UTC)

Why is there a table of the Gaussian cumulative distribution function here?

Wikipedia is not a repository of source texts,
it is not useful as an authoritative source (as anyone can edit), and
a graph would be much more informative about its properties.
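For what it's worth, the values in such a table follow from the error function, Φ(x) = ½(1 + erf(x/√2)), which Python's standard `math` module provides. A short sketch that regenerates a few table entries (the function name is hypothetical):

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x) = (1 + erf(x / sqrt(2))) / 2 for the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{x:4.1f}  {std_normal_cdf(x):.4f}")
```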

-- Anon.

All Wikipedia text is available under the terms of the GNU Free Documentation License
