Statistics for Application 1 | Statistics vs. Probability, LLN vs. CLT

Series: Statistics for Application

1. Statistics vs. Probability

(1) Definition of Statistics and Probability

When you do probability, you are given the truth. Somebody tells you what die God is rolling, so you know exactly what the parameters of the problem are, and your task is to describe what the outcomes are going to look like. When you do statistics, you go the other way: you use observations (data) to build a statistical model and calculate statistics that approximate the truth. This is the fundamental idea behind statistical modeling.

So probability is about the truth itself, and its statements hold for every sample we could draw from the population. Statistics is just the opposite: it describes a sample, or a collection of samples, and we use those descriptions to estimate the truth, which we can never know with 100% certainty.

(2) Probability Problems: An Example

Previous studies showed that the drug was 80% effective (Truth). Then we can anticipate that for a study on 100 patients, on average, 80 will be cured.
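One way to check this expectation is by simulation. The following is a minimal sketch using only the Python standard library; the function name `simulate_mean_cured`, the number of simulated studies, and the seed are illustrative assumptions, not part of the original post.

```python
import random

def simulate_mean_cured(n_patients=100, p_cure=0.8, n_studies=10_000, seed=0):
    """Average number of cured patients over many simulated studies."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    total = 0
    for _ in range(n_studies):
        # Each patient is cured independently with probability p_cure.
        total += sum(rng.random() < p_cure for _ in range(n_patients))
    return total / n_studies

mean_cured = simulate_mean_cured()
print(round(mean_cured, 1))
```

The printed average should land close to 100 × 0.8 = 80, with small fluctuations from run to run if the seed is changed.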

Simulating many such studies gives an average around:

80.0

Also, at least 65 of the 100 patients will be cured with a probability of about 99.99%.
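This probability can be computed exactly from the binomial distribution, since the number of cured patients follows Bin(100, 0.8) when the truth is an 80% cure rate. The sketch below assumes that model and sums the probability mass directly; the helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k_min, n=100, p=0.8):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

prob = prob_at_least(65)
print(f"{prob:.4%}")
```

The result is very close to 1, which is consistent with the roughly 99.99% figure quoted above.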

Computing this probability gives a result around:

99.99%

(3) Statistics Problems: An Example

In a sample of 100 patients, we observe that 78 were cured. We don't know the truth, so we have to estimate it. Using statistics, we can conclude with 95% confidence that the drug is effective on between 69.88% and 86.12% of patients.
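The interval above matches the normal-approximation (Wald) confidence interval for a proportion, p̂ ± z·sqrt(p̂(1 − p̂)/n). The sketch below assumes that this is the method used; the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_ci(78, 100)
print(f"95% CI is [{low:.2%}, {high:.2%}]")
```

Here p̂ = 0.78 and the standard error is sqrt(0.78 × 0.22 / 100) ≈ 0.0414, so the half-width is about 1.96 × 0.0414 ≈ 0.0812.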

Computing this interval gives:

95% CI is [69.88% ,  86.12%]

We can also draw a graph showing how the standard normal distribution is shifted and rescaled into the distribution that fits our sample.
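The transformation behind such a graph can be sketched without plotting: a standard normal value z maps to the proportion p̂ + z·se, and the density is divided by se. The snippet below assumes the same p̂ = 0.78 and n = 100 as in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.78, 100
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion

standard = NormalDist(0, 1)          # original standard normal distribution
sample_dist = NormalDist(p_hat, se)  # shifted and rescaled distribution

for z in (-1.96, 0.0, 1.96):
    x = p_hat + z * se               # location on the proportion scale
    print(f"z = {z:+.2f} -> proportion = {x:.4f}, density = {sample_dist.pdf(x):.3f}")
```

Feeding a grid of z values through the same mapping gives the two curves one would draw with any plotting library.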

2. LLN vs. CLT

(1) The Law of Large Numbers

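Let X1, X2, …, Xn be independent and identically distributed random variables (i.i.d. r.v.), and,

$$\mu = \mathbb{E}[X_1], \qquad \sigma^2 = \mathrm{Var}(X_1)$$

then the sample mean converges to μ, in probability for the weak law and almost surely for the strong law,

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow[n \to \infty]{\mathbb{P},\ \text{a.s.}} \mu$$

We can simulate this process numerically.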
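A minimal sketch of such a simulation follows. It assumes Bernoulli(0.8) draws, matching the drug example, and tracks the running sample mean as the number of draws grows; the function name `running_means` and the seed are illustrative choices.

```python
import random

def running_means(p=0.8, n=10_000, seed=0):
    """Running sample mean of n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n + 1):
        total += rng.random() < p   # adds 1 with probability p, else 0
        means.append(total / i)     # sample mean after i draws
    return means

means = running_means()
for n in (10, 100, 1_000, 10_000):
    print(n, round(means[n - 1], 4))
```

The early means can be far from 0.8, while the later ones settle near it, which is what the law of large numbers predicts.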

(2) Central Limit Theorem

Let X1, X2, …, Xn be independent and identically distributed random variables (i.i.d. r.v.), and,

$$\mu = \mathbb{E}[X_1], \qquad \sigma^2 = \mathrm{Var}(X_1) < \infty$$

There are four common forms of the CLT, and they all state the same result.

Form #1

Form #2

Form #3

Form #4
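For reference, four equivalent ways of writing the CLT that are commonly used are sketched below. The ordering and the exact normalization are an assumption and may not match the numbering of the forms above. Here $\bar{X}_n$ denotes the sample mean, and the last two lines are large-$n$ approximations rather than limits.

```latex
\begin{align*}
% standardized sample mean
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} &\xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, 1) \\
% centered and scaled sample mean
\sqrt{n}\,(\bar{X}_n - \mu) &\xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \sigma^2) \\
% approximate distribution of the sample mean
\bar{X}_n &\approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \\
% approximate distribution of the sum
\sum_{i=1}^{n} X_i &\approx \mathcal{N}(n\mu,\ n\sigma^2)
\end{align*}
```

Each line follows from the first by multiplying by a constant or shifting by a constant, which is why they carry the same information.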

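Now look back at the drug example. In this case, X1, X2, …, Xn ~ Ber(0.8) are independent and identically distributed random variables with μ = 0.8 and σ² = 0.8 × 0.2 = 0.16, so the CLT applies to this problem. Writing $\bar{X}_n$ for the sample mean, we have,

$$\sqrt{n}\,\frac{\bar{X}_n - 0.8}{\sqrt{0.8 \times 0.2}} \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, 1)$$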

The same conclusion holds if X follows some other distribution, as long as its variance is finite.
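The CLT for the drug example can be checked numerically. The sketch below assumes Ber(0.8) outcomes and n = 100 patients per study, standardizes each study's sample mean, and measures how often the standardized value falls within ±1.96; the function name `standardized_means`, the number of studies, and the seed are illustrative.

```python
import random
from math import sqrt

def standardized_means(p=0.8, n=100, n_studies=5_000, seed=0):
    """sqrt(n) * (sample mean - p) / sigma for many simulated studies."""
    rng = random.Random(seed)
    sigma = sqrt(p * (1 - p))  # standard deviation of one Bernoulli(p) draw
    out = []
    for _ in range(n_studies):
        x_bar = sum(rng.random() < p for _ in range(n)) / n
        out.append(sqrt(n) * (x_bar - p) / sigma)
    return out

z = standardized_means()
inside = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(inside, 3))
```

If the normal approximation is good, the printed fraction should be roughly 0.95. Because the number of cured patients is discrete, it will typically be off by a percentage point or so at n = 100.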