Econometrics and Statistics: One Reality
Source is Introduction to Econometrics by Stock and Watson.
Given data, econometrics uses statistics to make measurements in economics. We define the probability of an outcome as the proportion of the time that the outcome occurs in the long run. We define a sample space as the set of all possible outcomes. A random variable is a numerical summary of a random outcome, so it can take on different numbers depending on which outcome occurs. A discrete random variable takes on only a discrete set of values, while a continuous random variable takes on a continuum of possible values. A probability distribution lists the probability of each possible value of a discrete random variable, and these probabilities sum to one. A cumulative probability distribution function returns the probability that a random variable is less than or equal to a particular value. A Bernoulli distribution is confined to two outcomes, 0 or 1, whose probabilities sum to one. For a continuous random variable, a probability density function describes the relative likelihood of outcomes, and the area under the density (its integral) between any two points gives the probability that the variable falls in that range. The expectation or mean of a random variable is its long run average value. The variance and standard deviation measure the dispersion of the random variable around its mean. Skewness measures how asymmetric a distribution is, and kurtosis measures how much mass is in the tails. When there are two random variables, they have a joint distribution: the probabilities of the values they take on together. We can calculate the marginal probability distribution of one variable alone by summing, for each value of that variable, the joint probabilities over all possible outcomes of the other variable. A conditional distribution of a random variable is its distribution given that another variable takes on a certain value. A conditional expectation is the mean of that conditional distribution: the expectation of one random variable given that another takes on a certain value.
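To make joint, marginal and conditional distributions concrete, here is a minimal sketch in Python. The joint probabilities are made up for illustration and the function names are my own, not taken from Stock and Watson.

```python
# Hypothetical joint distribution of two discrete random variables X and Y.
# Keys are (x, y) outcomes; values are joint probabilities that sum to one.
joint = {
    (0, 0): 0.15, (0, 1): 0.15,
    (1, 0): 0.07, (1, 1): 0.63,
}

def marginal_x(joint):
    """Marginal distribution of X: for each x, sum the joint probabilities over y."""
    out = {}
    for (x, _y), p in joint.items():
        out[x] = out.get(x, 0.0) + p
    return out

def conditional_y_given_x(joint, x_value):
    """Distribution of Y given X = x_value: joint probability divided by P(X = x_value)."""
    px = marginal_x(joint)[x_value]
    return {y: p / px for (x, y), p in joint.items() if x == x_value}

def expectation(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * p for value, p in dist.items())

p_x = marginal_x(joint)                       # marginal distribution of X
y_given_x1 = conditional_y_given_x(joint, 1)  # conditional distribution of Y given X = 1
e_y_given_x1 = expectation(y_given_x1)        # conditional expectation E(Y | X = 1)

print(p_x, y_given_x1, e_y_given_x1)
```

The marginal probabilities and each conditional distribution sum to one, which is a quick sanity check on any table like this.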
The law of iterated expectations tells us that the expectation of a random variable equals the weighted average of its conditional expectations given a second variable, where the weights are the probabilities of each possible value of the second variable: E(Y) = E[E(Y|X)]. There is also a conditional variance, which is the variance of the conditional distribution once the second variable is confined to a particular value. Two random variables are independent if the probability distribution of one does not change based on the value the other random variable takes on. Covariance is the degree to which two random variables vary with one another, and correlation is the covariance divided by the product of the two standard deviations, which removes the units and keeps the measure between -1 and 1.
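The law of iterated expectations and the link between covariance and correlation can be checked numerically. The sketch below uses a small made-up joint distribution, and the variable names are my own.

```python
import math

# Hypothetical joint distribution of two discrete random variables X and Y.
joint = {
    (0, 0): 0.15, (0, 1): 0.15,
    (1, 0): 0.07, (1, 1): 0.63,
}

# Unconditional means computed directly from the joint distribution.
mu_x = sum(x * p for (x, _y), p in joint.items())
mu_y = sum(y * p for (_x, y), p in joint.items())

# Marginal distribution of X.
p_x = {}
for (x, _y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional expectation E(Y | X = x) for each value x.
e_y_given_x = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
    for x in p_x
}

# Law of iterated expectations: E(Y) = sum over x of E(Y | X = x) * P(X = x).
e_y_iterated = sum(e_y_given_x[x] * p_x[x] for x in p_x)

# Covariance, variances, and correlation.
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_x, y), p in joint.items())
corr_xy = cov_xy / math.sqrt(var_x * var_y)

print(mu_y, e_y_iterated, cov_xy, corr_xy)
```

The directly computed mean of Y and the iterated version agree up to floating point rounding, and the correlation lands between -1 and 1 whatever units X and Y are in.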
There are several interesting distributions. The normal distribution is a bell shaped distribution that is fully determined by its mean and standard deviation; once we say a variable is normal, those two numbers define everything else. A set of random variables can also be jointly normally distributed. A bivariate normal distribution, for example, implies that any linear combination of the two variables also has a normal distribution, with a mean and variance that are functions of the means, variances and covariance of the individual variables. The chi squared distribution is the distribution of a sum of squared independent standard normal variables, and we specify how many there are. That number is known as the degrees of freedom, which from what I can gauge is a general statistical term for how many dimensions the data has compared to how many dimensions the estimator has used up. For example, when we estimate a quantity from the data and then reuse it inside another estimator, we use up a degree of freedom and have to change the formula as a result, which is why the sample variance divides by n - 1 rather than n. A Student t distribution with m degrees of freedom is the distribution of a standard normal random variable Z divided by the square root of an independent chi squared random variable W divided by its degrees of freedom: Z / sqrt(W/m). The Student t distribution resembles a normal distribution but has fatter tails. The F distribution with m and n degrees of freedom is the distribution of (W/m) / (V/n), where W and V are independent chi squared random variables with m and n degrees of freedom: each is divided by its degrees of freedom and then a ratio is taken of the two. Come to think of it, degrees of freedom is best thought of as the number of dimensions along which a random variable is free to vary were it to produce data.
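The constructions above can be simulated directly from standard normal draws. This is only a sketch: the seed, the number of draws and the degrees of freedom are arbitrary choices of mine, and the helper names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def chi_squared(m):
    """One draw: sum of m squared independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))

def student_t(m):
    """One draw: standard normal Z over sqrt(W/m), with W an independent chi squared(m)."""
    z = rng.gauss(0.0, 1.0)
    w = chi_squared(m)
    return z / math.sqrt(w / m)

def f_ratio(m, n):
    """One draw: (W/m) / (V/n) with independent chi squared W (m df) and V (n df)."""
    return (chi_squared(m) / m) / (chi_squared(n) / n)

draws = 20000
chi_mean = statistics.fmean([chi_squared(5) for _ in range(draws)])

# Compare tail mass: share of draws more than 3 away from zero.
t_tail = sum(abs(student_t(5)) > 3 for _ in range(draws)) / draws
z_tail = sum(abs(rng.gauss(0.0, 1.0)) > 3 for _ in range(draws)) / draws

f_draws = [f_ratio(3, 10) for _ in range(1000)]

print(chi_mean, t_tail, z_tail, min(f_draws))
```

The average of the chi squared draws should sit close to its degrees of freedom, and the t draws should land beyond 3 in absolute value noticeably more often than the normal draws, which is what "fatter tails" means in practice.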
Now we think about sampling, as all econometrics is the sampling of data and the drawing of conclusions from the samples. When we sample with independent and identically distributed draws, the sample average is itself a random variable with its own distribution, the sampling distribution, which depends on the distribution of the underlying population. We can sometimes derive the sampling distribution exactly, or we can rely on large sample approximations, called asymptotic distributions: when the sample size is large, the central limit theorem tells us that the sampling distribution of the sample average is approximately normal. The law of large numbers says that your sample average will be near the population mean with very high probability when the sample size is large. This is known as convergence in probability, or consistency. Notice that when we use consistency in everyday English we usually mean a logical progression, but in statistics a convergence to the truth is what defines consistency. If you merely create a very consistent logical progression that doesn’t lead to the truth, it isn’t consistency in a statistical sense; we can call it dogma, or precision without accuracy. The central limit theorem introduces the normal distribution into sampling: even if the underlying population has a skewed distribution, as we pull sample averages repeatedly with larger numbers per sample, we see the normal distribution show up to depict the distribution of the sample average.
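Both theorems are easy to see in a simulation. The sketch below assumes a skewed population (an exponential distribution with mean 1, my choice for illustration) and compares sample averages at different sample sizes; the helper names are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Average of n i.i.d. draws from an exponential population with mean 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

def skewness(values):
    """Average cubed standardized deviation; zero for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Law of large numbers: a large sample average lands near the population mean of 1.
lln_small = sample_mean(10)
lln_large = sample_mean(100_000)

# Central limit theorem: the distribution of sample averages loses the
# population's skew as the number of draws per sample grows.
means_n2 = [sample_mean(2) for _ in range(5000)]
means_n200 = [sample_mean(200) for _ in range(5000)]
skew_n2 = skewness(means_n2)
skew_n200 = skewness(means_n200)

print(lln_small, lln_large, skew_n2, skew_n200)
```

With 2 draws per sample the averages are still clearly skewed like the population, while with 200 draws per sample the skew is much closer to the zero of a normal distribution.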
In real terms this means you may have the craziest bunch of friends, but suppose I get to know more and more of them at once, each time meeting a random selection from the set of all your friends. On any given criterion I measure, averaging across the group, they will seem quite normal when the group averages are compared to each other: some subgroups of your friends will be on the tail ends of fervor and a lot of subgroups will be in the middle. It won’t be too different from what I would see if I sampled random groups among all the friends in my own life and looked at the distribution of their group means. We are different from one another, but when we look at the distribution of the averages of one another we are quite similar, because by the central limit theorem the distribution of those averages converges to a normal distribution. So the answer to the rhetorical question, are we all the same or are we all different? We are all the same at some innate level, so we are categorized together, but the distributions of what’s in our lives differ: we are all different in what surrounds us and in our own attributes as compared to what surrounds us, and our averages may even be different, some Ivy League, some living at home with parents, some both. Yet the distribution of averages of what is around us looks normal when we compare it to itself, a lot in the middle and fewer in the tails. So all of us think we are normal when we look at averages, but a lot of us are quite the outliers, whether we see that as weird or, as I do, remarkable.
In econometrics, the sample average estimates the population mean, but we have to ask about its bias, consistency and efficiency: is it right on average, does it converge to the population mean as the sample grows, and how small is its variance compared to other estimators? How fast does your sample average converge to the population mean? The sample average is the least squares estimator of the population mean: it minimizes the sum of squared gaps between the estimator and the sample points. The same least squares idea is widely used in econometrics to draw a line of best fit through scattered dots. Because an estimator has a distribution, we can use it to construct a confidence interval for the population mean, a range that contains the true value in a stated share (for example 95 percent) of repeated samples. We can also calculate a probability called a p-value: the probability of getting an estimate at least as far from the null hypothesis value as the one we actually got, purely by chance, if the null hypothesis is true. When we are estimating a slope coefficient, the usual null hypothesis is that the slope is zero, meaning no relationship in the population. If the p-value is really low, say below five percent, then only fairly extraordinary luck would have produced our estimate if the relationship truly did not exist in the population, so we say our measurement of a relationship probably did not result from chance.
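As a small illustration of the least squares idea and of a confidence interval for a mean, consider the sketch below. The data are made up, and the 1.96 multiplier assumes the large sample normal approximation, which is a stretch for a sample this small; it is used here only to show the mechanics.

```python
import math
import statistics

# Hypothetical sample, e.g. hourly earnings in dollars.
sample = [12.0, 15.5, 9.8, 20.1, 14.3, 17.7, 11.2, 16.4]

def sum_squared_gaps(m):
    """Sum of squared gaps between a candidate estimate m and the sample points."""
    return sum((y - m) ** 2 for y in sample)

y_bar = statistics.fmean(sample)

# The sample average gives a smaller sum of squared gaps than nearby candidates.
at_mean = sum_squared_gaps(y_bar)
above = sum_squared_gaps(y_bar + 0.5)
below = sum_squared_gaps(y_bar - 0.5)

# Standard error of the sample average: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95 percent confidence interval (normal approximation, assumed).
ci_low = y_bar - 1.96 * standard_error
ci_high = y_bar + 1.96 * standard_error

print(y_bar, at_mean, above, below, (ci_low, ci_high))
```

Moving the candidate away from the sample average in either direction raises the sum of squared gaps, which is what it means for the sample average to be the least squares estimator of the mean.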
We frequently take the deviation of the sample mean from the hypothesized population mean and divide it by the standard error of the sample mean, which is the sample standard deviation divided by the square root of the sample size. This is a t statistic and it is used to calculate p-values in a standardized way. Hypothesis tests start from a null hypothesis that the unobserved population value equals a certain number. If we reject the null in favor of a one sided alternative, our statistical standards lead us to conclude that the population value is greater than the null value, or else less than it, depending on how the test is defined. If we reject in favor of a two sided alternative, we conclude only that the population value differs from the null value, without placing it on one side or the other. We sometimes run tests to compare the means of two populations and derive a confidence interval for the difference between the means; if that interval excludes zero, we reject the hypothesis that the two means are equal. Notice that in statistics we are usually talking about correlation, not causation, but in a randomized controlled experiment we can measure a causal effect by seeing how the expectation of one random variable changes with the value of the treatment variable. Positive covariance or correlation basically means that the scatter plot of two variables shows an upward sloping diagonal pattern.
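The t statistic and a two sided p-value can be computed in a few lines. In this sketch the data are simulated, the null value of 10 is arbitrary, and the p-value uses the large sample normal approximation rather than the exact Student t distribution; the function names are my own.

```python
import math
import random
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test_mean(sample, mu_null):
    """t statistic and two sided p-value for H0: population mean equals mu_null."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (y_bar - mu_null) / standard_error
    # Probability of a statistic at least this far from zero if H0 is true
    # (normal approximation, reasonable only when n is large).
    p_value = 2.0 * (1.0 - normal_cdf(abs(t_stat)))
    return t_stat, p_value

rng = random.Random(7)  # fixed seed so the sketch is reproducible
# Hypothetical population with true mean 11 and standard deviation 2.
sample = [rng.gauss(11.0, 2.0) for _ in range(200)]

t_stat, p_value = t_test_mean(sample, mu_null=10.0)

# Testing against the sample's own mean gives t = 0 and a p-value of 1.
t_zero, p_one = t_test_mean(sample, mu_null=statistics.fmean(sample))

print(t_stat, p_value, t_zero, p_one)
```

Because the simulated population mean is well away from the null value, the t statistic is large and the p-value small, so the null is rejected at the five percent level. Against a null equal to the sample's own mean there is no evidence against it at all.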
Econometrics performs mostly linear regressions, though we can add squared independent variables and run basic nonlinear regressions that way. However we want to focus on linear regressions, because nonlinear relationships are usually too complex to be well modeled by such a basic modification, which is also quite inflexible unless we have reason to believe that the dependent variable depends on the independent variables and their squares. A linear regression estimates the intercept and slope, and its residuals tell us about the distribution of the error term. All of the estimates have confidence bounds. The standard error of the regression measures the typical size of the error, as an estimate of the standard deviation of the error term. R squared is a measure of fit of the regression: the explained sum of squares divided by the total sum of squares. The least squares assumptions are that the conditional distribution of the error given any value of the independent variables has mean zero, that the observations on the explanatory and dependent variables are independent and identically distributed draws from their joint distribution, and that large outliers are unlikely.
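For a single regressor, the least squares slope, intercept, R squared and standard error of the regression can be computed by hand. The sketch below uses made-up data and a hypothetical helper named ols; the n - 2 in the standard error of the regression is the degrees of freedom correction for having estimated two coefficients.

```python
import math
import statistics

def ols(x, y):
    """Least squares fit of y = intercept + slope * x with one regressor."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared residuals

    return {
        "slope": slope,
        "intercept": intercept,
        "ess": ess,
        "tss": tss,
        "ssr": ssr,
        "r_squared": ess / tss,
        "ser": math.sqrt(ssr / (n - 2)),  # standard error of the regression
    }

# Hypothetical data lying close to a straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
fit = ols(x, y)

# A perfectly linear data set recovers its slope and intercept.
exact = ols([1, 2, 3], [3, 5, 7])

print(fit)
print(exact["slope"], exact["intercept"])
```

The explained sum of squares and the sum of squared residuals add up to the total sum of squares, so R squared always lands between zero and one.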
The basic premise of laying out all this statistics as a base for this summary of linear regression in econometrics is that we shouldn’t get carried away with data exploration, as if that were the point of econometrics, but should rather think about the sample versus the underlying population: what we see is merely a sample of reality, and we can’t take what happens in a single day and assume that it is a good measure of what reality is. We have to say there is a reality out there, and a sample we see in front of us, and ask: can we form conclusions about the reality out there based on all the samples in front of us? For example, with a friend I can sample that person on many days over the course of my relationship with that friend, and that tells me where the reality of the relationship is more than any single day in front of me, especially a day when I’m not even sampling from that relationship anymore but sampling, for example, from my family. What is measured is different from the measurement, and consistency is key: do your estimates converge in probability to the true underlying values, which can be parameters or maybe just values you want to know? As you estimate properties of these values you also have to take the degrees of freedom of your estimator into account: if you have simplified the measurements, for instance by plugging in the sample mean for the population mean, you need to correct your formulas to account for losing a degree of freedom, as you have lost one direction of measurement. For example you may have forgotten someone’s cellphone number and now are approximating with your memory of what that person looks like. They say this too will pass. I say go on the wrong way, per the animated film Land Before Time, and I will follow the great circle, and I will drink deep from the Pierian Spring or drink not, for a little knowledge is a dangerous thing.