We have been using demand equations (Qd = 580 - 2P) without indicating exactly where they come from. In CH 4, we discuss how a firm can estimate demand for its products. It is the responsibility of a firm's business economists and forecasters (or external economic consultants) to estimate and construct demand equations for the firm's products, for example:
QD = 25 + 3 Y + Pc - 2P
With such an equation, the firm can predict its TR and profits at various prices and income levels. Estimating demand and making forecasts raises 3 issues / questions:
1. What is the best (most accurate) forecasting equation?
2. How much does the equation explain or NOT explain? How accurate is the forecast? What about the size and likelihood of forecast errors?
3. What are the profit consequences of forecast errors?
We focus on Issue #1 in this chapter.
SOURCES OF INFORMATION TO ESTIMATE DEMAND
CONSUMER SURVEYS can be used to ask consumers directly about buying behavior, face-to-face, telephone, direct mail, Internet, at checkout, etc. For example, the Houston-FL airline could ask a randomly selected, statistically valid group of consumers about their travel plans, prices, services, convenience, feelings toward the competitor, the impact of a recession/expansion on travel plans, etc. From this survey, the firm could possibly construct a demand equation like the one above.
COURTYARD BY MARRIOTT: Classic example of how the consumer survey method was used to design a new type of hotel - an informal, small, high-value hotel for price-sensitive business and vacation travelers. See pages 138 - 139 for the features that were considered. The final design was actually much different from what Marriott envisioned. Courtyard by Marriott hotels have been successful and were subsequently copied, e.g. Holiday Inn Express.
SURVEY PITFALLS
1. Sample bias occurs when the sample is NOT RANDOM. Examples: a political poll asking only Republicans or only Democrats; a poll on school vouchers asking only school teachers. The classic case is the 1936 Literary Digest election poll, which predicted Alf Landon (R) would beat Roosevelt because its non-random sample over-represented wealthier voters.
2. Response bias occurs when consumers' answers are biased, e.g. giving answers the questioner wants to hear, or overestimating income, or underestimating age, etc.
3. Response accuracy - potential consumers often cannot accurately assess their willingness to buy at various prices; abstract what-if questions are difficult to answer with accuracy.
4. Cost - Consumer surveys are costly.
CASE STUDY - NEW COKE: In 1985, Coca-Cola introduced New Coke after 4 years of research and 190,000 taste tests. New Coke was consistently favored over Old Coke and Pepsi in the taste tests, but failed miserably when introduced. Old Coke (Coke Classic) had to be revived. Illustrates the pitfalls of survey-based research.
CONTROLLED MARKET STUDIES
A firm could conduct a "controlled market study," where the firm changes one (or more) key variables for a product sold in more than one market. For example, USA Today varied the price of its newspaper around the country to assess price elasticity of demand. A firm could also vary price and advertising, with various combinations of high-low price and high-low spending on advertising, page 141.
The firm would try to impose ceteris paribus conditions, to isolate the independent effect of one (or two) key variable(s): price or advertising, by trying to "control" for the other factors, e.g. trying to compare markets of the same approx. size, same demographics, etc.
TWO TYPES OF MARKET STUDIES:
1. Cross-sectional study, conducted at one point in time across different economic units (individuals, regions, countries, states, counties, cities, etc.). Yi = f (X1i, X2i , etc.), where i is the "unit of analysis" and i = individuals, regions, markets, cities, etc.
Example: USA Today, cross-section study of different cities at one point in time.
2. Time-series study, where the same economic unit is studied over time. Yt = f (X1t, X2t, etc.), where t indexes the time period and t = days, weeks, months, quarters, etc.
Example: NWA (or GM) varies its prices over time, and tries to estimate demand or elasticity with time series data.
Potential issues with market studies:
1. Ceteris paribus:
a. Cross-section: Are all markets/regions the same?
b. Time-series: Advertising in one period could affect sales in future periods as well as the current period.
2. Cost. Conducting market tests is expensive due to:
a. setting up the experiment, conducting the test, and collecting and interpreting the data and results.
b. the costs of the experiment itself, e.g. lowering prices in one or more markets may result in foregone revenue, and raising advertising in one or more markets is costly.
However, market tests are used frequently and generate valuable information for a firm.
UNCONTROLLED MARKET DATA
With advances in information technology and the Internet, firms have increasing direct access to "uncontrolled" market data, i.e. data not part of a controlled market study.
Examples:
1. Bar code scanners give a company like Meijer or Target (and manufacturers) lots of "uncontrolled" data about consumer behavior, in addition to improving inventory control.
2. Internet shopping also provides a source of uncontrolled market data, useful to retailers and product manufacturers.
3. Published economic data including:
a) the local, regional, state and national unemployment rate. How could this be used?
b) national income data, the index of leading economic indicators, consumer buying intentions from the U.S. Census Bureau, and the Consumer Confidence Index from the University of Michigan (Ann Arbor), which includes consumer buying plans for durable items.
c) forecasts of economic variables from universities, WSJ, banks, research groups, individuals, government, etc., like the Index of Leading Economic Indicators.
REGRESSION ANALYSIS
A statistical technique that allows us to empirically estimate the quantitative relationship between a dependent variable (sales) and a set of independent variables (price, income, etc.). Process:
1. Collect time-series or cross-sectional data
2. Specify the model, Qd = f (P, Income, Advertising, etc).
3. Estimate the equation coefficients, which measure the mathematical relationships between the Xs and Y.
4. Evaluate the accuracy of the results.
OLS (Ordinary Least Squares) Regression Example
Airline has quarterly time-series data, shown on page 145, Table 4.2, on quarterly ticket prices and average number of coach seats sold per quarter, for 16 quarters. It would like to use this data to quantitatively measure the relationship between P and Q, i.e. specify the firm's _________________ .
Note that ticket prices ranged from $220 - $265 (MIN to MAX), and averaged $239.70. Q (seats) ranged from 33.6 to 137.5, and averaged 87.2. From these data, we observe that there was much greater dispersion (variation) in ticket sales than in ticket prices. The variance (s2) precisely measures the dispersion of a distribution, see formula on p. 145. The standard deviation (s) is the square root of the variance, and is measured in the same units as the sample (tickets sold and ticket price).
Note that s = 27 for ticket sales and s = 12.7 for ticket prices, even though the mean of ticket prices is much higher. Plot the distributions:
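The variance and standard deviation calculations above can be sketched in code. The sample values below are made up for illustration; they are NOT the actual 16 quarters of data from Table 4.2.

```python
# Sample variance and standard deviation, per the formulas on p. 145.
import math

def sample_variance(xs):
    """Sample variance: s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: s = sqrt(s^2), in the same units as the data."""
    return math.sqrt(sample_variance(xs))

prices = [250, 265, 220, 230, 235]          # hypothetical ticket prices ($)
seats  = [64.8, 33.6, 137.5, 110.0, 90.1]   # hypothetical seats sold

print(sample_std(prices))   # dispersion of prices, in dollars
print(sample_std(seats))    # dispersion of seats sold, in tickets
```

Run on the real Table 4.2 data, this kind of calculation would reproduce s = 12.7 for prices and s = 27 for seats.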
We specify a simple, linear model for demand, Q = f (P):
Q = a + b P, where a = constant (intercept), and b is the slope of the line.
We use linear regression analysis (OLS) to estimate a and b, values that will produce a specific equation that best fits the data shown on p. 147, Figure 4.1. Let's start by guessing values for a and b, making up a demand equation as a starting point:
Q = 330 – P, and the Inverse Demand would be: P = 330 - Q
Knowing a and b (330 and -1) for the Inverse Demand, we can plot the line shown on p. 147, Figure 4.1. It provides a good fit, but is it the best fit? We start by comparing Predicted (Q*) vs. Actual Ticket Sales, p. 148, Table 4.3. Having a demand equation, and knowing P, we can predict Q* and compare to actual Q. For example, at P = $250 (Quarter 1), we predict Q* = 330 - 250 = 80 and we compare the predicted value (Q* = 80) to the actual value of Q = 64.8, in the first quarter of Y1.
If we do this for every quarter, we have Table 4.3 on p. 148. (Q* - Q) is the estimation error (or residual). Note that half the errors (8) are positive and half (8) are negative, meaning that our demand equation over-predicts ticket sales 50% of the time and under-predicts tickets sales 50% of the time, and the errors just about cancel each other out (mean error = 3.1).
One way to assess the accuracy of the demand equation is to compute the SSE (sum of the squared errors) in the last column.
Logic: We want to minimize errors, and so we square the errors to:
a) make them all positive, and treat positive and negative errors equally, and
b) have large errors count more than small errors, e.g. errors -3 and +6, squared errors are 9 and 36. 6 is twice as big as 3, but the square of 6 is 4x as big.
SSE = 6,027 for the demand equation Q = 330 - P.
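The residual-and-SSE bookkeeping from Table 4.3 can be sketched as follows. Only the first (P, Q) pair matches the worked example in the text (P = $250, Q = 64.8 in Quarter 1); the other pairs are hypothetical stand-ins for the full table.

```python
# Compute predicted sales, residuals, and SSE for the guessed equation Q* = 330 - P.

def predict(p, a=330.0, b=-1.0):
    """Predicted sales Q* = a + b*P for the guessed demand equation."""
    return a + b * p

data = [(250, 64.8), (265, 33.6), (220, 137.5)]  # hypothetical (P, Q) pairs

sse = 0.0
for p, q_actual in data:
    q_star = predict(p)           # predicted sales Q*
    error = q_star - q_actual     # residual (Q* - Q)
    sse += error ** 2             # squaring treats +/- errors alike and
                                  # weights large errors more heavily
print(sse)
```

Run on all 16 quarters of Table 4.3, this procedure yields SSE = 6,027 for Q = 330 - P.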
Least Squares Regression (OLS) selects the coefficients (a and b, intercept and slope) to MINIMIZE the SSE. In this case, the OLS coefficients are: a = 478.6 and b = -1.63, so the OLS equation is:
Q = 478.6 - 1.63P, and the SSE is 4847 (vs. 6027 before).
Point: There is no other combination of coefficients a (constant) and b (slope) that would result in a lower SSE than the OLS coefficients of +478.6 (constant) and -1.63 (slope), for a linear equation.
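For a one-variable regression, the SSE-minimizing coefficients have a closed-form solution: b = cov(P,Q)/var(P) and a = mean(Q) - b*mean(P). A minimal sketch, using made-up data rather than the actual Table 4.2 values:

```python
# Simple OLS: choose a and b to minimize SSE for Q = a + b*P.

def ols_simple(xs, ys):
    """Return (a, b) minimizing sum of squared errors for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx   # the fitted line passes through the point of means
    return a, b

prices = [220, 230, 240, 250, 260]   # hypothetical ticket prices
seats  = [130, 115, 95, 70, 50]      # hypothetical seats sold

a, b = ols_simple(prices, seats)
print(a, b)   # estimated intercept and (negative) slope
```

Applied to the actual airline data, this calculation produces a = 478.6 and b = -1.63.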
Interpretation: For every $1 increase in ticket prices, ticket sales will fall by 1.63 tickets. (dQ / dP = -1.63)
A one variable (independent) regression is a simple regression. When more independent variables are used, it is called a multiple regression, see p. 150 and p. 151.
Qd = f (P, Pc, and Income). Using the time-series data on p. 151, Table 4.5, we can estimate the OLS equation:
Qd = 28.84 - 2.12 P + 1.03 Pc + 3.09 Y
INTERPRETING REGRESSION STATISTICS:
Economic interpretation of the estimated COEFFICIENTS:
General Interpretation: If X changes by ONE of its units, Y changes by the number of units estimated by the numerical coefficient, measured in Y's units.
For the equation: Q = 28.84 - 2.12 P + 1.03 Pc + 3.09 Y
P: If price goes up by $1, average seats sold will fall by 2.12 tickets per flight. (dQ / dP = -2.12)
Pc: If competitor's price goes up by $1, average seats sold will increase by 1.03 tickets per flight.
(dQ / dPc = 1.03)
Y: If income goes up by one point, sales will increase by 3.09 tickets. (dQ / dY = 3.09)
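The three coefficient interpretations above can be checked directly from the estimated equation: change one variable by one unit, hold the others constant, and observe the change in Q. The coefficients are from the text; the starting values of P, Pc, and Y are made up.

```python
# Ceteris paribus interpretation of the multiple-regression coefficients.

def q_demand(p, pc, y):
    """Estimated equation from the text: Qd = 28.84 - 2.12*P + 1.03*Pc + 3.09*Y."""
    return 28.84 - 2.12 * p + 1.03 * pc + 3.09 * y

base = q_demand(p=240, pc=240, y=105)    # hypothetical starting point

# Raise own price by $1, holding Pc and Y constant:
print(q_demand(241, 240, 105) - base)    # about -2.12 seats (dQ/dP)

# Raise competitor's price by $1, holding P and Y constant:
print(q_demand(240, 241, 105) - base)    # about +1.03 seats (dQ/dPc)

# Raise income by one point, holding P and Pc constant:
print(q_demand(240, 240, 106) - base)    # about +3.09 seats (dQ/dY)
```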
Note: SSE from the multiple regression is 2616 vs. 4847 from the simple regression, showing a significant improvement in the explanatory power of the equation.
IMPORTANT POINT: Airline used uncontrolled market data in a time-series format, and was able to estimate a multiple regression equation that accurately estimates a precise, quantitative relationship between Q (ticket sales) and three important independent variables. Also, OLS automatically imposes the ceteris paribus condition on the model, i.e. it isolates each variable and measures the independent effect of price on ticket sales, holding Pc and Income constant, etc.
R-SQUARED (Coefficient of Determination) is a measure of goodness of fit, how well the estimated equation fits the data. R2 can range from 0 to 1. When R2 = 1.0, there is a perfect fit, all observations fit exactly on the estimated regression line. If the equation has NO explanatory power, R2 = 0. We can think of R2 in terms of percentage. For example, the R2 = .78 for the airline demand equation, p. 153, indicating that the estimated equation explains 78% of the variation in ticket sales. Or we can say that the three variables (P, Pc and Y) account for 78% of the variation in ticket sales (and 22% of the variation in ticket sales is NOT accounted for by the three variables).
R2 is a simple, convenient and popularly reported regression statistic, but it has limitations, e.g. it is sensitive to the number of explanatory variables in the equation. Adding any variable ALWAYS increases R2, even when it has NO explanatory power, so maximizing R2 is not a good criterion for choosing among models.
ADJUSTED R2 overcomes the limitation of R2 by adjusting for the degrees of freedom (d.f. = N - k), where N = number of observations and k = number of estimated coefficients (including the constant term). In airline example, N = 16 and k = 4, so d.f. = 16 - 4 = 12. Adjusted R2 = .72. If we added another independent variable like Advertising, Price of train tickets, etc., we would use up a degree of freedom, since k = 5 and d.f. = 11. Suppose that the new variable had NO explanatory power - R2 would go UP, but Adjusted R2 would go DOWN. In contrast to R2, Adjusted R2 will only go UP if an added variable increased the explanatory power of the regression.
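The R2 and Adjusted R2 formulas can be sketched as below, assuming we already have the actual values and the regression's fitted values (the numbers here are made up, not the airline data):

```python
# R^2 = 1 - SSE/TSS; Adjusted R^2 penalizes for degrees of freedom used.

def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by the fitted values."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))   # unexplained variation
    tss = sum((a - mean) ** 2 for a in actual)                # total variation
    return 1 - sse / tss

def adjusted_r_squared(actual, fitted, k):
    """k = number of estimated coefficients, INCLUDING the constant term."""
    n = len(actual)                       # d.f. = n - k
    r2 = r_squared(actual, fitted)
    return 1 - (1 - r2) * (n - 1) / (n - k)

actual = [64.8, 33.6, 137.5, 110.0, 90.1, 75.0]   # hypothetical Q
fitted = [70.0, 40.0, 130.0, 105.0, 95.0, 72.0]   # hypothetical Q*
print(r_squared(actual, fitted))
print(adjusted_r_squared(actual, fitted, k=2))
```

Note that Adjusted R2 is always at or below R2 whenever k > 1, which is exactly the penalty for using up degrees of freedom.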
We are also very interested to assess whether the individual independent variables have explanatory power. We test individual variables by using a t-test, which uses the estimated coefficient and the estimated "standard error" of the coefficient. Regression analysis conducts individual tests of the "null hypothesis" that each explanatory variable has NO explanatory power ("null" meaning "no effect"). For example, before we run the regression, we have the model:
Qd = a + b1 P + b2 Pc + b3 Y
OLS will then estimate numerical values for the coefficients a, b1, b2, b3, and we find that:
a = 28.84
b1 = -2.12
b2 = 1.03
b3 = 3.09
The estimated coefficients have the expected signs, but we want to answer the question: Are the estimated coefficients statistically significant? That is, do they have any explanatory power? The null hypotheses are that b1 = 0, b2 = 0 and b3 = 0, i.e. that the independent variables (P, Pc and Y) have NO effect on Sales.
Is the estimated price coefficient -2.12 significant? The standard error of the estimated coefficient for Price (-2.12) has been calculated, and is .34, see page 153. The standard error is actually the estimated "standard deviation" of the coefficient. We know from statistics that about 95% of the values in a normal distribution fall within two standard deviations of the expected value. If the true value of the Price coefficient was actually 0, there would be a 95% chance that any estimated coefficient would fall between -.68 and +.68, which is ± two standard errors (deviations) from 0.
The t-test uses the t-statistic, which is calculated as follows:
t-statistic = Estimated coefficient / Estimated Standard Error of the Coefficient,
t-statistic = -2.12 / .34 = -6.24
Interpretation: The estimated coefficient of -2.12 is more than 6 standard deviations away from 0, and we can confidently reject the null hypothesis of NO EFFECT, and say that the coefficient is negative and statistically significant.
Rule of thumb: Since 95% of values fall within about two standard errors of zero, we accept the null hypothesis when the estimated coefficient falls between -.68 and +.68. When -.68 < b1 < +.68, we accept the null hypothesis, and say that b1 is NOT statistically different from 0, and therefore has no explanatory power. When b1 > +.68 or b1 < -.68, i.e. when it falls outside the "accept" range, we reject the null and find the variable significant. At the boundary values b1 = -.68 and b1 = +.68:
t-stat = -.68 / .34 = -2
t-stat = +.68 / .34 = +2
Therefore, when |t-stat| > 2, we reject the null and find the variable to be significant. |t-stat| > 2 means that the estimated coefficient is MORE THAN 2 standard errors away from 0. Under the null hypothesis, 95% of estimated coefficients would fall between -.68 and +.68, and would have t-stats between -2 and +2. Therefore, when |t-stat| > 2, the estimated coefficient and the independent variable are significant at the 5% level of statistical significance.
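The t-test rule of thumb can be sketched in a few lines. The coefficient and standard error below are the price-coefficient values from the text; the |t| > 2 cutoff is the approximate 5% rule of thumb, not an exact critical value.

```python
# t-test rule of thumb: reject the null of "no effect" when |t| > ~2.

def t_stat(coef, std_err):
    """How many standard errors the estimated coefficient lies from zero."""
    return coef / std_err

def significant(coef, std_err, critical=2.0):
    """Rule of thumb: significant at roughly the 5% level when |t| > 2."""
    return abs(t_stat(coef, std_err)) > critical

print(round(t_stat(-2.12, 0.34), 2))   # well beyond 2 standard errors from 0
print(significant(-2.12, 0.34))        # reject the null: price matters
```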