Reply to “Reply to Whitehead” by Desvousges, Mathews and Train: (4) My treatment of the weighted WTP is biased in favor of the DMT (2015) result/conclusion
DMT (2020) draw attention to my treatment of the weighted WTP estimates. The regression model for the second scenario has a negative constant and a positive slope. When I “mechanically” calculate WTP for the second scenario, the result is a positive number, which adds weight to the sum of the WTP parts. This is in contrast to the unweighted data, for which WTP is negative. Including the data from this scenario biases the adding-up tests in favor of the conclusion that the WTP data do not pass the adding-up test.
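To see why a “mechanical” calculation can mislead, here is a minimal sketch in Python. It assumes the common linear specification in which WTP is the negative of the ratio of the constant to the bid (cost) coefficient. The coefficient values and function names are hypothetical illustrations, not the estimates from the DMT data.

```python
def wtp_linear(constant, slope):
    """WTP implied by a linear model of the vote: -constant / slope.

    `slope` is the coefficient on the bid (cost) amount.
    """
    if slope == 0:
        raise ValueError("WTP is undefined when the cost coefficient is zero")
    return -constant / slope


def slope_has_valid_sign(slope):
    """Theory expects votes in favor to fall as cost rises, so slope < 0."""
    return slope < 0


# Hypothetical coefficients (NOT the DMT estimates).
valid_wtp = wtp_linear(constant=0.5, slope=-0.005)
perverse_wtp = wtp_linear(constant=-0.5, slope=0.005)

print(valid_wtp, slope_has_valid_sign(-0.005))
print(perverse_wtp, slope_has_valid_sign(0.005))
```

Both calls return the same positive WTP, so the ratio alone cannot reveal that the second set of coefficients implies votes in favor rising with cost. The sign of the slope has to be checked separately.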
The motivation for my consideration of the weighted data was DMT’s (2015) claim that they found similar results with the weighted data. My analysis uncovered validity problems with two of the five scenarios which, when included in an adding-up test, led to a failure to reject adding-up. At this point in the conversation it will be instructive to visually examine the weighted data to see if they even pass the “laugh” test. In my opinion, they don’t.
Below are the weighted votes and the Turnbull for the whole scenario (note that the weights are scaled to equal the sub-sample sizes). The dots and dotted lines represent the raw data. Instead of a downward slope, these data are “roller-coaster” shaped (two scary hills with a smooth ride home). The linear probability model (with weighted data) has a constant equal to 0.54 (t=9.73) and a slope equal to -0.00017 (t=-0.69). This suggests to me that the whole scenario data, once weighted, lack validity. Even so, the solid-line Turnbull illustrates how a researcher can obtain a WTP estimate from data that do not conform to rational choice theory. The Turnbull smooths the data over the invalid stretches of the bid curve (the “non-monotonicities,” in CVM jargon), and the WTP estimate is the area of the rectangles. In this case WTP = $191, which is very close to the unweighted Turnbull estimate. But a researcher should consider this estimate questionable since the underlying data do not conform to theory. As a reminder, the WTP for the whole scenario is key to the adding-up test because it is compared to the sum of the parts. The WTP estimate from the linear logit model is $239, with Delta Method [-252, 731] and Krinsky-Robb [-8938, 9615] confidence intervals. Given the statistical uncertainty of the WTP estimate, it is impossible to conduct any sort of hypothesis test with these data.
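The smoothing described above can be sketched in a few lines of Python. This is a minimal illustration of a lower-bound Turnbull estimator, assuming that adjacent bids are pooled (weighted by sub-sample size) whenever the share voting in favor rises with the bid. The bid amounts, shares, and sub-sample sizes are hypothetical, not the DMT data, and the function names are illustrative.

```python
def pool_adjacent_violators(yes_shares, weights):
    """Weighted pooling that returns non-increasing shares of yes votes."""
    blocks = []  # each block: [pooled share, total weight, number of bids pooled]
    for share, weight in zip(yes_shares, weights):
        blocks.append([share, weight, 1])
        # Merge while a higher bid shows a higher yes share than the bid before it.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            s2, w2, k2 = blocks.pop()
            s1, w1, k1 = blocks.pop()
            blocks.append([(s1 * w1 + s2 * w2) / (w1 + w2), w1 + w2, k1 + k2])
    pooled = []
    for share, _, count in blocks:
        pooled.extend([share] * count)
    return pooled


def turnbull_lower_bound(bids, yes_shares, weights):
    """Lower-bound WTP: the sum of rectangle areas under the pooled step function."""
    pooled = pool_adjacent_violators(yes_shares, weights)
    wtp, previous_bid = 0.0, 0.0
    for bid, share in zip(bids, pooled):
        wtp += share * (bid - previous_bid)  # width times height of one rectangle
        previous_bid = bid
    return pooled, wtp


# Hypothetical data (NOT the DMT data): the share rises between $50 and $100.
bids = [10, 50, 100, 200]
yes_shares = [0.5, 0.3, 0.4, 0.2]
weights = [100, 100, 100, 100]

pooled_shares, wtp_estimate = turnbull_lower_bound(bids, yes_shares, weights)
print(pooled_shares, wtp_estimate)
```

In this sketch the estimator returns a plausible-looking number whether or not the raw shares decline with the bid, which is the sense in which pooling can hide problems in the underlying data.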
Below are the weighted votes and the (pooled) Turnbull for the second scenario. The dots and dotted lines represent the raw data. Instead of a downward slope, these data are “Nike swoosh” shaped. The linear probability model (with weighted data) has a constant equal to 0.13 (t=2.46) and a slope equal to 0.00107 (t=4.19). This suggests to me that the second scenario data, once weighted, lack validity. Again, the Turnbull estimator masks the weakness of the underlying data. In this case, the Turnbull is essentially a single rectangle. With pooling, the probability of a vote in favor is 28.06% for the lower bid amounts and 27.56% for the higher bids. The Turnbull WTP estimate is $112, which appears to be a reasonable number but hides the problems with the underlying data.
DMT reestimated the full data model with the cost coefficients constrained to be equal. In a utility difference model the cost coefficient is the estimate of the marginal utility of income. There is no reason for the marginal utility of income to vary across treatments unless the clean-up scenarios and income are substitutes or complements. This theoretical understanding does not explain why the weighted models for the whole and second scenarios are not internally valid (i.e., the cost coefficient is not both negative and statistically different from zero). The model that DMT refer to passes a statistical test, i.e., the model that constrains the cost coefficients to be equal is not statistically worse than an unconstrained model, but it should be considered inappropriate due to the lack of validity in the weighted whole and second scenario data sets. Use of the model with a constrained cost coefficient amounts to hiding a poor result. The weighted model with the full data set takes the correct sign because the scenarios with correct signs outweigh the scenarios with incorrect or statistically insignificant signs. The reader should attach little import to DMT’s (2015) claim that their result is robust to the use of sample weights.
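For readers unfamiliar with this kind of comparison, here is a minimal Python sketch of one way a constrained model can be tested against an unconstrained one. It assumes a likelihood ratio test with four restrictions (five scenario-specific cost coefficients constrained to a single value); whether DMT used this particular test is not stated here. The log-likelihood values and function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(ll_unconstrained, ll_constrained, restrictions):
    """Return the LR statistic and its p-value."""
    statistic = 2.0 * (ll_unconstrained - ll_constrained)
    return statistic, chi2_sf_even_df(statistic, restrictions)


# Hypothetical log-likelihoods (NOT from the DMT models).
statistic, p_value = likelihood_ratio_test(-600.0, -602.5, restrictions=4)
print(statistic, p_value)
```

A large p-value in a test like this only says the equality constraint is not rejected. It says nothing about whether each scenario's own cost coefficient is negative and statistically different from zero.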