cd "C:\Users\ssarma2\Documents\Books\Henry Glick\Textbook materials"
clear

/*Glick HA, Doshi JA, Sonnad SS, Polsky D. Economic Evaluation in Clinical Trials (2007), Chapter 5*/
/*website: http://www.uphs.upenn.edu/dgimhsr/eeinct.htm*/
/*Save the data file and program files for Chapter 5 from the above website*/
use rchapter5

/*Variable descriptions: describe*/
describe

/*Descriptive statistics: summarize*/
summarize

/*Distribution of the cost data: graphs*/
histogram cost, frequency by(treat)

/*Inspect the cost data*/
sum cost if treat==0, detail
sum cost if treat==1, detail

/*Alternative syntax*/
bysort treat: sum cost, detail

/*There are 6 outliers with costs > $7000*/
count if cost>7000

/*The skewed cost distribution is not due to the outliers alone*/
histogram cost if cost<7000, frequency by(treat)
bysort treat: sum cost if cost<7000, detail

/*Univariate analysis*/
/*Determine whether costs are normally distributed, using the joint test of skewness and kurtosis: sktest*/
sktest cost if treat==0
sktest cost if treat==1

/*Determine whether the standard deviations of costs are similar across the two groups*/
sdtest cost, by(treat)

/*Perform a t-test that does not assume equal variances*/
ttest cost, by(treat) unequal

/*Nonparametric tests can be used if the assumption of normality is violated*/
ranksum cost, by(treat)
ksmirnov cost, by(treat)

/*Log transformation does not necessarily yield normally distributed costs, as in this example*/
gen lncost=ln(cost)
histogram lncost, frequency by(treat)
sktest lncost if treat==0
sktest lncost if treat==1

/*Bootstrapping can also be used*/
bootstrap, reps(4000): regress cost treat

/*Run the univariate program (PROGRAM: RUNUNIVAR) written by Glick et al, available from the website*/
log using univariateoutput, text replace
rununivar cost qaly treat
log close

/*Multivariable analysis using OLS to adjust for other covariates of interest*/
regress cost treat dissev blcost blqaly race, robust
glm cost treat dissev blcost blqaly race, link(identity) family(gaussian)
/*OLS results: the predicted cost difference is $21.99*/

/*The family specifies the mean-variance relationship. The Poisson, gamma, and inverse Gaussian families relax the assumption of constant variance*/
/*The modified Park test is a useful starting point for choosing the family*/
/*Glick et al provide the glmdiagnostic command*/
do glmdiagnostic
glm cost treat dissev blcost blqaly race, link(identity) family(gaussian)
glmdiag
return list
glm cost treat dissev blcost blqaly race, link(identity) family(poisson)
glmdiag
/*Results: the predicted cost difference is $113.115*/

/*The log link with the gamma family is more commonly used*/
glm cost treat dissev blcost blqaly race, link(log) family(gamma)
glmdiag
/*The estimated coefficient on treat from the log-link gamma GLM is .0447; it is on the log scale and does not equal the incremental cost*/

/*Use recycled predictions for nonlinear models: predict costs as if everyone were in the control group, then as if everyone were in the treatment group, and compare the mean predictions*/
gen temp=treat
glm cost temp dissev blcost blqaly race, link(log) family(gamma)
replace temp=0
predict glmgcontrol
replace temp=1
predict glmgtreat
sum glmg*
/*Predicted cost difference: 3099.465 - 2964.034 = $135*/

/*Conclusion: changing the family changes the point estimates (this should not happen in a correctly specified model)*/
/*We cannot be sure the link function is correct, and no test is available to confirm it; one option is to evaluate power links between 0 (log link) and 1 (identity link), as done here*/
glm cost treat dissev blcost blqaly race, link(power 0.65) family(poisson)
glmdiag

/*Bootstrapping can be used: see the program and results on the website above*/
/*A more flexible GLM with a richer set of links and families is needed to avoid bias: estimate the mean and variance power functions jointly*/
/*Basu and Rathouz's extended estimating equations approach (Biostatistics, 2005) and The Stata Journal 5(4). Install pglm from Basu's website at: http://faculty.washington.edu/basua/index.html */
/*The pglm program needs a large number of observations to converge*/
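
/*Sketch (an illustrative addition, not part of Glick et al's programs): bootstrap a confidence interval for the recycled-prediction cost difference from the log-link gamma GLM*/
/*Assumptions: the program name recycle, the scratch variables rc0 and rc1, the 1000 replications, and the seed 12345 are hypothetical choices for this example, not values from the book*/
/*Note: the program drops the variable temp created earlier, because it rebuilds temp on each (resampled) dataset*/
capture program drop recycle
program define recycle, rclass
    /*remove scratch variables left over from earlier runs*/
    capture drop temp
    capture drop rc0
    capture drop rc1
    /*fit the log-link gamma GLM on the current (possibly resampled) data*/
    gen temp=treat
    quietly glm cost temp dissev blcost blqaly race, link(log) family(gamma)
    /*recycled predictions: everyone as control, then everyone as treated*/
    quietly replace temp=0
    quietly predict double rc0, mu
    quietly replace temp=1
    quietly predict double rc1, mu
    /*return the difference in mean predicted costs (treatment minus control)*/
    tempname m0
    quietly summarize rc0, meanonly
    scalar `m0'=r(mean)
    quietly summarize rc1, meanonly
    return scalar diff=r(mean)-`m0'
    drop temp rc0 rc1
end

/*Point estimate on the original data; it should reproduce the recycled-prediction difference computed with glmgtreat and glmgcontrol*/
recycle
display "Recycled-prediction cost difference: " r(diff)

/*Nonparametric bootstrap of the difference, with the seed set for reproducibility*/
bootstrap diff=r(diff), reps(1000) seed(12345): recycle
estat bootstrap, percentile bc

/*Design note: wrapping the fit and both predictions in one rclass program ensures that each bootstrap replication refits the model and recomputes both means on the same resampled data*/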