Stata: Mark missing cases with -mark- and -markout-, or with e(sample)

On one hand, this is a confession that I didn’t know about -mark- and -markout- after years of working with Stata. On the other hand, none of the smart folks I’ve worked with, who ostensibly know more Stata than I do, ever mentioned them. That’s sad: they’re really useful.

Let’s say you want to compare two models with different right-hand variables, and those different RH vars have different amounts of missingness. Assuming that you’re OK with handling missingness by listwise deletion, you’ll get results that can’t (easily) be compared, since the two samples vary by the difference in missingness. That’s bad! -lrtest- should complain, but Stata may honor other kinds of comparisons and lead you toward faulty inferences. Also bad!

You could do as I did, which was to

gen notmissing=.
replace notmissing=1 if var1~=. & var2~=. & ... & varetc~=.

And then

reg outcome var1 var2 varetc if notmissing==1

It’s that -replace- step that’s messy, and it turns out unnecessarily so. -mark- and -markout- make it pretty easy:

local rhvars rh_1 rh_2 rh_etc
local rh_mod1 rh_1 rh_etc
local rh_mod2 rh_2 rh_etc
mark nonmissing
markout nonmissing `rhvars'
reg outcome `rh_mod1' if nonmissing==1
reg outcome `rh_mod2' if nonmissing==1
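
As a quick check that the marker does what it should, here is a minimal sketch using the auto dataset that ships with Stata. The dataset and the variable names (price, weight, rep78, foreign) are my stand-ins, not the variables from the models above; rep78 is the only one of them with missing values.

```stata
sysuse auto, clear

* all right-hand variables used by either model (stand-ins for rh_1 rh_2 rh_etc)
local rhvars weight rep78 foreign

* nonmissing starts out as 1 for every observation
mark nonmissing
* ...and is set to 0 wherever any listed variable is missing
markout nonmissing price `rhvars'

tabulate nonmissing

* both models now run on the same observations
regress price weight foreign if nonmissing == 1
display "model 1 N = " e(N)
regress price rep78 foreign if nonmissing == 1
display "model 2 N = " e(N)
```

Both -display- lines should report the same N. Without the if qualifier, the first model would use every car and the second only those with a repair record.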

You can alternately use e(sample) after running a model comprising the superset of the variables in your individual models. This is handy when you want to think about models on subsets, too:

local rhvars rh_1 rh_2 rh_etc
local rh_mod1 rh_1 rh_etc
local rh_mod2 rh_2 rh_etc

reg outcome `rhvars'
estimates store mfull
gen infull=e(sample)

reg outcome `rh_mod1' if infull==1
estimates store m1

reg outcome `rh_mod2' if infull==1
estimates store m2
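
To see side by side that the stored models share a sample, -estimates table- can print N for each. A sketch, on the auto dataset that ships with Stata and with stand-in variables of my own choosing:

```stata
sysuse auto, clear

* superset model: every right-hand variable from either model
regress price weight rep78 foreign
* e(sample) is 1 for observations used in the model just run
generate byte infull = e(sample)

regress price weight foreign if infull == 1
estimates store m1

regress price rep78 foreign if infull == 1
estimates store m2

* N should match across the two columns
estimates table m1 m2, stats(N)
```

Adding an if qualifier to the superset model (say, if foreign == 0) carries through as well, since e(sample) reflects both the qualifier and the missing values.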

cf http://www.stata.com/help.cgi?mark

MYSTERY ROOM

Dear OSU College of Dentistry:

I thought you folks were pretty straight-laced and boring. Please forgive me for misjudging.

Open a bunch of do-files for editing

I’m getting better about keeping a consistent directory structure across projects, and that means I can assign some directory for the project (say, “thisproject”):

global basedir /Users/odden2/projects/thisproject

and know that my logs are always in $logdir, which is

global logdir $basedir/logs

None of this is very exciting, but it saves hassle. Still, I often start projects with a bunch of do-files that do small things and call each other, and it’s nice to have them all open for editing instead of fishing around ad hoc. What’s more, the do-file editor doesn’t seem to get bogged down even with ten files open at once.

Let’s say I put all my do-files in $basedir/do:

/* file editall.do */
local dodir /Users/odden2/projects/thisproject/do
cd "`dodir'"
local dofiles: dir . files "*.do"
foreach file of local dofiles {
	doedit "`file'"  // quotes guard against spaces in file names
}

I can’t sell this on glamour, but it’s One Command to Open Them All. Handy enough. I rarely want to open all my do-files when I’m running a project, so that first line isn’t referencing a global $basedir. That’s just a little less handy than I wish it were, as I’d like to not hard-code a path. Stata doesn’t -cd- to the directory in which the file is located when I open it from Finder/Explorer, etc.
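
One way to soften the hard-coding, sketched under the assumption that $basedir is either already set or can fall back to the project path: use the global when it exists, and put the working directory back afterward with c(pwd). The fallback path is the one from above; the local name `here' is just mine.

```stata
/* file editall.do, variant */

* fall back to the project path only when the project globals aren't loaded
if "$basedir" == "" {
	global basedir /Users/odden2/projects/thisproject
}

* remember where we started
local here "`c(pwd)'"

quietly cd "$basedir/do"
local dofiles : dir . files "*.do"
foreach file of local dofiles {
	doedit "`file'"
}

* restore the original working directory
quietly cd "`here'"
```

This still carries the path as a fallback, so it only removes the hard-coding when the global is already defined.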

split a piece of wood and you will find resistance there

“There are now many historians who study popular culture, lowbrow entertainment, and the people of the streets, but I am always dismayed to find that they treat every saloon, high-heel shoe, or rock song as something else. If they are sympathetic to the people who consumed them, such things are remade into ‘resistance’ against oppression or ‘collective alternatives’ to capitalist individualism. God forbid they could be simply and only ‘fun.’”
– Thaddeus Russell, A Renegade History of the United States

D=R*T

We’re driving south on I-71. 70mph. A car is parked at the side of the road… Driver looks female, head tilted in a way that might be benign, maybe asleep, maybe injured. Maybe just fine? Only one second to examine before we’re too far away to easily stop or do anything. Cars behind us press on, and we press on behind drivers that may have looked and wondered a second before.

Small white house, probably early 1900s, looking neglected and all windows missing, with some outbuildings in a similar state of decay. What happened? Are the folks who lived there, or live there still, OK? We’ll pass within a few hundred feet of hundreds of thousands of people on this drive. What kind of ethics can we construct from so much contact, and so much potential contact, and at such speed?

“No Name”

We gave our daughter a first name, but didn’t declare a middle name. When she was five or so, we told her she’d get to pick her middle name when she turned thirteen – a rite of passage. She played around with names for a while, and a year or so later settled on Naomi. We wondered why, and her explanation was totally unexpected:

Instead of just leaving the middle initial or middle name field blank on forms and in databases, schools and hospitals inserted a value: No Name. However, even ‘No Name’ was often mangled either as Noname or No Na or Nme. At best, it would just be an imposed middle initial N. Naomi is a recognizable name that’s the closest match. We turned our failure to come up with a middle name for our daughter into a rite of passage and way for her to further set her own identity, and bureaucracy transformed that rite of passage.

I won’t go so far as to say that her choice was made for her, nor that she didn’t ultimately make the decision. Who knows what she might have chosen, though, had there been a blank rather than some obligatory “No Name” inserted just because we didn’t pick a middle name for our kid.

1998 Town Meeting on Iraq, Ohio State University

OK, so that image doesn’t directly reference the town meeting, but as we rattle sabres to bomb/invade/whatever Iran I’m reminded how about 14 years ago thousands of us watched a not-staged-carefully-enough pep rally for bombing the hell out of Iraqi civilians because our relationship with Saddam had gone south some years back.

H and I seated ourselves away from the largest block of protesters, who were roundly criticized for being uncivil. Their purpose, however, was to overwhelm any opportunity for a cheers-filled soundbite. That required being obnoxious since, hell, we were in a sports arena. The Nuremberg rallies couldn’t have happened in a boardroom or in a meadow. It takes a place whose space is configured to evoke mass response — where a few voices are amplified to booming, and a face firm with conviction and fist firm with zeal is projected large — to get the needed reaction. It’s a precarious situation, though; signs of weakness are amplified, too, and as Clinton’s National Security Team repeatedly responded to our questions with platitudes and evasion, the crowd shifted, and those around us who had cheered began to boo and jeer and some who were silent spoke up. That’s an awfully romantic characterization, I know, but H and I mostly watched, were far more audience than participant.

Rick Theis, a former OSU student body president, grabbed the microphone at one point, complaining that his request to ask a question had been declined, and demanding to be allowed to speak. He ranted for maybe a minute (which probably means 20 seconds of secular time) and was dragged off. Strike One for resistance.

If there was a decisive turn, and it’s facile to identify one in hindsight, it was when Jon Strange, a substitute teacher, asked why the US applies its foreign policy so unevenly and, when Madeleine Albright gave yet another content-free response, emphasized that she had not answered his question. A cacophony of gasps, laughter and general “oh no he didunt!” made the air crackle. Now the event was getting interesting for those looking for a reason to be engaged.

How did Jon get onto the floor of an event with pre-screened questions and a ‘VIP’ section packed with veterans in uniform and smiling white faces with great teeth? He was wearing a shirt and tie in the “protest section.” A CNN representative approached and asked if the bank of protesters would kindly shut the hell up if they’d let one of them ask one question, and Jon was the cleanest-looking of the bunch. I think even in 1998 we knew better than that. If we had been in power and wanted to keep it, we would have picked the woman (a good friend, by the way) with dreadlocks and a flowing skirt in earthtones and very likely big earrings who had smuggled in the NO WAR banner. Pick the person with whom the median is least likely to identify, and put her/him in opposition to authority. That’s how you tell people what side they’re on. Instead, Jon, a good-looking, straight-looking guy with a clean outfit and a haircut you could set your watch to, was allowed to show that clean-looking, straight-looking, nice-haircut folks also think you’re full of shit and are seriously not fooled by a damn word you lying liars are saying. He was calm. No fist-shaking. Stood up straight, body language said I respect the system even if I don’t respect your position. All the things that make for effective protest if you’re going to be public about it.

As we walked out, we thought “they’ll never come back to Columbus.” So far…

Stata Intro Slides

Since 2007 or so I’ve given incoming graduate students in my department a two-hour introduction to Stata. Every year the class slides get a little fancier and a little longer. Here’s the 2011 version.

Not News, Exactly: How Bush ‘won’ Ohio in 2004

Read it at Mark Crispin Miller’s site, and have a look at the zipped Plaintiff’s Brief.

Overlay and Combine Stata graphs with loops and -graph-’s || operator

I was trying to get the following:

  • Line graphs of median birth interval for pairs of countries, one line per country with a legend indicating country name,
  • Four of these graphs combined (so, four pairs comprising eight countries, one combined graph)

Easy, right?

Full disclosure: a lot of the cleverness here comes from my adviser, who had somewhat similar code from another project.

First, given data comprising variables country, med_2nd, med_3rd, med_all, rate, year, evaluating the relationship between a pair of variables for 2…n subsets of observations (in this case, a pair of countries) requires either using -reshape wide- to create country-variables or using the handy || operator to overlay graphs, e.g.

scatter var1 var2 if var3=="this" || scatter var1 var2 if var3=="that"

This is great, but I wanted to specify a lot of options, making it painful to stay on one line. No matter, I’d just use #delimit:

#delimit ;

to make a semicolon the command delimiter; that way I could use as many lines as I needed without having to specify /// after every line. However, #delimit and || didn’t play nicely together. As far as I can tell, I’m stuck with the default newline delimiter and ///.

How to iterate through a list in pairs? We can -tokenize- the local macro containing the list of countries, then loop through with -while-:

tokenize `C' // `C' is my country list
while "`1'" ~= ""  // as long as the current node isn't empty, keep going

How to proceed by twos?

macro shift 2

All fine and good, but I’ve got eight countries and they’re paired, so I’m making four graphs. How to name them sequentially?

* Outside the loop, initialize a counter:
local cstep = 1
while "`1'" ~= "" {
   ** stuff happens inside here, including saving my graph with name "something_`cstep'"
   local cstep = `cstep' + 1  // this increments my counter
   macro shift 2  // move on to the next pair
}
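
Put together, the pairing and the counter can be tried without any data at all. A minimal sketch with a shortened, made-up country list, printing each pair instead of drawing it:

```stata
* shortened, hypothetical list; it must be ordered by pairs
local C "Colombia Peru Ghana Kenya"
tokenize `C'

local cstep = 1
while "`1'" != "" {
	display "pair `cstep': `1' and `2'"
	local cstep = `cstep' + 1
	* drop the pair just used, so that `3' becomes `1'
	macro shift 2
}
```

This prints one line per pair (Colombia and Peru, then Ghana and Kenya) and stops once `1' is empty.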

I’ve got three variables of interest, each of which I want to compare to some other variable. I also want to do this for my four pairs of countries. I’ll do this with a nested loop, storing my variable names in a local macro and using -foreach- to iterate over them. It’ll be handy for me to address my saved graphs with a counter when I’m using -graph combine-, though, so I want to save with something numeric rather than using my variable name. So, I’ll create a counter ‘vstep’ and save my graphs named for the ‘index’ of that variable plus my ‘country-step’ counter:

local cstep = 1
local vars var1 var2 var3
while "`1'" ~= "" {  // outside loop is country-pair, but it doesn't have to be
	local vstep = 1
	foreach v of local vars { // inside loop iterates over my reference variables
		scatter `v' othervar if country=="`1'"  ///
		|| ///
		scatter `v' othervar if country=="`2'"  ///
		, saving("`vstep'_`cstep'", replace)
		local vstep = `vstep' + 1
	}
	local cstep = `cstep' + 1  // next country-pair
	macro shift 2
}

So far so good. With eight countries across three variables, I’ve got twenty-four comparisons yielding twelve graphs. I’ll add the -name- option to my graph command, giving it the same name as in my -saving- option, so the graphs are stored in memory, too. The default legend only indicates the variable name, which isn’t very helpful; I need to indicate the country for the distinction to be relevant. The following amendment to -scatter- adds a legend inside the plot area (ring(0)) and makes the legend components my comparison countries `1' and `2':

legend(label(1 `1') label(2 `2') ring(0))

Now I need to combine graphs. I can use one loop to do this, counting across my three variables (sure, I could have stored my variable names in a local macro and used -foreach- rather than using -forvalues-). The numeric values for each pairing (1…4) are hardcoded, and it’s possible/likely there’s a more clever way to handle that. I want to make a subtitle indicating which comparison variable (in this case, median birth interval) is present in the graph, so I’ve got a few -if- statements:

forvalues v = 1/3 {    // from values of `vstep': 2nd, 3rd, all, respectively
	if `v'==1 {
		local n "Second Birth Interval"
	}
	if `v'==2 {
		local n "Third Birth Interval"
	}
	if `v'==3 {
		local n "All Birth Intervals"
	}
	graph combine "`v'_1" "`v'_2" "`v'_3" "`v'_4",    ///
	iscale(*1.2)    ///
	imargin(l=-3)    ///
	ycommon    ///
	title("Comparing, uh, stuff I'm interested in...")    ///
	subtitle("`n'", margin(zero) size(medlarge))    ///
	name(combined_`v', replace)    ///
	saving(combined_`v', replace)
	window manage close graph
}
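
The chain of -if- statements could also be replaced by a lookup in a quoted list, using the -word # of- extended macro function. A sketch, with `labels' as a name I made up, and -display- standing in for the graph command:

```stata
* one quoted label per variable, in the same order as the counter
local labels `""Second Birth Interval" "Third Birth Interval" "All Birth Intervals""'

forvalues v = 1/3 {
	* pick the v-th quoted label from the list
	local n : word `v' of `labels'
	display "`v': `n'"
}
```

The order of the labels has to match the order of the variables, so this trades the -if- blocks for one more list to keep in sync.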

Here’s the full code:

local C    ///
	"Colombia Peru Ghana Kenya Bangladesh Egypt Indonesia Philippines"
local vars med_2nd med_3rd med_all	//these are 2nd interval, 3rd, all intervals
local cstep = 1		// cstep is a 'file counter', one file per country-pair

tokenize `C'

while "`1'" ~= ""  {
	local vstep = 1
	foreach v of local vars {
		scatter  rate `v'            if country=="`1'"    ///
		||    ///
		scatter  rate `v'            if country=="`2'",    ///
		legend(label(1 `1') label(2 `2') ring(0))    ///
		name(lenvstfr_`cstep'_`vstep', replace)
		window manage close graph
		local vstep = `vstep' + 1
	} // end of vars loop
	local cstep = `cstep' + 1
	mac shift 2	// move to the next pair -- `C' has to be ordered by pairs for this to work
} //end of country-pair loop
*

/* now, combine each of the graphs from 'var' */

forvalues v = 1/3 {    // from values of `vstep': 2nd, 3rd, all, respectively
	if `v'==1 {
		local n "Second Birth Interval"
	}
	if `v'==2 {
		local n "Third Birth Interval"
	}
	if `v'==3 {
		local n "All Birth Intervals"
	}
	graph combine lenvstfr_1_`v' lenvstfr_2_`v' lenvstfr_3_`v' lenvstfr_4_`v',    ///
	iscale(*1.2)    ///
	imargin(l=-3)    ///
	ycommon    ///
	title("Title Here")    ///
	subtitle("`n'", margin(zero) size(medlarge))    ///
	name(combined_`v', replace)    ///
	saving(combined_`v', replace)
	window manage close graph
}