I have two variables in Stata which are string (MAIDCW1 and strboth).
I would like to check for each observation whether the value of MAIDCW1 exists somewhere in the whole of strboth. If this value exists in the variable, I would like to set the variable TreatmentFinal to 1, otherwise to 0.
I tried the following code variants but I always receive the error code "invalid syntax":
**Variant 1:**
gen TreatmentFinal = 0
forvalues i = 1 / _N {
quietly count if strboth == MAIDCW1
if r(N) > 0 {
replace TreatmentFinal = 1 in `i'
} else {
replace TreatmentFinal = 0 in `i'
}
}
**Variant 2:**
gen TreatmentFinal = 0
forvalues i = 1 / _N {
qui gen match = cond(index(strboth, MAIDCW1[`i']) > 0, 1, 0)
replace TreatmentFinal = match in `i'
drop match
}
I also tried to first format strboth and MAIDCW1 as strings, but this does not change the result and I still receive the error code "invalid syntax".
Addendum:
1.) I use an Excel dataset and randomize a market ID to a treatment (1) & control group (0). So, the variable Treatment is either 1 or 0.
2.) I merge the market ID (e.g., 046914YBB) and Treatment which becomes strboth (e.g., 046914YBB1).
3.) Now I look at a different column which has several entries of the same market ID (for different calender weeks) and add "1" to all of them in order to compare it to strboth, which would indicate whether it belongs to the treatment group (as the market ID entry in strboth also has a "1" at the end) or to the control group (if the entry has a "0" at the end). This variable then is MAIDCW1. Entries with the same marked IDs should always be in the same group (either treatment or control).
4.) Then, for all entries of MAIDCW1 I check whether the value also exists in strboth in order to define for all market IDs of the different calendar weeks whether they belong to the treatment or control group (TreamtentFinal). -> And here the problem of Nick's code appears: The same entries of market IDs for different calendar weeks (entries in MAIDCW1) receive different values in TreatmentFinal (0 or 1) even though they all should have the same value.
Here is the complete code including Nick’s suggestion:
clear
import excel "Example.xlsx", sheet("Example") firstrow
set seed 123
bysort MarktName: gen double rand1=rnormal() if _n==1
egen max1=max(rand1), by(MarktName)
replace rand1=max1
bysort AbteilungName: gen double rand2=rnormal() if _n==1
egen max2=max(rand2), by(AbteilungName)
replace rand2=max2
sort rand1 rand2
// Randomization pattern
gen Randomisierung =
"111000011100001110000111100011110001111000011100001110000111100011"
// Assignment to treatment (1) or control group (0)
gen Treatment = substr(Randomisierung, _n, 1)
drop Randomisierung
//Here the relevant part begins:
gen TreatmentFinal = 0
gen strboth = MAID + Treatment
gen long obsno = _n
egen long group = group(strboth)
su group, meanonly
local G = r(max)
quietly forval g = 1/`G' {
su obsno if group == `g', meanonly
count if strboth[r(min)] == MAIDCW1
if r(N) > 0 replace TreatmentFinal = 1 if group == `g'
}
This might work. I doubt it's very good Stata style, and someone else may think of a better approach, but let's get you closer to something that works. Some people would see this as a problem for
merge.What we should do is loop over the distinct values of
strbothand NOT repeat comparisons for each observation with the same value.EDIT: Note the emphasis here. If 42 occurs repeatedly in one variable, then checking whether it occurs in the other variable need only be done once. You should read the help for
group()inhelp egen: it maps the distinct values of its argument(s) to integers 1 up. So distinct stringsfrog newt toadwould be mapped to1 2 3.EDIT: Don't be misled by the option name
meanonly.summarizeis used here to get the minimum as a safe way to look up the value of one variable indexed by each distinct value ofgroup.Mistakes in your code:
forvalueswon't evaluate_Non the fly.Looping over observations doesn't itself imply focus on a particular observation each time around the loop.
The display format of strings is not relevant to this problem.
Other points:
There is no need to
replace0 with 0.is equivalent to