stata - How to overwrite a duplicate observation -
i conducted phone survey , here prototype of dataset:
var1 var2 6666 1 6666 2 7676 2 7676 1 8876 1 8876 2 89898 1 89898 2 9999 1 9999 2 5656 1 5656 2 2323 1 2323 2 9876 1 7654 1
var1
unique identifier each case in survey (in case, phone numbers).
var2
outcome of survey: 1 (successful), 2 (not successful).
i want keep observations each var1
var2 == 1
, yet retaining observations each var1
whosevar2 == 2
if there no case var2 == 1
.
i have tried
duplicates drop var1 if var2 == 2, force
but not getting desired output
the question wrongly titled: don't want overwrite anything.
your syntax doesn't work wish because not want. asking whether there duplicates of var1 if var2 == 2
, command pays no attention whatsoever observations var2 == 1
.
your example includes no observations var2 == 2
there no corresponding observation var2 == 1
. have added 1 such.
here's 1 way of meeting goal. show in passing duplicates
command have nothing example; nor expected anything.
. clear . input var1 var2 var1 var2 1. 6666 1 2. 6666 2 3. 7676 2 4. 7676 1 5. 8876 1 6. 8876 2 7. 89898 1 8. 89898 2 9. 9999 1 10. 9999 2 11. 5656 1 12. 5656 2 13. 2323 1 14. 2323 2 15. 9876 1 16. 7654 1 17. 42 2 18. end . duplicates list var1 if var2 == 2 duplicates in terms of var1 (0 observations duplicates) . bysort var1 (var2) : assert _n == 1 | _n == 2 . var1 : drop if _n == 2 & var2[2] == 2 (7 observations deleted) . list, sepby(var1) +--------------+ | var1 var2 | |--------------| 1. | 42 2 | |--------------| 2. | 2323 1 | |--------------| 3. | 5656 1 | |--------------| 4. | 6666 1 | |--------------| 5. | 7654 1 | |--------------| 6. | 7676 1 | |--------------| 7. | 8876 1 | |--------------| 8. | 9876 1 | |--------------| 9. | 9999 1 | |--------------| 10. | 89898 1 | +--------------+
another way
. bysort var1 (var2) : keep if _n == 1 & var2[2] == 2
in fact
. bysort var1 (var2): keep if _n == 1
keeps observations var2 == 1
if there , otherwise keep singletons var2 == 2
.
the hidden assumptions seem include @ 2 observations each distinct var1
. note use of assert
checking assumptions dataset.
Comments
Post a Comment