******************************
** Application 2 *************
******************************
** Mar 26, 2021 **************
******************************

** change directory **
cd "E:\My Drive\Heal STATA Trainings\Training 2\data"

** import data **
import excel using Titanic.xlsx, firstrow sheet("data") clear

** question 1 - number of passengers **
count
// count returns the total number of observations in the data set, but some
// rows might be missing a passenger ID. To double check, summarize PassengerId
// and look at the number of observations (Obs), which excludes missing values.
sum PassengerId

** question 2 - how many were male **
count if Sex=="male"
tab Sex

** question 3 - how many passengers were male and >25 **
// Stata treats a missing numeric value as larger than any number, so Age>25
// alone would also count passengers whose age is missing. Exclude them explicitly.
tab Sex if Age>25 & !missing(Age)
sum PassengerId if Sex == "male" & Age > 25 & !missing(Age)

** question 4 - average fare female passenger **
sum Fare if Sex=="female"

** question 5 - average fare for queenstown OR southampton **
sum Fare if Embarked == "Q" | Embarked == "S"

** question 6 - create dummy variable **
gen female=0
replace female=1 if Sex=="female"
// or you could do the following if you wanted to create one for male
gen male=0
replace male=1 if Sex=="male"
// because female is a binary variable, once you have created it you do not
// need a separate binary for male (male is simply female==0)

** question 7 group variables **
// first create three variables with a value of 0
gen class1=0
gen class2=0
gen class3=0
// then replace them to equal one if the condition is met
replace class1=1 if Pclass==1
replace class2=1 if Pclass==2
replace class3=1 if Pclass==3

** question 8 - crosstab check **
// whenever you create dummy variables, it is always a good idea to check them
tab Pclass class1
tab Pclass class2
tab Pclass class3
// how to read the check: class1 should equal 1 only in the row where Pclass
// equals 1 and 0 in every other row, and likewise for class2 and class3.

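// A possible extra check (a sketch, not part of the original exercise): every
// passenger belongs to exactly one class, so the three dummies should sum to 1
// in every row. assert stops with an error if any row violates the condition.
// This assumes Pclass is never missing; if it can be, add "if !missing(Pclass)".
assert class1 + class2 + class3 == 1
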
** question 9 - city variables **
// start from an empty string so that passengers with a missing Embarked value
// are not labelled as Queenstown by default
gen city = ""
replace city = "Queenstown" if Embarked == "Q"
replace city = "Southampton" if Embarked == "S"
replace city = "Cherbourg" if Embarked == "C"

** question 10 - who survives **
// share of males who survived (the mean of a 0/1 variable is a proportion)
sum Survived if female==0
// share of females who survived
sum Survived if female==1
// now conduct the t-test, which reports the same two group means as the code
// above and tests whether they differ
ttest Survived, by(female)

** question 11 - survival rate by age **
// first generate the variable for old; leave it missing when Age is missing,
// because Stata treats a missing Age as larger than 35
gen old=0 if !missing(Age)
replace old=1 if Age>35 & !missing(Age)
// check accuracy of variable
tab Age old
// notice that old equals one only for individuals older than 35
// you will see that the youngest age is 0.42, is that an error?
// not if the youngest was a baby who was only 5 months old (0.42 of a year)
ttest Survived, by(old)
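
// A possible way to see the group means behind the t-test in one table (a
// sketch, not part of the original exercise): tabstat reports the survival
// rate (mean) and the number of observations (n) for each value of old.
tabstat Survived, by(old) statistics(mean n)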