Click here to Skip to main content
15,887,350 members
Please Sign up or sign in to vote.
1.00/5 (2 votes)
See more:
Question 2:
Suppose the raw data file Survey.txt (not available in Canvas) contains a recent household survey result. The filename reference Surref has already been referred to in SAS Studio.
The data file is hierarchical in structure. It consists of a header record for a household and is immediately followed by one record (row) for each household member, if applicable. For example, a household of three members will have three records exactly after its header record. A household may have no identified member at the time of the survey. In that case, the household will have only a header record in the file. There is no missing value in the
data file. The fields in each record are separated by commas. A subset of records in Survey.txt is displayed below:
The fields are arranged in the following order in each record:
  • Header record of a household:
  • Field Format
  • Household identity 10-15 characters, always begin with an alphabetical
    letter in uppercase, such as A, B, etc.
  • Type of housing Character ‘A’ for private, and ‘B’ for public
  • Household member’s record (if present):
  • Field Format
  • Date of birth 10-11 characters in the form dd/mon/yyyy where dd is
    the day value which may be in the form of 1 or 2 digits.
  • Householder indicator Character ‘Y’ for yes, and ‘N’ for no.
  • Gender Characters of ‘Female’, or ‘Male’.
    Marital status Characters of ‘Married’, ‘Single’, ‘Divorced’, or ‘Unknown’ .
  • Achieved education level Standard numeric with 0 for none, 1 for primary, 2 for
    secondary, 3 for tertiary.
  • Employment status Characters of ‘FT’, ‘PT’, ‘UE’, or ‘NA’ .
  • Monthly income Standard numeric.

Write a DATA step that will perform the following activities:
  • Create a SAS data set named Members. Members must be stored in the Work library of SAS Studio.
  • Read the records from Survey.txt into Members.
  • Create only one observation for each household in Survey.txt.
  • Each created observation in Members must contain only these variables but not necessarily in the given order:

    • The identity of the respective household
    • The type of housing of the respective household
    • The number of members in the household (0 for a household without members)
    • The number of male members(0 for a household without members)
    • The maximum achieved education level among all members in the household (0
      for a household without members)
    • The total monthly income of the household (0 for a household without members)
    • The number of members in the household is at least 18 years old as of 1 January 2023 (0 for a household without members) {Hint: Use function INTCK to compute:
      A1234567BC012,A
      15/FEB/1980,Y,Male,Married,3,FT,55000
      3/JUN/1982,N,Female,Married,3,UE,0
      24/JAN/2005,N,Male,Unknown,2,NA,0
      D135EG023456789,B
      19/OCT/1950,Y,Female,Divorced,0,PT,5000
      X123A567F9,A
      B2345234CC,A
      21/MAY/1975,N,Male,Married,2,FT,30000
      30/JUN/1978,Y,Female,Married,1,PT,10000
      the number of years between two dates. For example, INTCK('YEAR', '31Jan2010'd,
      '1Jan2012'd, 'c') returns a value of 1 because there is 1 full year between the two dates.}

The first few observations of Members are shown below for reference:
The labels displayed in the above picture are for illustration purposes only. You do not need to include them in your DATA step.
You must accomplish the above activities with only one DATA step. Using any other procedures is not allowed. The activities may be carried out in any order within the DATA step. You may name the variables in any way you want as long as they are valid SAS variable names and meaningful. The variables in the SAS data sets can be arranged in any order.

What I have tried:

data members;
infile '/home/u62270096/MS3251/Raw Data/Survey.txt' dlm=',';
length Household_ID $15;

input Household_ID Type_of_housing $;

NMembers = 0;
NMale = 0;
Max_Edu = 0;
Total_Income = 0;
N18above = 0;

if Type_of_housing ne ' ' then do;
input
DOB date10.
HH_Indicator $
Gender $
Marital_Status $
Edu_Level 2.
Employment_Status $
Income 8.;

NMembers + (HH_Indicator='Y');
  NMale + (Gender='Male');
  Max_Edu = max(Max_Edu,Edu_Level);    
  Total_Income + Income;
  
  N18above + (INTCK('YEAR',today(),DOB)>=18);
  
end;

if Household_ID ne '' then output;
keep Household_ID Type_of_housing NMembers NMale Max_Edu Total_Income N18above;

run;
Posted
Updated 5-Nov-23 0:13am
v2
Comments
Richard MacCutchan 30-Oct-23 7:26am    
OK, so do you have an actual question? If so then please use the Improve question link above, and add complete details of what is not working.
Dave Kreskowiak 30-Oct-23 10:11am    
Did you have a question or a problem you forgot to describe?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900