How to create 2 new columns with appropriate prefix based on values in columns with same prefix in SAS Enterprise Guide / PROC SQL?

210 views Asked by At

I have table in SAS Enterprise Guide like below:

ID   | COUNT_COL_A | COUNT_COL_B | SUM_COL_A | SUM_COL_B
-----|-------------|-------------|-----------|------------
111  | 10          | 10          | 320       | 120
222  | 15          | 80          | 500       | 500
333  | 1           | 5           | 110       | 350
444  | 20          | 5           | 670       | 0

Requirements:

  • I need to create new column "TOP_COUNT" where will be name of column (COUNT_COL_A or COUNT_COL_B) with the highest value per each ID,

    • if some ID has same values in both "COUNT_" columns take to "TOP_COUNT" column name which has higher value in its counterpart with prefix SUM_ (SUM_COL_A or SUM_COL_B)
  • I need to create new column "TOP_SUM" where will be name of column (SUM_COL_A or SUM_COL_B) with the highest value per each ID,

    • if some ID has same values in both "SUM_" columns take to "TOP_SUM" column name which has higher value in its counterpart with prefix COUNT_ (COUNT_COL_A or COUNT_COL_B)

It is not possible to have only 0 in columns with prefix _COUNT or only 0 in columns with prefix _SUM

There is not null in table

Desire output:

ID   | COUNT_COL_A | COUNT_COL_B | SUM_COL_A | SUM_COL_B  | TOP_COUNT   | TOP_SUM
-----|-------------|-------------|-----------|------------|-------------|---------
111  | 10          | 10          | 320       | 120        | COUNT_COL_A | SUM_COL_A 
222  | 15          | 80          | 500       | 500        | COUNT_COL_B | SUM_COL_B  
333  | 1           | 5           | 110       | 350        | COUNT_COL_B | SUM_COL_B  
444  | 20          | 5           | 670       | 0          | COUNT_COL_A | SUM_COL_A 

How can i do that in SAS Enterprise Guide or in PROC SQL ?

2

There are 2 answers

0
Tom On

Just do the same thing as your other question. But because you want to transpose two sets of variable it is probably going to be easier to a data step and arrays to do the first transform.

data tall;
  set have;
  array counts count_col_a count_col_b;
  array sums sum_col_a sum_col_b;
  do index=1 to dim(sums);
    length type $5 name $32 ;
    type='COUNT';
    name=vname(counts[index]);
    value1=counts[index];
    value2=sums[index];
    output;
    type='SUM';
    name=vname(sums[index]);
    value1=sums[index];
    value2=counts[index];
    output;
  end;
run;

Now sort and take the last per ID/TYPE combination to find the largest.

proc sort;
  by id type value1 value2 name;
run;

data top;
  set tall;
  by id type value1 value2;
  if last.type;
run;

And then transpose and re-merge.

proc transpose data=top out=want(drop=_name_) prefix=TOP_;
  by id;
  id type;
  var name;
run;


data want;
  merge have want;
  by id;
run;

Result:

              COUNT_    COUNT_     SUM_     SUM_
Obs     ID     COL_A     COL_B    COL_A    COL_B     TOP_COUNT      TOP_SUM

 1     111      10        10       320      120     COUNT_COL_A    SUM_COL_A
 2     222      15        80       500      500     COUNT_COL_B    SUM_COL_B
 3     333       1         5       110      350     COUNT_COL_B    SUM_COL_B
 4     444      20         5       670        0     COUNT_COL_A    SUM_COL_A
0
Reeza On

Use an array with loops methodology:

  • Declare an array of the count variables
  • Set the maximum value to 0
    • Loop through the array
  • Check if each value is more than current maximum
  • If yes, assign value to current maximum value and store name
    • If no, keep looping

Non looping, function methodology:

  • Use MAX to find the maximum value of the array
  • Use WHICHN() to find the location of the array
  • Use VNAME to get the variable name based on the location
*for count - you can extend for max;
data want;
    set have;

   array _count(*) count_col_:;

    *looping methodology;
    top_count_value=0;
    do i=1 to _count;
       if _count(i) > top_count_value then do;
          top_count = vname(_count(i));
          top_count_value = _count(i);
        end;
      end;

   /*or function methodology*/
   top_count_max = max(of _count(*));
   index_top_count = whichn(top_count_max, of _count(*));
   top_count_name_2 = vname(_count(index_top_count);

run;