how to difference values within several panels

58 views Asked by At

Suppose the data is:

data <- structure(list(country = c("Poland", "Poland", "Poland", "Poland", 
                             "Poland", "Poland", "Portugal", "Portugal", "Portugal", "Portugal", 
                             "Portugal", "Portugal", "Spain", "Spain", "Spain", "Spain", "Spain", 
                             "Spain"), Code = c("POL", "POL", "POL", "POL", "POL", "POL", 
                                                "PRT", "PRT", "PRT", "PRT", "PRT", "PRT", "ESP", "ESP", "ESP", 
                                                "ESP", "ESP", "ESP"), year = c(1950, 1951, 1952, 1953, 1954, 
                                                                               1955, 1950, 1951, 1952, 1953, 1954, 1955, 1950, 1951, 1952, 1953, 
                                                                               1954, 1955), IV = c(3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 
                                                                                                          1, 1, 1, 1, 1, 1)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
                                                                                                                                            8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L), class = "data.frame")

How to difference IV values within each panel (country)? In other words, I want to substract IV values for the year of 1951 from 1950; 1952-1951; 1953-1952; 1954-1953; 1955-1954. And so for every country. In the outcome dataset (let's name it 'newdata'), IV value for each year has to show its difference with an IV value from a previous year; 1950 year should be just empty.

Any suggestions, guys? Hope my question is not confusing.

2

There are 2 answers

1
jay.sf On BEST ANSWER

Using diff in ave perhaps.

> transform(data, dif=ave(IV, Code, FUN=\(x) c(NA, diff(x))))
    country Code year IV dif
1    Poland  POL 1950  3  NA
2    Poland  POL 1951  3   0
3    Poland  POL 1952  1  -2
4    Poland  POL 1953  3   2
5    Poland  POL 1954  3   0
6    Poland  POL 1955  1  -2
7  Portugal  PRT 1950  1  NA
8  Portugal  PRT 1951  1   0
9  Portugal  PRT 1952  1   0
10 Portugal  PRT 1953  1   0
11 Portugal  PRT 1954  1   0
12 Portugal  PRT 1955  1   0
13    Spain  ESP 1950  1  NA
14    Spain  ESP 1951  1   0
15    Spain  ESP 1952  3   2
16    Spain  ESP 1953  1  -2
17    Spain  ESP 1954  3   2
18    Spain  ESP 1955  1  -2

To eliminate 1950 thereafter, pipe in a subset.

> transform(data, dif=ave(IV, Code, FUN=\(x) c(NA, diff(x)))) |> 
+   subset(year != 1950)
    country Code year IV dif
2    Poland  POL 1951  3   0
3    Poland  POL 1952  1  -2
4    Poland  POL 1953  3   2
5    Poland  POL 1954  3   0
6    Poland  POL 1955  1  -2
8  Portugal  PRT 1951  1   0
9  Portugal  PRT 1952  1   0
10 Portugal  PRT 1953  1   0
11 Portugal  PRT 1954  1   0
12 Portugal  PRT 1955  1   0
14    Spain  ESP 1951  1   0
15    Spain  ESP 1952  3   2
16    Spain  ESP 1953  1  -2
17    Spain  ESP 1954  3   2
18    Spain  ESP 1955  1  -2

Data:

set.seed(42)
data$IV <- sample(data$IV)  ## to better demonstrate the result
0
asd-tm On

Though I think diff() and IV - lag(IV) will give you the expected result in the specific case it might be risky if there are missing values for certain years for some countries and or if there is any wrong order of years in the country subset of the data frame. So I offer a slower but more reliable tidyverse solution:

data %>% 
  pivot_wider(names_from = year, values_from = IV) %>% 
  mutate(`1950_1951` = `1951`-`1950`,
         `1951_1952` = `1952`-`1951`,
         `1952_1953` = `1953`-`1952`,
         `1953_1954` = `1954`-`1953`,
         `1954_1955` = `1955`-`1954`
         )