I am attempting to create a flag for when transaction IDs are sequential. For reasons that I will not get into here, these can be a red flag. The problem that I am having is that the IDs are not standardized. Some can be numeric only and others are alphanumeric. I want to identify cases where a vendor's transactions are sequential for X number (how many in a row TBD) of transactions. The data will be grouped by vendor and I'd like to extract all rows and columns for instances where a sequence is present. A much simplified example below:
df <- read.table(text=
"Vendor 'Transaction ID'
ACME 1
ACME 2
ACME 3
JDOE A1
JDOE A6
JDOE A10
XYZ B12
XYZ B13
XYZ B14", header=TRUE)
In this instance I would want to extract the Vendor and Transaction ID for all ACME and all XYZ rows since they are in a sequence. I've done some research but not found an approach that seems to fit what I'm trying to do.
One option is to convert the
Transaction_IDto a number usingparse_number, then you can create a grouping column for consecutive values within each group. Then, you canfilterdepending on how many consecutive numbers you want to have. Here, I use greater than 1, but you can update thefilterstatement depending on your desired criteria. Then, I ungroup and remove the extra columns -Transaction_ID2andgrp.Output