I have a pandas DataFrame that contains a column of sentences with this pattern:
row 1 of column: "ID is 123 or ID is 234 or ID is 345"
row 2 of column: "ID is 123 or ID is 567 or ID is 876"
row 3 of column: "ID is 567 or ID is 567 or ID is 298"
My aim is to extract the numbers in each row and save them in a list or NumPy array. Since there is a pattern (the number always comes after "ID is"), I thought that regex might be the best way to go, but I am not sure how to use regex for multiple extractions in one string.
Any advice?
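With the standard module `re` you can use `re.findall()` with the pattern `'\d+'` to get the list `['123', '234', '345']` (the matches come back as strings, not integers). To make sure you only pick up numbers that follow "ID is", you can also use the pattern `'ID is (\d+)'`.

In a DataFrame you can use `.str.findall()` to do the same for all rows.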
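
The original code for this step is not shown here, so the following is a minimal sketch of both approaches. The sample rows come from the question; the input column name `text` is an assumption, and `result` is the output column name used in the rest of this answer.

```python
import re
import pandas as pd

text = "ID is 123 or ID is 234 or ID is 345"

# Any run of digits; re.findall returns the matches as strings.
print(re.findall(r'\d+', text))

# Stricter pattern: only digits that directly follow "ID is ".
# With one capture group, findall returns just the captured part.
print(re.findall(r'ID is (\d+)', text))

# 'text' is a hypothetical column name; the rows are from the question.
df = pd.DataFrame({'text': [
    "ID is 123 or ID is 234 or ID is 345",
    "ID is 123 or ID is 567 or ID is 876",
    "ID is 567 or ID is 567 or ID is 298",
]})

# Apply the same pattern to every row; each cell becomes a list of strings.
df['result'] = df['text'].str.findall(r'ID is (\d+)')
print(df['result'])
```

Each cell of `result` holds a list such as `['123', '234', '345']`. Duplicates within a row are kept, so the third row gives `['567', '567', '298']`.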
If you need only the `result` column as a NumPy array, you can get it with `df['result'].values`. If you need it as a nested list, use `df['result'].values.tolist()`.
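
A short sketch of both conversions, rebuilding the same DataFrame so the snippet runs on its own. The column name `text` is again an assumption, and the final integer conversion is an optional extra, not part of the steps above.

```python
import pandas as pd

# 'text' is a hypothetical column name; the rows are from the question.
df = pd.DataFrame({'text': [
    "ID is 123 or ID is 234 or ID is 345",
    "ID is 123 or ID is 567 or ID is 876",
    "ID is 567 or ID is 567 or ID is 298",
]})
df['result'] = df['text'].str.findall(r'ID is (\d+)')

# 1-D NumPy array of dtype object; each element is a Python list of strings.
arr = df['result'].values
print(arr.dtype, arr.shape)

# Plain nested list (list of lists of strings).
nested = df['result'].values.tolist()
print(nested)

# Optional: convert the matched strings to integers.
nested_int = [[int(n) for n in row] for row in nested]
print(nested_int)
```

Note that `arr` is an object array of lists rather than a 2-D numeric array. Because every row here has three IDs, something like `np.array(nested_int)` would give a regular 3x3 integer array if that is what you need.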