I need to match the payment of invoices in the DocN column of df1 with the data in the TXT column in df2. Print the document (DocN) + the amount (DocSum) and the details of the corresponding payment (DocP, Date) in accordance with the matching article in both datasets
import numpy as np
import re
data1 = {
"DocN": ['140111038-001', '7314560', '169233301-001','ЕКТ01886853','ЕКТ02126350','30262-19',
'27283-19','746'],
"DocSum": ['358,80', '1487,45', '7458,78','2478,12','9624,95','3247,32',
'3224,25','32587,22'],
"DocArt1" : ['85647', '85475', '21457', '12746', '25472', '58123', '74185', '82274']
}
df1 = pd.DataFrame(data1)
data2 = {
"TXT": ['payment by document 30262-19, 30317-19, 30329-19, 31270-19, 32038-19, 26713-19,26715-19, ЕКТ01886853 ',
'payment by document 26721-19, 26748-19, 29835-19, 31112-19, 26746-19, 30041-19, 23150-19, ',
'payment by document 23525-19, 25050-19, 26244-19, 27997-19, 28032-19,30278-19, ЕКТ01886853',
'payment by document 29227-19, 29713-19, 27283-19, 32003-19, 29235-19, 29888-19, 7314560',
'payment by document 175634096-001, 175634109-001, 175623281-001,175638863-001, 140111038-001, 7314560'],
"DocP": [112, 113, 114, 115, 116],
"Date": ["25.01.2022", "26.01.2022", "27.01.2022", "28.01.2022", "29.01.2022"],
"DocArt2" : ['12746','74585','25489','85475','85875']
}
df2 = pd.DataFrame(data2)
print(df1)
print(df2)
i'm trying to apply:
df1.join(df1.DocN.apply(lambda x: pd.Series(df2.loc[df2['TXT'].str.contains(fr'\b{x}\b')& (df1['DocArt1'] == df2['DocArt2']),['DocP','Date']].to_dict('list'))))
i'm expecting:
| index | DocN | DocSum | DocArt1 | DocP | Date |
|---|---|---|---|---|---|
| 0 | 140111038-001 | 358,80 | 85647 | 116 | 29.01.2022 |
| 1 | 7314560 | 1487,45 | 85475 | 115 | 28.01.2022 |
| 2 | 169233301-001 | 7458,78 | 21457 | ||
| 3 | ЕКТ01886853 | 2478,12 | 12746 | 112 | 25.01.2022 |
| 4 | ЕКТ02126350 | 9624,95 | 25472 | ||
| 5 | 30262-19 | 3247,32 | 58123 | 112 | 25.01.2022 |
| 6 | 27283-19 | 3224,25 | 74185 | 115 | 28.01.2022 |
| 7 | 746 | 32587,22 | 82274 |
First solution is with
Series.str.findallandDataFrame.explode, if need match byDocArtandDocNboth columns:If need match only by
DocNwith aggregation by sameDocN:because else get duplicated new rows: