I am trying to write a function that creates a new column called last_created_date and fills in its value according to the conditions below.
This is the data:
| rounded_geo_lat | rounded_geo_lng | created_date | distance_1_lat_lng | distance_2_lat_lng | distance_3_lat_lng | distance_4_lat_lng | distance_5_lat_lng | last_created_date |
|---|---|---|---|---|---|---|---|---|
| 26.11 | 74.26 | 16-01-2024 11:29 | NaN | NaN | NaN | NaN | NaN | null |
| 25.77 | 73.66 | 16-01-2024 12:29 | 70.91359357 | NaN | NaN | NaN | NaN | 16-01-2024 13:29 |
| 25.23 | 73.23 | 16-01-2024 13:29 | 142.2333872 | 0 | NaN | NaN | NaN | 16-01-2024 15:20 |
| 24.67 | 72.94 | 16-01-2024 14:29 | 207.8935555 | 142.1494871 | 0 | NaN | NaN | 16-01-2024 15:43 |
| 24.41 | 72.65 | 16-01-2024 15:20 | 248.8830913 | 182.2445736 | 0 | 0 | NaN | 16-01-2024 15:43 |
| 24.41 | 72.65 | 16-01-2024 15:43 | 248.8830913 | 182.2445736 | 108.3518041 | 4.1 | 0 | |
| 24.21 | 72.27 | 16-01-2024 17:28 | 291.1047773 | 222.9644721 | 149.2166506 | 84.94979054 | 44.46789779 | |
Here I want to derive the last_created_date column per rake.
Each (rounded_geo_lat, rounded_geo_lng) pair corresponds to one distance column: the first pair has distance_1_lat_lng, the second pair has distance_2_lat_lng, the third pair has distance_3_lat_lng, and so on.
Note: when iterating over the distance column that corresponds to a lat/lng pair, we always exclude the pair's own value in that column and go down to the last value for that rake device.
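To make the pairing concrete, here is a minimal sketch of how the values to inspect for one row could be selected. The helper name `values_below` is made up for illustration, and the sketch assumes the rows of one rake device are already in the order shown in the table; the three sample rows are copied from the first three rows of the table.

```python
import pandas as pd


def values_below(group, i):
    """Return the distances and created dates to inspect for row position i.

    Row position i (0-based) is paired with column distance_{i+1}_lat_lng,
    and only the rows after it are considered, so the row itself is excluded.
    """
    col = f"distance_{i + 1}_lat_lng"
    if col not in group.columns:
        return None  # no distance column exists for this row position
    below = group.iloc[i + 1:]
    return below[[col, "created_date"]].rename(columns={col: "distance"})


# First three rows of the table, for a single rake device
rake = pd.DataFrame({
    "created_date": ["16-01-2024 11:29", "16-01-2024 12:29", "16-01-2024 13:29"],
    "distance_1_lat_lng": [float("nan"), 70.91359357, 142.2333872],
    "distance_2_lat_lng": [float("nan"), float("nan"), 0.0],
})

print(values_below(rake, 0))  # rows 2 and 3 of distance_1_lat_lng
print(values_below(rake, 1))  # row 3 of distance_2_lat_lng
```

The four cases are then applied to whatever this selection returns for each row.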
Now, the cases:
Case 1: while iterating down through the distance column, if there is no value less than 5 (< 5), then last_created_date is null (look at distance_1_lat_lng) and we stop; otherwise continue.
Case 2: while iterating down through the distance column from the current lat/lng pair, if there is only ONE value less than 0.1 (< 0.1) (look at distance_2_lat_lng), pick the created_date of the row holding that value, put it in last_created_date, and stop; otherwise continue.
Case 3: while iterating down through the distance column from the current lat/lng pair, if there are multiple values less than 0.1 (< 0.1) (look at distance_3_lat_lng), pick the created_date of the last occurring such value, put it in last_created_date for that lat/lng pair, and stop; otherwise continue.
Case 4: while iterating down through the distance column from the current lat/lng pair, if there is a value in the column that is > 0.1 and < 5, put the created_date of the first occurrence of such a value in last_created_date.
I have written this function, but it does not give the expected result for the conditions above.

    import pandas as pd

    def find_and_append_created_dates(df_0):
        df_0["last_created_dates"] = None  # Add the new column with initial values of None
        # Iterate over unique rake devices
        for rake_device in df_0["rake_device"].unique():
            rake_device_df = df_0[df_0["rake_device"] == rake_device]
            # Row i is paired with the column distance_{i+1}_lat_lng
            for i in range(len(rake_device_df)):
                distance_column = f"distance_{i+1}_lat_lng"
                if distance_column not in rake_device_df.columns:
                    continue  # No distance column for this row
                found_date = None
                has_value_less_than_5 = False
                # Iterate downward from the row after the current row
                for j in range(i + 1, len(rake_device_df)):
                    distance = rake_device_df[distance_column].iloc[j]
                    if pd.isna(distance):
                        continue
                    # Condition 1: remember whether any value less than 5 exists
                    if distance < 5:
                        has_value_less_than_5 = True
                    # Condition 2: first occurrence of a value less than 0.1
                    if distance < 0.1:
                        found_date = rake_device_df["created_date"].iloc[j]
                        break  # Stop iterating once a value less than 0.1 is found
                # If no value less than 5 was found, assign None
                if not has_value_less_than_5:
                    found_date = None
                # Assign the found_date (or None if not found) to the last_created_dates column
                df_0.at[rake_device_df.index[i], "last_created_dates"] = found_date
        return df_0

    find_and_append_created_dates(df)
    print(df)
The last_created_date column in the table I gave is the expected output; that is the only column we want to derive, based on the table above.
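One possible way to cover all four cases is sketched below. It is a sketch under assumptions, not a definitive implementation. It assumes a `rake_device` column (used in the function above but not shown in the table) and a unique index. It also assumes that case 4 is checked before cases 2 and 3, because that ordering is the one that gives 16-01-2024 15:43 for the fourth row of the sample, where distance_4_lat_lng contains both a 0 and a 4.1. A value of exactly 0.1 is treated as case 4 here, which the cases leave unspecified. The names `pick_date` and `add_last_created_date` and the device id `R1` are made up for illustration.

```python
import numpy as np
import pandas as pd

NEAR, FAR = 0.1, 5.0  # thresholds taken from the four cases


def pick_date(distances, dates):
    """Choose a created_date from the rows below the current row."""
    pairs = [(d, t) for d, t in zip(distances, dates) if not pd.isna(d)]
    mid = [t for d, t in pairs if NEAR <= d < FAR]
    near = [t for d, t in pairs if d < NEAR]
    if mid:
        return mid[0]    # case 4 (assumed to take priority): first such value
    if near:
        return near[-1]  # cases 2 and 3: the only match, or the last of several
    return None          # case 1: no value below 5


def add_last_created_date(df, group_col="rake_device"):
    picked = {}
    for _, group in df.groupby(group_col, sort=False):
        for i, idx in enumerate(group.index):
            col = f"distance_{i + 1}_lat_lng"
            if col not in group.columns:
                continue  # no distance column for this row position
            below = group.iloc[i + 1:]  # excludes the row itself
            picked[idx] = pick_date(below[col], below["created_date"])
    out = df.copy()
    out["last_created_date"] = [picked.get(idx) for idx in out.index]
    return out


nan = np.nan
sample = pd.DataFrame({
    "rake_device": ["R1"] * 7,  # hypothetical device id, not in the table
    "created_date": [
        "16-01-2024 11:29", "16-01-2024 12:29", "16-01-2024 13:29",
        "16-01-2024 14:29", "16-01-2024 15:20", "16-01-2024 15:43",
        "16-01-2024 17:28",
    ],
    "distance_1_lat_lng": [nan, 70.91359357, 142.2333872, 207.8935555,
                           248.8830913, 248.8830913, 291.1047773],
    "distance_2_lat_lng": [nan, nan, 0, 142.1494871, 182.2445736,
                           182.2445736, 222.9644721],
    "distance_3_lat_lng": [nan, nan, nan, 0, 0, 108.3518041, 149.2166506],
    "distance_4_lat_lng": [nan, nan, nan, nan, 0, 4.1, 84.94979054],
    "distance_5_lat_lng": [nan, nan, nan, nan, nan, 0, 44.46789779],
})

result = add_last_created_date(sample)
print(result[["created_date", "last_created_date"]])
```

On the seven sample rows this produces the same values as the expected last_created_date column. The sixth and seventh rows stay empty because there is no distance_6_lat_lng or distance_7_lat_lng column for them.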