I use Python. There are two CSV files, the team name column in the first CSV file “PES TEAM 2021 A” is the reference. By using the reference, compare and find team name in the second CSV file, find if the team names in the CSV file 2022 exists in the first reference file 2021.
PES TEAM 2021 A

CSV file 2022 A

I use the fuzzy string match to do the partial match. For example, team “Manchester United” in file “2022” matches “Manchester United FC” in file “2021”. And then label “TRUE” beside “Manchester United” in file “2022”. Another example is “Alanyaspor” in file “2022 matches “Aytemiz Alanyaspor” in file “2021”. And then label “TRUE” beside “Alanyaspor” in file “2022”. Another example is “Ajax Amsterdam” in file “2022 matches “AFC Ajax” in file “2021”. And then label “TRUE” beside “Ajax Amsterdam” in file “2022”.
However, when doing matching, my output is all true in the picture below:

I tries this script below.
However, when doing matching, my output is all true in the picture below:

import pandas as pd
from fuzzywuzzy import fuzz
# Replace 'reference.csv' and 'second_file.csv' with your actual file names
reference_file_path = r'C:\Users\pes TEAM DEF\PES TEAM 2021 A.csv'
second_file_path = r'C:\Users\pes TEAM DEF\PES TEAM 2022 A.csv
# Read the reference and second CSV files
df_reference = pd.read_csv(reference_file_path)
df_second = pd.read_csv(second_file_path)
# Extract unique team names from the first CSV file
unique_team_names = set(df_reference['2021Name'].str.lower())
# Function to check if the similarity is more than the threshold
def is_similarity_above_threshold(name, reference_names, iterabl_team_names, threshold):
return any(fuzz.partial_ratio(name, reference_names) > threshold for name in iterabl_team_names)
# Add a new column to the reference file indicating if there's a 70% partial match
df_second['PartialMatch'] = df_second['2022Name'].str.lower().apply(
lambda x: is_similarity_above_threshold(x, unique_team_names, second_team_names, 70)
)
# Display or further process the DataFrame with highlighted duplicates
#print(df_second[df_second['is_duplicate']])
output_path = "C:/Users/Downloads/pes TEAM DEF/output/abcd.csv"
df_second.to_csv(output_path, index=False, encoding='utf-8-sig') #add utf-8-sig or japanese and latin cannot display normally