Python Openpyxl/Pandas program saving issues

88 views Asked by At

I am writing a code to automate some reports that I have to perform on a weekly/monthly basis.

Here is my code:

import pandas as pd
import openpyxl
from datetime import datetime, timedelta
from openpyxl.styles import Alignment
from openpyxl.worksheet.table import Table, TableStyleInfo
from openpyxl.utils.dataframe import dataframe_to_rows
​
# Defines Workbooks 'CW.xlsx', and 'RAW.xslx'
# Defines worksheets for "PRBs", "Raw Data" from "CW and "Page 1" from "RAW"
wb = openpyxl.load_workbook('CW.xlsx')
raw_wb = openpyxl.load_workbook('RAW.xlsx')
prb_sheet = wb["PRBs"]
raw_sheet = wb["Raw Data"]
sheet_from_raw = raw_wb["Page 1"]
​
​
# Defines the dates and path for the "closed_filter" function
start_date = 'yyyy-mm-dd'
end_date = 'yyyy-mm-dd'
file_path = 'test_metrics.xlsx'
​
# Defines the report date for future substitutions:
report_date = "m/dd/yyyy"
​
# Table number incrementation
table_num = 1 
​
# Problem Team concatetenation and Org mapping
team_map = {
        'Example Team ': ('Ex Team', 'Department'),
        'Example Team': ('Ex Team', 'Department'),
              
    }
​
# Deletes the following sheets: Metrics - Weekly | Weekly Trend - AMS | Weekly Trend - Infra 
# | EPIs | EPI Trending (Wkly) - Open | EPI Trending (Wkly) - Closed | Created | Closed | Open | PPM Created | PPM Closed |
​
delete_sheets = ['Metrics - Weekly', 'Weekly Trend - Overall', 'Weekly Trend - AMS', 
                    'Weekly Trend - Infra', 'EPIs', 'EPI Trending (Wkly) - Open ', 
                    'EPI Trending (Wkly) - Closed', 'Created', 'Closed', 'Open', 
                    'PPM Created', 'PPM Closed']
​
# Defines the formulas for Column AF-AN, f string so that the formula matches the row it is in            
AF_formula = 'excel formula'
AG_formula = 'excel formula'
AH_formula = 'm/dd/yyyy'
AI_formula = 'excel formula'
AJ_formula = 'excel formula'
AK_formula = 'excel formula'
AL_formula = 'excel formula'
AM_formula = 'excel formula'
AN_formula = 'excel formula'
​
def replace_empty_cells(dest_sheet):
    """
    Takes in a worksheet object, and replaces any empty cells in columns 27-30
    with the string 'TBD'.
    """
    for row in range(2, dest_sheet.max_row + 1):
        for col in range(27, 31):
            cell = dest_sheet.cell(row=row, column=col)
            value = cell.value
            # Check if the value is None, NaN, or NaT
            if value is None or (isinstance(value, float) and value != value) or pd.isna(value):
                cell.value = 'TBD'
                
# Loops through the sheets in "delete_sheets" and deletes them
for sheet in delete_sheets:
    del wb[sheet]
​
# Loops through all cells in "PRBs" and deletes them
# Future: prb_sheet.delete_rows(2, prb_sheet.max_row), messes with row formatting
for row_num in range(2, prb_sheet.max_row + 1):
    for col_num in range(1, prb_sheet.max_column + 1):
        prb_sheet.cell(row = row_num, column = col_num).value = None
​
# Loops through all cells in "Raw Data" and deletes them
for row_num in range(2, raw_sheet.max_row + 1):
    for col_num in range(1, raw_sheet.max_column + 1):
        raw_sheet.cell(row = row_num, column = col_num).value = None
​
# Copy everything from "RAW.xlsx" "Page 1" except the header/row1 without formatting
# Append that coppied data into the "copied_rows" list
copied_rows = []
for row in sheet_from_raw.iter_rows(min_row=2, values_only=True):
    copied_rows.append(row)
​
# Paste the copied rows into cell A2 of "raw_sheet"
for row_idx, row in enumerate(copied_rows, start=2):
    for col_idx, value in enumerate(row, start=1):
        raw_sheet.cell(row=row_idx, column=col_idx).value = value
​
wb.save("test_metrics.xlsx")
​
​
#################################################CLOSED FILTER#################################################################
def closed_filter(file_path, start_date, end_date, dest_sheet):
    # Load data from Excel file
    df = pd.read_excel(file_path, sheet_name='Raw Data')
    
    # Select the first 31 columns of the DataFrame
    df = df.iloc[:, :31]
    
    # Convert 'Closed' column to datetime format
    df['Closed'] = pd.to_datetime(df['Closed'], errors='coerce')
​
    # Filter data for dates between start_date and end_date
    mask = ((df['Closed'] >= start_date) & (df['Closed'] <= end_date)) | (df['Closed'].isna())
    
    filtered_data = df.loc[mask]
    
    # Convert filtered data to rows
    rows = dataframe_to_rows(filtered_data, index=False, header=False)
    
    # Append rows to destination sheet
    for row_idx, row in enumerate(rows, 1):
        for col_idx, value in enumerate(row, 1):
            dest_sheet.cell(row=row_idx+1, column=col_idx, value=value)
            
    wb.save("test_metrics.xlsx")
​
​
closed_filter(file_path, start_date, end_date, prb_sheet)   
​
# Reasigned values since they werent working
wb = openpyxl.load_workbook('test_metrics.xlsx')
prb_sheet = wb["PRBs"]
raw_sheet = wb["Raw Data"]
​
# Loop through each row in "State" column and update corresponding values in other columns
for row in range(2, prb_sheet.max_row + 1):
    # Get the value in "State" column
    state_val = prb_sheet.cell(row=row, column=6).value
    
    # Update values in columns AB, AA, AD, and AE based on the value in "State" column
    if state_val == 'Resolved/Closed':
        prb_sheet.cell(row=row, column=28).value = 'Resolved'
    elif state_val == 'Cancelled/Closed':
        prb_sheet.cell(row=row, column=27).value = 'N/A'
        prb_sheet.cell(row=row, column=28).value = 'Cancelled'
    elif state_val == 'Known Error/Closed':
        prb_sheet.cell(row=row, column=27).value = 'N/A'
        prb_sheet.cell(row=row, column=28).value = 'KE/WO'
​
    # Checks to see if there is a value in col1 and if col31 is "TBD" or empty it replaces it with "Other"    
    if prb_sheet.cell(row=row, column=1).value != None and (prb_sheet.cell(row=row, column=31).value == "TBD" or prb_sheet.cell(row=row, column=31).value is None):
        prb_sheet.cell(row=row, column=31).value = 'Other'       
        
    # Iterate "Problem Team" and replace with "team_map" values
    long_team_name = prb_sheet.cell(row=row, column=30).value
    if long_team_name in team_map:
        concat_team_name, org_name = team_map[long_team_name]
        prb_sheet.cell(row=row, column=30).value = concat_team_name
        prb_sheet.cell(row=row, column=31).value = org_name
        
# Call the function with the correct arguments
replace_empty_cells(prb_sheet)
    
# Checks through values in column "AA", and assigns them to a "float" if possible. If the values are not none. 
for cell in prb_sheet['AA']:
    if cell.value != None:
        try:
            cell.value = float(cell.value)
        except (TypeError, ValueError):
            cell.value = cell.value
            
# Loop through each row in the worksheet
for row in range(2, prb_sheet.max_row + 1):
    # Check if column A is not None type
    if prb_sheet[f'A{row}'].value is not None:
        # Apply the formulas to the respective columns
        prb_sheet[f'AF{row}'] = AF_formula.format(row=row)
        prb_sheet[f'AG{row}'] = AG_formula.format(row=row)
        prb_sheet[f'AH{row}'] = AH_formula.format(row=row)
        prb_sheet[f'AI{row}'] = AI_formula.format(row=row)
        prb_sheet[f'AJ{row}'] = AJ_formula.format(row=row)
        prb_sheet[f'AK{row}'] = AK_formula.format(row=row)
        prb_sheet[f'AL{row}'] = AL_formula.format(row=row)
        prb_sheet[f'AM{row}'] = AM_formula.format(row=row)
        prb_sheet[f'AN{row}'] = AN_formula.format(row=row)
​
for row in range(2, prb_sheet.max_row + 1):
    # Check if column A is not None type
    if prb_sheet[f'A{row}'].value is not None:            
        prb_sheet.cell(row=row, column=34).value = report_date      
​
​
# Defines sheets and workbooks for "sheet_created_closed" and "sheet_open" functions
created_sheet = wb.create_sheet("Created",0)
closed_sheet = wb.create_sheet("Closed",1)
open_sheet = wb.create_sheet("Open",2)     
​
# Save so that PRB is the same as Created,Closed,Open
wb.save("test_metrics.xlsx")
        
    
#########################################CREATED AND CLOSED SHEETS#############################################################
def sheet_created_closed(file_path, start_date, end_date, dest_sheet, column_name):  
    
    # Load data from Excel file
    df = pd.read_excel(file_path, sheet_name='PRBs')
    
    # Convert 'Created' column to datetime format
    df[column_name] = pd.to_datetime(df[column_name], errors='coerce')
​
    # Filter data for dates between start_date and end_date
    mask = (df[column_name] >= pd.Timestamp(start_date)) & (df[column_name] <= pd.Timestamp(end_date))
    filtered_data = df.loc[mask]
    print(filtered_data.iloc[:, 26:28])
    
    # Convert filtered data to rows
    rows = dataframe_to_rows(filtered_data, index=False, header=True)
    
    # Append rows to destination sheet
    for row_idx, row in enumerate(rows, start=1):
        for col_idx, value in enumerate(row, start=1):
            dest_sheet.cell(row=row_idx, column=col_idx).value = value
            
    # Determine the range of populated cells
    max_row = dest_sheet.max_row
    max_column = dest_sheet.max_column
    min_row = dest_sheet.min_row
    min_column = dest_sheet.min_column
   
    # Set cell alignment, wrapping and row height
    for row in dest_sheet.iter_rows(min_row=min_row, max_row=max_row, min_col=min_column, max_col=max_column):
        for cell in row:
            cell.alignment = Alignment(horizontal='left', vertical='top', wrap_text=True)
        dest_sheet.row_dimensions[cell.row].height = 55
        
    # Set column width for all columns
    for col in dest_sheet.columns:
        dest_sheet.column_dimensions[col[0].column_letter].width = 15
    
    # Set the first row height to "30"
    dest_sheet.row_dimensions[1].height = 30   
    
    # Adjust specific column widths for best viewing
    for col in ["A", "B", "C", "F"]:
        dest_sheet.column_dimensions[col].width = 13
    for col in ["E", "J", "K", "T"]:
        dest_sheet.column_dimensions[col].width = 45
    for col in ["AJ", "AK", "AL", "AM", "AN"]:
        dest_sheet.column_dimensions[col].width = 30
        
    dest_sheet.column_dimensions["D"].width = 8
    dest_sheet.column_dimensions["H"].width = 17
​
    # Create a table
    # Update the "ref" parameter to include the determined range of cells
    table_ref = f"{dest_sheet.cell(row=min_row, column=min_column).coordinate}:{dest_sheet.cell(row=max_row, column=max_column).coordinate}"
  
    # Incremement table numbers FUTURE: implement in a create table function
    global table_num
    table_name = f"Table{table_num}"
    table = Table(displayName=table_name, ref=table_ref)
    table_num += 1  

    # Apply the "Blue, Table Style Medium 2" format
    style = TableStyleInfo(name='TableStyleMedium2', showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=False)
    table.tableStyleInfo = style

    # Add the table to the worksheet
    dest_sheet.add_table(table) 
    
    # Set sheet zoom to 90
    dest_sheet.sheet_view.zoomScale = 90
    
    # Freeze columns A-F
    dest_sheet.freeze_panes = 'G1'
    
    # Loop through each row in the worksheet
    for row in range(2, dest_sheet.max_row + 1):
        dest_sheet.cell(row=row, column=32).value = AF_formula.format(row=row)
        dest_sheet.cell(row=row, column=33).value = AG_formula.format(row=row)
        dest_sheet.cell(row=row, column=34).value = AH_formula
        dest_sheet.cell(row=row, column=35).value = AI_formula.format(row=row)
        dest_sheet.cell(row=row, column=36).value = AJ_formula.format(row=row)
        dest_sheet.cell(row=row, column=37).value = AK_formula.format(row=row)
        dest_sheet.cell(row=row, column=38).value = AL_formula.format(row=row)
        dest_sheet.cell(row=row, column=39).value = AM_formula.format(row=row)
        dest_sheet.cell(row=row, column=40).value = AN_formula.format(row=row)

    for row in range(2, dest_sheet.max_row + 1):
        # Check if column A is not None type
        if dest_sheet[f'A{row}'].value is not None:            
            dest_sheet.cell(row=row, column=34).value = report_date   

    # Save the changes to the Excel file
    wb.save("test_metrics.xlsx")

sheet_created_closed(file_path, start_date, end_date, created_sheet, "Created")
sheet_created_closed(file_path, start_date, end_date, closed_sheet, "Closed")
sheet_open(file_path,open_sheet, "State")

Forewarning this I'm brand new with python and this is more of a passion project. I've had every piece of the program work at some point so I believe all of the code is functional. I am having issues when creating sheets from other sheets for example in the "sheet_created_closed" function where rows are being pulled from my "PRBs" sheet, but with different data. For example in my PRB sheet col 27, 28 were replaced with "TBD" but when grab that data and put it into new sheets ("Created"/"Closed"), some of the values are blank.

I am almost certain this issue has to do with saving at the right times in the right way, but for almost a week and hours of troubleshooting, I have not been able to figure it out. Essentially my problem is that I think I'm not saving correctly but I'm not sure how.

UPDATE: the issue is that the "PRBs" sheet has the correct data, but when it is used to create a data frame, it is using a version of "PRBs" from before the "prb_sheet_edit" and "replace_empty_cells" functions are run. I still don't know how to fix

0

There are 0 answers