I am trying to export a list of Reddit comments I web scrapped to an Excel file. The file is created but gives me this error in Excel:
"Excel cannot open the file 'test1.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension mates the format of the file."
My code so far:
url = "https://www.reddit.com/r/bayarea/comments/17v80j5/do_you_see_yourself_using_the_ca_high_speed_rail/"
submission = reddit.submission(url=url)
posts = []
for top_level_comment in submission.comments:
if isinstance(top_level_comment, MoreComments):
continue
posts.append(top_level_comment.body)
posts = pd.DataFrame(posts,columns=["body"])
indexNames = posts[(posts.body == '[removed]') | (posts.body == '[deleted]')].index
posts.drop(indexNames, inplace=True)
print(posts)
writer = pd.ExcelWriter('test1.xlsx', engine = 'openpyxl')
posts.to_excel(writer, sheet_name='data', index = False)
Should i use a different library to export to excel?
because i didn't use reddit scrapped before, so i just try the dataframe save code. here is my try:
or you can just use posts.to_excel('test1.xlsx', sheet_name='data', index = False) to inplace
by reading your code, I don't think there is a problem caused by appending Excel data, as you only use to_excel once at the end
that's all. wish to help u.