With the help of the Atlassian API, I managed to get the content stored in a table on a Confluence page and write it into a dataframe. Finally, I dump that into a JSON file. The output looks like this:
"table_name":"emp",
"created_date":"2/2/2024"
The other values in the table are written out the same way.
Along with the other fields, there is one more field called query, which holds a SQL query:
select * from table
where priority=1 -- filtering records with more priority 1
and owner="marie" -- maries' content
The problem is that I'm writing the content directly from the HTML into a dataframe, as shown below:
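from io import StringIO
import pandas as pd
from atlassian import Confluence

confluence = Confluence(url=server, token=api_key)
page = confluence.get_page_by_id(page_id, expand="body.storage")
body = page["body"]["storage"]["value"]
# Write the page content into a dataframe (read_html returns a list of tables; take the first)
df = pd.read_html(StringIO(body))[0]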
The JSON dump follows. I need to remove the comments introduced by -- inside the query. However, by the time the data is in the dataframe, the whole query has already been collapsed into a single-line string, so a regular expression cannot tell where a comment ends. If the query lines were still separated by newlines, a regex would do the job.
I need help removing the comments before the JSON is created.
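To make the goal concrete, here is a rough sketch of the kind of thing I am after. It assumes (not verified) that the Confluence storage HTML separates the query lines with <br/> or <p> tags. The idea is to mark those line ends with a sentinel before the HTML is flattened, then strip the comments line by line. The names NEWLINE_MARK, mark_line_breaks and strip_sql_comments are made up for illustration.

```python
import re

# Hypothetical sentinel; any token that cannot occur in the page text would do.
NEWLINE_MARK = "@@NL@@"


def mark_line_breaks(body: str) -> str:
    """Insert the sentinel before every <br/> or </p> so line ends survive flattening."""
    return re.sub(
        r"<br\s*/?>|</p>",
        lambda m: NEWLINE_MARK + m.group(0),
        body,
        flags=re.IGNORECASE,
    )


# Matches either a quoted string (kept) or a "--" comment up to end of line (dropped).
_TOKEN = re.compile(r"""('[^']*'|"[^"]*")|--.*""")


def strip_sql_comments(query: str) -> str:
    """Split on the sentinel, drop "--" comments on each line, rejoin with newlines."""
    lines = []
    for line in query.split(NEWLINE_MARK):
        cleaned = _TOKEN.sub(lambda m: m.group(1) or "", line).strip()
        if cleaned:  # skip lines that were empty or comment-only
            lines.append(cleaned)
    return "\n".join(lines)
```

With the code above, this would presumably become pd.read_html(StringIO(mark_line_breaks(body)))[0] followed by df["query"] = df["query"].map(strip_sql_comments) before the JSON dump. The regex keeps quoted strings intact so that a -- inside a string literal is not treated as a comment, but it does not handle /* ... */ comments or escaped quotes.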