We have a requirement to extract dark data from unstructured sources such as letters, rad reports, etc. Please suggest an Azure resource that can extract data from common document formats (DOC, DOCX, PDF, RTF, TXT, HTML, etc.) and then run analysis on the extracted data.
Azure resource to handle unstructured data sources
536 views Asked by 191180rk At
It sounds like you just want to extract raw text or images from these rich-text documents. If that is all you need, what you really want is a set of libraries for parsing the different document formats.
Here are some libraries for Java or Python that do that. If you are using .NET, which I'm not familiar with, you can search Google or Bing for .NET alternatives.
1. Apache POI is a good library for extracting data from MS Office files. For Python, there seems to be no package for this, other than using a COM object such as Word.Application, or IronPython (Reading/Writing MS Word files in Python) with .NET on Windows.
2. Apache PDFBox and jPDFText for Java, and PyPDF2 for Python, handle PDF files.
3. javax.swing.text.rtf.RTFEditorKit handles RTF in Java; sample code is easy to find via search. As with #1, there seems to be nothing equivalent for Python.
4. jsoup for Java, and BeautifulSoup and HTMLParser for Python, are the best choices for extracting data from HTML.
5. Stanford NLP for Java and NLTK for Python are useful for analysis. The Azure Text Analytics API in Cognitive Services can also help with tasks such as key phrase extraction and language detection.
6. Tess4J, or other libraries you can find on GitHub.

Almost all of the above depend on third-party libraries rather than on Azure resources. However, you can store the documents in Azure Storage and process them on an Azure VM or the Batch service, and you can analyze the extracted data in an Azure Jupyter Notebook or use Azure ML for deeper research.
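As a rough illustration of the HTML case, here is a minimal sketch that uses only HTMLParser from the Python standard library to pull visible text out of a page. The names TextExtractor and extract_text_from_html are made up for this example, and skipping script and style content is my own assumption about what counts as "raw text"; BeautifulSoup would do the same job with less code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, ignoring <script>/<style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text_from_html(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><script>var x = 1;</script>"
    print(extract_text_from_html(sample))
```

The resulting plain text is what you would then pass on to NLTK or to the Text Analytics API for analysis.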