Introduction to Natural Language Processing for Text
The largest share of the digital information available to any organization (or individual) is human-written text, stored in a myriad of logical and physical formats. Because this information is unstructured, free-form and ambiguous by nature, such documents are difficult for computers to interpret, and as a consequence they mostly remain unexploited.
Despite these difficulties, natural language processing (NLP), a subfield of data science, nowadays lets us extract useful information from text. Using machine learning, we can classify documents along different dimensions: which language they are written in, what their dominant sentiment is, and which topics they relate to. Beyond such categorization, documents can be processed further to extract essential insights. For instance, we can identify the named entities they mention (such as persons, institutions and geographic locations) and the relationships between them. Alternatively, we can produce a short summary of a longer document, making it quicker for human readers to grasp its essence.
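To make the classification idea more concrete, here is a minimal sketch of a sentiment classifier, assuming a multinomial Naive Bayes model with add-one (Laplace) smoothing and simple whitespace tokenization. The training sentences and labels are invented for illustration; real systems train on far larger corpora and usually rely on dedicated libraries. The same code would classify language or topic if the labels were changed accordingly.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, split on whitespace and strip surrounding punctuation.
    # A deliberately naive tokenizer; real pipelines use proper ones.
    tokens = (tok.strip(".,!?;:") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior: how common this class is in the training data.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                # Log likelihood of the word under this class, smoothed.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, for illustration only.
TRAIN_DOCS = [
    "great service and excellent results",
    "I love this product, it works well",
    "terrible support and poor quality",
    "the report was bad and disappointing",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

if __name__ == "__main__":
    clf = NaiveBayesTextClassifier().fit(TRAIN_DOCS, TRAIN_LABELS)
    print(clf.predict("excellent product, works great"))  # positive
    print(clf.predict("poor quality and bad support"))    # negative
```

Naive Bayes is chosen here only because it fits in a few lines and needs no external dependencies; it treats words as independent given the class, which is a strong simplification.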
For example, in a long financial report about mergers and acquisitions, we could point out which companies are involved, which one is the acquirer and which one the target of the acquisition, and when the deal took place. Beyond this basic information, we can extend the inquiry with insights from other sources: which industry each company belongs to, what its main profile is, how profitable it is, who its senior officers are, and so on.
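The merger-and-acquisition example can be sketched with a hypothetical rule-based extractor. It assumes that company names are runs of capitalised words and that deals are phrased either as "X acquired/bought Y" or as "Y was acquired/bought by X", optionally followed by a date. The patterns, the Acquisition record and the function name are illustrative assumptions rather than a real named-entity recognition model, which would learn entities and relations from annotated data instead of relying on hand-written rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Acquisition:
    acquirer: str          # the agent: the company that buys
    target: str            # the subject: the company being bought
    date: Optional[str]    # when the deal happened, if stated


# Hypothetical toy patterns: a "company" is simply a run of capitalised words.
COMPANY = r"[A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*"
DATE = r"(?:\s+(?:on|in)\s+(?P<date>\d{1,2}\s+[A-Z][a-z]+\s+\d{4}|\d{4}))?"
VERB = r"(?:acquired|bought)"

# Passive voice swaps the grammatical roles, so it needs its own pattern.
PASSIVE = re.compile(
    rf"(?P<target>{COMPANY})\s+was\s+{VERB}\s+by\s+(?P<acquirer>{COMPANY}){DATE}"
)
ACTIVE = re.compile(rf"(?P<acquirer>{COMPANY})\s+{VERB}\s+(?P<target>{COMPANY}){DATE}")


def extract_acquisitions(text: str) -> List[Acquisition]:
    """Return one Acquisition per sentence that matches a known pattern."""
    results = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pattern in (PASSIVE, ACTIVE):
            m = pattern.search(sentence)
            if m:
                results.append(Acquisition(m["acquirer"], m["target"], m["date"]))
                break
    return results


if __name__ == "__main__":
    report = (
        "Acme Corp acquired Beta Ltd on 3 March 2021. "
        "Gamma Holdings was bought by Delta Group in 2019. "
        "The weather was fine."
    )
    for deal in extract_acquisitions(report):
        print(deal)
```

Handling the passive form separately is what lets the sketch tell the acquirer from the target regardless of word order; the obvious weakness is that any phrasing outside the two patterns is silently missed.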
Added value (why AI/ML/DL): automating the processing of paper documents to save cost and time.