Save time and money through semantic data structuring


Due to the constantly growing importance of content for e-commerce and for automated text creation, the relevance of structured data is also increasing. However, suitable data structures are often not yet available, especially in medium-sized businesses. Data structuring is therefore the first important step towards text automation.

Even when data is “buried” in PDF documents that have not yet been indexed, semantic PDF analyzers can automatically read the content and put the required information into a structured form. The Data Extractor developed by text2net achieves this in three phases:

[Figure: SCAS Workflow]

The semantic content analysis does more than recognize that a certain term occurs in the PDF document. It also uses appropriate grammars to check the context in which the term is used. This makes it very likely that the term really appears in the relevant context and carries the meaning that was searched for, e.g. the product weight. Data structured in this way can be made available as JSON and retrieved via a REST API on a web portal for further processing, e.g. for automated import into a PIM or for text automation. This saves time and money.
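
To illustrate the idea of checking context rather than just spotting a term, here is a minimal sketch in Python. It is not the Data Extractor's implementation: a single regular expression stands in for the grammars mentioned above, and the names WEIGHT_RULE, extract_weight, to_record and the field names in the JSON record are all assumptions made for this example.

```python
import json
import re

# Hypothetical context rule (a regex standing in for a real grammar):
# a number with a mass unit counts as the product weight only if the
# keyword "weight" precedes it within the same sentence.
WEIGHT_RULE = re.compile(
    r"\b(?:net\s+)?weight\b[^.\n]{0,40}?(\d+(?:\.\d+)?)\s*(kg|g)\b",
    re.IGNORECASE,
)


def extract_weight(text):
    """Return the weight as a dict, or None if no match is found in context."""
    match = WEIGHT_RULE.search(text)
    if match is None:
        return None
    return {"value": float(match.group(1)), "unit": match.group(2).lower()}


def to_record(product_id, text):
    """Serialize the extracted data as a JSON string (field names are assumed)."""
    return json.dumps({"product_id": product_id, "weight": extract_weight(text)})


sample = "Carries loads up to 120 kg. Net weight: 4.5 kg."
record = to_record("A-100", sample)
print(record)
```

In the sample text, the value 120 kg is skipped because it describes a load capacity, while 4.5 kg is accepted because it follows the keyword "weight" in the same sentence. A record of this kind is what a REST endpoint could then return for import into a PIM.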

An important factor in the success of online content is how well search engines can read it. The presented solution structures the content according to the standard. This standard is recognized by the most important search engines and is taken into account when they display search results. A text structured on this basis can therefore stand out positively from other content in the ranking.

In conclusion: semantic data structuring can not only save time and money, but also increase the relevance of content on the internet.
