
Document Structure Learning

Automatic methods for extracting and learning the hierarchical organization of documents. Rather than relying on predefined structural annotations, these approaches learn structure from raw text in a data-driven manner.

Document structure learning encompasses techniques ranging from dependency parsing (extracting grammatical relationships) through rhetorical structure discovery to higher-level discourse organization. In fake news detection, learned document structures offer a signal for distinguishing authentic from fabricated content, as real news typically exhibits more coherent and better-organized hierarchical structure than hastily written misinformation.
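As a rough illustration of how a learned structure could be turned into a signal, the sketch below assumes the structure is a dependency tree over a document's sentences, encoded as a list of parent indices (with -1 marking the root). The encoding, the function names `tree_depth` and `leaf_fraction`, and the two toy trees are hypothetical choices made for this example; they are not taken from any specific paper or library.

```python
from typing import List


def tree_depth(heads: List[int]) -> int:
    """Return the number of nodes on the longest root-to-node path.

    heads[i] is the index of the parent of node i, or -1 for the root.
    Assumes the input is a valid tree (no cycles).
    """
    if not heads:
        return 0

    def depth(i: int) -> int:
        d = 1
        while heads[i] != -1:  # walk up until the root is reached
            i = heads[i]
            d += 1
        return d

    return max(depth(i) for i in range(len(heads)))


def leaf_fraction(heads: List[int]) -> float:
    """Return the share of nodes that have no children."""
    if not heads:
        return 0.0
    parents = {h for h in heads if h != -1}
    leaves = [i for i in range(len(heads)) if i not in parents]
    return len(leaves) / len(heads)


# Two toy four-sentence documents (hypothetical structures):
flat = [-1, 0, 0, 0]   # every sentence attaches directly to the first
chain = [-1, 0, 1, 2]  # each sentence elaborates on the previous one

print(tree_depth(flat), leaf_fraction(flat))
print(tree_depth(chain), leaf_fraction(chain))
```

Summary statistics like these could be fed to a downstream classifier alongside text features. Which structural properties actually separate authentic from fabricated articles is an empirical question the sketch does not answer.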

Key papers