What question did this study set out to answer?

This project aims to automate the extraction of public opinion from unstructured hearing documents to enhance data-driven decision-making in urban development.

June 2, 2026 | Open Access

Specialization Project: Aspect-Based Sentiment Analysis in Municipality Documents

Key Points

Structured literature review mapping 26 studies on AI and sentiment analysis
Exploratory data analysis on a pilot dataset of 63 municipal texts
Investigation of the "Span Ambiguity Gap" in bureaucratic text annotations
Human annotators show low agreement on exact text boundaries (κ = 0.21) but high reliability in semantic categorization (κ = 0.93)
Standard extraction pipelines are unsuitable for bureaucratic language
Proposal for a Master's Thesis using large language models to generate synthetic training data

Abstract

Oslo Municipality seeks to automate the extraction of public opinion from large volumes of unstructured hearing documents to enhance data-driven decision-making in urban development. While the municipality's "AI Factory" initiative has successfully implemented summarization tools, these fail to provide the structured, quantitative data required to measure the distribution of support and opposition across policy topics. This project investigates the feasibility of implementing Aspect-Based Sentiment Analysis (ABSA) to bridge this gap within the constraints of a low-resource, privacy-sensitive public sector environment.

Through a Structured Literature Review (N = 26), this study maps the transition in the state of the art from discriminative classifiers to Generative AI and structure-aware architectures. Concurrently, an Exploratory Data Analysis on a pilot dataset of municipal texts (N = 63) identifies a critical "Span Ambiguity Gap": while human annotators struggle to agree on exact text boundaries for bureaucratic phrases (κ = 0.21), they exhibit high reliability in identifying semantic categories (κ = 0.93).

These findings indicate that standard extraction pipelines are ill-suited to the bureaucratic register. Consequently, this report proposes a strategic pivot to Aspect Category Sentiment Analysis (ACSA). The study concludes with a proposal for a subsequent Master's Thesis centered on a Teacher-Student framework, which utilizes Large Language Models (LLMs) to generate synthetic training data ("Inverse Generation"). This approach aims to enable the training of capable, privacy-compliant local models, effectively solving the "Cold Start" data scarcity problem while bypassing the inherent ambiguity of manual span annotation.
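The agreement figures above can be made concrete with a small sketch of how a kappa score is computed. The abstract does not say which kappa variant was used or how spans were compared, so the following assumes Cohen's kappa for two annotators and, for spans, a token-level comparison of in-span (1) versus out-of-span (0) tags. The function name and all toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: two annotators mark overlapping but shifted spans
# in one eight-token sentence (1 = token inside the opinion span).
span_a = [0, 1, 1, 1, 1, 0, 0, 0]
span_b = [0, 0, 0, 1, 1, 1, 1, 0]

# Hypothetical toy data: the same annotators assign one category per statement.
cat_a = ["traffic", "housing", "noise", "traffic"]
cat_b = ["traffic", "housing", "noise", "traffic"]

span_kappa = cohen_kappa(span_a, span_b)
category_kappa = cohen_kappa(cat_a, cat_b)
print(f"span kappa: {span_kappa:.2f}, category kappa: {category_kappa:.2f}")
```

In this toy case the annotators pick the same categories while disagreeing on where each span starts and ends, which is the pattern the "Span Ambiguity Gap" describes. The numbers it produces are illustrative only and say nothing about the reported values of 0.21 and 0.93.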
