Bachelor Thesis — Feature Model Learning & Dimensionality Reduction

Description
This bachelor thesis investigates the challenge of automatically learning feature models for Software Product Lines from existing configurations using machine learning techniques.
The core problem is the curse of dimensionality: boolean configuration spaces grow exponentially with the number of features, making standard ML approaches inaccurate and computationally expensive at scale.
Two complementary solutions are explored: manually designed encoding strategies that exploit known feature relationships to produce more compact representations, and automated dimensionality reduction via Linear PCA and Logistic PCA.
The approaches are evaluated empirically across multiple real-world Software Product Line datasets using precision, recall, and runtime, revealing a quantified trade-off between model effectiveness and computational efficiency.
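The manual encoding idea can be illustrated with a small sketch (my own example under assumed feature-group semantics, not the thesis's exact scheme): in an alternative (XOR) group, exactly one of k features is selected, so the k booleans carry only one choice and can be collapsed into a single categorical index.

```python
# Illustrative manual encoding exploiting a known feature relationship:
# an alternative (XOR) group of k boolean features selects exactly one
# member, so it collapses into a single categorical index.

def encode_xor_group(config, group):
    """Replace the booleans of an XOR group by the index of the
    single selected feature within the group."""
    selected = [f for f in group if config[f]]
    assert len(selected) == 1, "XOR group must select exactly one feature"
    return group.index(selected[0])

# Hypothetical configuration: three mutually exclusive storage backends
# plus one independent optional feature.
config = {"sql": False, "nosql": True, "inmemory": False, "cache": True}
group = ["sql", "nosql", "inmemory"]

# Three boolean dimensions become one categorical dimension.
compact = {"backend": encode_xor_group(config, group), "cache": config["cache"]}
print(compact)  # {'backend': 1, 'cache': True}
```

The same reasoning applies to mandatory features (always implied by their parent, so droppable), which is how known relationships shrink the input space before any learning happens.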
- Analyzed the scalability challenges of Feature Model Learning in large Software Product Lines.
- Designed manual encoding strategies that exploit feature relationships to reduce input dimensionality.
- Applied Linear PCA and Logistic PCA for automated dimensionality reduction of boolean configuration spaces.
- Evaluated learning quality using precision and recall across multiple real-world SPL datasets.
- Quantified runtime behavior and identified scalability trade-offs between methods.
- Combined software engineering domain knowledge with applied machine learning techniques.
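The automated route above can be sketched in a few lines of NumPy (a minimal linear-PCA illustration on synthetic boolean data, not the thesis's datasets; Logistic PCA differs in that it models the binary entries directly instead of via least squares):

```python
import numpy as np

# Toy stand-in for an SPL configuration matrix (synthetic, for illustration):
# rows = sampled configurations, columns = boolean feature selections.
rng = np.random.default_rng(0)
configs = rng.integers(0, 2, size=(200, 50)).astype(float)

# Linear PCA via SVD: center the data, decompose, keep the top-k components.
centered = configs - configs.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 10
reduced = centered @ vt[:k].T  # each configuration is now a 10-dim vector

print(reduced.shape)  # (200, 10): 50 boolean dimensions compressed to 10
```

A downstream learner then operates on the 10-dimensional projection instead of the raw 50-dimensional boolean space, which is where the runtime savings, and the potential loss in precision/recall, come from.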
Page Info
Feature Models describe the valid configuration space of a Software Product Line — a structured way of representing which combinations of features are permitted. Learning these models automatically from existing configurations is attractive, but the boolean nature of the configuration space causes a curse of dimensionality: as the number of features grows, the space becomes exponentially large and sparse, making standard machine learning techniques unreliable.
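The exponential blow-up is immediate to check: n independent boolean features admit up to 2^n configurations (an upper bound, since cross-tree constraints rule some out).

```python
# Upper bound on configuration-space size for n boolean features.
for n in (10, 50, 100):
    print(f"{n} features -> {2 ** n} configurations")
```

Already at 50 features the space exceeds 10^15 points, so any realistic set of sampled configurations covers only a vanishing fraction of it, which is exactly the sparsity that makes standard learners unreliable here.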