Taking Out the Human Factor: Comparing Algorithmic Profiling Against Human Baselines

About this Session

Time

Thu. 16.04. 10:15

Room

Speaker

Public administrations, facing growing demands and limited resources, increasingly turn to algorithmic decision-making (ADM) systems to enhance efficiency and effectiveness across a range of public services. From criminal justice and child protection to immigration and employment services, algorithmic tools promise consistency and improved accuracy over human judgment in tasks such as triaging cases, allocating interventions, and forecasting risk. At the same time, research on algorithmic fairness pushes back against the use of algorithms in public settings, following reports of systematic prediction errors and false accusations that disproportionately affect minority groups. These risks can be attributed to social biases embedded in the models' training data: when previous human judgments were shaped by stereotyping, discrimination, and prejudice, algorithms can learn and potentially amplify such patterns.

The shift from human to algorithmic decision-making raises fundamental questions about the role of human discretion and the implications of replacing human judgment with predictions in sensitive domains. Human decision-makers bring contextual understanding and social sensitivity to their work, but are also prone to bias and inconsistency. ADM systems offer the allure of neutrality and scale, but often lack the nuanced judgment required in complex social contexts and risk amplifying societal biases embedded in their training data. Directly comparing human and algorithmic assessments on the same grounds is critical for understanding how the trade-offs between the two types of decision-making unfold, where they disagree, and how they might complement each other. Yet research on ADM often operates under implicit assumptions about the accuracy and fairness of the non-algorithmic baseline: the decisions made by human caseworkers in real-world operational processes.

This paper presents a comparative analysis of human and algorithmic decision-making in the context of public employment services in Germany. Specifically, we study the task of predicting long-term unemployment risk using a unique dataset that combines rich administrative labor market records with contemporaneous caseworker assessments collected at the time of initial intake. This setting enables us to directly compare three approaches to decision-making: (1) human caseworkers at local job agencies, (2) a machine learning model trained on historical administrative data and observed outcomes, and (3) a hybrid model trained to emulate the assessments of caseworkers. By evaluating all three approaches on the same data, we show how humans and algorithms systematically diverge in their assessments of social subgroups, providing new empirical evidence on the hidden costs of automation in public domains.
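To make the three-way comparison concrete, the minimal sketch below shows one way such an evaluation could be set up: one classifier fit on observed long-term unemployment outcomes, a second fit on the caseworkers' own intake assessments, and both compared with the caseworker labels by social subgroup on held-out cases. All column names, the file name, the choice of classifier, and the metrics are hypothetical placeholders; the paper's actual data schema, models, and evaluation design are not specified here.

```python
# Minimal sketch of the comparison setup, assuming a pandas DataFrame with
# hypothetical columns: administrative features (numeric), "ltu_outcome"
# (observed long-term unemployment, 0/1), "caseworker_label" (the caseworker's
# intake assessment, 0/1), and "subgroup" (a social-group indicator).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("intake_records.csv")  # hypothetical administrative + intake data
label_cols = {"ltu_outcome", "caseworker_label", "subgroup"}
features = [c for c in df.columns if c not in label_cols]

train, test = train_test_split(df, test_size=0.3, random_state=0)

# (2) Model trained on observed outcomes.
outcome_model = GradientBoostingClassifier().fit(train[features], train["ltu_outcome"])
# (3) Hybrid model trained to emulate caseworker assessments.
emulation_model = GradientBoostingClassifier().fit(train[features], train["caseworker_label"])

test = test.assign(
    pred_outcome=outcome_model.predict(test[features]),
    pred_emulated=emulation_model.predict(test[features]),
)

def subgroup_rates(frame: pd.DataFrame, decision: str) -> pd.DataFrame:
    """Flagging rate and error rates of one decision source, per subgroup."""
    def rates(g: pd.DataFrame) -> pd.Series:
        neg, pos = g[g["ltu_outcome"] == 0], g[g["ltu_outcome"] == 1]
        return pd.Series({
            "flag_rate": g[decision].mean(),
            "false_positive_rate": (neg[decision] == 1).mean(),
            "false_negative_rate": (pos[decision] == 0).mean(),
        })
    return frame.groupby("subgroup").apply(rates)

# (1) Caseworkers vs. (2) outcome model vs. (3) caseworker-emulation model,
# evaluated on the same held-out cases and broken down by subgroup.
for decision in ["caseworker_label", "pred_outcome", "pred_emulated"]:
    print(f"\n== {decision} ==")
    print(subgroup_rates(test, decision))
```

Comparing per-subgroup flagging and error rates across the three decision sources on identical cases is the kind of side-by-side view the abstract argues for; the specific rates shown here are illustrative choices, not the paper's reported metrics.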