Predictive Analytics to Increase Roster Robustness in an Inbound Call Center

Staff rostering is a crucial task in inbound call centers, as personnel costs usually account for the largest share of operating costs. Uncertainty of capacity, such as the presence of agents, is often disregarded during rostering. This paper addresses the problem of uncertainty by using predictive analytics to predict agent absences and thus increase roster robustness. Operational data from four years of a call center serve as the basis for our use case. Predictors include characteristics of the service agents, such as attendance history and regular working hours, as well as other factors such as the weekday. Of the prediction algorithms tested, decision trees outperform other predictive modeling approaches. Evaluation based on an expected value framework shows that the predictive analytics approach performs best compared to the planned, unchanged roster and a general staff surcharge of 10%.


Introduction
In many service industries, it is particularly important to avoid understaffing. One example is health care, where a minimum number of nurses is required and a staff shortage, e.g., due to illness, has to be avoided. Another example is companies that provide full-service availability to customers, and in which an underestimation of demand or an unforeseeable absence of staff leads to a loss in revenue. We use an inbound call center offering secretarial services via telephone as an example of the latter in this paper.
The call volume in an inbound call center varies highly between days as well as within a day. This study examines the use of predictive analytics to create more stable rosters in order to increase the robustness against short-term disruptions. Using predictive analytics, we aim to predict staff absences; the target variable thus specifies whether an agent is absent or not. Real-world data from a European call center serve as the basis for a case study. Following the CRISP-DM process [20], the data provided are used to test various predictive models. We examine which model fits best for the practical application of staff rostering and, because we have to deal with imbalanced data, take the accuracy as well as the true positive rate of the prediction results into account. In this paper, we discuss only the best performing model in more detail. In addition to the analysis for model selection, we evaluate the resulting costs of the best model using the expected value framework [21]. In our use case, predicting an absent agent as present is the costliest case, since significant revenue losses are expected. Furthermore, the number of changes needed with and without a predictive approach is compared in order to discuss roster robustness. As our main contribution, we identify the use of predictive analytics to forecast employee absences as a means to adjust an initial roster. The potential of such a modified roster results from a reduced number of unforeseen disruptions during roster operation. So far, an approach like this has not been discussed in the literature.
The remainder of this paper is organized as follows. In the next section, we introduce the context and the problem in more detail and briefly present the state of the art in the field of absence prediction. In the third section, we discuss our use case of staff rostering in a European call center, the data provided and the process of data preparation. In section 4, we provide a description of our analysis and an evaluation of the best performing prediction model. Finally, we discuss our findings and give an outlook on future research as well as practical implications.

Problem Description and State of the Art
In order to get an overview of predictive analytics used in the area of staff availability, we examine different fields of scientific literature. First, we point out where the problem considered occurs in the staff rostering process in a call center. Here, the focus is on strategies for dealing with uncertainties in capacity. Then, we discuss predictors for the absence of staff from the literature and give a brief introduction to predictive analytics.

Staff rostering in call centers
An inbound call center offers services via telephone for its own or external companies and deals with incoming calls from customers [3]. Due to the variable call volume, forecasting the number of incoming calls is enormously important for a call center in order to optimize staff efficiency and thus personnel costs. The whole staff rostering process can be divided into four subproblems [22]. These steps are usually executed sequentially, since the outcome of the previous step serves as input for the next. An initial roster represents the overall result of staff rostering. This roster is adapted to the expected conditions and assumes a deterministic environment. However, in practice, roster disruptions such as short-term agent absences might disturb business operations and lead to changing requirements resulting in staff shortages. In this study, we focus on a subsequent step to initial rostering in order to improve a roster in terms of robustness and to make roster operations more stable to agent absences.
Staff rostering has proven to be extremely difficult, especially when skills are taken into account and there is no fixed shift system [1,23]. Uncertainties are hardly considered in order to increase the roster's robustness and to reduce the number of shift changes during business operations [4,24]. The volatile volume of calls on the one hand and the unforeseeable presence of allocated staff on the other hand, who may be absent due to short-term illnesses, cause a high level of uncertainty in rostering [25]. Following Van den Bergh, Beliën [5], there are three types of uncertainty. For staff rostering in a call center, uncertainty of demand (length of a call) and uncertainty of arrival (time of a call) are important issues. These two aspects are often linked and are considered together in many publications, e.g. Green, Kolesar [26], Atlason, Epelman [22] and Avramidis, Chan [25]. In this paper, we focus on the third type, uncertainty of capacity (staff availability). Van den Bergh, et al. [5] identified only a few publications dealing with this kind of uncertainty, significantly fewer than for the first two aspects. These include, among others, Bagatourova and Mallya [27], who use a heuristic combined with a simulation to incorporate uncertain employee presences in scheduling. Strategies for dealing with uncertainty in capacity are explained in the following section.

Solution approaches for dealing with uncertainty in capacity
Uncertainty of capacity refers to the deviation between allocated and actually present staff. There are various strategies for dealing with uncertainties in capacity in staff rostering. These include the use of reserve staff serving as a capacity buffer, which is common in the airline industry [7]. Reserve agents are available for a short-term shift allocation, but this procedure is highly cost-intensive and therefore not applicable in every use case. Another proactive possibility is to consider staff substitutability during rostering and thus increase schedule flexibility [8].
In case of a shortage, rerostering is necessary to reassign vacant shifts to free agents. Such reactive rerostering is time consuming [6], disrupts operations and can lead to service agent dissatisfaction [9]. In call centers, another option is to increase the staff demand by a surcharge [3], which also increases personnel costs [6]. Alfares [28], for example, uses the company's standard surcharge of 10% on staff demand, which is traced back to absences due to vacation, training or illness.
Another, more sophisticated option is to adjust the roster, more precisely the demand or the allocation of service agents, in such a way that a staff shortage can be avoided even if an agent is absent. Better than a general surcharge are concrete buffers for service agents, which should be defined based on a case-specific procedure. Ingels and Maenhout [10] examine how to define optimal buffers by including them in their optimization model to increase roster robustness. In doing so, they create a roster with greater stability against unforeseen disruptions. Predictive analytics is also suitable for this, since such models predict additional staff requirements [9,19]. In this way, possible disruptions are proactively addressed and an alternative roster can be generated that is more robust to short-term absences and minimizes the number of short-term shift changes. In the scientific literature, predictive analytics is not common in the field of uncertainty in capacity and prediction of absences. One example, however, is Wang and Gupta [19], who use the attendance history to predict employee presence in order to achieve an even distribution with regard to the probability of an unplanned absence within a scheduling heuristic. Further predictors and studies are presented in the following section.

Predictors for agent absences
The context in this study is an absent service agent whose presence is needed to cover staffing demand. In the literature, various predictors for agent absences are identified. A basic model of staff attendance from Steers and Rhodes [13] is based on two main factors: motivation and the possibility to be present. Brooke [12] extends this model by removing demographic as well as motivational variables and adding health-related and organizational characteristics. The study by Steel and Rentsch [15] uses factors similar to those in Steers and Rhodes [13], such as job satisfaction and demographic characteristics. Steel and Rentsch [15] find that gender and education are good predictors, and state that the length of the period over which absences are accumulated plays a major role. Additional determinants for agent absences are contract-specific details and work attitude, which seems to be more important than individual characteristics [16].
In a comprehensive study, 70 potential predictors for nurse absenteeism are categorized into eight groups: attendance history, work attitude, duration of employment, burnout and stress, management traits, staff management practices as well as individual, work and job characteristics [17]. Wang and Gupta [19] investigate unplanned absences by individual characteristics such as attendance history and other factors like shift type, weekday, holiday and weather, of which personal history turns out to be the most suitable predictor. Expected workload is another factor influencing the presence of staff, examined by Green, Savin [18]. The factors that can be used as predictors in practice mainly depend on the data available.
The aim of predictive analytics is to forecast future events that are assessed as risks using historical data and advanced methods such as machine learning [11]. In predictive analytics, a distinction can be made between supervised and unsupervised learning. Supervised learning concerns predicting a specific target variable from historical data. Depending on the data type of the target variable, classification or regression is the appropriate method. Using predictive analytics, possible disruptions can be addressed proactively [9].
To the best of our knowledge, there is no study concerning the prediction of agent absences in a call center in order to increase roster robustness. However, predictive models are used to forecast varying demand for staff rostering in call centers, for example as presented in Gans, Koole [3] as well as in Wooff and Stirling [29]. In this paper, we aim at closing this gap. Therefore, we proceed according to a common approach for data analytics projects, the CRISP-DM process by Chapman, Clinton [20]. They suggest an iterative process consisting of six major steps, which are described in the next section.

Call Center Data
In this paper, the staff rostering process of a European call center serves as use case. CRISP-DM is an iterative process consisting of various phases, starting with business and data understanding in order to determine the prediction goals. Then, the historical data received is prepared for the application of different models. These steps are described within this section. Afterwards, in the modelling phase (section 4), we generate different models and by comparing them, we determine the best of them for prediction of agent absences. This model is then evaluated concerning the originally defined use case and prediction goals.
For the analysis, we use real-world data from a European call center that employs up to 400 agents. This call center handles secretary tasks for companies and is paid by these for each call answered. The revenues per call vary according to contractual agreements and call length. Since the call center provides services for companies all over Europe, for an incoming call a service agent with appropriate language skills has to be available. Therefore, not only the general call volume, but also language skills have to be considered when rostering. Unplanned absences lead to major disruptions in business operations and have a negative impact on revenue. Thus, the call center proactively tries to reduce the effects of such absences: Absences are predicted to allow for adjusting the demand for agents whenever there is a risk for understaffing.
Following Gans, Koole [3], a call center produces a large amount of data which can be divided into operational, marketing, human resources, and psychological data. We use two operational data sets (roster and log-in times) for extracting the executed shifts and absence notes during the period of 2014 to 2017. The roster data consist of 617,695 rows and five individual attributes about start and end times, task characteristics, and the assigned agent. Each row represents one specific task, for example, taking phone calls with a specific language skill. Several tasks can add up to one shift. As a result, the data set contains 59,291 shifts based on 30-minute intervals for the analysis. The log-in time data contain information about when an agent was actively available for taking phone calls. The log-in times contain 3,138,303 rows with the same attributes as the roster data. Additionally, we consider a human resource data set with 15 attributes containing information about each agent's seniority, skills, contract agreements and working time schemes as well as demographic characteristics. These attributes can serve as predictors for the analysis in addition to shift properties such as day, time and required language skill.
To prepare the data for predictive analysis and enhance data quality, we perform several steps. First, exact duplicates are removed. Duplicates that can only be detected by domain knowledge are completely removed, since backtracking to the original entry is not possible. Shifts are treated as duplicates if they affect the same agent in the same time period. Agents are considered duplicates if their ID is identical.
Second, we look at missing values. Some attributes with missing values are treated with specific domain knowledge. For the remaining missing values, their number is checked. Attributes with more than one third missing values are removed due to lack of explanatory power. There are different types of missing values and accordingly proper ways to handle them [30]. This study deals only with values missing completely at random, and we replace missing values of cardinally scaled attributes with the mean, of ordinal attributes with the median and of nominal attributes with the mode.
Third, the three data sets are merged into one data set. All information concerning the shifts is contained in the roster and log-in times data. Each roster shift is checked for a matching log-in and marked accordingly. In doing so, we obtain a data set with the shifts actually performed and the planned shifts not performed due to an agent absence. Then, the human resource data add 15 background attributes for each agent. The resulting data set serves as a basis for the predictive modeling.
Fourth, we remove absences contained in the data which are known in advance. This analysis only considers several types of illness as unplanned absences, because planned absences such as vacation and overtime compensation need to be agreed upon and are not included in the data. However, the planned roster already contains absences due to sickness, which are mostly long-term illnesses or maternity leave periods. As we want to predict unexpected short-term absences, such cases are not considered in the prediction and thus deleted.
The next vital step is to prepare the target variable. It is therefore necessary to discuss how the target variable is modeled, even if modeling decisions are part of the modeling phase (see section 4). In the raw data, the target variable reveals several types of absences as well as further shift information. There are two options for preparing it: Either the target variable is regarded as a logistic decision (whether an agent is absent or not for a certain shift) or the target variable is treated as linear (to calculate a surcharge for a particular period of time). Linear modeling provides an aggregated method for the defined goal, but can still be calculated with a logistic model. Choosing an appropriate time period for estimating the number of absent agents is difficult for the linear decision. Since the roster is planned in 30-minute intervals and the staff demand may vary between the intervals, a time period of 30 minutes seems to be useful. But in this way a shift is regarded several times. In addition, all background information of the service agents affected must be combined to achieve the aggregated linear output. Therefore, the personal information taken into account in the literature could not be considered completely. For these reasons, we prepare the target variable for a logistic decision. The binary variable is created as follows: If an agent is sick all day and thus absent for the whole shift, the shift is obviously marked as absent. However, if a service agent leaves during a shift due to illness, the entire shift is marked as absent if more than 50% of the shift duration is affected. This method is adapted from practice. The developed target variable reveals 5,817 absent shifts out of a total of 57,359 shifts, which corresponds to 10.14%.
In the sixth step, the data for modeling are further improved by extracting additional information from certain attributes or combining given attributes into new information. In particular, we examine the date specification further for date characteristics such as weekdays or holidays. The combination of shift and agent information offers additional potential. In doing so, the final list of predictors includes 22 attributes.
Before modeling, we make some further adjustments to the data. Categorical attributes are converted into dummy attributes, and the data are standardized for use in some models. In this analysis, standard scaling is used, subtracting the mean and scaling to unit variance. A problem for some models is imbalanced data, as in our case study: The majority of the data entries examined show normal presence without absence. Guo, Yin [31] mention different techniques to target this issue. Extensively discussed is the strategy of under-sampling and over-sampling [31][32][33]. We do not use under-sampling, since under-sampling bears the risk of losing important information [31], such as rare events like holidays. Likewise, we do not choose over-sampling, as it increases the probability of overfitting and causes a strong bias in the data set [31]. Another strategy is to manipulate the models by class weights. The analysis described in the following uses class weights along with further evaluation criteria as a strategy for dealing with imbalanced data. In addition, challenges arise in the evaluation of the prediction results due to imbalanced data, e.g., accuracy might not be the best criterion. Therefore, further information is often needed and the use of domain knowledge for intensive checks of the data is recommended [31].

Predictive Modeling and Model Selection
According to the steps of the CRISP-DM process [20], the prepared data are now used for modeling. The modeling is done with Python (3.7.1) and the scikit-learn library (SK-Learn, 0.21.2) [34], a standard library for data analysis. In order to identify a suitable model for predicting absences, we test different modeling approaches (section 4.1). Afterwards, the best prediction model is further evaluated in order to derive conclusions for the case study (section 4.2).

Predictive analytics for agent absences
In this study, the target variable to predict is binary, indicating whether an agent is absent or not. Based on this outcome, supervised learning techniques are used [35]. Of these, classification is the most appropriate based on the prediction intended. An important issue in classification is overfitting, which is why we pay special attention to it. Applying a cross-validation can lead to overfitting because information might leak into the models [36,37]. Based on the Leave-One-Out-Cross-Validation of Stone [38], Nested-Cross-Validation is derived. This is further extended by Krstajic, Buturovic [39], who recommend repeating validations and ensuring equal class distributions (stratified splits). Following this, to avoid overfitting, a Repeated-Stratified-Nested-Cross-Validation (RS-NCV) is used. In this analysis, the outer cross-validation is repeated once with three splits. Within each split, an inner cross-validation is executed on two parts to determine the model that performs best on average, which we then validate using the third part. The inner cross-validation is repeated ten times with three splits. Again, we use two splits to train the model and the third split for evaluation. For each modeling approach tested, the average of the evaluation criteria is calculated. Overall, the RS-NCV is performed twice: first, to choose the best model and, second, to identify the most suitable parameters of the best model.
There are various methods for modeling classification problems such as linear classifications, support vector classifiers, nearest neighbor analysis, decision trees and ensemble learning methods. We select a set of modeling techniques for predicting classifications that meet the required criteria. Linear and quadratic discriminant analysis try to separate the solution space with linear and quadratic decision boundaries, respectively [35]. In addition to linear methods, logistic regression tries to directly predict the class probabilities [35]. SK-Learn offers five different solvers for logistic regression. In this study, we use the solver saga presented by Defazio, Bach [40], because it is faster and offers the "l1" penalty. Using the "l1" penalty can lead to sparser solutions, so that fewer attributes are used for classification and the risk of overfitting is decreased. Support vector classifiers (SVC) can be considered as borderline cases, as they produce non-linear boundaries by transforming the data set and creating linear boundaries within these transformed features [35]. Using SK-Learn, it is possible to vary between different kernels. This study compares a linear kernel, a radial basis function kernel (rbf) as well as a polynomial kernel (poly) of third degree. In contrast to linear models, a memory-based classifier is also used, the nearest neighbor classifier [35]. That classifier assigns classes with the help of the majority vote among the k nearest neighbors of the entry to predict [35]. SK-Learn automatically chooses the best algorithm (Brute-Force, K-D Tree [41] or Ball Tree [42]) for calculating the nearest neighbors. Another popular approach is the use of decision trees. Decision trees separate the solution space into dichotomous parts in several iterations [35]. SK-Learn adopts the CART algorithm of Breiman [43], which is quite similar to C4.5 by Quinlan [44]. As decision criterion, SK-Learn uses the Gini impurity. Furthermore, we exploit methods belonging to the group of ensemble learners. Ensemble learners combine several simpler models [35]. One of them is the random forest proposed by Breiman [45], which is constructed from different trees. Unlike the original implementation, SK-Learn considers the probability average for estimating the class. Further ensemble learners that are tested are boosters, such as the ada boost of Freund and Schapire [46] and the gradient boosting classifier.
In addition to multiple kernels, we compare some other variations. For the k nearest neighbors model, different values for k are used. Regarding the ada boost, a series of base estimators with different complexities is compared. Finally, for some models the possibility to use class weights is exploited to address imbalanced data. The use of this specification is marked with 'bal.'. Table 1 summarizes the ten best performing predictive models tested; their evaluation is described in the following.
In order to select the best prediction model, we look at the models' accuracies. Since accuracy is not sufficient to evaluate the prediction based on imbalanced data [31], we also examine the true positive rate in the following. Due to the high attendance rate, the majority classifier's accuracy is 89.86%. We aim at identifying a model that has a higher accuracy than the majority classifier and a high true positive rate at the same time. This allows us to find a precise model which avoids misclassifications that would be expensive in roster operations of our use case.
Table 1 shows the accuracy and true positive rate based on the inner cross-validation results of the ten best performing models. The SVC models each have a very high true positive rate, but due to their low accuracy with an average of 72% they are not suitable for a prediction. The other models all have a high accuracy (between 90.11 and 91.97%), which is higher than that of the majority classifier. Accordingly, the true positive rate is now decisive for the selection between these eight predictive models. The true positive rate varies greatly; the remaining models are sorted by ascending rate from the fourth row of Table 1. The best performing model is the decision tree without class weights, because it has an average true positive rate of 56.35%. The results show that even though the decision tree has the highest true positive rate, it lacks accuracy compared to ada boost or random forest. However, the true positive rate is much higher and thus the imbalanced data problem is solved appropriately.
The predictions based on the resulting decision trees may still suffer from overfitting [35]. In order to improve the generalizability and reduce complexity, we use the maximum tree depth, the maximum number of leaves, and the maximum number of attributes per split as stopping criteria. Using RS-NCV, we compare decision trees of varying complexity. The outer iteration of the RS-NCV is repeated twice, since the models are much more similar. The RS-NCV ensures that the reduced decision tree with the best generalization performs best. The best reduced tree is generated with a maximum depth of 40, a maximum of 85 attributes and a maximum of 3,500 leaves. These maximum values are not reached by every decision tree, but more complex trees with less generalization are avoided. Stopping even earlier decreases the results with no increase in generalization, as the RS-NCV shows.

Evaluation of the best performing decision tree
In the following, we take a closer look at the results of the best performing predictive model, a decision tree with the mentioned properties. In terms of feature importance, there is a distinct outcome regarding the impact on prediction. Table 2 presents the feature importance of the ten highest ranked features with an importance greater than 1.0%. The attributes age with 29.09% and duration of employment with 26.87% have the biggest impact, followed by the individual attendance history and weekly working hour difference. Overall, most attributes have an importance of up to 2%. Table 3 shows a confusion matrix.
To assess the economic consequences of the prediction, we use the expected value framework [21]. More precisely, we estimate the costs per hour for each of the cases in the confusion matrix in consultation with the call center's management. In the call center, costs are calculated per hour and not per shift, since the duration of the shifts varies greatly. Accordingly, the classified shifts with their respective duration in hours are considered in this evaluation step. The number of classified hours as well as the estimated additional costs per hour are summarized in Table 4. The costs correspond to the extra personnel costs due to additional allocat-

4. If an absent agent is not detected and therefore no replacement is planned, additional costs in the amount of 50 €/h are assumed.
The expected value is then calculated based on the hours of each shift and its classification. For this model, the expected value reveals costs of 2.97 € per planned hour.
In order to evaluate this value and thus the benefit of the developed model, we also calculate the expected value per planned hour for two comparable cases. First, a baseline performance using the majority classifier is estimated. Due to the high number of hours in which an agent is actually present and thus the baseline prediction applies, the accuracy of the majority classifier is 89.86%. However, no absent cases are predicted correctly, and 1,939 shift misclassifications result, which corresponds to 12,162.5 hours. The expected value of the baseline model reveals costs of 4.97 € per planned hour. Second, we apply a general staff surcharge of 10%, as is practiced in the call center. This approach matches the identified absence rate of 10.14% quite well. With a classification accuracy of 81.89%, this second baseline model performs worse than the other two. Many agents are predicted to be absent even though they are not. The true positive rate is 10%. Using such a general surcharge of agents results in 3,463.2 misclassifications, and the expected value shows costs of 5.67 € per planned hour.
Comparing these three approaches, it becomes clear that across all evaluation criteria the decision tree developed in this study performs best. The prediction accuracy is higher and the expected costs as well as the number of misclassifications are lower. Hence the developed model is appropriate for its application.

Conclusion
A robust roster is essential for undisrupted business operations in an inbound call center. To increase robustness against short-term absences, staff rostering should take uncertainty in capacity into account. The presence of a service agent is a major uncertainty factor and therefore its consideration is important. This paper discusses whether the use of predictive analytics is suitable for predicting agent absences. Different predictive analytics approaches are examined. An evaluation based only on the classification accuracy is not sufficient for imbalanced data, which is why we additionally use the true positive rate to select the most suitable predictive model. A decision tree has proven to be the best model in our computational study. To increase generalization, stopping criteria for reduced decision trees are derived.
The best model is further evaluated and compared with two baseline cases: the majority classifier and a 10% staff surcharge. Using the expected value framework, we apply additional costs per planned hour from practice to get an insight into practical economic effects. Compared to the baseline cases, our prediction model leads to significantly lower additional costs. In addition, the number of disruptions that lead to understaffing can be reduced and thus the roster robustness increased. Our results show that using predictive analytics for the prediction of agent absences is useful in order to achieve more efficient staff rostering. This is meaningful both for research and practice.
In this paper, we do not consider possible correlations and the impacts of the prediction of absences on the entire staff rostering process. An inclusion of prediction models in the determination of work requirements and consecutive steps is an issue that could be addressed in subsequent studies. To validate the results and findings, predictive analysis based on further data is required. In addition, other methods such as neural networks and linear modeling may have potential for predicting agent absences.
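The baseline figures reported in the evaluation can be cross-checked with a few lines of Python. The following is a minimal sketch, not the implementation used in the study. It rests on two assumptions that the text does not state explicitly: that the baselines are evaluated on one third of the shifts (one of three splits), and that the 10% surcharge behaves like flagging a random 10% of shifts as absent, independent of the true status. The function expected_cost_per_hour and the toy hours passed to it are hypothetical; only the cost of 50 €/h for an undetected absence is taken from the text.

```python
# Figures quoted in the text.
TOTAL_SHIFTS = 57_359
ABSENT_SHIFTS = 5_817
SURCHARGE = 0.10          # general staff surcharge used in the call center

# Assumption (not stated in the text): evaluation on one of three splits.
FOLDS = 3

absence_rate = ABSENT_SHIFTS / TOTAL_SHIFTS

# Majority classifier: always predicts "present", so every absent shift is missed.
majority_accuracy = 1 - absence_rate
majority_missed_shifts = ABSENT_SHIFTS / FOLDS

# Surcharge baseline, modelled here (assumption) as flagging a random 10% of
# shifts as absent. A flag is correct with probability absence_rate, a
# non-flag is correct with probability 1 - absence_rate.
surcharge_tpr = SURCHARGE
surcharge_accuracy = (1 - SURCHARGE) * (1 - absence_rate) + SURCHARGE * absence_rate
surcharge_misclassified = (1 - surcharge_accuracy) * TOTAL_SHIFTS / FOLDS


def expected_cost_per_hour(hours, cost_per_hour):
    """Hour-weighted average cost over the confusion-matrix cases.

    Both arguments are dicts keyed by case name (hypothetical helper).
    """
    total_hours = sum(hours.values())
    return sum(hours[case] * cost_per_hour[case] for case in hours) / total_hours


# Toy illustration with made-up hours: 10 of 100 planned hours belong to
# undetected absences, each costing 50 EUR/h as stated in the text.
toy_cost = expected_cost_per_hour(
    {"present_predicted_present": 90.0, "absent_predicted_present": 10.0},
    {"present_predicted_present": 0.0, "absent_predicted_present": 50.0},
)

print(f"absence rate:            {absence_rate:.2%}")
print(f"majority accuracy:       {majority_accuracy:.2%}")
print(f"majority missed shifts:  {majority_missed_shifts:.0f}")
print(f"surcharge accuracy:      {surcharge_accuracy:.2%}")
print(f"surcharge misclassified: {surcharge_misclassified:.1f}")
print(f"toy expected cost:       {toy_cost:.2f} EUR per planned hour")
```

Under these assumptions, the sketch reproduces the reported absence rate of 10.14%, the majority classifier's accuracy of 89.86% with 1,939 missed absent shifts, and the surcharge baseline's accuracy of 81.89% with about 3,463.2 misclassified shifts. The expected costs of the three approaches cannot be reproduced in the same way, because the per-hour costs of the remaining confusion-matrix cases are not given here.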