Explainable machine learning model for load-carrying capacity prediction of FRP-confined corroded RC columns

Huihui Li; Haoran Li; Shuwen Deng; Qian Chen

doi:10.70465/ber.v2i1.18

Authors

Huihui Li Department of Civil & Environmental Engineering, Hong Kong Polytechnic University, Hong Kong, China
Haoran Li College of Civil Engineering, Chongqing University, Chongqing, China
Shuwen Deng College of Water Resources & Civil Engineering, Hunan Agricultural University, Changsha, China
Qian Chen Thornton Tomasetti Inc., New York 10031, NY, USA

Keywords:

FRP-confined corroded RC columns; deteriorating effect; machine learning; XGBoost algorithm; SHAP technique; empirical models

Abstract

This paper proposes a novel explainable machine learning (ML) model to predict the axial load-carrying capacity (P_max) of FRP-confined corroded RC columns using the eXtreme Gradient Boosting (XGBoost) algorithm and the Shapley Additive exPlanations (SHAP) technique. The XGBoost predictive model was constructed from a comprehensive database of experimental tests on 285 FRP-confined corroded RC columns, collected from existing studies and from tests performed by the authors. Twenty parameters were considered as critical input variables in developing the predictive model. The SHAP technique was employed to evaluate feature importance and to interpret the predictions of the XGBoost model. Additionally, the feasibility and effectiveness of the constructed XGBoost model were assessed against several empirical design models and other ensemble ML algorithms. The results indicated that (i) the proposed XGBoost model can feasibly predict the P_max of FRP-confined corroded RC columns; (ii) the SHAP technique provided good explainability and interpretability for the XGBoost predictive model; (iii) SHAP enabled a comprehensive assessment of the importance of the input variables, and the most important ones affecting the P_max of FRP-confined corroded RC columns were the gross sectional area of the column, FRP thickness, elastic modulus of FRP, eccentricity ratio, corrosion rate, and concrete compressive strength; (iv) the prediction performance of the proposed XGBoost model was significantly superior to that of the existing empirical models and other ML algorithms, with mean values of R², RMSE, MAE, and MAPE of 0.978, 122 kN, 703.6 kN, and 7.7%, respectively; and (v) the recommended XGBoost model offers an alternative approach, in addition to the current mechanics-based design models, for determining the P_max of FRP-confined corroded RC columns in design practice.

Introduction

Owing to their high structural resistance, RC structures are widely used in the protective design of civil infrastructure.¹,² However, they are prone to numerous deterioration mechanisms driven by environmental actions and climate change, such as erosion, carbonation, freeze-thaw cycles, fatigue, and chloride-induced corrosion (CIC).²–⁸ Among these, CIC can cause severe corrosion of steel bars and has been recognized as one of the primary causes of impaired mechanical properties and durability in aging RC structures. Numerous studies have focused on the effects of CIC on the degraded structural response and load-carrying capacity of aging RC structures.²,⁹–¹¹ RC columns are critical structural members of many highway bridges and buildings, and severe damage caused by CIC could trigger progressive collapse.¹,²,¹²,¹³ In addition, the structural redundancy of RC columns is generally lower than that of beams and slabs.²,¹² Thus, it is important to restore the deteriorated structural resistance and performance of corroded RC columns in order to reduce social and economic losses and, more importantly, to mitigate human casualties.¹,²,¹³,¹⁴ This necessitates investigations into how to improve the deteriorated resistance and residual strength of corroded RC columns, which is one of the primary research focuses of this study.

Fiber-reinforced polymer (FRP) has been widely employed to strengthen and retrofit corroded RC structures because of its inherent advantages of high strength, light weight, superior corrosion resistance, simple on-site construction, and low maintenance cost.¹⁵–¹⁹ The benefits of FRP in strengthening corroded RC structures stem mainly from two aspects.²⁰,²¹ Firstly, FRP wraps apply confining pressure that offsets the expansive forces generated by corrosion products. Secondly, FRP composites act as a physical diffusion barrier that prevents the ingress of chloride ions and oxygen into RC structures, thus delaying the corrosion of steel bars and protecting them from CIC.²⁰,²¹ Accordingly, strengthening or rehabilitation of corroded RC columns by wrapping with FRP composites has been extensively investigated both experimentally¹⁶,¹⁷,²²,²³ and theoretically.²⁴,²⁵ Moreover, significant efforts have been dedicated to investigating the structural response and mechanical properties of FRP-strengthened RC columns, e.g., stress–strain behavior,²⁶–²⁸ seismic performance,¹¹,²⁹–³³ and axial and eccentric compression behavior.¹⁶,³⁴–³⁷ These studies indicate that the additional confinement provided by FRP wraps can significantly enhance the structural resistance of corroded RC columns.

However, most of these studies focused on the mechanical performance of uncorroded RC columns, whereas only a limited number examined the strength of corroded RC columns confined by FRP composites (FRP-confined corroded RC columns). Zhou et al.¹¹ experimentally studied the seismic behavior of several FRP-strengthened corroded RC columns and found that corrosion of steel bars could significantly deteriorate the strength and ductility of the columns. Bae and Belarbi¹⁶ experimentally studied the effect of steel bar corrosion on the bearing capacity of CFRP-wrapped corroded RC columns and found that CFRP wrapping helped reduce the steel corrosion rate and the degradation of stiffness and bearing capacity of the columns. Dai et al.³³ studied the deformation capacity of FRP-retrofitted corroded RC columns and proposed an improved model for predicting the yield rotation of the columns. In addition, Chotickai et al.³⁴ experimentally examined the influence of corrosion damage and volumetric CFRP ratio on the eccentric compressive behavior of CFRP-strengthened corroded RC columns. They reported that the effectiveness of CFRP jacketing in enhancing the ultimate compressive strength of corrosion-damaged columns depended on the volumetric CFRP ratio, and that jackets with a higher volumetric CFRP ratio provided more effective confinement and restored a larger effective cross-sectional area of the cracked concrete. Li et al.³⁷ studied the effects of corrosion-induced damage at different corrosion rates of steel rebar on the structural behavior of several LRS-FRP-confined corroded RC columns. They observed that rebar corrosion accelerated rebar buckling and concrete deterioration, thus reducing the ultimate compressive strength of the columns. Also, compared with the unconfined corroded columns, the load-carrying capacity (P_max), ductility, and energy-absorbing capacity of the LRS-FRP-confined corroded RC columns were much higher, indicating the effective confinement provided by the LRS-FRP composites.

Based on the above literature review, most previous studies considered corrosion of steel bars only through the loss of the rebar's cross-sectional area. In practice, however, the deterioration effects of CIC are more complicated: in addition to reducing the rebar's area, CIC is generally non-uniform. Non-uniform CIC can lead to many secondary effects, such as (i) degradation of the yield and ultimate strengths of the rebar and (ii) degradation of the compressive strengths of the cover and confined concrete.²,⁴,⁸,¹⁰ The accumulation of corrosion products can also cause the cover concrete to crack and spall, which further impairs the bond strength between the steel rebar and concrete.¹¹,³⁸,³⁹ Besides degrading material properties, non-uniform CIC can also reduce the stiffness, ductility, and P_max of corroded RC columns, particularly those under compression.³⁴,³⁶,⁴⁰–⁴² Moreover, CIC-induced corrosion of steel bars can affect the strain distributions in FRP composites and thus further impair the confinement efficiency of FRP-confined RC columns.²⁹,³¹,³²,³⁹ Because of the combined action of FRP confinement and corrosion-induced damage, it is therefore difficult to predict the P_max of FRP-confined corroded RC columns accurately. Although several existing empirical models⁴³–⁴⁵ can be employed to predict the P_max of FRP-confined RC columns, their feasibility for corroded columns should be further validated. Additionally, since these empirical models were developed from predefined formulas and a limited number of test results, they may exhibit significant discrepancies in predicting the P_max of columns.¹⁸,⁴⁶ Therefore, it is necessary to develop an accurate model for predicting the P_max of FRP-confined corroded RC columns for safe design and retrofitting.

Recently, with the rapid development of computing technology, data-driven machine learning (ML) algorithms have emerged as robust and powerful techniques for addressing many complicated civil engineering problems.¹⁸,¹⁹,⁴⁶–⁵⁶ Compared with conventional empirical models, the main advantage of ML algorithms is their ability to learn the relationship between the critical input variables and the output parameters without prior assumptions or predefined mathematical or physical models.⁴⁶ Hence, many researchers have applied ML algorithms as an alternative technique to predict the compressive strength,⁴⁷–⁴⁹,⁵⁷ stress–strain models,⁵²,⁵⁴ and load-carrying capacities or failure modes of RC members with superior accuracy.¹⁸,⁴⁶,⁵¹,⁵³,⁵⁶

Owing to its high computational efficiency and strong capability in modeling datasets, eXtreme Gradient Boosting (XGBoost) is regarded as one of the most advanced ML algorithms.¹⁸ Thus, the XGBoost algorithm has been widely applied in civil engineering.¹⁸,¹⁹,⁴⁶,⁵⁸–⁶² For instance, to predict the P_max of FRP-RC columns, Bakouregui et al.¹⁸ developed an XGBoost model based on 283 experimental results for FRP-RC columns and evaluated its effectiveness and feasibility against several code-based design models and empirical equations. They reported that the XGBoost predictive model outperformed the empirical equations and code-based design models. Liu et al.¹⁹ developed an XGBoost model to predict the life-cycle mechanical performance of pultruded FRP composites and reported that the model provided good interpretability and that its predictions agreed well with the test data. Similarly, to predict the flexural capacity of FRCM-strengthened RC beams, Wakjira et al.⁴⁶ assessed the prediction performance of the XGBoost algorithm and six other ML models, and found that the XGBoost model outperformed the other ML algorithms with the highest accuracy. Likewise, based on a comprehensive experimental database, Ma et al.⁶² proposed a novel XGBoost model for predicting the P_max of CFRP-confined CFST columns with high efficiency and accuracy. These studies confirm that the XGBoost model is computationally efficient and that a well-trained XGBoost predictive model can achieve accurate predictions. Therefore, this paper employs the XGBoost algorithm to predict the P_max of FRP-confined corroded RC columns.

However, the XGBoost algorithm also has limitations. For example, like other typical ML models, the XGBoost algorithm is often regarded as a "black box" because its internal mechanisms are usually difficult to explain.¹⁸,⁴⁶ Thus, explaining ML models is an essential step in supporting their predictions. In this regard, the Shapley Additive exPlanations (SHAP) technique⁶³ can be used to make the XGBoost model interpretable and explainable. However, to date, very limited research has addressed the interpretability and explainability of ML algorithms using the SHAP technique.¹⁸,¹⁹,⁴⁶,⁵⁵,⁶⁴,⁶⁵

Therefore, this study aims to propose a novel, explainable predictive model that provides an alternative and robust prediction of P_max for FRP-confined corroded RC columns. Firstly, the XGBoost predictive model is constructed based on the test results of 285 FRP-confined corroded RC columns, including 231 specimens gathered from existing studies reported in the literature and 54 tested by the authors. Then, through correlation analysis, twenty parameters are selected as the critical input variables to construct the XGBoost model. Subsequently, the SHAP framework is applied to assess the feature importance of the input variables and to interpret the XGBoost model. In addition, the capability and prediction performance of the model are compared with those of several empirical design models reported in the literature and some widely used ML algorithms, such as decision tree (DT), random forest (RF), and gradient boosting decision tree (GBDT). Finally, the major conclusions and possible future investigations are summarized.

Methodology

XGBoost algorithm

Fig. 1 shows a schematic of the XGBoost algorithm. As seen from Fig. 1, the XGBoost framework consists of a series of trees, each made up of a root node, internal nodes, branches, and leaf nodes. XGBoost is an advanced implementation of gradient-boosted decision trees and employs an additive strategy, which can be mathematically represented by Eq. (1).¹⁸,¹⁹

(1)

{\hat{y}}_{i} = \sum_{m = 1}^{M} f_{m} (X_{i})

where ${\hat{y}}_{i}$ is the predicted response for the input X_i; M is the total number of classification and regression trees (CARTs) (i.e., m = 1, 2, ···, M); and f_m (X_i) is the predicted response of the m^th CART. After the predictions are obtained, an objective function (L) is used to assess their accuracy. During the development of the XGBoost model, L can be expressed by,¹⁹

(2)

L = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

As given in Eq. (2), n is the total number of samples in the dataset (i.e., i = 1, 2, ···, n), K is the total number of trees (as illustrated in Fig. 1) (i.e., k = 1, 2, ···, K), and L consists of two parts: (i) the loss function $l (y_{i}, {\hat{y}}_{i})$ and (ii) the regularization term $Ω$, which can be represented by,

(3)

Ω (f) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} ω_{j}^{2}

where T is the number of leaf nodes of a CART (i.e., j = 1, 2, ···, T); ω_j is the value (weight) of the j^th leaf node; and γ and λ are regularization hyperparameters that penalize the number of leaves and the magnitude of the leaf values, respectively. Training the XGBoost model amounts to minimizing L. This optimization is performed additively, step by step: at each step, a new CART is built on top of the existing CARTs so that L is further reduced. Thus, the objective function at the t^th step can be determined by,

(4)

\begin{aligned} L^{(t)} & = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(t)}) + \sum_{k = 1}^{t} Ω (f_{k}) \\ & = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (X_{i})) + \sum_{k = 1}^{t - 1} Ω (f_{k}) + Ω (f_{t}) \end{aligned}
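A minimal Python sketch of Eqs. (1)–(3) is given below. It assumes a squared-error loss l(y, ŷ) = ½(y − ŷ)², which this section does not specify, and represents each CART by a hypothetical `Tree` class holding a leaf-assignment function and its leaf values ω_j. It illustrates the equations only and is not the XGBoost library's internal implementation.

```python
from typing import Callable, List, Sequence


class Tree:
    """Hypothetical stand-in for one CART: a leaf-assignment function plus leaf values."""

    def __init__(self, leaf_of: Callable[[Sequence[float]], int], weights: List[float]):
        self.leaf_of = leaf_of  # maps a feature vector X to a leaf index j
        self.weights = weights  # omega_j, j = 0..T-1

    def predict(self, x: Sequence[float]) -> float:
        return self.weights[self.leaf_of(x)]  # f_m(X) = omega of the leaf X falls into


def ensemble_predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Eq. (1): the prediction is the sum of the outputs of all trees."""
    return sum(t.predict(x) for t in trees)


def regularization(tree: Tree, gamma: float, lam: float) -> float:
    """Eq. (3): gamma * T + 0.5 * lambda * sum(omega_j ** 2)."""
    return gamma * len(tree.weights) + 0.5 * lam * sum(w * w for w in tree.weights)


def objective(trees, X, y, gamma, lam):
    """Eq. (2) with an assumed squared-error loss l = 0.5 * (y - y_hat) ** 2."""
    loss = sum(0.5 * (yi - ensemble_predict(trees, xi)) ** 2 for xi, yi in zip(X, y))
    return loss + sum(regularization(t, gamma, lam) for t in trees)


# Example: two depth-1 trees (stumps) on two features, with made-up leaf values.
tree1 = Tree(lambda x: 0 if x[0] < 0.5 else 1, [1.0, 3.0])
tree2 = Tree(lambda x: 0 if x[1] < 0.5 else 1, [0.5, -0.5])
X = [[0.0, 0.0], [1.0, 1.0]]
y = [1.5, 2.5]
print(ensemble_predict([tree1, tree2], X[0]))
print(objective([tree1, tree2], X, y, gamma=1.0, lam=1.0))
```

Here both samples are fitted exactly, so the loss term is zero and the printed objective (9.25) comes entirely from the regularization of the two trees, showing how γ and λ penalize model complexity even when the training error vanishes.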

During the t^th step, the first (t − 1) CARTs are already known, so their regularization terms can be treated as a constant c. Thus, the objective function L^(t) can be further simplified as,

(5)

L^{(t)} = \sum_{i = 1}^{n} l [y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (X_{i})] + Ω (f_{t}) + c

In addition, a second-order Taylor expansion of the loss around ${\hat{y}}_{i}^{(t - 1)}$ can be used to approximate L^(t), so Eq. (5) can be further transformed into Eq. (6).

(6)

\begin{aligned} L^{(t)} & \approx \sum_{i = 1}^{n} [l (y_{i}, {\hat{y}}_{i}^{(t - 1)}) + g_{i} f_{t} (X_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (X_{i})] \\ & + Ω (f_{t}) + c \end{aligned}

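where g_i and h_i are the first- and second-order derivatives of the loss function with respect to ${\hat{y}}_{i}^{(t - 1)}$: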

(7)

g_{i} = \frac{\partial l [y_{i}, {\hat{y}}_{i}^{(t - 1)}]}{\partial {\hat{y}}_{i}^{(t - 1)}}

(8)

h_{i} = \frac{\partial^{2} l [y_{i}, {\hat{y}}_{i}^{(t - 1)}]}{\partial {[{\hat{y}}_{i}^{(t - 1)}]}^{2}}
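For a concrete case, assume (purely for illustration, since this section does not state the loss used in the study) the squared-error loss l(y, ŷ) = ½(y − ŷ)². Eqs. (7) and (8) then give g_i = ŷ_i^(t−1) − y_i and h_i = 1, and because this loss is quadratic, the second-order expansion in Eq. (6) is exact. The Python sketch below checks both statements numerically with central finite differences; the P_max values in the example are made up.

```python
def squared_error(y, y_hat):
    """Assumed loss: l(y, y_hat) = 0.5 * (y - y_hat) ** 2."""
    return 0.5 * (y - y_hat) ** 2


def grad_hess(y, y_prev):
    """Eqs. (7)-(8) for the squared-error loss: g = y_prev - y, h = 1."""
    return y_prev - y, 1.0


def finite_difference(loss, y, y_prev, eps=1e-3):
    """Central-difference estimates of the first and second derivatives."""
    up, mid, down = loss(y, y_prev + eps), loss(y, y_prev), loss(y, y_prev - eps)
    return (up - down) / (2 * eps), (up - 2 * mid + down) / eps ** 2


def taylor_term(loss, y, y_prev, f):
    """Bracketed term of Eq. (6): l(y, y_prev) + g * f + 0.5 * h * f ** 2."""
    g, h = grad_hess(y, y_prev)
    return loss(y, y_prev) + g * f + 0.5 * h * f ** 2


# Made-up example (kN): measured P_max 900, current prediction 850, new tree adds 20.
y, y_prev, f = 900.0, 850.0, 20.0
print(grad_hess(y, y_prev))  # (-50.0, 1.0)
print(finite_difference(squared_error, y, y_prev))
print(taylor_term(squared_error, y, y_prev, f), squared_error(y, y_prev + f))
```

For a non-quadratic loss the last two printed numbers would differ, which is why Eq. (6) is written as an approximation in general.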

Moreover, the only requirement on the loss function l(·) is that it be twice differentiable.¹⁹ Additionally, because each input X_i is mapped to a leaf node of a CART, f_k (X_i) can be represented by,

(9)

f_{k} (X_{i}) = ω_{q (X_{i})}, \quad ω \in R^{T}, \quad q : R^{d} \to \{1, 2, \cdots, T\}

where q(X_i) is the mapping function that assigns X_i to a leaf node; ω is the vector of leaf node values; d is the number of attributes of the input X_i; and R^T and R^d are the T-dimensional and d-dimensional real vector spaces, respectively. Substituting Eqs. (3) and (7)–(9) into Eq. (6), L^(t) can be determined by,

(10)

\begin{aligned} L^{(t)} & \approx \sum_{i = 1}^{n} [g_{i} ω_{q (X_{i})} + \frac{1}{2} h_{i} ω_{q (X_{i})}^{2}] + γ T + \frac{1}{2} λ \sum_{j = 1}^{T} ω_{j}^{2} + c \\ & = \sum_{j = 1}^{T} [(\sum_{i \in I_{j}} g_{i}) ω_{j} + \frac{1}{2} (\sum_{i \in I_{j}} h_{i} + λ) ω_{j}^{2}] + γ T + c \end{aligned}

where $I_{j} = \{ i \mid q (X_{i}) = j \}$ is the set of samples assigned to the j^th leaf node, and the terms $l (y_{i}, {\hat{y}}_{i}^{(t - 1)})$, which do not depend on f_t, are absorbed into the constant c.

Letting $G_{j} = \sum_{i \in I_{j}} g_{i}$ and $H_{j} = \sum_{i \in I_{j}} h_{i}$ , Eq. (10) can be further simplified as,

(11)

L^{(t)} = \sum_{j = 1}^{T} [G_{j} ω_{j} + \frac{1}{2} \cdot (H_{j} + λ) ω_{j}^{2}] + γ T + c

Setting the first derivative of Eq. (11) with respect to each ω_j to zero and substituting the resulting leaf values back into Eq. (11), L_min can be determined by Eq. (12).

(12)

L_{min} = - \frac{1}{2} \sum_{j = 1}^{T} \frac{G_{j}^{2}}{H_{j} + λ} + γ T + c

Additionally, L_min can be achieved when ω_j is represented by,

(13)

ω_{j} = - \frac{G_{j}}{H_{j} + λ}
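The Python sketch below evaluates Eqs. (11)–(13) for one fixed tree structure. It assumes squared-error gradients (g_i = ŷ_i^(t−1) − y_i, h_i = 1) and a hypothetical leaf assignment q(X_i) on made-up data; it computes G_j and H_j, the optimal leaf values of Eq. (13), and L_min from Eq. (12) with the constant c dropped, then confirms that Eq. (11) evaluated at those leaf values reproduces L_min.

```python
def leaf_sums(leaf_index, g, h, n_leaves):
    """G_j and H_j: sums of g_i and h_i over the samples I_j in leaf j."""
    G, H = [0.0] * n_leaves, [0.0] * n_leaves
    for j, gi, hi in zip(leaf_index, g, h):
        G[j] += gi
        H[j] += hi
    return G, H


def optimal_weights(G, H, lam):
    """Eq. (13): omega_j = -G_j / (H_j + lambda)."""
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]


def objective_eq11(w, G, H, lam, gamma):
    """Eq. (11) without the constant c, for arbitrary leaf values w."""
    leaf_terms = sum(Gj * wj + 0.5 * (Hj + lam) * wj ** 2 for wj, Gj, Hj in zip(w, G, H))
    return leaf_terms + gamma * len(G)


def min_objective_eq12(G, H, lam, gamma):
    """Eq. (12) without c: -0.5 * sum(G_j^2 / (H_j + lambda)) + gamma * T."""
    return -0.5 * sum(Gj ** 2 / (Hj + lam) for Gj, Hj in zip(G, H)) + gamma * len(G)


# Made-up data: four samples, previous prediction 20 for all, two leaves.
y = [10.0, 12.0, 30.0, 34.0]
y_prev = [20.0] * 4
g = [yp - yi for yi, yp in zip(y, y_prev)]  # squared-error gradients
h = [1.0] * 4                               # squared-error second derivatives
leaf_index = [0, 0, 1, 1]                   # hypothetical q(X_i)
lam, gamma = 1.0, 0.0

G, H = leaf_sums(leaf_index, g, h, n_leaves=2)
w_star = optimal_weights(G, H, lam)
print(G, H, w_star)
print(min_objective_eq12(G, H, lam, gamma), objective_eq11(w_star, G, H, lam, gamma))
```

With λ = 1 the value of leaf 0 is −6 rather than the unregularized mean residual of −9 (that is, −G_0/H_0), illustrating how λ shrinks leaf values toward zero.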

Explaining the XGBoost model using the SHAP technique

Because the mechanisms underlying ML models are difficult to interpret and explain, these models are usually considered "black boxes." Both the interpretability and explainability of the models are important for understanding the complicated nonlinear relationships between the input and output variables of ML algorithms.¹⁸,⁵¹ Here, interpretability is usually defined as the ability to explain or to provide meaning in understandable terms to a human.¹⁸ Explainability, in turn, is associated with the notion of explanation as an interface between humans and a decision-maker that is, at the same time, both an accurate proxy of the decision-maker and comprehensible to humans.¹⁸ Explanations supporting the output of an ML model are crucial, especially in civil engineering. The Shapley Additive exPlanations (SHAP) technique proposed by Lundberg and Lee⁶³ is an explainable artificial intelligence (XAI) tool that provides a unified approach to explaining the output of any ML model. It aims to provide local explainability by building an additive explanation (surrogate) model of the original ML model. SHAP has a fast implementation for tree-based models and is widely used to interpret ML models.¹⁸ Thus, in this study, the SHAP technique is employed to interpret and explain the developed XGBoost predictive model.

The SHAP algorithm calculates the contribution of each input variable to the prediction for each observation. SHAP values are based on conditional expectations and the Shapley value from cooperative game theory, and they quantify how each feature affects the prediction. In cooperative game theory, the Shapley value distributes the total gain or payoff of a game among the players according to their contributions to the final outcome.¹⁸ To generate an interpretable and explainable predictive model, the SHAP technique employs additive feature attribution, i.e., the explanation model is defined as a linear function of simplified (binary) input variables. Assuming a model with input variables x = (x₁, x₂, …, x_N), where N is the number of input variables, the explanation model g(x′) with simplified input x′ for an original model f(x) can be expressed as¹⁸,⁵¹,⁵⁶

(14)

f (x) = g (x^{'}) = φ_{0} + \sum_{j = 1}^{N} φ_{j} x_{j}^{'}

where φ₀ is a constant representing the model output when all input variables are missing, and φ_j is the contribution of the j^th feature to the model output, i.e., the computed SHAP value. The inputs x and x′ are related through a mapping function, x = h_x(x′). Eq. (14) can be illustrated by Fig. 2, in which φ₀, φ₁, φ₂, and φ₃ increase the predicted value of g(x′), while φ₄ decreases it. According to Lundberg and Lee,⁶³ Eq. (14) has a unique solution that satisfies three desirable properties: (i) local accuracy, (ii) missingness, and (iii) consistency.⁵¹,⁵⁶ Specifically, local accuracy requires the explanation model g(x′) to match the output of f(·) for the simplified input x′ corresponding to x = h_x(x′), so that the model output equals the sum of the feature attributions. Missingness ensures that no importance is assigned to missing features: x_j′ = 0 implies φ_j = 0, where φ_j is the Shapley value. Consistency ensures that if a model changes so that the marginal contribution of a feature increases or stays the same, the attribution assigned to that feature does not decrease. Letting z′ \setminus j denote setting z_j′ = 0, consistency states that for any two models f and f′, if $f_{x}^{'} (z^{'}) - f_{x}^{'} (z^{'} \setminus j) \geq f_{x} (z^{'}) - f_{x} (z^{'} \setminus j)$ for all inputs z′, then $φ_{j} (f^{'}, x) \geq φ_{j} (f, x)$. The only attribution that satisfies these properties is given by,⁵¹

(15)

φ_{j} (f, x) = \sum_{z^{'} \subseteq x^{'}} \frac{| z^{'} |! \cdot (N - | z^{'} | - 1)!}{N!} \cdot [f_{x} (z^{'}) - f_{x} (z^{'} \setminus j)]

where $| z^{'} |$ is the number of non-zero entries in z′, and the sum runs over all z′ whose non-zero entries are a subset of those of x′. Lundberg and Lee⁶³ defined SHAP values as the solution to Eq. (15) with $f_{x} (z^{'}) = f (h_{x} (z^{'})) = E [f (z) | z_{S}]$, where S is the set of non-zero indices in z′.
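To make Eq. (15) concrete, the Python sketch below computes exact Shapley values by enumerating every feature subset, written in the classical form that sums over subsets S not containing feature j. As a simplifying assumption, a "missing" feature is replaced by a fixed baseline value rather than by the conditional expectation E[f(z) | z_S], and the toy model and inputs are hypothetical. Exhaustive enumeration needs on the order of 2^N model evaluations, which is why an efficient algorithm such as Tree SHAP is preferable for real tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset S of the other features.

    Assumed value function: v(S) = model(z), where z_k = x_k for k in S and
    z_k = baseline_k otherwise (a simple stand-in for E[f(z) | z_S]).
    """
    n_features = len(x)

    def value(subset):
        z = [x[k] if k in subset else baseline[k] for k in range(n_features)]
        return model(z)

    phi = []
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        total = 0.0
        for size in range(n_features):
            # Shapley weight |S|! (N - |S| - 1)! / N!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for combo in combinations(others, size):
                subset = set(combo)
                total += weight * (value(subset | {j}) - value(subset))
        phi.append(total)
    phi0 = value(set())  # base value: every feature "missing"
    return phi0, phi


def toy_model(z):
    """Hypothetical model: linear in z[0], interaction z[1] * z[2], ignores z[3]."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0, 5.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi0, phi = shapley_values(toy_model, x, baseline)
print(phi0, [round(p, 6) for p in phi])
print(phi0 + sum(phi), toy_model(x))  # local accuracy, up to floating-point rounding
```

The interaction term z[1]·z[2] = 6 is split equally between the two participating features (3 each), the feature the model ignores receives zero attribution, and φ₀ plus the attributions recovers f(x), as required by Eq. (14).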

Based on the above, the SHAP technique can provide both local and global explanations. SHAP values can be approximated by various methods, such as Kernel SHAP, Deep SHAP, and Tree SHAP.¹⁸ Among these, Tree SHAP, a version of SHAP for tree-based ML models (e.g., decision trees, random forests (RF), and gradient-boosted trees such as XGBoost and CatBoost), is used in this study. Given a tree-based model and an input dataset with one row per observation and one column per feature, Tree SHAP produces a matrix of SHAP values of the same size. SHAP interaction values likewise guarantee consistency in explaining the effects of feature interactions on individual predictions. Two unique advantages of SHAP values are their global and local interpretability. Unlike conventional feature-importance measures for ML models, the SHAP technique can identify whether the contribution of each input feature is positive or negative. Moreover, each observation receives its own set of SHAP values. Thus, SHAP can help interpret the model globally as well as locally. More detailed descriptions and applications of the SHAP technique in civil engineering practice can be found in previous studies.¹⁸,⁵¹,⁵⁶
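As an illustration of how a SHAP value matrix (one row per observation, one column per feature) supports both views, the Python sketch below ranks features globally by their mean absolute SHAP value, a common global-importance summary, and reports the mean signed value as a rough indication of the direction of each feature's effect. The feature names reuse Table 3 notation, but the SHAP values are invented for demonstration and are not results of this study. In practice the matrix would come from a Tree SHAP implementation, for example `shap.TreeExplainer(model).shap_values(X)` in the `shap` Python package.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean |SHAP| over observations (global interpretability).

    Each row of shap_matrix holds one observation's SHAP values (local
    interpretability); the mean signed value shows whether a feature tends to
    raise (+) or lower (-) the predicted output on average.
    """
    n_obs = len(shap_matrix)
    summary = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in shap_matrix]
        mean_abs = sum(abs(v) for v in column) / n_obs
        mean_signed = sum(column) / n_obs
        summary.append((name, mean_abs, mean_signed))
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Invented SHAP values (kN) for three observations; illustrative only.
features = ["A_g", "t_frp", "eta"]
shap_matrix = [
    [300.0, 120.0, -40.0],
    [-200.0, 80.0, -60.0],
    [250.0, 100.0, -50.0],
]
for name, mean_abs, mean_signed in global_importance(shap_matrix, features):
    print(f"{name:6s} mean|SHAP| = {mean_abs:7.1f}  mean SHAP = {mean_signed:+7.1f}")
```

Ranking by the mean absolute value keeps a feature with large contributions of both signs (A_g in this made-up matrix) at the top, whereas the mean signed value alone would understate its importance.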

Determination of the XGBoost Predictive Model

Experimental database

To establish the XGBoost predictive model, a comprehensive database of experimental tests on 285 FRP-confined corroded RC columns was compiled from 16 previous studies (231 specimens) and from tests conducted by the authors (54 specimens), as summarized in Tables 1 and 2, respectively. Of the collected columns, 202 were circular and 83 were square or rectangular. Additionally, 225 and 60 columns were tested under concentric and eccentric compression, respectively. As summarized in Table 3, the experimental database included 20 critical input parameters. In addition, Table 4 summarizes the statistical information. As seen from Table 4, the tensile strength of FRP (F_frp), the load-carrying capacity (P_max), and the gross cross-sectional area (A_g) of the collected columns exhibited the largest absolute variations (standard deviations).

Table 1. Summary of the existing studies used to develop the experimental database (number of specimens by type of loading)
No.	References	Concentric compression	Eccentric compression
1	Bae and Belarbi¹⁶	7	—
2	Li et al.³⁷	16	—
3	Tastani and Pantazopoulou⁶⁶	11	—
4	Jayaprakash et al.⁶⁷	—	15
5	Chotickai et al.⁶⁸	—	12
6	Maaddawy⁶⁹	—	12
7	Radhi et al.⁷⁰	8	—
8	Nematzadeh et al.⁷¹	—	9
9	Shaikh and Alishahi⁷²	4	12
10	Bai et al.⁷³	6	—
11	Shan⁷⁴	34	—
12	Yu⁷⁵	16	—
13	Li et al.⁷⁶	3	—
14	Wen⁷⁷	10	—
15	Chen⁷⁸	28	—
16	Gao⁷⁹	28	—
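The specimen counts can be cross-checked with a few lines of Python: summing Table 1 gives the 231 literature specimens, and adding the 54 specimens of Table 2 gives the 285-column database. The split of 225 concentric and 60 eccentric specimens stated above is reproduced only if all 54 specimens in Table 2 were tested under concentric compression, which is an assumption in this sketch (Table 2 does not list the eccentricity).

```python
# (concentric, eccentric) specimen counts transcribed from Table 1.
TABLE_1 = {
    "Bae and Belarbi (16)": (7, 0),
    "Li et al. (37)": (16, 0),
    "Tastani and Pantazopoulou (66)": (11, 0),
    "Jayaprakash et al. (67)": (0, 15),
    "Chotickai et al. (68)": (0, 12),
    "Maaddawy (69)": (0, 12),
    "Radhi et al. (70)": (8, 0),
    "Nematzadeh et al. (71)": (0, 9),
    "Shaikh and Alishahi (72)": (4, 12),
    "Bai et al. (73)": (6, 0),
    "Shan (74)": (34, 0),
    "Yu (75)": (16, 0),
    "Li et al. (76)": (3, 0),
    "Wen (77)": (10, 0),
    "Chen (78)": (28, 0),
    "Gao (79)": (28, 0),
}
AUTHORS_SPECIMENS = 54  # Table 2; assumed here to be all concentric

lit_concentric = sum(c for c, _ in TABLE_1.values())
lit_eccentric = sum(e for _, e in TABLE_1.values())
total_concentric = lit_concentric + AUTHORS_SPECIMENS
total = lit_concentric + lit_eccentric + AUTHORS_SPECIMENS
print(len(TABLE_1), lit_concentric + lit_eccentric, total_concentric, lit_eccentric, total)
```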

Table 2. Detailed information and experimental results of the specimens tested in this study
No.	Specimen ID	D (mm)	H (mm)	b (mm)	h (mm)	A_g (mm²)	Circular	ρ	f_c (MPa)	FRP_type	N_frp	t_frp (mm)	E_frp (GPa)	F_frp (MPa)	L_type	ρ_s (%)	E_bar (GPa)	F_bar (MPa)	η (%)	P_max (kN)
1	A0-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	0	0.167	265	4525	4NO.12	2.56	201	445	0	513.1
2	A0-2	150	300	–	–	17662.5	Yes	1	25.3	CFRP	0	0.167	265	4525	4NO.12	2.56	201	445	0	525.1
3	A0-3	150	300	–	–	17662.5	Yes	1	25.3	CFRP	0	0.167	265	4525	4NO.12	2.56	201	445	0	482.6
4	A12.5-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	16.32	319.5
5	A12.5-2	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	14.87	388.2
6	A12.5-3	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	14.22	426.5
7	AF0-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	0	1094.7
8	AF0-2	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	0	1044.5
9	AF0-3	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	0	1023.4
10	AF5-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	8.38	1068.9
11	AF5-2	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	7.11	1064.8
12	AF5-3	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	7.34	1062.5
13	AF12.5-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	14.74	972.5
14	AF12.5-2	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	15.67	946.2
15	AF12.5-3	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	15.01	944.8
16	AF20-1	150	300	–	–	17662.5	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.12	2.56	201	445	23.24	802.5
17	AF20-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	445	24.87	892.9
18	AF20-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	445	23.05	906.5
19	B0-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	903.4
20	B0-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	929.4
21	B0-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	808.7
22	B12.5-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	15.44	746.5
23	B12.5-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	15.13	764.3
24	B12.5-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	16.07	662.7
25	BF0-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	1801.1
26	BF0-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	1688.4
27	BF0-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	0	1724.8
28	BF5-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	7.12	1685.5
29	BF5-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	7.42	1674.5
30	BF5-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	8.14	1443.3
31	BF12.5-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	14.19	1371.2
32	BF12.5-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	15.20	1300.5
33	BF12.5-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	15.56	1306.7
34	BF20-1	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	23.61	1179.0
35	BF20-2	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	24.22	1117.6
36	BF20-3	200	400	–	–	31400.0	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.16	2.56	230	460	23.06	1212.8
37	C0-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	1538.9
38	C0-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	1485.6
39	C0-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	1519.9
40	C12.5-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	14.57	1081.7
41	C12.5-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	13.88	1124.9
42	C12.5-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	13.61	1112.4
43	CF0-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	2350.1
44	CF0-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	2202.2
45	CF0-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	0	2274.1
46	CF5-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	7.24	2261.2
47	CF5-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	7.33	2256.4
48	CF5-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	6.98	2290.9
49	CF12.5-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	14.25	1970.2
50	CF12.5-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	14.06	1999.9
51	CF12.5-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	15.11	1918.3
52	CF20-1	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	23.37	1719.9
53	CF20-2	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	22.49	1636.6
54	CF20-3	250	500	–	–	49087.4	Yes	1	25.3	CFRP	1	0.167	265	4525	4NO.20	2.56	220	455	22.82	1694.5

Table 3. Descriptions and representations of the input/output variables
Variables	Parameters (units)	Notation
Input	Diameter of circular cross-section (mm)	D
	Column height (mm)	H
	Width of rectangular cross-section (mm)	b
	Height of rectangular cross-section (mm)	h
	Column gross cross-sectional area (mm²)	A_g
	Column section type	Section_type
	Corner radius (mm)	r
	Corner radius ratio	ρ
	Compressive strength of concrete (MPa)	f_c
	Type of fiber-reinforced polymer	FRP_type
	Number of FRP layers	N_frp
	Thickness of FRP (mm)	t_frp
	Elastic modulus of FRP (GPa)	E_frp
	Tensile strength of FRP (MPa)	F_frp
	Type of longitudinal reinforcement	L_type
	Longitudinal reinforcement ratio (%)	ρ_s
	Elastic modulus of steel reinforcement (GPa)	E_bar
	Yield strength of steel reinforcement (MPa)	F_bar
	Eccentricity ratio (%)	e_r
	Corrosion rate (%)	η
Output	Load-carrying capacity (kN)	P_max

Table 4. Summary of the statistical information of the input and output variables
Input and output variables	Minimum	Mean	Standard deviation	Maximum
D (mm)	100	158.85	38.26	203
H (mm)	150	504.45	261.44	1375
b (mm)	120	150.96	27.17	200
h (mm)	120	150.96	27.17	200
A_g (mm²)	7850	21750.21	9132.59	40000
r (mm)	0	7.32	15.47	75
ρ	0	0.78	0.35	1
f_c (MPa)	17.7	33.06	7.35	47
t_frp (mm)	0	0.23	0.23	1.68
E_frp (GPa)	0	162.65	111.87	280
F_frp (MPa)	0	2826.66	1664.18	4900
ρ_s (%)	0.89	2.75	1.29	6.79
E_bar (GPa)	199.1	205.80	11.73	237
F_bar (MPa)	210	412.57	82.53	550
e_r (%)	0	0.19	0.52	3.44
η (%)	0	10.02	9.62	51
P_max (kN)	36.9	915.59	628.67	2536.11

Determination of the input variables

Reasonable determination of the input variables is essential for accurately predicting the P_max of FRP-confined corroded RC columns. Thus, the constructed experimental database was comprehensively investigated by determining the correlation coefficient (φ_k) between the variables and its statistical significance.¹⁸ The primary aim of the correlation analysis is to investigate the potential association between the independent input parameters and the output response. The φ_k coefficient was proposed by Baak et al.⁸⁰ and has several advantages: it works consistently between categorical, ordinal, and interval variables, it captures non-linear dependencies, and it reduces to the Pearson correlation coefficient for a bivariate normal distribution. φ_k varies between 0 and 1, where 0 means no association and 1 means complete association. It should be noted that the input variables were simplified before the correlation analysis. For example, D, H, b, and h were represented by A_g, and r was represented by ρ, which is more suitable in practical situations. For the material properties of the FRP composites, FRP_type was represented by E_frp and F_frp, and N_frp was represented by t_frp. Similarly, for the steel bars, L_type was represented by ρ_s. Fig. 3 shows the resulting φ_k and statistical significance matrices of the input parameters.

The statistical significance is used to judge the reliability and relevance of each φ_k value, because a large correlation coefficient is not necessarily statistically significant, whereas a small one can be highly significant. The significance of each correlation was evaluated with a hybrid method combining Monte Carlo simulations (MCS) and adjustments of Pearson's χ² test.¹⁸,⁸⁰ The p-value of the hypothesis test is converted into a normal Z-score, and the significance is defined as follows:

(16)

Z = Φ^{-1} (1 - p)

where Z is the significance expressed in one-sided Gaussian standard deviations, p is the p-value of the hypothesis test, and $Φ^{-1}$ is the quantile function (inverse cumulative distribution function) of the standard Gaussian distribution, whose cumulative distribution function Φ is:

(17)

Φ (z) = \frac{1}{\sqrt{2 π}} \int_{-\infty}^{z} e^{-t^{2}/2} \, dt
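The conversion between a one-sided p-value and the significance Z in Eqs. (16) and (17) can be reproduced with the Python standard library. This is a minimal sketch, not the authors' implementation: it only maps a given p-value to Z and back, while the p-values themselves come from the MCS/adjusted χ² procedure of Baak et al., which is not reproduced here (in Python, the open-source `phik` package provides φ_k and significance matrices). The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def std_normal_cdf(z: float) -> float:
    """Eq. (17): Phi(z), written with erfc so that far tails keep their precision."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def significance_from_p(p: float) -> float:
    """Eq. (16): Z = Phi^{-1}(1 - p), evaluated as -Phi^{-1}(p).

    Both forms are equal by the symmetry of the Gaussian, but 1 - p rounds to
    exactly 1.0 in double precision once p drops below roughly 1e-16, which
    would make Phi^{-1}(1 - p) undefined; -Phi^{-1}(p) avoids that.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -_STD_NORMAL.inv_cdf(p)


def p_from_significance(z: float) -> float:
    """Inverse of Eq. (16): the one-sided p-value 1 - Phi(z) = Phi(-z)."""
    return std_normal_cdf(-z)


if __name__ == "__main__":
    # Significance levels of the order reported for Fig. 3.
    for z in (1.96, 4.8, 8.44):
        print(f"Z = {z:5.2f}  <->  p = {p_from_significance(z):.3e}")
```

Computing Z as -Φ⁻¹(p) rather than literally Φ⁻¹(1 − p) matters here, because significances around 8 to 9 correspond to p-values far below double-precision resolution near 1.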

As illustrated in Fig. 3, a darker color indicates a more pronounced correlation. For the selected variables, P_max of the specimens exhibited a strong correlation with f_c (φ_k = 0.81, significance = 8.44), A_g (φ_k = 0.77, significance = 8.45), t_frp (φ_k = 0.76, significance = 5.87), E_bar (φ_k = 0.76, significance = 6.86), F_frp (φ_k = 0.75, significance = 6.93), and F_bar (φ_k = 0.75, significance = 9.38). P_max also correlated well with ρ_s (φ_k = 0.71, significance = 7.38), e (φ_k = 0.71, significance = 7.17), E_frp (φ_k = 0.67, significance = 7.13), and ρ (φ_k = 0.63, significance = 4.8). Fig. 4 shows the linear regression analysis results of P_max of the columns against the different input variables. As seen from Fig. 4, P_max of the specimens tended to increase with increasing A_g, ρ, f_c, t_frp, E_frp, F_frp, E_bar, and F_bar, but to decrease with increasing ρ_s, e, and η. Based on these preliminary correlation analyses, the following function was adopted for the XGBoost model predicting the P_max of FRP-confined corroded RC columns.

(18)

P_{m a x} = f (A_{g}, ρ, f_{c}, t_{f r p}, E_{f r p}, F_{f r p}, ρ_{s}, E_{b a r}, F_{b a r}, e, η)
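The simplification from the 20 raw variables of Table 3 to the 11 arguments of Eq. (18) can be sketched as a small preprocessing function. This is a hypothetical illustration under stated assumptions, not the authors' code: the field names are invented, ρ is assumed to be 2r/b for rectangular sections (the ratio appearing in Table 7) and 1 for circular ones, and t_frp is assumed to be the number of layers times a nominal ply thickness. H, FRP_type, and L_type simply do not enter Eq. (18).

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RawSpecimen:
    """Raw database entry (hypothetical field names; units follow Table 3)."""
    D: Optional[float]   # diameter, mm (None for rectangular sections)
    b: Optional[float]   # width, mm (None for circular sections)
    h: Optional[float]   # height of the section, mm
    r: float             # corner radius, mm
    f_c: float           # concrete compressive strength, MPa
    n_frp: int           # number of FRP layers
    t_ply: float         # nominal thickness of one FRP ply, mm (assumed input)
    E_frp: float         # GPa
    F_frp: float         # MPa
    rho_s: float         # longitudinal reinforcement ratio, %
    E_bar: float         # GPa
    F_bar: float         # MPa
    e: float             # eccentricity ratio, %
    eta: float           # corrosion rate, %


def to_model_features(s: RawSpecimen) -> Dict[str, float]:
    """Collapse the raw inputs into the 11 arguments of Eq. (18)."""
    if s.D is not None:
        A_g = math.pi * s.D ** 2 / 4.0      # circular gross area
        rho = 1.0                           # assumption: circles enter with rho = 1
    else:
        A_g = s.b * s.h                     # rectangle; corner rounding neglected
        rho = 2.0 * s.r / s.b               # assumption: rho = 2r/b, as in Table 7
    return {
        "A_g": A_g, "rho": rho, "f_c": s.f_c,
        "t_frp": s.n_frp * s.t_ply,         # assumption: layers folded into thickness
        "E_frp": s.E_frp, "F_frp": s.F_frp,
        "rho_s": s.rho_s, "E_bar": s.E_bar, "F_bar": s.F_bar,
        "e": s.e, "eta": s.eta,
    }
```

For a circular column with D = 250 mm, the sketch returns A_g ≈ 49087.4 mm², matching the gross area listed for the 250 mm specimens in Table 2.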

Model training and performance evaluations

In this study, the experimental database was randomly divided into two parts: (1) the training dataset and (2) the testing dataset. Specifically, 80% and 20% of the specimens were used to construct the training and testing datasets, respectively. The former was employed to train the model and tune its parameters, whereas the latter was reserved for model assessment. Thus, the training and testing datasets of the developed XGBoost model contained 228 and 57 FRP-confined corroded RC columns, respectively. After training, the effectiveness and capability of the XGBoost model were assessed using four measures: (i) the coefficient of determination (R²); (ii) the root mean square error (RMSE); (iii) the mean absolute error (MAE); and (iv) the mean absolute percentage error (MAPE).¹⁸,¹⁹ Their mathematical expressions are as follows:

(19)

R^{2} (P_{max}, {\hat{P}}_{max}) = 1 - \frac{\sum_{i = 1}^{m} {(P_{max} (i) - {\hat{P}}_{max} (i))}^{2}}{\sum_{i = 1}^{m} {(P_{max} (i) - {\bar{P}}_{max})}^{2}}

(20)

RMSE (P_{max}, {\hat{P}}_{max}) = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(P_{max} (i) - {\hat{P}}_{max} (i))}^{2}}

(21)

MAE (P_{max}, {\hat{P}}_{max}) = \frac{1}{m} \sum_{i = 1}^{m} \left| P_{max} (i) - {\hat{P}}_{max} (i) \right|

(22)

MAPE (P_{max}, {\hat{P}}_{max}) = \frac{100}{m} \sum_{i = 1}^{m} \left| \frac{P_{max} (i) - {\hat{P}}_{max} (i)}{P_{max} (i)} \right|

(23)

{\bar{P}}_{max} = \frac{1}{m} \sum_{i = 1}^{m} P_{max} (i)

where m is the number of data points; P_max and ${\hat{P}}_{max}$ are the experimental and predicted load-carrying capacities of the columns, respectively; and ${\bar{P}}_{max}$ is the mean value of the test results, determined by Eq. (23). Among these statistical measures, a larger R² (i.e., closer to 1.0) and smaller values of RMSE, MAE, and MAPE indicate higher prediction accuracy.
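Eqs. (19) to (23) translate directly into code. The following is a minimal, dependency-free sketch (function names are illustrative, not from the original study); RMSE and MAE carry the units of the inputs (kN here), and MAPE assumes no experimental value is zero, which holds for the database (minimum P_max = 36.9 kN).

```python
import math
from typing import Sequence


def _length(y: Sequence[float], y_hat: Sequence[float]) -> int:
    """Return m after checking that both sequences are non-empty and aligned."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("need two non-empty sequences of equal length")
    return len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    m = _length(y, y_hat)
    y_bar = sum(y) / m                                        # Eq. (23)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot                              # Eq. (19)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    m = _length(y, y_hat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / m)     # Eq. (20)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    m = _length(y, y_hat)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / m                  # Eq. (21)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    m = _length(y, y_hat)
    return 100.0 / m * sum(abs((a - b) / a) for a, b in zip(y, y_hat))    # Eq. (22), %
```

For example, with experimental values [100, 200, 300] kN and predictions [110, 190, 300] kN, these functions give R² = 0.99, MAE ≈ 6.67 kN, RMSE ≈ 8.16 kN, and MAPE = 5%.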

Model tuning and cross-validations

The performance and effectiveness of the XGBoost model can be enhanced by finding the optimal combination of hyperparameter values. Grid search, random search, and Bayesian optimization are among the most common techniques for tuning machine-learning models.¹⁸ In this study, the hyperparameters were optimized through k-fold cross-validation combined with randomized and grid searches: the initial hyperparameters were identified by the randomized search and then further refined by the grid search. In the k-fold cross-validation, the training dataset was randomly divided into k folds, of which (k − 1) folds were used for model training and the remaining fold for performance assessment. This process was repeated k times so that each of the k subsamples was used exactly once as the validation data. Ten-fold cross-validation was used in this study; hence, in each iteration, 90% of the training dataset was used to fit the model and the remaining 10% to assess its performance. The search ranges and the best hyperparameters obtained from the randomized and grid searches are summarized in Table 5.
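The 80/20 split and the 10-fold partition of the training set described above can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline; the seeds and function names are hypothetical, and in practice scikit-learn's `train_test_split` and `KFold` serve the same purpose.

```python
import random
from typing import Iterator, List, Tuple


def train_test_split_indices(n: int, test_frac: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into (training, testing) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold(indices: List[int], k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield k (fit, validation) pairs; every index is validated exactly once."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    base, extra = divmod(len(idx), k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more
        val = idx[start:start + size]
        fit = idx[:start] + idx[start + size:]
        start += size
        yield fit, val
```

With n = 285 and a test fraction of 0.2, the split yields 228 training and 57 testing specimens. Ten folds of 228 specimens cannot all be equal, so eight folds hold 23 specimens and two hold 22, and each fit set contains 205 or 206 specimens, i.e. about 90% of the training data.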

Table 5. Search ranges and best hyperparameter values obtained from the randomized and grid searches with cross-validation
Hyperparameters	Description	Lower limit	Upper limit	Best hyperparameters
				All data	Concentric	Eccentric
n_estimators	Number of gradient-boosted trees	0	200	150	150	150
max_depth	Maximum tree depth for base learners	1	13	8	7	6
learning_rate	Step size shrinkage used in the update to prevent overfitting	0.1	1	0.11	0.25	0.2
subsample	Subsample ratio of the training instances	0	1	1	1	1
colsample_bytree	Subsample ratio of columns when constructing each tree	0	1	1	0.95	0.75
alpha	L1 regularization term on weights	0	1	1	0.85	0.85
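The two-stage tuning (randomized search for a starting point, then a grid search around it) can be illustrated as follows. This is a hedged sketch: the bounds for `max_depth` and `learning_rate` are taken from Table 5, but `toy_cv_rmse` is a hypothetical stand-in for the 10-fold cross-validated RMSE of an XGBoost model, which is not computed here. In practice, integer hyperparameters such as `n_estimators` and `max_depth` would be rounded, and scikit-learn's `RandomizedSearchCV` and `GridSearchCV` implement the two stages.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def random_search(score: Callable[[Params], float], bounds: Bounds,
                  n_iter: int, seed: int = 0) -> Params:
    """Stage 1: sample n_iter points uniformly inside the bounds; keep the lowest score."""
    rng = random.Random(seed)
    best, best_score = {}, float("inf")
    for _ in range(n_iter):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        s = score(cand)
        if s < best_score:
            best, best_score = cand, s
    return best


def grid_refine(score: Callable[[Params], float], center: Params, bounds: Bounds,
                steps: Params, half_width: int = 2) -> Params:
    """Stage 2: exhaustive grid of +/- half_width steps around the stage-1 optimum."""
    names = list(center)
    axes: List[List[float]] = []
    for k in names:
        lo, hi = bounds[k]
        pts = (center[k] + i * steps[k] for i in range(-half_width, half_width + 1))
        axes.append([min(max(v, lo), hi) for v in pts])   # clip to the search range
    grid = (dict(zip(names, combo)) for combo in itertools.product(*axes))
    return min(grid, key=score)


def toy_cv_rmse(p: Params) -> float:
    """Hypothetical stand-in for the 10-fold cross-validated RMSE of one setting."""
    return (p["max_depth"] - 7.0) ** 2 + 100.0 * (p["learning_rate"] - 0.2) ** 2
```

Because the grid always contains the stage-1 optimum itself, the refined setting can only match or improve on the randomized-search result.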

Prediction Results and Discussions

Performance evaluation of the XGBoost model

As introduced in Sect. 2, the XGBoost algorithm builds trees sequentially. In this regard, a single decision tree from the trained XGBoost model is presented in Fig. 5. As shown in Fig. 5, the root node was A_g, and the second-layer nodes were e and t_frp. These observations were consistent with the correlation analyses presented in Sect. 3. In addition, the regression errors and residuals of the P_max predicted by the XGBoost model were obtained, and the predicted P_max is illustrated in Fig. 6. As seen from this figure, the XGBoost-predicted P_max was generally close to the experimental results, with an R² of 0.994. Thus, the developed XGBoost model could provide acceptable predictions of P_max for FRP-confined corroded RC columns.

Table 6 summarizes the performance metrics of the XGBoost predictions on the training and testing datasets for the three models (all data, concentric, and eccentric). As seen from Table 6, the accuracy on the training dataset was generally higher than that on the testing dataset. For example, for the all-data model, the values of R², RMSE, MAE, and MAPE were 0.993, 56 kN, 13.4 kN, and 2% for the training dataset, and 0.978, 122 kN, 70.6 kN, and 7.7% for the testing dataset. This suggests that the developed XGBoost model exhibited good learning and predictive capacity. Additionally, all MAPE values in Table 6 were smaller than 10%, which indicates excellent prediction accuracy.¹⁸ This further demonstrates the effectiveness and accuracy of the developed XGBoost model in determining P_max of FRP-confined corroded RC columns.

Table 6. Performance metrics of XGBoost models
Models	Training dataset				Testing dataset
	R²	RMSE (kN)	MAE (kN)	MAPE (%)	R²	RMSE (kN)	MAE (kN)	MAPE (%)
All data	0.993	56	13.4	2	0.978	122	70.6	7.7
Concentric	0.991	64.6	15	1.3	0.975	99.7	65.9	7.2
Eccentric	0.999	9.57	4.75	3.8	0.984	31.3	18.5	8

Fig. 7 shows the feature importance of the developed XGBoost model, i.e., how strongly each input variable affected the model's predictions. The feature importance was calculated automatically by the XGBoost algorithm. The F scores of the model can be determined by three evaluation criteria: (i) weight, (ii) gain, and (iii) cover.¹⁸ Specifically, the F score is the number of times a feature is used to split the data across all trees (weight), the average gain of the splits that use the feature (gain), or the average coverage of the splits that use the feature, where coverage is the number of samples affected by the split (cover).¹⁸ A higher F score indicates a more important feature. As observed from Fig. 7, the feature importance determined by the different criteria was inconsistent. For example, with the weight score, the five most significant parameters influencing the predicted P_max were η, t_frp, A_g, f_c, and ρ_s; with the gain score, they were A_g, F_bar, e, E_frp, and t_frp; and with the cover score, they were η, E_bar, t_frp, f_c, and F_bar. Such inconsistency among the evaluation criteria could lead to contradictory interpretations and explanations of the model's predictions. It is a known limitation of the built-in feature importance measures of conventional XGBoost models, and similar observations have been reported in several previous studies.¹⁸,⁵¹,⁵⁶,⁸¹ Thus, an additional analysis of feature importance was conducted, as presented in the following subsection.
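Why the three criteria can rank features differently is easiest to see by computing them from a list of split records. In the sketch below, each record is a hypothetical (feature, gain, cover) triple for one internal node; weight counts the splits, while gain and cover are per-split averages, following the definitions above. In the `xgboost` library these scores are reported by `Booster.get_score(importance_type=...)`, where cover is the sum of Hessians reaching a node, which for a squared-error objective equals the number of samples. The records here are invented for illustration, not taken from the trained model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per internal split node: (feature, loss reduction "gain", cover).
Split = Tuple[str, float, float]


def importance(splits: Iterable[Split]) -> Dict[str, Dict[str, float]]:
    """Weight = number of splits; gain and cover = averages over those splits."""
    count: Dict[str, int] = defaultdict(int)
    gain_sum: Dict[str, float] = defaultdict(float)
    cover_sum: Dict[str, float] = defaultdict(float)
    for feat, gain, cover in splits:
        count[feat] += 1
        gain_sum[feat] += gain
        cover_sum[feat] += cover
    return {
        "weight": {f: float(n) for f, n in count.items()},
        "gain": {f: gain_sum[f] / count[f] for f in count},
        "cover": {f: cover_sum[f] / count[f] for f in count},
    }


def ranking(scores: Dict[str, float]) -> List[str]:
    """Features sorted from most to least important under one criterion."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical split records: eta is used often but in low-gain splits,
# A_g once with a large gain, t_frp once on a large share of the samples.
toy_splits = [("eta", 5.0, 20.0), ("eta", 4.0, 15.0), ("eta", 3.0, 180.0),
              ("A_g", 90.0, 60.0), ("t_frp", 30.0, 150.0)]
scores = importance(toy_splits)
```

With these records, η ranks first by weight, A_g by gain, and t_frp by cover, which mirrors the kind of disagreement observed in Fig. 7.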

Explanation of the XGBoost model

Fig. 8 shows the SHAP summary plot and the relative feature importance of the input variables. In the SHAP summary plot, each point represents the SHAP value of a variable for one specimen, and the color represents the feature value from low (blue) to high (red). As shown in Fig. 8, the six most significant parameters influencing the prediction of P_max were A_g, t_frp, E_frp, e, η, and f_c. This observation agreed well with the correlation analysis in Sect. 3.2, indicating that P_max of FRP-confined corroded RC columns mainly depended on these parameters. In addition, as shown in Fig. 8(a), high values of A_g, t_frp, E_frp, and f_c tended to increase the predicted P_max, while low values tended to decrease it. Conversely, high values of e and η tended to decrease the predictions, whereas small values of e and η tended to increase them.

Fig. 9 presents the explanation of the predictions for specimens No. 2 and No. 47, which were tested under concentric and eccentric loads, respectively. In Fig. 9, the red arrows indicate features with positive SHAP values, which push the model's prediction up, whereas the blue arrows denote features with negative SHAP values, which push the prediction down. The base value is the average predicted P_max of the columns over the whole training dataset. As seen from Fig. 9, the XGBoost-predicted P_max of specimens No. 2 and No. 47 were 720.74 and 102.78 kN, respectively, while the corresponding experimental results were 720.60 and 101.65 kN. Hence, the predicted P_max of these two specimens agreed well with the test results, indicating the good predictive performance of the XGBoost model. For specimen No. 2, F_bar and e were the most critical parameters pushing the prediction above the base value, while f_c, t_frp, η, A_g, and E_frp pulled it below the base value. Similarly, for specimen No. 47, E_frp was the most crucial input variable raising the prediction above the base value, whereas A_g, e, t_frp, F_bar, η, and f_c decreased it.
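The force plots in Fig. 9 rely on the additivity of Shapley values: the base value plus the SHAP values of all features equals the model's prediction. The sketch below demonstrates this property by brute-force enumeration of coalitions for a hypothetical three-feature model, with absent features set to a single reference point. This is a simplification, not TreeSHAP: the `shap` library's tree explainer exploits the tree structure and uses the training distribution, so its base value is an average prediction (as in Fig. 9) rather than the model output at one reference point, and this exponential-cost enumeration is only practical for a handful of features.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i
            w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for combo in itertools.combinations(others, size):
                s = set(combo)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model with one interaction term (not the trained XGBoost model)."""
    return 2.0 * z[0] + 3.0 * z[1] - z[0] * z[2]
```

For x = [1, 2, 3] and a zero reference point, the values are [0.5, 6.0, -1.5]: the interaction term -z₀z₂ is shared equally between features 0 and 2, and the base value 0 plus these contributions reproduces the prediction 5.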

Verification of the XGBoost predictive model

To further validate the effectiveness and feasibility of the XGBoost model, the predicted P_max of FRP-confined corroded RC columns was compared with the predictions of empirical models available in previous studies.⁴³–⁴⁵ Although many empirical models exist for predicting P_max of FRP-confined RC columns, this paper selected three representative ones⁴³–⁴⁵ for analysis, which are summarized in Table 7. As seen from Table 7, the effects of steel rebar corrosion on the mechanical performance of the columns are not considered in these empirical models. Therefore, when determining P_max of FRP-confined corroded RC columns, these design models should be modified to account for the corrosion-induced degradation of the mechanical properties of the steel bars and of the FRP confining pressure. According to several previous studies,³⁷,⁸² these degradations can be considered based on the corrosion rate (η):

(24)

{A_{s}}^{*} = (1 - η) \cdot A_{s0}

(25)

{ε_{rup}}^{*} = (1 - 0.462 η) \cdot ε_{rup}

where A_s₀ and ε_rup are the initial cross-sectional area of the steel bars and the rupture strain of the FRP before corrosion, respectively; A_s^* and ε_rup^* are the corresponding values after corrosion; and η is expressed as a decimal fraction (e.g., η = 0.1 for a corrosion rate of 10%). The statistical results of the XGBoost model and the empirical models⁴³–⁴⁵ are presented in Table 8.
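Eqs. (24) and (25) amount to two linear reduction factors. A minimal sketch follows; the function name and the numbers in the example are hypothetical, and the only conversion added is from the tabulated corrosion rate in percent to the decimal fraction the equations require.

```python
from typing import Tuple


def degrade_for_corrosion(A_s0: float, eps_rup: float, eta_percent: float) -> Tuple[float, float]:
    """Eqs. (24)-(25): steel area and FRP rupture strain after corrosion.

    eta_percent is the corrosion rate in % (as tabulated in Table 4); it is
    converted to a decimal fraction before being used in the equations.
    """
    if not 0.0 <= eta_percent < 100.0:
        raise ValueError("corrosion rate must lie in [0, 100) %")
    eta = eta_percent / 100.0
    A_s = (1.0 - eta) * A_s0                        # Eq. (24)
    eps_rup_star = (1.0 - 0.462 * eta) * eps_rup    # Eq. (25)
    return A_s, eps_rup_star
```

For instance, a 10% corrosion rate applied to a hypothetical steel area of 1000 mm² and an FRP rupture strain of 0.015 gives 900 mm² and 0.014307. Even at the maximum corrosion rate in the database (51%), Eq. (25) still retains about 76% of the original rupture strain.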

Table 7. Summary of existing models for predicting the axial ultimate strength of FRP-confined RC columns
Selected models	Cross-section	Model	Supplementary notation
Youssef et al.⁴³	Circular	$\frac{f_{cu}}{f_{co}} = 1 + 2.25 \cdot {(\frac{f_{l}}{f_{co}})}^{1.25}$	f_cu and f_co are the compressive strengths of FRP-confined and unconfined concrete, respectively; f_l is the lateral confining stress at the ultimate condition of the FRP jacket: $f_{l} = \frac{2 E_{frp} t_{frp} ε_{fu}}{D \text{ or } b}$, where ε_fu is the ultimate tensile strain of the FRP; k_e is the confinement effectiveness coefficient: $k_{e} = \frac{1 - [\frac{{(b - 2 r_{c})}^{2} + {(h - 2 r_{c})}^{2}}{3 b h}] - \frac{A_{s}}{b h}}{1 - \frac{A_{s}}{b h}}$, where r_c is the corner radius and A_s is the area of the longitudinal steel bars
	Rectangular	$\frac{f_{cu}}{f_{co}} = 0.5 + 1.225 \cdot {(\frac{k_{e} f_{l}}{f_{co}})}^{0.6}$
Wei and Wu⁴⁴	Circular & rectangular	$\frac{f_{cu}}{f_{co}} = 1 + 2.2 \cdot {(\frac{2 r_{c}}{b})}^{0.72} \cdot {(\frac{f_{l}}{f_{co}})}^{0.94} \cdot {(\frac{h}{b})}^{-1.9}$	$f_{l} = \frac{2 E_{frp} t_{frp} ε_{fu}}{D \text{ or } b}$
Cao et al.⁴⁵	Circular & rectangular	$\frac{f_{cu}}{f_{co}} = 1 + 8.34 \cdot {(\frac{E_{l}}{E_{c}})}^{1.03} {(\frac{2 r_{c}}{b})}^{0.81} {(\frac{30}{f_{co}})}^{0.54} {(\frac{h}{b})}^{-1.9} {(\frac{ε_{fu}}{ε_{co}})}^{0.82}$	E_l is the confinement stiffness: $E_{l} = \frac{2 E_{frp} t_{frp}}{D \text{ or } b}$; E_c is the elastic modulus of unconfined concrete; ε_co is the axial strain of unconfined concrete at peak stress
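The strength-enhancement ratios in Table 7 can be evaluated with a few helper functions. This is a hedged sketch that stops at f_cu/f_co, because the conversion from confined strength to P_max used in the comparison is not detailed here. The assumptions are: f_co is taken equal to f_c, E_frp is supplied in GPa as in Table 3, and the example takes ε_fu = F_frp/E_frp (linear-elastic FRP up to rupture), a value that could additionally be reduced with Eq. (25). For a circle in the Wei and Wu model, the usual convention b = h = D and r_c = D/2 is assumed.

```python
def lateral_confining_stress(E_frp_gpa: float, t_frp: float, eps_fu: float, width: float) -> float:
    """f_l = 2 E_frp t_frp eps_fu / (D or b), in MPa (E_frp given in GPa, lengths in mm)."""
    return 2.0 * (E_frp_gpa * 1000.0) * t_frp * eps_fu / width


def youssef_circular(f_co: float, f_l: float) -> float:
    """Youssef et al., circular sections: f_cu / f_co."""
    return 1.0 + 2.25 * (f_l / f_co) ** 1.25


def youssef_rectangular(f_co: float, f_l: float, b: float, h: float,
                        r_c: float, A_s: float) -> float:
    """Youssef et al., rectangular sections: f_cu / f_co with the shape factor k_e."""
    steel = A_s / (b * h)
    k_e = (1.0 - ((b - 2.0 * r_c) ** 2 + (h - 2.0 * r_c) ** 2) / (3.0 * b * h) - steel) / (1.0 - steel)
    return 0.5 + 1.225 * (k_e * f_l / f_co) ** 0.6


def wei_wu(f_co: float, f_l: float, b: float, h: float, r_c: float) -> float:
    """Wei and Wu: f_cu / f_co; for a circle, assume b = h = D and r_c = D / 2."""
    return 1.0 + 2.2 * (2.0 * r_c / b) ** 0.72 * (f_l / f_co) ** 0.94 * (h / b) ** -1.9


# Illustrative input based on the 250 mm CFRP-wrapped columns of Table 2.
eps_fu = 4525.0 / 265000.0                                  # assumed F_frp / E_frp
f_l = lateral_confining_stress(265.0, 0.167, eps_fu, 250.0)  # about 6.05 MPa
ratio_youssef = youssef_circular(25.3, f_l)
```

For a fully rounded square section (r_c = b/2), k_e reduces to 1, which is a convenient check on the coefficient as written in Table 7.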

Table 8. Statistical performance metrics of the XGBoost model and existing empirical models
Models	Average	R²	RMSE (kN)	MAE (kN)	MAPE (%)
XGBoost model	1.005	0.994	70.3	46.9	7.2
Youssef et al.⁴³	1.148	0.897	338	250.1	23
Wei and Wu⁴⁴	1.287	0.898	395.5	305	30.6
Cao et al.⁴⁵	1.285	0.895	388.3	298.3	30.4

As seen from Table 8, the values of R², RMSE, MAE, and MAPE of the XGBoost model were 0.994, 70.3 kN, 46.9 kN, and 7.2%, respectively, whereas those of the empirical model suggested by Wei and Wu⁴⁴ were 0.898, 395.5 kN, 305 kN, and 30.6%, respectively. Hence, the XGBoost model provided the best predictions, with the largest R² and the smallest prediction errors (RMSE, MAE, and MAPE). This suggests that the XGBoost model outperformed these empirical models⁴³–⁴⁵ in predicting P_max of FRP-confined corroded RC columns.

In addition, Fig. 10 compares the predictions of the XGBoost model and the existing empirical models with the experimental results within a ±10% error band. As observed from Fig. 10, the XGBoost model exhibited the best prediction performance among the models considered.⁴³–⁴⁵ Among the empirical models, the one suggested by Youssef et al.⁴³ tended to give better predictions than those of Wei and Wu⁴⁴ and Cao et al.,⁴⁵ but most of the prediction points generated by these models fell outside the desirable ±10% band, indicating significant scatter in the prediction results. Moreover, the P_max of FRP-confined corroded RC columns calculated by these empirical models was generally higher than the experimental values. This is probably because modeling the corrosion effects on FRP-confined corroded RC columns is a complex problem, and the simplified treatment of these effects through the reduction of the cross-sectional area of the steel bars (Eq. (24)) and of the rupture strain of the FRP composites (Eq. (25)) is not sufficiently effective and accurate.
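The ±10% band in Figs. 10 and 11 can be quantified as the share of specimens whose predicted-to-experimental ratio lies within 0.9 to 1.1. The sketch below assumes the band is defined on that ratio, which the text implies but does not state explicitly; it also returns the mean ratio, where values above 1 indicate systematic overestimation such as that noted for the empirical models. Whether the "Average" column of Table 8 is exactly this mean ratio is an assumption.

```python
from typing import Sequence, Tuple


def band_statistics(p_exp: Sequence[float], p_pred: Sequence[float],
                    tol: float = 0.10) -> Tuple[float, float]:
    """Return (mean of P_pred/P_exp, fraction of points with |P_pred/P_exp - 1| <= tol)."""
    if len(p_exp) != len(p_pred) or not p_exp:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [pp / pe for pe, pp in zip(p_exp, p_pred)]
    mean_ratio = sum(ratios) / len(ratios)
    inside = sum(1 for r in ratios if abs(r - 1.0) <= tol)
    return mean_ratio, inside / len(ratios)
```

For hypothetical experimental values [100, 200, 300, 400] kN and predictions [105, 230, 290, 400] kN, three of the four points fall inside the band (the 230 kN prediction is 15% high), and the mean ratio is about 1.04.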

The XGBoost model was also compared with several other ML algorithms, namely the decision tree (DT), random forest (RF), and gradient boosting decision tree (GBDT). Fig. 11 compares the prediction results of the XGBoost model and these three ML algorithms within a ±10% error band. As observed from Table 9 and Fig. 11, the RF and GBDT models exhibited good performance in predicting the P_max of the columns, whereas the predictions of the DT algorithm showed significant scatter. Also, as seen from Fig. 11, the majority of the prediction points of the developed XGBoost model fell inside the desirable ±10% band, whereas those of the other three ML models were relatively more dispersed. This further validated the superior effectiveness and capability of the XGBoost model in predicting P_max of FRP-confined corroded RC columns compared with the other three ML algorithms.

Table 9. Statistical performance metrics of XGBoost and other ML models
Model	Training dataset				Testing dataset				All data
	R²	RMSE (kN)	MAE (kN)	MAPE (%)	R²	RMSE (kN)	MAE (kN)	MAPE (%)	R²	RMSE (kN)	MAE (kN)	MAPE (%)
XGBoost	0.998	56	13.4	2	0.978	122	70	7.7	0.994	70.3	46.9	7.2
DT	0.901	196.8	138.9	17.5	0.878	240.2	152.5	15.4	0.949	206.2	137.9	17.1
RF	0.967	116.2	68.3	8.8	0.940	167.8	100.6	9.8	0.974	148.5	95.3	11.5
GBDT	0.969	112.6	66.3	10.4	0.946	158.6	90.5	10.2	0.982	123.2	71.1	10.4

Conclusions

This study proposed a novel explainable machine learning (ML) model for predicting the axial load-carrying capacity (P_max) of FRP-confined corroded RC columns using the XGBoost algorithm and the SHAP technique. The explainable XGBoost model was established based on a thorough database of experimental tests on 285 FRP-confined corroded RC columns subjected to concentric and eccentric loading. Twenty parameters were selected as the critical input variables. The SHAP technique was then employed for the importance evaluation and interpretation of the model's predictions of P_max. Additionally, the effectiveness and accuracy of the developed XGBoost model were validated against several empirical prediction models reported in the literature and some widely used ML algorithms (DT, RF, and GBDT). The main conclusions are as follows:

A novel, explainable XGBoost decision tree-based ML method was proposed for quantitatively predicting P_max of FRP-confined corroded RC columns. The developed XGBoost predictive model was demonstrated to be capable and effective with good prediction performance and accuracy.
The proposed XGBoost predictive model could achieve good prediction interpretability using the SHAP technique. The feature importance of the selected critical variables could be quantitatively studied, and the most important ones influencing the prediction of P_max of FRP-confined corroded RC columns were A_g, t_frp, E_frp, e, η, and f_c, among the considered input variables.
The developed XGBoost model exhibited excellent prediction performance and accuracy in predicting the P_max of FRP-confined corroded RC columns. For the testing dataset, the values of R², RMSE, MAE, and MAPE of the XGBoost model were 0.978, 122 kN, 70.6 kN, and 7.7%, respectively. The predictive capability of the model significantly outperformed that of the existing empirical models. Also, the developed XGBoost model achieved better predictions than the DT, RF, and GBDT algorithms.
The proposed XGBoost model could provide new insights for addressing traditional engineering problems involving many critical influential parameters. In addition, as the database is further enriched in the future, the model should be continuously updated, thereby improving its prediction accuracy and reliability.

Although the developed XGBoost model was suitable for predicting the load-carrying capacity of FRP-confined corroded RC columns, it has several limitations. For instance, the database comprised 285 FRP-confined corroded RC columns, including 231 specimens collected from existing studies in the literature and 54 specimens tested by the authors. The completeness of the experimental data, structural dimensions, environmental conditions, non-uniform corrosion effects, testing quality, and distributions of the input parameters play critical roles in the prediction accuracy and effectiveness of the developed XGBoost models. Thus, to further improve the prediction accuracy and effectiveness of the model, the experimental database, the input feature variables, and the interactions among these variables should be updated and enriched with more test data. In addition, the generalizability of the SHAP explanations and XGBoost predictions might be limited to the ranges of the input data tested. Beyond the cross-validation with randomized and grid searches adopted herein, more advanced techniques such as Bayesian optimization could be incorporated to further reduce the risk of overfitting. Moreover, the effectiveness and feasibility of the developed XGBoost model were only validated against several empirical prediction models and some widely used ML algorithms (DT, RF, and GBDT); the model should be further verified in the future against more advanced interpretable machine learning or deep learning models. Overall, the errors of the XGBoost predictions were very low from the perspective of engineering practice. XGBoost is an accurate tree-boosting system whose regularized learning objective is designed to control overfitting.
Using the trained XGBoost model, users can, in principle, predict the load-carrying capacity of FRP-confined corroded RC columns, and the same approach could be applied to similar problems based on suitably assembled experimental databases, which gives it great application potential in engineering practice.

References

Alternate Load Paths and Retrofits for Long-Span Truss Bridges Under Sudden Member Loss and Blast Loads. Ph.D. Thesis. Department of Civil Engineering, The City University of New York, New York, NY. Published online 2021.
Blast fragility assessment of aging coastal RC columns exposed to non-uniform CIC attacks using LBE function. J Build Eng. 2023;71(4). doi:10.1016/j.jobe.2023.106510
Consideration of time-evolving capacity distributions and improved degradation models for seismic fragility assessment of aging highway bridges. Reliab Eng Syst Saf. 2016;154(1):197-218. doi:10.1016/j.ress.2016.06.001
Seismic fragility analysis of deteriorating RC bridge substructures subject to marine chloride-induced corrosion. Eng Struct. 2018;155(1):61-72. doi:10.1016/j.engstruct.2017.10.067
Bridge fragility analysis based on an improved uniform design-response surface methodology. J Vib Shock. 2018;37(22):245-254.
Bridge time-varying seismic fragility considering variables’ correlation. J Vib Shock. 2019;38(9):173-183.
Improved time-dependent seismic fragility estimates for deteriorating RC bridge substructures exposed to chloride attack. Adv Struct Eng. 2021;24(3):437-452. doi:10.1177/1369433220956812
Time-dependent seismic fragility assessment for aging highway bridges subject to non-uniform chloride-induced corrosion. J Earthq Eng. 2022;26(7):3523-3553. doi:10.1080/13632469.2020.1809561
Seismic fragility assessment framework for highway bridges based on an improved uniform design-response surface model methodology. Bull Earthq Eng. 2020;18(5):2329-2353. doi:10.1007/s10518-019-00783-1
Effects of various modeling uncertainty parameters on the seismic response and seismic fragility estimates of the aging highway bridges. Bull Earthq Eng. 2020;18(14):6337-6373. doi:10.1007/s10518-020-00934-9
Seismic performance of large rupture strain FRP retrofitted RC columns with corroded steel reinforcement. Eng Struct. 2020;216(6). doi:10.1016/j.engstruct.2020.110744
Experimental investigation of design and retrofit methods for blast load mitigation-A state-of-the-art review. Eng Struct. 2019;190:189-209. doi:10.1016/j.engstruct.2019.03.088
Performance-based probabilistic deflection capacity models and fragility estimation for reinforced concrete column and beam subjected to blast loading. Reliab Eng Syst Saf. 2022;227(7). doi:10.1016/j.ress.2022.108729
Fragility analysis for performance-based blast design of FRP-strengthened RC columns using artificial neural network. J Build Eng. 2022;52(6). doi:10.1016/j.jobe.2022.104364
A state-of-the-art review: near-surface mounted FRP composites for reinforced concrete structures. Constr Build Mater. 2019;209(3):748-769. doi:10.1016/j.conbuildmat.2019.03.121
Effects of corrosion of steel reinforcement on RC columns wrapped with FRP sheets. J Perf Constr Facil. 2009;23(1):20-31. doi:10.1061/(ASCE)0887-3828(2009)23:1(20)
FRP protection and rehabilitation of corrosion-damaged reinforced concrete columns. Int J Mater Prod Technol. 2005;23(3/4):348-371. doi:10.1504/IJMPT.2005.007735
Explainable extreme gradient boosting tree-based prediction of load-carrying capacity of FRP-RC columns. Eng Struct. 2021;245(93). doi:10.1016/j.engstruct.2021.112836
Long-term performance prediction framework based on XGBoost decision tree for pultruded FRP composites exposed to water, humidity and alkaline solution. Compos Struct. 2022;284(5). doi:10.1016/j.compstruct.2022.115184
Long-term monitoring of carbon fiber-reinforced polymer-wrapped reinforced concrete columns under severe environment. ACI Struct J. 2006;103(6):865-873.
Effectiveness of fiber-reinforced polymer in reducing corrosion in marine environment. ACI Struct J. 2007;104(1):76-83.
Carbon fiber-reinforced polymer wraps for corrosion control and rehabilitation of reinforced concrete columns. ACI Mater J. 2002;99(2):129-137.
Effect of confinement using fiber-reinforced polymer or fiber-reinforced concrete on seismic performance of gravity load-designed columns. ACI Struct J. 2004;101(1):47-56.
Comparison of confinement models for FRP wrapped concrete. ACI Struct J. 2005;102(1):62-72.
Circular columns confined with FRP: Experimental versus predictions of models and guidelines. J Compos Constr. 2006;10(1):4-12. doi:10.1061/(ASCE)1090-0268(2006)10:1(4)
Design-oriented stress-strain model for FRP-confined concrete in rectangular columns. J Reinforc Plast Compos. 2003;22(13):1149-1186. doi:10.1177/0731684403035429
Refinement of a design-oriented stress-strain model for FRP-confined concrete. J Compos Constr. 2009;13(4):269-278. doi:10.1061/(ASCE)CC.1943-5614.0000012
General stress-strain model for steel- and FRP-confined concrete. J Compos Constr. 2015;19(4). doi:10.1061/(ASCE)CC.1943-5614.0000511
An experimental study on the retrofitting effects of reinforced concrete columns damaged by rebar corrosion strengthened with carbon fiber sheets. Cement Concr Res. 2003;33(4):563-570. doi:10.1016/S0008-8846(02)01004-9
Seismic behavior of corrosion-damaged reinforced concrete columns strengthened using combined carbon fiber-reinforced polymer and steel jacket. Constr Build Mater. 2009;23(7):2653-2663. doi:10.1016/j.conbuildmat.2009.01.003
Seismic performance of CFRP-retrofitted large-scale square RC columns with high axial compression ratios. J Compos Constr. 2017;21(5). doi:10.1061/(ASCE)CC.1943-5614.0000813
Seismic performance of CFRP-retrofitted large-scale rectangular RC columns under lateral loading in different directions. Compos Struct. 2018;192(1):475-488. doi:10.1016/j.compstruct.2018.03.029
Deformation capacity of FRP retrofitted reinforced concrete columns with corroded reinforcing bars. Eng Struct. 2022;254(11). doi:10.1016/j.engstruct.2021.113834
Performance of corroded rectangular RC columns strengthened with CFRP composite under eccentric loading. Constr Build Mater. 2021;268. doi:10.1016/j.conbuildmat.2020.121134
Analytical analysis of design-oriented models for forecasting the performance of CFRP-confined corrosion-affected concrete columns. Constr Build Mater. 2021;313(6–7). doi:10.1016/j.conbuildmat.2021.125491
Compressive behavior degradation of FRP-confined RC columns exposed to a chlorine environment. Mar Struct. 2022;86(4). doi:10.1016/j.marstruc.2022.103277
Experimental study on the mechanical properties of corroded RC columns repaired with large rupture strain FRP. J Build Eng. 2022;54(8). doi:10.1016/j.jobe.2022.104413
Experimental study on the bond behavior between corroded rebar and concrete under dual action of FRP confinement and sustained loading. Constr Build Mater. 2017;155:605-616. doi:10.1016/j.conbuildmat.2017.08.049
Cyclic bond behaviors between corroded steel bar and concrete under the coupling effects of hoop FRP confinement and sustained loading. Compos Struct. 2019;224(6). doi:10.1016/j.compstruct.2019.110991
Consequences of steel corrosion on the ductility properties of reinforcement bar. Constr Build Mater. 2008;22(12):2316-2324. doi:10.1016/j.conbuildmat.2007.10.006
Predicting strength and drift capacities in corroded reinforced concrete columns. Constr Build Mater. 2016;115:304-318. doi:10.1016/j.conbuildmat.2016.04.048
Shear strengthening of corroded reinforced concrete columns using pet fiber-based composites. Eng Struct. 2017;153(10):757-765. doi:10.1016/j.engstruct.2017.09.030
Stress-strain model for concrete confined by FRP composites. Compos B Eng. 2007;38(5–6):614-628. doi:10.1016/j.compositesb.2006.07.020
Unified stress-strain model of concrete for FRP-confined columns. Constr Build Mater. 2012;26(1):381-392. doi:10.1016/j.conbuildmat.2011.06.037
Cross-sectional unification on the stress-strain model of concrete subjected to high passive confinement by fiber-reinforced polymer. Polymers. 2016;8(5). doi:10.3390/polym8050186
Explainable machine learning model and reliability analysis for flexural capacity prediction of RC beams strengthened in flexure with FRCM. Eng Struct. 2022;255(1). doi:10.1016/j.engstruct.2022.113903
Prediction of FRP-confined compressive strength of concrete using artificial neural networks. Compos Struct. 2010;92(12):2817-2829. doi:10.1016/j.compstruct.2010.04.008
Strength enhancement modeling of concrete cylinders confined with CFRP composites using artificial neural networks. Compos B Eng. 2012;43(8):2990-3000. doi:10.1016/j.compositesb.2012.05.044
Prediction of strength parameters of FRP-confined concrete. Compos B Eng. 2012;43(2):228-239. doi:10.1016/j.compositesb.2011.08.043
Emerging artificial intelligence methods in structural engineering. Eng Struct. 2018;171:170-189. doi:10.1016/j.engstruct.2018.05.084
Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng Struct. 2020;219(6). doi:10.1016/j.engstruct.2020.110927
Data-driven ultimate conditions prediction and stress strain model for FRP-confined concrete. Compos Struct. 2020;242(4). doi:10.1016/j.compstruct.2020.112094
A machine learning-based time-dependent shear strength model for corroded reinforced concrete beams. J Build Eng. 2021;36(4). doi:10.1016/j.jobe.2020.102118
Development of novel design strength model for sustainable concrete columns: a new machine learning-based approach. J Clean Prod. 2022;357(8). doi:10.1016/j.jclepro.2022.131988
Explainable machine learning models for probabilistic buckling stress prediction of steel shear panel dampers. Eng Struct. 2023;288. doi:10.1016/j.engstruct.2023.116235
Machine learning-based prediction for residual bearing capacity and failure modes of rectangular corroded RC columns. Ocean Eng. 2023;281(1). doi:10.1016/j.oceaneng.2023.114701
An artificial neural networks model for the prediction of the compressive strength of FRP-confined concrete circular columns. Eng Struct. 2017;140(6):199-208. doi:10.1016/j.engstruct.2017.02.047
XGBoost algorithm-based prediction of concrete electrical resistivity for structural health monitoring. Autom Constr. 2020;114(8). doi:10.1016/j.autcon.2020.103155
A novel artificial intelligence technique to predict compressive strength of recycled aggregate concrete using ICA-XGBoost model. Eng Comput. 2021;37(4):3329-3346. doi:10.1007/s00366-020-01003-0
Predicting the compressive strength of concrete from its compositions and age using the extreme gradient boosting method. Constr Build Mater. 2020;260(5). doi:10.1016/j.conbuildmat.2020.119757
Multiparameter identification of bridge cables using XGBoost algorithm. J Bridge Eng. 2023;28(5). doi:10.1061/JBENF2.BEENG-6021
Prediction of axial compressive capacity of CFRP-confined concrete-filled steel tubular short columns based on XGBoost algorithm. Eng Struct. 2022;260(2). doi:10.1016/j.engstruct.2022.114239
A unified approach to interpreting model predictions. arXiv:1705.07874 [cs, stat]. Published online 2017.
Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng Struct. 2020;219(6). doi:10.1016/j.engstruct.2020.110927
Explainable machine learning models for punching shear strength estimation of flat slabs without transverse reinforcement. J Build Eng. 2021;39(2). doi:10.1016/j.jobe.2021.102300
Experimental evaluation of FRP jackets in upgrading RC corroded columns with substandard detailing. Eng Struct. 2004;26(6):817-829. doi:10.1016/j.engstruct.2004.02.003
Effect of corrosion-damaged RC circular columns enveloped with hybrid and non-hybrid FRP under eccentric loading. J Compos Mater. 2015;49(18):2265-2283. doi:10.1177/0021998314545187
Performance of corroded rectangular RC columns strengthened with CFRP composite under eccentric loading. Constr Build Mater. 2021;268. doi:10.1016/j.conbuildmat.2020.121134
Post-repair performance of eccentrically loaded RC columns wrapped with CFRP composites. Cement Concr Compos. 2008;30(9):822-830. doi:10.1016/j.cemconcomp.2008.06.009
Analytical analysis of design-oriented models for forecasting the performance of CFRP-confined corrosion-affected concrete columns. Constr Build Mater. 2021;313(6–7). doi:10.1016/j.conbuildmat.2021.125491
Eccentric compressive behavior of steel fiber-reinforced RC columns strengthened with CFRP wraps: experimental investigation and analytical modeling. Eng Struct. 2021;226(2). doi:10.1016/j.engstruct.2020.111389
Behaviour of CFRP wrapped RC square columns under eccentric compressive loading. Structures. 2019;20(4):309-323. doi:10.1016/j.istruc.2019.04.012
Buckling of steel reinforcing bars in FRP-confined RC columns: an experimental study. Constr Build Mater. 2017;140(2):403-415. doi:10.1016/j.conbuildmat.2017.02.149
Studies on the Mechanical Properties of Corroded Reinforced Concrete Columns Confined with CFRP. Dissertation. Harbin Institute of Technology. Published online 2014.
Calculation Method of Compressive Properties and Bearing Capacity of Damaged Reinforced Concrete Columns Reinforced by FRP Strip. Dissertation. Zhengzhou University. Published online 2018.
Experimental study on axial compression of corroded reinforced concrete columns strengthened with FRP strips under erosion environment. Acta Mater Compos Sin. 2020;37(8):2015-2028.
Study on crushing resistance performance of corroded reinforced concrete columns confined by fiber reinforced polymer. Hongshui River. 2013;32(5):36-39.
Experimental Study on the Mechanical Properties of FRP Reinforcement Corroded Reinforced Concrete Column and Corroded Steel Bar Buckling Characteristics. Dissertation. Shenzhen University. Published online 2015.
Performance Investigation of FRP Strengthened Reinforced Concrete Circular Columns Under Axial Compressive Loading. Dissertation. Zhengzhou University. Published online 2014.
A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics. Comput Stat Data Anal. 2020;152(2). doi:10.1016/j.csda.2020.107043
Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888. Published online 2018.
Residual capacity of corroded reinforcing bars. Mag Concr Res. 2005;57(3):135-147. doi:10.1680/macr.2005.57.3.135
