A triphenylmethane reductase derived from
Triphenylmethane dyes such as malachite green and crystal violet are extensively applied in the textile industry for dyeing [
A specific enzyme, triphenylmethane reductase (TMR), was first discovered in
Although numerous multienzyme systems have been broadly applied in biosensors [
In the present work, a thermostable GDH [
A self-sufficient bienzyme biocatalytic system composed of BzGDH, CsTMR, NAD, and glucose was constructed for dye decolorization.
It is worth pointing out that the bienzyme catalytic system showed obvious product inhibition, especially when a high proportion of BzGDH was involved (
In general, multiple linear regression has often been applied to explore the linear relationship between independent and dependent variables. In this study, the corresponding linear model obtained using the entire dataset with three input variables was as follows:
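The fitted equation is not reproduced in this excerpt, and the original modeling was done in MATLAB; a minimal Python sketch of fitting such a three-input linear model by ordinary least squares, using synthetic stand-in data, might look like this:

```python
import numpy as np

# Minimal sketch only: the paper's modeling was done in MATLAB, and the data
# below are synthetic stand-ins for the three inputs (enzyme ratio, substrate,
# product) and the measured decolorization efficiency.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(75, 3))            # 75 fed-batch samples
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.05, 75)

# Ordinary least squares: append a column of ones for the intercept.
A = np.hstack([X, np.ones((75, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Goodness of fit of experimental versus predicted values.
y_pred = A @ coef
r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
print("coefficients:", np.round(coef, 3), "R^2:", round(r2, 4))
```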
This linear model was then utilized to predict the dye decolorization efficiency. The correlation coefficient R^{2} for the best fit of experimental versus predicted values was calculated as 0.5421, with an equation of
To determine the best tree number to be adopted in the modeling stage, all samples were modeled using the random forest algorithm with different numbers of trees, which were set from 10 to 1000 with intervals of 10. Mean square error (MSE) and correlation coefficient (R^{2}) were used as criteria to determine the optimum number of trees. According to
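The tree-number scan can be sketched in Python as follows (the original analysis used MATLAB, and the synthetic data below stand in for the 75 fed-batch samples):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative sketch: scan the number of trees and track out-of-bag (OOB)
# error to pick the optimum. The paper scanned 10-1000 trees in steps of 10;
# a coarser grid is used here for brevity.
rng = np.random.default_rng(1)
X = rng.uniform(size=(75, 3))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, 75)

best = None
for n_trees in range(50, 1050, 200):
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                               random_state=0).fit(X, y)
    oob_mse = np.mean((y - rf.oob_prediction_) ** 2)   # OOB mean square error
    if best is None or oob_mse < best[1]:
        best = (n_trees, oob_mse)
print("best number of trees:", best[0])
```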
Since ANN has been proven to be a robust strategy for unraveling relationships among variables, especially nonlinear ones, we adopted BPANN in this study to explore the response of the decolorization efficiency to the independent variables of the bienzyme biocatalytic system. The primary goal of network training was to minimize the error function MSE by searching for a weight matrix that could produce outputs equal or close to the experimental values. To avoid overfitting, the optimum number of hidden nodes was determined by a 10-fold cross-validation technique. Both the MSE and R^{2} between the predicted and experimental values of the training, validation, and test subsets suggested that a network with six neurons in the hidden layer had the best performance, with overfitting avoided (
Generally, a good model should have considerable generalization capability; to evaluate the generalization capability of a model fairly, the test dataset should be independent of the training dataset. As indicated in
The accuracy of the prediction of the constructed models was further estimated by different statistical parameters, including MSE, R^{2}, mean absolute error (MAE, Equation (5)), and mean relative error (MRE, Equation (6)). These statistical parameters also confirmed that ANN had the best predictive capability (
The weight between two artificial neurons is analogous to the synapse strength between axon and dendrite in real biological neurons. Consequently, each weight of the neural network determines the percentage of the signal strength of an input neuron that will be transmitted to the output neuron. The neural network weight matrix (
To determine the response profile of the output variable to the input variables, several types of sensitivity analysis have been proposed [
To obtain a panoramic view of the response of the decolorization efficiency to the involved variables, including substrate, product, and the ratio of the two enzymes, a three-dimensional map was generated by MATLAB. As shown in
The gene encoding CsTMR derived from
The recombinant cells were cultured in a 500 mL flask containing 100 mL of Luria-Bertani (LB) medium supplemented with 50 μg/mL of kanamycin at 37 °C. When the absorbance at 600 nm of the culture reached 0.5–0.8, IPTG (isopropyl β-D-1-thiogalactopyranoside) was added to a final concentration of 0.5 mM, and induction proceeded at 25 °C for 8–12 h. The recombinant cells were collected by centrifugation at 10,000×
The activity of BzGDH was determined by measuring the OD_{340} of NADH in 100 mM of phosphate buffer (pH 8.0) containing 200 mM of glucose and 1 mM of NAD. One unit of BzGDH activity was defined as the amount of enzyme required to produce 1 μmol of NADH per minute. The activity of CsTMR was assayed by monitoring the OD_{616} of malachite green in 100 mM of phosphate buffer (pH 7.0) containing 200 μM of NADH and 20 μM of malachite green. One unit of CsTMR activity was defined as the amount of enzyme required to degrade 1 μmol of malachite green per minute. All measurements were conducted at 25 °C.
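Converting the measured absorbance changes into enzyme units follows the Beer-Lambert law; a small sketch, assuming the standard NADH extinction coefficient at 340 nm (6220 M⁻¹ cm⁻¹), a 1 cm path length, and a 1 mL assay volume, none of which are stated above:

```python
# Hedged sketch: 1 U = 1 umol of product per minute. The extinction
# coefficient of NADH at 340 nm (6220 M^-1 cm^-1), the 1 cm path length, and
# the 1 mL assay volume are standard assumptions not given in the text.
EPSILON_NADH = 6220.0   # M^-1 cm^-1
PATH_CM = 1.0
VOLUME_ML = 1.0

def units_from_dA340(dA_per_min):
    """Enzyme units (umol/min) from the rate of A340 change per minute."""
    molar_per_min = dA_per_min / (EPSILON_NADH * PATH_CM)  # mol L^-1 min^-1
    return molar_per_min * 1e6 * (VOLUME_ML / 1000.0)      # umol per minute

print(round(units_from_dA340(0.622), 4))   # -> 0.1
```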
Fed-batch experiments were conducted in a 50 mL beaker containing 200 mM of glucose, 1 mM of NAD, and 3 µM of malachite green, with magnetic stirring at 25 °C. The molar ratios between CsTMR and BzGDH were set at 1:1, 1:5, 1:10, 5:1, and 10:1. Malachite green was added periodically to restore the initial dye concentration. For each reactor, 15 batches were performed, and the time intervals between two successive batches were recorded. Residual malachite green was measured after every batch reaction. Dye decolorization rate
The molar ratio between CsTMR and BzGDH and the concentrations of substrate and product were treated as independent variables
The random forest (RF) algorithm proposed by Breiman has been extensively used for classification and regression based on ensembles of a large number of individual decision trees [
Finally, all samples were randomly divided into a training dataset and a test dataset, and the training dataset was employed to train an RF model using the best tree number. The test dataset was employed to evaluate the generalization capability of the final RF model.
To describe the kinetic behavior of this bienzyme system, a three-layered feed-forward artificial neural network trained with the back-propagation algorithm (BPANN) was adopted to explore the relationship among the enzyme ratio, the concentrations of substrate and product, and the dye decolorization rate. Seventy-five datasets obtained from fed-batch trials were used to train a BPANN model using MATLAB R2015a (The MathWorks, Inc., Natick, Massachusetts, United States). The datasets were normalized using Equation (5) to generate data in the range of −1.0 to 1.0:
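The normalization equation itself is not reproduced in this excerpt; the standard min-max mapping to [−1, 1] is assumed in this sketch:

```python
import numpy as np

# Assumed standard min-max normalization to [-1, 1]:
#   x_norm = 2 * (x - x_min) / (x_max - x_min) - 1
def normalize(x):
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

print(normalize([0.0, 5.0, 10.0]))   # -> [-1.  0.  1.]
```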
The number of neurons in the hidden layer of the ANN was set at 2–10, and a 10-fold cross-validation technique was applied to determine the best number of hidden neurons and avoid overfitting. In this method, the whole dataset is randomly divided into 10 subsets; one subset is held out, and the network is trained with the remaining subsets and then used to predict the held-out subset. The procedure is repeated until every subset has been held out once. MSE and R^{2} were used to estimate the performance of the trained BPANN with different numbers of hidden neurons.
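This selection procedure can be sketched in Python (the original work used MATLAB; the data below are synthetic stand-ins for the 75 normalized samples):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

# Illustrative sketch: pick the number of hidden neurons (2-10) by 10-fold
# cross-validation, minimizing the mean MSE across folds.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(75, 3))
y = np.tanh(X[:, 0] - 0.5 * X[:, 1]) + rng.normal(0, 0.05, 75)

scores = {}
for n_hidden in range(2, 11):
    fold_mse = []
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(X):
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), solver="lbfgs",
                           max_iter=2000, random_state=0)
        net.fit(X[train_idx], y[train_idx])
        fold_mse.append(np.mean((y[val_idx] - net.predict(X[val_idx])) ** 2))
    scores[n_hidden] = np.mean(fold_mse)

best = min(scores, key=scores.get)
print("best hidden-layer size:", best)
```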
Finally, all samples were randomly divided into a training dataset and a test dataset, and the training dataset was further split into training (60%), validation (20%), and test (20%) subsets to train a BPANN model using the number of hidden neurons with the best cross-validation performance. The initial test dataset was employed to evaluate the generalization capability of the final ANN model.
The prediction accuracy of the constructed models was evaluated using different statistical parameters including
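A hedged sketch of these evaluation metrics; the definitions follow the usual conventions, since Equations (5) and (6) are not reproduced in this excerpt (MRE is expressed as a percentage):

```python
import numpy as np

# MSE, MAE, MRE, and coefficient of determination R^2, using the standard
# definitions assumed from the surrounding text.
def evaluate(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MSE": np.mean(err ** 2),
        "MAE": np.mean(np.abs(err)),
        "MRE": np.mean(np.abs(err / y_true)) * 100.0,  # assumes y_true != 0
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
    }

print(evaluate([1.0, 2.0, 4.0], [1.1, 1.9, 4.2]))
```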
The visualization method, the neural interpretation diagram (NID) proposed by Özesmi [
The relative importance of the input variables on the output was estimated by Garson’s algorithm [
The weight term between the hidden and output layers was eliminated in the simplification from Equation (8) to Equation (9), which could lead to a misleading estimate of the contribution of the input variables to the outputs. To estimate the importance of variables accurately, a modified Garson’s algorithm [
Since Garson’s algorithm adopts the absolute values of weights and thus omits their signs, Olden et al. proposed the connection weight approach (CWA, Equation (11)) to estimate the contribution of inputs to the output more precisely; it uses the raw hidden-input and hidden-output connection weights and provides the most accurate quantification of variable importance among commonly used approaches [
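All three importance measures can be reproduced directly from the network weight matrix reported below; a Python sketch (note that the CWA values are signed, so the Product contribution comes out negative):

```python
import numpy as np

# Variable importance computed from the reported network weight matrix
# (rows: hidden neurons 1-6; columns: ratio, substrate, product). Biases are
# omitted, as in Garson's algorithm.
W_i = np.array([[-1.0584,  -5.3016,  -0.5183],
                [-5.3136, -32.2206,  -3.3973],
                [-2.0184,   1.6110,  -5.7296],
                [-0.7781,  -4.5670,  13.1172],
                [-0.7376,  -5.0937,  -0.4546],
                [ 2.2761,   1.3407,   0.1931]])
W_o = np.array([0.0944, 0.1495, 0.0555, -0.3755, 0.6968, 12.8454])

# Garson's algorithm: per-neuron shares of the absolute input weights.
shares = np.abs(W_i) / np.abs(W_i).sum(axis=1, keepdims=True)
garson = 100 * shares.sum(axis=0) / shares.sum()

# Modified Garson: weight each neuron's share by |hidden-to-output weight|.
weighted = shares * np.abs(W_o)[:, None]
garson_mod = 100 * weighted.sum(axis=0) / weighted.sum()

# Connection weight approach (Olden): signed products of the raw weights.
cwa = (W_i * W_o[:, None]).sum(axis=0)

for name, g, m, c in zip(["Ratio", "Substrate", "Product"],
                         garson, garson_mod, cwa):
    print(f"{name:9s}  Garson {g:5.2f}%  Garson_mod {m:5.2f}%  CWA {c:7.2f}")
```

Running this reproduces the relative-importance values reported in the results, including the 54.99% modified-Garson share for the enzyme ratio.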
Sensitivity analysis was performed according to Lek’s algorithm [
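A hedged sketch of the profile-style sensitivity analysis attributed to Lek: one input is varied across its observed range while the remaining inputs are held at fixed quantiles, and the model responses are averaged. The helper and toy model below are illustrative, not the paper's code:

```python
import numpy as np

def lek_profile(model, X, var_idx, n_points=12, quantiles=(0, 25, 50, 75, 100)):
    """Average response profile of `model` to input `var_idx` of data X."""
    grid = np.linspace(X[:, var_idx].min(), X[:, var_idx].max(), n_points)
    profiles = []
    for q in quantiles:
        # Hold the other inputs at their q-th percentile values.
        probe = np.tile(np.percentile(X, q, axis=0), (n_points, 1))
        probe[:, var_idx] = grid            # sweep only the variable of interest
        profiles.append(model.predict(probe))
    return grid, np.mean(profiles, axis=0)  # averaged response profile

# Toy usage with a stand-in model exposing predict().
class ToyModel:
    def predict(self, X):
        return X[:, 0] - 0.5 * X[:, 1]

X = np.random.default_rng(3).uniform(size=(75, 3))
grid, profile = lek_profile(ToyModel(), X, var_idx=0)
```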
In the present study, a robust cofactor self-sufficient bienzyme biocatalytic system for dye decolorization was successfully constructed. The performance of the decolorization process was also modeled by employing the MLR, RF, and ANN algorithms. Evaluation of these models suggested that a three-layered BPANN model with six hidden neurons predicted the dye decolorization efficiency with the best accuracy. Weight analysis of the ANN model showed that the ratio between the two enzymes was the most influential factor, with a relative importance of 54.99% in the decolorization process. The modeling results confirmed that neural networks can effectively reproduce experimental data and predict the behavior of the decolorization process.
H.D. conceived and designed the experiments; H.D. performed the experiments; H.D., W.L., Y.Y. and B.C. analyzed the data; H.D., W.L., Y.Y. and B.C. contributed reagents/materials/analysis tools; H.D. wrote the paper.
This research was funded by National Key R&D Program of China (2018YFC1406704, 2018YFC1406701), National Natural Science Foundation of China (91851201), Qingdao National Laboratory for Marine Science and Technology (QNLM2016ORP0310) and Youth Innovation Fund of Polar Science (201602).
The authors declare no conflict of interest.
Scheme of the bienzyme dye decolorization system constructed in this study.
Performance of the selfsufficient bienzyme biocatalytic system for dye decolorization. (
Mean square error (MSE) and correlation coefficient (R^{2}) of models trained by random forest with different numbers of trees. OOB: out-of-bag.
Training, validation, test, and interpretation of the neural network. (
Comparisons between experimental and predicted values of different models. (
Sensitivity analysis for the variables of modeled neural networks. (
The response of decolorization rate to changes in substrate, product, and bienzyme ratio.
The apparent decolorization rates of the bienzyme catalytic system.
Apparent Decolorization Rates (μmol h^{−1})

Molar ratio of CsTMR/BzGDH   1:10   1:5    1:1    5:1    10:1
Initial                      1.65   2.01   1.25   0.37   0.17
Average                      0.28   0.35   0.23   0.23   0.17
Statistical parameters for comparison of different models ^{a}.
Parameters   MLR                RF                 ANN
             Train     Test     Train     Test     Train     Test
MSE          0.0511    0.0706   0.0383    0.0419   0.0013    0.0090
MAE          0.1474    0.1708   0.0974    0.1031   0.0270    0.0487
MRE          48.7130   47.7690  32.9363   24.0703  11.1586   13.2349
R^{2}        0.5552    0.5725   0.7377    0.7602   0.9867    0.9527
^{a} MLR, multiple linear regression; RF, random forest; ANN, artificial neural network; MSE, mean square error; MAE, mean absolute error; MRE, mean relative error; R^{2}, correlation coefficient.
Weight matrix of neural network ^{1}.
W_{i}                                                    W_{o}
Neuron   Ratio      Substrate   Product     Bias         Neuron   Weight
1        −1.0584    −5.3016     −0.5183     6.9997       1        0.0944
2        −5.3136    −32.2206    −3.3973     −17.0822     2        0.1495
3        −2.0184    1.6110      −5.7296     1.1000       3        0.0555
4        −0.7781    −4.5670     13.1172     7.7110       4        −0.3755
5        −0.7376    −5.0937     −0.4546     −5.3149      5        0.6968
6        2.2761     1.3407      0.1931      5.0204       6        12.8454
                                                         Bias     −12.5429
^{1} W_{i}: weights between input and hidden layers; W_{o}: weights between hidden and output layers.
Importance of input variables on the output layer.
Variables    Garson ^{1} (%)   Garson_mod ^{2} (%)   CWA ^{3}
Ratio        20.94             54.99                 28.01
Substrate    52.33             37.83                 10.16
Product      26.73             7.19                  −3.64
^{1} Garson’s algorithm [