Accurate and controllable regulatory elements such as promoters and ribosome binding sites (RBSs) are indispensable tools for quantitatively regulating gene expression in rational pathway engineering. The de novo design of regulatory elements has therefore returned to the forefront of synthetic biology research. Here we developed a quantitative design method for regulatory elements based on strength prediction using an artificial neural network (ANN). One hundred mutated Trc promoter & RBS sequences, finely characterized with strengths ranging from 0 to 3.559 (relative to the strength of the original sequence, defined as 1), were used for model training and testing. A precise strength prediction model, NET90_19_576, was constructed with high regression correlation coefficients of 0.98 for both training and testing. Sixteen artificial elements were designed in silico using this model, and all of them showed good agreement between the measured strength and the desired strength. The functional reliability of the designed elements was validated in two different genetic contexts. The designed parts were successfully used to improve the expression of BmK1 peptide toxin and to fine-tune the deoxy-xylulose phosphate pathway in Escherichia coli. Our results demonstrate that an ANN-based methodology can quantitatively design regulatory elements de novo with desired strengths, which is of great importance for synthetic biology applications.
The aforementioned quantitative prediction models commonly use linear regression analysis or its derivatives (e.g., linear correlation of log-transformed data) to simplify model construction. As a result, they struggle to capture the complex non-linear relationship between sequences and their strengths, which leads to low prediction accuracy and poor generality. In addition, although these models have the potential, they have not been developed further into in silico methods for de novo design of elements with a desired strength. In contrast to the above methods, we introduced a non-linear modelling methodology, the artificial neural network (ANN), to address these issues. An ANN is essentially a mathematical model built by simulating the structure and function of the neural networks of the human brain [15], [16]. During the learning phase it continuously adapts its network structure based on input/output information, which allows it to reflect the non-linear relationships between quantitative characteristics and related qualitative performance in complex phenomena. ANNs have therefore been widely applied in various fields of biological research, such as protein structure and stability prediction [17], [18], [19], RNA secondary structure prediction [20], and promoter recognition and structure analysis [21], [22], [23], [24], [25], [26], [27], [28]. In this work, we constructed a high-performance ANN model to predict the strength of a regulatory element directly from its sequence. Based on this model, we further developed an effective computational platform for the quantitative design of novel regulatory elements with desired properties for synthetic biology applications.
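To make the idea of predicting strength directly from sequence more concrete, here is a minimal sketch of a small feed-forward network trained on one-hot encoded DNA. Everything in it is an assumption made for illustration: the one-hot encoding, the single tanh hidden layer, the hyperparameters, and above all the synthetic sequences and strengths. It is not the NET90_19_576 model, and it does not reproduce the authors' data or architecture.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat vector with 4 entries (A, C, G, T) per position."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()

def forward(params, X):
    """One tanh hidden layer followed by a linear output (the predicted strength)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

def train(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on half the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        h, pred = forward((W1, b1, W2, b2), X)
        err = pred - y
        losses.append(float(0.5 * np.mean(err ** 2)))
        grad_h = np.outer(err, W2) * (1.0 - h ** 2)  # backpropagate through tanh
        W2 -= lr * (h.T @ err) / len(y)
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ grad_h) / len(y)
        b1 -= lr * grad_h.mean(axis=0)
    return (W1, b1, W2, b2), losses

# Synthetic stand-in data (hypothetical): 60 random 12-mers whose "strength"
# is an arbitrary non-linear function of A/T content, NOT measured values.
rng = np.random.default_rng(1)
seqs = ["".join(rng.choice(list(BASES), size=12)) for _ in range(60)]
X = np.array([one_hot(s) for s in seqs])
y = np.array([3.0 * (sum(b in "AT" for b in s) / len(s)) ** 2 for s in seqs])

params, losses = train(X, y)
_, y_hat = forward(params, X)
print("loss before/after:", losses[0], losses[-1])
print("training correlation:", np.corrcoef(y, y_hat)[0, 1])
```

A real application would replace the synthetic targets with measured strengths and hold out part of the data for testing, as the text describes for the 100 characterized sequences.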
Machine learning is a branch of computer science that has been used extensively in pre-diagnosis research [3]. Algorithms that can learn from their mistakes and predict future events are appealing: rather than just following preprogrammed instructions, they build a model from input samples to make predictions or judgments. Machine learning has many subareas, such as artificial neural networks (ANNs), convolutional neural networks (CNNs), and ANNs with deep learning architectures (deep neural networks) [4]. ANNs are a set of computational models inspired by the nervous system that can learn and recognize patterns [5].
Convolutional neural networks (CNNs) have been used successfully in traditional data mining environments [59]. However, a CNN requires a large amount of labeled training data to be effective, and such data may not be available. The paper by Oquab [81] proposes a transfer learning method that trains a CNN on available labeled source data (a source learner) and then transfers the CNN's internal layers, which represent a generic mid-level feature representation, to a target CNN learner. This method is referred to as the transfer convolutional neural network (TCNN). To correct for any remaining distribution differences between the source and target domains, an adaptation layer is added to the target CNN learner and trained on the limited labeled target data. The experiments are run on object image classification, with average precision as the performance metric. The Oquab [81] method is tested against a method proposed by Marszalek [73] and a method proposed by Song [110]. Neither the Marszalek [73] nor the Song [110] approach is a transfer learning approach; both are trained only on the limited labeled target data. The first experiment uses the Pascal VOC 2007 data set as the target and ImageNet 2012 as the source. The Oquab [81] method outperformed both the Song [110] and Marszalek [73] approaches in this test. The second experiment uses the Pascal VOC 2012 data set as the target and ImageNet 2012 as the source. In the second test, the Oquab [81] method marginally outperformed the Song [110] method (the Marszalek [73] method was not included in the second test). The tests demonstrated that information can be transferred from one CNN learner to another.
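The core mechanism described above, reusing layers from a source learner and training only a new adaptation layer on scarce target data, can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions: a single dense ReLU layer stands in for the transferred CNN layers, its weights are random placeholders rather than weights actually learned on a source data set, the adaptation layer is one logistic unit, and the target data are synthetic. It is not the TCNN implementation from Oquab [81].

```python
import numpy as np

def features(X, W, b):
    """Frozen mid-level representation: one ReLU layer taken from the source learner."""
    return np.maximum(0.0, X @ W + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_adaptation_layer(F, y, lr=0.1, epochs=500):
    """Fit a logistic output layer on fixed features F; source weights are never updated."""
    w = np.zeros(F.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = sigmoid(F @ w + b)
        # Binary cross-entropy, with a small epsilon to avoid log(0)
        losses.append(float(-np.mean(y * np.log(p + 1e-12)
                                     + (1.0 - y) * np.log(1.0 - p + 1e-12))))
        grad = p - y
        w -= lr * (F.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, losses

rng = np.random.default_rng(0)

# Placeholder for layers learned on a large labeled source set (hypothetical:
# random weights here; in practice they would come from source training).
W_src = rng.normal(0.0, 1.0 / np.sqrt(20), (20, 16))
b_src = np.zeros(16)
W_before = W_src.copy()

# Small labeled target set (synthetic): 30 examples with binary labels.
X_tgt = rng.normal(0.0, 1.0, (30, 20))
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 0).astype(float)

F_tgt = features(X_tgt, W_src, b_src)
w_ad, b_ad, losses = train_adaptation_layer(F_tgt, y_tgt)
print("target loss before/after:", losses[0], losses[-1])
print("source layer unchanged:", np.array_equal(W_src, W_before))
```

Keeping the transferred weights fixed means only the 17 adaptation parameters are fitted to the 30 target examples, which is the reason the approach suits small labeled target sets.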